MI300X vs H100 vs H200 Benchmark Part 1: Training – CUDA Moat Still Alive
Compares the performance of the MI300X against the H100/H200 through benchmarking.
Key Takeaway
Key Findings
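- On-paper FLOPS are unreliable; only benchmarks are convincing.
- NVIDIA's out-of-the-box experience is far better than AMD's, which stems from the difference in software stack quality.
- A software stack can prevent users from reaching the hardware's performance potential.
- The user experience of the software stack is very important.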
Miscellany
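- GEMM is one of the most important benchmark targets in modern deep learning.
- NVIDIA's efficient interconnect topology, NVLink, is also one of the keys to the high performance of its GPUs.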
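
To make the gap between on-paper FLOPS and a measured GEMM concrete, here is a minimal sketch of how a GEMM benchmark turns a timing into achieved throughput. It is not the article's harness: the helper names (`gemm_flops`, `achieved_tflops`, `fraction_of_peak`, `benchmark`) are hypothetical, the pure-Python `naive_matmul` is only a dependency-free stand-in for a real GPU kernel, and `peak_tflops` is whatever datasheet figure you choose to compare against.

```python
import time


def gemm_flops(m, n, k):
    """FLOPs in an (m x k) @ (k x n) matmul: one multiply and one add per inner-product term."""
    return 2 * m * n * k


def achieved_tflops(m, n, k, seconds):
    """Measured throughput in TFLOP/s for one GEMM that took `seconds`."""
    return gemm_flops(m, n, k) / seconds / 1e12


def fraction_of_peak(measured_tflops, peak_tflops):
    """Share of the on-paper peak actually reached (peak_tflops is an assumed input)."""
    return measured_tflops / peak_tflops


def naive_matmul(a, b):
    """Pure-Python stand-in for a real GEMM kernel; far too slow for real benchmarking."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def benchmark(fn, warmup=2, iters=5):
    """Average wall-clock seconds per call, after a few untimed warmup runs."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


# Small demo on the CPU; sizes are arbitrary and kept tiny so it finishes quickly.
m = n = k = 32
a = [[1.0] * k for _ in range(m)]
b = [[1.0] * n for _ in range(k)]
seconds = benchmark(lambda: naive_matmul(a, b))
print(f"{m}x{k} @ {k}x{n}: {achieved_tflops(m, n, k, seconds):.6f} TFLOP/s")
```

On a GPU, the timed callable would instead launch the library GEMM and wait for the device to finish before the clock is read; without that synchronization the timer only measures kernel launch overhead. Dividing the measured number by the datasheet peak with `fraction_of_peak` gives a quick view of how much of the on-paper FLOPS the software stack actually delivers.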

- Author: Lifan Sun
- URL: stevensun.site/article/mi300x-vs-h100-200
- Copyright: All articles in this blog, except where specially stated, are licensed under the BY-NC-SA agreement. Please indicate the source!