Reading Notes: MI300X vs H100 vs H200 Benchmark Part 1: Training – CUDA Moat Still Alive | Lifan Sun’s Blog

date: Jan 27, 2025
slug: mi300x-vs-h100-200
status: Published
tags: MLSys
type: Post

This post is a brief summary of my reading of MI300X vs H100 vs H200 Benchmark Part 1: Training – CUDA Moat Still Alive.

The article benchmarks the MI300X against the H100 and H200 and compares their training performance.

Key Takeaway

Key Findings

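On-paper FLOPS are unreliable; only benchmarks are convincing.

NVIDIA's out-of-box experience is far better than AMD's, and the gap comes from the difference in software stack quality.

A weak software stack can keep users from reaching the hardware's performance potential.

The user experience of the software stack matters.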
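The first finding can be made concrete with a small sketch: rank accelerators by their spec-sheet peak, then by measured throughput, and check what fraction of the peak each one reaches. The names `gpu_a` and `gpu_b` and every number below are hypothetical placeholders chosen for illustration; they are not figures from the article.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    name: str
    peak_tflops: float      # spec-sheet ("on paper") number, hypothetical
    measured_tflops: float  # benchmark result, hypothetical

    @property
    def utilization(self) -> float:
        """Fraction of the on-paper peak that the benchmark actually reached."""
        return self.measured_tflops / self.peak_tflops


def rank_by(accels, key):
    """Return accelerator names sorted from best to worst by the given key."""
    return [a.name for a in sorted(accels, key=key, reverse=True)]


# Hypothetical placeholder values, NOT numbers from the article.
accels = [
    Accelerator("gpu_a", peak_tflops=1300.0, measured_tflops=600.0),
    Accelerator("gpu_b", peak_tflops=1000.0, measured_tflops=700.0),
]

paper_ranking = rank_by(accels, key=lambda a: a.peak_tflops)
measured_ranking = rank_by(accels, key=lambda a: a.measured_tflops)

print("ranked by spec sheet:", paper_ranking)
print("ranked by benchmark: ", measured_ranking)
for a in accels:
    print(f"{a.name}: {a.utilization:.0%} of peak")
```

With these placeholder values the two rankings disagree, which is the situation the finding warns about: the spec sheet alone does not tell you which part trains faster.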

Miscellany

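GEMM is one of the most important benchmark targets in modern deep learning.

NVIDIA's efficient interconnect and network topology, built on NVLink, is also one of the keys to its GPUs' high performance.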
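As a rough illustration of the GEMM note above, the sketch below times a matrix multiply and converts the elapsed time into achieved FLOP/s using the standard count of 2·M·N·K floating-point operations. It is a minimal pure-Python sketch, not the article's benchmark harness: the helper names (`gemm_flops`, `matmul`, `benchmark_gemm`) are made up for this example, and a real GPU benchmark would call a tuned library kernel and synchronize the device before reading the clock.

```python
import random
import time


def gemm_flops(m, n, k):
    """FLOPs for an (m x k) @ (k x n) GEMM: one multiply and one add per term."""
    return 2 * m * n * k


def matmul(a, b):
    """Naive GEMM on lists of lists, for illustration only."""
    b_cols = list(zip(*b))  # transpose b so each column is a contiguous tuple
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def benchmark_gemm(m, n, k, repeats=3):
    """Return achieved FLOP/s, taking the best of `repeats` timed runs."""
    a = [[random.random() for _ in range(k)] for _ in range(m)]
    b = [[random.random() for _ in range(n)] for _ in range(k)]
    matmul(a, b)  # warm-up run, excluded from timing
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return gemm_flops(m, n, k) / best


achieved = benchmark_gemm(64, 64, 64)
print(f"achieved: {achieved / 1e6:.1f} MFLOP/s")
```

Dividing the achieved figure by a device's on-paper peak gives the utilization that a benchmark-based comparison relies on.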


Author: Lifan Sun
URL: stevensun.site/article/mi300x-vs-h100-200
Copyright: All articles in this blog, unless otherwise stated, are licensed under the BY-NC-SA agreement. Please credit the source!
