Early LLM serving experience and performance results with AMD Instinct MI300X
As OCI Compute works toward launching bare metal instances with AMD Instinct MI300X GPUs in the coming months, this blog post recounts our technical journey running real-world large language model (LLM) inference workloads with Llama 2 70B on this hardware. It covers the development and deployment of the LLM serving workload and presents our performance benchmark results.
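The full post linked below details the serving stack and benchmark methodology. As a rough illustration only, the sketch below shows one common way to measure single-request latency and generation throughput against an LLM server that exposes an OpenAI-compatible completions API (as engines such as vLLM commonly do). The endpoint URL, model name, and the choice of API are assumptions for illustration, not details taken from the original post.

```python
import time

import requests

# Hypothetical endpoint for a Llama 2 70B server exposing an
# OpenAI-compatible completions API; adjust the URL and model
# name to match your own deployment.
ENDPOINT = "http://localhost:8000/v1/completions"
MODEL = "meta-llama/Llama-2-70b-chat-hf"


def time_completion(prompt: str, max_tokens: int = 128) -> None:
    """Send one completion request and report latency and throughput."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }
    start = time.perf_counter()
    resp = requests.post(ENDPOINT, json=payload, timeout=120)
    resp.raise_for_status()
    elapsed = time.perf_counter() - start

    # The OpenAI-style response reports how many tokens were generated,
    # which gives a simple tokens-per-second throughput figure.
    generated = resp.json()["usage"]["completion_tokens"]
    print(f"latency: {elapsed:.2f}s, "
          f"throughput: {generated / elapsed:.1f} tokens/s")


if __name__ == "__main__":
    time_completion("Explain the benefits of bare metal GPU instances.")
```

A single-request probe like this captures only one point on the latency/throughput curve; a full benchmark would also sweep batch sizes and concurrency levels, as the linked post's results do.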
Read more here: External Link