Musk’s xAI Taps Nvidia to Expand Grok Using ‘World’s Largest AI Supercomputer’

Chipmaker Nvidia announced Monday that its Spectrum-X networking technology has helped expand startup xAI’s Colossus supercomputer, now recognized as the largest AI training cluster in the world. 

Located in Memphis, Tennessee, Colossus serves as the training ground for the third generation of Grok, xAI’s suite of large language models developed to power chatbot features for X Premium subscribers.

Colossus, completed in just 122 days, began training its first models 19 days after installation. Tech billionaire Elon Musk’s startup xAI plans to double the system’s capacity to 200,000 GPUs, Nvidia said in a statement on Monday.

At its core, Colossus is a giant interconnected system of GPUs, each specialized in processing large datasets. When Grok models are trained, they need to analyze enormous amounts of text, images, and data to improve their responses. 

Touted by Musk as the most powerful AI training cluster in the world, Colossus connects 100,000 NVIDIA Hopper GPUs using a unified Remote Direct Memory Access network. Nvidia’s Hopper GPUs handle complex tasks by separating the workload across multiple GPUs and processing it in parallel.

The architecture allows data to move directly between nodes, bypassing the operating system and ensuring low latency as well as optimal throughput for extensive AI training tasks.

While traditional Ethernet networks often suffer from congestion and packet loss—limiting throughput to 60%—Spectrum-X achieves 95% throughput without latency degradation. 

Spectrum-X allows large numbers of GPUs to communicate more smoothly with one another, as traditional networks can get bogged down with too much data. 

The technology allows Grok to be trained faster and more accurately, which is essential for building AI models that respond effectively to human interactions.

Monday’s announcement had little effect on Nvidia’s stock, which dipped slightly. Shares traded at $141 as of Monday, with the company’s market cap at $3.45 trillion.

Edited by Sebastian Sinclair

Generally Intelligent Newsletter

A weekly AI journey narrated by Gen, a generative AI model.

Source: https://decrypt.co/288755/musks-xai-taps-nvidia-to-expand-grok-using-worlds-largest-ai-supercomputer