xAI's Colossus: World's Largest AI Supercomputer with 100,000 NVIDIA Hopper GPUs

NVIDIA has announced that xAI's Colossus supercomputer cluster in Memphis, Tennessee, which comprises 100,000 NVIDIA Hopper GPUs, achieved this scale using the NVIDIA Spectrum-X™ Ethernet networking platform. The platform is engineered for optimal performance in large AI factories and uses standards-based Ethernet for its Remote Direct Memory Access (RDMA) network. Colossus, now the world's largest AI supercomputer, is used to train xAI's Grok family of large language models, which power chatbots offered to X Premium subscribers. xAI plans to expand Colossus to a total of 200,000 Hopper GPUs.
Colossus was built in just 122 days, far shorter than the timeline typical for systems of this size, and began training 19 days after the first rack was installed. While training the very large Grok model, Colossus has maintained 95% data throughput with zero application latency degradation or packet loss, which NVIDIA attributes to Spectrum-X congestion control. By contrast, standard Ethernet suffers from flow collisions and achieves only 60% data throughput.
Gilad Shainer, NVIDIA's senior vice president of networking, stressed that AI has become mission-critical and demands greater performance, security, scalability, and cost efficiency, which he said the Spectrum-X platform delivers for innovators like xAI.
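The flow-collision effect described above can be illustrated with a deliberately simplified model. The sketch below is a hypothetical toy, not NVIDIA's or xAI's measurement methodology: it assumes equal-size flows and equal-cost paths of unit capacity, and compares static hash-based path selection (ECMP-style hashing, a common default in Ethernet fabrics) with greedy least-loaded placement as a rough stand-in for adaptive routing. All function names are illustrative.

```python
import random


def ecmp_delivered_fraction(num_flows: int, num_paths: int, seed: int = 0) -> float:
    """Toy model of static hash-based path selection (ECMP-style).

    Each flow is pinned to a pseudo-random path regardless of current load,
    so two large flows can collide on one path while other paths sit idle.
    """
    rng = random.Random(seed)
    used_paths = {rng.randrange(num_paths) for _ in range(num_flows)}
    # Every path has capacity 1.0 and every flow wants 1.0, so each used
    # path delivers exactly 1.0 no matter how many flows share it.
    # Normalize by the best achievable total, min(flows, paths).
    return len(used_paths) / min(num_flows, num_paths)


def adaptive_delivered_fraction(num_flows: int, num_paths: int) -> float:
    """Toy model of load-aware placement: each new flow is sent to the
    currently least-loaded path (a simplification of adaptive routing)."""
    load = [0] * num_paths
    for _ in range(num_flows):
        best = min(range(num_paths), key=load.__getitem__)
        load[best] += 1
    used_paths = sum(1 for n in load if n > 0)
    return used_paths / min(num_flows, num_paths)


def expected_ecmp_fraction(num_flows: int, num_paths: int) -> float:
    """Closed form for the hashing case: expected number of paths hit at
    least once, M * (1 - (1 - 1/M)^N), normalized as above."""
    expected_used = num_paths * (1 - (1 - 1 / num_paths) ** num_flows)
    return expected_used / min(num_flows, num_paths)


if __name__ == "__main__":
    n = 1000  # hypothetical: as many equal-size flows as equal-cost paths
    print(f"hashed:   {ecmp_delivered_fraction(n, n):.3f}")
    print(f"expected: {expected_ecmp_fraction(n, n):.3f}")
    print(f"adaptive: {adaptive_delivered_fraction(n, n):.3f}")
```

With as many flows as paths, hashing leaves only about 1 - 1/e (roughly 63%) of paths carrying traffic, while least-loaded placement uses every path. That this toy figure lands near the 60% cited for standard Ethernet is suggestive at best: real fabrics have unequal flow sizes, multiple switch tiers, and congestion-control feedback that this model ignores.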
Elon Musk praised Colossus as the "most powerful training system in the world," acknowledging the work of the xAI and NVIDIA teams. An xAI spokesperson said that combining NVIDIA Hopper GPUs with Spectrum-X technology allows xAI to train AI models at an unprecedented scale.
At the core of the Spectrum-X platform is the Spectrum SN5600 Ethernet switch, which supports speeds of up to 800Gb/s, paired with NVIDIA BlueField-3® SuperNICs for enhanced performance. Spectrum-X brings to Ethernet advanced features previously associated only with InfiniBand, including adaptive routing, congestion control, and improved visibility and performance in multi-tenant AI environments.
NVIDIA is a leader in accelerated computing. The release cautions that its forward-looking statements about the benefits and performance of NVIDIA's technologies are subject to risks that could cause actual results to differ, including global economic conditions, reliance on third parties, and technological competition. These risks are detailed in NVIDIA's regular SEC filings. © 2024 NVIDIA Corporation. All rights reserved.
Brief news summary
NVIDIA has announced that xAI's Colossus supercomputer cluster in Memphis, now recognized as the world's largest AI supercomputer with 100,000 NVIDIA Hopper GPUs, reached this scale using NVIDIA Spectrum-X™ Ethernet networking. Spectrum-X is optimized for large-scale AI workloads and runs a Remote Direct Memory Access (RDMA) network over standards-based Ethernet. Colossus is used to train xAI's Grok large language models, which power chatbots for X Premium subscribers, and xAI plans to expand it to 200,000 GPUs. Built in just 122 days, Colossus maintains 95% data throughput with zero application latency degradation or packet loss, compared with the 60% throughput of standard Ethernet. Elon Musk called it the most powerful training system in the world and credited the work of xAI and NVIDIA. The Spectrum-X platform includes the SN5600 Ethernet switch, which supports speeds of up to 800Gb/s, and brings features previously associated only with InfiniBand, such as adaptive routing and congestion control, to Ethernet. NVIDIA cautions that its forward-looking statements about these technologies are subject to risks that could cause actual results to differ.