The Secrets of xAI Colossus: 100,000 GPUs

🌟 The Secrets of xAI Colossus: Discover Elon Musk's 100,000 GPU AI Cluster 🚀

If you're passionate about artificial intelligence and cutting-edge technology, you can't miss what Elon Musk is doing with his AI cluster. This giant, known as xAI Colossus, is making waves in the tech world. With the staggering processing power of 100,000 GPUs, this cluster is a true marvel of modern engineering. 🤖💻

In this article, we're going to unravel the secrets behind this amazing technological innovation. We'll explore how xAI Colossus is revolutionizing the field of artificial intelligence and what this means for the future. 🌟 Get ready for a fascinating journey into the heart of one of the greatest technological feats of our time. 🚀 Don't miss it!

Elon Musk's expensive new project, the xAI Colossus AI supercomputer, has been detailed for the first time. YouTuber ServeTheHome was given access to the Supermicro servers inside the 100,000 GPU beast, showing off various facets of this supercomputer. Musk's xAI Colossus supercluster has been online for almost two months, following an assembly that took 122 days. 🔧💡

Video: Inside the world's largest AI supercluster, xAI Colossus – YouTube

What's inside a 100,000 GPU cluster? 🤔

Patrick from ServeTheHome takes us on a camera tour through different parts of the facility, offering a close-up view of its operations. Some of the supercomputer's more specific details, such as its power consumption and the size of its pumps, could not be revealed due to a confidentiality agreement, and xAI blurred and censored parts of the video before its release. 🎥

Despite this, the most important part, Supermicro's GPU servers, was left largely untouched in the footage. These GPU servers are Nvidia HGX H100s, a powerful server solution featuring eight H100 GPUs each. 🚀 The HGX H100 platform is housed in Supermicro's 4U Universal GPU Liquid Cooled system, which provides easily hot-swappable liquid cooling for each GPU. ❄️

These servers are organized into racks containing eight servers each, totaling 64 GPUs per rack. 1U manifolds are sandwiched between each HGX H100, providing the necessary liquid cooling for the servers. At the bottom of each rack, we find another 4U Supermicro unit, this time equipped with a redundant pump system and a rack monitoring system. 🔍

Four banks of xAI HGX H100 server racks, each with capacity for eight servers. (Image credit: ServeTheHome)

The rear access of an xAI Colossus GPU server. Nine Ethernet cables run out of each server, with four power supplies on each. The power and water cooling hoses are also visible. (Image credit: ServeTheHome)

🖥️ These racks are organized in groups of eight, allowing for 512 GPUs per array. Each server is equipped with four redundant power supplies. At the back of the GPU racks are three-phase power supplies, Ethernet switches, and a rack-sized manifold that provides all of the liquid cooling. 💧

There are over 1,500 GPU racks in the Colossus cluster, spread across nearly 200 rack arrays. According to Nvidia CEO Jensen Huang, the GPUs in these 200 arrays were fully installed in just three weeks. 🚀
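To put those figures in perspective, here's a quick back-of-the-envelope sketch in Python that works through the rack math described above. The per-server, per-rack, and per-array counts come from the article; the totals are only approximations, since the exact rack count hasn't been disclosed.

```python
# Approximate Colossus GPU topology, using only the figures quoted above.
gpus_per_server = 8       # Nvidia HGX H100: eight H100 GPUs per 4U server
servers_per_rack = 8      # eight Supermicro servers per rack
racks_per_array = 8       # racks are grouped eight to an array

gpus_per_rack = gpus_per_server * servers_per_rack    # 64 GPUs per rack
gpus_per_array = gpus_per_rack * racks_per_array      # 512 GPUs per array

# "Over 1,500 racks" / "nearly 200 arrays" -- both land near 100,000 GPUs.
print(1_500 * gpus_per_rack)    # 96,000
print(200 * gpus_per_array)     # 102,400
```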

Since an AI supercluster constantly training models requires massive amounts of bandwidth, xAI went above and beyond on network interconnectivity. Each graphics card has a dedicated 400GbE NIC (network interface controller), plus an additional 400GbE NIC per server. 🔗 That means each HGX H100 server has 3.6 terabits per second of Ethernet bandwidth. Impressive, right? And yes, the entire cluster runs on Ethernet rather than InfiniBand or the other exotic interconnects that are standard in the supercomputing realm. 🌐
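For the curious, here's where that 3.6 Tb/s figure comes from. The NIC counts are from the article; the cluster-wide total at the end is purely an illustrative extrapolation, not a disclosed number.

```python
# Per-server Ethernet bandwidth, as described above.
gpu_nics = 8            # one 400GbE NIC per GPU
server_nics = 1         # one additional 400GbE NIC per server
nic_speed_gbps = 400

per_server_gbps = (gpu_nics + server_nics) * nic_speed_gbps
print(per_server_gbps / 1_000)                  # 3.6 Tb/s per HGX H100 server

# Extrapolated over ~12,500 servers (100,000 GPUs / 8 per server) -- illustrative only.
servers = 100_000 // gpu_nics
print(servers * per_server_gbps / 1_000_000)    # ~45 Pb/s of aggregate NIC capacity
```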

A shot looking down at the waves and waves of yellow Ethernet cables connecting the xAI Colossus cluster to itself. Several layers of extra-wide cable trays run across the ceiling. (Image credit: ServeTheHome)

xAI's Colossus CPU compute servers, which look almost identical to Supermicro's storage servers, are also used extensively on site. (Image credit: ServeTheHome)

Of course, a supercomputer that trains AI models like the Grok 3 chatbot needs more than just GPUs to perform at its best. 🔥 Details about Colossus's storage and CPU servers are somewhat limited, but thanks to Patrick's video and the accompanying blog post, we know that these servers mostly sit in Supermicro chassis as well. 🚀

1U NVMe-forward servers with x86 CPUs inside provide both storage and compute, and are equipped with rear-mounted liquid cooling. 💧 Additionally, banks of Tesla Megapack batteries can be seen outside the facility. ⚡️

The start-stop nature of the cluster's training workload, with power swings on millisecond timescales, was too much for the conventional power grid or Musk's diesel generators to handle on their own. That's why several Tesla Megapacks (each with a capacity of 3.9 MWh) are used as an intermediate power buffer between the grid and the supercomputer. 🖥️🔋 This keeps operation smooth and efficient, avoiding interruptions. 🚦✨
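As a rough illustration of why the batteries matter, the sketch below estimates how long a bank of Megapacks could carry the site through a supply dip. Only the 3.9 MWh per-unit capacity comes from the article; the Megapack count and facility load are hypothetical placeholders, since the real power draw was not disclosed.

```python
# Hypothetical ride-through estimate for a Megapack buffer.
megapack_capacity_mwh = 3.9   # per-unit capacity quoted above
num_megapacks = 100           # placeholder -- the real count is not public
assumed_load_mw = 150.0       # placeholder facility draw -- not disclosed

buffer_mwh = megapack_capacity_mwh * num_megapacks
print(f"{buffer_mwh / assumed_load_mw:.1f} hours of ride-through")  # ~2.6 h under these assumptions

# In practice the bigger win is smoothing: millisecond-scale swings in training
# load are absorbed by the batteries instead of hitting the grid or generators.
```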

🌟 Using Colossus, and Musk's supercomputer stable 🌟

According to Nvidia, the xAI Colossus supercomputer is currently the largest AI supercomputer in the world. 🤯 While many of the world's leading supercomputers are used by researchers, contractors, or academics to study weather patterns, diseases, or other complex problems, Colossus has a single job: training X's (formerly Twitter's) various AI models. Most notable is Grok 3, Elon's "anti-woke" chatbot that's available only to X Premium subscribers. 🤖

Additionally, ServeTheHome has been informed that Colossus is training AI models “of the future” – models whose uses and capabilities are supposedly beyond the current capabilities of AI. 🚀 The first phase of Colossus construction is complete and the cluster is fully operational, but it’s not all over yet. The Memphis supercomputer will soon be upgraded to double its GPU capacity, with an additional 50,000 H100 GPUs and 50,000 next-generation H200 GPUs. 🔥
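The "double its GPU capacity" claim checks out against the numbers quoted here, assuming the new GPUs simply add to the existing install base:

```python
# GPU count after the announced upgrade, per the figures above.
current_gpus = 100_000
added_h100 = 50_000
added_h200 = 50_000
print(current_gpus + added_h100 + added_h200)   # 200,000 -- double today's capacity
```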

This upgrade will also more than double its power consumption, which is already too much for the 14 diesel generators Musk added to the site in July to handle. ⚡ While it's short of Musk's promise of 300,000 H200s inside Colossus, that could be part of Phase 3 of upgrades. 🔋

On the other hand, the 50,000 GPU Cortex supercomputer at Tesla's "Giga Texas" plant also belongs to a Musk company. Cortex is dedicated to training Tesla's self-driving AI via camera feeds and image detection, as well as Tesla's autonomous robots and other AI projects. 🤖🚗

Additionally, Tesla plans to build its Dojo supercomputer in Buffalo, New York, a $500 million project. 💸 Meanwhile, industry watchers like Baidu CEO Robin Li predict that 99% of AI companies could collapse when the bubble bursts. Whether Musk's record spending on AI will backfire or pay off remains to be seen. ⏳
