Groq 3 LPU Redefines Nvidia's Roadmap

Groq 3 LPU and the strategic shift in Rubin

The unveiling of the Groq 3 at GTC 2026 is more than a technical launch: it marks a strategic shift in how Nvidia structures its inference platform. The chip redefines Rubin's internal hierarchy and anticipates a distinct phase in the competition for specialized silicon.

At GTC 2026, held in San Jose, Nvidia unveiled the Groq 3 inference accelerator, the first chip to emerge from the $20 billion licensing and talent agreement it signed with Groq on December 24, 2025. It is an SRAM-based LPU (language processing unit) that Nvidia has integrated into the Vera Rubin platform as a dedicated coprocessor for the decoding phase. Nvidia expects shipments to begin in the third quarter of 2026; production will be handled by Samsung on a 4nm node. It is also Nvidia's first rack-scale product designed around non-GPU silicon, and its arrival has prompted a reordering of the components on Nvidia's own roadmap.

The heart of the Groq 3 LPX is the LP30 chip, with 512 MB of SRAM per die and 150 TB/s of memory bandwidth per chip. To put this in perspective, a Rubin GPU with 288 GB of HBM4 offers around 22 TB/s; the roughly sevenfold difference is not a nuance but an architectural choice. A full LPX rack houses 256 LPUs, totaling 128 GB of SRAM and about 40 PB/s of aggregate bandwidth. Nvidia claims that, combined with a Rubin NVL72, an LPX rack delivers up to 35 times the performance per megawatt of an NVL72 alone on trillion-parameter models, with an operating cost target of $45 per million tokens.
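The rack totals follow from simple multiplication, which makes them easy to sanity-check. The sketch below uses only the figures quoted in this article; the unit conventions (1 GB = 1024 MB, 1 PB = 1000 TB) are assumptions chosen for illustration, not something Nvidia has specified.

```python
# Figures quoted in the article for the LP30 chip and the LPX rack.
SRAM_PER_DIE_MB = 512            # SRAM per LP30 die
BANDWIDTH_PER_CHIP_TBS = 150     # memory bandwidth per LP30 chip, TB/s
LPUS_PER_RACK = 256              # LPUs in a full LPX rack
RUBIN_HBM4_BANDWIDTH_TBS = 22    # "around 22 TB/s" per Rubin GPU


def rack_sram_gb(lpus: int = LPUS_PER_RACK,
                 mb_per_die: int = SRAM_PER_DIE_MB) -> float:
    """Total on-chip SRAM in a rack, in GB (assumes 1 GB = 1024 MB)."""
    return lpus * mb_per_die / 1024


def rack_bandwidth_pbs(lpus: int = LPUS_PER_RACK,
                       tbs_per_chip: int = BANDWIDTH_PER_CHIP_TBS) -> float:
    """Aggregate SRAM bandwidth in a rack, in PB/s (assumes 1 PB = 1000 TB)."""
    return lpus * tbs_per_chip / 1000


def bandwidth_ratio() -> float:
    """Per-chip bandwidth of an LP30 relative to a Rubin GPU's HBM4."""
    return BANDWIDTH_PER_CHIP_TBS / RUBIN_HBM4_BANDWIDTH_TBS


if __name__ == "__main__":
    print(f"Rack SRAM:       {rack_sram_gb():.0f} GB")
    print(f"Rack bandwidth:  {rack_bandwidth_pbs():.1f} PB/s")
    print(f"LP30 vs Rubin:   {bandwidth_ratio():.1f}x per chip")
```

Under these assumptions the SRAM total comes out at exactly 128 GB, and the aggregate bandwidth at 38.4 PB/s, which is consistent with the rounded 40 PB/s figure. The per-chip bandwidth advantage over a Rubin GPU works out to about 6.8 times.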

Groq 3 and its role in Rubin

Rubin rack rendering illustrating the SuperPOD architecture — Nvidia outlined its seven-chip Rubin SuperPOD strategy at GTC 2026. (Image credit: Nvidia)

In the planned operation, Rubin GPUs handle the prefill phase—processing long contexts and high-density calculations—while Groq LPUs manage decoding and token generation with reduced latency. Dynamo orchestrates this heterogeneous distribution, assigning tasks based on batch size and parallelism to balance performance and energy cost.
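The division of labor described above can be pictured as a simple routing rule. The following is a toy sketch only: it is not Dynamo's API or its actual scheduling policy, and every name in it (`Task`, `Pool`, `route`, `MAX_LPU_DECODE_BATCH`) as well as the batch-size threshold is hypothetical. The idea that large decode batches fall back to GPUs is likewise an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    PREFILL = "prefill"   # processing the input context
    DECODE = "decode"     # generating output tokens


class Pool(Enum):
    RUBIN_GPU = "rubin_gpu"
    GROQ_LPU = "groq_lpu"


@dataclass(frozen=True)
class Task:
    phase: Phase
    batch_size: int


# Hypothetical threshold: above this batch size, decode is assumed to be
# throughput-bound enough that the GPU pool is the better fit.
MAX_LPU_DECODE_BATCH = 8


def route(task: Task) -> Pool:
    """Pick a hardware pool for a task (illustrative heuristic only)."""
    if task.phase is Phase.PREFILL:
        # Long contexts and dense compute go to the GPUs.
        return Pool.RUBIN_GPU
    if task.batch_size <= MAX_LPU_DECODE_BATCH:
        # Small-batch, latency-sensitive decoding goes to the LPUs.
        return Pool.GROQ_LPU
    return Pool.RUBIN_GPU


if __name__ == "__main__":
    for t in (Task(Phase.PREFILL, 1), Task(Phase.DECODE, 1), Task(Phase.DECODE, 64)):
        print(t, "->", route(t).value)
```

A real orchestrator would also weigh parallelism, queue depth, and energy cost, which the article mentions but does not quantify.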

Groq's original LPU design prioritized determinism: a VLIW (Very Long Instruction Word) pipeline with large SRAM banks and a compiler that pre-planned execution, eliminating cache misses and unexpected stalls. This resulted in very high token rates per user, but revealed a capacity problem: previous generations with 230 MB of SRAM per chip required many dies to accommodate mid-sized models, and the architecture was originally oriented toward convolutional networks rather than modern language models.

The LP30 mitigates some of these limitations with 512 MB of SRAM per die and 1.23 PFLOPS of FP8 compute capacity. Samsung has scaled up production (from approximately 9,000 to approximately 15,000 wafers, according to the announcements) as it moves from samples to commercial manufacturing. At GTC, it was also announced that AWS will deploy Groq 3 LPUs alongside more than one million Nvidia GPUs as part of its infrastructure expansion.
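To see why SRAM capacity per die matters, it helps to estimate how many dies are needed just to hold a model's weights. The sketch below is a rough lower bound under stated assumptions: it counts weights only (no KV cache, activations, or replication), treats 1 MB as 1024 × 1024 bytes, and uses a hypothetical 70-billion-parameter model at 8 bits per weight as the example. The function name `dies_needed` is illustrative.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def dies_needed(num_params: int, bits_per_param: int, sram_mb_per_die: int) -> int:
    """Minimum number of dies whose combined SRAM can hold the weights.

    Counts weights only; KV cache, activations, and any replication
    would push the real number higher.
    """
    weight_bits = num_params * bits_per_param
    sram_bits_per_die = sram_mb_per_die * BYTES_PER_MB * 8
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-weight_bits // sram_bits_per_die)


if __name__ == "__main__":
    model_params = 70_000_000_000  # hypothetical mid-sized model
    for label, sram_mb in (("230 MB per die", 230), ("512 MB per die", 512)):
        print(f"{label}: {dies_needed(model_params, 8, sram_mb)} dies")
```

For this hypothetical model the estimate drops from 291 dies at 230 MB to 131 dies at 512 MB, which illustrates how the larger SRAM eases, but does not remove, the capacity constraint.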

Beyond the LP30, Nvidia mentioned a product roadmap: an LP35 with NVFP4 support intended to align with the Rubin Ultra generation, and an LP40 planned for the Feynman architecture cycle later on.

What's happening with Rubin CPX?

At GTC, the absence of the Rubin CPX, the GDDR7-based inference accelerator that Nvidia had announced in September 2025, was conspicuous. It did not appear on the main slides, nor was it present on stage. Everything indicates, though without full official confirmation, that the CPX has been removed from the roadmap and replaced in the platform hierarchy by the Groq 3 LPX.

CPX was initially conceived as a lower-cost alternative to accelerate the context phase using GDDR7, leveraging its greater availability in the face of HBM shortages. However, Groq's LPUs eliminate the need for large external memory modules and offer significantly higher bandwidth per die—a clear advantage in a market where HBM supply remains tight and GDDR7 production is still scaling up. While CPX units already committed to customers may continue to be delivered, the strategic preference now appears to be shifting towards LPU integration.

There is also an analogy with the acquisition of Mellanox, announced in 2019: technology brought in from outside ended up forming a new architectural layer within Nvidia's infrastructure, in that case the InfiniBand networking that sits alongside NVLink. In this scenario, Groq could become a similar structural component within the Rubin ecosystem.

Consolidation of the inference chip market

The deal with Groq was the most visible piece of a 2025 consolidation wave focused on inference chips. That year, AMD acquired the Untether AI team, Nvidia acquired Enfabrica's team and IP for over $900 million, Meta bought Rivos, and acquisition talks between Intel and SambaNova were ultimately abandoned, resulting instead in a $350 million investment and partnership. This wave reflects the fact that competing independently against Nvidia's CUDA ecosystem and scale presents severe economic challenges, even when the technology has technical merit.

The recurring pattern is the absorption of talent and technology by the major players. Groq, for example, expected around $500 million in revenue for 2025, but that figure wasn't enough to maintain its independence in the face of strategic pressure from dominant manufacturers. Analysts point out that non-exclusive licensing agreements preserve the appearance of competition, but in practice neutralize rivals by integrating their technology into the buyer's platform.

Custom silicon in hyperscalers

Meta MTIA Roadmap Diagram for Inference Accelerators — Meta presented its MTIA roadmap recently. (Image credit: Meta)

While startups are being absorbed into larger companies, major cloud providers are pushing their own inference silicon roadmaps.

Meta announced successive generations of MTIA, developed with Broadcom: from MTIA 300, already in production for ranking and recommendation, to MTIA 500, geared towards generative inference and planned for mass deployment in 2027. Google maintains its TPU line (Ironwood, the seventh generation), quoting per-chip TFLOPS figures and large-scale pod configurations, and AWS continues developing Trainium and Inferentia, although internal data up to 2024 showed relatively low adoption compared to GPUs in AWS's own infrastructure.

Industry surveys and projections point to further diversification: in November 2025, Futurum Group ranked XPU accelerators as the fastest-growing segment in data center spending for 2026, and TrendForce projected a notable increase in shipments of custom ASICs by cloud providers for that same year.

Nvidia's reaction has been clear: to secure the presence of non-GPU silicon within its platform before third parties do. The Groq 3 LPU is the tangible manifestation of that strategy; the future of the Rubin CPX, however, remains uncertain for now.