AI & Machine Learning

The Silicon Decoupling: Meta's 1GW MTIA, OpenAI's $20B Cerebras Deal, and AI's Quiet Escape From Nvidia

Meta's Broadcom partnership, OpenAI's Cerebras contract, and Perplexity's on-device Personal Computer all point to the same shift — the 'one vendor, one GPU, one cloud' AI stack is quietly unbundling.

Bhanu Pratap

For three years, the AI infrastructure story was one company’s story. Nvidia shipped GPUs, hyperscalers bought them as fast as TSMC could fabricate them, and the question everyone asked about any AI roadmap was: how many H100s, H200s, or B200s do you have? The stack was monolithic: one vendor, one GPU family, one cloud at a time.

This week that story stopped being true. Not in a dramatic, revolutionary way — there is no “Nvidia killer” product. It stopped being true in a quieter, more structural way: three separate announcements from three different companies all point to the same shift. The AI compute stack is unbundling along two axes at once — by workload (training vs. inference), and by location (cloud vs. device).

The Three Announcements That Tell the Same Story

On April 15, Meta and Broadcom announced they are extending their custom silicon partnership through 2029, with Meta committing to a one-gigawatt initial deployment of its Meta Training and Inference Accelerator — MTIA — chips. One gigawatt. For context, that’s a multi-billion-dollar commitment of compute capacity dedicated entirely to chips Meta designed in-house.

Around the same window, OpenAI signed a multi-year contract worth more than $20 billion with Cerebras, the wafer-scale compute startup, to use Cerebras servers over the next three years. OpenAI is not replacing Nvidia — Sam Altman has simultaneously been talking about massive GPU commitments — but it is no longer willing to depend on a single silicon vendor for its frontier workloads.

And on April 16, Perplexity shipped Personal Computer for Mac, an agent that orchestrates local files, native macOS applications, and the browser directly on the user’s hardware rather than shipping everything into the cloud. It’s the same Perplexity Computer architecture introduced in February 2026, but now with a local execution layer for sensitive data.

Three different product categories, three different companies, one pattern: AI is moving off “the Nvidia GPU in the hyperscaler data center” as the default unit of compute.

Why Now: The Unit Economics Broke First

The macro driver here isn’t politics or vendor lock-in frustration; it’s arithmetic. Nvidia GPUs are general-purpose matrix-multiplication machines. That generality is a feature for research workloads, where the shape of the model is still changing, and a tax for inference workloads, where the shape is frozen and you’re paying full retail for flexibility you don’t use.

When inference is a rounding error in total spend, nobody notices. When inference is the dominant line item — which it is now, by a wide margin, for every hyperscaler running consumer AI products — the tax becomes the story.

This is where ASICs (application-specific integrated circuits) come in. Meta’s MTIA, Google’s TPU, Amazon’s Trainium/Inferentia, Microsoft’s Maia — they all make the same trade: lose generality, gain efficiency. They’re cheaper per token, cooler per rack, and more predictable to schedule against than GPUs, as long as you only run the workloads they were designed for.
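The trade can be made concrete with a back-of-envelope serving-cost model. Every number below is a hypothetical placeholder, not published vendor pricing; the point is the structure of the calculation, not the figures.

```python
# Back-of-envelope cost to serve tokens on one accelerator.
# All inputs are illustrative placeholders, not vendor pricing.

def cost_per_million_tokens(hourly_cost, tokens_per_second, utilization):
    """Dollars to serve 1M tokens, given sustained throughput and utilization."""
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost / tokens_per_hour * 1_000_000

# A general-purpose GPU: higher peak throughput, much higher hourly cost,
# and flexibility you no longer use once the model shape is frozen.
gpu = cost_per_million_tokens(hourly_cost=4.00, tokens_per_second=2500, utilization=0.6)

# An inference ASIC: lower peak throughput, far lower cost per hour.
asic = cost_per_million_tokens(hourly_cost=1.20, tokens_per_second=1800, utilization=0.6)

print(f"GPU:  ${gpu:.2f} per 1M tokens")   # ~$0.74 with these placeholder inputs
print(f"ASIC: ${asic:.2f} per 1M tokens")  # ~$0.31 with these placeholder inputs
```

At research scale the gap is noise; at consumer-product scale, a 2x difference in cost per token on the dominant line item is the whole margin story.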

Meta’s latest chip, the MTIA 500, is reportedly the first AI-dedicated silicon on a 2nm process node. The published roadmap — MTIA 300, 400, 450, 500, shipping across 2026 and 2027 — shows Meta treating custom silicon not as a science project but as the main path.

Workload Segmentation, Not Vendor Replacement

The failure mode of every “Nvidia is doomed” take is assuming custom silicon replaces GPUs across the board. It doesn’t, because training workloads still benefit enormously from Nvidia’s mature CUDA stack, high-bandwidth memory, and collective communication libraries. Nobody is training a frontier foundation model on an MTIA cluster in 2026.

What’s happening instead is segmentation. Training stays on Nvidia (and increasingly on wafer-scale systems like Cerebras for specific use cases). Inference — especially high-volume, low-latency serving of a frozen model — migrates to custom ASICs. Edge and on-device work absorbs whatever’s left.

That’s why the OpenAI–Cerebras deal is so interesting: Cerebras isn’t an ASIC vendor and isn’t trying to replace Nvidia for everything. Its pitch is a specific shape of compute — a single giant wafer of silicon that eliminates the cross-chip communication overhead that makes training large models slow on GPU clusters. OpenAI is effectively buying a second, differently shaped training substrate alongside its primary Nvidia footprint.

The Edge Angle: Perplexity Personal Computer

The Perplexity announcement looks smaller but is architecturally the most interesting of the three. Personal Computer for Mac doesn’t rent cloud compute at all for the local orchestration layer — it runs a capable-enough model and agent harness directly on the user’s Mac, using cloud inference only when the task explicitly needs it.

This matters for two reasons. First, latency: a round-trip to a cloud inference endpoint adds 100–400ms per tool call, which dominates the user experience for an agent that makes many small decisions. Second, and more importantly, privacy and data gravity: once your agent can read your local files and native apps, shipping all of that context to someone else’s server is a hard sell, both to enterprise buyers and increasingly to consumers.
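The latency argument is worth running as arithmetic, since it is counterintuitive: a local model can be slower per inference and still win on wall time once the network round-trip disappears. The call counts and per-call figures below are assumptions for illustration, not measurements of any product.

```python
# Illustrative wall-time budget for an agent that makes many small,
# sequential decisions. Per-call numbers are assumed, not measured.

def agent_wall_time_ms(tool_calls, network_rtt_ms, inference_ms):
    """Total sequential wall time across all tool calls."""
    return tool_calls * (network_rtt_ms + inference_ms)

calls = 40  # a modest multi-step task

# Cloud: fast inference, but every call pays a network round-trip.
cloud = agent_wall_time_ms(calls, network_rtt_ms=250, inference_ms=150)

# Local: slower inference per call, zero network round-trip.
local = agent_wall_time_ms(calls, network_rtt_ms=0, inference_ms=300)

print(f"cloud: {cloud / 1000:.1f}s, local: {local / 1000:.1f}s")  # 16.0s vs 12.0s
```

With these assumed numbers the local agent finishes 25% sooner despite running a model twice as slow per call, because the round-trip term scales with the number of decisions, not the difficulty of any one of them.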

The hardware that makes this newly viable is Apple Silicon and its neural engine, Nvidia’s Jetson-class edge modules, and the new crop of NPU-rich PC chipsets. Apache-licensed open models like Gemma 4 and Mistral Medium 3 — which we covered last week — are the software side of the same shift. When a 31B-parameter open model under an Apache 2.0 license can match last year’s frontier closed model, and consumer laptops can run it at interactive speed, there’s no economic reason to round-trip every query to a data center.
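The sizing math behind “a 31B model fits on a laptop” is simple enough to sketch. The figures below cover model weights only and ignore KV cache and runtime overhead, so treat them as a lower bound.

```python
# Rough weight-memory footprint of a 31B-parameter model at common
# precisions. Weights only; KV cache and runtime overhead excluded.

def weights_gib(params_billion, bits_per_param):
    """Memory for model weights, in GiB."""
    return params_billion * 1e9 * bits_per_param / 8 / 2**30

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weights_gib(31, bits):.1f} GiB")
# 16-bit: ~57.7 GiB  (workstation territory)
#  8-bit: ~28.9 GiB  (high-end laptop)
#  4-bit: ~14.4 GiB  (fits comfortably in 24-32 GiB of unified memory)
```

That last row is why the Apple Silicon generation changed the calculus: 4-bit quantization puts a 31B model inside the unified memory of an ordinary premium laptop, with headroom left for context.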

The Enterprise Consequence: 20% of Companies Are Capturing 74% of the Value

PwC’s 2026 AI Performance Study landed on April 13 and it frames the strategic consequence sharply. Based on interviews with 1,217 senior executives across 25 sectors, the study found that 20% of companies are capturing roughly 74% of the economic gains from AI. The leaders aren’t winning because they bought more GPUs. They’re winning because they redesigned workflows around AI — and because they made calculated decisions about which workloads belong on general-purpose Nvidia infrastructure and which belong on specialized or on-device compute.

McKinsey’s 2026 State of Organizations report echoes the finding from the people side: high-performing AI adopters spend roughly $5 on process and people change for every $1 spent on technology. The money is not in the silicon choice itself — it’s in using silicon choice as an enabler for workflow redesign.

The practical takeaway for AI/ML teams: the infrastructure question is no longer “Nvidia or not.” It’s a portfolio question. Training on GPUs, high-volume inference on custom ASICs or hyperscaler-provided inference chips, and latency- or privacy-sensitive work on edge compute. Getting that portfolio right is starting to separate the companies seeing real returns on AI from the 80% still running pilots.
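The portfolio framing reduces to a decision rule. The sketch below is my own illustration of that rule; the category names and decision order are hypothetical, not any vendor’s API or published policy.

```python
# Illustrative routing rule for the workload-portfolio framing.
# Substrate names and the decision order are a hypothetical sketch.

def place_workload(kind, latency_sensitive=False, private_data=False):
    """Map a workload to a compute substrate."""
    if kind == "training":
        return "gpu-cluster"     # mature CUDA stack, HBM, collective comms
    if private_data or latency_sensitive:
        return "edge-device"     # on-device NPU / Apple Silicon class hardware
    return "inference-asic"      # frozen model, high volume, lowest cost per token

print(place_workload("training"))                      # gpu-cluster
print(place_workload("inference", private_data=True))  # edge-device
print(place_workload("inference"))                     # inference-asic
```

The rule itself is trivial; the hard organizational work is classifying every existing workload honestly enough to run it.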

The Cerebras Signal, and What It Means For Training

The Cerebras story deserves its own beat, because it’s the training-side counterpart to the inference-side ASIC wave. Wafer-scale compute — stitching together an entire silicon wafer as one giant chip instead of cutting it into many smaller dies — has existed as a curiosity for years. OpenAI’s $20B+ three-year commitment moves it from curiosity to line item.

The pitch is specific: for training workloads where model parallelism across GPU clusters becomes the bottleneck, a single wafer eliminates cross-chip network overhead entirely. For a company pushing the frontier on very large reasoning models — where training compute scales faster than parameter count — that network tax matters.

This doesn’t mean Cerebras replaces Nvidia at OpenAI. It means OpenAI is willing to spend $20B over three years to have a second substrate available for workloads where the Cerebras architecture maps better. The era of “pick one vendor, all in” is over at the frontier.

What To Watch

Three things are worth tracking as this plays out over the next quarter.

First, the MTIA 500 ramp. Meta has never before deployed a 2nm ASIC at data-center scale, and the transition from MTIA 300 (already in production for ranking) to MTIA 400/450/500 (lab to deployment) is where the roadmap gets tested. Broadcom’s delivery cadence through 2029 is the real signal — if Meta hits volume, every other hyperscaler’s custom-silicon timeline gets pulled forward.

Second, on-device agent economics. Perplexity Personal Computer is one product. The question is whether on-device orchestration becomes a shipping default across productivity apps — notes, email, IDE, calendar — or remains a premium SKU. The moment a free or commodity-tier consumer agent runs meaningfully offline, the economics of cloud-only competitors shift sharply.

Third, the middle of the market. This quarter’s story is hyperscaler-scale. The harder question is what happens to the long tail of enterprises that can’t afford custom silicon but can’t justify Nvidia H-class GPUs for every inference workload either. Hyperscaler-managed ASIC services (Amazon Inferentia, Google TPU-on-demand) and open-model-friendly inference clouds are the interesting layer to watch — that’s where most real enterprise AI spend will actually land.

The headline is the billions. The signal is the shape. AI compute is unbundling — by workload, by vendor, by location — and the teams that understand their workload portfolio at that level of detail will be the ones capturing the 74%.
