CBRS — Irrational Analysis 2026 Edition
Plain-language synthesis of the May 3, 2026 Irrational Analysis (IA) equity research piece on Cerebras Systems ahead of its 2nd IPO attempt. Original PDF in
KB/raw/equity-research/cbrs-irrational-analysis-2026.pdf. IA's prior (Oct 2024) report covered the failed first IPO and is preserved at cbrs-irrational-analysis.md.
Source + Author Bias
- Author: "Irrational Analysis" — Substack newsletter run by an anonymous hardware engineer (X/Twitter @insane_analyst).
- Disclosure: "I will probably buy $50K worth of Cerebras stock in the IPO for fun. Small YOLO position. Will have lots of fun trading the crap out of the options chain too."
- Posture: Engineering-first, profanity-laden, anti-marketing. Strong opinions on chip architecture; admits "I am a hardware guy" and credits a "galaxy-brain" friend for the inference math chapter.
- What this is NOT: Not a sell-side initiation, not a balanced view, not GAAP-disciplined. Treat as one informed bear/bull hybrid input among several. Cross-check against SA mirror, S-1 filing, and SemiAnalysis when available.
TL;DR Thesis
"The clown car that stumbled upon a gold mine of the AI era."
Cerebras should have died years ago. Training (the original goal) was a complete failure: bone-headed product decisions, a botched Qualcomm partnership, and a ≥90%-of-revenue dependence on a single Middle East customer (G42) kept them on life support. Then they stumbled into ultra-fast inference — a niche where the wafer-scale chip's architectural quirks become genuine advantages — and OpenAI signed up as the new anchor in Jan 2026.
IA's verdict: buy a small position for fun, not a thesis position. The technology is real; the economics are not yet good; the path to good economics depends on management fixing obvious WSE-4 problems they have ignored for three generations.
WSE-3 Architecture (Technical, Kept Largely Intact)
Why "wafer-scale"
Normal chip-makers cut a silicon wafer into hundreds of small chips (dies). Cerebras keeps the entire wafer as one giant chip — the Wafer-Scale Engine (WSE-3).
- 84 reticles stitched together across scribe lines using special TSMC-co-developed IP (5nm class).
- 10,700 cores per reticle × 84 reticles = ~900,000 cores per wafer.
- Each core: 48 KB private SRAM + 512 B local cache → 44 GB total on-wafer SRAM (not GiB).
- Per core: 16 general-purpose registers, 48 data-structure registers, 8-way 16-bit SIMD, 16-way 8-bit SIMD. No FP8 support — caps weights at 16-bit (BF16).
- Hardwired non-linear activation functions (sigmoid, tanh, ReLU, Leaky ReLU, Maxout, ELU).
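
The core-count and SRAM figures above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, assuming "48 KB" means 48 KiB (49,152 bytes) per core; that reading is an assumption, chosen because it reproduces the quoted 44 GB decimal total.

```python
# Figures quoted in the architecture list above.
CORES_PER_RETICLE = 10_700
RETICLES = 84
SRAM_PER_CORE_BYTES = 48 * 1024  # assumption: "48 KB" is 48 KiB

cores = CORES_PER_RETICLE * RETICLES
sram_bytes = cores * SRAM_PER_CORE_BYTES
sram_gb = sram_bytes / 1e9      # decimal gigabytes
sram_gib = sram_bytes / 2**30   # binary gibibytes

print(f"cores per wafer: {cores:,}")
print(f"on-wafer SRAM: {sram_gb:.1f} GB ({sram_gib:.1f} GiB)")
```

The gap between the GB and GiB readings is why the note flags "44 GB (not GiB)".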
Mental model
- Nvidia GPU: one big kitchen with a giant pantry next door (HBM). Slower to fetch ingredients, but you can store a lot (288 GB on Blackwell Ultra).
- Cerebras WSE: 900,000 tiny food trucks, each with a shoebox of ingredients. Incredibly fast if your recipe fits in the shoebox. Useless if it doesn't.
- IA's marketing critique: Cerebras' "2,625× more memory bandwidth than Nvidia B200" claim is technically true at the SRAM level but compares apples to oranges — Nvidia's L2 is shared across cores, Cerebras' 48 KB is private per core, inducing massive compiler complexity.
The two genuinely hard problems Cerebras solved
- Cross-reticle stitching IP. Diamond saws aren't infinitely thin; standard fabrication assumes nothing useful sits in the keep-out-zone (scribe line). Cerebras + TSMC co-developed special IP to route wires across the scribe line. Patented; no competitor has it. IA: "You could say this is their most important hardware IP."
- PVT (Process / Voltage / Temperature) calibration across an entire wafer. Manufacturing variation across a single wafer is huge: one part may have fast NMOS transistors while a chip 50 mm away has slow NMOS. Wafer maps from ATE testing screen for cores that run too hot, too slow, or are intrinsically unstable. Cerebras has to clock the entire wafer, OR calibrate per region, without butchering performance to the slowest common denominator. IA: "PVT calibration and tolerance across an entire wafer is by far the most underrated innovation of the WSE."
Yield strategy
- Industry baselines: small smartphone chips yield 60–70% post-ramp; large GPU dies yield 30–60% post-stabilization.
- Cerebras claims 100% wafer-level yield. Mechanism: over-provisioned NoC mesh routes around defective cores. IA dismisses the routing-around-defects narrative as "you over-provisioned a giant mesh and have some extra NoC paths to route around dead cores." The real achievement is PVT.
- IA's bottoms-up estimate puts wafer + packaging yield at ~40% — well below claims.
What Cerebras Got Wrong (Years of Bad Bets)
1. SwarmX and MemoryX appliances are useless
Two off-the-shelf appliances built mostly from AMD parts, sold alongside the WSE.
- Built only for training, not inference.
- Bottleneck: WSE external I/O is severely limited (1.2 Tbps total). Blowing up BOM cost on AMD CPUs + DRAM butchers gross margins.
- Cerebras spent years pitching these as the cluster-scale training story. They lost the training market entirely. The appliances now sit on the shelf.
2. Unstructured weight sparsity is useless
Cerebras designed the cores to skip over zeros in matrix math at 8:1 ratios → theoretically 8× compute boost on sparse weights.
- Sparse matrix: matrix with mostly zeros. Structured sparsity = zeros follow a pattern (easy to skip). Unstructured = zeros are random (hard to skip without complex indexing).
- Cerebras claimed 1 byte/FLOP memory bandwidth requirement for sparse MatMul vs ~0.001 byte/FLOP for dense, with WSE-3 providing 2 byte/FLOP — positioning itself as "the only HW to accelerate all forms of sparsity."
- Why it failed:
- MoE (Mixture-of-Experts) ≠ weight sparsity. MoE routes activations to a subset of expert matrices; weights stay dense. Cerebras still has to load all weights into SRAM regardless of MoE.
- FP4 / MXFP precision compression gives bigger wins per unit of engineering effort and is what the industry adopted.
- Sparse weights only work if the model is trained sparse from scratch. Nobody does this.
- Hacker News + AI-Overview screenshots in the report all confirm the consensus: weight sparsity is a research curiosity, not a deployment paradigm.
- IA: silicon area burned on a feature nobody uses.
3. No FP8 support
- Modern AI hardware uses 8-bit (FP8) or even 4-bit (FP4 / OCP MXFP) numbers to fit bigger models.
- Cerebras' chip only supports BF16 (16-bit).
- Doubles the memory footprint of every model loaded into the precious 44 GB SRAM pool.
- IA: "inexcusable incompetence."
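
To make the footprint argument concrete, here is a rough sketch of how weight precision changes the number of wafers needed just to hold a dense 70B-parameter model. The helper names are illustrative, and the calculation ignores KV cache, activations, and any quantization scale overhead.

```python
import math

WSE_SRAM_GB = 44.0  # per-wafer SRAM quoted in this note

def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight storage in decimal GB for a dense model."""
    return params_billion * bits_per_weight / 8

def wafers_for_weights(params_billion: float, bits_per_weight: float) -> int:
    """Wafers needed to hold the weights alone (no KV cache, no activations)."""
    return math.ceil(weight_footprint_gb(params_billion, bits_per_weight) / WSE_SRAM_GB)

for bits in (16, 8, 4):
    gb = weight_footprint_gb(70, bits)
    print(f"{bits:>2}-bit: {gb:6.1f} GB -> {wafers_for_weights(70, bits)} wafer(s)")
```

At 16 bits the 70B model needs 140 GB and four wafers, matching the multi-wafer chaining example later in this note; at 8 bits it would fit on two.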
4. Cores are too small
- 48 KB SRAM per core forces compilers into "asinine" complexity managing hundreds of thousands of tiny cores on a giant mesh.
- Three generations of WSE are "effectively the same chip" with minor ISA updates and process shrinks.
- IA: 80% confident this is a mixture of laziness, incompetence, and risk-aversion.
5. Botched Qualcomm partnership
- March 2024: Cerebras announces Qualcomm Cloud AI100 Ultra as their inference partner. Joint slide: train on WSE-3, infer on Qualcomm. Sparse training + speculative decoding + MX6 + Cloud AI 100 → "reducing inference cost 10×."
- August 2024: Five months later at Hot Chips, Cerebras silently abandons Qualcomm and runs inference on its own WSE.
- IA: "Literally less than 6 months after this Cerebras × Qualcomm partnership was announced, Cerebras ditched Qualcomm and started running inference on their own WSE."
- Implication: the deal was a dud, and Cerebras decided the WSE itself was the better inference platform — by accident.
The Accidental Gold Mine: Ultra-Fast Inference
After all those mis-bets, Cerebras stumbled onto something:
Inference = running the trained model to answer user queries (vs training, which teaches it). Cerebras' chip produces tokens (words) extremely fast because model weights sit in on-wafer SRAM with no HBM bottleneck.
How the inference pipeline works (simplified)
- Pipeline parallelism. Each model layer is mapped to a strip of the wafer (e.g., Llama-70B has 80 layers → 80 strips).
- A token enters layer 1, moves to layer 2, …, exits layer 80, then loops back as the input for the next token.
- Why fast: the next layer is physically adjacent on the wafer, so latency between pipe stages is ~nanoseconds (vs microseconds when GPUs talk over NVLink/PCIe).
- Multi-user: each user's token sits at a different pipe stage. With 80 layers, ~80 users can be in flight simultaneously before the pipeline is full.
- Optimal batch size = 1 (vs Groq optimal = 2 due to imbalance).
- Optimal #queries-in-flight = #model layers (fills the pipe without increasing user latency).
- Headline speed: ~3,000 tokens/sec/user vs typical GPU 50–200 tokens/sec/user.
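
The "queries in flight = model layers" rule can be illustrated with a toy throughput model. This is a simplification, not IA's or Cerebras' actual model: it assumes every pipeline stage takes the same time, and the stage time is back-solved from the 3,000 tokens/sec/user headline rather than measured.

```python
def per_user_tokens_per_s(users: int, layers: int, stage_s: float) -> float:
    """Tokens/s each user sees in a toy layer-pipelined decoder.

    A token must traverse all `layers` stages before that user's next token
    can start, so a lone user gets 1 / (layers * stage_s). Up to `layers`
    users can occupy distinct stages without slowing each other; beyond
    that, users queue for a pipeline slot.
    """
    return 1.0 / (max(users, layers) * stage_s)

def aggregate_tokens_per_s(users: int, layers: int, stage_s: float) -> float:
    """Total tokens/s leaving the pipeline across all users."""
    return users * per_user_tokens_per_s(users, layers, stage_s)

LAYERS = 80                        # Llama-70B layer count from the text
STAGE_S = 1.0 / (3_000 * LAYERS)   # assumption: back-solved from 3,000 tok/s/user

for users in (1, 40, 80, 160):
    per_user = per_user_tokens_per_s(users, LAYERS, STAGE_S)
    total = aggregate_tokens_per_s(users, LAYERS, STAGE_S)
    print(f"{users:>3} users: {per_user:7.0f} tok/s/user, {total:9.0f} tok/s total")
```

In this model aggregate throughput grows linearly until the user count equals the layer count, then flattens while per-user speed starts to fall.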
Multi-wafer chaining
- For Llama-70B (140 GB FP16 weights): chain 4× WSE-3 (176 GB total SRAM) over RDMA-over-Ethernet, <5 µs latency between systems.
- Only activations transfer between wafers — bandwidth need <100 Gbps out of 1.2 Tbps available.
- IA's complaint: you're using 1 of 12 transceivers, leaving the other 11 stranded because of design constraints. "5 µs I/O latency is dogshit but sure latency dominated by other factors whatever."
Where it breaks: KV cache
- When a model processes long context (your entire codebase, e.g.), it builds up a KV (key-value) cache of attention state per query.
- KV cache MUST sit on the WSE because off-wafer I/O is too slow.
- KV cache competes with model weights for the tiny 44 GB SRAM pool.
- Long context = economically deadly because KV grows linearly with context length.
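
The linear growth is easy to see from the standard KV-cache size formula. The sketch below uses the commonly published Llama-3.1-70B configuration (80 layers, 8 KV heads with grouped-query attention, head dimension 128, 16-bit values); those numbers are not from the IA report and should be treated as assumptions.

```python
def kv_cache_bytes(context_tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Full-attention KV cache size for one query.

    The leading factor 2 accounts for one key and one value vector
    per KV head, per layer, per token.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * context_tokens

# Assumed Llama-3.1-70B-style configuration (not taken from the source).
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

for ctx in (8_192, 32_768, 131_072):
    gb = kv_cache_bytes(ctx, LAYERS, KV_HEADS, HEAD_DIM) / 1e9
    print(f"{ctx:>7} tokens: {gb:5.1f} GB of KV cache per user")
```

Under these assumptions a single 128K-token context needs about 43 GB, roughly one wafer's entire 44 GB of SRAM before any weights are loaded. Sliding-window or compressed attention schemes shrink this considerably.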
Hardware Requirement Math (Chapter 3 reproduced)
"This chapter was HEAVILY assisted by a friend who actually understands this shit."
General assumptions
- Scratchpad SRAM = 48 KB/core × 900K cores = 44 GB (not GiB) per WSE-3.
- Compute and SRAM bandwidth balanced ("full bandwidth for full SIMD") per Cerebras claims.
- Optimal #queries in flight = #model layers (fills pipe, no latency hit).
- Each user has independent KV → economic optimum stores model weights + every user's KV.
- Context length severely impacts Cerebras inference economics.
- "8.25-bit" weight encoding = 8-bit int + scale + offset (MXFP4-ish quantization). IA calls this "very optimistic" — Cerebras has not confirmed they implemented it.
- "Sliding-window FastAttention" = optimal KV pattern. Also optimistic; assumes Cerebras' compiler is sliding-window-aware.
GPT-OSS-120B (likely the model behind OpenAI Codex on Cerebras)
| | FP16 (baseline) | 8.25-bit (optimistic) |
|---|---|---|
| Weights | 233.6 GB | 120.5 GB |
| 128 K Context KV/user | 9 GB | 4.84 GB |
| # KVs for full util | 36 | 36 |
| KV total need | 324 GB | 174.24 GB |
| Total SRAM need | 557.6 GB | 294.74 GB |
| WSE SRAM available | 44 GB | 44 GB |
| KV cache % of SRAM | 58.1% | 59.1% |
| # WSE needed | 13 | 7 |
| Total CapEx (@ $300K ASP) | $3.9M | $2.1M |
| Tokens/s/user delivered | 3,000 | 3,000 |
| Ktok/s per WSE | 8.3 | 15.4 |
| $ per million tokens | 0.75 | 0.75 |
| Seconds to billing | 120 | 65 |
| $ revenue per WSE per hour | $22.50 | $41.54 |
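
The derived rows of the table follow mechanically from four inputs: weight size, KV per user, users in flight, and SRAM per wafer. The sketch below reproduces that chain; the function and key names are illustrative, and the $300K ASP, 3,000 tok/s/user, and $0.75 per million tokens are the table's own assumptions.

```python
import math

WSE_SRAM_GB = 44.0
WSE_ASP_USD = 300_000
TOK_PER_S_PER_USER = 3_000
USD_PER_MTOK = 0.75

def cluster_economics(weights_gb: float, kv_per_user_gb: float, users: int) -> dict:
    """Recompute the derived rows of the GPT-OSS-120B table."""
    kv_total = kv_per_user_gb * users
    total = weights_gb + kv_total
    n_wse = math.ceil(total / WSE_SRAM_GB)
    tok_s_per_wse = users * TOK_PER_S_PER_USER / n_wse
    return {
        "kv_total_gb": kv_total,
        "total_sram_gb": total,
        "kv_share": kv_total / total,
        "n_wse": n_wse,
        "capex_usd": n_wse * WSE_ASP_USD,
        "ktok_s_per_wse": tok_s_per_wse / 1_000,
        "sec_per_mtok": 1e6 / tok_s_per_wse,
        "usd_per_wse_hour": tok_s_per_wse * 3_600 / 1e6 * USD_PER_MTOK,
    }

fp16 = cluster_economics(weights_gb=233.6, kv_per_user_gb=9.0, users=36)
q825 = cluster_economics(weights_gb=120.5, kv_per_user_gb=4.84, users=36)
for name, row in (("FP16", fp16), ("8.25-bit", q825)):
    print(name, {k: round(v, 2) for k, v in row.items()})
```

The hourly revenue lands within about 1% of the table's figures, which appear to use rounded intermediate values.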
Deepseek-4 1600B
| | 8.25-bit |
|---|---|
| Weights | 1,622.15 GB |
| 1 M Context KV | "It's complicated." |
| # KVs for full util | 61 |
| KV total need | 364.63 GB |
| Total SRAM need | 1,986.78 GB |
| Naive # WSE needed | 45 |
| Actual # WSE needed | 61 (layer + KV-cache locality forces extras) |
| Total CapEx | $18.3M |
| Throughput / pricing | "???" — author admits non-viable without KV streaming |
KV streaming math (the most important part)
If Cerebras could stream KV off-wafer instead of storing it, they'd dramatically reduce the number of WSEs per cluster. The bandwidth required (assuming 10 wire bits per byte of payload, i.e. GB/s × 10 = Gbps):
| Model | Required I/O per WSE | Available (1.2 Tbps) | Shortfall |
|---|---|---|---|
| Llama-3.1-70B | 33.8 Tbps (3382 GB/s) | 1.2 Tbps | ~28× short |
| GPT-OSS-120B | 48.37 Tbps (4837 GB/s) | 1.2 Tbps | ~40× short |
| Deepseek v3 | ~1× available (3 layers fit) | 1.2 Tbps | Streaming feasible |
| Deepseek v4 | Streaming actually works | 1.2 Tbps | Possible (small KV) |
Punchline: modern MoE models with compressed attention (DSv4-style) make KV streaming less valuable — costs are dominated by weight storage and revenue by token throughput. The "KV streaming saves Cerebras" thesis only partially holds. Even if WSE I/O improves 10×, savings on Llama-3.1-70B would only get you from 80 WSE down to 6 WSE (a 13× cost reduction) on a model that's already aging.
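
The shortfall multiples in the streaming table come from a simple unit conversion. A minimal sketch, using the source's 10-bits-per-byte wire overhead and 1.2 Tbps per wafer; the `io_multiple` parameter is a hypothetical knob for exploring an I/O upgrade.

```python
WIRE_BITS_PER_BYTE = 10   # the source's "10x byte-to-bit overhead"
WSE_IO_TBPS = 1.2         # current off-wafer I/O per WSE

def required_tbps(gbytes_per_s: float) -> float:
    """Convert a payload rate in GB/s to a line rate in Tbps."""
    return gbytes_per_s * WIRE_BITS_PER_BYTE / 1_000

def shortfall(gbytes_per_s: float, io_multiple: float = 1.0) -> float:
    """How many times the required rate exceeds available I/O."""
    return required_tbps(gbytes_per_s) / (WSE_IO_TBPS * io_multiple)

for model, gbs in (("Llama-3.1-70B", 3382), ("GPT-OSS-120B", 4837)):
    print(f"{model}: {required_tbps(gbs):.2f} Tbps needed, "
          f"{shortfall(gbs):.1f}x short today, "
          f"{shortfall(gbs, io_multiple=10):.1f}x short with 10x I/O")
```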
Gross Margin Analysis
S-1 income statement (USD thousands)
| | 2025 | 2024 |
|---|---|---|
| Hardware revenue | 358,440 | 211,965 |
| Cloud + other services | 151,551 | 78,287 |
| Total revenue | 509,991 | 290,252 |
| Hardware COGS | 204,746 | 137,310 |
| Cloud + services COGS | 106,174 | 30,204 |
| Gross profit | 199,071 | 122,738 |
| R&D | 243,319 | 158,234 |
| S&M | 70,645 | 20,980 |
| G&A | 30,969 | 44,962 |
| Loss from operations | (145,862) | (101,438) |
| Other income (expense), net | 390,746 | (378,237) |
| Income tax expense | 7,057 | 1,927 |
| Net income (loss) | 237,827 | (481,602) |
| Non-GAAP operating loss | (96,095) | (42,874) |
| Net cash from ops | (10,050) | 451,978 |
- Hardware GM 2025 = 43% (up from 35% in 2024 per the prior S-1).
- IA's benchmark: "In semiconductors you want AT LEAST 55% gross margin. Otherwise you're a commodity loser."
- The $238M net income in 2025 is largely driven by $391M of "other income" (390,746 in the table above), likely warrant fair-value adjustments and one-time items. The operating business still loses $146M. Headlines about "Cerebras profitable" are misleading.
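
The segment margins can be recomputed directly from the income statement rows. This is a small sketch using only the figures in the table above (USD thousands); the cloud and blended margins are derived here and are not quoted by IA.

```python
# (revenue, COGS) in USD thousands, copied from the S-1 table above.
HARDWARE = {2025: (358_440, 204_746), 2024: (211_965, 137_310)}
CLOUD = {2025: (151_551, 106_174), 2024: (78_287, 30_204)}

def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cogs) / revenue

for year in (2025, 2024):
    hw_rev, hw_cogs = HARDWARE[year]
    cl_rev, cl_cogs = CLOUD[year]
    blended = gross_margin(hw_rev + cl_rev, hw_cogs + cl_cogs)
    print(f"{year}: hardware {gross_margin(hw_rev, hw_cogs):.1%}, "
          f"cloud {gross_margin(cl_rev, cl_cogs):.1%}, blended {blended:.1%}")
```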
The OpenAI warrant trick (Chapter 4 setpiece)
Per S-1 page 81:
- Issued December 2025 in connection with the Master Revenue Agreement (MRA) with OpenAI.
- 33,445,026 shares of Class N common stock at exercise price $0.00001/share (one one-thousandth of a cent; the standard warrant strike is $0.01, so this is 1/1,000th of normal).
Vesting schedule:
- 4,459,337 shares vested January 2026 upon receipt of the Working Capital Loan.
- 5,574,171 shares vest on the earlier of (i) the first date market cap exceeds $40B (an IPO milestone) and (ii) receipt of certain fee payments from OpenAI.
- 23,411,518 shares vest in tranches as OpenAI takes delivery of compute capacity, fully vesting only if OpenAI buys 2 GW of inference capacity total.
IA's polemical framing (the burger analogy):
"Imagine a burger shop that sells each burger for $5. Nobody buys the burgers cause they suck. So the fast food joint wraps each burger in a stock option instead of regular wax paper. Each burger is wrapped in a piece of paper that allows the customer to buy a share of the burger company for $0.00001."
At any plausible IPO price, this is hundreds of millions of dollars of free equity to OpenAI bundled into product sales. The 43% reported gross margin is gamed. IA: "I FUCKING HATE WARRANTS. THEY DISTORT (IN SPIRIT) GROSS MARGINS."
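
To see the scale of the warrant, the tranches can be summed and valued at a given share price. The share counts and strike are from the S-1 excerpt above; the share prices in the loop are purely hypothetical placeholders, not IPO guidance, and the tranche labels are shorthand.

```python
STRIKE_USD = 0.00001

# Tranche sizes from the vesting schedule above; keys are shorthand labels.
TRANCHES = {
    "working_capital_loan": 4_459_337,
    "market_cap_or_fee_payments": 5_574_171,
    "compute_delivery": 23_411_518,
}
TOTAL_SHARES = sum(TRANCHES.values())

def intrinsic_value_usd(shares: int, share_price_usd: float) -> float:
    """Value of the warrant if fully vested and exercised at this price."""
    return shares * max(share_price_usd - STRIKE_USD, 0.0)

print(f"total shares: {TOTAL_SHARES:,}")
print(f"cost to exercise everything: ${TOTAL_SHARES * STRIKE_USD:,.2f}")
for price in (10.0, 25.0, 50.0):  # hypothetical share prices, illustration only
    value_m = intrinsic_value_usd(TOTAL_SHARES, price) / 1e6
    print(f"at ${price:.0f}/share: ${value_m:,.0f}M")
```

Only the first tranche has vested so far, so the full-vesting figures are an upper bound that depends on OpenAI taking delivery of the full 2 GW.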
IA's bottoms-up COGS estimate
Cerebras Hardware Gross Margin — Bottoms-Up
| Bucket | Item | Cost ($K) | Comment |
|---|---|---|---|
| WSE-3 Cost | Custom TSMC N5 wafer | 25 | Intentionally higher than normal due to low volume + specialized reticle stitching |
| | Power delivery (Vicor) | 5 | Specialized vertical power |
| | Package | 5 | Fully custom |
| | ATE test cost | 5 | Intense PVT calibration testing |
| | Wafer + packaging yield | 40% | A yield factor, not a cost. Multiple sources say yield is shit; packaging yield especially poor |
| | Packaged wafer final cost | 100 | (25 + 5 + 5 + 5) / 40% yield |
| Support Gear | Cooling + mech | 30 | Fully custom, low volume, high density |
| | Optical transceivers | 2 | |
| | AMD/Xilinx FPGA | 10 | Converts proprietary WSE I/O → 100G Ethernet |
| | AMD CPU servers | 20 | Support functions |
| | MISC IT (UPS, switches, frontend NIC) | 10 | |
| | Supporting equip final cost | 72 | |
| | Total COGS | 172 | |
| | S-1 GM % | 42 | |
| | Implied WSE-3 rack ASP | 297 | 172 / (1 − 0.42) |
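
The bottoms-up table reduces to three steps: divide per-wafer costs by yield, add support gear, then back out the selling price from the reported margin. A minimal sketch using the table's own inputs; the dictionary keys are shorthand labels.

```python
# All costs in USD thousands, from IA's bottoms-up table.
WAFER_ITEMS = {"tsmc_n5_wafer": 25, "power_delivery": 5, "package": 5, "ate_test": 5}
SUPPORT_ITEMS = {"cooling_mech": 30, "optics": 2, "fpga": 10, "cpu_servers": 20, "misc_it": 10}
YIELD = 0.40          # wafer + packaging yield (IA estimate)
GROSS_MARGIN = 0.42   # hardware GM figure IA uses in this table

# Each good unit carries the cost of the failed ones.
packaged_wafer = sum(WAFER_ITEMS.values()) / YIELD
support = sum(SUPPORT_ITEMS.values())
total_cogs = packaged_wafer + support
# GM = (ASP - COGS) / ASP  =>  ASP = COGS / (1 - GM)
implied_asp = total_cogs / (1 - GROSS_MARGIN)

print(f"packaged wafer: ${packaged_wafer:.0f}K")
print(f"support gear:   ${support}K")
print(f"total COGS:     ${total_cogs:.0f}K")
print(f"implied ASP:    ${implied_asp:.0f}K")
```

The yield term dominates: at 40% yield, $40K of per-wafer inputs becomes $100K per good unit, so yield improvement is the largest single cost lever in this model.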
Three Improvement Levers Cerebras MUST Hit (WSE-4)
If WSE-4 doesn't deliver these, IA's view is Cerebras becomes a commodity inference vendor with single-customer concentration risk. If it does, gross margins could justify the valuation.
1. FP8 / OCP MX4/6/9 support
- Half (or quarter) the memory per weight → double (or quadruple) effective model size in same SRAM.
- Industry-standard already; refusing to adopt is "infuriating incompetence."
2. Hybrid-bonded SRAM wafer
- Physically stack a second wafer of pure SRAM on top of the logic wafer using TSV (Through-Silicon Vias).
- "Step-function" capacity increase — solves the binding constraint on which models can be served.
- Cerebras is uniquely positioned to pull this off because they already understand routing-around-defects on a wafer mesh.
- Catch: thermal management becomes a "satanic nightmare" but should be solvable.
3. WSE I/O upgrade
- Current 1.2 Tbps total per wafer is the binding constraint preventing KV cache offload.
- IA estimates need is 5–10× current.
- Currently 12× reticles on top + 12× reticles on bottom edge of the WSE; either each reticle supports 50 Gbps ("horrific shoreline density") or only one edge is being used at 100 Gbps. Both scenarios "frankly embarrassing."
- IA: "The time for excuses is over. You are gonna raise money from IPO. Go use money to make product 100× better."
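
The two shoreline scenarios and the upgrade target are quick to check. A small sketch using only figures from the bullets above:

```python
def total_tbps(reticles: int, gbps_per_reticle: float) -> float:
    """Aggregate off-wafer bandwidth in Tbps."""
    return reticles * gbps_per_reticle / 1_000

both_edges = total_tbps(24, 50)    # 12 top + 12 bottom reticles at 50 Gbps each
one_edge = total_tbps(12, 100)     # a single edge at 100 Gbps per reticle
target_low, target_high = both_edges * 5, both_edges * 10  # IA's 5-10x estimate

print(f"both edges @ 50G: {both_edges:.1f} Tbps")
print(f"one edge @ 100G:  {one_edge:.1f} Tbps")
print(f"IA target range:  {target_low:.0f} to {target_high:.0f} Tbps")
```

Both scenarios land on the same 1.2 Tbps total, which is why IA cannot tell from the outside which one applies.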
Customer Concentration + Competitive Context
G42 dependency
- ≥ 90% of revenue last 2 years came from G42 (UAE-backed).
- Andrew Feldman is credited with personally securing this lifeline. IA's image: Feldman bowing to MBS.
- One customer = fragile. Diversifying away from G42 is what the OpenAI deal does — but now concentration shifts to OpenAI.
Nvidia × Groq vs OpenAI × Cerebras deal timing
- Dec 24, 2025: Nvidia/Groq deal announced.
- Jan 14, 2026: OpenAI/Cerebras deal announced.
- Two of IA's contacts (independently, both biased) say:
- Nvidia/Groq was rushed in <1 week (deals normally take months).
- Rumors of OpenAI/Cerebras leaked, prompting Nvidia to panic.
- Groq tried to sell themselves earlier in 2025 and "nobody bit."
- Read-throughs:
- Nvidia views fast-inference specialists as a strategic threat.
- Groq's failure to find a buyer is bearish for the category of fast-inference startups.
- Was Jensen proactive or reactive? IA: "Something to think about."
Management Read
Andrew Feldman (CEO)
- Ex-employees describe him as ignorant, arrogant, technically incompetent.
- IA's read after a 1-hour meeting two years ago: a true believer. Marketing bullshit pisses IA off, but his passion, drive, commitment, and fundraising hustle are real.
- His passion is "very obvious in person" and probably the reason Cerebras is still alive — he had to convince the Middle East anchor to save the company until OpenAI stepped in.
Sean Lie (CTO)
- IA: "an invertebrate who does not take WSE flaws seriously and gives lame excuses."
- Has not pushed for FP8, larger cores, or WSE I/O fixes — three things every engineer IA spoke to says are obvious.
IA's overall take
"In a way, I admire them both for suffering through years of fail, several near-death experiences, and finally making it. If nothing else, Cerebras is a classic Silicon Valley story: 1. Try to make crazy thing. 2. Company almost die several times. 3. Watch everyone else go to the moon as you eat shit. 4. Finally make it. Get rich and change computer history."
Side Observations
- GTC 2025 challenge coin. IA crashed a Cerebras GTC party with someone else's ticket. Took home a "fake gold" challenge coin reading "In Fast Inference We Trust." The coin oxidized. IA uses it as a metaphor for the company: training (the original goal) is dead, but the underlying technology is solid and re-usable for ultra-fast inference.
DD Question Playbook for IPO Meetings (Chapter 6 reproduced)
IA explicitly wrote this for finance people meeting management. Treat it as a checklist for Pink to pull from on an IBKR broker call or any other chance to ping Cerebras IR pre- or post-IPO.
How to increase WSE value?
- [ ] What floating-point formats will WSE-4 support? FP8? OCP MX4/6/9?
- [ ] What are the plans to meaningfully increase SRAM capacity?
- [ ] Hybrid bonding of SRAM wafer?
- [ ] Larger cores with higher local SRAM per core while reducing NoC area?
- [ ] Custom ultra-high-density SRAM cells?
- [ ] What is your plan to decouple model weights from KV cache (i.e., offload KV cache)?
- [ ] How much WSE I/O bandwidth is needed to stream KV cache? (Hint: 5–10× current 1.2 Tbps.)
How to reduce WSE cost?
- [ ] What limits WSE yields?
- [ ] Parametric yield of cores?
- [ ] Yield from packaging and vertical power delivery?
- [ ] Reliability issues while handling the wafers?
- [ ] Which cost factors will benefit from economies of scale?
- [ ] Wafer price?
- [ ] Custom cooling/mechanical/tooling?
- [ ] When will you design an ASIC to replace the expensive AMD/Xilinx FPGAs for WSE I/O → Ethernet translation?
Bull / Bear
| Bull case | Bear case |
|---|---|
| Genuine technological moat (cross-reticle stitching IP, PVT calibration) | 43% hardware GM in semis is weak; warrants distort the real number |
| OpenAI as anchor customer; total revenue 1.76× YoY (290 → 510) | ≥90% of historical revenue from one customer (G42); now shifting concentration to OpenAI |
| Inference speed is a real product the market wants (3K tok/s/user vs GPU ~100) | KV cache + small SRAM caps which models can be served economically; long context is deadly |
| WSE-4 has obvious upgrade path (FP8, hybrid bonding, more I/O) | Three WSE generations have failed to deliver any of those upgrades — execution risk |
| Author is buying $50K at IPO | Author calls it a "small YOLO," not a thesis position |
| Cerebras may be the only company that can hybrid-bond an entire SRAM wafer | Groq tried to sell themselves earlier in 2025 and "nobody bit"; bearish for the whole fast-inference category |
| Net cash flow positive at the line that matters; revenue growing | Operating losses still $146M in 2025; "net income" inflated by other-income items |
Bottom Line
IA's stance: buy a small position for fun, not a thesis position. Sized "$50K YOLO" out of a $2.1M trading account = ~2.4% of the trading book. The technology is real, the economics are not yet good, and the path to good economics depends entirely on whether management can stop making bone-headed decisions in WSE-4.
For Pink's positioning: similar sizing logic applies. A small IPO allocation as a tactical / event-driven trade with explicit downside acceptance — not a long-term semis hold like ALAB or NVDA. Re-evaluate once WSE-4 spec sheet drops or once the OpenAI 2 GW commitment milestones are publicly disclosed.
Open Questions / Things to Verify
- [ ] Cross-check 2024 hardware GM (35%) against the actual S-1 filing — IA cites the prior S-1 from "two years ago." Did 2024 finals come in different?
- [ ] Verify 33.4M warrant share count and $0.00001 strike against actual S-1 page 81 (figure 4.2 in the IA report screenshot looks legit but worth checking).
- [ ] Does Cerebras' S-1 disclose the OpenAI MRA committed-delivery schedule? If yes, that's the time-series Pink should track.
- [ ] What does SemiAnalysis or another technical source say about IA's KV-streaming math? Sanity-check the "28× / 40× short" claims.
- [ ] Confirm "GPT-OSS-120B is what Cerebras serves OpenAI Codex on" — IA flags this as "through a source, I believe" (i.e., not 100%).
- [ ] WSE-3 rack ASP "$297K" is IA's back-calc, not disclosed. Cross-check vs S-1 customer commitment dollar amounts when the warrant tranches publish.
- [ ] Map this against Pink's existing AI-infra exposure (NVDA, ALAB, AVGO, semi names) — is CBRS additive or redundant?
Related
- CBRS — Canonical Cerebras page (updated to v2 thesis after this synthesis)
- cbrs-irrational-analysis — Prior IA edition (Oct 2024, first IPO attempt)
- cbrs-cerebras-deck — Cerebras product/architecture deck
- cbrs-filings — Filings tracker for CBRS
Source Files
- IA report PDF (39 pages): KB/raw/equity-research/cbrs-irrational-analysis-2026.pdf
- IA Substack URL: https://irrationalanalysis.substack.com/p/cerebras-cbrso-equity-research-report
- IA Twitter: @insane_analyst