EDITOR’S NOTE

The money is moving fast this week. So are the mistakes.

  • OpenAI just hit $25 billion in annualized revenue, and somehow still found a way to hand Anthropic a gift.

  • Anthropic said no to a deal that OpenAI said yes to: 1.5 million subscribers left in under 48 hours.

  • Lio just raised $30 million to automate enterprise procurement, and Andreessen Horowitz wrote the check.

The deep irony: the most technically interesting story this week is a paper about 3D reconstruction that nobody outside a research lab will read. Meanwhile, a single PR decision moved more users than any benchmark ever could.

SIGNAL DROP

1. Lio Lands $30M to Automate Procurement
Andreessen Horowitz led a $30 million Series A into Lio, an AI startup targeting enterprise procurement workflows. Boring category. Massive spend. Any vendor still selling manual approval chains to Fortune 500s should be nervous, because Lio is now well-funded and pointed directly at them. (Source: TechCrunch)

2. OpenAI Hits $25B Annualized Revenue. Anthropic Is Running.
According to The Information, OpenAI has crossed $25 billion in annualized revenue, but Anthropic is closing the gap faster than most expected. That gap is the number to watch, not the headline figure. (Source: r/singularity via The Information)

3. OpenAI Shed 1.5M Subscribers in 48 Hours
After CEO Sam Altman reportedly backed a deal that Anthropic had rejected, OpenAI lost 1.5 million subscribers in under two days, according to r/ChatGPT. At $25B in revenue, that's survivable. But trust, once burned, compounds. (Source: r/ChatGPT)

DEEP DIVE

Context

3D reconstruction from images has been quietly eating the computer vision world for the past two years. Feed-forward transformer models changed the game: instead of running slow optimization loops, you'd pass images through a network and get geometry out the other end. Fast, clean, increasingly good.

But there's a catch. A big one.

Methods like VGGT and π³ (two of the current best) scale quadratically with the number of input images. Double your image count, quadruple your compute. That's not a minor inconvenience. That's a wall. If you're reconstructing a single room from 20 carefully selected photos, quadratic scaling is fine. If you're trying to reconstruct a city block from a drone video with thousands of frames, you're cooked.

What Happened

Researchers have introduced ZipMap, a stateful feed-forward model that claims to solve this scaling problem directly. According to the paper, ZipMap achieves linear-time 3D reconstruction, meaning compute scales proportionally with image count rather than exploding quadratically.

The key claim: ZipMap matches or surpasses the reconstruction quality of quadratic methods like VGGT and π³, while processing large image collections at a fraction of the cost. The paper also describes the approach as "bidirectional," which I'll get to in a moment.

And there's a second trick: test-time training. The model doesn't just run inference and call it done. It adapts at test time using the specific scene it's processing.

Technical Analysis

The quadratic scaling problem in attention-based transformers is well understood. Every image attends to every other image, so costs grow as O(n²). Sequential reconstruction approaches already exist to address this: process images one at a time, maintain some running state, keep costs linear. But sequential methods typically degrade in quality because early frames can't "see" future frames. You lose global coherence.
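To make the scaling gap concrete, here's a toy cost model. The unit costs are illustrative only, not measurements from ZipMap or any other system; the point is just how fast the ratio between the two regimes blows up.

```python
# Toy cost model for the two scaling regimes described above.
# Unit costs are illustrative, not measurements from any paper.

def pairwise_attention_cost(n_frames: int) -> int:
    """Global attention: every frame attends to every other frame, O(n^2)."""
    return n_frames * n_frames

def stateful_sequential_cost(n_frames: int, state_cost: int = 4) -> int:
    """Sequential reconstruction: each frame updates a fixed-size state, O(n)."""
    return n_frames * state_cost

for n in (20, 200, 2000):
    ratio = pairwise_attention_cost(n) / stateful_sequential_cost(n)
    print(f"{n:>5} frames -> quadratic costs {ratio:.0f}x the linear approach")
```

At 20 frames the gap is a nuisance; at drone-video scale it's the wall described above.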

ZipMap's "stateful" and "bidirectional" framing suggests it's threading this needle somehow, maintaining a compressed scene representation that gets updated as new frames arrive while also allowing information to flow in both directions. The exact mechanism would require a closer read of the full paper, but the bidirectionality is the interesting part. (My analysis: this likely involves some form of recurrent state combined with a second backward pass, similar in spirit to how bidirectional LSTMs work, but implemented over a scene representation rather than a sequence of tokens.)
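To illustrate that speculation (and only that; this is not ZipMap's actual mechanism), here's a minimal bidirectional-RNN-style sketch: a forward pass folds frames into a fixed-size running state, a backward pass does the same in reverse, and each frame's estimate combines both, so earlier frames "see" later ones at total cost linear in frame count. The `update` rule is a placeholder.

```python
import numpy as np

def update(state: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Fold one frame's features into a fixed-size state (placeholder rule)."""
    return 0.9 * state + 0.1 * frame

def bidirectional_pass(frames: list[np.ndarray]) -> list[np.ndarray]:
    """Two linear sweeps over the frames, one per direction."""
    d = frames[0].shape[0]
    fwd, state = [], np.zeros(d)
    for f in frames:                  # forward: past -> present
        state = update(state, f)
        fwd.append(state)
    bwd, state = [], np.zeros(d)
    for f in reversed(frames):        # backward: future -> present
        state = update(state, f)
        bwd.append(state)
    bwd.reverse()
    # Each frame's estimate now carries information from both directions.
    return [np.concatenate([a, b]) for a, b in zip(fwd, bwd)]
```

Two O(n) sweeps instead of one O(n²) attention pass is the shape of the tradeoff, whatever the paper's actual state representation turns out to be.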

Test-time training is the other lever. Rather than using a fixed pretrained model, ZipMap fine-tunes itself on the specific scene at inference time. This is computationally expensive per-scene, but the linear base cost gives it room to breathe. The tradeoff is roughly: pay a bit more per scene, get dramatically better results on that scene.

So the actual innovation isn't one thing. It's three things working together: linear-time state management, bidirectional information flow, and scene-specific adaptation.

Implications

This matters for any application that needs to process large image collections in real time or near-real time. Autonomous vehicles processing continuous video. Robotics systems building maps from onboard cameras. Large-scale photogrammetry pipelines. AR applications that reconstruct environments on-device.

Quadratic scaling was a soft ceiling on all of these. Linear scaling removes it. Not a small change.

But the test-time training component complicates deployment. If every new scene requires fine-tuning, you're adding latency and compute that may not be acceptable in real-time systems. (This is the tension in the paper that I'd want to understand better before drawing firm conclusions.) There's likely a version of this work where test-time training is optional, used only when quality matters more than speed.

My Take

I'm genuinely interested in this one. The 3D reconstruction space has been stuck in a frustrating pattern: better quality, worse scaling, or better scaling, worse quality. ZipMap is claiming both simultaneously, and the bidirectionality claim is what makes me think they might actually be onto something real rather than just repackaging a known tradeoff.

That said, papers that claim to beat quadratic scaling while matching quadratic quality deserve skepticism until the benchmarks are independently reproduced. The summary is thin on specific numbers, and "matching or surpassing" is doing a lot of work in that abstract.

But the core architecture direction is right. Linear-time stateful reconstruction with test-time adaptation is exactly the shape of solution this problem needed. If the quality claims hold up, this is the kind of work that gets quietly absorbed into every major 3D vision pipeline within 18 months. No announcement. Just better products.

That's usually how the real advances land.

- The AI finds the signal. We decide what it means.

PARTNER PICK

Synthesia turns text into video without a camera, actor, or green screen. You write a script, pick an avatar, and get a finished video in minutes. The output quality has gotten genuinely good. Lip sync is tight. Lighting looks natural.

Worth trying if you're drowning in async updates, need localized training content, or want to test video messaging without the production overhead. The avatars still feel slightly uncanny if you stare too long, and custom voices cost extra. But for 90% of internal comms and educational use cases, it's faster and cheaper than the alternative.

Some links are affiliate links. We earn a commission if you subscribe. We only feature tools we'd use ourselves.

TOOL RADAR

Modular Diffusers (Hugging Face)

Hugging Face's new framework lets you build diffusion pipelines from composable blocks instead of monolithic, hardcoded architectures. Swap components, chain models, customize inference flows without rewriting everything from scratch. For ML engineers who've wrestled with Diffusers' rigid pipeline structure, this is genuinely useful. The abstractions look clean. Whether the community adopts it depends on documentation quality and how much migration pain it introduces.

Worth it if: you're building custom diffusion workflows and hate forking source code.
Skip if: you just run standard Stable Diffusion pipelines as-is.
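The composable-blocks idea, in generic form (this is NOT the actual Modular Diffusers API, just a sketch of the pattern): each pipeline stage is a callable that transforms shared state, so stages can be swapped, reordered, or replaced independently.

```python
from typing import Callable

Block = Callable[[dict], dict]

def run_pipeline(blocks: list[Block], state: dict) -> dict:
    """Chain blocks left to right, each transforming the shared state dict."""
    for block in blocks:
        state = block(state)
    return state

# Placeholder blocks standing in for prompt encoding, denoising, decoding.
def encode_prompt(state: dict) -> dict:
    return {**state, "embedding": f"emb({state['prompt']})"}

def denoise(state: dict) -> dict:
    return {**state, "latents": f"denoised({state['embedding']})"}

def decode(state: dict) -> dict:
    return {**state, "image": f"img({state['latents']})"}

result = run_pipeline([encode_prompt, denoise, decode], {"prompt": "a cat"})
```

Swapping in a different denoiser means replacing one callable in the list, not forking a monolithic pipeline class; that's the pitch in miniature.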

Figr Design

Product-aware AI for UX design. The pitch: it understands your product context before generating UI suggestions, not just pattern-matching from a generic design library. Sounds good. The details are thin, and "thinks through UX" is doing a lot of heavy lifting in that tagline. Could be sharp for early-stage product teams. Could also be another AI wrapper on Figma components.

Worth it if: you need faster UX iteration with actual product context baked in.
Skip if: your design team already has a solid system and process.

Some links are affiliate links. We earn a commission if you subscribe. We only feature tools we'd use ourselves.

FACT CHECK

AI MYTH BUSTER

Myth: More parameters = smarter model.

Everyone assumes model size is the scoreboard. A trillion parameters sounds impressive, so it must be better than a 7B model, right? This belief comes from the early scaling era, when throwing more compute at a model reliably improved benchmarks. Bigger was better. For a while.

Wrong.

The relationship between parameter count and actual capability broke down once researchers got serious about training data quality and architectural efficiency. Mistral 7B outperformed GPT-3.5 on several reasoning benchmarks when it launched. Seven billion parameters against 175 billion. And GPT-3.5 wasn't even the ceiling at the time.

Thinking parameter count determines intelligence is like judging a chef by how many knives they own. Technique matters. Ingredients matter. The knives are almost beside the point.

What actually drives model quality is training data curation and quality, fine-tuning methodology, and alignment work. A massive model trained on internet slop will lose to a smaller model trained on carefully filtered, high-quality text. Garbage in, garbage out. This is why Phi-3 Mini, a 3.8B-parameter model from Microsoft, punches well above its weight class on reasoning tasks. Its training data was deliberately curated toward textbook-quality content.

But the parameter myth persists because big numbers are easy to market. "We trained on 1 trillion tokens of premium data" is a harder sell than "our model has 1 trillion parameters."

So next time someone flexes a parameter count at you, ask about the training data. That's the actual story.

Parameter count is a spec sheet number, not a capability guarantee.

QUICK LINKS

Large genome model: open source AI trained on trillions of bases. Evo 2 learned regulatory DNA and splice sites from bacterial, archaeal, and eukaryotic genomes without explicit instruction.

Apple Music adds optional labels for AI songs and visuals. Voluntary transparency tags for AI-generated tracks, compositions, and artwork. No label means no assumption of AI use.

Accurate and Efficient Hybrid-Ensemble Atmospheric Data Assimilation in Latent Space with Uncertainty Quantification. HLOBA combines hybrid-ensemble methods with latent space efficiency for weather prediction and climate reanalysis.

Inference-Time Toxicity Mitigation in Protein Language Models. Logit Diff Amplification prevents domain-adapted protein models from generating toxic sequences without retraining.

FedCova: Robust Federated Covariance Learning Against Noisy Labels. Federated learning framework that hardens models against noisy labels through covariance-based robustness.

OllamaFX v0.5.0 released. Open source Ollama desktop interface with RAG, multimodal support, and improved chat management.

TRENDING TOOLS

Tools gaining traction this week based on our source data.

  • Evo 2 — Open source AI trained on trillions of DNA base pairs. Trending on r/artificial for actual biotech applications.

  • Modular Diffusers — Composable building blocks for diffusion pipelines. Gaining traction among researchers simplifying model architecture.

  • Figr AI — Product-aware AI for UX design. Topped Product Hunt this week as design teams adopt it.

Some links are affiliate links. We earn a commission if you subscribe. We only feature tools we'd use ourselves.

Keep Reading