World Models Replace Foundation Models as AI's Core Primitive

The Signal

LeCun is publicly reframing the entire AI stack. World models (systems that predict how an environment evolves and how actions change it) are, in his framing, the foundation layer, not giant text-prediction models. This isn't incremental. It reverses the claim about what gets trained first and what everything else builds on. The field is listening.

IMPORTANT

Foundation models become downstream applications of world models, not the other way around.

What's Moving

World models as substrate — LeCun arguing these aren't video-generation systems (JEPA reference); they're learned representations of how systems evolve under action. Encoders from diverse world models become reusable for downstream tasks. The architecture flips. (via @ylecun)
Gemini 3.5 Flash dominance in chat — @bindureddy actively migrating workloads from OpenAI/Anthropic to Flash on instruction-following and grounding. Paired with prediction that 3.5 Pro will top GPT 5.6 and Opus. Adoption signal, not just benchmarks. Pricing still wrong though.
Agentic orchestration is the new skill frontier — @svpino: "difference between junior and senior developers." LobeHub's Chief Agent Operator dispatching parallel Claude Code sessions across GitHub issues. Spoki managing entire customer journeys through agents. The interface is disappearing; prompt becomes the API.
Sub-agents everywhere scaling — @svpino explicitly building everything as agents ("Everything that can be an agent, should be an agent"). Anthropic pushing TUI support for agents. Agent swarms with task-specific models (Opus for frontend, GPT 5.5 for backend, Flash for cheap loops) becoming standard practice.
Local inference catching up — gemma-4:26b running native on Mac Studio. Hermes going mobile natively. This isn't theoretical—practitioners are shipping with sub-Opus models locally.
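
For readers who want the parallel-dispatch pattern in concrete terms, here is a minimal sketch, assuming each agent session can be modeled as an async call. The names `run_agent`, `dispatch`, and `MODEL_FOR_TASK` are hypothetical, the model labels only echo the ones mentioned above and are not real API identifiers, and no actual model is called.

```python
import asyncio

# Hypothetical task-type -> model mapping; labels echo the digest, not real API ids.
MODEL_FOR_TASK = {
    "frontend": "opus",
    "backend": "gpt-5.5",
    "cheap_loop": "flash",
}


async def run_agent(issue_id: int, task_type: str) -> dict:
    """Stand-in for one agent session; a real one would call a model API here."""
    model = MODEL_FOR_TASK.get(task_type, "flash")  # unknown task types fall back to the cheap model
    await asyncio.sleep(0)  # placeholder for network and model latency
    return {"issue": issue_id, "model": model, "status": "done"}


async def dispatch(issues: list[tuple[int, str]], max_parallel: int = 4) -> list[dict]:
    """Run one agent per issue, with at most max_parallel sessions in flight."""
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(issue_id: int, task_type: str) -> dict:
        async with sem:
            return await run_agent(issue_id, task_type)

    # gather returns results in the same order as the input issues
    return await asyncio.gather(*(guarded(i, t) for i, t in issues))


if __name__ == "__main__":
    batch = [(101, "frontend"), (102, "backend"), (103, "cheap_loop")]
    for result in asyncio.run(dispatch(batch)):
        print(result)
```

The semaphore is the only real design choice here: it caps concurrent sessions so a large issue backlog does not fan out into unbounded parallel model calls.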

Crosscurrents

Pricing as execution risk — OpenAI's GPT 5.5 priced out of adoption despite being "exceptionally good at long-running complex tasks." Google priced Gemini 3.5 Pro wrong. DeepSeek Flash dirt cheap, winning on unit economics for batch/agentic loops. The best model doesn't win; the right price-to-capability ratio does.
Agent coding performance variance — Flash gets "bad reputation" for agentic coding loops but excels at chat and instruction-following. Use-case specialization, not general superiority. Practitioners are learning to route work by task type, not by model brand.
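
Routing by task type and price-to-capability ratio can be sketched as follows. This is an illustration under stated assumptions, not anyone's production router: every price and capability score below is an invented placeholder rather than a measurement or a published rate, and `ModelOption`, `route`, and `OPTIONS` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ModelOption:
    name: str
    price_per_mtok: float          # hypothetical USD per million tokens
    capability: dict[str, float]   # hypothetical 0-1 score per task type


# Placeholder numbers chosen only to make the routing logic visible.
OPTIONS = [
    ModelOption("flash", 0.5, {"chat": 0.9, "agentic_coding": 0.5}),
    ModelOption("opus", 15.0, {"chat": 0.9, "agentic_coding": 0.9}),
    ModelOption("deepseek-flash", 0.2, {"chat": 0.6, "agentic_coding": 0.6, "batch": 0.8}),
]


def route(task_type: str, options: list[ModelOption], min_capability: float = 0.7) -> ModelOption:
    """Pick the cheapest model that clears the capability bar for this task type.

    If no model clears the bar, fall back to the most capable one for the task.
    """
    eligible = [m for m in options if m.capability.get(task_type, 0.0) >= min_capability]
    if eligible:
        return min(eligible, key=lambda m: m.price_per_mtok)
    return max(options, key=lambda m: m.capability.get(task_type, 0.0))


if __name__ == "__main__":
    for task in ("chat", "agentic_coding", "batch"):
        print(task, "->", route(task, OPTIONS).name)
```

The point of the sketch is the selection rule: capability acts as a threshold per task type, and price decides among the models that clear it.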

Tradecraft

BULL

World model framing validates years of Anthropic's planning work. If encoders from action-conditioned models become the reusable layer, that's a moat for anyone shipping multi-modal world models early.

BEAR

If world models replace foundation models as the core primitive, current training runs (predicting tokens at scale) may be architecturally obsolete. Replatforming costs are immense.

WATCH

Watch for the first major world model framework (action-conditioned representation learning, not video generation) to ship, whether as open source or from a major lab. That would be the signal that LeCun's thesis is moving from theory to practice.
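
To make "action-conditioned representation learning" concrete, here is a toy sketch of the interface, assuming linear maps and random, untrained weights. It is not JEPA and not any lab's method; `LatentWorldModel` and its methods are hypothetical names. The one idea it illustrates is that the prediction error is measured between latent states rather than between pixels, and that the encoder is a separate piece that could be reused elsewhere.

```python
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def rand_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class LatentWorldModel:
    """Toy linear world model: encode an observation, then predict the next
    latent state from the current latent state and an action."""

    def __init__(self, obs_dim, action_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.encoder = rand_matrix(latent_dim, obs_dim, rng)
        self.predictor = rand_matrix(latent_dim, latent_dim + action_dim, rng)

    def encode(self, obs):
        # Reusable piece: downstream tasks could consume this latent vector.
        return matvec(self.encoder, obs)

    def predict_next(self, latent, action):
        # Action-conditioned step: input is the latent concatenated with the action.
        return matvec(self.predictor, latent + action)

    def latent_loss(self, obs, action, next_obs):
        # Mean squared error in latent space, not pixel space.
        predicted = self.predict_next(self.encode(obs), action)
        target = self.encode(next_obs)
        return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


if __name__ == "__main__":
    model = LatentWorldModel(obs_dim=4, action_dim=2, latent_dim=3)
    loss = model.latent_loss([1.0, 0.0, 0.5, 0.2], [0.0, 1.0], [0.9, 0.1, 0.5, 0.3])
    print("latent prediction loss:", loss)
```

A trained version would fit both matrices (or deep networks in their place) to minimize this loss over observed transitions; the sketch stops at the interface.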

Desk Notes

@ylecun — Pushing against pixel-level video generation; emphasizing action-conditioned state prediction and JEPA. Foundational reframe, not optimization.
@bindureddy — Ruthless on model routing by task; Flash for chat/cheap, Opus for complex coding, DeepSeek for scale. Adoption as lens, not benchmarks.
@svpino — Treating agents as the unit of work composition. LobeHub solving the parallel dispatch problem is the practical unlock.
@emostaque — Quiet on architecture; noted autoregression → diffusion inference weight conversion and Stability's regretted decision to share inference revenue with open-source devs.