Harness beats capability—routing layer now the binding constraint, not model weights

The Signal

The frontier model race is functionally over. Same model, different harness: Cline's GLM 5.2 experiments show an 11.2-point spread (57.3% → 68.5%) on coding tasks from orchestration alone. Meanwhile, Meituan-LongCat's 1.6T MoE, trained on 50k Chinese ASICs with zero GPUs, hits Gemini/Opus parity and holds the most-used slot on OpenRouter at 10T tokens. The US ban didn't pause the market; it exposed how the market already works: every team now routes across multiple models to avoid vendor lock-in. Open-source viability just flipped from "maybe someday" to "already shipping."

IMPORTANT

The bottleneck moved from "is the model smart enough?" to "can your harness route it efficiently?"
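This issue doesn't detail what changed between harnesses in Cline's runs. As a minimal sketch, assuming the harness difference is a generate-check-retry loop (one common orchestration pattern, swapped in here purely for illustration), the toy below holds the "model" fixed and varies only the loop around it. `toy_model`, `check`, both harness functions, and the resulting pass rates are hypothetical and are not the GLM 5.2 numbers.

```python
from typing import Callable, List, Optional

# A "model" is any function from (task, feedback) to a candidate answer.
# Its behavior never changes between harnesses: that stands in for fixed weights.
ModelFn = Callable[[int, Optional[str]], int]


def toy_model(task: int, feedback: Optional[str]) -> int:
    """Hypothetical model: should return task * 2, but slips by one on odd
    tasks unless the harness hands it feedback from a failed attempt."""
    answer = task * 2
    if task % 2 == 1 and feedback is None:
        answer += 1
    return answer


def check(task: int, answer: int) -> Optional[str]:
    """Hypothetical verifier (think unit tests): None on success, else feedback."""
    if answer == task * 2:
        return None
    return f"task {task}: got {answer}, which fails the check"


def single_shot(model: ModelFn, task: int) -> bool:
    """Harness A: one call, no verification loop."""
    return check(task, model(task, None)) is None


def retry_with_feedback(model: ModelFn, task: int, max_attempts: int = 3) -> bool:
    """Harness B: call, verify, and retry with the verifier's feedback."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        feedback = check(task, model(task, feedback))
        if feedback is None:
            return True
    return False


def pass_rate(harness: Callable[[ModelFn, int], bool], model: ModelFn, tasks: List[int]) -> float:
    """Fraction of tasks the harness gets to pass with the given model."""
    return sum(harness(model, t) for t in tasks) / len(tasks)


tasks = list(range(10))
baseline = pass_rate(single_shot, toy_model, tasks)
orchestrated = pass_rate(retry_with_feedback, toy_model, tasks)
print(f"single-shot: {baseline:.0%}, retry-with-feedback: {orchestrated:.0%}")
```

The size of the gap is an artifact of the toy; the only point is that `toy_model` is identical in both runs, so every bit of the difference comes from the loop around it.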

What's Moving

Routing architecture replaces model selection — @svpino's GLM 5.2 data (100 likes) shows harness optimization extracts 11 points without touching weights. Open-source models are "way more capable than we think"—the constraint is orchestration, not intelligence. (via @svpino)
Cost-to-capability asymmetry widens fast — @emostaque flags Meituan's ASIC-trained MoE reaching Opus 4.6 parity on inference with zero GPU dependency. Chinese training efficiency flips US leverage entirely, and the most-used model on OpenRouter is now Chinese. (via @emostaque)
Vision gap still locks in closed-source dependence — @bindureddy's reality check: GLM 5.2 can't "see images," and Chinese open-source models lack vision across the board. Until that gap closes, you need Opus or GPT 5.5 for real work; multi-LLM swarms work only for text-first workflows. (via @bindureddy)
Value-per-token-dollar metric kills benchmark chasing — @svpino's framework: measure agent ROI as (value produced / token cost). Below 1 = money sink; above 1 = business model. Two agents on the same model, burning the same tokens, can have wildly different unit economics. (via @svpino)
Fable unbans first, putting OpenAI at a structural disadvantage — @bindureddy flags the reversal: the ban on Fable 5 lifts today, while GPT 5.6 stays banned for two more weeks. Anthropic gets the narrative momentum while OpenAI's superior Sol model stays locked behind a government preview. Regulatory asymmetry compounds market timing. (via @bindureddy)
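@svpino's value-per-token-dollar metric reduces to a single ratio. A minimal sketch, assuming you can put a dollar figure on what an agent produces (the hard part, which the framework leaves open); `AgentRun`, `verdict`, the agent names, and every number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    name: str
    value_usd: float     # value the agent produced, priced in dollars (assumed measurable)
    tokens: int          # total tokens consumed, input + output
    usd_per_mtok: float  # blended price per million tokens (hypothetical)

    @property
    def token_cost_usd(self) -> float:
        return self.tokens / 1_000_000 * self.usd_per_mtok

    @property
    def value_per_token_dollar(self) -> float:
        cost = self.token_cost_usd
        if cost <= 0:
            raise ValueError(f"{self.name}: token cost must be positive")
        return self.value_usd / cost


def verdict(run: AgentRun) -> str:
    """Below 1 = money sink; above 1 = business model."""
    ratio = run.value_per_token_dollar
    if ratio > 1:
        return "business model"
    if ratio < 1:
        return "money sink"
    return "break-even"


# Same model, same token spend, different unit economics (all figures hypothetical).
triage = AgentRun("ticket-triage", value_usd=25.0, tokens=2_000_000, usd_per_mtok=5.0)
drafts = AgentRun("draft-writer", value_usd=4.0, tokens=2_000_000, usd_per_mtok=5.0)

for run in (triage, drafts):
    print(f"{run.name}: {run.value_per_token_dollar:.2f} -> {verdict(run)}")
```

Both runs spend exactly $10 of tokens; only the value side differs, which is why the ratio, not the benchmark score, separates them.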

Crosscurrents

Robot skill distillation edges toward production — @drjimfan's ASPIRE (256 likes, 35 RTs) shows multimodal agents building self-evolving skill libraries across simulation + real robots. Training shifts from gradient descent to skill refinement. Early signal for embodied AI becoming tractable at scale, but still research-grade. (via @drjimfan)
Sonnet 5.0 disappoints relative to hype — @bindureddy reports it's a token guzzler; promotional pricing helps, but Opus 4.8 still wins the cost/performance trade-off. A new frontier model falling short of its hype suggests capability gains are plateauing faster than anticipated. (via @bindureddy)

Tradecraft

BULL

Multi-LLM orchestration is now the production default, not an experiment. Builders are shipping it today without waiting for frontier models.

BEAR

Chinese open-source models still have no vision capability, so US closed-source models hold the real gate. Vision parity would close that moat completely.

WATCH

Fable 5 relaunches today; GPT 5.6's unbanning follows in about two weeks. Watch whether the narrative shifts back to frontier models and breaks the routing momentum, or just adds another layer to the harness.

Desk Notes

@svpino — Harness > model; value-per-token-dollar is the only metric that matters; skill marketplaces (Agentverse, with 2.8M agents) show composability is real.
@bindureddy — Multi-LLM routing is already standard; Opus 4.8 + GPT 5.5 xHigh for planning, DeepSeek/GLM as workers; the vision gap is the moat that still locks you into closed-source.
@emostaque — China's cost structure (energy-scaled intelligence on domestic ASICs) now asymmetric advantage; US labs can't compete on inference economics if training efficiency flips leverage.
@drjimfan — Embodied AI entering continual learning phase; skill libraries self-evolve; robot learning compounds indefinitely (not task-reset).
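The planner/worker split in @bindureddy's note, combined with the vision gap flagged under What's Moving, can be sketched as a small routing table. This is a minimal illustration, not anyone's production router: the model identifiers, the `ModelSpec`/`Task`/`route` names, and the first-match preference order are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Sequence, Tuple


class Role(Enum):
    PLANNER = "planner"
    WORKER = "worker"


@dataclass(frozen=True)
class ModelSpec:
    name: str
    roles: FrozenSet[Role]
    vision: bool


# Hypothetical pool mirroring the desk note; tuple order = routing preference.
# Vision flags follow this issue's claim that the open-weight workers can't see images.
POOL: Tuple[ModelSpec, ...] = (
    ModelSpec("opus-4.8", frozenset({Role.PLANNER}), vision=True),
    ModelSpec("gpt-5.5-xhigh", frozenset({Role.PLANNER}), vision=True),
    ModelSpec("deepseek", frozenset({Role.WORKER}), vision=False),
    ModelSpec("glm-5.2", frozenset({Role.WORKER}), vision=False),
)


@dataclass(frozen=True)
class Task:
    role: Role
    has_images: bool = False


def route(task: Task, pool: Sequence[ModelSpec] = POOL) -> ModelSpec:
    """Pick the first model assigned to the task's role. If the task carries
    images and no model in that role can see, escalate to any vision model."""
    candidates = [m for m in pool if task.role in m.roles]
    if task.has_images:
        candidates = [m for m in candidates if m.vision] or [m for m in pool if m.vision]
    if not candidates:
        raise LookupError(f"no model in pool can handle {task}")
    return candidates[0]


for t in (Task(Role.PLANNER), Task(Role.WORKER), Task(Role.WORKER, has_images=True)):
    print(t, "->", route(t).name)
```

The escalation path is the interesting part: an image-bearing worker task lands on a closed-source planner model, which is exactly the lock-in the vision gap creates.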