
Why I'm Betting on Compound AI Over Foundation Models

The LLM race is a distraction. The real alpha is in orchestration, routing, and RAG infrastructure. Here's why the next wave won't come from a bigger model.

Everyone wants to know which model will win. GPT-5 or Claude Opus 4? Gemini 3 or whatever Llama becomes? The industry treats this like a horse race. Benchmarks. Leaderboards. Capability comparisons. Who's winning MMLU this week?

It's the wrong question.

The real alpha isn't in the next foundation model. It's in the infrastructure layer above it. In orchestration. In routing. In retrieval. In the compound systems that treat models as commodities and build differentiation everywhere else.

I call this Compound AI. And it's where the value is moving.

The Foundation Model Trap

Here's what the hype cycle wants you to believe: Better models → better products. Get access to GPT-5 before your competitors, win. Train a bigger model, win. Fine-tune harder, win.

Except that's not how this plays out in production.

In production, the problems look like this:

  - Cost per query blows past what the product can sustain at scale.
  - Latency makes the experience unusable for anything interactive.
  - One model is great at reasoning but mediocre at extraction, and vice versa.
  - A single provider outage takes your whole product down with it.

None of these problems are solved by a better model. They're solved by using the right model for the right task. And that requires orchestration.

What Compound AI Actually Means

Compound AI is what happens when you stop treating models as monolithic solutions and start treating them as components in a larger system. The system decides:

  - Which model handles which request.
  - What context to retrieve before generating.
  - Whether a cached answer can be reused.
  - Whether the output needs a validation pass.

The architecture looks less like "send prompt to GPT-4" and more like:

  1. Classify the request → simple query or complex reasoning?
  2. Route accordingly → Gemini Flash for simple, Claude Opus for complex
  3. Pull relevant context from RAG if needed (embeddings + vector search)
  4. Generate response
  5. Validate quality with a smaller model (contradiction detection, factuality check)
  6. Cache aggressively for similar future queries
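The steps above can be sketched in a few dozen lines. This is a minimal illustration, not a production router: the model names are hypothetical placeholders and the keyword classifier stands in for what would really be a small classification model.

```python
# Minimal sketch of a compound-AI pipeline: classify, route, cache.
# Model names and the keyword heuristic are illustrative placeholders.

CHEAP_MODEL = "gemini-flash"      # hypothetical cheap tier
EXPENSIVE_MODEL = "claude-opus"   # hypothetical reasoning tier

_cache = {}

def classify(query: str) -> str:
    """Toy classifier: long or multi-step queries count as complex."""
    complex_markers = ("why", "compare", "analyze", "step by step")
    if len(query.split()) > 30 or any(m in query.lower() for m in complex_markers):
        return "complex"
    return "simple"

def route(query: str) -> str:
    """Steps 1-2: classify the request and pick a model tier."""
    return EXPENSIVE_MODEL if classify(query) == "complex" else CHEAP_MODEL

def answer(query: str, call_model) -> str:
    """call_model(model_name, query) is whatever client you actually use."""
    if query in _cache:                      # step 6: cache hit skips the model
        return _cache[query]
    model = route(query)
    response = call_model(model, query)      # step 4: generate
    _cache[query] = response
    return response
```

In a real system the cache would key on an embedding of the query rather than the exact string, so near-duplicate questions also hit it.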

This isn't theoretical. This is how every production AI system that ships at scale actually works.

The Unit Economics Tell the Story

Let's run the numbers on a real scenario: customer support automation.

Naive approach: Send every customer query to GPT-4.

Compound AI approach:

  - Classify each query up front with a cheap, fast model.
  - Route the simple majority to an inexpensive model.
  - Escalate only the genuinely hard queries to the top-tier model.
  - Cache answers to recurring questions so repeats cost almost nothing.

Same quality. 87% cost reduction. This is the difference between "we can't afford to scale this" and "we can run this profitably at 10x volume."
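The economics are easy to sanity-check yourself. The traffic split and per-query prices below are made-up illustrative numbers, not vendor quotes; the exact reduction you get depends on your own mix.

```python
# Back-of-the-envelope blended cost per query under tiered routing.
# All prices and the 80/15/5 traffic split are illustrative assumptions.

NAIVE_COST = 0.030  # $/query if everything goes to the top-tier model

tiers = {
    # name: (share of traffic, hypothetical $/query)
    "cheap":   (0.80, 0.001),
    "mid":     (0.15, 0.005),
    "premium": (0.05, 0.030),
}

blended = sum(share * cost for share, cost in tiers.values())
reduction = 1 - blended / NAIVE_COST

print(f"blended: ${blended:.5f}/query, reduction: {reduction:.0%}")
```

With these assumed numbers the reduction lands around ninety percent; the point is the order of magnitude, not the exact figure.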

Now multiply this across every AI product being built. The companies that figure out orchestration and routing will have structural cost advantages measured in millions per quarter.

Retrieval Is the Dark Horse

Here's the part most people miss: RAG (Retrieval-Augmented Generation) is more valuable than most model improvements.

Why? Because foundation models are generalists. They know a lot about everything and not enough about anything specific. Your business lives in the "specific" zone:

  - Your internal documents and meeting transcripts.
  - Your CRM data and sales pipeline.
  - Your product's edge cases and your customers' history.

No amount of pretraining makes GPT-5 know your Q3 sales pipeline better than a vector database seeded with your CRM exports and meeting transcripts.

This is why every serious AI deployment I see follows the same pattern:

  1. Embed your domain knowledge (documents, transcripts, databases)
  2. Build semantic search (Pinecone, Weaviate, pgvector, whatever)
  3. Inject relevant context dynamically into model prompts
  4. Use a mid-tier model with perfect context instead of a top-tier model guessing

Result: Better answers. Lower costs. No dependency on frontier model access.
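The four steps above can be sketched with a toy in-memory index. Bag-of-words vectors and cosine similarity here stand in for a real embedding model and a vector store like Pinecone or pgvector; the document strings are hypothetical examples.

```python
# Toy RAG sketch: bag-of-words "embeddings" and cosine similarity stand in
# for a real embedding model and vector database.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in for an embedding model: token counts as a sparse vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

documents = [
    "Q3 pipeline review: enterprise deals slipped to Q4",   # hypothetical CRM note
    "Onboarding guide for the new support dashboard",
]
index = [(doc, embed(doc)) for doc in documents]            # step 1: embed

def retrieve(query: str, k: int = 1) -> list[str]:
    """Step 2: rank documents by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    """Step 3: inject retrieved context into the model prompt."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Step 4 is then just sending `build_prompt(query)` to a mid-tier model instead of asking a top-tier model to guess.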

RAG infrastructure is becoming as critical as the models themselves. Maybe more critical.

Chain-of-Thought as Infrastructure

Another pattern I see everywhere in production: multi-step reasoning chains.

Instead of asking GPT-4 to "solve this complex problem," you break it into orchestrated steps:

  1. Planning model (cheap, fast): "What's the strategy here?"
  2. Execution model (specialized): "Do the work."
  3. Validation model (different architecture): "Is this correct?"

This costs less, runs faster, and produces better results than a single monolithic call.

Example from cybersecurity threat analysis:

  1. A cheap model triages the alert and plans the investigation.
  2. A specialized model does the deep analysis of the suspicious activity.
  3. A model from a different family validates the conclusions before anything reaches an analyst.

Each model does what it's best at. Total cost: 60% less than using GPT-4 for the whole thing. Quality: higher, because each step is optimized.
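The plan/execute/validate pattern can be sketched with stub functions. Each stub here is a hypothetical placeholder for a call to a different model tier; only the chain structure is the point.

```python
# Sketch of a three-step reasoning chain. Each "model" is a stub function;
# in practice each would be a call to a different model tier or family.

def plan(task: str) -> list[str]:
    """Cheap planning model (stubbed): break the task into steps."""
    return [f"step {i + 1} of: {task}" for i in range(2)]

def execute(step: str) -> str:
    """Specialized execution model (stubbed): do the work for one step."""
    return f"result for {step}"

def validate(results: list[str]) -> bool:
    """Validator from a different family (stubbed): sanity-check the output."""
    return all(r.startswith("result") for r in results)

def run_chain(task: str) -> list[str]:
    steps = plan(task)
    results = [execute(s) for s in steps]
    if not validate(results):
        raise RuntimeError("validation failed; escalate or retry")
    return results
```

Because the validator is a separate, cheaper call, a failed check can trigger a retry or an escalation to a stronger model instead of shipping a bad answer.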

This is Compound AI. This is the stack that matters.

Why OpenAI Can't Win This Alone

OpenAI has the best models. For now. Maybe they'll keep that lead. Maybe they won't.

But even if they do — they can't own orchestration.

Because orchestration lives at the application layer. It's domain-specific. It's use-case-specific. It requires knowing:

  - What your traffic actually looks like, and which requests are hard.
  - What your cost and latency budgets are.
  - Where your domain knowledge lives and how to retrieve it.
  - What "good enough" means for your users.

No model provider knows this. Only you do. Which means the value layer is moving up the stack — into the hands of companies and teams who build smart orchestration on top of commodity models.

This is the same pattern we saw with cloud infrastructure. AWS provides the primitives. But the real value is in how you architect on top of them. Same thing here.

The Playbook

If you're building with AI today, here's the strategy:

  1. Treat models as commodities. Don't build lock-in to GPT-4. Build an abstraction layer that can route between models.
  2. Invest in retrieval infrastructure early. Embeddings, vector search, semantic caching. This is your moat.
  3. Profile your traffic. 80% of requests are simple. Route them to cheap models. Reserve expensive models for where they matter.
  4. Build validation layers. Don't trust any single model. Use smaller models to check the work of bigger ones.
  5. Measure cost per query obsessively. Unit economics determine whether you can scale profitably.
  6. Design for latency. Parallel calls, streaming responses, async processing. User experience dies at 8-second response times.

This is infrastructure work. It's not sexy. It won't make headlines. But it's the difference between an AI demo and an AI business.

What This Means for the Market

The foundation model race will continue. Benchmarks will improve. Context windows will grow. Costs will (probably) come down.

But the **differentiation is moving elsewhere**. Into:

  - Orchestration and routing layers.
  - Retrieval and caching infrastructure.
  - Validation and observability tooling.

The companies building this infrastructure layer — the "Stripe for LLM orchestration," the "Datadog for AI observability," the "Cloudflare for model routing" — those are the next billion-dollar outcomes.

Not the next foundation model. The picks and shovels.

Why I'm Betting Here

I've spent twenty years in infrastructure. Building systems that need to work at scale, under pressure, with real money on the line. I know what separates demos from production.

And I can tell you: the hard part is never the model. The hard part is the orchestration. The retrieval. The caching. The error handling. The cost management. The latency optimization. The monitoring. The fallback strategies.

This is where the expertise lives. This is where the moats are. This is where value compounds.

Foundation models will keep improving. Great. That makes them better commodities. Which makes orchestration more valuable, not less.

So yeah. I'm betting on Compound AI. Not because foundation models don't matter. Because they matter so much that the real game is in how you use them.

The picks-and-shovels playbook has worked for every gold rush in history. This one won't be different.

