Speed Is a Feature. Latency Is a Tax.
In 2006, Amazon discovered something that changed how they thought about infrastructure forever: every 100 milliseconds of added latency cost them 1% in sales.
Think about that math. A half-second delay? 5% revenue loss. Scale that across billions in GMV and you're looking at hundreds of millions left on the table—not because of bad product-market fit, not because of poor marketing, but because the bytes took too long to move.
Yet most founders I talk to have no idea what their P99 latency looks like. They can tell me their MRR down to the dollar. They can recite their CAC and LTV ratios. But ask them how fast their API responds under load and you get a shrug.
This is leaving money on the table. Let me show you why.
The Compounding Effect Nobody Measures
Latency doesn't just cost you one conversion. It cascades:
- User perception: 53% of mobile users abandon sites that take longer than 3 seconds to load (Google, 2017—still true in 2026)
- SEO penalties: Core Web Vitals are a ranking factor. Slow sites get buried. Less organic traffic = higher CAC
- Conversion dropoff: Every 1-second delay in page response reduces conversions by 7% (Akamai)
- Retention damage: Users who experience slow performance are 3x less likely to return
Now layer AI into this equation. If your LLM-powered feature takes 8 seconds to respond, users will tab away. The magic of AI evaporates when it feels slower than just Googling the answer.
Speed isn't a nice-to-have. It's a moat.
Where Latency Hides
The problem with latency is that it's fractal. It compounds at every layer:
1. DNS Resolution
Before your user even hits your server, their browser has to look up your domain. A slow DNS provider adds 50-200ms before anything happens. Most people never think about this.
2. TLS Handshake
HTTPS is non-negotiable in 2026. But the handshake (especially without session resumption or 0-RTT) can add 100-300ms. Use modern TLS 1.3, enable OCSP stapling, and configure your CDN correctly.
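For nginx users, the handshake savings above map to a handful of directives. A hedged sketch, not a complete server block — certificate paths and cache sizes are placeholders you'd tune for your own deployment:

```nginx
ssl_protocols TLSv1.3;              # modern handshake, 1 round trip
ssl_session_cache shared:SSL:10m;   # session resumption across workers
ssl_session_timeout 1h;
ssl_session_tickets on;
ssl_early_data on;                  # 0-RTT for returning clients (weigh replay risk)
ssl_stapling on;                    # OCSP stapling: no separate OCSP fetch
ssl_stapling_verify on;
```

Note that 0-RTT trades a replay-attack surface for latency, so only enable `ssl_early_data` for idempotent requests.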
3. Geographic Distance
Physics is undefeated. Speed of light in fiber is ~200,000 km/s. A round trip from San Francisco to Frankfurt? That's 150ms minimum, before any processing. If your users are global and your infra is single-region, you're paying this tax on every request.
For reference, the great-circle distance from San Francisco to Frankfurt is about 9,100 km, so the theoretical round-trip floor is roughly 90ms; real fiber routes are longer and typically land closer to 150ms.
4. Database Queries
The N+1 query problem is alive and well. One slow, unoptimized query can turn a 50ms response into a 2-second timeout. Index your tables. Use read replicas. Cache aggressively.
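The N+1 pattern above is easiest to see in code. A minimal sketch using an in-memory SQLite database as a stand-in for your real store — the `authors`/`posts` schema is hypothetical:

```python
import sqlite3

# In-memory demo schema (hypothetical tables, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(1, 1, "On Engines"), (2, 2, "On Compilers"), (3, 1, "On Notes")])

# N+1 pattern: one query for the posts, then one query PER post for its author.
posts = conn.execute("SELECT id, author_id, title FROM posts").fetchall()
n_plus_1 = [
    (title, conn.execute("SELECT name FROM authors WHERE id = ?", (aid,)).fetchone()[0])
    for _, aid, title in posts
]  # 1 + N round trips to the database

# Batched fix: collect the ids and fetch every author in a single IN query.
author_ids = {aid for _, aid, _ in posts}
placeholders = ",".join("?" * len(author_ids))
names = dict(conn.execute(
    f"SELECT id, name FROM authors WHERE id IN ({placeholders})", tuple(author_ids)
).fetchall())
batched = [(title, names[aid]) for _, aid, title in posts]  # 2 round trips total

assert n_plus_1 == batched  # same result, N-1 fewer queries
```

With a 5ms round trip per query, a 200-row page goes from ~1 second of query time to ~10ms.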
5. Third-Party APIs
Every external dependency is a latency lottery ticket. Stripe, Twilio, Auth0—these are rock-solid services, but if you're chaining 5 external calls in series, you're adding 500ms+ of uncontrollable lag. Parallelize. Use webhooks instead of polling. Fail fast.
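The parallelization point is worth making concrete. A sketch with `asyncio` — the three "API calls" are stand-ins with simulated delays, not real integrations:

```python
import asyncio
import time

# Stand-ins for independent external calls (names and delays are hypothetical).
async def charge_card():  await asyncio.sleep(0.10); return "charged"
async def send_receipt(): await asyncio.sleep(0.10); return "sent"
async def log_event():    await asyncio.sleep(0.10); return "logged"

async def serial():
    # Chained in series: latencies add up (~300ms here).
    return [await charge_card(), await send_receipt(), await log_event()]

async def parallel():
    # Independent calls run concurrently: latency is the max (~100ms), not the sum.
    return await asyncio.gather(charge_card(), send_receipt(), log_event())

start = time.perf_counter()
asyncio.run(serial())
serial_time = time.perf_counter() - start

start = time.perf_counter()
results = asyncio.run(parallel())
parallel_time = time.perf_counter() - start

print(results, f"serial={serial_time:.2f}s parallel={parallel_time:.2f}s")
```

The catch: only parallelize calls that don't depend on each other's results. If call B needs call A's output, you're back to serial — which is exactly why deep dependency chains are a design smell.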
6. LLM Inference
GPT-4 streaming feels fast because you see tokens as they generate. But time-to-first-token can still be 1-3 seconds depending on prompt length and model load. If you're using an LLM for every user action, you've just made your product feel sluggish. Use smaller models for simple tasks. Cache common outputs. Route intelligently.
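"Route intelligently" can be as simple as a heuristic gate in front of your LLM calls. A minimal sketch — the model names, task set, and length threshold are illustrative assumptions, not a prescription:

```python
# Heuristic model router: cheap/fast model for routine work, large model only
# when the task actually needs it. All names and thresholds are hypothetical.
SMALL_MODEL = "llama-3.1-8b"
LARGE_MODEL = "gpt-4o"

SIMPLE_TASKS = {"classify", "extract", "autocomplete", "summarize-short"}

def route(task: str, prompt: str) -> str:
    """Pick a model: small for routine tasks with short prompts, large otherwise."""
    if task in SIMPLE_TASKS and len(prompt) < 2000:
        return SMALL_MODEL
    return LARGE_MODEL

print(route("classify", "Is this email spam? ..."))       # routes to the small model
print(route("reasoning", "Plan a multi-step migration"))  # routes to the large model
```

Real routers often add a second stage — fall back to the large model when the small model's output fails validation — but even this crude gate can cut median time-to-first-token substantially.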
The 100ms Rule
Here's my north star: every user-facing interaction should complete in under 100ms.
Not "sometimes." Not "on average." P95 should be sub-100ms.
Why 100ms? Because that's the threshold where interactions feel instant. Below that, the system feels like an extension of thought. Above that, users start to notice the wait.
Can you always hit this? No. LLM inference, video encoding, complex analytics—these take time. But for 80% of actions (page loads, form submissions, navigation, search), 100ms is achievable with the right architecture.
How to Optimize (Without Burning Out Your Team)
Measure first. You can't optimize what you don't measure. Instrument your stack with proper observability:
- Use RUM (Real User Monitoring) tools like Datadog, Sentry, or Vercel Analytics
- Track P50, P95, P99 latencies—averages lie
- Break down latency by endpoint, region, and user segment
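Why averages lie: a handful of slow requests vanish into the mean while ruining real users' sessions. A small sketch with made-up latency data (nearest-rank percentiles, simplified from what your monitoring tool computes):

```python
import statistics

# Simulated request latencies in ms: mostly fast, with a slow tail (illustrative data).
latencies = [45] * 90 + [120] * 8 + [2400] * 2  # 100 requests

def percentile(data, p):
    """Nearest-rank percentile: the value below which ~p% of observations fall."""
    ranked = sorted(data)
    k = max(0, int(round(p / 100 * len(ranked))) - 1)
    return ranked[k]

mean = statistics.mean(latencies)
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)

# The average looks fine while 2% of users wait 2.4 seconds.
print(f"mean={mean:.0f}ms p50={p50}ms p95={p95}ms p99={p99}ms")
```

Here the mean is ~98ms — "sub-100ms, ship it" — while P99 is 2,400ms. Same dataset, opposite conclusions.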
Low-hanging fruit:
- CDN everything static. HTML, CSS, JS, images—serve from the edge. Cloudflare, Fastly, or Vercel make this trivial.
- Enable HTTP/2 and Brotli compression. Cuts payload size 20-40%.
- Lazy-load non-critical resources. Don't block render waiting for analytics scripts.
- Use edge functions for dynamic content. Cloudflare Workers, Vercel Edge, or Fastly Compute can run logic closer to users.
Architecture patterns that matter:
- Read replicas: Route read-heavy queries to replicas in user-local regions
- Write-back caching: Accept writes locally, sync async to the source of truth
- Circuit breakers: Fail fast when dependencies are slow—don't wait for timeouts
- Request coalescing: Batch identical concurrent requests to avoid redundant work
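Of these patterns, the circuit breaker is the one most teams skip. A minimal sketch of the idea — real implementations (and libraries that provide this) add per-endpoint state, metrics, and jitter:

```python
import time

class CircuitBreaker:
    """Fail fast after repeated failures instead of waiting on timeouts.

    Closed: calls pass through. Open: calls are rejected immediately.
    Half-open: one trial call is allowed after the cooldown.
    """

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit opened, or None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # a success closes the circuit
        return result

# Hypothetical flaky dependency: two timeouts trip the breaker.
breaker = CircuitBreaker(max_failures=2, reset_after=60.0)

def flaky():
    raise TimeoutError("upstream slow")

for _ in range(2):
    try:
        breaker.call(flaky)
    except TimeoutError:
        pass
# The circuit is now open: the next call raises immediately, no timeout wait.
```

The latency win is the rejected call: instead of every request eating a 5-second timeout while a dependency is down, requests fail in microseconds and you can serve a cached or degraded response.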
When to use edge compute:
Not everything belongs at the edge. Edge is great for:
- Authentication checks
- A/B test routing
- Geolocation-based redirects
- Simple API transformations
Edge is bad for:
- Complex database queries (cold start + network latency to DB kills the win)
- Heavy computation (limited CPU/memory at edge nodes)
- Stateful workflows (edge is ephemeral)
The AI Latency Explosion
AI is making the latency problem worse—and most teams aren't ready.
Consider a typical AI-powered SaaS flow:
- User submits query (50ms API round-trip)
- Retrieve context from vector DB (150ms for embedding + similarity search)
- Send prompt to LLM (2000ms time-to-first-token, then 5000ms more to finish streaming)
- Parse response and update UI (50ms)
Total: 7+ seconds. That's an eternity in UX time.
How to fix it:
- Stream everything. Show tokens as they generate. Perceived latency drops dramatically.
- Prefetch aggressively. If you know what the user is likely to ask next, pre-generate responses.
- Use smaller models for simple tasks. GPT-4 for summarization is overkill. Use Llama 3.1 8B or Gemini Flash.
- Cache at the semantic level. Embed the user query, check vector similarity against previous queries, serve cached responses if close enough.
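Semantic caching is worth spelling out, since it's the least familiar of the four. A toy sketch: the bag-of-words "embedding" is a stand-in for a real embedding model, and the 0.8 threshold is an assumption you'd tune against your own traffic:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' — a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class SemanticCache:
    """Serve a cached LLM response when a new query is close enough to an old one."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (query embedding, cached response)

    def get(self, query: str):
        qv = embed(query)
        best = max(self.entries, key=lambda e: cosine(qv, e[0]), default=None)
        if best and cosine(qv, best[0]) >= self.threshold:
            return best[1]  # cache hit: skip the LLM call entirely
        return None

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))

cache = SemanticCache(threshold=0.8)
cache.put("how do I reset my password", "Go to Settings > Security > Reset password.")
print(cache.get("how do i reset my password?"))  # near-duplicate: cache hit
print(cache.get("what is your refund policy"))   # unrelated: miss, call the LLM
```

In production you'd back this with your vector DB and a real embedding model, and add TTLs so stale answers expire — but the shape is the same: embed, nearest-neighbor lookup, threshold check.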
The Bottom Line
Latency is invisible until it isn't. By the time users complain, you've already lost them.
The best companies treat latency as a product metric—not an ops metric. They have latency budgets. They monitor it in real time. They optimize for P99, not averages.
Because at scale, every millisecond compounds into revenue. Amazon's 100ms = 1% rule isn't just an Amazon problem. It's everyone's problem.
The question is: are you measuring it?
If not, you're paying the invisible tax—and you don't even know how much.