The 3 AM Phone Call
It's always 3 AM. Or at least it feels that way.
Your monitoring screams. Traffic patterns don't make sense. Customers are complaining. The executive team is waking up to red alerts. And you—whoever "you" is on rotation tonight—have about 90 seconds to decide whether this is a false positive, a minor issue, or the start of something that will define your career.
The runbook says: "Identify. Isolate. Contain. Eradicate. Recover. Post-mortem."
Beautiful. Clean. Linear.
Reality? Not even close.
What the Playbook Misses
I've led incident response for over 20 years, from DDoS mitigation at Link11 to infrastructure resilience at scale. I've seen everything from script-kiddie attacks to nation-state operations. And here's what I've learned:
The playbook is a starting point, not a finish line.
The frameworks—NIST, SANS, your vendor's shiny "best practices" PDF—they're all useful. But they're abstractions. Real incidents are chaotic, political, time-pressured nightmares where the org chart suddenly matters more than the network topology.
1. The First Decision Is Always a Guess
You don't have complete information. You never will.
Your logs are incomplete. Your monitoring has blind spots. The attack vector might be one you've never seen before. And every minute you wait for "perfect data" is a minute the attacker is moving laterally, exfiltrating data, or preparing the next stage.
The skill isn't knowing—it's deciding with 30% confidence and adjusting fast.
In 2019, we had a customer under a multi-vector attack: a volumetric DDoS flood combined with application-layer exploits targeting their auth layer. Initial telemetry suggested the volumetric attack was the main threat. Wrong. The L7 attack was the real payload; the flood was cover.
We pivoted in 12 minutes. But only because we expected to be wrong and designed our response to allow for rapid course correction.
Lesson: Build decision trees, not decision waterfalls. Every step should have an escape hatch.
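What an escape hatch looks like in code, roughly: a step acts on the current hypothesis, re-checks it against telemetry, and pivots if it was wrong. This is a minimal sketch; the names (`Step`, `still_valid`, `escape`) and the stand-in checks are mine for illustration, not any actual Link11 tooling.

```python
# A decision step that acts, re-checks its own hypothesis, and pivots if wrong.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    name: str
    act: Callable[[], None]            # the call you make at ~30% confidence
    still_valid: Callable[[], bool]    # re-check the hypothesis against telemetry
    escape: Optional["Step"] = None    # the pre-built hatch if you guessed wrong

def respond(step: Optional[Step]) -> None:
    while step is not None:
        step.act()
        if step.still_valid():
            return                     # hypothesis held; ride this branch
        step = step.escape             # wrong guess: pivot, don't start over

# Echoing the 2019 incident: open on the volumetric hypothesis,
# with the L7 branch already wired up as the escape hatch.
mitigate_l7 = Step("mitigate_l7",
                   act=lambda: print("rate-limit auth endpoints"),
                   still_valid=lambda: True)
scrub = Step("scrub_volumetric",
             act=lambda: print("divert traffic to scrubbing"),
             still_valid=lambda: False,  # stand-in for a real telemetry check
             escape=mitigate_l7)
respond(scrub)
```

The point isn't the code. It's that the escape path exists before you need it, so pivoting costs minutes, not a new meeting.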
2. Isolation Sounds Great Until It Breaks the Business
"Isolate the affected system." Sure. Easy.
Except that "affected system" is the payment gateway. Or the login service. Or the database that every microservice depends on.
Isolation isn't a binary—it's a negotiation between security and availability. And that negotiation happens live, under pressure, with incomplete information and executives screaming in Slack.
You need to know the business impact of every isolation decision before the incident starts.
This means:
- Dependency maps (what breaks if X goes down?)
- Revenue impact models (what does 10 minutes of downtime cost?)
- Fallback architectures (can we route around this?)
At Link11, we built "degradation profiles" for every critical service. If we had to pull the plug on something, we knew exactly what would break, how much revenue we'd lose per minute, and what fallback path existed.
When the incident hit, we didn't debate. We executed.
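A degradation profile doesn't have to be fancy. Here's a minimal sketch assuming a static service inventory; the service names, dollar figures, and fallbacks are made up for illustration, not real Link11 data:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DegradationProfile:
    service: str
    revenue_per_minute: float          # what one minute of downtime costs
    breaks: tuple[str, ...]            # direct downstream casualties
    fallback: Optional[str]            # route-around path, if any

PROFILES = {
    "payment-gateway": DegradationProfile(
        "payment-gateway", revenue_per_minute=12_000.0,
        breaks=("checkout", "subscriptions"), fallback="queue-and-retry"),
    "auth-service": DegradationProfile(
        "auth-service", revenue_per_minute=4_000.0,
        breaks=("login", "api-tokens"), fallback="cached-sessions-read-only"),
}

def isolation_cost(service: str, minutes: int) -> str:
    p = PROFILES[service]
    return (f"Isolating {p.service} for {minutes} min: "
            f"~${p.revenue_per_minute * minutes:,.0f} at risk; "
            f"breaks {', '.join(p.breaks)}; fallback: {p.fallback or 'none'}")

print(isolation_cost("payment-gateway", 10))
```

Build this table on a quiet Tuesday afternoon, not at 3 AM with the CFO on the line.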
3. Communication Is Half the Battle (And You'll Lose It)
Technical response is table stakes. The real challenge? Managing the humans.
You'll have:
- Executives demanding updates every 5 minutes
- Customer success forwarding angry emails
- Legal asking if you need to file breach notifications
- PR drafting statements before you even know what happened
- Engineers in 4 time zones asking what they should be doing
If you don't control the narrative, the narrative controls you.
Best practice: Assign a communications lead who is NOT on the technical response team. Their only job is to manage stakeholder comms, draft updates, and shield the responders from noise.
The major incidents I've seen derail? Half were technical failures. The other half were communication breakdowns.
4. The Attacker Is Adapting Faster Than Your Runbook
Here's the thing about sophisticated attackers: they know your playbook. They've read the same SANS courses. They know you're going to isolate, contain, eradicate.
And they're already three steps ahead.
Modern attacks are adaptive. Ransomware operators watch your response and pivot. APT groups plant multiple persistence mechanisms knowing you'll find the obvious one and declare victory.
If your response is scripted, you're fighting the last war.
The best responders I've worked with treat the runbook like jazz: there's a structure, but you improvise within it. You anticipate the adversary's next move. You set traps. You use deception.
In one case, we left a "cleaned" honeypot server online after eradication, instrumented to hell. The attacker came back 48 hours later. We captured their entire toolkit, pivoted to their C2 infrastructure, and fed threat intel to law enforcement.
The runbook said "eradicate." We said "bait."
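If you're curious what "instrumented to hell" means at its simplest: a decoy listener that logs every connection and never answers. A bare-bones sketch; the port, file name, and schema are hypothetical, and a real deception setup needs far more isolation than this:

```python
import json
import socket
import time

def run_decoy(host="0.0.0.0", port=2222, log_path="decoy.jsonl"):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    with open(log_path, "a") as log:
        while True:
            conn, addr = srv.accept()
            conn.settimeout(5)
            try:
                data = conn.recv(4096)      # capture the attacker's first bytes
            except socket.timeout:
                data = b""
            log.write(json.dumps({
                "ts": time.time(),
                "src_ip": addr[0],
                "src_port": addr[1],
                "first_bytes": data[:200].hex(),
            }) + "\n")
            log.flush()
            conn.close()                    # never answer; we only want telemetry

if __name__ == "__main__":
    run_decoy()
```

Every connection to a box that should be dead is free threat intel.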
What Actually Matters
After two decades, here's my mental checklist for incident response:
Before the Incident
- Know your crown jewels. What's the one system that, if compromised, ends the company?
- Map dependencies. What breaks if X goes down? Build the graph (there's a sketch of that query after this list).
- Define degradation modes. How do you operate at 80%? 50%? 20%?
- Assign roles in advance. Incident commander, comms lead, technical leads. No debates during the fire.
- Simulate chaos. Tabletop exercises are fine. Chaos engineering is better. Break things on purpose.
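The dependency graph is the piece I see skipped most often, so here's the "what breaks if X goes down?" query as a minimal sketch; the services and edges are invented:

```python
from collections import deque

# Reverse edges: DEPENDENTS[x] = services that break directly if x dies.
DEPENDENTS = {
    "postgres": ["auth-service", "billing"],
    "auth-service": ["web-app", "mobile-api"],
    "billing": ["web-app"],
}

def blast_radius(service: str) -> set[str]:
    """Everything that transitively breaks if `service` goes down."""
    seen, queue = set(), deque([service])
    while queue:
        for dep in DEPENDENTS.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

print(blast_radius("postgres"))  # {'auth-service', 'billing', 'web-app', 'mobile-api'}
```

Keep the edge data wherever your service catalog lives. The traversal is trivial; maintaining honest edges is the real work.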
During the Incident
- Decide fast, adjust faster. Waiting for certainty is losing.
- Communicate in layers. Execs get the 3-sentence summary. Engineers get the technical detail. Don't mix them.
- Assume persistence. If they got in once, they have a backup. Find it.
- Log everything. You'll need it for forensics, legal, and the post-mortem. Future-you will thank present-you. (A logging sketch follows this list.)
- Know when to escalate. Pride kills companies. If you're out of your depth, bring in help.
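On "log everything": an append-only JSON Lines file beats a trail of Slack messages. A minimal sketch; the schema, helper name, and file path are illustrative:

```python
import getpass
import json
import time

def log_event(action: str, detail: str, path: str = "incident-2024-001.jsonl"):
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "who": getpass.getuser(),
        "action": action,
        "detail": detail,
    }
    with open(path, "a") as f:          # append-only: never rewrite history
        f.write(json.dumps(entry) + "\n")

log_event("isolate", "pulled payment-gateway out of the LB pool")
log_event("observe", "beacon still firing from db-replica-3")
```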
After the Incident
- Blameless post-mortems. If people fear retribution, they'll hide problems. You want transparency, not theater.
- Fix the system, not the symptom. "We'll train people better" is not a fix. "We'll automate this so humans can't screw it up" is.
- Update the runbook. What did you learn? What would you do differently? Write it down while it's fresh.
- Measure everything. Time to detect, time to respond, time to recover, business impact. You can't improve what you don't measure.
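Those numbers fall out of four timestamps per incident. A minimal sketch with invented sample times:

```python
from datetime import datetime

def response_metrics(compromised, detected, contained, recovered):
    return {
        "time_to_detect": detected - compromised,
        "time_to_respond": contained - detected,
        "time_to_recover": recovered - detected,
    }

t = datetime.fromisoformat
print(response_metrics(
    t("2024-03-01T02:10:00"),   # attacker got in
    t("2024-03-01T03:02:00"),   # alert fired
    t("2024-03-01T03:40:00"),   # lateral movement cut off
    t("2024-03-01T06:15:00"),   # service fully restored
))
```

Track the trend across incidents, not the single data point. One bad night proves nothing; a worsening median proves a lot.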
The Real Lesson
Incident response isn't a checklist. It's a discipline.
The runbook is your foundation. But the real skill is knowing when to follow it and when to throw it out the window.
The best responders aren't the ones who memorize frameworks. They're the ones who can think clearly under pressure, make hard calls with incomplete data, and adapt faster than the adversary.
You can't learn that from a PDF. You learn it from scars.
So the next time you're on-call and the alerts start firing at 3 AM, remember:
The playbook is a guide. Your judgment is the weapon.
Use it wisely.
Jens-Philipp Jung is CEO of Link11 and founder of Lynk. He's spent 20+ years in cybersecurity, from defending against nation-state attacks to building DDoS mitigation infrastructure at scale. He writes about infrastructure, security, and the messy reality of building resilient systems.