The King of His Own Realm
There’s a particular kind of phone call you get as a consultant. It always starts the same way. Something is broken, nobody knows why, and the person who would have known is no longer here.
The details change. The shape of the story never does.
But before I tell you this story, let me ask you something. Think of the most critical system at your company. The one that, if it went down right now, would ruin someone’s week. Now: how many people truly understand how it works? Not “could Google their way through it in a crisis.” Actually understand it.
Hold that number. We’ll come back to it.
The call
A few months ago I got one of these calls from a web agency. Decent size, twenty-something people, a portfolio of client sites they’d been running for years. The sites were slow, some were crashing under traffic, and costs were climbing in ways nobody could explain. Could I take a look?
The first thing I do in these situations is ignore the symptoms. Slow sites and rising costs are what brought them to the phone, but they’re almost never the actual problem. The actual problem is usually hiding behind something nobody wants to talk about.
So I asked: who built this infrastructure?
A pause. Then: “Marco. He left about eight months ago.”
There it was. The shape of the story, already visible.
Marco (not his real name, but let’s keep it simple) had been their senior engineer. The one who set everything up. And by “everything,” I mean everything. He’d built a clean architecture: containers for each client, Traefik as a reverse proxy, Let’s Encrypt for certificates, all configuration committed to Git. For a web agency this size, it was genuinely good work. Structured, versioned, thoughtful.
Here’s a question worth sitting with: if the architecture was good, if the work was solid, if the decisions were sound… then why did everything fall apart when one person left?
The deploy process went like this: you’d update the docker-compose file, commit it, then SSH into the server and run a git pull. There was no CI/CD pipeline, no automated deployment, no runbook. There didn’t need to be. Marco knew what to do. Marco always knew what to do.
When I started looking at what happened after he left, the picture got clearer. Traffic had spiked on one of their bigger client sites. The remaining team, staring at a system they could operate but not understand, did the only logical thing available to them. They doubled the server resources. More CPU. More RAM. Problem solved, temporarily.
Except the problem wasn’t CPU. It wasn’t RAM. The bottleneck was disk IOPS, a concept the remaining team had never had to think about, on a volume type whose burst credits they didn’t know existed. They had been running on borrowed performance for months without realizing it. When the credits ran out, everything slowed to a crawl, and they threw money at the wrong wall.
But here’s what bothered me more than the technical misdiagnosis. Every single client, every container, every database, every piece of state… all on one server. Not because Marco didn’t know better. He almost certainly did. But the decision to consolidate everything onto a single machine was never challenged, never documented, never revisited. Because challenging it would have meant questioning Marco.
And you don’t question the person holding everything together. Do you?
The pattern
I wish this were an unusual story. It is not.
Around the same time, I was looking at a completely different situation. A bank. Large, institutional, the kind of place where change requests go through committees with acronyms. The bank was an Azure shop. Everything in Azure, everyone trained on Azure, all the contracts and compliance and procurement pipelines built around Azure.
Except one engineer had started building in AWS.
He had reasons. Maybe AWS had better tooling for his specific use case. Maybe he liked it more. Maybe nobody told him not to. Whatever the reason, over the course of a couple of years, he’d built a parallel universe. Working infrastructure, real services, actual production workloads. All in an AWS account that existed somewhat outside the normal governance.
He was competent. The things he built worked. The architecture was reasonable. By most technical standards, he did a good job.
Then he left.
His replacement had been hired specifically for his Azure expertise. He was now responsible for an AWS account he’d never seen, built by someone he’d never met, with no documentation beyond what could be reverse-engineered from the console. When I got involved and started looking at the AWS bill, I found S3 buckets where seventy percent of the storage was ghost data: previous versions of files the application had already deleted, silently accumulating because versioning was enabled without any lifecycle policy to expire them. Nobody had tagged anything. Nobody could explain what half the services were for.
The infrastructure worked. It just couldn’t be understood by anyone who didn’t build it.
Now, two completely different companies. Different industries, different scales, different technologies. But the same outcome. Why?
What everybody says
At this point in the conversation, someone always says it: “This is a documentation problem.”
Is it, though? Both of these engineers used Git. Both of them committed configuration. The code was there. The infrastructure was inspectable. If documentation is the answer, then why do companies with wikis full of documentation still end up in the same situation?
Someone else will say: “This is a bus factor problem.” But we’ve had a name for this risk since before most of us started our careers. We put it in our risk registers. We nod solemnly. Has naming it ever actually fixed it?
Then the more thoughtful person in the room will say: “This is a culture problem.” Which sounds right. It sounds deep. But what does it actually mean?
When you say “it’s a culture problem,” who exactly are you pointing at? If culture is the problem, and culture is everyone’s responsibility, then is it actually anyone’s fault? Or is “culture problem” just a polite way of saying “nobody’s problem”?
Let me try a different question.
The question nobody asks
Think about your company’s last round of promotions. Think about who got recognized, who got rewarded, who got called out as a high performer.
Was any of them promoted for writing documentation so thorough that they made themselves unnecessary?
Did anyone receive a raise explicitly because they transferred all their critical knowledge to the team?
Was anyone praised in a performance review for making their own role redundant?
Or were the rewards given to the firefighter? The one who gets called at 2 AM. The one who “owns” the system. The one everybody depends on. The one who, if they gave two weeks’ notice tomorrow, would cause a small organizational crisis.
If your company says it values knowledge sharing, but promotes the person who is irreplaceable… which one is the real culture? The value statement, or the promotion?
Following the incentive
Marco at the web agency wasn’t hoarding knowledge out of malice. The AWS engineer at the bank wasn’t refusing to document things because he was lazy. So what were they doing?
What would you do, if you were good at your job, and you noticed that the people who get kept around are the ones nobody can replace? What would you do if you realized that the only real job security your company offers isn’t a title, isn’t a contract, isn’t a policy… it’s being the person who holds the keys?
You’d become that person. Of course you would. It’s not a character flaw. It’s a rational response.
The company puts up a poster that says “knowledge sharing is a core value.” The company also sends a very clear signal, through every emergency escalation, every “only Marco knows how to do this,” every skipped documentation sprint, every promotion given to the person who saved the day instead of the person who prevented the emergency from happening: be the one they can’t afford to lose, or be the one they don’t notice losing.
Which signal do you think people actually follow? The poster, or the promotion?
And so the kingdoms get built. Not by villains. By rational actors responding to a system that rewards indispensability. One server with everything on it, because the person who understood the risks was the same person whose job security depended on being the only one who understood the risks.
Are you the problem?
But here’s where it gets uncomfortable. We like to talk about this as something that happens to organizations. “The system” creates bad incentives. “The culture” rewards the wrong behavior.
But who is the system? Who is the culture?
Is it the manager who rewards the hero and overlooks the person who made heroics unnecessary? Is it the team lead who demanded documentation but never once opened the wiki to read it? Is it the engineer who built a kingdom because kingdoms feel safe and nobody ever told them it was a problem? Because, honestly, it wasn’t a problem. Not until they left.
Or is it all of them? Is it all of us?
Every time we celebrate the firefighter and overlook the fire marshal, we are building the next kingdom. Every time we say “just ask Marco, he knows” instead of asking “why isn’t this written down somewhere,” we are making a choice. A small one. A rational one. One that compounds.
The post-mortem after Marco leaves always identifies the same root cause: insufficient documentation, inadequate knowledge sharing, a bus factor of one. And the remediation plan always says the same thing: document critical systems, cross-train the team, reduce single points of failure.
Six months later, a new kingdom is under construction. Different person, different system, same incentives.
So let me bring it back to you. Remember that number I asked you to hold at the beginning? The number of people who truly understand your most critical system?
If that number is one… you already know who your Marco is.
And before you blame them for not documenting their way out of a job, ask yourself honestly: would you?
Would you spend your evenings writing runbooks so thorough that your company could replace you by Friday? In a company that has never, not once, promoted anyone for being replaceable?