When Your AI Agent Breaks, You Don't Have a Backup Plan
7 min read
You built an AI agent to handle your customer support queue. It’s working great. Saves your team 10 hours a week. Then one Tuesday morning, it starts giving refund advice to customers who didn’t ask for refunds.
Your support team didn’t notice for an hour. By the time they did, you had a Slack fire and an angry customer on Twitter.
This isn’t a hypothetical. I’ve watched this exact scenario play out three times in the last six months - once at a client company, twice at Jetpack Labs.
Here’s the thing nobody talks about with AI agents: they’re like hiring a really smart intern who never sleeps. Except this intern sometimes hallucinates. Sometimes gets confused by edge cases. Sometimes decides that the best way to solve a problem is something nobody trained them to do.
And unlike your human intern, you can’t just ask them to double-check their work before hitting send.
The Gap Between “Works” and “Reliable”
Most founders approach AI agents the same way they approach hiring a developer. You define what you want done, you build the thing, and then you assume it works.
That’s fine for a feature in your web app. Users find bugs, you fix them, iterate. Low consequence.
It’s catastrophic for an AI agent touching customer-facing operations.
Here’s what I’ve learned: there’s a massive gap between “the AI agent produces reasonable output” and “the AI agent is safe to deploy unsupervised.” Most teams skip the hard work that lives in between.
The hard work looks like this:
First: You need to know when it’s failing. Not when it fails completely (that’s obvious). When it’s producing output that looks correct but isn’t. When it’s giving subtly wrong answers. When it’s been prompted in a direction you didn’t anticipate.
At Jetpack, we use Claude to generate automated standups from meeting notes. The agent works great 95% of the time. But sometimes it misses important context, or it summarizes a decision that didn’t actually get decided. We had to build a system where every standup gets human review before it posts. That review takes 30 seconds. But those 30 seconds catch the 5% where something went sideways.
That’s not AI. That’s the human overlay that makes AI safe.
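Here is a minimal sketch of that kind of review gate. It is not the actual Jetpack system: the `ReviewQueue`, `Draft`, and `Status` names are hypothetical, and appending to `posted` stands in for whatever really publishes the standup. The one thing it shows is that agent output has no path to "posted" except through a human decision.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Draft:
    draft_id: int
    text: str
    status: Status = Status.PENDING


@dataclass
class ReviewQueue:
    """Holds agent output until a human approves or rejects it."""
    drafts: dict = field(default_factory=dict)
    posted: list = field(default_factory=list)
    _next_id: int = 1

    def submit(self, text: str) -> int:
        """Called by the agent. Nothing is published here."""
        draft = Draft(self._next_id, text)
        self.drafts[draft.draft_id] = draft
        self._next_id += 1
        return draft.draft_id

    def review(self, draft_id: int, approve: bool) -> None:
        """Called by a human reviewer. Only approval publishes."""
        draft = self.drafts[draft_id]
        if draft.status is not Status.PENDING:
            raise ValueError("draft already reviewed")
        draft.status = Status.APPROVED if approve else Status.REJECTED
        if approve:
            self.posted.append(draft.text)  # stand-in for the real posting step
```

In use, the agent only ever calls `submit`, and the reviewer’s 30 seconds is the `review` call. A rejected draft simply never appears in `posted`.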
Second: You need graceful degradation. What happens when the AI agent stops working? Not permanently - I mean, what’s your move for the next 8 hours while you’re scrambling to figure out why it’s broken?
Most teams don’t have one. They have an alert. Then panic. Then whoever built the system three months ago (and has since moved to another project) has to drop everything.
The teams doing this right have a fallback. Maybe it’s a manual process that’s slower but works. Maybe it’s reverting to rule-based logic for certain categories of requests. Maybe it’s routing to a human faster than usual. But there’s a plan for “the agent isn’t running.”
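One way that fallback could look in code, as a rough sketch under stated assumptions: the `handle` function, the tiny rule table, and the canned replies are all made up for illustration. If the agent is switched off or raises an error, the request drops to rule-based logic, and anything the rules cannot answer goes to a human queue.

```python
from typing import Callable, Optional, Tuple

HUMAN_QUEUE = "human_queue"


def rule_based_reply(message: str) -> Optional[str]:
    """Hypothetical rule table covering a few well-understood request types."""
    text = message.lower()
    if "reset" in text and "password" in text:
        return "You can reset your password from the login page."
    if "hours" in text:
        return "Support is available 9am-5pm on weekdays."
    return None


def handle(
    message: str,
    agent: Callable[[str], str],
    agent_enabled: bool = True,
) -> Tuple[str, str]:
    """Return (route, reply), where route is 'agent', 'rules', or 'human_queue'."""
    if agent_enabled:
        try:
            return "agent", agent(message)
        except Exception:
            pass  # agent is down or erroring: fall through to the fallback path
    reply = rule_based_reply(message)
    if reply is not None:
        return "rules", reply
    return HUMAN_QUEUE, "A team member will follow up shortly."
```

The `agent_enabled` flag is the piece to notice. It lets whoever is on call turn the agent off without a deploy, and the queue keeps moving while they work out what went wrong.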
Third: You need constraints that actually hold. Not constraints in the prompt (those are suggestions). Constraints in the architecture.
If your AI agent can theoretically send emails, refund money, or delete data, you’re not actually running an agent. You’re running a lottery. I’ve seen teams give their customer service AI access to refund APIs because “it only refunds legitimate cases.” That’s not how this works. The agent doesn’t know what “legitimate” means if you haven’t built guardrails for it.
The best safeguard is structural: the AI agent doesn’t have the ability to do the dangerous thing in the first place. It can draft a refund message. But sending the refund? That requires a human approval, or it goes to a review queue, or it’s behind a daily spending cap that the human team controls.
That sounds annoying. Until you’re explaining to a customer why their account was refunded when they never asked for it.
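A sketch of what "structural" can mean here, with everything in it assumed for illustration: `RefundGate`, `RefundRequest`, and the cap value are hypothetical, and appending to `executed` stands in for the real payment call. The agent is only handed `draft`. Approval and the daily cap live in code the agent cannot reach.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(eq=False)  # compare by identity so identical-looking requests stay distinct
class RefundRequest:
    customer_id: str
    amount: Decimal
    reason: str
    approved_by: Optional[str] = None


class RefundGate:
    """The only code path allowed to trigger a refund."""

    def __init__(self, daily_cap: Decimal):
        self.daily_cap = daily_cap  # set by the human team, not the agent
        self.spent_today = Decimal("0")
        self.pending: list = []
        self.executed: list = []

    def draft(self, customer_id: str, amount: str, reason: str) -> RefundRequest:
        """What the agent may call: queues a request, moves no money."""
        req = RefundRequest(customer_id, Decimal(amount), reason)
        self.pending.append(req)
        return req

    def approve(self, req: RefundRequest, reviewer: str) -> bool:
        """What only a human-facing tool may call."""
        if req not in self.pending:
            raise ValueError("unknown or already handled request")
        if self.spent_today + req.amount > self.daily_cap:
            return False  # over the cap: the request stays pending
        self.pending.remove(req)
        req.approved_by = reviewer
        self.spent_today += req.amount
        self.executed.append(req)  # stand-in for the real refund API call
        return True
```

Resetting `spent_today` each day is left out to keep the sketch short. The design choice that matters is that a bad prompt can at worst fill the pending queue. It cannot move money.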
Why This Matters for Your Hiring
Here’s where this gets interesting for founders thinking about their technical team.
If you’re planning to lean on AI to make your existing team more efficient, that’s solid. AI can amplify a good team’s output.
But if you’re relying on AI to replace judgment and oversight, you’re building on sand.
The teams that are winning with AI agents aren’t shipping more volume. They’re running smaller teams that ship the same volume with fewer catastrophes. That’s a systems and architecture question. Not an AI question.
It requires someone who understands both the AI capability and the failure modes - both what the agent can do and what happens when it does the wrong thing.
That person probably doesn’t exist on your team yet. And you probably can’t afford to hire them as a full-time role.
But you need them in the room when you’re building AI systems that touch anything that matters - customer data, money, operations, any of it.
The Real Cost of Skipping This
Let me be direct about what happens when you skip the hard work:
You get a system that works great until it doesn’t. Then you’re in firefighting mode, trying to figure out if it’s a prompt problem, a model problem, a context-length problem, a hallucination problem. Your team is paralyzed because they can’t trust the output anymore.
Then you shut it down, manually process the backlog, and spend the next sprint rebuilding it “correctly.”
That’s a three-week detour that costs you the productivity gains from the previous six months.
The teams I know who are running AI in production without getting stuck in that firefighting loop have one thing in common: they treat the AI agent like a junior person, not like a software feature.
Junior people need:
- Supervision
- Escalation paths
- Clear boundaries on what they can and can’t do
- Feedback loops
Sound familiar? That’s not a feature roadmap. That’s organizational design.
How to Actually Do This
If you’re running an AI agent right now, here’s the minimum:
- Audit what it can touch. What data can it access? What actions can it take? What customer-facing operations does it control? If the answer is “a lot,” you need more oversight. Not as a suggestion. As a system.
- Define your failure mode. What does it look like when the agent is wrong? Not broken - wrong. What’s your detection mechanism? Who notices first? How fast can you shut it down?
- Build the human layer. For anything customer-facing, build a review step, an approval queue, or a daily audit. If that sounds expensive, you haven’t priced out the cost of an agent going rogue in production.
- Test the bad cases. Don’t just test the happy path. Test the agent when it’s confused. Test it with adversarial input. Test it when the context is ambiguous. See what breaks.
- Have a fallback. What happens when the agent is offline? If the answer is “we’re stuck,” you haven’t actually automated anything. You’ve created a new single point of failure.
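The audit and shutdown questions in that list can start very small. The sketch below is illustrative only: the `Tool` fields, the tool names, and the `Dispatcher` are assumptions, not a real framework. It lists each thing the agent can call, flags anything that changes the outside world with no human gate in front of it, and puts a kill switch in the one place every call passes through.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Tool:
    name: str
    side_effect: bool     # does it change anything outside the agent?
    needs_approval: bool  # is there a human gate in front of it?


def audit(tools: Iterable[Tool]) -> List[str]:
    """Return names of tools that act on the world with no human gate."""
    return [t.name for t in tools if t.side_effect and not t.needs_approval]


class Dispatcher:
    """Single choke point for every tool call the agent makes."""

    def __init__(self, tools: Iterable[Tool]):
        self.tools = {t.name: t for t in tools}
        self.enabled = True  # kill switch: flip to False to stop all calls

    def call(self, name: str) -> str:
        if not self.enabled:
            return "blocked: agent disabled"
        tool = self.tools.get(name)
        if tool is None:
            return "blocked: unknown tool"  # not on the list, so not callable
        if tool.side_effect and tool.needs_approval:
            return "queued for approval"
        return "executed"


# Hypothetical tool list for a support agent.
TOOLS = [
    Tool("search_docs", side_effect=False, needs_approval=False),
    Tool("draft_reply", side_effect=False, needs_approval=False),
    Tool("send_email", side_effect=True, needs_approval=False),
    Tool("issue_refund", side_effect=True, needs_approval=True),
]
```

Running `audit(TOOLS)` on this list flags `send_email`, which is exactly the kind of line item to fix before launch. The same idea extends to bad-case testing: feed confusing or adversarial requests through `Dispatcher.call` and assert that nothing with a side effect ever comes back as "executed" without a gate.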
This isn’t sexy. It’s not the “AI-powered this” pitch you see on TechCrunch. But it’s the difference between an agent that creates value and an agent that creates liability.
The Founder Opportunity
Here’s the contrarian bit: most of the hype around AI agents is centered on replacing humans or achieving full autonomy.
The real value right now is much smaller and much more practical.
It’s building systems where an AI agent handles the repetitive part, a human handles the judgment part, and together they move faster than either could alone.
That’s not revolutionary. But it works. And it’s way less likely to destroy your customer relationships.
The founders who get this - who invest in the boring infrastructure that makes AI safe instead of just implementing whatever the latest tool promises - they’re going to win.
Everyone else will have a great story about the time their AI agent almost refunded the entire customer base.
Schedule a call if you’re building with AI and want to talk through your safety architecture. These conversations are worth having early.