Context Engineering for Claude Code: Why Most Agent Failures Are Context Failures

When Claude Code does something wrong - uses the wrong pattern, ignores a constraint, makes an obviously bad architectural decision - there are two possible explanations. Either the model doesn’t have the capability to do better, or the model doesn’t have the information it needed.

In my experience, it’s almost always the second one.

Tobi Lütke, Shopify’s CEO, articulated this in June 2025 when he described what he called context engineering: “the art of providing all the context for the task to be plausibly solvable by the LLM.” Philipp Schmid, formerly of Hugging Face, expanded it: “the discipline of designing and building dynamic systems that provide the right information and tools, in the right format, at the right time.”

MIT Technology Review named it a defining 2025 software development term alongside vibe coding. The difference between the two is that vibe coding is about what you ask an AI to do. Context engineering is about what the AI knows when it does it.

For Claude Code specifically, this means every configuration decision - CLAUDE.md, tool definitions, memory management, even how you structure your codebase - is a context engineering decision. Most developers treat these as afterthoughts. Treating them as a system design problem is the shift that makes Claude Code actually reliable.

The Five Layers of Context

Claude Code’s context at any given moment comes from five distinct sources. Understanding them separately is the prerequisite for managing them well.

Layer 1: System prompt / CLAUDE.md (persistent project knowledge)

This is the context that survives session resets. It loads every time Claude Code starts on your project. Everything else in this list has to be rebuilt inside each session, through conversation, tool calls, or file reads - CLAUDE.md is your one guaranteed persistent channel into Claude’s context window.

Most developers use it like a checklist. It should function more like onboarding documentation for a new senior engineer: architectural decisions, the reasoning behind non-obvious choices, what to use where, what to avoid and why.

Layer 2: Conversation history (short-term memory within session)

Everything Claude has seen and said in the current session. This is valuable - Claude remembers what you discussed, what you decided, what it already tried. But it degrades. Long sessions accumulate noise along with signal. Important decisions made early in a session compete for attention with tool outputs and intermediate reasoning generated later.

Layer 3: Tool results (what Claude can observe)

When Claude reads a file, runs a command, or calls an MCP tool, the result goes into context. This is how Claude builds a live picture of your system - not from memory, but from actual observation. The design of your tools and what they return is a context engineering decision.
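One way to act on this is to shape what a tool returns before it lands in the context window. The sketch below is a hypothetical helper, not part of Claude Code: the name `trim_tool_output` and the default limits are my own assumptions. It keeps the head and tail of a long command output and says how much was dropped, so the omission itself is visible.

```python
def trim_tool_output(text: str, head: int = 20, tail: int = 20) -> str:
    """Keep the first `head` and last `tail` lines of a long output.

    Hypothetical helper: the name and default limits are illustrative
    assumptions, not anything Claude Code provides.
    """
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # short outputs pass through unchanged
    omitted = len(lines) - head - tail
    marker = f"... [{omitted} lines omitted] ..."
    # Slice from an explicit index so tail=0 yields no trailing lines.
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])
```

Whether head-and-tail is the right cut depends on the tool: for a test runner the failures usually sit at the end, for a build log the first error may matter most.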

Layer 4: File contents / codebase (what it can read)

Claude Code can read any file in your project. The structure of that project - naming conventions, co-location of related files, how consistent the patterns are - determines how much useful signal Claude extracts from a read versus how much noise it has to filter.

Layer 5: Memory compaction (what survives a long session)

Claude Code automatically compacts long conversations. When this happens, the detailed history is replaced with a summary. Some things survive compaction well (explicit decisions, key conclusions). Some things don’t (intermediate reasoning, specific tool call details, the exact content of files Claude read).

The Diagnostic Question

When a Claude Code session fails - wrong output, bad decision, ignored constraint - ask this before doing anything else:

Was this a model failure or a context failure?

A model failure means Claude had the information and still got it wrong. That’s a real category but it’s rare. A context failure means one of these things was true:

Claude didn’t have the information (missing context)
Claude had the wrong information (stale or contradictory context)
Claude had the right information but it was ambiguous (underspecified context)

In my experience, roughly 90% of failures fall into one of those three categories. The debugging path for each is different, and none of them involve accepting that Claude just can’t do it.
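The triage order can be written down as a small decision function. This is only a sketch of the question above: the names `Failure` and `triage` and the three boolean inputs are hypothetical, and answering those three questions honestly is the part you do yourself.

```python
from enum import Enum


class Failure(Enum):
    MISSING = "missing context"
    STALE = "stale or contradictory context"
    UNDERSPECIFIED = "underspecified context"
    MODEL = "model failure"


def triage(had_info: bool, info_was_current: bool, info_was_unambiguous: bool) -> Failure:
    """Walk the three context checks in order; only then blame the model."""
    if not had_info:
        return Failure.MISSING
    if not info_was_current:
        return Failure.STALE
    if not info_was_unambiguous:
        return Failure.UNDERSPECIFIED
    return Failure.MODEL
```

The ordering is the point: "model failure" is what is left after the three context explanations have been ruled out.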

CLAUDE.md Design Principles That Actually Work

Most CLAUDE.md files I’ve seen make the same mistakes: they’re too generic, they tell Claude what to do rather than what not to do, and they reference patterns that exist in the codebase without pointing to them.

Negative examples outperform positive examples. “Use the Repository pattern for data access” is easy to ignore. “Do not call Eloquent models directly from controllers - all database access goes through Repository classes in app/Repositories/” is harder to misinterpret. The more specific you are about what you’re preventing, the less drift you get.

Reference files by path, not by description. “Follow the existing component patterns” tells Claude nothing useful - existing patterns may be inconsistent, and Claude has to guess which ones you prefer. “Use src/components/UserCard.tsx as the reference implementation for all new card components” is unambiguous. Claude will read that file and use it as a template.

Document what deviates from framework defaults. If you’re using a framework, Claude knows the framework. What it doesn’t know is where your project diverges from standard usage. Document those deviations explicitly. “We use custom authentication middleware instead of Laravel Sanctum” saves Claude from repeatedly proposing the default.

Write for compaction, not just for humans. Your CLAUDE.md survives compaction - it reloads fresh every session. But decisions you make mid-session don’t survive unless you explicitly preserve them. If you reach an important architectural decision mid-session, write it to memory or update CLAUDE.md before the session ends.
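Three of these principles can be spot-checked mechanically. The sketch below is a rough heuristic, not a real linter: the function name `audit_claude_md` and the regular expressions are assumptions of mine, and the sample text simply reuses the example rules from this section. It will produce false positives (for instance, "and/or" looks like a path) and it only recognizes the straight apostrophe in "don't".

```python
import re

# Sample CLAUDE.md content built from the example rules in this article.
SAMPLE = """\
Do not call Eloquent models directly from controllers - all database access
goes through Repository classes in app/Repositories/.
Use src/components/UserCard.tsx as the reference implementation for all new card components.
We use custom authentication middleware instead of Laravel Sanctum.
"""

# Rough heuristics (assumptions): anything containing a slash counts as a path.
PATH_RE = re.compile(r"\b[\w.-]+(?:/[\w.-]+)+/?")
NEGATIVE_RE = re.compile(r"\b(?:do not|don't|never|avoid)\b", re.IGNORECASE)
DEVIATION_RE = re.compile(r"\b(?:instead of|rather than)\b", re.IGNORECASE)


def audit_claude_md(text: str) -> dict[str, bool]:
    """Report which of the three checkable principles the text appears to follow."""
    return {
        "references_paths": bool(PATH_RE.search(text)),
        "has_negative_rules": bool(NEGATIVE_RE.search(text)),
        "documents_deviations": bool(DEVIATION_RE.search(text)),
    }
```

Calling `audit_claude_md(SAMPLE)` reports all three checks as passing, while a generic line such as "Follow the existing component patterns." fails all three. A passing result only means the right kinds of statements are present, not that they are the right statements.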

Memory Compaction: Designing for What Survives

Compaction is not a bug. It’s a necessary feature for long sessions, and you can design around it.

What survives compaction reasonably well: agent scratchpad entries, explicit memory writes, conclusions stated clearly at the end of a reasoning chain.

What gets lost: intermediate tool call details, “I tried X and it didn’t work” context, specific file contents Claude read several exchanges ago, nuances buried in long code outputs.

The practical implication: when something important happens mid-session, say it explicitly. “We’ve decided to use optimistic locking for the payment processor because of the race condition we found. This is not negotiable.” That kind of explicit statement has a much higher chance of surviving compaction than the same conclusion reached implicitly through a series of tool calls and reasoning steps.

For decisions that need to persist across sessions entirely, don’t rely on memory compaction at all. Update CLAUDE.md. That’s what it’s for.
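If updating CLAUDE.md by hand feels like friction, a few lines of scripting can lower it. This is a minimal sketch under my own assumptions: the helper names `format_decision` and `record_decision` and the one-line entry format are hypothetical, and nothing here is a Claude Code feature. It just appends a dated line to a file.

```python
from datetime import date
from pathlib import Path


def format_decision(decision: str, reason: str, on: date) -> str:
    """Render one decision as a dated list entry (the format is an assumption)."""
    return f"- {on.isoformat()}: {decision} (Reason: {reason})"


def record_decision(claude_md: Path, decision: str, reason: str) -> None:
    """Append today's decision to the given file, creating it if needed."""
    entry = format_decision(decision, reason, date.today())
    with claude_md.open("a", encoding="utf-8") as f:
        f.write("\n" + entry + "\n")
```

For the example above, that would be `record_decision(Path("CLAUDE.md"), "Use optimistic locking for the payment processor", "race condition found in this session")`. Recording the reason alongside the decision matters: it is the part a later session cannot reconstruct.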

Tool Definitions Matter More Than You Think

If you’re using MCP tools or custom tool definitions with Claude Code, the quality of those definitions is a context engineering decision with large downstream effects.

A poorly defined tool creates two problems. First, Claude may not call it when it should - if the description doesn’t clearly indicate when the tool is appropriate, Claude will fall back to other approaches. Second, Claude may misuse it - call it at the wrong time, pass wrong parameters, misinterpret the output.

Good tool descriptions include four things:

What it does - the action it performs
When to use it - the specific conditions that make it the right choice
What not to use it for - the cases that look similar but should use something else
Expected output format - what Claude should expect to receive back

That last one is often omitted. If your tool returns a JSON object with a specific schema, document that schema in the tool description. Claude will parse the output more accurately and will know immediately if something went wrong.
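Here is what a description covering all four points might look like, with a small check that none of them is missing. Everything in this sketch is hypothetical: the `lookup_order` tool, its fields, and the section labels are invented for illustration, and the dictionary layout mimics a generic JSON-Schema-style tool definition, not any particular SDK.

```python
# Hypothetical tool definition; the layout is a generic JSON-Schema-style
# shape chosen for illustration, not a specific SDK's format.
LOOKUP_ORDER_TOOL = {
    "name": "lookup_order",
    "description": (
        "What it does: Fetches one order by its ID from the orders service.\n"
        "When to use: You have an exact order ID and need its current status.\n"
        "Do not use for: Searching orders by customer or date; use a search tool.\n"
        'Returns: JSON object {"order_id": str, "status": str, "total_cents": int}; '
        'on failure, {"error": str}.'
    ),
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

# Section labels are my own convention for making the four points checkable.
REQUIRED_SECTIONS = ("What it does:", "When to use:", "Do not use for:", "Returns:")


def missing_sections(description: str) -> list[str]:
    """Return the required section labels that the description lacks."""
    return [label for label in REQUIRED_SECTIONS if label not in description]
```

Documenting the failure shape alongside the success shape is what lets Claude tell "no such order" apart from "the tool broke".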

Project Structure as Context

The way your codebase is structured affects how much useful context Claude can extract from it. This isn’t about aesthetics - it’s about signal density.

Consistent naming means Claude can infer what a file does before reading it. Co-located tests mean Claude reads the spec alongside the implementation. Conventional directory structure means Claude can navigate without exploring. Clear separation between layers means Claude doesn’t have to trace through the whole codebase to understand one part of it.

None of this is new advice for human developers. The reason it matters more for Claude Code is that Claude is reading your codebase cold, often reconstructing its understanding of the system from scratch within each session. A well-structured codebase gives you a better-contextualized agent.

Conversely: a codebase where patterns are inconsistent, where business logic is scattered across layers, where historical decisions are encoded in the structure without any documentation - that’s a codebase that will produce lower-quality AI assistance regardless of how good your CLAUDE.md is. The context is just worse.
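Some of these structural properties are cheap to measure. As one example, the sketch below lists implementation files that have no co-located test. It assumes a `Name.test.tsx` sibling convention and relative paths without a leading `./`; the function name `files_missing_tests` and the `OrderCard.tsx` file in the usage note are hypothetical.

```python
from pathlib import PurePosixPath


def files_missing_tests(paths: list[str], suffix: str = ".tsx") -> list[str]:
    """Return implementation files with no sibling `<name>.test<suffix>` file.

    Assumes the co-located `Name.test.tsx` naming convention.
    """
    known = set(paths)
    missing = []
    for p in paths:
        path = PurePosixPath(p)
        # Skip other file types and the test files themselves.
        if path.suffix != suffix or path.name.endswith(f".test{suffix}"):
            continue
        sibling = path.with_name(f"{path.stem}.test{suffix}")
        if str(sibling) not in known:
            missing.append(p)
    return missing
```

Given `src/components/UserCard.tsx`, `src/components/UserCard.test.tsx`, and `src/components/OrderCard.tsx`, it returns only the `OrderCard.tsx` path. Each entry in that list is a file Claude will have to understand without a spec beside it.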

A Practical Audit

If you want to improve your Claude Code results without changing anything about the model or your prompts, run through this:

CLAUDE.md audit: Does it include explicit references to specific files? Does it document deviations from framework defaults? Does it have negative examples, not just positive ones? Have you updated it with decisions made in recent sessions?

Session design audit: Are you keeping sessions focused on a single coherent task? Are you explicitly stating important decisions so they survive compaction? Are you starting new sessions when the previous one has accumulated too much noise?

Tool definition audit: For any custom or MCP tools, does each description answer the four questions above? Have you observed Claude using any tool in ways you didn’t intend - and if so, is the description the root cause?

Codebase structure audit: If Claude had to understand this project cold, what would it get wrong? Those are gaps in your context engineering.

The model is not the bottleneck. For most teams using Claude Code today, the bottleneck is context quality - what Claude knows, how clearly it knows it, and whether that knowledge is available at the right time. That’s an engineering problem, and it has engineering solutions.
