A new study benchmarking context files across four coding agents and 138 real-world tasks found that auto-generated CLAUDE.md files hurt performance and inflate costs. Here’s what to do instead.
Run /init. Watch Claude Code analyze your repo, enumerate every directory, catalog your build tools, and produce a sprawling CLAUDE.md full of information it could have just read from package.json. Commit it. Feel productive. Ship worse code.
That, at least, is the surprising conclusion of “Evaluating AGENTS.md”, a February 2026 paper from ETH Zurich and LogicStar.ai. The researchers built a new benchmark called AgentBench — 138 real GitHub issues across 12 repositories that already contain developer-written context files — and ran four major coding agents against it in three configurations: no context file at all, an LLM-generated context file (the /init approach), and the developer’s own hand-written context file.
The results should make anyone who cargo-culted a 600-line CLAUDE.md stop and reconsider.
The Numbers Don’t Lie
Across Claude Code with Sonnet 4.5, Codex with GPT-5.2 and GPT-5.1 Mini, and Qwen Code with Qwen3-30B, LLM-generated context files reduced task success rates by an average of 2–3% while increasing inference costs by over 20%. That’s right: the agent spent more money to produce worse results.
| Setting | Avg. Resolution Rate | Cost Change | Extra Steps |
|---|---|---|---|
| No context file | Baseline | — | — |
| LLM-generated (/init) | −2–3% | +20–23% | +2.5–3.9 |
| Developer-written (minimal) | +4% avg. | ~+19% | +3.3 |
Developer-written context files did marginally better than nothing — a 4% average improvement — but they still increased cost and step count. From a cost-efficiency standpoint, the only setting that was consistently, unambiguously better was having no context file at all.
“Unnecessary requirements from context files make tasks harder, and human-written context files should describe only minimal requirements.” — Gloaguen et al., “Evaluating AGENTS.md,” 2026
Why /init Makes Things Worse
The paper identifies a core problem: auto-generated context files are almost entirely redundant with documentation the agent can already discover. The researchers found that 100% of Sonnet-4.5-generated context files contained codebase overviews, and 95–99% of files generated by other models did too. These overviews describe directory structures, explain what src/ contains, and catalog build tools — all things the agent would find in seconds by running ls and reading README.md.
Worse, the study measured how quickly agents found the files relevant to an issue, and discovered that context files didn’t help at all. Agents with context files took as many steps as, or more steps than, agents without them to reach the right files. In some cases, agents wasted steps hunting for the context file itself, re-reading it despite it already being in their context window.
The behavioral analysis is damning. When context files were present, agents ran more tests, grepped more files, read more files, and invoked more repository-specific tooling. The agents were following the instructions — dutifully using uv when told to, running pytest with specific flags when instructed — but all that additional work didn’t translate into solving more tasks. It just burned tokens. GPT-5.2’s reasoning token usage jumped 22% with LLM-generated context files. The agent was literally thinking harder about instructions that weren’t helping it.
Key finding: When the researchers removed all documentation from the repos (READMEs, docs folders, example code) and left only the context file, LLM-generated files finally outperformed having nothing. This confirms that /init essentially parrots back existing documentation. If your repo already has a README, the auto-generated CLAUDE.md is noise.
The Exact Opposite of /init
So what should you actually put in a CLAUDE.md? The answer follows from the research: only things the agent cannot easily glean by reading the repo itself.
Your CLAUDE.md should be a tiny pointer file. A few lines of orientation, followed by links to deeper documents that the agent can pull in on demand when they’re relevant. Think of it as a lobby directory, not an encyclopedia.
What /init generates (don’t do this)
```markdown
# Project Overview
This is a Python web application using FastAPI with SQLAlchemy ORM...

## Directory Structure
- src/ — Main application code
- src/api/ — API route handlers
- src/models/ — Database models
- src/services/ — Business logic
- tests/ — Test suite
- docs/ — Documentation

## Tech Stack
- Python 3.12
- FastAPI 0.109
- SQLAlchemy 2.0
- PostgreSQL
- Redis for caching
- pytest for testing

## Build & Test
- pip install -e ".[dev]"
- pytest tests/
- ruff check src/

## Code Style
- Follow PEP 8
- Use type hints
- Docstrings for public functions

(... 200 more lines ...)
```
What you should write instead
```markdown
# Acme API

Test: `make test`
Lint: `make lint`
Single test: `pytest tests/path -k name`

## Non-Obvious Rules
- Migrations: never edit models without `make migration`
- Auth uses custom HMAC, not JWT. Read docs/SSD.md §4 first.
- Module X and Module Y must not cross-import.

## Key Documents
- System design & architecture: docs/SSD.md
- Deployment & environment setup: docs/deploy.md
- Third-party API quirks: docs/integrations.md
```
The first example describes things Claude can discover in five seconds. The second tells it things it would never guess: the non-obvious migration workflow, the custom auth scheme that looks like JWT but isn’t, and where to find the document that explains the system’s actual design intent.
Link Out to Contextual Documents
The philosophy is progressive disclosure. Your CLAUDE.md stays lean — maybe 30 lines — and links to richer documents that the agent reads only when it’s working in that area. Those linked documents should each contain knowledge that isn’t self-evident from reading the code.
The kind of documents worth linking
Not every markdown file in your docs/ folder is worth pointing to. The test is simple: could a competent developer figure this out by reading the source code and existing documentation? If yes, don’t link it. If no, it’s a candidate.
Useful linked documents tend to fall into a few categories. Workflow documents explain non-obvious processes: “we deploy via a Slack bot, not CI,” or “database changes require a review from the DBA team before merging.” Decision records capture why the code is the way it is — the rejected alternatives, the constraints that shaped the design. API integration guides describe quirks and rate limits of third-party services that the code doesn’t make obvious.
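To make this concrete, a workflow document worth linking can be very short. Here is an illustrative sketch (the bot name, channel names, and process details are made up for the example, echoing the scenarios above):

```markdown
# Deployment Workflow
We deploy via the Slack bot `@deploybot`, not CI.

1. Post `deploy <sha> staging` in #releases.
2. Smoke-test staging, then `deploy <sha> prod`.

Database changes require sign-off from the DBA team
in #dba-reviews before merging.
```

A document like this earns its link precisely because nothing in the codebase would reveal that CI doesn’t deploy.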
But the single most valuable linked document — the one that gives a coding agent the deepest leverage — is a Software System Design document.
The SSD: Your Agent’s Architectural Brain
An SSD (Software System Design, sometimes called SDD) is the document that bridges intent and implementation. It describes what the system does, why it’s structured the way it is, how the components relate, and what the constraints are. It’s the document you’d write before building the system if you were being disciplined about it.
For a coding agent, the SSD is transformative for a simple reason: code tells you what is, but not what should be. When Claude reads your FastAPI route handlers, it can see the current request/response shapes, the middleware chain, the database queries. What it cannot see is that the auth service is intentionally stateless because you plan to deploy it at the edge. Or that the event system uses eventual consistency because the team evaluated and rejected strong consistency after load testing. Or that module X and module Y must never import from each other because of a planned future extraction into a separate service.
Without an SSD, the agent will cheerfully introduce a circular dependency, add state to the auth service, or use synchronous calls where you need async events — and its solution will pass all your tests while violating your architecture.
What belongs in an SSD
- System purpose and scope — the one-paragraph answer to “what is this and who is it for?”
- Architectural decisions and constraints — the choices that shaped the system and the alternatives you rejected. This is the most valuable section for an agent.
- Component responsibilities and boundaries — what each major module owns, and critically, what it must not own.
- Data flow and integration patterns — how data moves through the system, especially non-obvious paths like event buses, caches, or async queues.
- Security model — authentication, authorization, data handling constraints.
- Known traps — the things that look simple but aren’t. Every codebase has these.
The SSD doesn’t need to be a 40-page IEEE-standard document. A focused 2–4 page markdown file that a senior engineer would write before handing a system to a new team member is exactly right. It should be a living document — updated when architectural decisions change, not when variable names change.
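Putting the sections above together, a minimal SSD skeleton might look like this. All content here is placeholder, reusing the illustrative details from earlier in the article (the HMAC auth scheme, the cross-import ban, the migration rule):

```markdown
# System Design: Acme API

## Purpose & Scope
One paragraph: what the system is and who it is for.

## Architectural Decisions & Constraints
- Auth service is stateless (planned edge deployment). Rejected: server-side sessions.
- Events are eventually consistent. Rejected: strong consistency, after load testing.

## Component Responsibilities & Boundaries
- `api/` owns request validation only; it must not touch the ORM directly.
- Module X and Module Y must never cross-import (planned service extraction).

## Data Flow & Integration
Request → API → service layer → async event bus → workers.
Cache invalidation is event-driven, not TTL-based.

## Security Model
Custom HMAC request signing, not JWT. See §4.

## Known Traps
- Never edit models without `make migration`.
```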
When you link to it from CLAUDE.md with a line like `For system architecture and design rationale, see docs/SSD.md`, you’re giving the agent something it cannot get any other way: your intent.
A Practical Structure
Here’s the model I’d recommend, drawn from the research and from what I’ve found working with Claude Code on my own projects:
```markdown
# Project Name
One-sentence description of the project.

## Commands
Test: `make test`
Lint: `make lint`
Type check: `make typecheck`
Single test: `pytest tests/path.py -k test_name`

## Non-Obvious Rules
- Migrations: never edit models without `make migration`
- Auth uses custom HMAC, not JWT. Read docs/SSD.md §4 first.
- Module X and Module Y must not cross-import.

## Key Documents
- System design & architecture: docs/SSD.md
- Deployment & environment setup: docs/deploy.md
- Third-party API quirks: docs/integrations.md
```
That’s it. No directory tree. No list of what’s in src/. No explanation of what Python is. Every line either tells the agent something non-obvious or points to a document that does.
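One practical failure mode of a pointer file is link rot: the CLAUDE.md outlives a renamed docs folder. Here is a small Python sketch of a pre-commit-style check, assuming the conventions above; the function name and the path-matching regex are my own illustration, not anything from the paper or from Claude Code:

```python
import re
from pathlib import Path

def stale_links(claude_md: str) -> list[str]:
    """Return directory-qualified .md paths mentioned in a CLAUDE.md
    (e.g. docs/SSD.md) that no longer exist relative to the file."""
    root = Path(claude_md).parent
    text = Path(claude_md).read_text()
    # Match paths with at least one directory component, like docs/deploy.md
    paths = re.findall(r"\b[\w.-]+(?:/[\w.-]+)+\.md\b", text)
    # De-duplicate while preserving order, then keep only missing files
    return [p for p in dict.fromkeys(paths) if not (root / p).exists()]
```

Run it in CI and fail the build on a non-empty result, so every pointer in the file stays trustworthy for the agent.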
What not to include
Anything the agent can learn by reading the code. This includes: your tech stack (it’s in your dependency files), your directory structure (it can run ls), your code style (it can read existing files and match the pattern), your API endpoints (they’re defined in the code), and general programming best practices (it already knows those). Every one of these categories appeared in the /init-generated files that the ETH Zurich study found to be counterproductive.
The Takeaway
The research is clear: auto-generated context files are redundant documentation that costs you money and focus. Human-written context files help only when they’re minimal — describing requirements the agent can’t discover on its own.
Your CLAUDE.md should be a short, hand-crafted file that links out to deeper contextual documents. The most important of those linked documents is an SSD — a Software System Design document that captures your architectural intent, constraints, and decisions. The SSD gives the agent something no amount of code-reading can provide: an understanding of why the system is the way it is and what invariants must be preserved.
Delete your /init output. Write 30 lines by hand. Write an SSD. Link to it. Watch your agent do better work for less money.
The exact opposite of /init is thinking carefully about what only you know — and writing that down.
References
- Gloaguen, T., Mündler, N., Müller, M., Raychev, V., & Vechev, M. (2026). “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” arXiv preprint arXiv:2602.11988.
- Anthropic. (2025). “Best Practices for Claude Code.” code.claude.com.
- AGENTS.md. (2025). “A simple, open format for guiding coding agents.” agents.md.