{"id":26,"date":"2026-03-03T16:12:02","date_gmt":"2026-03-03T16:12:02","guid":{"rendered":"https:\/\/tidalascent.com\/?p=26"},"modified":"2026-03-03T16:13:02","modified_gmt":"2026-03-03T16:13:02","slug":"stop-stuffing-your-claude-md-the-research-says-less-is-more","status":"publish","type":"post","link":"https:\/\/tidalascent.com\/?p=26","title":{"rendered":"Stop Stuffing Your CLAUDE.md. The Research Says Less Is More."},"content":{"rendered":"\n<p><em>A new study benchmarking context files across four coding agents and hundreds of real-world tasks found that auto-generated CLAUDE.md files hurt performance and inflate costs. Here&#8217;s what to do instead.<\/em><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Run <code>\/init<\/code>. Watch Claude Code analyze your repo, enumerate every directory, catalog your build tools, and produce a sprawling CLAUDE.md full of information it could have just read from <code>package.json<\/code>. Commit it. Feel productive. Ship worse code.<\/p>\n\n\n\n<p>That, at least, is the surprising conclusion of <a href=\"https:\/\/arxiv.org\/abs\/2602.11988\">&#8220;Evaluating AGENTS.md&#8221;<\/a>, a February 2026 paper from ETH Zurich and LogicStar.ai. 
The researchers built a new benchmark called AgentBench \u2014 138 real GitHub issues across 12 repositories that already contain developer-written context files \u2014 and ran four agent-and-model combinations against it in three configurations: no context file at all, an LLM-generated context file (the <code>\/init<\/code> approach), and the developer&#8217;s own hand-written context file.<\/p>\n\n\n\n<p>The results should make anyone who cargo-culted a 600-line CLAUDE.md stop and reconsider.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Numbers Don&#8217;t Lie<\/h2>\n\n\n\n<p>Across Claude Code with Sonnet 4.5, Codex with GPT-5.2 and GPT-5.1 Mini, and Qwen Code with Qwen3-30B, LLM-generated context files reduced task success rates by an average of 2\u20133% while increasing inference costs by over 20%. That&#8217;s right: the agents spent more money to produce worse results.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Setting<\/th><th>Avg. Resolution Rate Change<\/th><th>Cost Change<\/th><th>Extra Steps<\/th><\/tr><\/thead><tbody><tr><td>No context file<\/td><td>Baseline<\/td><td>\u2014<\/td><td>\u2014<\/td><\/tr><tr><td>LLM-generated (<code>\/init<\/code>)<\/td><td><strong>\u22122\u20133%<\/strong><\/td><td><strong>+20\u201323%<\/strong><\/td><td><strong>+2.5\u20133.9<\/strong><\/td><\/tr><tr><td>Developer-written (minimal)<\/td><td>+4%<\/td><td>~+19%<\/td><td>+3.3<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Developer-written context files did marginally better than nothing \u2014 a 4% average improvement \u2014 but they still increased cost and step count. 
From a cost-efficiency standpoint, the only setting that came out consistently, unambiguously ahead was having <em>no context file at all<\/em>.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;Unnecessary requirements from context files make tasks harder, and human-written context files should describe only minimal requirements.&#8221; \u2014 Gloaguen et al., &#8220;Evaluating AGENTS.md,&#8221; 2026<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">Why \/init Makes Things Worse<\/h2>\n\n\n\n<p>The paper identifies a core problem: auto-generated context files are almost entirely redundant with documentation the agent can already discover. The researchers found that 100% of Sonnet-4.5-generated context files contained codebase overviews, and 95\u201399% of files generated by other models did too. These overviews describe directory structures, explain what <code>src\/<\/code> contains, and catalog build tools \u2014 all things the agent would find in seconds by running <code>ls<\/code> and reading <code>README.md<\/code>.<\/p>\n\n\n\n<p>Worse, when the study measured how quickly agents found the files relevant to an issue, context files didn&#8217;t help at all. Agents with context files took the <em>same or more<\/em> steps to reach the right files as agents without them. In some cases, agents wasted steps hunting for the context file itself, reading it multiple times despite it already being in their context window.<\/p>\n\n\n\n<p>The behavioral analysis is damning. When context files were present, agents ran more tests, grepped more files, read more files, and invoked more repository-specific tooling. The agents were following the instructions \u2014 dutifully using <code>uv<\/code> when told to, running <code>pytest<\/code> with specific flags when instructed \u2014 but all that additional work didn&#8217;t translate into solving more tasks. It just burned tokens. 
GPT-5.2&#8217;s reasoning token usage jumped 22% with LLM-generated context files. The agent was literally thinking harder about instructions that weren&#8217;t helping it.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Key finding:<\/strong> When the researchers removed <em>all<\/em> documentation from the repos (READMEs, docs folders, example code) and left only the context file, LLM-generated files finally outperformed having nothing. This confirms that <code>\/init<\/code> essentially parrots back existing documentation. If your repo already has a README, the auto-generated CLAUDE.md is noise.<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">The Exact Opposite of \/init<\/h2>\n\n\n\n<p>So what should you actually put in a CLAUDE.md? The answer follows from the research: <strong>only things the agent cannot easily glean by reading the repo itself.<\/strong><\/p>\n\n\n\n<p>Your CLAUDE.md should be a <em>tiny pointer file<\/em>. A few lines of orientation, followed by links to deeper documents that the agent can pull in on demand when they&#8217;re relevant. Think of it as a lobby directory, not an encyclopedia.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What \/init generates (don&#8217;t do this)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># Project Overview\nThis is a Python web application using\nFastAPI with SQLAlchemy ORM...\n\n## Directory Structure\n- src\/ \u2014 Main application code\n- src\/api\/ \u2014 API route handlers\n- src\/models\/ \u2014 Database models\n- src\/services\/ \u2014 Business logic\n- tests\/ \u2014 Test suite\n- docs\/ \u2014 Documentation\n\n## Tech Stack\n- Python 3.12\n- FastAPI 0.109\n- SQLAlchemy 2.0\n- PostgreSQL\n- Redis for caching\n- pytest for testing\n\n## Build &amp; Test\n- pip install -e \".&#91;dev]\"\n- pytest tests\/\n- ruff check src\/\n\n## Code Style\n- Follow PEP 8\n- Use type hints\n- Docstrings for public functions\n\n(... 
200 more lines ...)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">What you should write instead<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># Acme API\n\nTest: `make test`\nLint: `make lint`\nSingle test: `pytest tests\/path -k name`\n\n## Non-Obvious Rules\n- Migrations: never edit models without `make migration`\n- Auth uses custom HMAC, not JWT. Read docs\/SSD.md \u00a74 first.\n- Module X and Module Y must not cross-import.\n\n## Key Documents\n- System design &amp; architecture: docs\/SSD.md\n- Deployment &amp; environment setup: docs\/deploy.md\n- Third-party API quirks: docs\/integrations.md<\/code><\/pre>\n\n\n\n<p>The first example describes things Claude can discover in five seconds. The second tells it things it would <em>never guess<\/em>: the non-obvious migration workflow, the custom auth scheme that looks like JWT but isn&#8217;t, and where to find the document that explains the system&#8217;s actual design intent.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Link Out to Contextual Documents<\/h2>\n\n\n\n<p>The philosophy is progressive disclosure. Your CLAUDE.md stays lean \u2014 maybe 30 lines \u2014 and links to richer documents that the agent reads <em>only when it&#8217;s working in that area<\/em>. Those linked documents should each contain knowledge that isn&#8217;t self-evident from reading the code.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The kind of documents worth linking<\/h3>\n\n\n\n<p>Not every markdown file in your <code>docs\/<\/code> folder is worth pointing to. The test is simple: <em>could a competent developer figure this out by reading the source code and existing documentation?<\/em> If yes, don&#8217;t link it. If no, it&#8217;s a candidate.<\/p>\n\n\n\n<p>Useful linked documents tend to fall into a few categories. 
Workflow documents explain non-obvious processes: &#8220;we deploy via a Slack bot, not CI,&#8221; or &#8220;database changes require a review from the DBA team before merging.&#8221; Decision records capture <em>why<\/em> the code is the way it is \u2014 the rejected alternatives, the constraints that shaped the design. API integration guides describe quirks and rate limits of third-party services that the code doesn&#8217;t make obvious.<\/p>\n\n\n\n<p>But the single most valuable linked document \u2014 the one that gives a coding agent the deepest leverage \u2014 is a <strong>Software System Design document<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The SSD: Your Agent&#8217;s Architectural Brain<\/h2>\n\n\n\n<p>An SSD (Software System Design document, elsewhere often called an SDD or Software Design Document) is the document that bridges <em>intent<\/em> and <em>implementation<\/em>. It describes what the system does, why it&#8217;s structured the way it is, how the components relate, and what the constraints are. It&#8217;s the document you&#8217;d write before building the system if you were being disciplined about it.<\/p>\n\n\n\n<p>For a coding agent, the SSD is transformative for a simple reason: code tells you <em>what is<\/em>, but not <em>what should be<\/em>. When Claude reads your FastAPI route handlers, it can see the current request\/response shapes, the middleware chain, the database queries. What it cannot see is that the auth service is intentionally stateless because you plan to deploy it at the edge. Or that the event system uses eventual consistency because the team evaluated and rejected strong consistency after load testing. 
Or that module X and module Y must never import from each other because of a planned future extraction into a separate service.<\/p>\n\n\n\n<p>Without an SSD, the agent will cheerfully introduce a circular dependency, add state to the auth service, or use synchronous calls where you need async events \u2014 and its solution will pass all your tests while violating your architecture.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What belongs in an SSD<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>System purpose and scope<\/strong> \u2014 the one-paragraph answer to &#8220;what is this and who is it for?&#8221;<\/li>\n\n\n\n<li><strong>Architectural decisions and constraints<\/strong> \u2014 the choices that shaped the system and the alternatives you rejected. This is the most valuable section for an agent.<\/li>\n\n\n\n<li><strong>Component responsibilities and boundaries<\/strong> \u2014 what each major module owns, and critically, what it must <em>not<\/em> own.<\/li>\n\n\n\n<li><strong>Data flow and integration patterns<\/strong> \u2014 how data moves through the system, especially non-obvious paths like event buses, caches, or async queues.<\/li>\n\n\n\n<li><strong>Security model<\/strong> \u2014 authentication, authorization, data handling constraints.<\/li>\n\n\n\n<li><strong>Known traps<\/strong> \u2014 the things that look simple but aren&#8217;t. Every codebase has these.<\/li>\n<\/ul>\n\n\n\n<p>The SSD doesn&#8217;t need to be a 40-page IEEE-standard document. A focused 2\u20134 page markdown file that a senior engineer would write before handing a system to a new team member is exactly right. 
It should be a living document \u2014 updated when architectural decisions change, not when variable names change.<\/p>\n\n\n\n<p>When you link to it from CLAUDE.md with a line like <code>For system architecture and design rationale, see docs\/SSD.md<\/code>, you&#8217;re giving the agent something it cannot get any other way: your <em>intent<\/em>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A Practical Structure<\/h2>\n\n\n\n<p>Here&#8217;s the model I&#8217;d recommend, drawn from the research and from what I&#8217;ve found working with Claude Code on my own projects:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Project Name\n\nOne-sentence description of the project.\n\n## Commands\nTest: `make test`\nLint: `make lint`\nType check: `make typecheck`\nSingle test: `pytest tests\/path.py -k test_name`\n\n## Non-Obvious Rules\n- Migrations: never edit models without `make migration`\n- Auth uses custom HMAC, not JWT. Read docs\/SSD.md \u00a74 first.\n- Module X and Module Y must not cross-import.\n\n## Key Documents\n- System design &amp; architecture: docs\/SSD.md\n- Deployment &amp; environment setup: docs\/deploy.md\n- Third-party API quirks: docs\/integrations.md<\/code><\/pre>\n\n\n\n<p>That&#8217;s it. No directory tree. No list of what&#8217;s in <code>src\/<\/code>. No explanation of what Python is. Every line either tells the agent something non-obvious or points to a document that does.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What <em>not<\/em> to include<\/h3>\n\n\n\n<p>Anything the agent can learn by reading the code. This includes: your tech stack (it&#8217;s in your dependency files), your directory structure (it can run <code>ls<\/code>), your code style (it can read existing files and match the pattern), your API endpoints (they&#8217;re defined in the code), and general programming best practices (it already knows those). 
Every one of these categories appeared in the <code>\/init<\/code>-generated files that the ETH Zurich study found to be counterproductive.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">The Takeaway<\/h2>\n\n\n\n<p>The research is clear: auto-generated context files are redundant documentation that costs you money and focus. Human-written context files help only when they&#8217;re minimal \u2014 describing requirements the agent can&#8217;t discover on its own.<\/p>\n\n\n\n<p>Your CLAUDE.md should be a short, hand-crafted file that links out to deeper contextual documents. The most important of those linked documents is an SSD \u2014 a Software System Design document that captures your architectural intent, constraints, and decisions. The SSD gives the agent something no amount of code-reading can provide: an understanding of <em>why<\/em> the system is the way it is and what invariants must be preserved.<\/p>\n\n\n\n<p>Delete your <code>\/init<\/code> output. Write 30 lines by hand. Write an SSD. Link to it. Watch your agent do better work for less money.<\/p>\n\n\n\n<p>The exact opposite of <code>\/init<\/code> is thinking carefully about what only you know \u2014 and writing that down.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">References<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Gloaguen, T., M\u00fcndler, N., M\u00fcller, M., Raychev, V., &amp; Vechev, M. (2026). &#8220;Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?&#8221; <em>arXiv preprint<\/em> <a href=\"https:\/\/arxiv.org\/abs\/2602.11988\">arXiv:2602.11988<\/a>.<\/li>\n\n\n\n<li>Anthropic. (2025). &#8220;Best Practices for Claude Code.&#8221; <a href=\"https:\/\/code.claude.com\/docs\/en\/best-practices\">code.claude.com<\/a>.<\/li>\n\n\n\n<li>AGENTS.md. (2025). 
&#8220;A simple, open format for guiding coding agents.&#8221; <a href=\"https:\/\/agents.md\/\">agents.md<\/a>.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>A new study benchmarking context files across four coding agents and hundreds of real-world tasks found that auto-generated CLAUDE.md files hurt performance and inflate costs. Here&#8217;s what to do instead. Run \/init. Watch Claude Code analyze your repo, enumerate every directory, catalog your build tools, and produce a sprawling CLAUDE.md full of information it could [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-26","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/tidalascent.com\/index.php?rest_route=\/wp\/v2\/posts\/26","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tidalascent.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tidalascent.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tidalascent.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tidalascent.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=26"}],"version-history":[{"count":2,"href":"https:\/\/tidalascent.com\/index.php?rest_route=\/wp\/v2\/posts\/26\/revisions"}],"predecessor-version":[{"id":28,"href":"https:\/\/tidalascent.com\/index.php?rest_route=\/wp\/v2\/posts\/26\/revisions\/28"}],"wp:attachment":[{"href":"https:\/\/tidalascent.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=26"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tidalascent.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=26"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tidalascent.com\/index.php?rest_rou
te=%2Fwp%2Fv2%2Ftags&post=26"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}