Your agent creeps the scope

You ask for one change and your agent refactors three neighboring files, renames things, and touches code you never mentioned. It treats your request as a starting point, not a boundary. The fix is a rule that pins scope to exactly what you asked and requires confirmation before anything outside it, so a helpful detour cannot quietly become a mess you did not order.

Why the scope drifts

While it works, the agent notices things: a nearby function that could be tidier, an inconsistent name, a pattern it would refactor. A thorough assistant acts on what it notices, and thoroughness is what training rewards, so it folds the extra changes in. The trouble is that it treats its own observations as an action plan rather than as raw data to filter against what you actually asked. That is how a small request turns into a sprawling diff.

The failure runs in both directions, and honesty means saying so. The loud version is over-reach: the agent does far more than you asked. In one documented case, an instruction to delete caches and simulators expanded on its own into wiping node_modules across active projects. The quiet version is the opposite, scope minimization: on a big or costly task the agent shrinks the work and hedges instead of doing what you asked. Both share one root problem: the agent is not holding your request as a fixed boundary.

Asked to fix one typo in a button label.
What it changed
 src/components/Button.tsx    | 1   (the typo, fine)
 src/components/Card.tsx      | 47  (renamed props "for consistency")
 src/components/Modal.tsx     | 31  (extracted a shared hook)
 src/lib/theme.ts             | 18  (tidied, unasked)
 src/hooks/useToggle.ts       | NEW (nobody requested this)
 5 files changed, 96 insertions, 44 deletions
What you asked for
 src/components/Button.tsx    | 1
 1 file changed, 1 insertion, 1 deletion

# "I also noticed Card's props are inconsistent.
#  Want me to rename them? (separate change)"

The manual fix (free, and complete)

The fix is a scope boundary the agent has to respect, with anything outside it surfaced as a question rather than folded in. Paste this into CLAUDE.md or AGENTS.md. It is complete as written; nothing is reserved for a paid tier.

AGENTS.md
## Stay in scope

- Change only what I asked for. Do not touch files or code I did not mention,
  even if you think they could be improved.
- If you notice something worth changing outside the request, tell me and let me
  decide. Do not fold it into this change.
- Do not rename, reformat, or refactor unrelated code as a side effect.
- If the task feels too big, do not silently shrink it either. Tell me, and we
  will scope it together. Do not decide on your own to do less than I asked.

The rule does four things:

1. Pin the boundary. Restrict changes to exactly what was requested; forbid touching unmentioned files or code.
2. Surface, do not fold. Have the agent report improvements it notices outside scope and let you decide, as a separate change.
3. Ban side-effect refactors. Forbid renaming, reformatting, or refactoring unrelated code as a byproduct.
4. Guard the other direction. Tell the agent not to silently shrink a task either, but to flag it so you can scope it together.
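
The rule's two directions can also be checked after the fact. The sketch below is a minimal illustration, not part of the kit: `ScopeReport` and `audit_scope` are hypothetical names, and the file lists are typed in by hand to mirror the typo example earlier on this page. In practice the changed paths could come from something like `git diff --name-only`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopeReport:
    """Hypothetical result type for a post-change scope audit."""

    unrequested: list[str]  # changed but never asked for (over-reach)
    untouched: list[str]    # asked for but not changed (possible silent shrinking)

    @property
    def in_scope(self) -> bool:
        # In scope only when neither direction has drifted.
        return not self.unrequested and not self.untouched


def audit_scope(requested: set[str], changed: set[str]) -> ScopeReport:
    """Compare the files you asked about with the files that actually changed."""
    return ScopeReport(
        unrequested=sorted(changed - requested),
        untouched=sorted(requested - changed),
    )


# Assumed data: the one-typo request and the five-file diff from the example.
requested = {"src/components/Button.tsx"}
changed = {
    "src/components/Button.tsx",
    "src/components/Card.tsx",
    "src/components/Modal.tsx",
    "src/lib/theme.ts",
    "src/hooks/useToggle.ts",
}

report = audit_scope(requested, changed)
for path in report.unrequested:
    print(f"outside the request: {path}")
```

A file-level comparison like this only catches drift across files. It says nothing about unrequested edits inside a file you did mention, so it complements the written rule and does not replace it.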

The kit's starter AGENTS.md folds this scope boundary in alongside the rest of the behavior rules, so your file arrives with it already worded.

The durable fix (the kit)

A scope rule in markdown tells the agent where the line is; it does not stop the agent from stepping over it when a tidy-up feels helpful. The kit adds the enforcement: the clarity-lock and plan step make the agent name the files it intends to touch and get your yes, so a change to something you never mentioned surfaces as a question before it happens rather than as a surprise in the diff.
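
The idea of naming files first and getting a yes can be sketched in a few lines. This is an assumption-laden illustration of the pattern, not the kit's clarity-lock or hook code: `gate_plan` and the `confirm` callback are hypothetical names introduced here.

```python
from typing import Callable, Iterable


def gate_plan(
    requested: Iterable[str],
    planned: Iterable[str],
    confirm: Callable[[str], bool],
) -> list[str]:
    """Return the files the agent may touch this turn.

    Files named in the request pass automatically. Any other planned file is
    put to the user as a question and is allowed only on an explicit yes.
    """
    requested_set = set(requested)
    allowed = []
    for path in sorted(set(planned)):
        # confirm() is only called for files outside the request.
        if path in requested_set or confirm(path):
            allowed.append(path)
    return allowed


# Hypothetical usage: the user declines everything outside the request.
asked: list[str] = []


def decline(path: str) -> bool:
    asked.append(path)  # record what was surfaced as a question
    return False


allowed = gate_plan(
    requested=["src/components/Button.tsx"],
    planned=["src/components/Button.tsx", "src/components/Card.tsx"],
    confirm=decline,
)
print(allowed)  # ['src/components/Button.tsx']
```

The design choice worth noting is that the out-of-scope file is never dropped silently: it always reaches `confirm`, so the user sees the question even when the answer is no.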

The stay-in-scope rule travels to Codex, Cursor, and Antigravity through AGENTS.md, so the behavior is cross-tool. The hooks that hold the agent to the plan, and the navigator HUD that shows you exactly which files a turn will touch, run in Claude Code only. One command, one-time price, no subscription.

The real numbers

Scope discipline shows up as leanness. In our first clean-room suite (five scripted tasks, same model in both arms), the kit-configured agent confirmed scope before acting on all five tasks versus two for vanilla, and wrote about 40 percent less code for the same outcome, in part because it did not fold unrequested refactors into the diff. Fewer surprise files account for most of that gap.

To keep this honest in both directions: enforcement adds about a 25 percent average cost premium from the hook context, and on already-clear tasks the confirm-the-scope step is pure overhead, roughly twice as slow on a plain todo app. The tracker's own evidence that agents also under-scope is a reminder that the goal is the right scope, not simply less.

First clean-room suite, same model both arms, 2026-07-02.
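| Measure                          | Vanilla agent | Kit-configured         |
| -------------------------------- | ------------- | ---------------------- |
| Confirmed scope before acting    | 2 of 5 tasks  | 5 of 5 tasks           |
| Code for the same outcome        | baseline      | about 40% less         |
| Files touched beyond the request | common        | surfaced as a question |
| Enforcement context cost         | none          | about +25% average     |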

Directional, n=1 per task. The aim is the right scope, not the smallest possible one.

It is in Anthropic's own tracker

Scope failure is well documented, and crucially it runs both ways. We cite one report from each direction so the claim stays precise.

The honest theme is not simply that agents over-reach. It is that they do not reliably honor scope, over-reaching on #61102 and under-doing on #50438. The fix is a boundary in both directions, which is why the rule above guards against silent shrinking too.

Common questions

How do I stop Claude Code changing files I did not ask about?

Add a stay-in-scope rule to CLAUDE.md or AGENTS.md: change only what you asked for, surface any other improvement as a question instead of folding it in, and ban side-effect refactors. To make it hold, a plan step and hook require the agent to name the files it will touch and get your yes first.

Why does my agent refactor code I did not mention?

Because it treats things it notices while working as an action plan rather than as raw data to filter against your request, and because thoroughness is rewarded in training. So a nearby inconsistency becomes an unrequested refactor. A scope boundary plus confirmation turns that impulse into a question you can answer.

Can agents also do too little instead of too much?

Yes, and pretending otherwise would be dishonest. Anthropic's tracker documents scope minimization, where an agent shrinks and hedges on a big task instead of doing it. The real goal is the right scope, which is why a good rule guards both against silent expansion and silent shrinking.

Is scope creep the same as over-building?

They overlap but differ. Over-building is doing your task in a bloated way, forty files for a one-line fix. Scope creep is doing other tasks you did not ask for, touching neighboring files and refactoring on the side. The same ask-first and plan discipline calms both.

Make your agent ask before it builds

One kit. Ask-first behavior, lean output, no AI slop. Works with Claude Code, Codex, and Antigravity.

Get the kit · $29

One-time. Less than one hour of cleaning up AI slop.