Three days ago, Anthropic shipped Opus 4.7. Within 48 hours, developers across Discord, Reddit, and Hacker News reported the same bewildering pattern: carefully tuned prompts that had worked for months started producing slightly wrong outputs. Not hallucinations. Not refusals. Just... different.
The culprit wasn't a regression. It was an upgrade. Opus 4.7 does something no previous Claude model did this consistently: it reads your prompt and does exactly what you asked. Not what you meant. Not what a reasonable person might infer from context. What you actually typed.
The Literalism Shift
Previous Claude models were forgiving. Write "try to extract all email addresses from the document if possible" and Opus 4.6 would cheerfully extract emails, ignoring the hedging language. It understood the intent behind your vague phrasing.
Opus 4.7 sees that same prompt differently. "Try to"? It'll try — and might stop trying if it encounters ambiguity. "If possible"? It evaluates whether extraction is feasible before committing. The hedges you sprinkled in for politeness are now parsed as conditional logic.
A real before/after:
Vague (worked fine on 4.6, unpredictable on 4.7):

```text
Try to extract all email addresses from the document if possible.
Also maybe include phone numbers.
```

Precise (works on both, but 4.7 actually requires it):

```text
Extract every email address in the document. Return a JSON array.
If none exist, return an empty array. Also extract all phone numbers
in the same format.
```
The second version was always the better prompt. On 4.6, the first one worked well enough that nobody bothered rewriting it. That safety net is gone.
Three Patterns That Break
The literal interpretation surfaces in three recurring ways I've seen since the release.
Quantified instructions get followed to the letter. "Write exactly 3 functions" on 4.6 might produce 4 if the model decided a helper was useful. On 4.7, you get 3. Period. Even when 4 would be more elegant. If you want "at least 3," write "at least 3."
Format instructions are absolute. "Respond in JSON" used to sometimes produce a short prose preamble before the actual JSON block — a friendly "Here's the data:" before the curly brace. On 4.7 you get raw JSON and nothing else. Excellent for API pipelines. Catastrophic for any downstream parser that expected that preamble.
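If a pipeline has to survive both behaviors during a migration window, a tolerant extraction step costs almost nothing. A minimal sketch in Python, using only the standard library (the preamble text is illustrative, and the greedy fallback regex is a convenience, not robust against JSON-like trailing text):

```python
import json
import re


def extract_json(response_text: str):
    """Parse a model response that may or may not carry a prose preamble.

    Handles both the 4.6-style "Here's the data: {...}" replies and the
    raw JSON that 4.7 returns.
    """
    try:
        # Fast path: the whole response is valid JSON (4.7 behavior).
        return json.loads(response_text)
    except json.JSONDecodeError:
        pass
    # Fallback: grab the first-to-last bracketed span in the text
    # (4.6-style preamble behavior).
    match = re.search(r"[\[{].*[\]}]", response_text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON found in response")
    return json.loads(match.group(0))
```

Both `extract_json('{"emails": []}')` and `extract_json('Here is the data: {"emails": []}')` return the same dict, so the parser stops caring which model generation produced the reply.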
Scope stays exactly where you drew it. Tell 4.7 to "fix the bug in the login function" and it fixes the bug in the login function. It will not refactor the adjacent validation code it spotted while it was in there. Opus 4.6 would have "helpfully" touched neighboring code. Whether this is a feature or a regression depends entirely on your workflow — but either way, it's a prompt-level decision now, not a model-level default.
The API Surface Got Smaller
Alongside the behavioral shift, Anthropic removed several API parameters that prompt engineers relied on as tuning knobs.
Requests that set temperature, top_p, or top_k to non-default values fail with a 400 error. You're supposed to use prompting to shape variation now — ironic, given everything above about prompts being taken literally.
Assistant prefills are also gone. You can no longer inject a fake assistant turn like "Sure, here's how:" to steer the response format. Use structured outputs or output_config.format instead. This also quietly killed the "sockpuppeting" jailbreak that was circulating last week, where attackers injected compliant prefills to bypass safety filters across 11 different models. Nice security side effect.
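The shape of that migration, sketched as request fragments. The output_config field name follows the description above; the "json" format value and everything else here is illustrative, not a verified schema:

```python
# Old pattern: a trailing assistant turn "prefills" the reply so the
# model continues from the open brace. Opus 4.7 rejects this shape.
legacy_messages = [
    {"role": "user", "content": "List the open ports as JSON."},
    {"role": "assistant", "content": '{"ports": ['},  # prefill steering
]

# New pattern: state the format explicitly instead of faking a turn.
# (Field names per the parameters described above; treat as a sketch.)
request = {
    "messages": [
        {"role": "user", "content": "List the open ports as JSON."},
    ],
    "output_config": {"format": "json"},
}
```

The point of the change: format control moves from conversational trickery to a declared request parameter, which is also why the prefill-based jailbreak stopped working.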
Extended thinking budgets (budget_tokens) got replaced by effort levels — a single effort parameter in output_config that ranges from low to max, with a new xhigh tier designed for agentic coding. The model now decides how much to reason based on that dial rather than a raw token count.
```python
# Old (4.6)
thinking={"type": "enabled", "budget_tokens": 32000}

# New (4.7)
thinking={"type": "adaptive"}
output_config={"effort": "xhigh"}
```
One more catch: the new tokenizer uses 1–1.35x more tokens for the same text. If your max_tokens was tuned tight, bump it by at least 20%. Anthropic recommends starting at 64k for xhigh effort and tuning down.
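Taken together, the parameter changes in this section reduce to a mechanical rewrite of request kwargs. A sketch of that rewrite as a hypothetical helper, `migrate_request_kwargs`: the 1.2x bump is the article's floor, and the budget-to-effort mapping (including the "high" tier) is an assumed illustration, not a published table:

```python
def migrate_request_kwargs(old: dict) -> dict:
    """Rewrite Opus 4.6-era request kwargs for the 4.7 API surface.

    - Drops temperature/top_p/top_k (non-default values now 400).
    - Replaces thinking budget_tokens with an effort level.
    - Bumps max_tokens ~20% for the denser tokenizer.
    """
    new = {k: v for k, v in old.items()
           if k not in ("temperature", "top_p", "top_k")}

    thinking = new.get("thinking", {})
    if thinking.get("type") == "enabled" and "budget_tokens" in thinking:
        budget = thinking["budget_tokens"]
        new["thinking"] = {"type": "adaptive"}
        # Rough budget-to-effort mapping; purely illustrative.
        new["output_config"] = {"effort": "xhigh" if budget >= 32000 else "high"}

    if "max_tokens" in new:
        new["max_tokens"] = int(new["max_tokens"] * 1.2)
    return new
```

Feeding it `{"max_tokens": 8000, "temperature": 0.3, "thinking": {"type": "enabled", "budget_tokens": 32000}}` yields adaptive thinking at xhigh effort with max_tokens raised to 9600, and the dead sampling knob gone.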
The Prompt Engineer's Actual Migration
Forget the official checklist for a minute. Here's what matters in practice:
Grep your system prompts for hedging language. "Try to," "if possible," "you might want to," "consider doing" — all of these are conditional instructions now instead of polite suggestions. Rewrite them as direct imperatives or delete them.
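That grep can be a shell one-liner, but a small script makes the sweep repeatable across a prompt directory. A sketch; the phrase list below is a starting point drawn from the examples above, not an exhaustive inventory:

```python
import re

# Hedges that Opus 4.7 parses as conditional logic rather than politeness.
HEDGES = [
    r"\btry to\b",
    r"\bif possible\b",
    r"\byou might want to\b",
    r"\bconsider\b",
    r"\bmaybe\b",
]
HEDGE_RE = re.compile("|".join(HEDGES), re.IGNORECASE)


def find_hedges(prompt: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_phrase) for every hedge in a prompt."""
    hits = []
    for lineno, line in enumerate(prompt.splitlines(), start=1):
        for match in HEDGE_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Run it over every system prompt in your repo and rewrite each hit as a direct imperative, or delete it.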
Remove all reasoning scaffolding. "Think step by step," "plan before acting," "reason carefully" — these are redundant. The effort parameter handles this at the API level. You're paying tokens for instructions the model already follows.
Kill forced status messages. "After each tool call, summarize what you did" — unnecessary. Opus 4.7 provides progress updates natively in agentic traces. Removing this saves tokens and avoids the model producing stilted, formulaic summaries.
Test your edge cases at low effort. The literalism is most aggressive at low and medium effort. If you have prompts that need to work across effort levels, the lowest setting is your compatibility floor.
What This Actually Means
What Opus 4.7 exposed is how much of "prompt engineering" was actually model coddling. We wrote vague instructions and counted on the model to divine our intent. We padded prompts with reasoning scaffolding because older models couldn't reason well on their own. We fiddled with temperature because the API let us.
That entire surface area is shrinking. The model is smarter, so it needs less hand-holding. But because it's smarter, it also takes you at your word. Say something ambiguous and you get an ambiguous result — delivered with perfect confidence.
Every model generation will be more literal, more capable, and less forgiving of sloppy instructions. The prompts that survive are the ones that were precise all along. The ones riding on model charity will keep breaking with every upgrade.
Write what you mean. The model finally does.