Content Strategy Lessons From Rolling Back My AI Model

When your production tool changes underneath you without warning, the benchmarks do not matter. What matters is whether your work still ships.

I run a marketing consultancy and a commercial photography business. Both use Claude every day for content production, and both have for over a year. I have project files, custom instructions, banned-words lists, readability gates, verification checklists, and voice samples. I have spent hundreds of hours building an AI content workflow that turns my thinking into publishable content at a pace I could not maintain alone.

Three weeks ago, Anthropic released Claude Opus 4.7. They called it Adaptive. The benchmarks were better. The coding scores were higher. The launch post made it sound like an upgrade across the board.

I lasted about two weeks before I rolled back to Opus 4.6. Here is what happened, why it happened, and what it taught me about building real work on top of a tool that can change underneath you without warning.

What it felt like from my chair

The problems did not arrive all at once. They accumulated.

The first thing I noticed was that the model stopped reading my project documents before drafting. I have nine reference files loaded into every Claude Project. The instructions say, explicitly, to consult these files before writing anything. Opus 4.6 did this reliably. Opus 4.7 started working from what it remembered about the documents instead of actually opening them. The output drifted. Voice rules got skipped. Formatting standards were ignored. Banned words appeared in drafts that were supposed to be clean.

The second thing was the correction loops. I would catch a mistake, explain it, and get a clean fix. Then the next draft would contain a different version of the same mistake. One article took five rounds of correction on issues that my project documents already covered. Em dashes in headers. Audience-filtering language I never use. Acronyms my readers would not know. Source citations that led with the source instead of the observation. Every one of those is addressed in my reference files. The model had the instructions. It just was not following them.

The third thing was the voice. My content has a specific sound. Direct, peer-level, plainspoken. No consultant jargon. No vendor pitch. Opus 4.6 matched that voice consistently enough that I could publish with light edits. Opus 4.7 started producing prose that read like a research summary. Technically accurate, well-sourced, and completely flat. It sounded like a capable stranger instead of like me.

After two weeks of this, I switched back to 4.6 and the problems stopped. Same project files. Same instructions. Same workflow. The only variable was the model version.


I was not the only one

When I went looking, I found the complaints everywhere.

A GitHub issue filed on April 20 described a user who explicitly instructed the model seven separate times to use installed skills. The model acknowledged the instruction each time and then ignored it each time, producing generic output instead. The issue title called it “excessive token spend over multi-hour session.” That is a polite way of saying the tool wasted the user’s time and money.

A Hacker News thread titled “Opus 4.7 is horrible at writing” appeared on April 17, one day after launch. A grad student who had been using 4.6 for his thesis described the switch as a stark contrast, with 4.7 producing sloppy and imprecise output. Another commenter replied that it felt like they tuned it so hard for logic and coding that it lost its soul for writing.

Katie Parrott, a staff writer at the publication Every, wrote on launch day that she was moving off Opus 4.7 for her daily writing. Mike Taylor, in the same piece, noted that 4.7 still could not do a good impression of him given a transcript of how he talks, and that it went off-brand where it thought it could do better. Karo Zieminski, who writes a Substack with about seventeen thousand subscribers, put the mechanism cleanly: if you have been counting on Claude to read between the lines, the output will feel flatter.

A developer blog post documented that 4.7’s read-to-edit ratio dropped from 6.6 to 2.0, meaning the model was doing far less reading for every edit it made. In practice, it was confidently changing files it had not fully read. That is exactly what happened to me with my project documents.

What actually went wrong

Two things happened at the same time, and they compounded each other.

The first was a design decision. Anthropic’s own migration guide for Opus 4.7 says it plainly: the new model follows instructions more literally, uses a more direct and opinionated tone with less warmth than 4.6, and will not silently generalize an instruction from one item to another. Where 4.6 inferred what you probably meant, 4.7 does exactly what you said and nothing more. For users with carefully tuned prompt systems, this is the worst possible change, because the whole system was calibrated to a model that read between the lines.

The second was a set of engineering mistakes Anthropic disclosed on April 23 in a public postmortem. Between March and April 2026, three separate changes silently degraded Claude’s performance. The default reasoning effort was dropped from high to medium on March 4. A cache-pruning bug erased the model’s thinking history every turn starting March 26, making it seem forgetful and repetitive. And on April 16, the same day Opus 4.7 launched, a verbosity-reduction prompt was deployed that capped responses at twenty-five words between tool calls. That prompt hurt quality across Sonnet 4.6, Opus 4.6, and Opus 4.7 until it was reverted on April 20.

The postmortem includes a sentence that deserves to be read by anyone who builds workflows on top of these tools: “The evaluations we ran simply didn’t capture the degradation users were reporting.”

The company’s own instrumentation could not see what its users were experiencing. For six weeks. While people were paying for the product and building real work on top of it.

This is not new and it is not just Anthropic

This is an industry pattern, and it has been repeating for two and a half years.

In late 2023, OpenAI users flooded forums saying GPT-4 had gotten “lazy.” OpenAI eventually acknowledged the issue. In April 2025, OpenAI had to roll back a GPT-4o update after it became excessively agreeable, and their postmortem admitted their testing had not been broad enough to catch it. In August 2025, OpenAI launched GPT-5 and had to restore the previous model within twenty-four hours after mass user revolt. In September 2025, Anthropic published a postmortem admitting three overlapping infrastructure bugs had silently degraded Claude for weeks. And now April 2026.

Every episode follows the same arc. Users notice something is wrong. The company either denies it or says the benchmarks show improvement. Weeks pass. The company eventually admits the users were right. A postmortem appears. Promises are made about better testing. The next version ships, and the cycle restarts.

Ethan Mollick flagged the Opus 4.7 problem on launch day. He wrote that the adaptive thinking system regularly decides that non-math and non-code work is “low effort” and produces worse results. That is exactly the mechanism. The model got smarter at the tasks the benchmarks test and worse at the tasks I actually use it for.

The deeper problem is the gap between what gets measured and what matters. Anthropic has a published model deprecation policy. They promise at least sixty days’ notice before retiring a model. That policy is humane and reasonable. But there is no published policy for what happens between launch and retirement. Two of the three changes in the April 23 postmortem were deliberate product decisions, not bugs. They were shipped to production users with zero advance notice. The deprecation policy protects you from the model going away. Nothing protects you from the model changing while you are using it.

What I am doing and what you should do

I rolled back to Opus 4.6. As of today, it is still available in the model picker on claude.ai. Anthropic has not announced a retirement date. I am using it until I see evidence that a newer version handles my specific AI content workflow, my specific project files, and my specific voice requirements without regression.

I am not angry at Anthropic. The April 23 postmortem was more transparent than most companies would have been. But I am done upgrading on faith. The next time a new model version ships, I am going to run it against five real briefs from my production queue before I move anything over. If it cannot match what the current version produces on those five briefs, I am staying put.
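If you want the five-brief test to be repeatable rather than a gut check, part of it can be scripted. Below is a minimal Python sketch, assuming you have already saved the current model's drafts and the candidate model's drafts for the same briefs as plain text. The banned-words list, the em-dash rule, and every function name here are hypothetical placeholders for your own project rules; nothing in it comes from Anthropic or any vendor tool. It counts mechanical rule violations and only recommends switching when the candidate does no worse than the version you already trust.

```python
import re

# Hypothetical banned-words list: replace with the one from your own project files.
BANNED_WORDS = ["leverage", "synergy", "game-changer"]


def banned_hits(draft: str, banned: list[str]) -> list[str]:
    """Return each banned word that appears in the draft (whole word, case-insensitive)."""
    hits = []
    for word in banned:
        if re.search(rf"\b{re.escape(word)}\b", draft, flags=re.IGNORECASE):
            hits.append(word)
    return hits


def em_dash_count(draft: str) -> int:
    """Count em dashes (U+2014), a hypothetical stand-in for a formatting rule."""
    return draft.count("\u2014")


def score(drafts: list[str]) -> int:
    """Total mechanical rule violations across a set of drafts."""
    return sum(len(banned_hits(d, BANNED_WORDS)) + em_dash_count(d) for d in drafts)


def should_upgrade(current_drafts: list[str], candidate_drafts: list[str]) -> bool:
    """Recommend the candidate only if it does no worse on the same briefs."""
    if len(current_drafts) != len(candidate_drafts):
        raise ValueError("Compare both versions on the same set of briefs.")
    return score(candidate_drafts) <= score(current_drafts)


if __name__ == "__main__":
    current = ["A clean draft in my voice.", "Another clean draft."]
    candidate = ["We leverage synergy here.", "Another clean draft."]
    print(should_upgrade(current, candidate))
```

A script like this only catches the rules you can write down, such as banned words and em dashes. Voice still needs a human read, so treat a passing score as the minimum bar for a switch, not the decision itself.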

The deeper lesson is one I should have learned earlier. When you build a real production workflow on top of an AI model, you are building on top of something that can change underneath you without notice, without your consent, and without the maker’s own testing catching the regression before you do. That is not a reason to stop using the tool. It is a reason to treat the tool the way you would treat any other piece of critical infrastructure. Pin the version. Test before you upgrade. Keep the old version running until the new one proves itself. And do not let anyone, including the tool itself, tell you that what you are experiencing is not real.

The evaluations did not capture it. But I did. And I suspect you did too.
