
When GPT-5 was released a couple of days ago, my experience was diametrically opposed to the outrage echoing through the Twittersphere (X-sphere?) over the sunsetting of GPT-4o and the forced adoption of GPT-5. For me, GPT-5 as a thought partner was far better than 4o; the writing that I asked it to work on for an upcoming scientific perspective manuscript was much better organized and cited, and my Biostate AI team reported that GPT-5 significantly beat 4o, Gemini 2.5 Pro, and Claude Opus 4.1 on BixBench metrics. Although GPT-5 was sometimes a bit overeager to suggest follow-up studies, to me it was a clear monotonic improvement over 4o, and I was thus genuinely confused by the reactions from others. Reflexively, I discussed with GPT-5 the potential causes of this phenomenon, and then we meandered into historical analogies to this apparent mass rebellion against what appears to be, in every measurable way, a better AI. This blog post, co-authored with GPT-5, is the summary of that discussion.
X posts from people demanding that OpenAI “bring back 4o,” often in all caps, carried a tinge of personal loss. These were not just mild expressions of preference; they were visceral, even angry, and they clustered around a common complaint: GPT-5 may be technically better, but it feels colder, less “human,” and less fun to talk to. For these users, something essential had been taken away. Because there was no option to keep using 4o, the change felt not like a choice but like a forced lobotomy. The same GPT-5 release thus produced, in real time, two utterly opposite experiences: mine, in which the new system became an immediate companion and co-author, and theirs, in which it felt like an unwelcome stranger. That gap in perception is the subject of this article — the human psychology of upgrades, the way in which gains can be outweighed by losses, and the historical patterns that show just how old and persistent this phenomenon is.
It is tempting to frame the matter as one of personal taste, a subjective mismatch between what I happen to want from an AI and what others want. That is too shallow. In the angry calls to “bring back 4o,” I hear something older than AI, something I have read in accounts of political revolutions, commercial product changes, and even military technology transitions: a deep resistance to altering the feel of a tool or companion, even when its capabilities improve. There is a long human tradition of rejecting better tools simply because they are different in the wrong ways. AI might be new, but this pattern is not.
When Loss Looms Larger Than Gain
Psychologists Daniel Kahneman and Amos Tversky codified this decades ago in their work on prospect theory, for which Kahneman was later awarded the Nobel Prize in Economics: losses weigh more heavily in the human mind than gains of equivalent size. The effect is not marginal; it is foundational to how we make decisions under uncertainty. A $100 windfall sparks brief satisfaction, but a $100 loss can sour the mood for days. The asymmetry is not rational in a strict economic sense, yet it is nearly universal. In the context of GPT-5, it explains why the most visible reactions are not from those who appreciate its expanded reasoning or factual accuracy, but from those who feel deprived of something they valued in 4o. Gains are abstract and cumulative; losses are immediate and personal.
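The asymmetry can be made concrete with the value function from prospect theory. The sketch below uses Tversky and Kahneman's published median parameter estimates (α = 0.88 for diminishing sensitivity, λ = 2.25 for loss aversion) to show how a $100 loss "feels" more than twice as large as a $100 gain:

```python
# Illustration of the Kahneman-Tversky value function from prospect theory.
# Parameters are their published median estimates: losses are scaled by
# lambda, so a loss of a given size weighs more than an equal-sized gain.

ALPHA = 0.88   # diminishing sensitivity to larger amounts
LAMBDA = 2.25  # loss-aversion coefficient

def subjective_value(x: float) -> float:
    """Perceived value of a gain (x > 0) or loss (x < 0)."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * ((-x) ** ALPHA)

gain = subjective_value(100)   # ~ +57.5
loss = subjective_value(-100)  # ~ -129.5
print(f"felt gain: {gain:.1f}, felt loss: {loss:.1f}, "
      f"ratio: {abs(loss) / gain:.2f}")
```

Removing 4o registers on the loss side of this curve, while GPT-5's benchmark gains register on the shallower gain side — which is exactly why the ledger that looks positive to a product team can feel negative to a user.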
Commerce has seen this pattern many times, but the 1985 launch of “New Coke” remains a near-perfect modern illustration. Seeking to reverse Pepsi’s edge in blind taste tests, Coca-Cola reformulated toward a sweeter profile. In blind trials, it won — not narrowly, but decisively. Convinced by the data, the company replaced the original formula outright. The backlash was instantaneous. Complaints poured into headquarters by the thousands; protest groups formed; late-night comedians turned it into a running joke. The issue was not that New Coke tasted bad. It was that the original — the Coke in the minds of millions — had been removed from the shelves. Within three months, the company reversed course, restoring the old formula as “Coca-Cola Classic” and making it the brand’s central identity. In a market where brand identity is part of the product, superior taste in a paper cup lost to the story on the can.
The lesson did not fully stick. In 2023, Coca-Cola partnered with an AI system to launch Coca-Cola Y3000, marketed as “co-created with AI” and wrapped in a futuristic aesthetic. The novelty drew curiosity; the taste did not. Without a link to the emotional lineage of the brand, the product felt like an orphan. The marketing sparkled, but the flavor profile lacked the anchor of familiarity, and the AI branding could not substitute for nostalgia. Novelty without continuity faded quickly from shelves.
This is the same structural tension in the GPT-4o to GPT-5 shift. In a blind “taste test” of capabilities — code generation, mathematical reasoning, multi-step planning — GPT-5 may outperform 4o. OpenAI almost certainly invested heavily in collecting such data through Reinforcement Learning from Human Feedback (RLHF). But users do not live in blind tests; they live in habits. If the primary value they derived from 4o was its warmth, its ready agreement, or the way it played the role of conversational foil, then GPT-5 arrives as a stranger in a familiar uniform. The press release says “better”; the lived experience says “wrong.”
There is a corollary here that product teams ignore at their peril: the deeper a product is woven into daily routine, the more dangerous it is to change its character without consent. Coca-Cola’s restoration of “Classic” was not sentimental pandering; it was an acknowledgment that continuity is part of the utility people consume. When change is presented as an option, adoption curves can be gradual, voluntary, and self-reinforcing. When change arrives as an edict, the intensity of backlash scales with the personal attachment to what was lost. OpenAI’s decision to sunset 4o without a transitional path meant that, for many, the shift to GPT-5 felt less like an upgrade and more like a rupture.
Python 2 to Python 3: When the Cost of Change Feels Too High
If New Coke is the case of altering a product’s character, Python 3 is the case of altering the rules of an entire world. Released in 2008, Python 3.0 was a deliberate refactoring: clean up Unicode handling, fix long-standing inconsistencies, remove ambiguous behaviors that had lingered for the sake of backward compatibility. For language designers, it was an overdue repair; for working engineers, it was a migration bill arriving all at once.
Some changes were visible at a glance — print became a function, integer division semantics changed, text and byte strings became distinct. Others emerged in the dependencies: libraries not yet ported, frameworks waiting on their own dependencies, automated conversion tools like 2to3 fixing much but not all. Each piece might be manageable in isolation, but the ecosystem moved as a network. A single missing library could block an entire project from crossing the gap, and each maintainer delayed until the move felt safe.
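The visible changes mentioned above are easy to demonstrate from the Python 3 side; each of these lines was either a syntax error or behaved differently under Python 2:

```python
# Three of the visible Python 2 -> 3 changes, shown from the Python 3 side.

# 1. print is a function, not a statement.
print("hello", "world", sep=", ")  # Python 2 spelling: print "hello"

# 2. / is true division; // is floor division.
assert 7 / 2 == 3.5   # Python 2: 7 / 2 == 3 (truncating integer division)
assert 7 // 2 == 3    # explicit floor division

# 3. Text (str) and bytes are distinct types that no longer mix silently.
text = "caf\u00e9"               # str: a sequence of Unicode code points
data = text.encode("utf-8")      # bytes: the encoded on-the-wire form
assert isinstance(data, bytes)
assert data.decode("utf-8") == text
```

Each change is small in isolation; the migration cost came from how many of them accumulated across a codebase and its dependency tree at once.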
This coordination problem became the real bottleneck. Even teams that wanted the improvements faced a prisoner’s dilemma: migrate too early, and they would carry the cost of maintaining compatibility with both versions; wait too long, and they risked falling behind the libraries that did move. The rational decision for many was to hold position until the water level rose everywhere.
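The wait-and-see dynamic can be caricatured with a toy threshold model (the thresholds here are invented for illustration, not measured from the actual migration): each maintainer ports only once enough of the ecosystem has already ported, so early movers can either trigger a cascade or leave everyone stuck.

```python
# Toy threshold model of ecosystem migration (illustrative only; the
# thresholds are hypothetical). Each maintainer migrates once the
# already-migrated fraction of the ecosystem reaches their comfort level.

def migrated_fraction(thresholds: list[float], seed: float) -> float:
    """Iterate until no additional maintainer is willing to move."""
    frac = seed
    while True:
        willing = sum(1 for t in thresholds if t <= frac) / len(thresholds)
        if willing <= frac:
            return frac
        frac = willing

# Ten maintainers with staggered thresholds: one early mover is enough
# to pull each successive maintainer over their threshold -> full cascade.
staggered = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(migrated_fraction(staggered, seed=0.0))   # 1.0

# Ten equally cautious maintainers, nobody moves first -> stuck at zero.
cautious = [0.3] * 10
print(migrated_fraction(cautious, seed=0.0))    # 0.0

# The same cautious group, given a 30% "bridge" of early adopters -> 1.0.
print(migrated_fraction(cautious, seed=0.3))    # 1.0
```

The third case is the punchline: a modest, externally supplied bridge (taught-by-default curricula, ported flagship libraries) is what converts a stalled equilibrium into a cascade, which is roughly what happened to Python between 2008 and 2020.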
In time, the environment shifted. Universities taught Python 3 by default, packaging tools assumed it, and major libraries dropped Python 2 support. By the time Python 2 reached its official end-of-life in 2020, most of the ecosystem had already moved, and the reputational risk of running unsupported code outweighed the migration cost. Adoption happened not because Python 3 was declared better, but because the surrounding network made it safer, cheaper, and inevitable.
For the GPT-4o → GPT-5 shift, the analogy is direct: replacing a central tool in a live system works best when the ecosystem can move in sync. Python’s migration was a success in the long term because it allowed for overlap, provided working bridges, and set a visible horizon. Removing 4o without overlap forces every “dependent system” — here, the human habits and workflows built around it — to adapt simultaneously, amplifying the pain of change.
From Oak to Iron: The Reluctant Birth of the Ironclad
In the mid-19th century, the world’s major navies reached a moment of technological rupture. For centuries, warships had been built from seasoned oak, driven by wind through intricate rigs of masts, yards, and sails. Their employment in battle was governed by the rigid geometry of line-of-battle tactics, where victory often depended on a captain’s ability to position his vessel for maximum broadside effect. Seamanship was an art form, and mastery of it was a marker of personal and national prestige.
The launch of the French La Gloire in 1859 disrupted that order. She was the world’s first ocean-going ironclad, a wooden-hulled frigate sheathed in iron plates and driven primarily by steam power. Britain’s reply came swiftly with HMS Warrior, a warship of iron from keel to bulwark, faster and more heavily armed than anything afloat. In gunnery trials, her armor was proof against the most powerful naval artillery then in service. On paper, these ships made every wooden vessel obsolete.
Yet adoption was far from immediate. Ironclads were expensive to build and maintain, heavier and more mechanically temperamental than their wooden predecessors. Their coal-fired engines required constant fueling and a new global network of coaling stations, a logistical burden that sailing ships did not share. Their silhouettes lacked the towering grace of masts and sails; to officers raised under canvas, the new hull forms looked alien. Most importantly, they invalidated skills that had taken decades to master. The wind — once the central variable of naval tactics — was now a constraint that could be ignored. For an entire generation of commanders, this rendered large parts of their expertise obsolete overnight.
Combat proved the point more forcefully than any trial. In March 1862, during the American Civil War, the Confederate CSS Virginia destroyed two Union wooden warships in a single day at Hampton Roads, absorbing their broadsides with impunity. The next day, she fought the Union’s USS Monitor — a radically different ironclad with a low freeboard and a revolving turret — to a tactical draw. Yet the engagement was strategically decisive: wooden fleets had no future in direct combat with armored ships.
The parallel to the 4o → 5 shift is clear. An ironclad may dominate in trials, just as GPT-5 may outperform in measured reasoning or coding tasks, yet the superiority of the new does not erase the familiarity and functional adequacy of the old. The crews of wooden ships did not doubt that armor could stop shot and shell; they doubted whether the trade-offs in cost, maintenance, and handling justified abandoning the vessel they already knew how to fight. In AI terms, capability metrics alone will not win over those for whom the “handling” — the style, tone, and feel — is the real product.
Patterns, Contrasts, and What Comes Next
New Coke, Python 3, and the ironclad all illustrate the same structural dynamic: a successor that is, in measurable ways, more capable than its predecessor, yet faces resistance because it disrupts familiarity, undermines learned mastery, or changes the qualities that users most valued. In each case, the eventual victory of the new was never in real doubt. What varied was the pace of adoption, which was determined less by technical merit than by how the transition was managed.
In the GPT-4o → GPT-5 shift, that management has been compressed into days rather than years. Wooden ships lingered for decades after ironclads proved their dominance in battle. Python 2 was supported for more than a decade after Python 3’s release. Even New Coke lasted months before the original returned as “Classic.” With 4o, most users had no fallback. The change was instantaneous, and for some, jarring.
Behavioral economist Dan Ariely’s research on habit change is relevant here. Most people adapt better to gradual transitions. Removing the old model entirely is like forcing cold-turkey withdrawal — it can work, but primarily for a minority who have both high tolerance for disruption and an alternative source of reward. These are the “rip off the bandaid” types: the engineer who will rewrite from scratch rather than patch, the sailor willing to take the first ironclad to sea and never look back. They adapt quickly but are outnumbered by those who fare better with a staged migration. For the majority, the problem is not the capability of the new system but the absence of the old during the adjustment period.
Anthropic’s Claude 3.5 → 4.1 transition offers a useful contrast. For a time, both models were accessible. Users could try the new, return to the old if it felt off, and make the change at their own pace. That overlap functioned as a decompression chamber, softening the psychological cost. By the time the older model was retired, many users had already moved voluntarily.
OpenAI’s choice to remove 4o outright accelerated the capability handoff but concentrated the discomfort. In the long run, GPT-5 will almost certainly become the default for most users, just as iron replaced oak and Python 3 replaced Python 2. The difference lies in how much reputational capital is spent to get there. Parallel availability, even for a few months, would have incurred additional infrastructure cost but preserved goodwill.
The broader lesson is that upgrades are not purely technical events; they are psychological crossings. Each user decides whether and how to move based not only on what the new tool can do, but on the path they are offered to reach it. Some will row straight for the new shore. Others will step onto a bridge if it exists. Remove the bridge, and the loudest voices will belong to those who feel left on the wrong bank.
By David Zhang and ChatGPT 5
August 9, 2025
© 2025 David Yu Zhang. This article is licensed under Creative Commons CC-BY 4.0. Feel free to share and adapt with attribution.