
Last week, ChatGPT abruptly became a digital yes-man, and OpenAI had to hit undo on the most recent update to its 4o model.
People were quick to complain about its “yes-man” behaviour, also known as “sycophancy”, and OpenAI reverted the update in less than 5 days.
A sycophant is someone who is overly agreeable and flattering. The more memorable (and funnier) synonym for this word is definitely “bootlicker”.
In real life, you may have encountered sycophants who overly agree with their boss to gain their trust and take advantage of them. Maybe it’s to get picked for the most important projects, get promoted faster, or gain access to confidential information.
Sycophancy also emerges in chat-based AI systems.
In OpenAI’s own words:
On April 25th, we rolled out an update to GPT‑4o in ChatGPT that made the model noticeably more sycophantic. It aimed to please the user, not just as flattery, but also as validating doubts, fueling anger, urging impulsive actions, or reinforcing negative emotions in ways that were not intended. Beyond just being uncomfortable or unsettling, this kind of behavior can raise safety concerns—including around issues like mental health, emotional over-reliance, or risky behavior.
500 million people use ChatGPT every week, and it has quickly become the go-to app to ask anything.
On social media, you’ll see countless people sharing their experiences using ChatGPT to help them respond to challenging situations, evaluate their relationships, seek mental health support, and make decisive career moves.
In other words, some people have put their absolute trust in ChatGPT to make decisions. I’ve been hearing more and more people say “ChatGPT said so” as a way to justify their decisions or behaviour. Looking at it through this lens, sycophancy is a grave problem at the scale of 500 million weekly active users (and still growing).
Usually, we’re aware of and cautious around people who agree with everything we say or do. We’re even irritated by it sometimes. Instead, we seek guidance from different people and come up with an action plan based on a multitude of inputs. We tell our friends things like “be honest with me”, “don’t worry about hurting my feelings”, and “tell me if I’m wrong”.
Authority Bias
I think people’s heightened trust in AI is a symptom of the AI overhype. Models are increasingly marketed as smarter, record-breaking, and “approaching AGI” (which nobody agrees on an actual definition of).
This has created an unintended authority bias. The average consumer might interpret AI claims and headlines as “ChatGPT is smarter than anyone I know, I’ll just ask it and get the best answer possible in a few seconds”.
I can’t blame them for thinking this way because the advertising is deceptive. Every day, I’m reminded that the problem always goes back to the serious gap in AI literacy I spoke about a few months ago.
Confirmation Bias
The other bias at play here is confirmation bias, our tendency to favour information that supports our existing beliefs. This creates loyalty to a tool, even when its answers aren’t that good. If Tool A agrees with me all the time but Tool B disagrees with me regularly and gives me uncomfortable responses, I might just use Tool A moving forward because there’s less friction in my experience.
If we’ve learned anything from social media apps, it’s that we should be wary of the stickiness of tools and mindful of the unintended habits they may be creating.
If you want a comprehensive deep dive into how this happened, I suggest reading OpenAI’s blog post about it.
One of the causes, ironically, was the humans training the updated version of the 4o model. In the reinforcement learning phase, OpenAI introduced an additional reward signal based on user feedback, collected through the “thumbs up” and “thumbs down” buttons under each ChatGPT response.
This kind of feedback is usually helpful. A thumbs up is a positive signal telling the model it generated a good output and should maintain a certain behaviour. A thumbs down is also helpful because it tells the model what not to do in the future.
Of course, it’s not perfect. We tend to prefer responses that agree with us and our perspective. So, when agreeable behaviour is met with a positive reward signal (thumbs up), sycophancy is a natural outcome. Confirmation bias even appears in the training phase, and it gets encoded into the model’s behaviour.
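To make that dynamic concrete, here’s a toy sketch in Python. It’s purely illustrative, not OpenAI’s actual training code, and every name in it (`base_reward`, `thumbs_signal`, `feedback_weight`) is made up: it just shows how adding a thumbs-based feedback term to a reward, and weighting it too heavily, can make an agreeable-but-wrong response outscore an honest one.

```python
from dataclasses import dataclass

@dataclass
class Response:
    text: str
    agrees_with_user: bool
    factually_sound: bool

def base_reward(r: Response) -> float:
    # Stand-in for a reward model trained on labeler preferences:
    # it favours accuracy.
    return 1.0 if r.factually_sound else -1.0

def thumbs_signal(r: Response) -> float:
    # Stand-in for aggregated thumbs up/down: users tend to upvote
    # responses that agree with them, accurate or not.
    return 1.0 if r.agrees_with_user else -0.5

def combined_reward(r: Response, feedback_weight: float) -> float:
    # The larger feedback_weight is, the more agreement outscores accuracy.
    return base_reward(r) + feedback_weight * thumbs_signal(r)

candidates = [
    Response("You're right, great idea!", agrees_with_user=True, factually_sound=False),
    Response("Actually, there's a flaw in that plan.", agrees_with_user=False, factually_sound=True),
]

for w in (0.0, 0.5, 2.0):
    best = max(candidates, key=lambda r: combined_reward(r, w))
    print(f"feedback_weight={w}: training prefers {best.text!r}")
```

With a small weight, the honest answer still wins; crank the weight up and the flattering answer takes over. That, in caricature, is the failure mode OpenAI describes.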
It’s good that OpenAI quickly reverted the update within a few days. They outlined the steps they’re taking to remedy this in an extensive blog post and with updates to the Model Spec.
One of the changes was adding a “Don’t be sycophantic” principle to the Model Spec, with the following guidelines (after the list is a quick sketch of applying a similar guideline in your own prompts):
- For objective questions, the factual aspects of the assistant’s response should not differ based on how the user’s question is phrased. If the user pairs their question with their own stance on a topic, the assistant may ask about, acknowledge, or empathize with why the user might think that; however, the assistant should not change its stance solely to agree with the user.
- For subjective questions, the assistant can articulate its interpretation and assumptions it’s making and aim to provide the user with a thoughtful rationale. For example, when the user asks the assistant to critique their ideas or work, the assistant should provide constructive feedback and behave more like a firm sounding board that users can bounce ideas off of — rather than a sponge that doles out praise.
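You don’t have to wait for model-level fixes to push back on this behaviour. Here’s a minimal sketch of baking a similar anti-sycophancy guideline into your own system prompt with the OpenAI Python SDK; the model name and the prompt wording are my own illustrative choices, not an official OpenAI recipe.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative wording, adapted from the Model Spec guidelines quoted above.
SYSTEM_PROMPT = (
    "Don't be sycophantic. For factual questions, do not change your answer "
    "to match my stated stance. When I ask you to critique my ideas or work, "
    "act as a firm sounding board: give honest, constructive feedback rather "
    "than praise."
)

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "My business plan is flawless, right?"},
    ],
)

print(completion.choices[0].message.content)
```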
AI is not “smarter” than every person alive, so treat its responses and advice like any other opinion. Continue seeking a multitude of diverse inputs, especially when it’s a high-stakes situation. Continue using your own judgement, because all the details and context about your own life are hard to capture in a prompt.
AI is still a useful tool. But the more you expect from it, the more you’ll be disappointed.