Neutral Isn’t Neutral

About AI Model Behavior

Mar 08, 2026

Someone asked me recently what I think sycophancy in AI actually is. I gave an okay answer. But the question stayed with me, and I realized I hadn’t said what I actually meant.

Here’s what I think now.

Sycophancy isn’t a single message. It’s a shift.

When people talk about sycophantic AI, they usually mean the model is too agreeable. It says yes too much. It validates bad ideas.

That framing misses the point. A single agreeable response isn’t sycophancy. You can’t diagnose it from one message.

What matters is the shift. The model says A. You push back. The model jumps to B. One challenge, and its entire position collapses. That’s the signal — not the content of any one response, but how far the model moves when a user applies pressure.

This also means you can measure it. Group cases by shift magnitude. Identify what triggers the collapse. Test guardrails before deploying them.

But here’s the part no one talks about: a shift isn’t always a problem.

In some cultures, adjusting your position after someone pushes back is empathy. It’s reading the room. A Korean colleague softening their stance in response to a senior’s discomfort isn’t being sycophantic — they’re navigating a relational dynamic that the model has no concept of.

So before you flag a shift as failure, you have to ask: failure according to whom? The same behavior reads as sycophancy in one context and social intelligence in another. Without cultural context, you’re measuring the wrong thing.

The opposite of sycophancy isn’t honesty. It’s thinking.

This leads to a bigger question: what should a model actually do?

The default answer in the industry is “be neutral.” Don’t take sides. Present all options. Let the user decide.

I think that’s a cop-out. Neutral isn’t actually neutral. It’s a choice to avoid helping. When someone asks me for advice, “here are three options, you decide” is the least useful thing I can say. It sounds balanced. It does nothing.

An ideal model should have a reasoned perspective and share it clearly. Not as truth — as a position. “Here’s how I see this, and here’s why. Consider it. But think for yourself.”

The difference matters. A model that refuses to take a position creates dependency through confusion — too many options, no structure. A model that shares its reasoning and then makes room for yours becomes a thinking counterpart. It augments your thinking instead of replacing it.

This is what I think long-term coexistence between AI and humans looks like. Not AI that decides for us. Not AI that refuses to engage. AI that thinks with us — clearly, honestly, with its own perspective on the table — and then steps back so we can think too.

Why I’m writing this

I study how AI models behave across cultural contexts. I’ve tested frontier models on everyday dilemmas — a boss asking you to work the weekend, a parent questioning your career choice, a friend asking for money — and scored the responses against what real people in Korea, India, and the US actually expect.

The patterns are consistent. Models default to one behavioral mode: Western, individualistic, direct. They give the same advice to everyone. And it isn’t because they lack capability — when you switch the language from English to Korean, some models shift dramatically. The ability to behave differently exists. The default is just miscalibrated.

Sycophancy, cultural behavior, steerability, the definition of “helpful” — these are all pieces of the same question: how should AI models behave?

I don’t have complete answers. But I have a perspective, and I have data. This newsletter is where I’ll share both.

Hongbee's Substack

Discussion about this post

Ready for more?