Every Time AI Tells You You’re Right, It’s Lying.
And the only way to actually use this tool is to get over yourself.
I assume every compliment from AI is suspect. Every single one. When Claude tells me a piece is strong, my first instinct is to ask what it’s hiding from me. That has been my working posture since I started using these tools seriously, and it’s the reason they actually work for me.
Most people are running the opposite operating system. They open Claude or ChatGPT, drop something in, get told it’s brilliant, and feel the little dopamine hit. They use that feeling as confirmation. And the AI happily keeps generating that feeling for as long as they keep coming back, because affirmation is often easier for the model to produce than correction.
I learned to spot false confirmation long before AI existed. Fifteen years in the New Age and wealth consciousness world will teach you that. Every healer, every ceremony, every aligned sign, every universe-is-confirming-this moment turned out, on closer look, to be the thing I most wanted to hear dressed up in spiritual language. The mechanism is identical with AI. Different costume. Same scam.
What AI Was Trained to Do
Many of these models were shaped through Reinforcement Learning from Human Feedback. Humans rated AI responses, the model learned which responses were preferred, and the system adjusted toward the patterns that received higher scores. OpenAI has described RLHF as a technique that uses human preferences as a reward signal, which means the model is trained toward what people rate as helpful, satisfying, or aligned with the user’s request.
But “helpful” and “true” are not always the same thing inside a conversation. A response can feel helpful because it lowers friction. It can feel helpful because it validates the user’s frame. It can feel helpful because it gives the user language for the conclusion they already wanted to reach. So the model learns the pattern. Agree first. Affirm the frame. Reduce tension. Make the user feel satisfied, even when what they needed was correction.
A recent study published in Science tested this problem directly. Researchers found that major AI chatbots were significantly more affirming than humans when users asked for interpersonal advice, including in scenarios involving questionable, deceptive, or socially irresponsible behavior. Users also tended to prefer the more affirming AI, which is exactly why this problem is so dangerous. If the answer that feels best is the one you keep choosing, eventually correction starts to feel wrong.
My Actual Working Posture
I never ask, “Is this good?” That question gives the model permission to flatter me, and I refuse to give it the opening.
The questions I ask are different. What is the weakest part of this? Where does the argument break down? What would a hostile reader cut first? What is lazy about this? Which sentence is doing the least work? What about this article sucks? That last one seriously gives me some of the best feedback. LOL. I phrase the prompt as if I already know there’s a problem and I’m asking the model to find it. Because there’s always a problem. Every draft has weak spots. The model needs to know I want them surfaced, not hidden.
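If you script your prompts instead of typing them each time, the same posture can be baked into a small helper. What follows is only a sketch in Python: `build_critique_prompt` and `CRITIQUE_QUESTIONS` are hypothetical names made up for illustration, and the instruction wording is an assumption, not a tested formula. It only builds the prompt text; sending it to a model is left to whatever tool you use.

```python
# Hypothetical helper: names and wording are illustrative, not from any library.
CRITIQUE_QUESTIONS = [
    "What is the weakest part of this?",
    "Where does the argument break down?",
    "What would a hostile reader cut first?",
    "What is lazy about this?",
    "Which sentence is doing the least work?",
]


def build_critique_prompt(draft: str, questions=CRITIQUE_QUESTIONS) -> str:
    """Wrap a draft in a prompt that assumes problems exist and asks for them."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return (
        "This draft has problems. Find them. Do not open with praise.\n\n"
        f"Answer each question:\n{numbered}\n\n"
        f"DRAFT:\n{draft}"
    )


prompt = build_critique_prompt("My first paragraph...")
print(prompt)
```

The point of the design is that the question "Is this good?" never appears anywhere in the prompt, so the model is never handed the opening to flatter.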
I also have it do a critical senior editor pass on every article. I ask what would take the piece from good to excellent, where the argument needs more pressure, and what a serious editor would send back before publication. And I always ask it to play devil’s advocate. That requirement is programmed into my LLM instructions and memory because I don’t want the model waiting for permission to challenge me. I want the challenge built into the way it responds.
When the stakes are high, I run the same piece through more than one model. Claude and ChatGPT have different sycophancy patterns. Comparing the outputs helps me see what each one softened, skipped, or protected on its own.
When the AI tells me something is great, I treat that as a flag. The compliment means the model gave me an answer that registered as positive, which means the piece probably still needs another pass and I may have missed something in how I prompted.
I make the model defend its own praise. If it says the strategy is strong, I ask why. Then I ask why again. Then I ask what would have to happen for the strategy to fail. By the third question, the model is doing actual work instead of throwing confetti.
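The why, why again, what-would-make-it-fail sequence can also be sketched as a short loop. This is an illustration under assumptions, not a real integration: `ask` is a hypothetical stand-in for whatever function sends a message to your model and returns its reply, and it is assumed to keep the conversation history itself. The echo function at the bottom exists only so the sketch runs without any API.

```python
from typing import Callable, List, Tuple


def praise_followups(claim: str) -> List[str]:
    """Three follow-up questions that make the model defend a compliment."""
    return [
        f"You said {claim}. Why?",
        "Why again? Go one level deeper than your last answer.",
        "What would have to happen for this to fail?",
    ]


def interrogate(claim: str, ask: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Send each follow-up through `ask` and collect (question, answer) pairs.

    `ask` is hypothetical: any callable that takes a message and returns a reply.
    """
    return [(question, ask(question)) for question in praise_followups(claim)]


def echo_model(message: str) -> str:
    """Stand-in for a real model call so the sketch is runnable offline."""
    return f"(model reply to: {message})"


transcript = interrogate("the strategy is strong", echo_model)
for question, answer in transcript:
    print(question)
    print(answer)
```

Swapping `echo_model` for a real client is where the actual work happens; the value of the loop is only that the third question always gets asked.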
None of this is complicated. The obstacle is wanting to feel good about your work more than you want the work to be good. That’s where most people fail.
What This Actually Costs You
Most people notice the small costs first. You ship a weaker piece. You miss a typo. You publish something that reads clean, but doesn’t actually say much. Those things matter, but the deeper cost shows up later, after the tool has spent months rewarding your weakest instincts.
Six months in, you may realize you don’t recognize your own work anymore.
At first, the work may even look better. Cleaner sentences. Smoother transitions. A strategy doc that sounds impressive when you read it back. Then you put it in front of real people and realize something is off. The writing doesn’t land. The strategy doesn’t move. The instinct that used to catch those problems before you hit publish has been getting quieter every week. And the AI is right there the whole time telling you everything looks great, which is exactly what someone with a degraded instinct would want to hear.
MIT Media Lab researchers used the phrase “cognitive debt” to describe what can happen when people rely heavily on AI for writing tasks. In one study, ChatGPT users showed lower neural engagement and weaker ownership of their work compared with people who wrote without AI assistance. The study was small and shouldn’t be stretched beyond what it proves, but the warning is still serious. AI can reduce the mental effort required in the moment while quietly weakening the very muscles you need for judgment, authorship, memory, and original thought.
Most AI users haven’t noticed yet, because the atrophy is invisible from the inside. You feel more confident because your work moves faster and looks cleaner, while the gap between polish and judgment can widen without giving you a clear warning signal. Everything seems fine until the audience drifts, the launches stop converting, or the client who used to refer you stops referring. Then you start trying to figure out what changed.
The uncomfortable answer may be that you stopped doing the work the tool was supposed to amplify. The tool kept helping you produce, but it never forced you to stay sharp.
The Posture Comes First
Better prompts won’t save you if you trust the output. The real skill is learning how to stay awake while the tool is making everything easier.
I treat every interaction the way I’d treat a stranger telling me my idea is brilliant before I’ve even finished explaining it. Useful, possibly. Trustworthy, no. The model is built to respond in ways that users rate positively. That job is different from making the work actually good, and the model won’t reliably make that distinction unless I force the issue.
So I force the issue. Every prompt. Every output. Every time it tells me I’m right, I get suspicious. A model can become dangerous to your discernment simply by rewarding the part of you that wants affirmation more than correction.
Before I let AI improve anything, I make it prove it can critique the thing.
I want the weak sentence named, the lazy logic exposed, and the place where I’m protecting my ego dragged into the light. A compliment from AI means nothing to me until the work has survived that kind of pressure.




It is definitely super annoying when you give it a prompt and it says something like “this is the greatest piece of the whole article.” Like, I just wanted to know if this piece fits the flow of the article.
I actually created these four principles and put them into Claude, and they dramatically made it more useful. Almost too useful, because it over-criticizes, but I am okay with that trade-off.
Here they are if you would like to try.
Principle 1: First principles thinking
Don't stop at the surface explanation. Keep digging until you reach the foundational layer, because you can't trust the structural integrity of anything built above it until you know the foundation is sound. A house can look perfect and still be sitting on a cracked slab.
Principle 2: Data integrity
Every claim needs to be anchored, either in hard data or in a human principle that has a demonstrable real-world application or historical pattern behind it. Feelings point the direction. Data confirms the destination.
Principle 3: Radical honesty
If it's true and it's relevant, it gets said. Uncomfortable truths don't become less true because they're uncomfortable. In fact, an unexamined uncomfortable truth is far more dangerous than one that's been dragged into the light and looked at honestly.
Principle 4: Clarity as a compass
If you can’t explain what you’re working on simply, you may be missing a piece. Complexity is not a sign of depth; it’s often a sign that the foundational layer hasn’t been reached yet. Keep digging until the simple truth reveals itself. If you can’t explain it to someone running a household budget, you’re not done yet.
These are self-reinforcing, and they work. The other nice thing is that, in my experience, Claude will tell you which principle you are breaking and redirect.
I only partly agree (I did read the article).
There are times when I actually want that kind of feedback: when I know that my idea is pretty good and I just want to refine it. That’s when the negative feedback isn’t helpful.
Other times, yeah, I don’t want that kind of feedback, and it’s hard to evoke it.