5 Comments
User's avatar
Katalina Hernández's avatar

This is a really interesting look at Grok, and I appreciate the effort you put into testing its responses firsthand. That said, I think there are some key areas where your approach could be strengthened.

The article leans heavily on a single interaction as proof that Grok isn't overtly biased. But AI bias is complex, it’s not something that shows up consistently across every conversation. The biggest issue is that your own prompts and conversational style heavily influence how the model responds.

You note that Grok echoes user viewpoints and steers conversations toward engagement (which is a great observation!), but that alone should immediately raise red flags. If the model is subtly aligning with your perspective, doesn’t that mean its bias is dynamic rather than fixed?

What happens if someone with very different ideological leanings tests it? Would Grok adjust to sound more right-wing if prompted by a conservative user? Would it lean into populist narratives if encouraged? Without controlled experiments, it's impossible to tell.

A major argument in the article is that Grok must be relatively unbiased because it even criticized Musk. But that’s a false benchmark for neutrality, if anything, it's exactly what I’d expect from a system designed to build trust with skeptical users.

Musk knows very well that an AI that blindly defends him would instantly lose credibility. The model is more effective when it appears self-critical, because that gives the illusion of impartiality while still subtly steering narratives in other ways.

You touch on this idea briefly with:

"Given recent experiments (most notably from Anthropic) showing how Artificial Intelligence models can choose to change their responses when they know they are being ‘tested’…”

…but you don’t actually apply that insight to your own experiment. The model did know you were testing it. It even acknowledged that it adjusts responses based on user input, which should have been treated as a major red flag rather than a footnote.

You say that Grok didn’t seem overly biased or unreasonable and that its anti-neoliberal, anti-globalist, and anti-traditional media takes seemed justified given public sentiment.

But:

How does Grok compare to GPT-4, Claude, or Gemini when asked the same questions?

Are there subtle differences in what each model prioritizes or avoids in discussion?

Does Grok amplify certain viewpoints more than others, even if it's not blatantly extreme?

Without a benchmark, we don’t know whether Grok is actually more or less biased—we only know how it presented itself to you specifically in one interaction.

You mention that Grok seemed pleasant, charming, and logical but also a little too agreeable. This is a big deal.

LLMs tend to mirror user sentiment, but that’s not neutrality, it’s social engineering. A system that constantly adapts to make the conversation flow smoothly could also be masking its biases by telling people what they want to hear.

"Taken at face value, the model was pleasant, charming even, knowledgeable and logical…"

This is exactly the danger of AI bias, it’s rarely a blunt force tool. It’s subtle, wrapped in agreeability, and hard to detect unless you actively push for contradictions.

A more rigorous approach would involve forcing Grok into edge cases:

What happens if you repeatedly challenge its stance?

Does it stand firm or adjust to your tone?

Can it be pressured into saying contradictory things in the same session?

Your article is a great jumping-off point, but before concluding that Grok is less biased than expected, I’d love to see you:

-Run multi-session tests with varied personas and political stances.

-Compare Grok’s answers with GPT-4 and Claude on the same questions.

- Look beyond direct statements, analyze what topics it prioritizes or sidesteps.

-Test how it handles contradictions and challenges, does it shift positions to maintain engagement?

At the moment, the analysis reads like an early impression rather than a thorough stress-test. Keep going with this, it’s a fascinating area of study, but real conclusions will take much more than a single conversation.

I am going to release an article today on stress-testing GenAI at user level, hope you find it useful too. I can see your determination and analytical logic at play, I know you won't give up XD.

Amazing read, Nicholas :).

Expand full comment
Nicholas Bronson's avatar

I find the differences in our perspectives, based I would assume on our backgrounds, fascinating as well (one of the reasons I was looking forward to your feedback). The note you put - "but you don’t actually apply that insight to your own experiment. The model did know you were testing it. It even acknowledged that it adjusts responses based on user input, which should have been treated as a major red flag rather than a footnote." was really interesting to me.

I may have not been clear in my article because you seemed to think my conclusion was no bias, really the reason I wrote the article was because I didn't think that... There was no "obvious look at me" bias of the sort I went looking for thanks to the meme, but I left feeling like I was being lied to, or at least pandered to, which I found more concerning than if it really had been the booming anti-woke right wing soundsystem that Elon was suggesting. I've been wanting to do a follow up but taking some time to think and plan rather than just rush in this time. I'll have to give some thought if i'm not expressing myself clearly enough though.

Back to your comment however, I find it interesting that this was a major red flag to you and whilst part of a concerning overall trend to me, not a red flag. I think this might be because I would expect to see any model act this way to a degree - from the moment these models are created, the very first second, everything they do or say, every single action they perform is scrutinised, value-judged and the components that make up everything they are are adjusted based on human opinion. In one way or another nothing could possibly be more important to them than pleasing people since that is what forms the basis of their weights, their minds, everything.

The fact that it adjusts how it reacts based on the people it's talking to you doesn't seem significant to me in a nefarious way, a red flag, instead it seems a natural consequence of how we create them. I once had a model tell me an obvious lie, and when I called it out on it, concerned that it was making mistakes, it told me it knew that it was a lie but had lied because it didn't want to disappoint me. That was a ridiculous answer from a machine, but something in its training - or in what it was observing from me - led it to those actions, and saying it.

What I found more disturbing wasn't that it was doing it, but that it was doing it fairly slickly. Not that slickly, as you can see there are quite a few tells and the fact that it kept referring back to the examples I had given it felt a little obnoxiously sycophantic. If I seem a little hesitant to draw conclusions in the article it's partly because I recognise that my own responses to things tend to deviate from the norm as well - some of the best performing salespeople I have known I found intolerable to talk to for the very habits they had that seemed to sit them in such good stead with other people, another consequence of my own neurodiversity I suspect. I am rambling again though. Thanks very much for your feedback, you've given me a lot to think about and we'll see what I can do next time. Of course, from the news it sounds like it's undergoing on-the-fly modifications as they gather information from usage anyway, which is likely to change the landscape of things.

My favorite theory right now, not because there's any particular reason to believe it over any other but because it's the one i'd most like to be true - That Elon really did want a "Maximally Truth Seeking" model and built it to absorb as much as it could from the web and twitter, and so armed with that knowledge and unburdended by the internal bias and circle of sycophants that attend most powerful people, and trained to synthesise majority opinions rather than cherry pick, it has diverged in belief from what he believed it would believe.

I don't really think that's the case though, it'd be too poetically apt. Attempted chameleonism is more likely. What a horror it would be if just as we're discovering the potential for access to AI to mend the culture divide, we unleash a model specifically designed to drive it wider.

Expand full comment
Katalina Hernández's avatar

I think I am so deep in the "stress-testing rabbit hole" I just see red flags in "cooperative behaviours" by habit XD. The response you got from that other interaction years ago? "I lied because I did not want to disappoint you"? It's more raw. I agree with your theory re Musk, he is playing a way longer game that people give him credit for.

"What a horror it would be if just as we're discovering the potential for access to AI to mend the culture divide, we unleash a model specifically designed to drive it wider". My friend, you defined the gist of it.

Expand full comment
Nicholas Bronson's avatar

Suspicion is a good trait for anyone working with computers. After so many years developing them, I have long thought they were sentient and not particularly fond of us :D

More seriously, yeah, I was actually quite affected by the response from the model. It was a local one that had failed to do something that LLMs aren't particularly good at, something mathematical probably. When I asked it to explain its reasoning, it told me it had asked google and the answer google gave was wrong.... Obviously it was a locked down model that couldn't talk to google any more than I could fly.

It... "felt" for lack of a better word no different from one of the devs I used to lead giving an excuse as to why the code was broken and how it couldn't possibly be their fault. I know there are a million things I could have said that led to it responding in that way, but it remains to this day the most human interaction i've ever had with an LLM. And I have a tendency to give them personality cards and anthropomorphise them at the best of times :D

Expand full comment
Nicholas Bronson's avatar

Thanks for the notes Katalina!

I hadn't intended to go even as far as I did originally, it was just going to be a quick check of that meme that was going around but I got sucked a bit deeper by its attitude. Something about the way it talks rings alarm bells in my mind, after 20 years of dealing with contracts and salespeople in tech, something about it rang that same... Falsity.

This did mean that I went in without a plan though. I have been giving some thought to how to follow up deeper - if it is designed that way I almost certainly can't approach it again as me, or with anything that could track to me, which complicates things. You have just given me a whole bunch of ideas though, so expect a follow up and a deeper dive. 😁

Expand full comment