Numerous faked images and a string of startlingly inaccurate responses from Gemini and Grok are part of a tidal wave of AI slop engulfing coverage of the Iran war
Since you really don’t believe me, here is an experiment to show how accurately a vector embedding represents the real image:
Original:
Output from the LLM:
Clearly, what the LLM is seeing is an incredibly detailed and accurate representation of the real image. Practically the same quality as a JPEG. (Not quite, as you can see in the garbled text on the tag, but nearly the same quality.)
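Here is some background on what "the image embedding" is. In ViT-style vision encoders, the first step cuts the image into fixed-size patches and projects each patch to a vector. The resulting sequence of vectors is what the language model ultimately consumes. The sketch below shows only that first step and rests on several assumptions. The random projection is a hypothetical stand-in for learned weights. The 16-pixel patch and 1024-dimensional output are illustrative choices. Real encoders follow this step with transformer layers and usually a projector into the LLM's token space. Nothing here describes how Gemini or Grok actually encode images.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tiles,
    flattening each tile into one row (row-major over the tile grid)."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be a multiple of the patch size"
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # -> (grid_rows, grid_cols, patch, patch, c)
    return tiles.reshape(-1, patch * patch * c)

def embed_patches(image: np.ndarray, patch: int, dim: int,
                  rng: np.random.Generator) -> np.ndarray:
    """Map each patch to a dim-sized vector with a linear projection.
    ASSUMPTION: the random matrix is a hypothetical placeholder for the
    encoder's learned patch-embedding weights."""
    patches = patchify(image, patch)
    proj = rng.standard_normal((patches.shape[1], dim)) / np.sqrt(patches.shape[1])
    return patches @ proj

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))          # stand-in for a real photo
tokens = embed_patches(image, patch=16, dim=1024, rng=rng)
print(tokens.shape)  # (196, 1024): a 14 x 14 grid of patch vectors
```

A 224x224 image therefore becomes 196 vectors, each built from 768 raw pixel values. That is plenty of capacity to carry fine detail. Whether a given detail survives the later layers is a separate question.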
Okay, after spending way too much time researching this, I've come to the conclusion that my original statement was wrong - and I finally got my AI assistant of choice to agree with me, too.
My original claim: “The AI said the picture was fake because it can’t even see the picture. It only got a description from a vision model.”
My revised explanation: “It got it wrong because the image embedding it received didn’t contain the relevant information to make that distinction.”
So yes. While it’s true that the LLM can’t see the picture in the human sense of vision, that’s not what I was talking about. It can “see” the image well enough to tell the difference. It simply doesn’t know what to look for. It’s not really the LLM’s fault - it’s because the image encoder wasn’t trained on enough pictures labeled as AI-generated.
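The claim that "the embedding didn't contain the relevant information" can be made testable with a linear probe. You train a simple classifier on frozen embeddings and check whether it separates the two labels on held-out data. Accuracy near chance suggests the distinction isn't (linearly) present in the embedding. High accuracy suggests the information is there, even if the model doesn't use it. This is a standard interpretability technique swapped in for illustration, not something the comment above ran. The embeddings below are synthetic and hypothetical: Gaussian noise, with an optional label-dependent shift standing in for an "AI-generated" signal.

```python
import numpy as np

def make_embeddings(n: int, dim: int, signal: float, rng: np.random.Generator):
    """HYPOTHETICAL data: Gaussian noise plus a shift of +/-signal along
    dimension 0 that depends on the label (1 = AI-generated, 0 = real)."""
    y = rng.integers(0, 2, size=n)
    X = rng.standard_normal((n, dim))
    X[:, 0] += signal * (2 * y - 1)
    return X, y.astype(float)

def train_probe(X: np.ndarray, y: np.ndarray, steps: int = 500, lr: float = 0.1):
    """Plain logistic regression fitted by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(AI-generated)
        err = p - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def probe_accuracy(X: np.ndarray, y: np.ndarray, train_frac: float = 0.5) -> float:
    """Train on the first part of the data, report accuracy on the held-out rest."""
    k = int(len(y) * train_frac)
    w, b = train_probe(X[:k], y[:k])
    preds = (X[k:] @ w + b) > 0
    return float((preds == (y[k:] == 1)).mean())

rng = np.random.default_rng(0)
X_sig, y_sig = make_embeddings(2000, 64, signal=1.5, rng=rng)
X_none, y_none = make_embeddings(2000, 64, signal=0.0, rng=rng)
print("signal present:", probe_accuracy(X_sig, y_sig))    # well above 0.5
print("signal absent: ", probe_accuracy(X_none, y_none))  # close to 0.5
```

The held-out split matters. With 64 dimensions, a probe scored on its own training data can look better than chance even when there is no real signal.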