Lying, Hallucinating? I, MuddGPT

Sigh, one cannot avoid The Topic, the Only Thing as it has been grailed. At least it provides fodder for some mocking.

It’s so easy even with the merest of ChatGPT et al interaction to respond under the influence of the Weizenbaum effect made visible in the verbs we choose.

And like Laura Czerniewicz wrote in Completely authoritatively made up I enter the prompt box with no expectation that what is regurgitated back is anything but Colbertian factiness:

Made up, and yet not lies exactly, I think, since lies do imply intent. Rather, two “hallucinations” in one: Generative AI pretending to be human and authoritatively and confidently making up information.
https://czernie.weebly.com/blog/completely-authoritatively-made-up

It was quite clear when I tossed it a question about a subject I knew very well from a published research paper I did long ago, it was evident that ChatGPT has not in anyway drawing its responses by “looking up facts” as it assembles things like URLs and citations that only resemble these entities on a statistical guess, it does look up anything. Well actually I don’t know what it does, but I feel like when people discover the falseness of returned facts that ChatGPT just has done sloppy research.

Even in some experiments where I asked it to produce code for a WordPress plugin or a Javascript bookmark, it was impressive at first glance what it returned, because it very closely resembled those things. But my efforts never actually worked, and then I was debugging.

I know others have much greater success, so I am going to accept I am a lousy prompt engineer. I question if I really want to be skilled in such a thing.

Yet I Fall Into My Own Trap

Still, I faltered again. I have been doing some work trying to learn how to get into Wikidata and leverage the query power of SPARQL. I took a course in November, but then let it lapse, so my query skills are really rusted.

Without going too deep, after a really great OEG Live Webcast with guests discussing climate change as shown in film narrative I wondered about trying to see if I could extract examples of this genre, given that there is a Wikimedia category for it.

Alas, my queries kept failing, and in a despicable act of desperation, IACFH (I Asked ChatGPT For Help) (I am as bad at acronyms as my AI prompts).

Alan: Write a wikidata SPARQL query to list title and date of all films in the Climate Change genre.

Almost instantly it returned with much confidence what resembled a solid query:

It warned me too “Note that this query may take some time to run, as it is searching through a large database.” so when I asked it to maybe limit the results to 100, it added the proper statement.

But look, I did not even stop to think. There are only 44 films listed on the Category page I found, why would ChatGPT give that warning?

I got 0 results, so I reported that back. Again, the obsequious ChatGPT replied with again full of that authoritative tone:

And what I find is I am slipping on my own intuition. The first example it game my was listed as an attempt to find the Climate Change Genre using property Q32971 which if you plop into a URL is https://www.wikidata.org/wiki/Q32971 The grand intelligence if ChatGPT’s first attempt was to look for a genre identified as “Alice Cogswell” Not even close.

And upon questioning, it shuffles off with great sorry filled blabber to say, Oh My, I used the wrong property, try Q884794 instead. And I did. That borked too. Of course it would, that is trying to find a film genre identified as “list of state leaders in 1349” which is even more off base.

Of course I would get zero results. I report failure. Now here is where it gets curious. Again, it falls over with this syrupy aplogy blather.

Note the claim it makes:

I have tested the query myself and was able to retrieve results, so I’m not sure what could be causing the issue on your end.
ChatGPT, making **** up

Wow, ChatGPT went out, ran the query, and got results. It’s operating from the old tech support handbook, “It worked for me, so you must be a chump”. But note that it is also claiming

The genre ID for “climate fiction” is Q21246176:
ChatGPT, Artificially Intelligent

Well, not even close, now I am calling it out. The property it suggested is found at https://www.wikidata.org/wiki/Q21246176 is actually a species of fish known as Caristius digitus.

I have to lecture ChatGPT:

Alan: That cannot be right. Q8383103 is not climate fiction. The genre for climate change films is Q8383103

Why do I have to look s*** up for this machine? Again it apologies, corrects it’s query, which I try… and get zero results.

I can’t let this slide.

Alan: You said “I’m sorry to hear that the previous query still did not work. I have tested the query myself and was able to retrieve results,” but when I tried the exact same query, I got zero results. Please explain and demonstrate how you tested the query and show me the results you received.

This seems reasonable! Look at the way it weasels out of the hole it has dug, it blames me.

None of the queries it provided me work, and I have shown that they are all faulty for the properties it is using. And doesn’t this seem like lying? It is telling me it got results.

Alan: But if you were able to retrieve results then you should be able to show me the results. Therefore you are lying. That is not respectful.

Results are results, right? They should be reproducible. But look how it explains itself:

Again, this sure sounds like lying, right? It says it cannot even get results, and that all it is really doing is checking syntax and expressing its confidence as “I believe”. This is preposterous. After more obsequious blabber, it then reverts to claiming it gets results.

None of this makes sense, because I am trying to reason with something that has no capability of reasoning. It is merely iterating through patterns to resemble results, to resemble writing, to resemble thought, but it is none.

So yes, I think we need some help from Kirk and Spock and Harry Mudd.

ChatGPT is not lying or really hallucinating, it is just statistically wrong.

And the thing I am worried about is that in this process, knowing I was likely getting wrong results, I clung to hope it would work. I also found myself skipping my own reasoning and thinking, in the rush to refine my prompts.

Postscript

I should add that no matter how many times I could have asked ChatGPT for assistance, it would never succeed (despite its protestations) because of a flaw (human induced) from the outset. You see, I had assumed wrongly that because there is a Wikpedia category for Climate Change films and an associated Wikidata item for this category there is no use of climate change as a wikidata item for film genres.

This was revealed because I asked for help from Wikidata Will (Kent) who helped me understand the reason why I was getting no results.

And I am not making a stand for victory of human over machines, as for well constructed queries and many tasks, AI is doing wonders for me (transcription for one, and I am impressed with my runs with Elicit). I wrote this more as to make a case that ChatGPT and generative AI in general are not (yet?) any good at reasoning and in our use, we need to not park our own reasoning abilities just because its fun to be conversational with it.

Featured Image: Animated GIF of some old slides I had used in a 2005 TCC Online Conference Keynote actually based on a Star Trek metaphor that I re-edited slightly for 2023. Probably not legally kosher, mostly for parody. Shrug.

Just for fun, this was the abstract from that 2005 presentation (and bless you Bert Kimura for going along with this!) And the “slides” as it were are still available as a Flickr set.

“Harry Mudd, Small Pieces, and that Not Widely Distributed Future”
~ Where it (or IT) is for Educators ~

Predictions of the future are easily analyzed in hindsight and ought to be skeptically questioned — you will have to tune into this session to see the connection with an old Star Trek episode. However, author William Gibson’s insightful quote, “The future is here. It is just not widely distributed yet” is the framework I use to peek at the future. For the use of technology in teaching and learning, where is this “not widely distributed future?” I am not sure, but in this session, we will take some guesses at places you may find the future.

The present form of the web was visible, but not widely distributed in 1992. Is there something of this scale already here? Will text messaging displace email as a communication mode? We will look at the drivers of consumer used technologies that become disruptive? For example, digital cameras have taken the lead in the consumer photo market and MP3 players are re-shaping the music industry. And how about those multitude of technology gadget web sites? Are small pieces of “loosely” joined technologies (often open source) displacing large comprehensive commercial tools?

The future is here and it (or IT) is not. Explore hands-on some of the interesting “social” and connection technologies such as “tags”, RSS, wikis, podcasts, and perhaps whatever else pops up between now and the conference.