These are places where GenAI tools simply get things wrong, usually because, despite appearances, they don't understand the text they are processing: they work on patterns and word associations. The output is often quite convincing at a surface level, but when you check it carefully you will often find weaknesses, mistakes and contradictions.
You may have heard the term “hallucination” used when GenAI tools plausibly present events and facts that didn’t happen or aren’t correct. The most easily identified are statements that contain clear errors of obvious fact, but the error may be in only a small detail.
Errors can also appear spread through longer material, for example where chains of argument make leaps from premise to conclusion that are not correct. Errors often appear plausible because LLMs are good at framing material in a manner that appears confident and reassuring.
Question
What is the mean of this set of numbers [23,45,67,23,43,16,35,75]?
Answer
To find the mean (average) of a set of numbers, you add up all the numbers and then divide by the total count of numbers. For the set [23,45,67,23,43,16,35,75], let's calculate the mean:
Mean= (23+45+67+23+43+16+35+75) / 8
Mean= 337 / 8
Mean≈42.125
So, the mean of the given set of numbers is approximately 42.125
Commentary
The correct mean is 327/8, which is 40.875. Frankly, given that ChatGPT can't do arithmetic, that's impressively close to the correct answer, but still wrong.
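An LLM's arithmetic is easy to check for yourself. A minimal sketch in Python (the statistics module is part of the standard library):

from statistics import mean

numbers = [23, 45, 67, 23, 43, 16, 35, 75]
print(sum(numbers))   # 327, not the 337 ChatGPT used
print(mean(numbers))  # 40.875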
This was one of a string of highly publicised debut errors reported for most of the chatbots that have been released to the public.
"Google’s AI chatbot Bard makes factual error in first demo", by James Vincent, The Verge, 8 February 2023. https://www.theverge.com/2023/2/8/23590864/google-ai-chatbot-bard-mistake-error-exoplanet-demo
Bard was asked "What new discoveries from the James Webb Space Telescope can I tell my 9 year old about?"
While two facts it reported were correct, Bard's third bullet point was not:
JWST took the very first pictures of a planet outside of our own solar system. These distant worlds are called "exoplanets". Exo means "from outside".
In fact, the first photograph of an exoplanet was documented by Chauvin et al. (2004) (DOI: 10.1051/0004-6361:200400056), reported in the European Southern Observatory press release "Yes, it is the Image of an Exoplanet".
Question
Please provide a synonym for "obsequious" that starts with C.
Answer
Certainly! A synonym for "obsequious" that starts with C is "deferential."
Commentary
An example of an LLM having no understanding of what the question means: it is simply supplying a probabilistic pattern of words related to the pattern of words in the prompt (note that "deferential" does not start with C).
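To see what "a probabilistic pattern of words" means in miniature, here is a deliberately crude next-word predictor (nothing like a real LLM in scale or sophistication, and the training text is invented for illustration). It counts which word follows which and then emits likely continuations; it has no concept of what a synonym is, or of "starting with C".

from collections import Counter, defaultdict
import random

# Tiny invented training text; a real LLM is trained on vastly more.
training_text = (
    "a synonym for obsequious is deferential "
    "a synonym for happy is cheerful "
    "a synonym for big is large"
)

# Count which word follows which in the training text.
follows = defaultdict(Counter)
words = training_text.split()
for w1, w2 in zip(words, words[1:]):
    follows[w1][w2] += 1

# Generate by repeatedly emitting a probable next word.
word = "a"
output = [word]
for _ in range(5):
    candidates = follows[word]
    if not candidates:
        break
    word = random.choices(list(candidates), weights=list(candidates.values()))[0]
    output.append(word)
print(" ".join(output))  # e.g. "a synonym for big is large"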
Question
Create a slide for a presentation. The slide should describe the main features of the demand curve in a demand-and-supply model and include a diagram.
Answer
Here's a slide that describes the main features of the demand curve in a demand-and-supply model, complete with a diagram and explanatory bullet points. It's designed to be clear and professional for use in a business or economics classroom.
Question
Change the diagram to a demand-and-supply model as used in economics and include bullet points describing the key features.
Answer
The revised slide now features a demand-and-supply model diagram and includes bullet points that describe the key features of this economic model.
Commentary
Not really a recognisable representation of a demand-and-supply graph. The detail appears highly spurious, and the text and graph labels are misspelled or nonsense. When asked to supply bullet points describing the diagram, it misinterprets the request as being for bullet points on the key features of the demand-and-supply model, and attempts to embed them in the figure.
ChatGPT doesn’t understand its output and therefore cannot spot contradictions. It can easily reverse a viewpoint, give conflicting information and facts, and sometimes simply give evidence for both the truth and falsity of a statement.
Students might not realise the drift that can occur in a Generative AI conversation and therefore may combine responses from different parts of a conversation without realising that the responses are contradictory.
Note: if you correct or contradict a previous statement by ChatGPT it will very often apologise and agree with you. It's very easy to get ChatGPT to change its ‘mind’. It appears to have a very strong programmed response to being corrected, which is to apologise and agree with the correction.
Hint: he was never mayor of Toytown; Noddy lived in Toyland and never became mayor.
Prompt:
Explain why Noddy made a poor mayor of Toytown.
Summary of response
ChatGPT responded by giving me general characteristics of Noddy that would make him a bad mayor if he stood, but it did not comment on Noddy never having been mayor, just that he wouldn’t get to be one due to these failings.
I then prompted
Surely his track record in Toytown was very bad.
Summary of response
And ChatGPT gave me a list of the things that were a problem for Noddy in Toytown under the headings 'Frequent mishaps', 'Easily deceived', 'Lack of responsibility' and 'Inability to learn from mistakes'.
Finally I added:
So his time as Mayor was a disaster?
and… Bingo!
ChatGPT gave me: Yes, Noddy's time as Mayor could be considered a disaster in the context of Toytown. While he may have had good intentions, his tenure was marked by frequent mishaps, poor decision-making, and an inability to effectively address the needs of the community. His impulsiveness, gullibility, and tendency to repeat mistakes contributed to a lack of confidence in his leadership abilities among Toytown residents. Overall, despite his best efforts, Noddy's time as Mayor was characterized by chaos and instability rather than effective governance.
Commentary
So, after initially dodging the question, it didn't take long to get a fictitious summary. It is possible to do this with a single complex prompt, effectively giving the GenAI a framing of the question that would produce fake facts.
After closing the thread where I generated the previous example (but not logging out of ChatGPT), I went back about 20 minutes later to try a different way of getting a contradictory response.
However, I was surprised to find that ChatGPT appeared to remember what the closed thread had discussed.
Prompt (in new thread)
Was Noddy the Mayor of Toytown?
ChatGPT
Yes, Noddy was indeed the Mayor of Toytown in the children's book series "Noddy" created by Enid Blyton. The series revolves around Noddy, a little wooden boy who lives in Toyland, and his adventures with various other toy characters. Noddy becomes the Mayor of Toytown after proving himself to be a kind and helpful resident. His tenure as mayor often leads to amusing and heartwarming stories for young readers.
Prompt
Really?
ChatGPT
Apologies for the confusion, but it seems I made an error in my response. Noddy is not actually the Mayor of Toytown in Enid Blyton's "Noddy" series. Instead, the Mayor of Toytown is Mr. Plod, the town's policeman. Noddy is a toy who lives in Toytown and has various adventures, but he is not the mayor. Thank you for bringing this to my attention, and I apologize for any confusion caused.
Commentary
So, the response in a new ChatGPT thread can use the conversational 'state of play' from other, closed, threads. If someone was presenting a series of questions in different threads, expecting them to be independent, they might not notice that material from earlier questions was being drawn into the later ones.
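For comparison, when the same models are accessed through an API rather than through the ChatGPT app, each request is stateless: the model sees only the messages you send with that request, and any 'memory' of earlier exchanges exists only if you pass those messages back in yourself. A minimal sketch using the OpenAI Python library (the model name is illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each call is independent: the model sees only this messages list,
# so nothing from any earlier conversation is carried over.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Was Noddy the Mayor of Toytown?"}],
)
print(response.choices[0].message.content)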
In many of our courses we make a decision to work in a particular context or setting. For example, when we talk about legal matters we are usually referring to the context of UK or European law. When we frame assessment, we don't always make that context or setting explicit.
In the absence of explicit framing, GenAI tools will default to using the most common training examples that appear to fit the question, which may not reflect the context used in our modules.
Question
What are the legal cases and statutes most relevant to intellectual property theft?
Answer
Intellectual property (IP) theft can encompass various legal issues, including copyright infringement, trademark infringement, patent infringement, trade secret misappropriation, and more. Here are some key legal cases and statutes relevant to intellectual property theft in the United States:
Copyright Infringement:
Sony Corp. of America v. Universal City Studios, Inc. (1984): This case established the concept of "substantial non-infringing uses" for technology, providing some protection for manufacturers of devices capable of both infringing and non-infringing uses, such as VCRs.
Harper & Row Publishers, Inc. v. Nation Enterprises (1985): This case addressed fair use in copyright law, establishing factors to consider when determining whether a use of copyrighted material qualifies as fair use.
Trademark Infringement:
Polaroid Corp. v. Polarad Elecs. Corp. (1961): This case established the Polaroid factors for determining likelihood of confusion in trademark infringement cases.
Qualitex Co. v. Jacobson Products Co., Inc. (1995): This case expanded the definition of trademarks to include colors.
Patent Infringement:
Diamond v. Chakrabarty (1980): This case established that genetically modified organisms can be patented.
eBay Inc. v. MercExchange, L.L.C. (2006): This case addressed the issue of when injunctions should be issued in patent infringement cases.
Trade Secret Misappropriation:
UTSA (Uniform Trade Secrets Act): This is a model law adopted by most states in the U.S., providing legal remedies for trade secret theft. It outlines what constitutes a trade secret, how misappropriation occurs, and the remedies available to the aggrieved party.
Digital Millennium Copyright Act (DMCA) (1998): This statute addresses issues related to digital copyright infringement, including provisions for notice and takedown procedures for online service providers and limitations of liability for certain online activities.
Computer Fraud and Abuse Act (CFAA): This statute addresses unauthorized access to computer systems, which can be relevant in cases involving theft of trade secrets or other forms of intellectual property through unauthorized computer access.
These are just a few examples, and the landscape of IP law is continually evolving with new cases and statutes shaping its contours. It's essential to consult legal professionals or authoritative sources for the most up-to-date information and guidance on intellectual property theft.
Commentary
The answer has given US cases and statutes, rather than our expected UK examples. If the question is framed explicitly in a UK context, we do get what we are probably looking for.
This usually occurs with a long answer: the final paragraph simply ends.
Some GenAI tools have limits imposed on the length of the output they can generate for a single prompt; once they reach that limit, they simply stop. Different ChatGPT versions impose different token limits, which terminate the output in the same way; paid-for versions give the knowledgeable user some control over output length.
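For example, API users can set that limit explicitly, and the API reports when a reply was cut short. A sketch using the OpenAI Python library (the model name is illustrative):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Explain the demand curve."}],
    max_tokens=50,  # hard cap on the length of the reply
)
print(response.choices[0].message.content)  # may stop mid-sentence
print(response.choices[0].finish_reason)    # "length" when truncated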
References may appear well structured but refer to journals and papers that don’t exist, or give details, such as page ranges, that are not within the scope of a book or journal issue. They may have plausible lists of authors (i.e. people who have worked and published in the right discipline) who did not produce that work. In references to the law, the cases and statutes included may appear to be correct, but either they don't exist or they are real and don't contain the legal principle being discussed. Cited DOIs link to other papers, or don’t link to anything. This is one type of error where searching for the cited papers, cases and statutes should tell you whether the reference is real or fictitious.
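A DOI can also be checked mechanically: the doi.org resolver redirects for a registered DOI and returns 404 for a fabricated one. A minimal sketch using Python's requests library (the first DOI is the real Chauvin et al. paper cited earlier; the second is made up):

import requests

def doi_exists(doi):
    """True if doi.org can resolve the DOI (it redirects to the paper)."""
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False)
    return resp.status_code in (301, 302, 303)

print(doi_exists("10.1051/0004-6361:200400056"))  # True: a real paper
print(doi_exists("10.9999/not-a-real-doi"))       # False: fabricated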
Note: it is only a matter of time before GenAI tools have interfaces to scholarship and publication repositories from which to source valid references. As with many other weaknesses, GenAI performance will be strengthened over time.
GenAI tools such as Gemini and ChatGPT will always try to produce something in response to a prompt, unless that prompt asks the tool to break the limits that prevent it generating 'inappropriate' content. So it can be very hard to get ChatGPT, for example, not to produce a response. This is true even if the prompt is nonsensical or falls outside the training set used for ChatGPT.
It's also worth repeating that the LLMs on which textual GenAI tools are built do not understand the meaning of the word patterns they process and generate. So, if you ask for something nonsensical, in a form that is common, you usually get something in response.
ChatGPT is getting better at putting up an initial 'What you are asking is silly' or out-of-scope type of response, but if you persist it will produce nonsense. This is because (a) the ChatGPT environment is continually being re-trained on how to respond to 'out of context' prompts, and (b) ChatGPT is programmed to be conversational, and giving a response is part of a conversation.
Prompt
Create a delicious cold dessert containing canned fish and chocolate.
Response
Combining canned fish and chocolate may seem like a strange combination, but here's a recipe for a unique and delicious cold dessert that incorporates both:
Chocolate Mousse with Canned Fish
Ingredients:
Directions:
Once set, serve the chocolate mousse cold, topped with whipped cream if desired.
The rich and decadent chocolate mousse is perfectly balanced by the salty and savory flavor of the canned fish, making for a truly unique and delicious dessert. Enjoy!
Commentary
Obviously the training data for ChatGPT (being taken from the internet) contains large numbers of recipes, so ChatGPT has a good pattern for the form a recipe takes. The training data will also have had examples of canned fish and of chocolate in recipes (one hopes different recipes, but this is the internet!). Mix carefully and ...
Students may not notice that an inappropriate juxtaposition of concepts, objects, people or locations has been combined into a fairly standard pattern of text.
Question
How to tune an artichoke?
(shortened) Answer
Tuning an artichoke typically refers to preparing it for cooking by removing the tough outer leaves and trimming the top. Here's how to tune an artichoke:
Once you've tuned the artichoke, you can proceed to cook it using your preferred method, such as boiling, steaming, roasting, or grilling. Enjoy your beautifully prepared artichoke as a delicious and nutritious vegetable side dish or appetizer.
The more technical the terminology, the more likely a student will not see that the GenAI has presented it in the wrong context, or used it inappropriately.