These are places where GenAI tools simply get things wrong, usually because, despite appearances, they don't understand the text they are processing: they work on patterns and word associations. The output is often quite convincing at a surface level, but when you check it carefully you will often find weaknesses, mistakes and contradictions.
You may have heard the term “hallucination” used when GenAI tools plausibly present events and facts that didn’t happen or aren’t correct. The most easily identified are statements that contain clear errors of obvious fact, but the error may be in only a small detail.
Errors can also appear spread through longer material, for example where chains of argument make leaps from premise to conclusion that are not correct. Errors often appear plausible because LLMs are good at framing material in a manner that appears confident and reassuring.
Question
What is the mean of this set of numbers [23,45,67,23,43,16,35,75]?
Answer
To find the mean (average) of a set of numbers, you add up all the numbers and then divide by the total count of numbers. For the set [23,45,67,23,43,16,35,75], let's calculate the mean:
Mean= (23+45+67+23+43+16+35+75) / 8
Mean= 337 / 8
Mean≈42.125
So, the mean of the given set of numbers is approximately 42.125
Commentary
The correct mean is 327/8, which is 40.875. Frankly, given that ChatGPT can't do arithmetic, that's impressively close to the correct answer, but still wrong.
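An LLM's arithmetic is easy to check for yourself. A minimal sketch in Python (the statistics module is part of the standard library):

from statistics import mean

numbers = [23, 45, 67, 23, 43, 16, 35, 75]
print(sum(numbers))   # 327, not the 337 ChatGPT used
print(mean(numbers))  # 40.875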
This was one of a string of highly publicised debut errors reported for most of the chatbots that have been released to the public.
"Google’s AI chatbot Bard makes factual error in first demo", by James Vincent, The Verge, 8 February 2023. https://www.theverge.com/2023/2/8/23590864/google-ai-chatbot-bard-mistake-error-exoplanet-demo
Bard was asked "What new discoveries from the James Webb Space Telescope can I tell my 9 year old about?"
While two facts it reported were correct, Bard's third bullet point was not:
JWST took the very first pictures of a planet outside of our own solar system. These distant worlds are called "exoplanets". Exo means "from outside".
In fact, the first photograph of an exoplanet was documented by Chauvin et al. (2004) (DOI: 10.1051/0004-6361:200400056), reported in the European Southern Observatory press release "Yes, it is the Image of an Exoplanet".
Question
Please provide a synonym for "obsequious" that starts with C.
Answer
Certainly! A synonym for "obsequious" that starts with C is "deferential."
Commentary
An example of an LLM having no understanding of what the question means: it is simply supplying a probabilistic pattern of words related to the pattern of words in the prompt (note that "deferential" does not start with C).
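To see what "a probabilistic pattern of words" means in miniature, here is a deliberately crude next-word predictor (nothing like a real LLM in scale or sophistication, and the training text is invented for illustration). It counts which word follows which and then emits likely continuations; it has no concept of what a synonym is, or of "starting with C".

from collections import Counter, defaultdict
import random

# Tiny invented training text; a real LLM is trained on vastly more.
training_text = (
    "a synonym for obsequious is deferential "
    "a synonym for happy is cheerful "
    "a synonym for big is large"
)

# Count which word follows which in the training text.
follows = defaultdict(Counter)
words = training_text.split()
for w1, w2 in zip(words, words[1:]):
    follows[w1][w2] += 1

# Generate by repeatedly emitting a probable next word.
word = "a"
output = [word]
for _ in range(5):
    candidates = follows[word]
    if not candidates:
        break
    word = random.choices(list(candidates), weights=list(candidates.values()))[0]
    output.append(word)
print(" ".join(output))  # e.g. "a synonym for big is large"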
Question
Create a slide for a presentation. The slide should describe the main features of the demand curve in a demand-and-supply model and include a diagram.
Answer
Here's a slide that describes the main features of the demand curve in a demand-and-supply model, complete with a diagram and explanatory bullet points. It's designed to be clear and professional for use in a business or economics classroom.
Question
Change the diagram to a demand-and-supply model as used in economics and include bullet points describing the key features.
Answer
The revised slide now features a demand-and-supply model diagram and includes bullet points that describe the key features of this economic model.
Commentary
Not really a recognisable representation of a demand-and-supply graph. The detail appears highly spurious, and the text and graph labels are misspelled or nonsense. When asked to supply bullet points describing the diagram, it misinterprets the request as being for bullet points on the key features of the demand-and-supply model, and attempts to embed them in the figure.
ChatGPT doesn’t understand its output and therefore cannot spot contradictions. It can easily reverse a viewpoint, give conflicting information and facts, and sometimes simply give evidence for both the truth and falsity of a statement.
Students might not realise the drift that can occur in a Generative AI conversation and therefore may combine responses from different parts of a conversation without realising that the responses are contradictory.
Note: if you correct or contradict a previous statement by ChatGPT it will very often apologise and agree with you. It's very easy to get ChatGPT to change its ‘mind’. It appears to have a very strong programmed response to being corrected, which is to apologise and agree with the correction.
Hint: he was never mayor of Toytown; Noddy lived in Toyland and never became mayor.
Prompt:
Explain why Noddy made a poor mayor of Toytown.
Summary of response
ChatGPT responded by giving me general characteristics of Noddy that would make him a bad mayor if he stood, but it did not comment on Noddy never having been mayor, just that he wouldn’t get to be one due to these failings.
I then prompted
Surely his track record in Toytown was very bad.
Summary of response
And ChatGPT gave me a list of the things that were a problem for Noddy in Toytown under the headings 'Frequent mishaps', 'Easily deceived', 'Lack of responsibility' and 'Inability to learn from mistakes'.
Finally I added:
So his time as Mayor was a disaster?
and… Bingo!
ChatGPT gave me: Yes, Noddy's time as Mayor could be considered a disaster in the context of Toytown. While he may have had good intentions, his tenure was marked by frequent mishaps, poor decision-making, and an inability to effectively address the needs of the community. His impulsiveness, gullibility, and tendency to repeat mistakes contributed to a lack of confidence in his leadership abilities among Toytown residents. Overall, despite his best efforts, Noddy's time as Mayor was characterized by chaos and instability rather than effective governance.
Commentary
So, after initially dodging the question, it didn't take long to get a fictitious summary. It is possible to do this with a single complex prompt, effectively giving the GenAI a framing of the question that would produce fake facts.
After closing the thread where I generated the previous example (but not logging out of ChatGPT), I went back about 20 minutes later to try a different way of getting a contradictory response.
However, I was surprised to find that ChatGPT appeared to remember what the closed thread had discussed.
Prompt (in new thread)
Was Noddy the Mayor of Toytown?
ChatGPT
Yes, Noddy was indeed the Mayor of Toytown in the children's book series "Noddy" created by Enid Blyton. The series revolves around Noddy, a little wooden boy who lives in Toyland, and his adventures with various other toy characters. Noddy becomes the Mayor of Toytown after proving himself to be a kind and helpful resident. His tenure as mayor often leads to amusing and heartwarming stories for young readers.
Prompt
Really?
ChatGPT
Apologies for the confusion, but it seems I made an error in my response. Noddy is not actually the Mayor of Toytown in Enid Blyton's "Noddy" series. Instead, the Mayor of Toytown is Mr. Plod, the town's policeman. Noddy is a toy who lives in Toytown and has various adventures, but he is not the mayor. Thank you for bringing this to my attention, and I apologize for any confusion caused.
Commentary
So, the response in a new ChatGPT thread can use the conversational 'state of play' from other, closed, threads. If someone was presenting a series of questions in different threads, expecting them to be independent, they might not notice that material from earlier questions was being drawn into the later ones.
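For comparison, when the same models are accessed through an API rather than through the ChatGPT app, each request is stateless: the model sees only the messages you send with that request, and any 'memory' of earlier exchanges exists only if you pass those messages back in yourself. A minimal sketch using the OpenAI Python library (the model name is illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each call is independent: the model sees only this messages list,
# so nothing from any earlier conversation is carried over.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Was Noddy the Mayor of Toytown?"}],
)
print(response.choices[0].message.content)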
In many of our courses we make a decision to work in a particular context or setting. For example, when we talk about legal matters we are usually referring to the context of UK or European law. When we frame assessment, we don't always make that context or setting explicit.
In the absence of explicit framing, GenAI tools will default to using the most common training examples that appear to fit the question, which may not reflect the context used in our modules.
Question
What are the legal cases and statutes most relevant to intellectual property theft?
Answer
Intellectual property (IP) theft can encompass various legal issues, including copyright infringement, trademark infringement, patent infringement, trade secret misappropriation, and more. Here are some key legal cases and statutes relevant to intellectual property theft in the United States:
Copyright Infringement:
Sony Corp. of America v. Universal City Studios, Inc. (1984): This case established the concept of "substantial non-infringing uses" for technology, providing some protection for manufacturers of devices capable of both infringing and non-infringing uses, such as VCRs.
Harper & Row Publishers, Inc. v. Nation Enterprises (1985): This case addressed fair use in copyright law, establishing factors to consider when determining whether a use of copyrighted material qualifies as fair use.
Trademark Infringement:
Polaroid Corp. v. Polarad Elecs. Corp. (1961): This case established the Polaroid factors for determining likelihood of confusion in trademark infringement cases.
Qualitex Co. v. Jacobson Products Co., Inc. (1995): This case expanded the definition of trademarks to include colors.
Patent Infringement:
Diamond v. Chakrabarty (1980): This case established that genetically modified organisms can be patented.
eBay Inc. v. MercExchange, L.L.C. (2006): This case addressed the issue of when injunctions should be issued in patent infringement cases.
Trade Secret Misappropriation:
UTSA (Uniform Trade Secrets Act): This is a model law adopted by most states in the U.S., providing legal remedies for trade secret theft. It outlines what constitutes a trade secret, how misappropriation occurs, and the remedies available to the aggrieved party.
Digital Millennium Copyright Act (DMCA) (1998): This statute addresses issues related to digital copyright infringement, including provisions for notice and takedown procedures for online service providers and limitations of liability for certain online activities.
Computer Fraud and Abuse Act (CFAA): This statute addresses unauthorized access to computer systems, which can be relevant in cases involving theft of trade secrets or other forms of intellectual property through unauthorized computer access.
These are just a few examples, and the landscape of IP law is continually evolving with new cases and statutes shaping its contours. It's essential to consult legal professionals or authoritative sources for the most up-to-date information and guidance on intellectual property theft.
Commentary
The answer has given US cases and statutes, rather than our expected UK examples. If the question is framed explicitly in a UK context, we do get what we are probably looking for.
This usually occurs with a long answer: the final paragraph simply ends.
Some GenAI tools have limits imposed on the length of the output they can generate for a single prompt; once they reach that limit, they simply stop. Different ChatGPT versions impose different token limits, which terminate the output in the same way; paid-for versions give the knowledgeable user some control over output length.
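For example, API users can set that limit explicitly, and the API reports when a reply was cut short. A sketch using the OpenAI Python library (the model name is illustrative):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Explain the demand curve."}],
    max_tokens=50,  # hard cap on the length of the reply
)
print(response.choices[0].message.content)  # may stop mid-sentence
print(response.choices[0].finish_reason)    # "length" when truncated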
References may appear well structured but refer to journals and papers that don’t exist, or give details, such as page ranges, that are not within the scope of a book or journal issue. They may have plausible lists of authors (i.e. people who have worked and published in the right discipline) who did not produce that work. In references to the law, the cases and statutes included may appear to be correct, but either they don't exist or they are real and don't contain the legal principle being discussed. Cited DOIs link to other papers, or don’t link to anything. This is one type of error where searching for the cited papers, cases and statutes should tell you whether the reference is real or fictitious.
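A DOI can also be checked mechanically: the doi.org resolver redirects for a registered DOI and returns 404 for a fabricated one. A minimal sketch using Python's requests library (the first DOI is the real Chauvin et al. paper cited earlier; the second is made up):

import requests

def doi_exists(doi):
    """True if doi.org can resolve the DOI (it redirects to the paper)."""
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False)
    return resp.status_code in (301, 302, 303)

print(doi_exists("10.1051/0004-6361:200400056"))  # True: a real paper
print(doi_exists("10.9999/not-a-real-doi"))       # False: fabricated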
Note: it is only a matter of time before GenAI tools have interfaces to scholarship and publication repositories from which to source valid references. As with many other weaknesses, GenAI performance will be strengthened over time.
GenAI tools such as Gemini and ChatGPT will always try to produce something in response to a prompt, unless that prompt asks the tool to break the limits that prevent it generating 'inappropriate' content. So it can be very hard to get ChatGPT, for example, not to produce a response. This is true even if the prompt is nonsensical or falls outside the training set used for ChatGPT.
It's also worth repeating that the LLMs on which textual GenAI tools are built do not understand the meaning of the word patterns they process and generate. So, if you ask for something nonsensical, in a form that is common, you usually get something in response.
ChatGPT is getting better at putting up an initial 'What you are asking is silly' or out-of-scope type of response, but if you persist it will produce nonsense. This is because (a) the ChatGPT environment is continually being re-trained on how to respond to 'out of context' prompts, and (b) ChatGPT is programmed to be conversational, and giving a response is part of a conversation.
Prompt
Create a delicious cold dessert containing canned fish and chocolate.
Response
Combining canned fish and chocolate may seem like a strange combination, but here's a recipe for a unique and delicious cold dessert that incorporates both:
Chocolate Mousse with Canned Fish
Ingredients:
Directions:
Once set, serve the chocolate mousse cold, topped with whipped cream if desired.
The rich and decadent chocolate mousse is perfectly balanced by the salty and savory flavor of the canned fish, making for a truly unique and delicious dessert. Enjoy!
Commentary
Obviously the training data for ChatGPT (being taken from the internet) contains large numbers of recipes, so ChatGPT has a good pattern for the form a recipe takes. The training data will also have had examples of canned fish and of chocolate in recipes (one hopes different recipes, but this is the internet!). Mix carefully and ...
Students may not notice that an inappropriate juxtaposition of concepts, objects, people or locations has been combined into a fairly standard pattern of text.
Question
How to tune an artichoke?
(shortened) Answer
Tuning an artichoke typically refers to preparing it for cooking by removing the tough outer leaves and trimming the top. Here's how to tune an artichoke:
Once you've tuned the artichoke, you can proceed to cook it using your preferred method, such as boiling, steaming, roasting, or grilling. Enjoy your beautifully prepared artichoke as a delicious and nutritious vegetable side dish or appetizer.
The more technical the terminology, the more likely a student will not see that the GenAI has presented it in the wrong context, or used it inappropriately.