Using generative AI in relation to legal issues

About: This guidance note, produced by The Open University in collaboration with the law firm Mishcon de Reya LLP, gives a high-level overview of how generative artificial intelligence (GenAI) works, its risks, and how it may appropriately be used in relation to basic day-to-day legal issues. This note was produced on 27 September 2024 and accordingly reflects the legal and technological position of GenAI as of that date.

When used appropriately, GenAI can be a powerful tool to help complete tasks more efficiently and refine work outputs. Lawyers are exploring the capabilities of GenAI in their own work with regard to drafting, research and administrative tasks. There is also real potential for GenAI to help facilitate greater access to justice in the future – but, given the technology's very early stage of adoption and the need for information and outputs relating to legal needs to be highly accurate and contextualised, there are significant risks in using publicly available GenAI tools in relation to legal queries.

This guidance does not constitute legal advice and you should seek legal advice on your specific circumstances where appropriate (we have signposted some resources below). To ensure the guidance is as accessible and jargon-free as possible, we have omitted or simplified some technical nuances and details with respect to how GenAI works.

What is GenAI?

Artificial intelligence (AI) and machine learning (ML) are sub-sets of the broader field of data science, which seeks to extract insights from a given set of data. GenAI is a form of AI that can generate, for example, text, images, video and other data in response to user queries, or 'prompts'. Large language models (LLMs) are a form of GenAI that have exploded in popularity and usage in both professional and personal contexts since the most well-known LLM-powered chatbot service, ChatGPT, was released by OpenAI in November 2022. Levels of awareness and adoption of LLMs amongst the general public have continued to grow month on month, but many people are still getting to grips with how these technologies work and how to use them. Other GenAI tools you may have come across include Bing Chat, Microsoft Copilot or Claude. Some GenAI tools and models have been developed by private companies, with their own terms of service that govern their permitted use by end users. In some cases, such as with ChatGPT, access to more powerful models and advanced features may only be available by way of a paid subscription. Conversely, other models are 'open source' and made freely available, typically with far fewer restrictions on use than their private, 'closed source', proprietary counterparts.

How do LLMs work?

LLMs use a complex set of algorithms to process natural language inputs – text – having been trained on a vast array of training data. Whilst the exact content of these datasets is not publicly known, it is generally understood that LLMs have been trained on huge swathes of content available on the internet (including, for example, public wikis such as Wikipedia).

Based on the recurrence of phrases in a particular order within the training data, and the prominence of some strings of text over others in differing contexts, an LLM can generate a response to a user prompt by reference to what it has already 'seen' in its training data. This allows the LLM to generate an output one word at a time, 'predicting' each subsequent word in the sequence until it has formed an answer to the user's prompt.
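To make this word-by-word process more concrete, the short Python sketch below mimics it in toy form. The vocabulary, the probability values and the next_word_probabilities helper are all invented purely for illustration – a real LLM derives its probabilities from a neural network trained over enormous numbers of text fragments ('tokens') – but the generation loop follows the same basic pattern described above.

# Toy sketch of word-by-word generation (illustrative only, not a real LLM).

def next_word_probabilities(history):
    """Hypothetical stand-in for an LLM: maps the text so far to a
    probability for each candidate next word (values invented)."""
    if history.endswith("could you pass me the"):
        return {"scissors": 0.60, "knife": 0.30, "bin": 0.10}
    return {"the": 0.50, "a": 0.30, "some": 0.20}

def generate(prompt, max_words=1):
    text = prompt
    for _ in range(max_words):
        probs = next_word_probabilities(text)
        # Always picking the single most likely word corresponds to a very
        # 'low temperature' (see the diagram and section below).
        text += " " + max(probs, key=probs.get)
    return text

print(generate("I need to cut this paper, could you pass me the"))
# -> I need to cut this paper, could you pass me the scissors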

For example, if you ask an LLM to explain the offside rule in football, the LLM does not search an index of webpages in the same manner that an internet search engine might. Instead, it would output an explanation of the offside rule one word at a time based on the information it has reviewed within its training data. The training data would likely contain several explanations of the offside rule (and an overview of the rules of football more generally), which the model would have reviewed in order to establish a relationship between the most relevant words and phrases. For example, when attempting to generate the opening sentence of the output, an LLM would be more likely to complete the sentence "A player is in an offside position by reference to the opposing team's final…" with the word 'defender', rather than 'attacker'.

Critically, however, LLMs do not 'understand' the full contextual meaning of a given word or phrase in the same way that humans do. Instead, LLMs effectively act as a very sophisticated form of predictive text, conceptually not too dissimilar to the auto-fill/next-word-suggestion feature seen on many modern phones and messaging apps.

Diagram explaining how LLMs work: the model effectively 'guesses' the next word to generate in its output, based on the prompt provided by the user and the data it reviewed during the model training process. For example:

Prompt: 'Complete the following sentence. I need to cut this paper, could you pass me ___?'

The preceding text (the history) informs the next element:

Next element – Probability
scissors – 20%
knife – 15%
sharp – 5%
bin – 2%
next – 1%
cat – 0.0000001%

A model's 'temperature' affects the predictability of its outputs. Models with higher temperature parameters are more variable and creative, whereas those with lower temperatures are more likely to follow predictable patterns of text generation.
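For readers who are curious, the diagram's probability table can be simulated in a few lines of Python. The weights below are taken directly from the diagram; the sampling call simply picks one candidate word at random in proportion to its probability.

import random

# Probability table from the diagram above (illustrative values).
candidates = ["scissors", "knife", "sharp", "bin", "next", "cat"]
weights = [0.20, 0.15, 0.05, 0.02, 0.01, 0.0000001]

# random.choices normalises the weights internally, so they need not sum to 1.
next_word = random.choices(candidates, weights=weights, k=1)[0]
print("I need to cut this paper, could you pass me the", next_word + "?")

Run this several times and 'scissors' will appear most often, but 'knife' or 'sharp' will occasionally be chosen – which is exactly why two identical prompts can produce different outputs.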

What are the risks of using GenAI?

AI and ML have been in development for several years and are established fields of technology. By contrast, LLMs with the sophistication and capability to answer a broad remit of user queries in a 'chatbot' format like ChatGPT are relatively new and fast developing. There are certain inherent risks and issues that you should be mindful of when using GenAI tools, and these risks are particularly heightened when using GenAI in relation to legal issues. In addition to their years of legal training, lawyers typically have access to a wide range of authoritative, maintained, subscription-based legal databases. The data in these authoritative databases will not have been used to train publicly available LLMs – instead, as noted above, tools like ChatGPT are trained on general content on the internet, which presents a number of risks.

Hallucinations (errors/inaccurate statements)

This is the greatest risk in using LLMs in relation to legal queries. LLMs are known to produce fabricated, inaccurate or misattributed outputs, also known as 'hallucinations'. These are outputs which may initially appear to be believable – particularly if the user does not have subject matter expertise in the relevant field – but are, in fact, highly inaccurate, omit relevant information, adopt a particular 'spin', or are even entirely fabricated.

Hallucinations and other types of inaccuracies can occur for a wide variety of reasons. For example, the training data that an LLM reviewed during its development could itself have contained inaccurate, outdated, fabricated or even parodic content that the LLM was unable to distinguish from accurate, factual content. Alternatively, a user's query (prompt) may be so complex, niche or hypothetical that the LLM was not exposed to relevant content during its development and has therefore extrapolated an output from more commonplace, prevalent areas of knowledge. In addition, many LLMs have been developed by US companies and, as a result, their outputs can default to US styles of both grammar and context (e.g. interpreting a legal query with respect to US law) unless the user specifies otherwise.

In a legal context, there have been instances of an LLM hallucinating by inventing a (convincing) fictitious statute or court ruling to support the conclusion to a user's query (see also 'alignment', below). There have been cases in the US, UK and Australia where a party in court proceedings has used an LLM to identify authorities in support of their arguments, and the LLM has produced a fake citation. In one case, even when the LLM was asked to confirm that the citation was correct, it 'doubled down' on its (incorrect) hallucination.

Alignment

Most LLMs are designed to be as 'helpful' as possible with respect to user queries and to always attempt to assist with a user's prompt, unless it is for malicious, harmful or illegal purposes. Whilst this is a useful design feature, at times this emphasis on assisting the user can cause an LLM to inherit any biased, inaccurate or incomplete statements made by the user as part of their query (or to provide an answer when there is in fact no answer to the question). For example, if a user includes inaccurate statements about what the law is on a particular issue as part of their prompt, there is a risk that the LLM will fail to 'correct' the user on this inaccuracy and will instead provide an output based on the assumption that the user was correct.

Temperature

LLMs are built on a complex array of probabilities and algorithms. One parameter, 'temperature', controls the predictability of an LLM's output (see the diagram above). LLMs with a 'low' temperature tend to generate predictable, repetitive outputs with less variance. Conversely, LLMs with a 'high' temperature are more likely to produce creative, unpredictable outputs. As a result, when presented with the same set of queries by different users – or even the same user, across different sessions – an LLM is highly likely to generate differing answers, with varying levels of emphasis and/or accuracy with regard to specific components of the query. You should therefore be mindful that LLMs do not have a predetermined answer for common queries and can vary greatly in their responses. Even the smallest variations in how a prompt is written can result in significant variances in output compared with similar prompts from other users.
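As a rough illustration of what temperature does mathematically, the Python sketch below rescales a set of next-word probabilities before sampling. The probability values are the illustrative ones from the diagram above, and the rescaling (raising each probability to the power 1/temperature, then re-normalising) is a simplified version of the standard 'softmax with temperature' technique; real systems apply it to the model's internal scores rather than to finished percentages.

# Illustrative next-word probabilities (taken from the diagram above).
probs = {"scissors": 0.20, "knife": 0.15, "sharp": 0.05, "bin": 0.02, "next": 0.01}

def apply_temperature(probs, temperature):
    """Low temperature sharpens the distribution (more predictable output);
    high temperature flattens it (more varied, 'creative' output)."""
    scaled = {word: p ** (1.0 / temperature) for word, p in probs.items()}
    total = sum(scaled.values())
    return {word: v / total for word, v in scaled.items()}

print(apply_temperature(probs, 0.5))  # 'scissors' dominates even more strongly
print(apply_temperature(probs, 2.0))  # the candidates become much closer in likelihood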

Training data bias

Outputs from an LLM might contain discriminatory or biased content because of inherent biases in the training data. This can be particularly problematic with respect to certain fields of law that are more concerned with 'protected characteristics', such as employment or family law.

In addition, the use of LLMs poses several potential legal risks, which can arise even when using them for general, non-legal queries. For example:

Data protection and confidentiality

There is a significant risk that sensitive information included within a user prompt – such as confidential business information (including client or customer data) and other personal data – could be used to train and improve an LLM and subsequently be incorporated into future training datasets. As stated above, the terms and conditions for each GenAI service vary depending on the service provider. Some state that, by default, all content inputted by users and any outputs generated by the service may be used by the service provider to train and improve future models. From a technical perspective, there is likely to be little to no practical means of recovering any data that has been used in this way. You should avoid including sensitive information within your prompts and, where possible, review the settings of the service you are using to 'turn off' the ability for the service provider to use your content to train future models.

Intellectual property (IP)

There is a risk that the output of a generative AI system could infringe the IP rights of a third party. This may occur if the training data used to create the LLM included third-party materials protected by IP rights, used without that third party's permission. It may also arise because the user included third-party IP content in their prompts when using an LLM. You should not upload content or IP belonging to a third party when using an LLM without the IP owner's permission, and you should be mindful of how the content of your prompts may result in outputs containing copyrighted material (such as song lyrics, or a depiction of a well-known character).

It is currently unclear to what extent the outputs that an LLM generates are capable of being protected by IP rights, including copyright. In the UK and the EU, a literary, dramatic, musical or artistic work must be original in the sense that it is the “author’s own intellectual creation”. Even if IP protection is available, there may be some uncertainty as to who would own those rights. The terms and conditions of some publicly available LLMs state that users will either own, or be granted a licence to, any intellectual property rights that may arise out of generated outputs (if, of course, such rights are available in the first place). They are also likely to include terms to the effect that you are responsible for outputs generated using the tool. You should review the service provider's terms and conditions for each GenAI-enabled service you use.

Finally, in contrast to a qualified lawyer who is regulated by a disciplinary body (such as the Solicitors Regulation Authority in the UK), the providers of the major LLM chatbot services are not, at the time of writing, licensed in any way to provide regulated legal services (and will not be covered by professional negligence insurance in the same way as regulated lawyers). A user is therefore likely to have more limited remedies and avenues for compensation if they rely upon the output of an LLM and that output proves to be negligent or inaccurate.

How can I use LLMs to help with legal queries?

For the reasons given above, you should not rely upon LLMs to obtain legal advice, nor rely upon any actions recommended by an LLM to resolve a legal dispute or query, without first consulting with a qualified lawyer.

You should also avoid including any confidential, proprietary or otherwise sensitive information or data, or personal data, in your prompts when using publicly available LLMs.

LLMs can, however, have a useful role to play in relation to general legal queries and, as noted above, this is a fast-developing area that could have a positive impact on access to justice. When using LLMs, there are a number of best practices:

LLMs will be most effective when assisting with basic, high-level queries that do not require extensive subject matter expertise. For legal queries, LLMs will generally be more appropriate when asked, for example, to explain a legal right in simplified terms with illustrative analogies, rather than being asked to draft and complete a legal form or contract in order to 'enforce' that legal right against another party.

You can also use LLMs to help you prepare for initial meetings and consultations with qualified legal advisors, or to help you develop a basic understanding of introductory concepts and legal jargon. For example, an LLM may be able to provide you with an overview of the likely questions a lawyer may ask during a consultation, or explain the process for being onboarded as a client by a law firm in more general terms.

You should always check the output of any LLM thoroughly for errors, mistakes and incomplete information. Where possible, you should aim to cross-reference any information produced by an LLM against an external, trusted, verifiable source of information so that you can better assess the accuracy of an output. You should be particularly diligent if you are asking a GenAI system to assist with an area in which you have no or limited knowledge.

Do not use LLMs to generate formal contracts or agreements that you use in the course of your business and intend to be legally binding on another party – it is highly likely that the output will not account for the context of your business or circumstances, may be inaccurate or incomplete, or may suggest clauses and provisions that are not commercially appropriate to your business's size, sector, structure and/or jurisdiction. Conversely, you could use an LLM to help you draft an initial letter or email of complaint to another party that sets out the basic facts of a dispute (without formally threatening or commencing legal action).

LLMs can be an excellent tool to help with accessibility and inclusion needs, for example, by offering real-time transcription, text-to-speech and speech-to-text conversion, and other assistive technologies. However, as stated above, be mindful of the risk that any sensitive or confidential information inputted to an LLM may be used as part of a future training set for subsequent AI model improvements. We recommend that you seek the permission of the owner (or other interested/affiliated party) of any such confidential or sensitive information or, where possible, seek independent advice, before you upload such information to an LLM.

Downloadable document

Select this link to download the guidance document: Using generative AI in relation to legal issues - Word document (951.1 KB).

Further resources and reading

If you'd like to access further information and support to help you resolve your legal query, we have provided a suggested list of resources:

If you’d like to access further information and guidance on GenAI, we have provided a suggested list of resources:

Glossary of key terms

AI: 'Artificial intelligence' is an umbrella term that refers to computer systems designed to perform tasks and processes typically characterised as requiring human intelligence or input, such as reasoning, translation, decision-making, visual perception and speech recognition.

GenAI: 'Generative artificial intelligence' refers to a subset of AI systems, designed to generate a variety of content (such as text or video) in response to user inputs based on the training data that they have reviewed.

IP: 'Intellectual property' refers to a range of legal rights – copyright, trade secrets, designs, patents and trademarks – that protect intangible property and assets, such as images and text.

LLM: 'Large language models' are a category of GenAI designed for natural language tasks such as translation and text generation.

ML: 'Machine learning' is a subset of AI that focuses on the development of systems which utilise data, algorithms and statistical models to 'learn' how to perform tasks without an explicit set of instructions.


This note was made possible thanks to contributions from Nina O'Sullivan, Partner and Head Knowledge Lawyer (Innovation), and Harry Clark, Associate in the Video Games & Interactive Entertainment practice, of Mishcon de Reya LLP.

If you would like more information on the work of the Open Justice Centre, please contact Francine Ryan, Director of the Open Justice Centre at open-justice@open.ac.uk.