Large language models and generative AI

Chapter 2: Future trends

9. This chapter sets out capabilities and future trends in large language models (LLMs). The purpose is to summarise how they work, distinguish hype from reality, and provide the groundwork for our subsequent assessments of opportunity, risk and regulation. We do not attempt to provide exhaustive technical detail.

What is a large language model?

Box 1: Key terms

Artificial intelligence (AI): there is no universally accepted definition, though AI is commonly used to describe machines or systems performing tasks that would ordinarily require human brainpower. Smartphones, computers, and many online services use AI tools.

Deep learning: a method used in developing AI systems which involves processing data in ways inspired by how the human brain works.

Foundation model: a type of AI which typically uses deep learning and is trained on large datasets. It is characterised in part by its ability to adapt to a wide range of tasks. Many use a deep learning architecture known as the transformer, developed by Google researchers in 2017.

Generative AI: closely related to foundation models, generative AI is a type of AI capable of creating a range of outputs including text, images and other media.

Large language model: a subset of foundation models focused on language (written text). Examples of LLMs include OpenAI’s GPT, Google’s PaLM 2 and Meta’s LLaMA.

Multi-modal model: a subset of foundation models which can handle more than one modality (for example images, video, code).

Frontier AI: a term used to describe the most powerful and cutting-edge general-purpose AI tools that match or exceed today’s most advanced capabilities.

Compute: we use this term to refer to the hardware, software and infrastructure resources required for advanced AI processes.

Hallucination: a term describing LLMs producing inaccurate responses, many of which can sound plausible.

Model card: a short document used in AI to provide information about how a model works, how it was developed and how it should be used.

Source: Written evidence from the Alan Turing Institute (LLM0081), Alan Turing Institute, ‘Frequently asked questions’: https://www.turing.ac.uk/about-us/frequently-asked-questions [accessed 17 January 2024], House of Lords Library, ‘Artificial intelligence: Development, risks and regulation’ (18 July 2023): https://lordslibrary.parliament.uk/artificial-intelligence-development-risks-and-regulation/ [accessed 17 December 2023] and Amazon Web Services, ‘What is compute?’: https://aws.amazon.com/what-is/compute/ [accessed 20 December 2023]

10. Large language models are a type of general-purpose AI. They are designed to learn relationships between pieces of data and predict sequences. This makes them excellent at generating natural language text, amongst many other things.10 LLMs are, at present, structurally designed around probability and plausibility, rather than around creating factually accurate assessments which correspond to the real world. This is partly responsible for the phenomenon of ‘hallucinations’ whereby the model generates plausible but inaccurate or invented answers.11
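To illustrate the distinction, the sketch below is our own (the context, candidate tokens and probabilities are all invented): it shows the core autoregressive step of an LLM, which samples the next token in proportion to how plausible it looks, with no check against the real world.

```python
# Illustrative sketch of an LLM's core step: sampling the next token from a
# probability distribution over candidates. The distribution here is invented;
# a real model computes it from billions of learned weights.
import random

def next_token(context: str) -> str:
    # Hypothetical plausibilities for the context "The capital of Australia is"
    candidates = {"Canberra": 0.6, "Sydney": 0.3, "Melbourne": 0.1}
    tokens, weights = zip(*candidates.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Plausible but wrong continuations ("Sydney") are sampled a meaningful share
# of the time: the model optimises for plausibility, not factual accuracy,
# which is one root of 'hallucination'.
print(next_token("The capital of Australia is"))
```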

11. LLMs can nevertheless perform a surprisingly wide range of economically useful tasks. They can already power chatbots, translation services and information retrieval systems; speed up office tasks by auto-generating documents, code and marketing materials; and catalyse research by synthesising vast amounts of data, and reviewing papers to identify patterns and insights.12 OpenAI told us that LLMs will deliver “immense, tangible benefits to society”.13 Fundamentally new products remain nascent, though there is speculation that a highly capable autonomous personal assistant could emerge that can operate across a range of different services.14

Figure 1: Sample of LLM capabilities and example products

Infographic showing a sample of large language model capabilities and example products

Source: Alan Turing Institute, ‘Large Language Models and Intelligence Analysis’ (2023): https://cetas.turing.ac.uk/publications/large-language-models-and-intelligence-analysis [accessed 14 December 2023]

12. Developing an LLM is complex and costly. First, the underlying software must be designed and extensive data collected, often using automated bots to obtain text from websites (known as web crawling).15 The model is then pre-trained on this data: its parameters (known as model weights) are repeatedly adjusted to teach the model how to arrive at answers.16
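As a rough illustration of what adjusting model weights involves, the sketch below (ours, not drawn from the evidence) pre-trains a toy next-character predictor: each pass compares the model’s predictions with the actual next token and nudges the weights to reduce the error. Pre-training a real LLM is the same loop at vastly greater scale.

```python
# Toy pre-training loop: repeatedly adjust model weights so the model gets
# better at predicting the next token. Real pre-training differs in scale.
import torch
import torch.nn as nn

text = "the cat sat on the mat. the dog sat on the log. "  # stand-in corpus
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([stoi[ch] for ch in text])

class TinyLM(nn.Module):
    """Predicts the next character from the current one."""
    def __init__(self, vocab_size: int, dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        return self.head(self.embed(x))

model = TinyLM(len(vocab))
opt = torch.optim.AdamW(model.parameters(), lr=1e-2)

for step in range(200):
    inputs, targets = data[:-1], data[1:]   # each token predicts its successor
    loss = nn.functional.cross_entropy(model(inputs), targets)
    opt.zero_grad()
    loss.backward()                         # compute gradients of the error...
    opt.step()                              # ...and adjust the model weights
```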

13. Further fine-tuning may be undertaken to improve model performance and its ability to handle more specialised tasks.17 The process for arriving at an answer is typically described as a ‘black box’ because it is not always possible to trace exactly how a model uses a particular input to generate particular outputs, though efforts are underway to improve insight into their workings.18
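Fine-tuning can be sketched in the same terms, assuming the Hugging Face transformers library and the small open ‘gpt2’ model purely as illustrative stand-ins: pre-trained weights are loaded and then further adjusted on a small, curated dataset.

```python
# Hedged fine-tuning sketch: start from pre-trained weights, then continue
# training on a small specialised dataset. Model and data are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # pre-trained weights
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

examples = [  # tiny curated domain dataset (invented for this sketch)
    "Q: What is a model card? A: A short document describing a model.",
    "Q: What is compute? A: The hardware and infrastructure behind AI.",
]

model.train()
for text in examples:
    batch = tok(text, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])   # causal-LM loss
    out.loss.backward()
    opt.step()
    opt.zero_grad()
```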

Figure 2: Building, releasing and using a large language model


Source: Competition and Markets Authority, AI Foundation Models Review (2023): https://assets.publishing.service.gov.uk/media/65045590dec5be000dc35f77/Short_Report_PDFA.pdf [accessed 14 December 2023]

14. Models may be released in a variety of open or closed formats. Those on the open end of the spectrum tend to make more of the underlying system code, architecture and training data available.19 The parameters may also be published, allowing others to fine-tune the model easily.20 Those on the closed end of the spectrum tend to publish less information about how their models have been developed and the data used.21 The use of the term ‘open source’ model remains contested. We therefore use the term ‘open access’.22

Figure 3: The scale of open and closed model release

Infographic showing open and closed model release

Source: Irene Solaiman, The Gradient of Generative AI Release (February 2023): https://arxiv.org/pdf/2302.04844.pdf [accessed 14 December 2023]

15. The building blocks and distribution channels for LLMs are likely to vary considerably. Some large tech firms might own the entire process from development to distribution. Others are likely to have different businesses working on each part of the model development and deployment.23

Figure 4: Level of vertical integration in model development and deployment

Infographic showing development and deployment process

Source: Competition and Markets Authority, AI Foundation Models Review (2023): https://assets.publishing.service.gov.uk/media/65045590dec5be000dc35f77/Short_Report_PDFA.pdf [accessed 14 December 2023]

Trends

16. Models will get bigger and more capable. The amount of computing power used in training has expanded over the past decade by a factor of 55 million. Training data use has been growing at over 50 per cent per year.24 Ian Hogarth, Chair of the (then) Frontier AI Taskforce, anticipated up to six orders of magnitude increase in the amount of compute used for next-generation models in the next decade, yielding “breath-taking capabilities”.25
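To put those figures in proportion (the arithmetic is ours): a 55-million-fold rise over roughly a decade implies annual growth of about six-fold, and ‘six orders of magnitude’ denotes a further million-fold increase.

```latex
% Worked arithmetic (ours): the annual growth rate implied by a 55-million-fold
% increase over ten years, and the meaning of "six orders of magnitude".
\[
\left(5.5 \times 10^{7}\right)^{1/10} \approx 6
\qquad\text{and}\qquad
10^{6} = 1{,}000{,}000
\]
```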

17. Costs will grow significantly. EPOCH, a research initiative, estimates the costs for developing state-of-the-art models could reach between $600 million and $3 billion over the next three years.26

18. Fine-tuned models will become increasingly capable and specialised. The Royal Academy of Engineering believed models trained on high quality curated datasets are likely to have “superior accuracy, consistency, usability and accountability” compared with general-purpose LLMs.27

19. Smaller models will offer attractive alternatives. These could deliver capable systems with much lower compute costs and data requirements. Some might even be run locally on a smartphone.28

20. Open access models will proliferate over the next three years. There is a clear trend towards ever greater numbers of open access models with increasingly sophisticated capabilities, driven in part by the growing ease and falling costs of development and customisation.29 They are unlikely to outclass cutting-edge closed source models within the next three years if judged on a suite of benchmarks, but will offer attractive options for those who do not require cutting-edge capabilities.30 Consumer trust is likely to be a factor affecting uptake.

21. Integration with other systems will grow. Models are likely to gain more widespread access to the internet in real time, which may improve the accuracy and relevance of their outputs.31 Better ways of linking LLMs both with other tools that augment their capacities (for example calculators), and with other real-world systems (for example email, web search, or internal business processes) are also expected.32 The availability of existing infrastructure suggests this will occur faster than in previous waves of innovation.33
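The tool-linking pattern can be sketched in a few lines. Everything below is illustrative: call_llm is a hypothetical stand-in for any chat-completion API, and the calculator is the example tool mentioned above. The model signals that it wants a tool, the application runs it, and the result is fed back into the model’s context.

```python
# Illustrative tool-use loop: an LLM delegates arithmetic to a calculator.
# call_llm is a hypothetical stand-in for a real model API.
import re

def call_llm(prompt: str) -> str:
    # Stand-in: a hosted model would be queried here. This stub first asks
    # for the calculator tool, then answers once a tool result is present.
    if "Tool result:" in prompt:
        return "23 multiplied by 7 is 161."
    return "TOOL:calculator:23*7"

def calculator(expression: str) -> str:
    # Accept simple arithmetic only; never eval untrusted input directly.
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
        raise ValueError("unsupported expression")
    return str(eval(expression))

def answer(question: str) -> str:
    reply = call_llm(question)
    if reply.startswith("TOOL:calculator:"):      # model requested the tool
        result = calculator(reply.split(":", 2)[2])
        reply = call_llm(f"{question}\nTool result: {result}")
    return reply

print(answer("What is 23 multiplied by 7?"))
```

The same handshake generalises to email, web search or internal business systems: the tool request is structured text that the surrounding application intercepts and acts on.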

22. The timeline and engineering pathway to widespread integration of LLMs in high-stakes areas remains uncertain. LLMs continue to hallucinate, exhibit bias, regurgitate private data, struggle with multi-step tasks, and pose difficulties for interpreting black-box processes.34 In light of these issues it is unclear how quickly LLMs should be integrated into high-stakes applications (for example in critical national infrastructure). Improvements to bias detection, memory, complex task execution, error correction and interpretability are major areas of research and some improvements within three years are highly likely.35

23. There is a realistic possibility of integration with systems capable of kinetic movement. There is some evidence of progress already, though sci-fi scenarios of a robot apocalypse remain implausible.36

24. There is a realistic possibility of unexpected game-changing capability leaps in solving real-world problems. These remain difficult to forecast because there is no predictable relationship between improvements to inputs and problem-solving capabilities.37

25. Some automation of model development may occur. This would involve using AI to build AI. Such progress might speed up some aspects of model development significantly, though with fewer humans involved in the process.38

26. High quality data will be increasingly sought after. EPOCH expects developers to exhaust publicly available high-quality data sources such as books, news, scientific articles and open source repositories within three years, and turn to lower quality sources or more innovative techniques.39 Professor Zoubin Ghahramani, Vice President of Research at Google DeepMind, said there was ongoing research into using machine-generated synthetic data, but thought this could also lead to a degraded information environment,40 or model malfunction.41

27. The level of market competition remains uncertain. A multi-billion pound race to dominate the market is underway. Many leading AI labs emerged outside big tech firms, though there has been subsequent evidence of trends towards consolidation.42 It is plausible that a small number of the largest cutting-edge models will be used to power an extensive number of smaller models, mirroring the existing concentration of power in other areas of the digital economy.43

28. Large language models (LLMs) will have impacts comparable to the invention of the internet. The UK must prepare for a period of heightened technological turbulence as it seeks to take advantage of the opportunities.


10 Written evidence from Dr P Angelov et al (LLM0032), Alan Turing Institute (LLM0081) and Google and Google DeepMind (LLM0095)

11 Q 97 (Jonas Andrulis)

12 Q 15 (Dr Zoë Webster), written evidence from the Market Research Society (LLM0088), MIT Technology Review, ‘Large language models may speed drug discovery’ (22 August 2023): https://www.technologyreview.com/2023/08/22/1076802/large-language-models-may-speed-drug-discovery/ [accessed 28 November 2023]

13 Written evidence from OpenAI (LLM0113)

14 Competition and Markets Authority, AI Foundation Models Review (2023): https://assets.publishing.service.gov.uk/media/65045590dec5be000dc35f77/Short_Report_PDFA.pdf [accessed 14 December 2023]

15 Web crawlers search and index content online for search engines.

16 Written evidence from Dr P Angelov et al (LLM0032) and Microsoft (LLM0087)

17 Q 75 (Rob Sherman) and written evidence from Dr P Angelov et al (LLM0032)

18 Written evidence from Sense about Science (LLM0046)

19 Written evidence from OpenUK (LLM0115)

20 Written evidence from Hugging Face (LLM0019)

21 Written evidence from Google and Google DeepMind (LLM0095), Microsoft (LLM0087) and OpenUK (LLM0115)

22 Our use of the term ‘open access’ is in line with definitions provided by the Oxford Internet Institute (LLM0074)

23 Competition and Markets Authority, ‘AI Foundation Models: initial review’ (2023): https://www.gov.uk/cma-cases/ai-foundation-models-initial-review [accessed 20 December 2023]

24 DSIT, Capabilities and risks from frontier AI (October 2023), p 11: https://assets.publishing.service.gov.uk/media/65395abae6c968000daa9b25/frontier-ai-capabilities-risks-report.pdf [accessed 17 December 2023]. Training compute is typically measured in the total number of floating-point operations (FLOP) performed.

26 Written evidence from EPOCH (LLM002). Note that further infrastructure costs could be substantial.

27 Written evidence from the Royal Academy of Engineering (LLM0063)

28 Written evidence from the Royal Statistical Society (LLM0055), Royal Academy of Engineering (LLM0063) and TechTarget, ‘Small language models emerge for domain-specific use cases’ (August 2023): https://www.techtarget.com/searchbusinessanalytics/news/366546440/Small-language-models-emerge-for-domain-specific-use-cases [accessed 20 December 2023]

29 See for example written evidence from Market Research Society (LLM0088), Edward J. Hu et al, ‘LoRA: Low-Rank Adaptation of Large Language Models’ (June 2021): https://arxiv.org/abs/2106.09685 [accessed 20 December 2023] and IEEE Spectrum, ‘When AI’s Large Language Models Shrink’ (March 2023): https://spectrum.ieee.org/large-language-models-size [accessed 20 December 2023].

30 Written evidence from the Royal Academy of Engineering (LLM0063), Stability AI (LLM0078), TechTarget, ‘Small language models emerge for domain-specific use cases’ (August 2023): https://www.techtarget.com/searchbusinessanalytics/news/366546440/Small-language-models-emerge-for-domain-specific-use-cases [accessed 20 December 2023] and IEEE Spectrum, ‘When AI’s Large Language Models Shrink’ (March 2023): https://spectrum.ieee.org/large-language-models-size [accessed 20 December 2023]

31 See for example OpenAI, ‘ChatGPT Plugins’ (March 2023): https://openai.com/blog/chatgpt-plugins [accessed 28 November 2023] and TechCrunch, ‘You.com launches new APIs to connect LLMs to the web’ (November 2023): https://techcrunch.com/2023/11/14/you-com-launches-new-apis-to-connect-llms-to-the-web/ [accessed 28 November 2023].

32 Q 98 (Jonas Andrulis), written evidence from the Royal Statistical Society (LLM0055), Dr P Angelov et al (LLM0032), Alan Turing Institute (LLM0081), Google and Google DeepMind (LLM0095) and DSIT, Capabilities and risks from frontier AI (October 2023): https://assets.publishing.service.gov.uk/media/65395abae6c968000daa9b25/frontier-ai-capabilities-risks-report.pdf [accessed 17 December 2023]

33 Written evidence from the Bright Initiative (LLM0033)

34 Written evidence from Oxford Internet Institute (LLM0074), Royal Statistical Society (LLM0055), Royal Academy of Engineering (LLM0063), Microsoft (LLM0087), Google and Google DeepMind (LLM0095), NCC Group (LLM0014)

35 Written evidence from the Alan Turing Institute (LLM0081), Google and Google DeepMind (LLM0095), Professor Ali Hessami et al (LLM0075). See also research interest in related areas, for example Jean Kaddour et al, ‘Challenges and Applications of Large Language Models’ (July 2023): https://arxiv.org/abs/2307.10169 [accessed 20 December 2023], Noah Shinn et al, ‘Reflexion: Language Agents with Verbal Reinforcement Learning’ (March 2023): https://arxiv.org/abs/2303.11366 [accessed 8 January 2024] and William Saunders et al, ‘Self-critiquing models for assisting human evaluators’ (June 2022): https://arxiv.org/abs/2206.05802 [accessed 8 January 2024].

36 Jean Kaddour et al, ‘Challenges and Applications of Large Language Models’ (July 2023): https://arxiv.org/abs/2307.10169 [accessed 20 December 2023]

37 Government Office for Science, Future risks of frontier AI (October 2023): https://assets.publishing.service.gov.uk/media/653bc393d10f3500139a6ac5/future-risks-of-frontier-ai-annex-a.pdf [accessed 25 January 2024]. See also AI Alignment Forum, ‘What a compute-centric framework says about AI takeoff speeds’ (January 2023): https://www.alignmentforum.org/posts/Gc9FGtdXhK9sCSEYu/what-a-compute-centric-framework-says-about-ai-takeoff [accessed 20 December 2023] and Lukas Finnveden, ‘PaLM-2 & GPT-4 in “Extrapolating GPT-N performance”’ (May 2023): https://www.alignmentforum.org/posts/75o8oja43LXGAqbAR/palm-2-and-gpt-4-in-extrapolating-gpt-n-performance [accessed 8 January 2024].

38 Daniil A Boiko et al, ‘Emergent autonomous scientific research capabilities of large language models’ (2023): https://arxiv.org/ftp/arxiv/papers/2304/2304.05332.pdf [accessed 21 December 2023], Drexler, ‘Reframing superintelligence’ (2019): https://www.fhi.ox.ac.uk/reframing/ [accessed 21 December 2023] and Tom Davidson, ‘Continuous doesn’t mean slow’ (April 2023): https://www.planned-obsolescence.org/continuous-doesnt-mean-slow/ [accessed 25 January 2024]

39 Written evidence from EPOCH (LLM002)

41 Ilia Shumailov et al, ‘The curse of recursion’ (May 2023): https://arxiv.org/abs/2305.17493 [accessed 21 December 2023]

42 Open Markets Institute, ‘AI in the public interest’ (15 November 2023): https://www.openmarketsinstitute.org/publications/report-ai-in-the-public-interest-confronting-the-monopoly-threat [accessed 21 December 2023]

43 Competition and Markets Authority, AI Foundation Models Review




© Parliamentary copyright 2024