Algorithms in decision-making

2 Applications and bias

Data sharing

12.A foundation for machine learning algorithms is the data on which they are built—both their initial ‘training data’ (paragraph 35) and the continuing feedback data which allow some algorithms to interpret and adjust to changing scenarios. ‘Big data’—drawing disparate datasets together to provide new insights—requires data to flow across organisational boundaries. Our predecessor Committee’s report on Big Data expounded the “enormous benefits in prospect for the economy and for people’s lives” from making public data ‘open’.37 In our current inquiry we examined the way data sharing is affecting three sectors in particular—healthcare, criminal justice and social media.

In the health sector

13.In the context of healthcare, the Academy of Medical Sciences highlighted that “machine learning algorithms are more precise and sensitive when learning from a large, high-quality set of training data.”38 According to Dame Fiona Caldicott, the National Data Guardian, “new technologies and ways of sharing data mean that we can now gain huge benefit from the sharing of health and care data”.39 Algorithms are assisting earlier and more accurate diagnosis, supporting preventative medicine, and guiding complex treatment decisions.40 In our recent report on Genomics, we saw the potential of genomic data, when linked with other patient-related data, to find patterns for diagnosing rare diseases and ‘personalising’ medicine.41 Microsoft told us that its ‘Seeing AI’ application “enables people who are visually impaired to use a mobile app that allows them to see and hear a description of what is around them”.42 AI is being used as a ‘risk assessment tool’ in the field of cancer.43 In pharmacology, it is assisting in clinical trial interpretations and simulations.44 In epidemiology, it is being “applied to public health data to detect and track infectious disease outbreak, […] enhance medical monitoring, and to optimise demand management and resource allocation in healthcare systems”.45 The recent controversy about a “computer algorithm failure” in the NHS breast screening programme shows both the benefits and the risks of some algorithms—the system allowed an enormous number of women to be automatically invited for screening at the appropriate time, but a “coding error” also meant that women aged between 68 and 71 were missed.46

14.Digitalisation is a key part of the NHS’s strategy to use data and algorithms to improve patient care. At present, the think-tank Reform noted, “the healthcare system is still heavily reliant on paper files and most of its IT systems are not based on open-standards”.47 In 2017, Nuance Communications, a technology firm, calculated that 43% of NHS trusts were investing in some form of artificial intelligence.48 Polygeia, a think-tank, worried that variability in NHS digitisation will mean that some trusts lag behind others in terms of improved healthcare access.49 Reform believed that without digitalisation, adoption of machine learning in the NHS will be “sparse”.50

15.The pace of digitisation in the NHS is slipping behind schedule. The National Information Board envisaged in 2015 that by 2020 “all patient and care records will be digital, real time and interoperable”.51 The Wachter review concluded in 2016, however, that the “journey to integrated paperless records” by 2020 was unrealistic and should be pushed back to 2023.52 The National Advisory Group on Health Information Technology was “very concerned that an aggressive push to digitalise the entire secondary care sector by 2020 was more likely to fail than succeed”.53 The most recent annual survey by Digital Health Intelligence found falling confidence among NHS IT leaders that the 2020 target for integrated digital health and care records will be achieved.54

16.The National Data Guardian, Dame Fiona Caldicott, highlighted in her 2017 report on Genomics the urgency needed in developing “consensus on the legitimacy of data sharing in order to deliver high-quality, safe and effective genetics/genomics diagnostic services”.55 Professor Harry Hemingway of the Farr Institute of Health Informatics Research emphasised that the costs of not sharing data could be “severe”.56 The revised ‘Caldicott principles’,57 published in 2013,58 emphasised that “the duty to share information can be as important as the duty to protect patient confidentiality”.59 Reform pointed out, however, that people are generally reluctant to share their data because they “do not always understand what happens to their data”.60 Following the termination of the ‘care.data’ patient data-sharing initiative in 2016, because of its low acceptance by patients and doctors, the National Data Guardian stipulated more stringent consent agreements and opt-outs for patients.61 She observed recently that “the most praiseworthy attempts at innovation falter if they lose public trust”.62 We explore issues around consent further in Chapter 4.

17.The current lack of digital NHS data is slowing the development of AI algorithms. Dr Dominic King, a research scientist at DeepMind Health, a company owned by Google,63 told us that because of uncleaned, unrepresentative and disconnected NHS datasets, it took many months to produce data in a “machine readable, AI-ready format for research” before DeepMind could begin work on an algorithm to diagnose kidney disease. Dr King wanted to see “better education and investment in what it takes to get these datasets ready, so that they can be made available to a wide group of people”.64 The differing data codes used across NHS Trusts were seen as one hindrance to the rapid processing of data. Eleonora Harwich of Reform thought that “the standardisation of clinical codes, which are going to be replaced by a standard system” would be “a positive step forward”.65

In the criminal justice system

18.In the criminal justice system, algorithms are being used by some police forces for facial image recognition. Big Brother Watch have raised concerns about this, including about the reliability of the technology and its potential racial bias66 (paragraph 35). The Home Office told us in our separate inquiry on biometrics that the algorithm in these systems matched video images against a ‘watch list’ of wanted people, but also that police operators have to confirm the match indicated by the algorithm and “people are not arrested solely on the basis of matches made by facial recognition software”.67

19.AI and algorithms are also being used to detect “crime hotspots”68 and find those areas most susceptible to crime.69 Kent Constabulary have been using the commercial ‘PredPol’ algorithm, “a predictive policing tool”, since 2013 to identify areas “where offences are likely to take place” using data on past crime patterns.70 RUSI highlighted that a similar algorithm developed in-house by Greater Manchester Police in 2012 had been “shown to be effective at reducing burglary”.71 The UCL Jill Dando Institute of Security & Crime Science emphasised that “knowing when and where a problem is most likely is only one part of the puzzle—knowing what to do is another”.72

20.We heard in our inquiry about how Durham Constabulary is also using algorithms to “assist decision making relating to whether a suspect could be eligible for a deferred prosecution”73 (Box 2), as well as their wider and more controversial use in the US for decisions on bail, parole and sentencing (paragraph 38).74 Durham Constabulary believed that AI’s ability to assess risk from past behaviours is being used to get “consistency in decision making” about targeted interventions for offenders.75 HM Inspectorate of Constabulary concluded in 2017 that the wider use of the technology used at Durham could “improve effectiveness, release officer capacity, and is likely to be cost effective”.76

Box 2: Durham Constabulary’s use of algorithms

The Harm Assessment Risk Tool (HART), designed as a result of a collaboration between Durham Constabulary and Dr Barnes of the University of Cambridge, is a decision support system used to assist officers in deciding whether a suspect is eligible for deferred prosecution based on the future risk of offending.

Taking 34 different predictors—information on past criminal behaviour, age, gender and postcode—the model was ‘trained’ on over 100,000 custody events over a five-year period. The algorithm uses all these data to make predictions on the level of risk of reoffending; a simplified illustrative sketch of this kind of risk model is given below the Box.

Source: Sheena Urwin, Durham Constabulary
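
The sketch below illustrates, in very simplified form, the kind of decision-support model described in Box 2: a classifier trained on historical custody records that outputs an advisory reoffending-risk score. It is not Durham Constabulary’s HART implementation; the predictor names, the synthetic data and the choice of a random forest model are assumptions made purely for illustration.

```python
# Illustrative sketch only: a decision-support risk model of the general kind
# described in Box 2. Feature names and data are invented; this is NOT the
# HART implementation used by Durham Constabulary.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000  # stand-in for the ~100,000 custody events mentioned in Box 2

# Hypothetical predictors of the sort listed in Box 2 (past offending, age, etc.)
records = pd.DataFrame({
    "prior_offences": rng.poisson(2, n),
    "age_at_custody": rng.integers(18, 70, n),
    "months_since_last_offence": rng.integers(0, 120, n),
})
# Synthetic outcome: 1 = reoffended within two years, 0 = did not
latent_risk = 0.3 * records["prior_offences"] - 0.02 * records["age_at_custody"]
reoffended = (latent_risk + rng.normal(0, 1, n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    records, reoffended, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# The model produces a probability which an officer would treat as advisory,
# not as an automatic decision.
print("Estimated reoffending probabilities:", model.predict_proba(X_test[:3])[:, 1])
print("Held-out accuracy:", model.score(X_test, y_test))
```

In practice a tool of this kind stands or falls on the quality and representativeness of the custody records it is trained on, which is the concern explored in paragraphs 35 to 39 below.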

21.The HART algorithm being piloted and evaluated by Durham Constabulary does not utilise data from other police force areas, nor indeed from national IT systems.77 The Royal United Services Institute’s ‘Big Data and Policing’ review in 2017 concluded that “because the system was only using Durham Police’s data, offences committed in other areas would not be considered, and dangerous criminals might not be identified”.78 HM Inspectorate of Constabulary found that “most forces have not yet explored fully the use of new and emerging techniques and analysis to direct operational activity at a local level”.79 Marion Oswald, Director of the Centre for Information Rights, and Sheena Urwin of Durham Constabulary, noted that only 14% of UK police forces were using algorithmic data analysis or decision-making for intelligence work.80 Professor Louise Amoore questioned whether there is a “place for inference or correlation in the criminal justice system”81 since, unlike normal evidence, it cannot be cross-examined or questioned.82 Jamie Grace from Sheffield Hallam University accepted its use but wanted “a single [independent] oversight body and regulator for the use of police databases and algorithmic analysis in criminal justice”.83

In the web and social media sector

22.On the web and social media platforms, algorithms allow faster searches. Adverts and news can be more effectively targeted. The recent controversy over Cambridge Analytica’s use of algorithms to exploit Facebook users’ data for targeted political campaigning shows the risks associated with such applications, exacerbated in that particular case by the absence of consent for use of personal data (paragraph 83). A report from Upturn and the Omidyar Network found that people have also been adversely affected where anti-competitive practices, such as distorted filtering in search engines through “algorithmic collusion”, and “automatic price fixing”, have been built into algorithms.84 In 2017 the European Commission fined Google for manipulating its algorithms to demote “rival comparison shopping services in its search results” and giving “prominent placement to its own comparison shopping service”.85 The Royal Statistical Society called for the Competition and Markets Authority “to consider the potential anti-competitive effects arising from the independent use of pricing algorithms”.86

23.The major social media platforms have cemented strong market positions by providing algorithm-based services founded on vast datasets under their control. Professor Ashiq Anjum from the University of Derby explained that “smaller organisations cannot get the same benefit because they do not have access to the same wealth of data and lack the resources to invest in the technology”.87 This was also a concern of the House of Lords’ Committee on AI.88 The consolidation of platforms flows in part from their acquisition of other social media businesses, not just to acquire more customers but also to combine datasets from different but complementary applications—combining search engine, photo-sharing and messaging services—opening up new opportunities for more sophisticated algorithms for targeting adverts and news.89 Following Facebook’s 2014 acquisition of WhatsApp, the European Commission established that Facebook were able to match Facebook users’ accounts and WhatsApp users’ accounts.90 Such synergies from merging the datasets of two companies can be a key motivation for acquisitions.91

Government data sharing and getting value from its data

24.Our predecessor Committee’s 2016 report on Big Data welcomed the progress on making government datasets ‘open’ to data analytics businesses and acknowledged the “vital role” played by the Government’s Digital Catapult in facilitating private sector data sharing.92 The Committee recommended that the Government produce a framework “for pro-actively identifying data sharing opportunities to break department silos”93 and a map to set out “how the Catapult’s work and its own plans to open and share Government data could be dovetailed”.94

25.The Government-commissioned ‘AI Review’ in 2017 concluded that “Government and industry should deliver a programme to develop data trusts”, where data-holders and data-users can share data in a “fair, safe and equitable way”.95 The 2017 Autumn Budget subsequently announced a £75m investment “to take forward key recommendations of the independent review on AI, including exploratory work to facilitate data access through ‘data trusts’.”96 The Industrial Strategy White Paper suggested that the remit for the planned Centre for Data Ethics & Innovation “will include engaging with industry to explore establishing data trusts to facilitate easy and secure sharing of data”.97 In our current inquiry, the Government explained that:

The idea behind ‘data trusts’ is that they facilitate sharing between multiple organisations, but do so in a way that ensures that the proper privacy protections and other relevant protections are in place, that there is a governance of the data, which ensures that the voices of interested parties are represented in that governance, and that there is a fair sharing of the value that can be derived from those data. That was a recommendation in the autumn, and we are beginning work now to develop that further, with an aim of piloting data trusts in future.98

26.Central Government’s use of AI is growing. The Government’s written evidence to the Lords Committee on AI highlights the use of machine learning within HMRC “as part of a goal to automate 10 million processes by the end of 2018”.99 While the Government’s submission reveals where AI is used, it is more opaque about the specific ways in which it is deployed to deliver public services. Opportunities are also being explored within the Royal Navy, the MOD, and the Cabinet Office,100 although again it is less clear how. In the 2017 Government Transformation Strategy, “making better use of data to improve decision-making, by building and expanding data science and analytical capability across government” was set as a priority.101 Our Government witnesses told us that holistic oversight of public sector algorithms, including the “human rights issues”, did not fall under a single department.102 In last year’s Autumn Budget the Government pledged to create the “GovTech Catalyst, a small central unit based in the Government Digital Service that will give businesses and innovators a clear access point to government”.103 April’s AI Sector Deal allocates £20 million to a GovTech Fund to provide “innovative solutions for more efficient public services and stimulate the UK’s growing GovTech sector”.104

27.Hetan Shah from the Royal Statistical Society told us, however, that the public sector was not taking full advantage of the “extraordinary value” of the vast amount of data it already shares with private sector algorithm developers.105 Data and algorithms are inextricably linked106 and “algorithms are valueless without data”.107 In our recent Genomics inquiry, Genomics England and genomics scientists explained how patients’ genomic data could be linked with medical and other data to provide valuable insights for diagnosing rare diseases and shaping ‘personalised medicine’.108 Because of the NHS’s unique scale and patient coverage, the benefits to algorithm developers of its data more generally, particularly once digitised (paragraph 15), will be enormous.

28.Hetan Shah told us, however, that the public sector currently has “a lack of confidence in this area and thinks the magic lies with the private sector”.109 In 2015, the Royal Free NHS Foundation Trust signed an agreement with DeepMind Health giving the company access to 1.6 million personally identifiable records (paragraph 17), but received no monetary gain in return.110 Hetan Shah thought that the NHS was “seduced by the magic of the algorithm company” and in future should at least seek more control over the data and greater transparency:

What [the NHS] did not realise is they were the ones with the really important thing, which is the dataset. Over time, you are going to see more private sector providers springing up who can provide algorithms, but the public sector have the magic dataset, on which they have a monopoly. When they are transacting with the private sector, they should have more confidence and should not get tied up in exclusivity contracts, and they should ask for greater transparency from the private sector providers to say, ‘Open up so that you can show people what is going on with this evidence’.111

In its recent report, ‘Thinking on its own: AI in the NHS’, Reform argued that the planned ‘data trusts’ could provide a means for striking agreements between industry and the NHS on how “commercial value should be generated from data”. Reform recommended that the Government “should explore mutually beneficial arrangements such as profit and risk-sharing agreements”. Specifically:

The Department of Health & Social Care and the Centre for Data Ethics & Innovation should build a national framework of conditions upon which commercial value is to be generated from patient data in a way that is beneficial to the NHS. The Department of Health & Social Care should then encourage NHS Digital to work with [Sustainability & Transformation Plans112] and trusts to use this framework and ensure industry acts locally as a useful partner to the NHS.113

29.Algorithms are being used in an ever-growing number of areas, in ever-increasing ways. They are bringing big changes in their wake, from better medical diagnoses to driverless cars, and within central government there are opportunities to make public services more effective and achieve long-term cost savings. They are also moving into areas where the benefits to those applying them may not be matched by the benefits to those subject to their ‘decisions’—in some aspects of the criminal justice system, for example, and in algorithms using social media datasets. Algorithms, like ‘big data’ analytics, need data to be shared across previously unconnected areas, to find new patterns and new insights.

30.The Government should play its part in the algorithms revolution in two ways. First, it should continue to make public sector datasets available, not just for ‘big data’ developers but also for algorithm developers. We welcome the Government’s proposals for a ‘data trusts’ approach to mirror its existing ‘open data’ initiatives. Secondly, the Government should produce, publish, and maintain a list of where algorithms with significant impacts are being used within Central Government, along with projects underway or planned for public service algorithms, to aid not just private sector involvement but also transparency. The Government should identify a ministerial champion to provide government-wide oversight of such algorithms, where they are used by the public sector, and to co-ordinate departments’ approaches to the development and deployment of algorithms and partnerships with the private sector.

31.Algorithms need data, and their effectiveness and value tends to increase as more data are used and as more datasets are brought together. The Government could do more to realise some of the great value that is tied up in its databases, including in the NHS, and negotiate for the improved public service delivery it seeks from the arrangements and for transparency, and not simply accept what the developers offer in return for data access. The Crown Commercial Service should commission a review, from the Alan Turing Institute or other expert bodies, to set out a procurement model for algorithms developed with private sector partners which fully realises the value for the public sector. The Government should explore how the proposed ‘data trusts’ could be fully developed as a forum for striking such algorithm partnering deals. These are urgent requirements because partnership deals are already being struck without the benefit of comprehensive national guidance for this evolving field.

Bias

32.While sharing data widely is likely to improve the quality of the algorithms built on them, the underpinning systems also need to produce reliable and fair results—without bias. Machine learning is “application agnostic”.114 Algorithms are designed to discriminate—to tell the difference—between, for example, people, images or documents. As Professor Louise Amoore of Durham University explained, “in order for an algorithm to operate, it has to give weight to some pieces of information over others”, and this bias is “intrinsic to the algorithm”.115 Durham Constabulary warned against demanding “some hypothetical perfection”, and instead suggested considering “the conditions that would persist if such models were not available”.116 Dr Pavel Klimov, Chair of the Law Society’s Technology and the Law Group, highlighted the importance of not turning the technology into “a weapon against ourselves”, referring to the need for checks and balances.117 Some forms of bias can nevertheless extend beyond what is acceptable. Although algorithms have the potential to “promote efficiency, consistency, and fairness”, they can also “reinforce historical discrimination or obscure undesirable behaviour”.118

33.The Alan Turing Institute told us that when automated decision-making is applied “current legislation does very little to protect individuals from being discriminated” against.119 Where algorithms are used in the criminal justice system it is imperative that they are not unfairly discriminatory. This is not always the case. We were told by the Information Commissioner that “algorithmic risk scores used in some US states” to determine sentencing “inaccurately classified black defendants as future criminals at almost twice the rate as white defendants, perpetuating a bias that already existed in the training data.”120 Even in relatively benign uses of algorithms, such as when advertisements are displayed online, the result can be that “users of that service are being profiled in a way that perpetuates discrimination, for example on the basis of race”.121

34.Oxford and Nottingham Universities warned that as the complexity of algorithmic applications increases, “so do the inherent risks of bias, as there is a greater number of stages in the process where errors can occur and accumulate”.122 If discrimination (of the undesirable type) is introduced, subsequent deployment can amplify the discriminatory effects.123 Discrimination can enter the decision-making process through a variety of routes—the use of inappropriate ‘training data’, a lack of data, correlation disguised as causation, or the unrepresentativeness of algorithm development teams—and can present itself at any stage of an algorithm’s lifecycle, including conception, design, testing, deployment, sale, or repurposing.124

Training data

35.Perhaps the biggest source of unfair bias is inappropriate ‘training data’125—the data from which the algorithm learns and identifies patterns and the statistical rules which the algorithm applies.126 The way that training data are selected by algorithm developers can be susceptible to subconscious cultural biases,127 especially where population diversity is omitted from the data. The Royal Society noted that “biases arising from social structures can be embedded in datasets at the point of collection, meaning that data can reflect these biases in society”.128 A well-recognised example of this risk is where algorithms are used for recruitment. As Mark Gardiner put it, if historical recruitment data are fed into a company’s algorithm, the company will “continue hiring in that manner, as it will assume that male candidates are better equipped. The bias is then built and reinforced with each decision”.129 This is equivalent, Hetan Shah from the Royal Statistical Society noted, to telling the algorithm: “Here are all my best people right now, and can you get me more of those?”130 Microsoft told us that, as part of its ‘Fairness, Accountability and Transparency in Machine Learning’ initiative, computer scientists were examining how some recruitment algorithms had learned biases “based on a skewed input data”.131 During our inquiry, Professor Louise Amoore of Durham University informed us of the case of a black researcher at MIT working with facial-recognition algorithms who found that “the most widely used algorithms did not recognise her black face”.132 Professor Amoore explained that the AI had been trained to identify patterns in facial geometry using predominantly white faces.133
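
The recruitment example can be illustrated with a short sketch: a model trained on historical hiring decisions that favoured male candidates learns to reproduce that preference, scoring otherwise-identical candidates differently by gender. The data, variable names and model below are invented for illustration and do not describe any particular company’s system.

```python
# Illustrative sketch of the recruitment example: a model trained on historical
# hiring decisions that favoured male candidates reproduces that preference.
# All data are synthetic; names and numbers are assumptions for illustration.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10000
gender_male = rng.integers(0, 2, n)   # 1 = male, 0 = female
skill = rng.normal(0, 1, n)           # genuinely job-relevant attribute

# Historical decisions: skill mattered, but male candidates were also favoured.
hired = (skill + 1.5 * gender_male + rng.normal(0, 1, n) > 1.5).astype(int)

X = pd.DataFrame({"gender_male": gender_male, "skill": skill})
model = LogisticRegression().fit(X, hired)

# The learned model now scores otherwise-identical candidates differently by gender.
same_skill = pd.DataFrame({"gender_male": [1, 0], "skill": [0.5, 0.5]})
print(model.predict_proba(same_skill)[:, 1])  # higher 'hire' probability for the male row
```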

36.Professor Nick Jennings from the Royal Academy of Engineering believed that algorithms are “not always well trained because people do not always understand exactly how they work or what is involved in training”. Because the research in this area is still relatively undeveloped, he explained, “you end up with poorly trained algorithms giving biased results”.134 This vulnerability can be difficult to tackle when, as is increasingly the case, the process of compiling training data and the process of pattern-learning are separate endeavours. Machine learning algorithm developers can procure training data from third parties, such as data brokers, where “access to the original basis on which the data was collected is unavailable”.135 The Horizon Digital Economy Research Institute explained that “as algorithms become embedded in off-the-shelf software packages and cloud services, where the algorithm itself is reused in various contexts and trained on different data, there is no one point at which the code and data are viewed together”.136

Insufficient data

37.As well as unrepresentative data, insufficient data can also cause discrimination. As prediction accuracy is generally linked to the amount of data available for algorithm training, incorrect assessments could be more common when algorithms are applied to groups under-represented in the training data.137 This is a recognised issue in the personal financial credit sector where, Google told us, “a lack of good data, or poor quality, incomplete, or biased datasets […] can potentially produce inequitable results in algorithmic systems”.138 Dr Adrian Weller of the Alan Turing Institute explained that one result of this ‘thin data problem’ is that banks may withhold credit simply because an individual does not match the pattern of larger bank customer groups:

It will train their algorithm based on looking for people who have at least this probability [of repaying loans]. When they do that, if they happen to be looking at a particular person who comes from a demographic where there is not much data, perhaps because there are not many people of that particular racial background in a certain area, they will not be able to get sufficient certainty. That person might be an excellent [credit] risk, but they just cannot assess it because they do not have the data.139
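
A minimal sketch of the ‘thin data’ problem Dr Weller described: with few past observations for a demographic group, the estimated repayment rate carries too much uncertainty to clear a lender’s confidence threshold, even if the group’s true repayment rate is high. The figures and the 90% threshold below are assumptions chosen purely for illustration.

```python
# Illustrative sketch of the 'thin data' problem: the same observed repayment
# rate clears a lender's confidence threshold for a large group but not for a
# small, under-represented one. Numbers are invented for illustration.
import math

def lower_confidence_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Approximate lower bound of a 95% confidence interval for a proportion."""
    p = successes / trials
    return p - z * math.sqrt(p * (1 - p) / trials)

threshold = 0.90  # lender only offers credit if confident the repayment rate is >= 90%

# Well-represented group: 9,300 repayments out of 10,000 past loans (93%)
print(lower_confidence_bound(9300, 10000))  # ~0.925 -> clears the threshold

# Under-represented group: roughly the same 93% repayment rate, but only 30 past loans
print(lower_confidence_bound(28, 30))       # ~0.84 -> falls below the threshold
```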

Correlation without causation

38.Bias or unfairness can arise, the Royal Society told us, when a machine learning algorithm correctly finds attributes of individuals that predict outcomes, but “in contexts where society may deem use of such an attribute inappropriate”.140 The Institute of Mathematics and its Applications gave the example of an algorithm used by the courts in Broward County, Florida, which asks: ‘Was one of your parents ever sent to jail or prison?’ Even if predictive, the Institute emphasised the unfairness of the inference that “a defendant deserves a harsher sentence because his father went to prison”.141

39.The sophistication of pattern-learning means that even setting restrictions on the algorithms produced, for example to ignore protected characteristics like race, may not easily solve the problem. Machine learning systems may instead identify proxies for such characteristics. Professor Amoore explained how in the US, where algorithms had been used to predict the outcome in criminal trials, “even where race as a category was removed from the input data, the algorithm still learned characteristics, or attributes, that we might say are in breach of the Equality Act, because they use what we could call proxies for race. They learn patterns in past patterns of crime or they learn patterns in postcodes, for example.”142 Following a review of Durham Constabulary’s HART algorithm, used to aid custody decisions (paragraph 21), a postcode field was removed amid concerns that it could discriminate against people from poorer areas.143 Concerns have been expressed that other characteristics used in HART and other policing algorithms are potential sources of bias, especially where they serve as proxies for race or gender (paragraph 41).
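
The proxy effect can be demonstrated with a short sketch: a model trained without the protected characteristic, using only a correlated feature such as a postcode area, still produces systematically different scores for the two protected groups. The data are synthetic and the variable names are assumptions for illustration.

```python
# Illustrative sketch of the 'proxy' problem: even with the protected attribute
# removed from the inputs, a correlated feature such as postcode lets the model
# reconstruct much of the same signal. Synthetic data; names are illustrative.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20000
protected = rng.integers(0, 2, n)  # protected characteristic (never given to the model)
# Postcode area correlates strongly with the protected characteristic
postcode_area = np.where(rng.random(n) < 0.8, protected, rng.integers(0, 2, n))
# Historical outcomes are themselves biased with respect to the protected group
outcome = (0.5 * protected + rng.normal(0, 1, n) > 0.5).astype(int)

# Train WITHOUT the protected attribute, using only the proxy
X = pd.DataFrame({"postcode_area": postcode_area})
model = LogisticRegression().fit(X, outcome)
scores = model.predict_proba(X)[:, 1]

# Average predicted risk still differs between the protected groups
print("group 0:", scores[protected == 0].mean())
print("group 1:", scores[protected == 1].mean())
```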

40.The opaque nature of the algorithm ‘black box’ makes its use controversial in some areas. Professor Amoore warned that there may exist “areas of our social, political or economic lives where we might want to say there is no place for algorithmic decision-making”.144 She also questioned the use of inference and correlation in the criminal justice system, and suggested that its use in the US for sentencing “constitutes a violation of due process or overt discrimination”.145 (In the UK, Durham Constabulary was using an algorithm to help determine whether a low-risk offender is suitable for ‘deferred prosecution’.)146 The risk is compounded, as Professor Amoore explained, when the algorithm’s results do not allow challenge:

Whereas with conventional tools like DNA, or a photograph, or a CCTV image, or the evidence that has been given by an eye witness, there is always the possibility of this cross-examination and the questioning: ‘How did you arrive at that judgment?’ With machine learning algorithms that method is obviated.147

41.In some of our evidence, there was a desire for algorithms within the criminal justice system to be restricted to advisory roles. Elizabeth Denham, the Information Commissioner, noted that while “there may be some red lines”, there is scope for the use of algorithms in sensitive areas where there is “human intervention” for decisions “around sentencing or determining parole”.148 Big Brother Watch raised a concern that the Durham HART algorithm assesses reoffending risks, in part, on the basis of a wider Experian algorithm which characterises people using metrics such as postcode, family composition and occupation, and which could therefore be discriminatory.149 Silkie Carlo, then of Liberty,150 told us that “where algorithms are used in areas that would engage human rights, they should be at best advisory”.151

42.At the heart of this source of bias is a propensity to confuse ‘correlation’, “which is what algorithms […] can detect”, with ‘causality’.152 The Information Commissioner’s Office explained that “where algorithmic decisions are made based on such patterns [in the data], there is a risk that they may be biased or inaccurate if there isn’t actually any causality in the discovered associations.”153 Difficulties in fully understanding a machine learning algorithm, as we discuss in Chapter 3, make it hard to even identify whether correlation without causation is being applied.
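
The distinction can be made concrete with a short sketch: a feature that has no causal effect on an outcome can still appear strongly predictive when both are driven by a hidden common cause, and a pattern-finding algorithm cannot tell the difference from the data alone. The data below are synthetic and the variable names illustrative.

```python
# Illustrative sketch of correlation without causation: a feature with no causal
# effect on the outcome still appears strongly predictive because both are driven
# by a hidden confounder. Data are synthetic; names are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
n = 10000
confounder = rng.normal(0, 1, n)               # hidden common cause
feature = confounder + rng.normal(0, 0.5, n)   # has no causal effect on the outcome
outcome = confounder + rng.normal(0, 0.5, n)   # caused by the confounder alone

# A pattern-finding algorithm sees a strong correlation and will happily use it
print("raw correlation:", np.corrcoef(feature, outcome)[0, 1])  # roughly 0.8

# Controlling for the confounder (partial correlation via residuals) removes it
resid_feature = feature - np.polyval(np.polyfit(confounder, feature, 1), confounder)
resid_outcome = outcome - np.polyval(np.polyfit(confounder, outcome, 1), confounder)
print("partial correlation:", np.corrcoef(resid_feature, resid_outcome)[0, 1])  # close to 0
```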

Lack of representation in the algorithm development community

43.Dr Adrian Weller from the Alan Turing Institute told us that algorithm bias can also result from employees within the algorithm software industries not being representative of the wider population.154 Greater diversity in algorithm development teams could help to avoid minority perspectives simply being overlooked, by taking advantage of a “broader spectrum of experience, backgrounds, and opinions”.155 The US National Science and Technology Council Committee on Technology concluded in 2016 that “the importance of including individuals from diverse backgrounds, experiences, and identities […] is one of the most critical and high-priority challenges for computer science and AI”.156 Dr Weller also made the case for more representation.157 TechUK told us:

More must be done by Government to increase diversity in those entering the computer science profession particularly in machine learning and AI system design. This is an issue that techUK would like to see the Government’s AI Review exploring and make recommendations on action that should be taken to address diversity in the UK’s AI research community and industry.158

44.Algorithms, in looking for and exploiting data patterns, can sometimes produce flawed or biased ‘decisions’—just as human decision-making is often an inexact endeavour. As a result, algorithmic decisions may disproportionately discriminate against certain groups, and such discrimination is as unacceptable as any existing ‘human’ discrimination. Algorithms, like humans, can produce bias in their results, even if unintentional. When algorithms involve machine learning, they ‘learn’ the patterns from ‘training data’ which may be incomplete or unrepresentative of those who may be subsequently affected by the resulting algorithm. That can result, for example, in race or gender discrimination in recruitment processes. The patterns that algorithms rely on may be good correlations but may not in fact show a reliable causal relationship, and that can have important consequences if people are discriminated against as a result (such as in offender rehabilitation decisions). Algorithms may have incomplete data so that, for example, some people do not get favourable financial credit decisions. Algorithm development teams may not include a sufficiently wide cross-section of society (or of the groups that might be affected by an algorithm) to ensure that a wide range of perspectives is reflected in their work. These biases need to be tackled by the industries involved and, as we discuss in Chapter 4, by the regulatory environment being introduced by the GDPR, and safeguards against bias should be a critical element of the remit of the Centre for Data Ethics & Innovation.


37 Science and Technology Committee, Fourth Report of Session 2015–16, The big data dilemma, HC 468, para 42

38 The Academy of Medical Sciences (ALG0055) para 5

39 National Data Guardian, National Data Guardian 2017 report published, 12 December 2017

40 PHG Foundation (ADM0011) para 8; Research Councils UK (ALG0074) para 10; Academy of Medical Sciences (ALG0055) para 3

41 Science and Technology Committee, Third Report of Session 2017–18, Genomics and genome editing in the NHS, HC 349

42 Q92 [Dr M-H. Carolyn Nguyen]

43 REACT/ REFLECT research team, University of Manchester (ADM0023) para 1

44 The Academy of Medical Sciences (ALG0055)

45 Polygeia (ALG0043)

46 HC Deb, 02 May 2018, col 315

47 Reform, Thinking on its own: AI in the NHS (January 2018), p 6

49 Polygeia (ALG0043) para 8.1

50 Reform, Thinking on its own: AI in the NHS (January 2018), p 6

51 National Information Board, Delivering the Five Year Forward View (June 2015), p 6

53 National Advisory Group on Health Information Technology in England, Making IT Work: Harnessing the Power of Health Information Technology to Improve Care in England (September 2017), p 28

54 “Confidence in achieving NHS 2020 digitisation targets falls”, Digital Health Intelligence, 18 July 2017

56 Q247

57 The original Caldicott Principles were developed in 1997 following a review of how patient information was handled across the NHS. Source: Information Governance Toolkit, Department of Health

58 UK Caldicott Guardian Council, A Manual for Caldicott Guardians (January 2017)

59 Information: To share or not to share? The Information Governance Review, March 2013

60 Reform, Thinking on its own: AI in the NHS (January 2018), p 3

61 National Data Guardian for Health and Care, Review of Data Security, Consent and Opt-Outs (June 2016), p 3

62 National Data Guardian, National Data Guardian 2017 report published, 12 December 2017

63 DeepMind was founded in 2010. In 2015 it was acquired by Google

64 Qq232, 234, 235

65 Q230

68 Oxford Internet Institute (ALG0031)

69 UCL Jill Dando Institute of Security and Crime Science (ALG0048) para 5

70 Marion Oswald and Sheena Urwin submission; See also, “Pre-crime software recruited to track gang of thieves“, New Scientist, 11 March 2015

71 RUSI, Big Data and Policing 2017 (September 2017), p 20

73 Sheena Urwin, Head of Criminal Justice, Durham Constabulary (ADM0032)

74 Institute of Mathematics and its Applications (ALG0028) para 19

75 Durham Constabulary (ALG0041)

76 HM Inspectorate of Constabulary, PEEL: Police Effectiveness 2016 (March 2017), p 33

77 Durham Constabulary (ALG0041)

78 RUSI, Big Data and Policing 2017 (September 2017), p 24. See also, Big Brother Watch (ADM0012) para 12

79 HM Inspectorate of Constabulary, PEEL: Police Effectiveness 2016 (March 2017), p 33

80 Marion Oswald and Sheena Urwin (ALG0030) para 5

81 Q27

82 Q28

83 Jamie Grace, Sheffield Hallam University (ALG0003) para 1

86 The Royal Statistical Society (ALG0071) para 1.3

87 Professor Ashiq Anjum, Professor of Distributed Systems, University of Derby (ADM0009), para 4

88 House of Lords Select Committee on AI, Report of Session 2017–19, AI in the UK: ready, willing and able?, HL 100, para 122

89 A Shared Space and a Space for Sharing project (ALG0006) para 10. See also: ICO, ‘Information Commissioner updates on WhatsApp / Facebook investigation,’ accessed 7 November 2016; Nick Srnicek, Platform Capitalism, (Cambridge, 2017); “We need to nationalise Google, Facebook and Amazon. Here’s why”, The Guardian, 30 August 2017.

91 Informatica, ‘The Role of Data in Mergers and Acquisitions,’ 16 December 2016

92 Science and Technology Committee, Fourth Report of Session 2015–16, The big data dilemma, HC 468, para 56

93 The big data dilemma, HC 468, para 42

94 The big data dilemma, HC 468, para 56

95 Professor Dame Wendy Hall and Jérôme Pesenti, Growing the artificial intelligence industry in the UK (October 2017), p 46

96 Autumn Budget, November 2017, para 4.10

97 Industrial Strategy, November 2017

98 Q382 [Oliver Buckley]

99 Written evidence received by the House of Lords Committee on AI, HM Government (AIC0229)

100 Written evidence received by the House of Lords Committee on AI, HM Government (AIC0229)

101 Cabinet Office and Government Digital Service, Government Transformation Strategy 2017 to 2020 (September 2017), p 6

102 Q364 [Oliver Buckley]

103 Autumn Budget, November 2017, para 6.22

105 Q18

106 Q221 [Professor Harry Hemingway]

107 Q221 [Dr Dominic King]

108 Science and Technology Committee, Third Report of Session 2017–18, Genomics and genome editing in the NHS, HC 349. See also Oral evidence taken on 01 November 2017, HC (2017–18) 349, Q24 [Sir John Bell]

109 Q16

111 Q15

112 Sustainability and Transformation Plans (STPs) were announced in NHS planning guidance published in December 2015. NHS organisations and local authorities in different parts of England have developed common ‘place-based plans’ for the future of health and care services in their area.

113 Reform, Thinking on its own: AI in the NHS (January 2018), pp 39–40

114 Q7 [Professor Nick Jennings]

115 Q10

116 Durham Constabulary (ALG0041)

117 Q53

119 Alan Turing Institute (ALG0073)

120 Information Commissioner’s Office (ALG0038)

122 Horizon Digital Economy Research Institute, University of Nottingham, and the Human Centred Computing group, University of Oxford (ALG0049) para 4

123 The Human Rights, Big Data and Technology Project (ALG0063) para 15

124 The Human Rights, Big Data and Technology Project (ALG0063) para 23

125 IBM (ADM0017), para 10

126 Science and Technology Committee, Fifth Report of Session 2016–17, Robotics and artificial intelligence, HC 145, para 5

127 IBM (ADM0017), para 10

128 The Royal Society (ALG0056) para 13

129 Mark Gardiner (ALG0068) para 4

130 Q10

131 Microsoft (ALG0072) para 20

133 Q10

134 Q10

135 The Human Rights, Big Data and Technology Project (ALG0063) para 28

136 Horizon Digital Economy Research Institute, University of Nottingham, and the Human Centred Computing group, University of Oxford (ALG0049) para 13

137 Google Research Blog, ‘Equality of Opportunity in Machine Learning’, 7 October 2016

138 Google (ADM0016) para 3.8

139 Q11

140 The Royal Society (ALG0056) para 13

141 Institute of Mathematics and its Applications (ADM0008) para 23

142 Q10

144 Q27

145 Kehl, Danielle, Priscilla Guo, and Samuel Kessler. 2017. Algorithms in the Criminal Justice System: Assessing the Use of Risk Assessments in Sentencing, accessed 5 April 2018

146 Sheena Urwin, Head of Criminal Justice, Durham Constabulary (ADM0032)

147 Q28

148 Q296

149 Big Brother Watch, ‘Police use Experian Marketing Data for AI Custody Decisions’, 6 April 2018

150 Also known as the National Council for Civil Liberties

151 Q49

152 Institute of Mathematics and its Applications (ALG0028) para 29

153 Information Commissioner’s Office (ALG0038)

154 Q10

155 National Science and Technology Council Committee on Technology, Preparing for the future of artificial intelligence (October 2016), p 28

157 Q10

158 TechUK (ADM0003) para 65




Published: 23 May 2018