The big data dilemma

4 Open data and data sharing

32.One of the distinguishing features of big data is that it often brings together data from multiple sources. Big data can make use of datasets which are ‘open’—licensed for anyone to access and use. Many real-time datasets, such as live travel and weather feeds, fall into this category. Alternatively, access to external datasets can be facilitated by data-sharing agreements. ‘Shared data’ is typically stored in a secure setting, with restrictions on whom the data may be shared with and for what purpose, as, for example, in the sharing of administrative data between government departments (paragraph 39). Gavin Starks, chief executive of the Open Data Institute, explained the role of his organisation in promoting each of these modes of data sharing:

When we talk about an ‘open licence’ we are referring to data that anyone can access, use and share. … The more complex area, where we have significant questions to ask, is the ‘shared data’ category. … We have very strong views that core data infrastructure should be open and owned by the state … In terms of what we see as the open remit, we should help to stimulate open innovation. The kind of work we can do to get the roles, policies and liabilities sorted out around the shared part of the data spectrum will help to unlock a huge amount of innovation and value in the country.62

33.Gavin Starks thought that “the processes, policies, standards and so on” of open and shared data were “much harder” than the data analysis itself.63 Government has a key role to play, nevertheless, in making its own data ‘open’ and ‘shared’, to enable the value in those data to be realised, whether for research, service delivery or commercial purposes. It can do this by making its data available for outside bodies to use, or by making full use of its data itself to improve the cost-effectiveness of the public services it provides.

Government data

Open data

34.By publishing its data the Government can stimulate business and innovation, provide transparency and accountability, empower citizens to make informed choices about products and services, and improve data quality through its wider and more frequent use. The 2013 Shakespeare Review estimated the ‘direct value’ of public sector information at £1.8 billion, with ‘wider social and economic benefits’ worth up to £6.8 billion.64 Transport for London (TfL) described how it has more than 5,000 developers registered to receive its datasets and how this has stimulated the creation of 360 transport information apps for mobile devices.65 The Shakespeare Review estimated the value of time saved as a result of access to real-time travel data from TfL at £15-58 million a year, at a cost to TfL of £1 million a year.66
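The TfL figures quoted above imply a simple benefit-to-cost ratio. The following is a minimal sketch of that arithmetic, using only the numbers given in this paragraph; the variable and function names are illustrative and do not come from the Shakespeare Review.

```python
# Figures quoted in paragraph 34, in £ million per year.
TIME_SAVED_LOW = 15.0   # lower estimate of the value of time saved
TIME_SAVED_HIGH = 58.0  # upper estimate of the value of time saved
TFL_COST = 1.0          # cost to TfL of providing the data


def benefit_cost_ratio(benefit: float, cost: float) -> float:
    """Return pounds of estimated benefit per pound of cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return benefit / cost


low = benefit_cost_ratio(TIME_SAVED_LOW, TFL_COST)
high = benefit_cost_ratio(TIME_SAVED_HIGH, TFL_COST)
print(f"Estimated return: £{low:.0f} to £{high:.0f} per £1 spent")
```

On these estimates, each £1 that TfL spends on publishing real-time data corresponds to between £15 and £58 of time saved by travellers.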

35.Gavin Starks from the Open Data Institute saw data as public infrastructure:

We should really be thinking about data as infrastructure in the same way as we think about roads as infrastructure. Roads help us navigate to places; data help us navigate to decisions. Those decisions need to be made by everyone. There is a lot of work to be done to work out what we would classify as data infrastructure for the country—for example, our geo-spatial information. [The Department for Environment, Food & Rural Affairs] has just released its dataset called LiDAR, which is very detailed environmental mapping. That has helped local businesses, citizens and Government make better decisions about their built environment.67

36.The Government has introduced a number of open data initiatives. In 2010 it launched the ‘’ data portal, which includes data from central government, local government, agencies and arm’s-length bodies, NHS bodies and the police—typically licensed under the Open Government Licence.68 The datasets are prioritised for publication according to criteria in the National Information Infrastructure,69 set up by the Government in 2013 in response to the Shakespeare Review.70 In 2012, the Government established the Open Data Institute, with £10 million of funding over five years from Innovate UK. Cabinet Office minister Matt Hancock MP recently noted the extent of international recognition for Government open data:

So far we’ve published over 20,000 datasets [on], covering almost £200 billion of public spending. This approach has won us plenty of plaudits. For the second year running, we’ve topped the World Wide Web Foundation’s Open Data Barometer. Last year we were number one in the Global Open Data Index.71

37.However, the Greater London Authority argued that current progress on open data does not go far enough:

The internationally acclaimed London Datastore contains over 600 datasets, but few among these can be described as big data. Transport data feeds aside, it is a very high quality but largely static data catalogue.

The public sector’s strategy of opening up proprietary data in machine readable form so that third parties can develop products or analysis to benefit stakeholders and the wider digital economy, has been an undoubted success. … However, without further incentives to encourage more consistent, higher quality and higher volume ‘big’ data from a wider set of suppliers, cities like London will fail to capitalise on the potential of big data to tackle the complex questions with which cities have historically grappled and deliver potentially transformative innovation.72

Experian considered that “whilst part of Government have been embracing [open data], there is a lack of a clear joined up directional policy around open data and the technology to facilitate and deliver value from this”.73

Government administrative data

38.Paul Maltby from the Government Digital Service saw big data being used increasingly in providing Government services:

The world has changed and is changing utterly the way we enjoy services in our everyday lives that are powered from data analytics and the way data work. We want to bring that transformation to government.74

Cabinet Office minister Matt Hancock MP recently stated that:

Openness is a means to an end. The end is to make government work better for the people of this country. That means better decision-making within government: policy based on data and evidence, not dogma and theory.75

39.The Government has initiatives in place to provide researchers with access to its administrative data. The Administrative Data Research Network, established by the Economic and Social Research Council as part of its Big Data Network,76 is a “UK-wide partnership between universities, Government departments and agencies, national statistics authorities, the third-sector, funders and researchers”. It securely provides administrative data to researchers wishing to carry out social and economic research which “has the potential to benefit society”.77 Similarly the HMRC ‘Datalab’ allows researchers from academic institutions and other Government departments to access anonymised data from HMRC.78 Both of these programmes require researchers to apply for access to the data, and projects are approved on a case-by-case basis, rather than facilitating real-time access to Government data.

40.Where such schemes are not available, we heard an example of Government taking the initiative and sharing the benefit of its administrative data with external organisations:

[The Ministry of Justice] hold great datasets in government, but we have to hold them very securely because they include very sensitive data. We are trying to explore ways of making that data available to academe in a way that is safe and in accordance with the law, and also bears in mind the important ethical and privacy issues academics take very seriously. … [The Ministry of Justice] developed a very interesting and novel way of helping charities work out who is and is not reoffending, by allowing charities to send their data to the Ministry of Justice. The Ministry of Justice did the matching and analysis and sent back the results. That was hugely successful.79
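The arrangement described in this evidence, in which the data holder does the matching and sends back only results, can be illustrated with a toy example. The sketch below is not the Ministry of Justice's actual process: the record layout, the `MIN_COHORT` threshold and the function name are all hypothetical. It shows only the general pattern, in which an external organisation submits identifiers, matching happens inside the secure setting, and nothing but aggregate counts leaves it.

```python
from dataclasses import dataclass

# Hypothetical disclosure threshold: refuse to report on very small cohorts.
MIN_COHORT = 10


@dataclass(frozen=True)
class AggregateResult:
    submitted: int   # distinct identifiers sent in by the external organisation
    matched: int     # identifiers found in the secure dataset
    reoffended: int  # matched individuals flagged as reoffending


def match_and_aggregate(submitted_ids, secure_records):
    """Match submitted identifiers against records held in the secure setting.

    `secure_records` maps an identifier to a boolean reoffending flag.
    Only counts are returned; no individual-level outcome is exposed.
    """
    unique_ids = set(submitted_ids)
    matched = [pid for pid in unique_ids if pid in secure_records]
    if len(matched) < MIN_COHORT:
        raise ValueError("matched cohort too small to report safely")
    reoffended = sum(1 for pid in matched if secure_records[pid])
    return AggregateResult(len(unique_ids), len(matched), reoffended)


# Illustrative data only: 12 known people, every third one flagged.
secure = {f"P{i:03d}": (i % 3 == 0) for i in range(12)}
charity_cohort = [f"P{i:03d}" for i in range(12)] + ["P999"]  # one unmatched

result = match_and_aggregate(charity_cohort, secure)
print(result)
```

The minimum cohort size is one simple safeguard against revealing an individual's outcome through a very small group; a real scheme would need further statistical disclosure controls.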

Digital economy minister Ed Vaizey MP believed, nevertheless, that at the moment data sharing between departments does not go far enough, and that facilitating legislation may be needed:

We have set up the Government Data Taskforce with the chief scientist and others to try to get Government departments to take big data seriously, to see the opportunity and also to share it. Mindful of the ethical concerns … surrounding things like ‘’ [paragraph 45], this provides massive opportunities. I think we need to look at potential future legislation to allow that sharing to be made easier between Government departments.80

41.In a similar vein, Hetan Shah from the Royal Statistical Society worried about data remaining in departmental silos and saw potential benefits if the Office for National Statistics were able to collect Government administrative data from across departments:

It does not seem to me that variability of data quality is the key issue in terms of stopping the sharing of data within Government and making it open. Francis Maude always used to make the argument that if you open up datasets the quality will increase … One of the big problems is that there is a silo mentality within Government, and different datasets are held and not shared across departments.

The single biggest opportunity is to move where other countries have gone—Canada, New Zealand and Ireland—in giving the statistical office a broad right to data access across departments. At the moment, the Office for National Statistics cannot easily get hold of HMRC, BIS and DWP data. If it could, we would have more real-time access to what is going on around the country. … You would not have the privacy issues, because the ONS is interested only in aggregate data; they do not care about us as individuals.81

42.There are enormous benefits in prospect for the economy and for people’s lives from making the nation’s core data infrastructure ‘open’. The Government’s work in this area has put the UK in a world-leading position. But there is more to do to break down departmental data silos, to bring data together in order to further improve public services, as well as to improve data quality (as we discuss in the healthcare context below). The Government should set out how it can build capacity to deliver more datasets, increasingly in real-time, both to decision-makers in Government and to external users and, in particular, should work to establish a right of access to data for the Office for National Statistics. The Government should also establish a framework—to be overseen by the Government Digital Service, the Office for National Statistics or another expert body—for auditing the quality of data within Government departments amenable for big data applications, and for proactively identifying data sharing opportunities to break departmental data silos.

Healthcare data

43.An area where the potential benefits of big data have been particularly significant, but also where data quality constraints have been evident, has been in healthcare and medical research. In 2014, Volterra and EMC consultants found that the NHS was “considerably behind other industries in terms of its use of data analytics”, and identified potential efficiency savings of between £16 billion and £66 billion a year if the NHS employed data analytics more successfully.82 The potential benefits could be better quality healthcare, with interventions more precisely tailored to individual patients’ circumstances (as illustrated at paragraph 14 above) if their medical and other data can be matched to extensive datasets. Aisling Burnand from the Association of Medical Research Charities highlighted how this would help research on rare diseases:

Up and down the country there may be only a handful of people with a particular condition. So the ability to join up public patient datasets to find those people and then use the data for research purposes will, we hope, lead to improvements in treatments and, hopefully, cures and life-saving advances.83

44.Such big data benefits depend, however, on the quality of the datasets being brought together. Professor John Williams of the Royal College of Physicians was concerned about the quality of hospital data because the data collection process is “out of date and no longer appropriate for [big data analysis]. … There is no requirement for a feedback-loop for clinicians to validate the data centrally so that we get richer and more accurate data.”84 He worried that current analysis of, for example, the mortality of patients admitted at the weekend was based on some available datasets but still lacked quality data on other key statistics, so that “premature conclusions … are being drawn from the data because it is not rich enough”.85 Similarly, Professor Montgomery, chair of the Nuffield Council on Bioethics, told us:

There are major problems of data quality. The further away the interpretation of the health data gets from the person who produced it, the more scope there is for it being misinterpreted. There is work going on to try to improve standardisation and the way we record things, which would make it more possible to translate those things. … Extracting them as if they can be understood without reference to context is problematic in health data, because people record things in so many different ways.86

45.As our predecessor Committee reported in 2014, the momentum for using big data to improve health services was reduced by the experience of bringing patient data together under the ‘’ initiative. They stated that:

Members of the public do not appear to be wholly against the idea of their data being used by Government institutions, but support for data usage is highly dependent upon the context within which the data is collected.

The Government should have learned from the experience with and we recommend that the Government develop a privacy impact assessment that should be applied to all policies that collect, retain or process personal data.87

46.The ‘’ programme was introduced in 2013 by NHS England and the Health and Social Care Information Centre (HSCIC)—a system to “extract and link large amounts of patient data, collected as part of NHS care, in order to improve the delivery of healthcare and to benefit researchers inside and outside the NHS”. However, before the system could be launched, the programme was delayed to “allow GPs more time to notify their patients and for NHS England to conduct a public awareness campaign”.88 Dame Fiona Caldicott told us that the system was “put on hold because there was loud and extensive protest, not least from the general practitioners who were being called upon to download patients’ data from their health records about the patient to the HSCIC, in terms that GPs were not content about”.89

47.To regain patient trust, in 2014 the Department of Health established a National Information Board to “put data and technology safely to work for patients, service users, citizens and the professionals who serve them”.90 In the same year, the Secretary of State for Health appointed Dame Fiona Caldicott to a new role as National Data Guardian for Health and Care—“the patients’ champion on security of personal medical information”.91 Following a pathfinder stage, the programme had been expected to be re-launched in September 2015, but Dame Fiona told us that:

The Secretary of State for Health decided that new work should be done on the question of patients being able to opt-out of how their data were taken from one place to another and used, so there is currently another pause. Were [] to be restarted, I think it would be on the lines of much improved communication with both GPs and patients. … One thing that might be worth considering for the future is whether we should look at a more general question about data flows for a list of purposes, rather than the rather narrow purpose as publicised.92

48.Dame Fiona first addressed the flow of patient data “from NHS organisations to other NHS and non-NHS organisations” in the 1997 Report on the review of patient-identifiable information.93 The ‘Caldicott Report’ introduced the ‘Caldicott principles’—key principles of good practice for using patient data. In a 2012 review of these principles, they were extended to include the principle: “The duty to share information can be as important as the duty to protect patient confidentiality”.94 Reflecting on her review, Dame Fiona told us that she was “very disappointed when we revisited the new ‘Caldicott principles’ … to find that the culture in the NHS of sharing information had not moved in the way we hoped”.95 Today, the benefits of sharing patient data have yet to be realised. Dame Fiona noted, for example, that “there is a real issue for the public about why the ambulance service cannot see key aspects of the [patient] record when they go to collect an unconscious patient.”96

49.The November 2015 Spending Review has, however, now raised the prospect of progress on this front:

The Government will invest £1 billion in new technology over the next 5 years to deliver better connected services for patients and ensure that doctors and nurses have the information they need at their fingertips. By September 2018, 80% of clinicians in primary, urgent and emergency care will have digital access to key patient information. By 2020 integrated care records will give every health and care professional concerned with an individual’s care the information they need to provide safe and prompt care.97

50.The success of a scheme similar to in Scotland demonstrates that patients and healthcare professionals are not against the sharing of patient records if that sharing is performed with due care, and the benefits are clearly articulated. According to Professor John Williams of the Royal College of Physicians, the programme in Scotland “put together the infrastructure and the process with patients … It is because of that engagement that they have done better”.98

51.Another part of securing individuals’ consent for the sharing and use of their data is allowing them to see their data record and amend it. Dame Fiona told us that “within the next year or two, access to their records will be available to patients”.99 Aisling Burnand from the Association of Medical Research Charities believed that:

This might even help to drive up the quality if they are able to see what is in the record. They might be able to add to the record, at least to say, “Well, that is not our recollection.” They would still have to have the health professional involved, but they may even help with the quality of the data. We would certainly welcome greater openness from a patient perspective.100

52.Patients and GPs are more likely to be content for their personal data to be used for healthcare and medical research if the benefits—both to the individual and to society—are clearly explained and adequate safeguards are in place. But the track record of ‘’ shows that this cannot be taken for granted. The Government cannot afford a second failure from a re-launched scheme. The Government should take careful account of the lessons from the pathfinder projects as well as the experience of the similar, successful, scheme in Scotland. To help bring patients onside and to streamline healthcare across different NHS providers—hospitals, GPs, pharmacists and paramedics—it should give patients easy online access to their own health records.

Private sector data sharing

53.The Royal Academy of Engineering stated in its 2015 Connecting Data report that:

Much potentially valuable data remains locked away in corporate silos or within sectors, although some data is already traded within the supply chains of individual sectors. The next step, in areas that do not impinge on the privacy of personal data, should be the creation of platforms to enable proprietary datasets to be traded within a framework that promotes trust and practicality.101

The Digital Catapult is intended to facilitate this. It is one of Innovate UK’s expanding network of Catapult Centres, which are designed to support innovation in specific technology areas by providing access to expert technical capabilities, equipment, and other resources. The Digital Catapult aims to “help UK businesses unlock new value from sharing proprietary data in faster, better and more trusted ways”.102

54.Chirdeep Chhabra of the Digital Catapult was concerned that “we have yet to see enough sharing of data … between different silos. … We need to start off by enabling sharing of data between organisations”.103 He told us that “there is much more to be done” in taking forward the Government’s work on developing the UK’s big data capability. That included more work to facilitate “safe” data sharing:

Mixing data from different sources, silos of data, … that is where I think we need more initiatives … ‘data-sharing labs’.104

Some of the things we are doing are to create safe havens where organisations can bring their datasets together. They are not giving data to each other; of course, they will not do that for governance and business reasons, but organisations are realising more and more that they can only get benefit from their data by mixing it with other data … . There is a lot of friction in data sharing, in terms of legal governance and so on. That is where we need to take a lead.105
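One way to picture a ‘safe haven’ in which organisations combine datasets without handing raw data to each other is keyed pseudonymisation: each organisation replaces its identifiers with keyed hashes before anything enters the shared environment, and only the hashed tokens are compared there. The sketch below is an illustration under stated assumptions, not a description of the Digital Catapult's facilities; the key, the sample identifiers and the function names are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest of a normalised identifier."""
    normalised = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


def overlap_count(tokens_a, tokens_b) -> int:
    """Count tokens present in both sets; run inside the shared environment."""
    return len(set(tokens_a) & set(tokens_b))


# Hypothetical key agreed by the two organisations and withheld from the
# operator of the shared environment.
SHARED_KEY = b"example-key-not-for-real-use"

# Illustrative customer identifiers held by two separate organisations.
org_a = ["alice@example.com", "bob@example.com", "carol@example.com"]
org_b = ["BOB@example.com ", "carol@example.com", "dave@example.com"]

tokens_a = [pseudonymise(x, SHARED_KEY) for x in org_a]
tokens_b = [pseudonymise(x, SHARED_KEY) for x in org_b]

print(overlap_count(tokens_a, tokens_b))
```

Keyed hashing alone does not make data anonymous, and it does nothing to resolve the legal and governance friction the witness describes; it only shows how matching can proceed without either party seeing the other's raw identifiers.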

55.Funding Circle believed that in their fintech sphere the Small Business, Enterprise and Employment Act 2015 could improve data sharing:

[The Act imposes] a duty on designated banks to provide information about their small and medium sized business customers to designated credit reference agencies, and a duty on designated credit reference agencies to provide that information to finance providers. This … will be incredibly helpful as businesses currently have to provide bank statements themselves. … The new Act will mean this process is now automated (with businesses’ consent), allowing us to speed up the process and enable creditworthy businesses to access finance faster.106

Hetan Shah from the Royal Statistical Society suggested an area where further legislation could be beneficial, to give the Office for National Statistics access to privately held datasets:

The very interesting thing about Canada and New Zealand is that they have also mandated private sector data to be open to their statistical offices, and the private sector has said, “We are glad we are being put on a level playing field, because if I was volunteering my data to you I would be at a competitive disadvantage, but if we all have to give our telecoms or supermarket data it does not matter.” That sort of legislation, in the mould that other more forward-thinking countries are taking, would be the right way forward.107

56.While the private sector is making great strides in identifying opportunities for bringing different datasets together, it is understandably more challenging for businesses in a competitive market to share valuable data with one another or with Government. The Government’s Digital Catapult therefore plays a vitally important role in facilitating private sector data sharing in a ‘safe’, trusted environment. The Government should map out how the Catapult’s work and its own plans to open and share Government data could be dovetailed. The Government should also consider the scope for giving the Office for National Statistics greater access both to Government departments’ data (paragraph 41) and private sector data.

62 Q202

63 Q209

64 Stephan Shakespeare, An Independent Review of Public Sector Information (May 2013)

65 GLA and TfL (BIG0067)

67 Q215

68 website, accessed January 2016

69 Cabinet Office, The National Information Infrastructure (March 2015)

72 GLA and TfL (BIG0067)

73 Experian (BIG0022)

74 Q192

76 ESRC website, accessed January 2016

78 HMRC website, accessed January 2016

79 Qq264-265

80 Q264

81 Q203

82 Volterra Partners for EMC, Sustaining universal healthcare in the UK: Making better use of information (September 2014); and EMC (BIG0046)

83 Q2

84 Q9

85 Q10

86 Q127

87 Science and Technology Committee, Responsible Use of Data, Fourth Report of Session 2014–15, HC 245, para 29

88 House of Commons Library,, Standard Note SN06781 (October 2014)

89 Q11

92 Qq12-13

93 Department of Health, Report on the review of patient-identifiable information (December 1997)

94 Williams Lea for the Department of Health, Information: To share or not to share? (March 2013)

95 Q25

96 Q23

97 HM Treasury, Spending Review and Autumn Statement 2015, Cm 9162 (November 2015), para 1.101

98 Q28

99 Q23

100 Q23

101 Royal Academy of Engineering and Institution of Engineering & Technology, Connecting data: Driving productivity and innovation (November 2015)

102 Digital Catapult website, accessed January 2016

103 Q37

104 Q44

105 Q54

106 Funding Circle (BIG0081)

107 Q203

© Parliamentary copyright 2015

Prepared 11 February 2016