UNCORRECTED TRANSCRIPT OF ORAL EVIDENCE
To be published as HC 1666-i

HOUSE OF COMMONS

ORAL EVIDENCE

TAKEN BEFORE THE

SCIENCE AND TECHNOLOGY COMMITTEE

THE CENSUS AND SOCIAL SCIENCE

WEDNESDAY 7 DECEMBER 2011

Professor David Blane, Professor Heather Joshi and Professor Leslie Mayhew

Evidence heard in Public

Questions 1 - 45

USE OF THE TRANSCRIPT

1. This is an uncorrected transcript of evidence taken in public and reported to the House. The transcript has been placed on the internet on the authority of the Committee, and copies have been made available by the Vote Office for the use of Members and others.

2. Any public use of, or reference to, the contents should make clear that neither witnesses nor Members have had the opportunity to correct the record. The transcript is not yet an approved formal record of these proceedings.

3. Members who receive this for the purpose of correcting questions addressed by them to witnesses are asked to send corrections to the Committee Assistant.

4. Prospective witnesses may receive this in preparation for any written or oral evidence they may in due course give to the Committee.

Oral Evidence

Taken before the Science and Technology Committee

on Wednesday 7 December 2011

Members present:

Andrew Miller (Chair)

Stephen Metcalfe

David Morris

Stephen Mosley

Pamela Nash

Roger Williams

________________

Examination of Witnesses

Witnesses: Professor David Blane, Deputy Director, ESRC International Centre for Life Course Studies, Professor Heather Joshi, President, Society for Lifecourse and Longitudinal Studies, and Professor Leslie Mayhew, City University, gave evidence.

Q1 Chair: I welcome the three of you to this morning’s session. Would you be kind enough to introduce yourselves, for the record?

Professor Blane: I am David Blane from Imperial College, London.

Professor Joshi: I am Heather Joshi from the Institute of Education.

Professor Mayhew: I am Les Mayhew from Cass Business School.

Q2 Chair: Thank you very much indeed. As you know, we are looking into the census in some detail. From your perspective, is the census the "key source of information on population statistics" or is it just one of many?

Professor Mayhew: No. It is one of many. The other sources are largely administrative. They have to be combined and used in certain ways to produce something that is equivalent. I could go into detail, but, in my experience, there are enough other sources out there from which you can replicate more frequently, more accurately and in more detail what is available in the census.

Q3 Chair: Is that view broadly held?

Professor Joshi: My answer would be slightly different. The main census is part of a structure of sources of evidence for social and demographic research. A census underpins the rest, which are mostly sample surveys.

Professor Blane: I do quite a lot of research in the rest of Europe and I am aware that there are many other ways of doing these sorts of things. I find that in mainland Europe a lot of people are still quite envious of this country with its census. The problems with alternatives are quite formidable, not least in terms of informed consent. The country that has the best alternative is probably Sweden and the Scandinavian model, where each citizen has a unique identification number which appears on all Government records. In theory, you can link every record the state holds on any individual. To get access to these records, you have to pass a very stringent ethics committee approval process.

Q4 Chair: Does that approval extend to academics?

Professor Blane: Yes.

Q5 Chair: It is not just Government organisations?

Professor Blane: No. It is a major research tool. Most importantly, it requires a referendum every 10 years or so, so that there is population support for this sort of record linkage.

Chair: That is interesting.

Professor Blane: The most recent one in Sweden, which was a very close call, was essentially on the function of these linked registers for medical research.

Q6 Chair: Historically, as I understand it, the reason for having a census is to help manage increases in population and develop structures to match change. Does that remain true today or are there other reasons for having a census?

Professor Mayhew: I can speak from a personal standpoint. I use the census very little these days because I find that alternatives are much better, timelier and more accurate. I work a lot in local authorities with healthcare providers and the census is regarded as out of date and not fit for purpose for their needs: commissioning local services, managing complex budgets and identifying unmet need. All those things require timely, detailed and accurate information which the census cannot provide in the form that they need.

In areas like London, in the last 10 years, there have been absolutely huge changes in the number of people and in the ethnic make-up of the population, yet some people are still using census data. A good example I came across recently was looking at the prevalence of TB cases in the African population. The denominator that the public health specialists are using is hopelessly wrong, and the outcome of that kind of analysis can be very misleading, misdirect resources and so on. In the areas where I work the case is quite clear. In that particular role, the census does not fulfil the need.

Q7 Chair: There is, in that particular instance, a significant difference between the data available from the census and data that are policy-driven, in this case your work on TB.

Professor Mayhew: It was not my work on TB, but, yes, there are so many local issues for which you require good quality population information.

Q8 Chair: Stemming from that, if the other witnesses could consider it as well, are there similar examples elsewhere?

Professor Joshi: Once the 2011 census data are available, they won't be out of date, or not very out of date, and they will refresh the baseline information that is used for all kinds of things: for example, the ethnic composition of local areas. That census will also produce evidence that is difficult to collect locally about the flows of people from place to place, from residence to workplace, commuting across boundaries and moving house across boundaries. That evidence does not become available very often, but it is an important source for knowing not only how the population is distributing itself residentially but how it is moving between different types of households. It also covers people not living in households, such as those in communal establishments, to a greater extent than many other sources.

Professor Blane: A major problem with the census as a research tool is that it takes place only once every 10 years, so you don't really know what has happened in between. You have to rid yourself of the idea that you are ever going to get a perfect dataset. You have to see how different types of datasets fit together. We have annual panel studies that can fill in the gaps between censuses. The census gives us a secure block every 10 years. It is the framework within which all the other datasets are situated, in my experience.

Q9 David Morris: What would be your major concerns if census data were no longer available? Is there something that you could do that would no longer be possible?

Professor Mayhew: If the census were not available, it would be hugely beneficial to research because it would lead to a period of huge innovation in the research community as they would learn to use other sources and other datasets, which would become more available. That would create a completely different research environment. You are not completely comparing like for like.

However, there are some things in the census, for example the questions on caring responsibilities and religion and a few others, that cannot be replicated easily from administrative data. On the other hand, administrative data contain so much more information than is available in the census. That is what I mean by creating a new research environment and atmosphere that would lead, in my view, to huge innovation.

On the issue of turnover and change, in two months we completed a six-borough study of the Olympic boroughs in London. The information on the population was given to ONS to help validate, as a benchmark, some of their census estimates. This information was very detailed. It was at an individual level, by age, sex and ethnicity. It followed on from previous snapshots of the same local authorities in the last two to three years. So we were able to look at turnover, change and the flows into and out of those boroughs. There are changes of up to 20% per year in population. If you are measuring something that is 10 years old (and with the census you can be using data that are 12 years old, because it takes a couple of years to publish), you could be way out. You could be misdirecting resources and all sorts of other things. I cannot speak for the academic use of those sources, but at the local level, for the management of local resources, localism, unmet need and the creation of health and wellbeing boards, much better population intelligence is essential.

Q10 David Morris: You are saying that this form of census could be outmoded, and that simply collecting data in this way could be more costly.

Professor Mayhew: It is not quite like for like, but we think you can count the population every year for about one tenth of the cost of the census. Statistics Finland, which has used registers for a number of years, calculates that, per head of population and on the basis of equivalent data, its population intelligence is 31 times cheaper than the census that it replaced. The case is clear, but there will be gaps. The targeted use of surveys can fill many of those gaps and extend your intelligence. For example, you might be interested not only in a person's religion but in their propensity to give up smoking or something like that. This is what goes on. You can take that sort of data, link it to the administrative data and extend the range and usefulness in that way. You have to take that step. That is what I am saying.

Q11 Chair: Do the other witnesses wish to comment?

Professor Joshi: I would not be quite so optimistic, although I don't really know what sort of database might be able to replace the census. As to the function of acting as a benchmark for information that is not readily obtainable from administrative sources, neither is perfect, but the census tells you who is living with whom, which the administrative sources do not. It tells you about unusual combinations of people living together, like husbands and wives who are more than 20 years apart in age, for example. You will never detect those reliably in a sample survey. It is something that is interesting from the research point of view and also from the policy point of view.

Q12 Chair: But they would exist inside public datasets of some sort.

Professor Joshi: If the datasets link those two individuals, yes.

Q13 Chair: Just reflecting back on Professor Blane’s observations about the Scandinavian approach, presumably, they address issues like this by drilling down into separate datasets and putting them together, but having some legal protection to ensure that data are anonymised properly and so on. Is that how it works in Sweden?

Professor Blane: I recently read a comparative study of Finland, Italy and the UK. Finland and Italy both have linked registers, and they are green with envy that we have the ONS Longitudinal Study, which is based on census, linking people across the four censuses. The Finns and Italians, anyhow, seem to envy our census.

Q14 Roger Williams: The Chairman has covered some of the ground I was going to cover. Do I detect a little bit of a difference between academics, on the one side, and people who want to deliver policy, on the other side? One side prefers the census. I am not quite sure about the comments that Professor Mayhew made. Is this digging down into data that is there already, or is it commissioning new work outside the census?

Professor Mayhew: No; it is data that are there already. There is no dataset that completely covers all the population or is 100% reliable, so you have to combine them in some way. We link the population to property registers. We have a set of rules by which you can confirm or not confirm a person, based on whether they appear on more than one dataset and on other criteria, which I could explain.

Because I think I probably have as much experience as anybody in using all these datasets, I want to put on record the fact that you can look at household composition using administrative data by linking people to their addresses. You can look at the household demographics. In fact, we have a classification system of eight household types derived from a set of 120 types. They include things like single-parent families, older people living alone, three-generational families and so on. It is surprising what you can do with the existing data. There is always scope to improve, and a population register would probably enable further improvements. You are right, in a sense, that there is a demand out there from the policy community which is not quite the same as the requirements of the academic community, particularly those who have used the census for many years and have built up their research agenda on that basis. It is different, but I think it is much better.

Q15 Roger Williams: Is there anything that either of you would like to add?

Professor Joshi: There are uses of census data for research (as you say, I have different requirements) that combine, with some certainty, a lot more information about people and the people they live with, how long they have remained in the states in which they have been observed, and whether they were in the same place doing the same sort of work 10, 30 or 40 years ago, which the Longitudinal Study can tell you. As Professor Blane said, it is one of the major longitudinal research resources that this country has built up. Much use has been made of it, and much more would be made of it if it continued to accumulate.

Professor Blane: My use of the census is much more limited than that of my colleagues. I use it in the context of the British Longitudinal Studies. I do not know whether the members of the Committee are aware of it, but Britain remains the envy of the world in the richness of its longitudinal datasets. For example, if you take the 1946 birth cohort, these people have been tracked right across their lives and they are now in their 60s, my age. When you track people over that length of time, obviously, people drop out. What you need is a population count that tells you how representative the people left in these Longitudinal Studies are and what the direction of any selection bias might be. I do not see any other alternative than the census for that purpose, and that is the purpose I use it for.

Q16 Roger Williams: Does the fact that the census exists in any way remove an incentive for people to look at novel and new ways of analysing data?

Professor Mayhew: It probably does, for a combination of reasons. One is that people may lack knowledge about the availability of the administrative data that exist, and it takes time to learn that. Secondly, there are the different interpretations of whether or not you can have access to that data. In that area, there is scope for a lot of improvements. Hypothetically, you could be involved in a study that requires administrative data from five different data owners, which means five different sets of negotiations, getting ethical approval and governance, depending on what the research is for. That can be extremely off-putting. I would like to see some changes to that where, perhaps, the individual researchers are more regulated or licensed to use that data in a trusted and secure way, rather than having to go through all of those barriers. That is a barrier to change. Of course, the census is freely available. If you are used to using it, you will tend to fall back on it. If you can reduce those barriers and change the environment further, then that would be helpful.

Q17 Roger Williams: I guess that one of the virtues of the census is that it is a complete piece of work, or as complete as we can get, anyway. Therefore, there is little sampling element in it. If it was replaced by a sampling technique or different sampling techniques, slight changes in how those samples were selected or decided upon could have quite profound effects on the work that was being done. In the way forward that you are suggesting, is that a weakness?

Professor Mayhew: You seem to be saying, if I may say so, that the census is some kind of gold standard, but it is not, because there have been difficulties with the census in the past, such as low response rates, with imputation needed in areas of London where the response rate was 70% to 80%. From my point of view, those potential levels of inaccuracy count against the census.

Suppose you were designing a survey in London today and you wanted an ethnically representative sample of the population. If you used the 2001 census as your sampling frame, it would be very misleading. It is not a straightforward question. It is not one or the other. You have to look at it on its merits. There is no easy answer.

Q18 Chair: If we went down the road of some alternative method of collecting data, in some parts of the publicly collected data there is already an element of compulsion. You are supposed to register to vote, but we know that people do not, for example. That is one source of data about people. In the absence of a census, would it require there to be a great deal more compulsion on the data provider to ensure that there was accuracy in datasets?

Professor Mayhew: We have been working with administrative data for 10 years and we have noticed steady improvements in the quality, because there are now British standards for addressing or referencing records in datasets. That is noticeable, but you can always take that further. When you are using them, you have to look at how they are put together and how they are maintained in order to form a judgment in terms of whether they are suitable for the purpose for which you want to use them. There have been huge improvements over time. Something like this would lead to further improvements.

Professor Joshi: Administrative datasets that are being put together are not national. Even the four administrations of the UK generate their data in different ways at the moment. There is a question of how you harmonise the standards of collection and the content. Some places will, undoubtedly, have more problems with their data than others, and some will have better-quality data.

Let me come back to this question about innovation, which Professor Mayhew mentioned. In the past 15 years there has been tremendous innovation in the way that social scientists analyse data with the IT and statistical resources available to them. There has been a response from the ONS to enable social scientists to use micro data in a secure way which makes it much more useful scientifically but, nevertheless, preserves the confidentiality which is quite rightly assured to census informants. Doubtless this structure will continue to evolve. If it has to evolve to preserve the different kinds of confidentiality owed to different kinds of informant in a hybrid dataset, that is a challenge which might well be met, but it needs to be thought about.

Professor Blane: If you are looking at alternatives, you want to pay great attention to issues of the reliability of the information and its completeness. I use two administrative datasets that come from the National Health Service: the Hospital Episode Statistics and the General Practice Research Database. The Hospital Episode Statistics are very difficult to use because people get classified differently. The General Practice Research Database is almost unusable because general practitioners will not record diagnostic categories in a consistent way.

My colleagues who have been civil servants tell me that, when academics say they are used to cleaning research databases, the civil servants always laugh and say, "You have seen nothing until you have faced the problems in administrative data." A real ace like Professor Mayhew, who is motivated and skilled, can cut through a lot of problems, but I worry that most things are going to be done by people doing routine work. They are not going to be as motivated and skilled as Professor Mayhew. The potential for introducing error into the data is enormous.

Q19 Pamela Nash: We have talked a lot this morning about the problems with the census as it currently stands. Could each of you reflect on the usefulness of the census, particularly in planning ahead for public service provision?

Professor Mayhew: I have tried to comment a little on that already. The fact that it can be up to 10 or even 12 years old before it is replaced is a major issue because of population change. The entire landscape has changed in some areas, particularly in London, where we do a lot of work. Also, its spatial granularity is too coarse to drill down and answer some of the questions that health commissioners and local authorities have, whether it is about estates, brownfield sites or something like that. The way in which the data are delivered to you does not get you as far as you want to be. What they want is something much more flexible so that they can flex the age groups, time periods and geographical areas that they are interested in, whether they are neighbourhoods or whatever. The demand for this level of information has been growing over the 10 years we have been involved in this work, and it is increasing with concepts like localism and the democratic issues of whether all people who are eligible to vote are actually there. All those things add to the pressure on getting better intelligence to support local decision making. I do not think that the census really meets, or will ever meet, that kind of agenda.

Q20 Pamela Nash: Do any of you have anything more positive to say about the census and the uses that we still have for it?

Professor Blane: I have two points. First, I have already mentioned the thing that I use it for most, which is to make best use of Britain's investment, in terms of tens and hundreds of millions of pounds and 60 or 70 years of research effort, in the longitudinal datasets. We need something like a census to take account of inevitable attrition from these longitudinal datasets. Secondly, one study, the ONS Longitudinal Study, links the same individuals in the censuses of 1971, 1981, 1991, 2001 and 2011, which is a unique dataset because it takes Britain from the end of the post-war settlement in 1971, through deindustrialisation in 1981 and the feminisation of the workforce in 1991, to globalisation in 2001. You can look at these large-scale social processes and their impact on individuals' lives. For me, the ONS Longitudinal Study is not the major issue, but it would be a definite loss.

Professor Joshi: For the study of what is going on in local areas, the census is an element; it is not always perfect or complete, especially in London. All sources of information put together would improve policy makers’ understanding about what is going on and researchers’ understanding of the process. I do not think that the census is perfect, but I do not think it is irrelevant. For understanding less local issues like trends in the labour force, trends in the family or projecting what the birth rate is going to be, the census is a complete treasure trove of evidence about the social fabric on which many other data sources build. It is like a tapestry; you can embroider on it, but the framework that the census gives you is the basis.

Q21 Pamela Nash: The census still has support as being a very useful historical document, but would you agree that that is the bigger use of the census information or does it still have a role in planning for the future?

Professor Joshi: Certainly, it would have a role for historians in the future. If the census series of evidence came to a halt, there would be dismay among future historians, but I am not sure that that is your prime concern now. It has laid down this fantastic document of how Britain has changed over more than a century. To know where you are in the picture of long-term change should be of importance to policy makers as well as to social scientists.

Q22 Pamela Nash: It seems that population changes much more quickly than the census is able to capture at 10-year intervals at the moment. Do you think there is a possibility of changing to a smaller census which would be more frequent?

Professor Mayhew: There is a possibility, but I am not sure that it would be valued by local authorities and people at that level, because a sample is a sample and you cannot easily generalise from what is happening in London to what is happening in Cornwall or other parts of the country. There is an argument for smaller-scale surveys which ask questions about, perhaps, income, journey to work, commuting and that sort of thing. I wonder whether some of those kinds of issues could be piggy-backed on to existing national surveys, like the Labour Force Survey and other surveys. A more creative use of existing surveys, coupled with more use of administrative data, will get us to a point where we can ask the next question: whether we need a population register or something like that, which is being proposed, but which is a much greater step, in my view. A carefully calibrated mix of surveys and administrative data actually gets you to a better place.

Q23 Pamela Nash: Is there anything either of you would like to add?

Professor Blane: Yes. There is the new Understanding Society annual panel study. It is 40,000 households and 100,000 individuals and is a random sample across Britain. On an annual basis, that will fill in the gaps between the 10-year censuses. We already have the research infrastructure in place to look at the short-term changes.

Q24 Pamela Nash: Do you agree that technology could be used to improve the cost-effectiveness and efficiency of the current census if it continues as it stands?

Professor Mayhew: You can always improve it to some extent, and there have been huge improvements this time round with the address database and other things that are used. Ultimately, I do not think it is the right model for collecting these kinds of data. You are never going to be as efficient as the Scandinavians have managed to be, or as efficient as I think is possible today. While you can improve it, I question whether it is the right model for collecting that kind of data.

Professor Joshi: There is the technology of analysis as well as the technology of collection to be thought about. That will change. It will have to change, and there is room for innovation and improvement there.

Going back to your question about having a smaller census, there is one model where you would have a rolling census. Every year it would be in some local authorities but not in others. I would not like to hazard a guess about whether that would save any of the cost, but it might make the whole thing slightly more up to date if you updated the local estimates with all the other sources of information there are locally, and every 10 years you replace it and have a census, but not necessarily in a particular year. It may be more cost-effective, but I am not sure.

Professor Blane: There is a sea change in people's preparedness to answer public surveys which, undoubtedly, is the problem with the census. If people thought that every time they filled in a tax return, went to their doctor, or registered or insured their motor car, the information was going to be used in a way that invaded their privacy, you would get the same problems with administrative data, which might be of greater concern to Government Departments. We just have to face the fact that we live in a changing world. I do not think that technology is going to solve the problem that people are less deferential.

Q25 Chair: In your earlier remarks about the Longitudinal Studies, you were talking about yourself as a researcher using historic data, and then Professor Joshi referred to historians of tomorrow. You and I will have filled in about the same number of census forms, I am guessing. The one thing that I recall from them is that every one has been different. What is the definition of the core dataset that is mission critical for the kind of Longitudinal Studies that you are interested in and future historians would be dismayed about if it were not collected?

Professor Blane: Professor Joshi will give you a better answer to that question. A problem with a survey is that everybody wants more. It is the responsibility of the person leading it to say what you cannot include. The census does a pretty good job of including core information. In recent years, when they have added new questions, by and large they have been very fruitful in terms of our understanding of social processes. The question that was added in 2001 about informal caring, "Are you looking after someone who is ill?", or what have you, has been really important in terms of understanding work-life balance. In 1991 there was a question that was added about limiting long-standing illness, which has been very important in terms of understanding levels of functional disability in the population, the relationship with people who are categorised as permanently sick within the labour force and so on.

Professor Mayhew: But some of that data is available in existing surveys: for example, the Health Survey for England.

Chair: Stephen wants to take this a bit further.

Q26 Stephen Mosley: Thank you, Chair. From what you are saying, pretty much all the data that are available in the census are actually available in other data sources, but, from where I am sitting now, these data sources seem to be spread across a wide range of Government institutions and organisations. The one advantage with a census is that it is in one place. You can go to the ONS website and go down to the lowest super output area, wherever it is, and see all the data. The problem that I can see in the future is that you will have all this data in different places. That might be okay for a Government institution which has millions of pounds to spend on sorting it all out, getting access to the data and producing it in a useful format, but, for the researcher in a university or a social historian, how are they going to gather all of these pieces of data together? Can you see some way in which all of this data could be brought together in an easy-to-use format that gives us the advantages we can currently get only with the census?

Professor Mayhew: It is a question of management, transition and vision. Having a core academic dataset that mimics or extends what is already available in the census is something that we ought to be thinking about. You are absolutely right that there are different data owners. The owners of the data that we use are the local authority and the primary care trusts. That is sufficient to count the population in considerable detail at the level that they require. With all these other add-ons and so on, you may have to have a rolling series of surveys, like the existing surveys, but you need the vision that will pull down that data and get it into a form that can be used.

Q27 Stephen Mosley: Does that vision exist at the moment?

Professor Mayhew: No, it needs to happen.

Q28 Stephen Mosley: Thank you. That is something we can work on in our report. We are talking about the historical record, which is something I am particularly interested in. I think it is great that you can look at the 1901 census results and see what your great-grandfather was doing, where he lived and all those kinds of things. Would you envisage in future a snapshot of this data being taken on one day every 10 years and put back for historical purposes, or do you imagine that the whole thing would just be a rolling process? It would be constantly updated and in 100 years' time all of this data would be on a CD somewhere, or whatever they will use at that time, but there would be no actual snapshot?

Professor Mayhew: My colleagues may be more expert than I am on this. Genealogists, for example, probably go back to the original registration records that are held in Southport to do a lot of the tracing work that you are referring to. Beyond that, I cannot comment. It is something we might need to look into to see whether that exists in administrative records already, i.e. the births and deaths registration system. Maybe Heather would like to comment.

Professor Joshi: You would need to devise some form of database which could be interrogated and linked by users in the future as well as the more immediate policy users, once you had put things together. I am not saying that it is impossible but it is quite a challenge if you are going to put people’s names on it. You could think of maintaining the Longitudinal Study database. It is quite a challenge to keep following those people if they are not being followed in the census.

Q29 Stephen Mosley: Coming back to something you said earlier, you mentioned that there is no database of individuals, essentially, or database of addresses. Do you need something like that to act as a key to the database?

Professor Mayhew: Addresses are key. There is an address database for the whole country. In fact, there are two or three. I am not an expert on this. The one I use is called the Local Land and Property Gazetteer, which is a geo-referenced address of every property in an area and which links through into council tax and so on. You assign individuals to addresses within that controlled framework, where you validate people’s current address and so forth. Going back to the original question, has that answered it?

Q30 Stephen Mosley: Yes. So you have this gazetteer of all the addresses. Do you then go in and cross-reference the stuff yourself, or is it a central cross-reference?

Professor Mayhew: Some local authorities have reached the stage where they allocate what is called a UPRN (Unique Property Reference Number) to every record, but many have not reached that stage. Our first step is to assign people’s records to a UPRN, and then we proceed from there. If you wanted to think in terms of making this whole process more efficient, the first thing that could be done is to link all administrative records to the UPRN, and that would make a huge difference to the quality of the data and the processing of the data in future.

Professor Blane: I am a bit peripheral to this issue, but I am aware that within the civil service there are problems of linking data. The example I know about is the ONS Longitudinal Study where, for about 20 years, there has been talk of linking in people’s benefit records from the Department for Work and Pensions. Some years it is on and some years it is off. There is a big problem about the Data Protection Act and whether one civil service department will release data to another because of the implications under the Act. It could be that the culture in Britain is different from that in Scandinavia: that the relatively legitimised linkage found in Scandinavia is foreign to the culture of Britain and that it would not work here.

Q31 Chair: That is nothing to do with the Act. That is how the data are collected, is it not? If, on collecting the data, the Benefits Agency, for example, said, "This information is going to be made available to HMRC", that is all they have to do to cover their backs. It is not the Act that stops it.

Professor Blane: It is the legal advice of the DWP, as I understand it, that you cannot link to the ONS Longitudinal Study without informed consent.

Q32 Chair: What is the evidence that informed consent would be withheld by honest people?

Professor Blane: Within a medical context, people often do. That is all I can say. They feel that it is their private business.

Professor Mayhew: There is no doubt that there is sensitive data and less sensitive data, and health records are an example of sensitive information. Core information for counting the population is everyday, low-level information. There is already a perception in the population that Government merges these records, and people are surprised to be told that that does not actually happen and that all this information exists in different departmental silos across Government. They cannot understand why that information is not...

Q33 Chair: Every MP who has dealt with a Child Support Agency case will be told, whether by the custodial parent or the non-custodial parent, that they are surprised that this data cannot be pulled together.

Professor Mayhew: You are absolutely right. All these complexities come together at the level of children’s social services. That is where we get all these problems, because everybody hides behind confidentiality. These are all, ultimately, bureaucratic issues. In many cases they are counter to common sense, really. That ought to be looked at.

The point I wanted to make is that there is a key difference between the use of administrative information for statistical purposes and the use of information about the individual. Very often these things get confused. Section 33 of the Data Protection Act states that we can use administrative information for research purposes, but it needs to be made absolutely clear, from a research point of view, for managing local resources and managing the resources of the country, that that is a legitimate use and it does not have any of the normally associated dangers.

Q34 Chair: Just before I ask Stephen Metcalfe to come in, your previous answer to Stephen Mosley about linking people to a unique property number would instantly create a universal electoral register that is accurate.

Professor Mayhew: It is funny you should say that, because we have just done some work in one part of London looking at precisely that issue. It would make a huge difference, yes, although this particular local authority already links to addresses. The problem it has is that, when it comes to update the register every year, it gets lots of people who do not respond. Something like 30% of its electorate is not registered. So it is not the UPRN that is causing the problem. It is just an issue of information flow and the turnover of the population.

Professor Joshi: I would like to say something about that. Addresses may be unique, but people may not be uniquely associated with addresses, as the census finds with visitors, second homes and children moving between parents. It is a good start but it is not perfect. We get evidence from various sources, administrative and the census, about where the adjustments have to be made at the edges.

I would also like to say two things about getting consent from people who are getting benefits. First, you do not get informed consent from people who have had benefits in the past, so you would not be able to link that information. Secondly, you would not be able to get information about people who do not collect benefits, people in the same age group or demographic group who are not unemployed and are not collecting carer’s allowance, for example, so that you can see what the rates of claiming these benefits are. You need some other information on the population that is not in receipt of benefits in order to make best use of the administrative information.

Q35 Stephen Metcalfe: I think we established this morning that there is quite a large divide between the needs of academic research and those of public policy research. Therefore, the question is whether the Government should be funding those two tracks separately. Should it be funding the collection of social science research as a public good and then funding policy research separately? Do you have views on that?

Professor Mayhew: The priority to have good-quality core population information is absolutely key. Whatever else you decide should sit on top of that. That is why I said earlier that we need a vision of how we are going to build the research needs on top of that core information. That means looking at access to administrative data, dealing with that issue to get the core right, and then looking at what needs to change over and above that by looking at the existing surveys or other arrangements. We need a vision to make that transition. That is the way things will go in the future. The population will always drive the agenda, whether it is about migration or simply counting the number of people living in an area. All of those things are key.

Professor Blane: I had never thought of the issue before you asked the question. While Professor Mayhew was speaking, I was imagining the debates that are going to go on between the Medical Research Council and the Economic and Social Research Council about what proportion each research council should contribute towards the research component of the decennial census. I could imagine that this would go on for ever. It is probably easier just to recognise that it is a multi-purpose survey and fund it centrally, as it is.

Q36 Stephen Metcalfe: Stick with what we have got is really what you are saying.

Professor Blane: Yes. We must continue to modify and adapt it to new circumstances. I think split funding might be a bit complicated.

Q37 Stephen Metcalfe: Do you want to add anything to that, Professor Joshi?

Professor Joshi: I certainly agree that history reveals that, when you try to support big data resources for research across research councils, it is another challenge. The micro datasets are our public goods. They would mostly be used by academics, but they will be informing policy makers as well as science. It is probably best to keep their funding integrated, as David suggests.

Q38 Stephen Metcalfe: I have a couple of final questions. Assuming we keep the existing arrangements, you mentioned earlier tracking individuals through the census. Is the census, as it stands, the best way or the main way of doing that, or are there viable alternatives for tracking individuals over a long period of time?

Professor Blane: The alternatives are the annual panel studies. I am always full of admiration for the people who agree to be part of an annual panel study, because some demented social scientist comes to them once every year of their life and asks them questions. I have forgotten your question and was getting carried away with the thought of this.

Q39 Stephen Metcalfe: The first is the extent to which it is necessary to track individuals, and, secondly, is the census the best way of doing it?

Professor Blane: The annual panel studies have small numbers. If you want to look at a small group like Chinese people in Britain, there are too few to be able to get any statistical power, so you do not know whether it is just an odd person or whether they are representative of Chinese people in Britain, whereas in the decennial census you have the whole population. In the ONS Longitudinal Study, which is a 1% sample, you have half a million people. You could look at Chinese people in Britain through the census or the ONS Longitudinal Study, but you could not in one of the smaller-scale studies.

Professor Joshi: We also have the birth cohort studies. A new one is just about to start. All these longitudinal data resources complement each other because they have different features and different periodicities, and the longitudinal sample of the census does form a standard to which the others can be linked. The others elaborate detail, but the census and its longitudinal sample give the long-term rates of change. They do not give you the year-on-year changes. They give you decade-on-decade changes, for which you have to wait for a terribly long time if you are collecting data prospectively.

Professor Mayhew: One administrative route that we have used is to combine NHS numbers, which stay with you all your life, with address-level information (UPRNs), and you can then track changes of address and movements into and out of areas. We have done it for populations of up to half a million, but, if you were to do that at the national level, you would probably be looking at two databases: the Child Benefit database and the DWP database. If you could populate those data with UPRNs, you would be able to monitor movement statistically whenever and wherever you wanted, subject to the accuracy of the underlying information, which should be about 95% to 98% accurate.

Q40 Stephen Metcalfe: In the absence of a census, if we were to change the way we collected information and moved to alternative models, would you all suggest that that data be made available publicly, not necessarily the details of the individuals but the headline numbers?

Professor Mayhew: Although we put together datasets from individual-level local data, when it is handed over to users it is completely anonymised and you cannot identify any person or individual in it. At some stage in the production process, it is inevitable that you are going to be handling that kind of information.

Q41 Stephen Metcalfe: You are handing it over to the user. Should it be publicised but not published, effectively?

Professor Mayhew: The work we have done has been published on the web by local authorities, yes. I know quite a lot of examples of that.

Q42 Stephen Metcalfe: A lot of social scientists are employed by the taxpayer. All three of you touched upon standardising or regularising the way in which data are collected and the way in which they are put on to systems. Do you think the ESRC should impose a standardisation requirement when it is funding social research?

Professor Mayhew: The University of Essex runs the data archive and it tries to maintain standards, but I do not think that they are universally adhered to. In principle, of course, you are right, but thinking of the ESRC as the ultimate authority on this is a bit tricky.

Professor Blane: The only standardisation that the ESRC would claim is top-quality research, which can take many forms.

Professor Joshi: In effect, the way that the census classifies variables, without any compulsion, does form guidance for other people collecting data. If you look at collectors of data on ethnic groups, where they do not classify it the way that the ONS does, that is not going to be so useful. There is also a problem of harmonising with other countries.

Q43 Chair: Presumably, just looking at the other side of the coin, in relation to data that are held by Government, which are accessible to researchers, there has been some minor move in the right direction, but it would help tremendously if there was a consistent format in which Government held data about us.

Professor Mayhew: And a consistent policy on the release of that data for statistical purposes.

Q44 Chair: The day before yesterday we had the debate about health data. It was clear from that that there needs to be a public dialogue and an agreement or a contract between citizen and state that has some longevity associated with it.

Professor Mayhew: Yes. Medical records and that type of information are in a different category from basic standard administrative information about where you live and whether you are on benefits and that sort of thing. You are absolutely right that having a standard approach across Government is important. I do not think that we are far off that. In principle, departments will use common identifiers. There is not complete consensus on this. Some use the NHS number and some use a child benefit number, or whatever that is, or an NI number. You can create what are called "look-up tables" to link those identifiers together, and they could then be used for statistical purposes. It may even be done at the moment.

Q45 Chair: The one that always amuses me is the CSA, which uses a reference number that is longer than the number of people on the planet. There you are.

Professor Blane: You have to qualify the idea that medical records are in a separate category. If you look at the things that people can be blackmailed for, few of them are to do with their health. Psychiatric illnesses would be an exception. The things that people are blackmailed for are much more about their criminal past and so on. There are many areas of state data which are highly sensitive.

Professor Mayhew: I agree that criminal data are not in the same category.

Chair: Can I thank you for your contributions this morning? It has been very informative. There are some challenging issues hidden under all of this. Thank you very much indeed.

Prepared 14th December 2011