Behaviour Change - Science and Technology Committee


CHAPTER 6: EVALUATION OF BEHAVIOUR CHANGE INTERVENTIONS

6.1.  A common concern raised by witnesses was the need for greater consistency in the quality of evaluation of government behaviour change interventions, with many suggesting that this was a significant area of weakness.[207] Rigorous evaluation is necessary not only to establish whether policies are working, whether they can be improved and whether they represent value for money, but also—in the context of behaviour change interventions—to establish whether they contribute to the development of a much-needed evidence base on the effectiveness of interventions at a population level (see paragraph 3.1 above).

Ensuring effective evaluation

6.2.  Witnesses identified the following factors as necessary for effective evaluation:

  • building evaluation into a policy design from the outset;
  • good outcome measures;
  • longitudinal data;
  • the use of controls wherever possible;
  • sufficient funding for evaluation;
  • data on cost-effectiveness.

BUILDING EVALUATION INTO POLICY DESIGN

6.3.  Several witnesses called for evaluation to be built into policy design from the outset,[208] not least because, as Professor David Gunnell, Professor of Epidemiology at the University of Bristol, observed, it was often the case that "once the policy has been formulated and begun to be rolled out, it is too late to build in an effective evaluation".[209] It is unfortunate, therefore, that, according to Professor Ray Pawson, Professor of Social Research Methodology at the University of Leeds, many evaluations are put out to tender and organised in a way that "excludes the evaluation team from the programme design stage", meaning that they have no influence on fine-tuning an intervention, are unclear about whether results will demonstrate success or failure and often discover that the intervention is "half-baked".[210] Others also highlighted the need to bring in academics at the start of the policy development process to help ensure that evaluations are appropriately designed.[211]

6.4.  The Treasury guidance on appraisal and evaluation (the Green Book) works on the basis of the ROAMEF (rationale, objectives, appraisal, monitoring, evaluation and feedback) policy-making cycle and so appears to provide for evaluation after implementation.[212] We were encouraged to hear, however, from Siobhan Campbell, Head of Policy at DECC, that evaluation should not be considered only as the "E" in the ROAMEF cycle but rather that "at the rationale, objective, appraisal stage, there is an expectation that you are using evaluation evidence to inform these decisions that you are making along the way".[213] Dr McCloy, ESRC Research Fellow in GES, said that one of the key aims of her work with the behaviour change network was to get policy makers to build evaluation into the policy design process at an early stage.[214]

LONG-TERM EVALUATION AND OUTCOME MEASURES

6.5.  Several witnesses identified the absence of longitudinal evaluation (that is, evaluation over a prolonged period) as a significant problem.[215] Dr Ian Campbell, Medical Director of Weight Concern, for example, suggested that attitudes in the public sector tended to be short term and that funding was not available to measure outcomes over the longer term.[216] Anne Milton MP, Parliamentary Under-Secretary of State for Public Health, appeared to confirm this observation when she suggested that measures which would show, for example, a reduction in alcohol-related liver disease would take a long time, and that this could lead to the accusation that the department was doing something which "maybe will favour our successors in Government".[217] Phillip Darnton, Chair of Cycling England, and Professor Pawson said that another problem was that those conducting evaluations were sometimes asked to report prematurely.[218] John Dowie, Director of Regional and Local Transport at the DfT, agreed that this had been the case with the Sustainable Travel Towns programme (see Box 11, page 46).

BOX 11

Sustainable travel towns pilot
Professor Peter Bonsall, Professor of Transport Planning, University of Leeds, criticised the evaluation of the Sustainable Travel Towns pilots on the grounds that "rather too many of these initiatives have been evaluated predominantly by self-reported behaviour, which is ... a recipe for disaster" and that a number were "evaluated by the same team who did the work".[219] Mr Dowie further noted that "because of the drive to get evaluation evidence out" the evaluation would not capture the longitudinal effects of the intervention,[220] meaning that conclusions could not be drawn about its long-term effectiveness or cost-effectiveness. We note, however, that this was a pilot and used randomisation and control techniques effectively, thereby meeting some of the criteria for effective evaluation.

6.6.  It was suggested to us that a further weakness of the Government's current approach to evaluation is how outcome measures are framed.[221] Outcome measures should be specific, objective and consistent across trials. Appropriate outcome measures enable the success of policies to be monitored throughout their implementation and therefore allow an assessment of their effectiveness in the shorter term, even if the evaluation is intended to continue through to the long term. Several witnesses expressed concern about the inappropriate use of outcome measures for behaviour change interventions. In particular, it was noted that attitudes and self-reported behaviours were often used as outcome measures for behaviour change interventions.[222] The Government acknowledged that sometimes "evaluation has been distorted by being focused on customer attitudes and programme outputs, rather than outcomes".[223] The Targeting Benefit Thieves campaign provides an example of this confusion (see Box 12, page 47).

BOX 12

The Targeting Benefit Thieves campaign
The Targeting Benefit Thieves campaign began in 2002 and was designed to reduce fraud and error in the benefit system. The Government noted that "the campaign tracked people's attitudes and self-reported behaviour as a result of seeing the campaign. Tracking research indicated that the proportion of claimants who consider it 'very easy' and 'fairly easy' to get away with benefit fraud declined from 41% (Oct 2006) to 29% (March 2010). The proportion of claimants agreeing with the statement, 'the chances of getting caught abusing the benefits system are slim' has declined, falling from 39% (Oct 2006) to 21% (March 2010)".[224] No information was provided about evaluation of the intervention's primary outcomes of reducing fraud and error in the benefit system.

6.7.  We were particularly concerned by confusion between outcomes and outputs within DH. For example, in relation to the Great Swapathon, we were told that the outcome of the intervention was "to create a million swaps"—although, shortly afterwards, we were told that the outcome measure was, in fact, a decreased health burden in the long term.[225] Measuring the Great Swapathon according to the number of swaps exemplifies a point made by NICE that "evaluations of behaviour change interventions frequently fail to make a satisfactory link to health outcomes".[226]

USING SUFFICIENT CONTROLS AND EVALUATING COMPLEX INTERVENTIONS

6.8.  Some evaluations, including the Sure Start and the Health Trainers evaluations (see Boxes 13, page 48, and 15, page 49), have been criticised for not including any, or sufficient, controls. According to Dr Steven Skippon, Principal Scientist, Shell Global Solutions, insufficient controls and poor methodology would increase the chance "that confounding variables will confound the answer".[227] Choosing a method of evaluation is not straightforward. Several witnesses thought that randomised controlled trials (RCTs) were the "gold standard ... of evaluation".[228] Professor Pawson, on the other hand, challenged this view, arguing that demonstrating the effectiveness of a policy would sometimes require a "comprehensive" or a "multi-method evaluation"[229] rather than a simple "policy on, policy off" comparison.[230] Professor Pawson's argument was supported by other witnesses, and we note that Defra's evaluation of their food waste policy uses a range of methods (see Box 14, page 48).

BOX 13

Sure Start
The evaluation of the Sure Start programme was criticised by Professor Gunnell because it could have been rolled out in a way that would have allowed "more robust, randomised evaluation ... it could have been done with better collaboration between researchers and policy makers".[231] Dr Halpern also said that Sure Start arguably did not build in sufficient controls.[232]

6.9.  Where RCTs are not possible, some witnesses suggested a natural experiment design (where the evaluation is not by way of a randomised experiment but controlled using existing variation);[233] others suggested a "stepped-wedge" approach (where a policy intervention is rolled out to participants at different times). Small-scale pilots and demonstration projects could also ensure that controls were established—indeed, Professor Britton thought piloting was "crucial".[234] Pilot groups were used effectively for establishing and improving smoking cessation interventions (see Box 1, page 20) and the Sustainable Travel Towns programme made good use of demonstration projects (though in other areas its evaluation could have been improved) (see Box 11, page 46). Professor Kelly noted however that the pilots themselves need to be evaluated properly,[235] and Dr Chatterton cautioned that sometimes pilots would not show the extent of the effects of an intervention at a population level.[236]
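
To illustrate the "stepped-wedge" approach mentioned above: every group eventually receives the intervention, but groups start at different times, so that groups not yet treated act as concurrent controls for those already treated. The following is a minimal sketch of the idea only; the towns, rollout periods and outcome figures are hypothetical and are not drawn from the evidence we received.

# Minimal sketch of a stepped-wedge rollout schedule (illustrative only).
# The towns, start periods and outcome values are hypothetical.

from statistics import mean

# Each group receives the intervention from its start period onwards;
# groups that have not yet started serve as concurrent controls.
rollout = {"Town A": 1, "Town B": 2, "Town C": 3}   # period in which each town starts

# Hypothetical outcome measurements (e.g. car trips per person per week)
# observed for each town in each period.
outcomes = {
    ("Town A", 1): 9.8,  ("Town A", 2): 9.1,  ("Town A", 3): 8.7,
    ("Town B", 1): 10.2, ("Town B", 2): 9.6,  ("Town B", 3): 9.0,
    ("Town C", 1): 10.0, ("Town C", 2): 10.1, ("Town C", 3): 9.3,
}

treated, control = [], []
for (town, period), value in outcomes.items():
    if period >= rollout[town]:   # intervention already rolled out here
        treated.append(value)
    else:                         # not yet rolled out: acts as a control
        control.append(value)

# A crude treated-versus-control contrast; a real analysis would also adjust
# for period effects and for clustering within towns.
print(f"mean treated outcome: {mean(treated):.2f}")
print(f"mean control outcome: {mean(control):.2f}")
print(f"crude difference:     {mean(treated) - mean(control):+.2f}")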

BOX 14

Food Waste

Defra's food waste reduction programme, delivered in collaboration with the Waste and Resources Action Programme (WRAP), was evaluated using a range of methods, both qualitative and quantitative. These included measures of self-reported behaviour, such as awareness and understanding of the campaign, understanding of how to reduce food waste, and commitment to reducing food waste. This was accompanied by data about the quantities of food in the waste stream and information from the retail environment about purchasing habits.[237]

6.10.  We have already concluded, on the basis of the evidence we have received, that using a range of interventions will often be more effective in changing behaviour at a population level than using a single intervention in isolation (see paragraph 5.13 above). But multi-faceted interventions can make evaluation more difficult because of the difficulty in discerning the relative effectiveness of the components of such interventions. Professor Bonsall, for example, said: "it is impossible ever to untangle the particular effects of different components. One can only talk about the packages and draw inferences from the effect of different packages".[238] Professor Britton argued that, while disaggregation would be possible in practice, it would take too long. But this, he suggested, should not deter policy makers because, in his view, where there was sufficient evidence to support trying an intervention, it should be tried without prevarication.[239] Mr Letwin took a similarly practical approach: what mattered most for government was to be able to judge whether what they had done was working—"it is less important, immediately and practically, to know which bit of it is working".[240]

6.11.  Dr Harper, Chief Social Scientist at Defra, however, appeared to be more cautious: there was, she said, more to be done across government "in establishing what works and what's worth investing in, in terms of specific components of packages".[241] She suggested that one approach would be to use more pilots.[242] Mr Baker agreed: "it is possible, even within a complex matrix of interventions, to work out which ones are having particular effects".[243] Professor Michie drew our attention to the Medical Research Council (MRC) guidance on evaluating complex interventions.[244]
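
One approach sometimes used to work out which components of a package are having an effect is a factorial design, in which different groups receive different combinations of components and the average effect of each component is estimated across the groups. The sketch below is a minimal, hypothetical illustration of that arithmetic; the component names ("advice", "incentive") and the outcome figures are invented for illustration and are not drawn from the evidence we received.

# Minimal sketch of a 2x2 factorial comparison for disentangling the effects
# of two components of a package (hypothetical components and figures).

from statistics import mean

# Mean outcome (e.g. percentage-point change in the target behaviour) in four
# groups defined by which components they received.
results = {
    ("no_advice", "no_incentive"): 1.0,   # neither component
    ("advice",    "no_incentive"): 3.5,   # advice only
    ("no_advice", "incentive"):    2.0,   # incentive only
    ("advice",    "incentive"):    5.0,   # both components
}

def main_effect(index, present, absent):
    """Average outcome with a component minus the average outcome without it."""
    with_component = mean(v for k, v in results.items() if k[index] == present)
    without_component = mean(v for k, v in results.items() if k[index] == absent)
    return with_component - without_component

print(f"estimated effect of advice:    {main_effect(0, 'advice', 'no_advice'):+.2f}")
print(f"estimated effect of incentive: {main_effect(1, 'incentive', 'no_incentive'):+.2f}")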

FUNDING

6.12.  Rigorous evaluation, especially using long-term, objective outcome measures and population-representative samples, cannot be undertaken without adequate funding provision. The Health Trainers programme (see Box 15, page 49) provides an example where the absence of funding adversely affected the evaluation.

BOX 15

Health Trainers

The Health Trainers programme was cited as a policy based on good evidence (see paragraphs 7.22-23 below) but several witnesses told us that the evaluation could have been better. The BPS health psychology team advised DH that the programme should be rolled out in stages in a 'stepped wedge' design but this was not possible because of insufficient funding.[245] As a result, the programme was evaluated by comparing data before and after the programme. Judy White from the Yorkshire and Humber Health Trainer Team told us that the quality of the data collected was variable, making it difficult to conduct a thorough analysis.[246] Ms White was unable to provide any comprehensive data on the basis of the evaluation of the Health Trainers programme, particularly in relation to cost-effectiveness.[247] This was in stark contrast to Weight Watchers and MEND, which provided figures for the effectiveness and cost-effectiveness of their interventions (see Box 16).[248]

COST-EFFECTIVENESS

6.13.  Publicly funded behaviour change interventions should provide value for money. That is self-evident. It was disappointing to find, therefore, that although private sector providers—such as those running the Weight Watchers and MEND programmes (see Box 16, page 50)—were able to provide detailed evaluation data (though we did not assess the methodology by which these data were established),[249] we had difficulty in sourcing such data for the Health Trainers programme.

BOX 16

MEND and Weight Watchers

MEND

Paul Sacher, Chief Research and Development Officer for MEND Central, said that, since the first MEND programme was delivered in 2002, the programme has undergone "feasibility, pilot, efficacy and effectiveness studies to fully evaluate the outcomes". The programme has been evaluated against a range of outcome measures, including "reductions in body mass index, reductions in waist circumference … improvements in things like cardiovascular fitness, physical activity and sedentary activity levels and again some of the psychosocial measures, so things like self-esteem and body image". Mr Sacher told us that an independent study demonstrated that "the incremental cost-effectiveness ratio of the programme is £1,671 per QALY[250] gained".[251]

Weight Watchers

Zoe Hellman, Company Dietician for Weight Watchers, said that an independent report demonstrated that the cost-effectiveness of the Weight Watchers programme was £1,000 per QALY. A recent RCT "compared having access to Weight Watchers versus standard care within primary care" across the UK, Australia and Germany. The study demonstrated that after a year "people who had access to Weight Watchers lost significantly more weight ... and the retention rates were higher".[252]
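
For readers unfamiliar with this measure, an incremental cost-effectiveness ratio (ICER) of the kind quoted above is the additional cost of an intervention divided by the additional quality-adjusted life years it produces, relative to a comparator such as standard care. The sketch below uses purely hypothetical cost and QALY figures to show the arithmetic; it does not reproduce the MEND or Weight Watchers calculations.

# Worked example of an incremental cost-effectiveness ratio (ICER) per QALY.
# All figures are hypothetical and are not taken from the evaluations above.

cost_intervention = 350.0    # cost per participant of the programme (£)
cost_standard_care = 100.0   # cost per participant of standard care (£)

qalys_intervention = 0.80    # mean QALYs gained per participant, programme
qalys_standard_care = 0.65   # mean QALYs gained per participant, standard care

incremental_cost = cost_intervention - cost_standard_care      # £250
incremental_qalys = qalys_intervention - qalys_standard_care   # 0.15 QALYs

icer = incremental_cost / incremental_qalys
print(f"ICER: £{icer:,.0f} per QALY gained")   # £1,667 per QALY in this example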

Conclusions

6.14.  Effective evaluation requires that:

  • evaluation should be considered at the beginning of the policy design process. External evaluation expertise should be sought, where necessary, from the policy's inception;
  • relevant outcome measures—as distinct from outputs—should be established at the beginning of the policy development process;
  • the duration of the evaluation process should be sufficiently long-term to demonstrate that an intervention has resulted in maintained behaviour change;
  • pilot studies, using population-representative samples, followed by controlled trials assessing objective outcomes should be used whenever practicable; and
  • sufficient funds should be allocated for evaluation, recognising that establishing what works, and why, is likely to result in better value for money in the long-term.

6.15.  We find, however, that, at present, evaluations of government behaviour change interventions often lack one or more of these necessary elements. While we welcome the Government's revision of the Magenta Book, the evaluation guidance for policy makers and analysts, we believe that it could be further improved. We recommend that the Government consult external evaluation experts on the creation of a concise document for policy makers, containing only the most important principles of evaluation. We further recommend that they make clear what steps they will take to ensure that the revised guidance leads to a change in evaluation culture across Whitehall.


207   BC 2, BC 31, BC 42, BC 47, BC 52, BC 76, BC 82, BC 83, BC 94, BC 105, BC 107, BC 108, BC 109, BC 110, BC 114, BC 138, BC 150.

208   QQ 26, 188, 223-4, 314, BC 9, BC 90.

209   Q 188.

210   BC 148.

211   BC 103.

212   http://www.hm-treasury.gov.uk/d/green_book_complete.pdf.

213   Q 185.

214   Q 26.

215   BC 6.

216   Q 349.

217   Q 737.

218   BC 148, Q 648.

219   Q 607.

220   Q 662.

221   BC 52, BC 73, BC 105, BC 110.

222   BC 5, BC 110.

223   BC 114.

224   Ibid.

225   QQ 356-8.

226   BC 52.

227   Q 593.

228   Q 188.

229   Q 203.

230   Ibid.

231   Q 196.

232   Q 30.

233   Q 219.

234   Q 168.

235   Ibid.

236   Ibid.

237   Q 284.

238   Q 612.

239   Q 170.

240   Q 717.

241   Q 60.

242   Q 75.

243   Q 722.

244   Q 88. See Developing and evaluating complex interventions, MRC (2008).

245   BC 43.

246   Q 404.

247   Assessing the Value for Money of Health Trainer Services, Lister (2010).

248   Q 432.

249   Though we note that QALY figures can only be used to demonstrate orders of magnitude and do not provide specific data, and we have not investigated how these figures were formulated.

250   Quality adjusted life year.

251   Q 432.

252   Ibid.


 