Memorandum submitted by Met Office (CRU 54)



What are the implications of the disclosures for the integrity of scientific research?

1. The UK enjoys a reputation for strong and robust science on the international stage. In the field of climate research the Met Office is widely acknowledged as world-leading.

2. Whilst it would be arrogant to assume that any system or process is perfect, the codes and processes in place to govern science research across the UK, laid down by the Government Office for Science, the UK Research Integrity Office and Research Councils UK, form a comprehensive framework within which science research is produced and debated. In addition, the Met Office adheres fully to the Civil Service Code and operates to the highest standards of integrity and transparency.


3. All published science is subjected to a rigorous process of peer review: a well-established method by which scientific evidence and claims, and importantly the methodology behind these, are scrutinised by qualified experts in the field. Peer review also promotes and maintains open debate across the science community - crucial to further developments in science.


4. Transparency and integrity are vital components in maintaining the Met Office's, and the UK's, position at the leading edge of climate science. Wherever possible, subject to IPR ownership, we have released the underlying land temperature component used in the HadCRUT analysis. The Met Office's sea temperature component has been widely available for some time.


How independent are the other two international data sets?

5. There is strong evidence that the globe has warmed. Three independent global temperature data sets, HadCRUT, NCDC and NASA-GISS, all clearly demonstrate the rise in global temperatures over the last 150 years. Despite the large differences in the methods used to estimate global temperature trends, these blended analyses are consistent in their view of global temperature change.


6. There are numerous studies in the peer-reviewed literature that attest to the robustness of the surface temperature records, their independence and their non-reliance on specific individual station records. Support for the reality of surface trends also comes from reanalyses and changes in ocean heat content, glaciers, humidity and a host of other indicators including phenological data. Indicators from meteorological, oceanographic and physical measurements are strongly consistent with the surface temperature records discussed above.


7. The data come from numerous different technologies and have been investigated by numerous independent groups. Where multiple datasets exist for a given parameter none fundamentally disagrees with the expected signal for a warming world. For the surface records to be wrong would require all these other analyses to be similarly wrong.


8. Annex A provides a detailed explanation of the differences between the analyses with respect to data sourcing and methodology.


Annex A - Independence of the analyses


Independence of methodology

9. There is substantial independence between the methods used to derive the principal estimates of global land-surface air temperature trends: CRUTEM3 (Jones and Moberg, 2003; Brohan et al., 2006); NOAA (Smith and Reynolds, 2005); NASA-GISS (Hansen et al., 2001). The differences are summarised below and relate to choices of:


- source data;

- quality control and homogenisation criteria;

- gridding;

- merging of land and ocean data;

- in-filling for data void regions;

- calculation of the global mean diagnostic.


10. This independence of methodology means that the estimates provide an indication of the true degree of uncertainty in the global-mean surface temperature evolution.








Summary of differences in methodology between analyses

Aspect addressed (entries are listed for HadCRUT3, NCDC and GISS)

Land Surface Air Temperature

Number of stations

c. 7300

Sources of station data

HadCRUT3: GHCN, READER, various regional studies, paper archives, US COOP network. Stations must have enough data to form a climatology (15 years in 1961-90) or have a WMO normal. Real-time updates primarily from CLIMAT messages.

NCDC: GHCN, which is made up of over 30 sources of data, most of which are not regularly updated. Regular updates are primarily from USHCN data in the U.S. and CLIMAT messages transmitted by WMO Members.

GISS: Unadjusted surface air temperatures from the GHCN, adjusted (except for urban warming) USHCN version 1 data and SCAR (Scientific Committee on Antarctic Research). Sites must have at least 20 years of data.

Quality control procedures

HadCRUT3: Manual inspection, including real-time quality control using GIS software; quality control described in literature for the various regional studies.

NCDC: A long series of automatic quality control tests based on both statistics and physics (e.g., outlier tests, identical values two months in a row, etc.).

GISS: Some unphysical-looking outliers and segments of station temperature series were eliminated after manual inspection.

Homogeneity adjustments

HadCRUT3: Visual comparison. Includes recourse to near-neighbour series. c. 20% of stations affected. Regional studies have the breakpoint identification and adjustment procedures described in the literature applied.

NCDC: Pairwise comparisons with neighbours to identify non-climatic step changes as well as trends, and adjustment of the data to remove those artificial biases.

GISS: If there are multiple records at a given location, these are combined into one record, adjusting according to the average difference during the period of overlap.

Urbanisation effects

HadCRUT3: Uncertainty model includes a one-tailed estimate (assumes a warming bias persists).

NCDC: Addressed by the homogeneity adjustments methodology.

GISS: Urban and peri-urban (i.e., other than rural) stations (defined by night-lighting in the USA and by available documentation elsewhere) are adjusted so that their long-term trend matches that of the mean of neighbouring rural stations. Urban stations without nearby rural stations are dropped.

Ocean data

Sources of data

HadCRUT3: Ships, buoys from ICOADS (1850-1997) and GTS (1998 on).

NCDC: Ships, buoys.

GISS: HadISST1: 1870-1981. Reynolds: 11/1981-present. Unlike the SST products employed by NCDC and HadCRUT, these are derived products that are spatially interpolated to be complete over the oceans.

Quality control

HadCRUT3: Gridbox climatology based removal of outliers. Check that consecutive ship positions and ship speeds are consistent. Buddy check using near neighbours. Rejection list of known bad observations.

NCDC: Gridbox climatology based removal of outliers.

GISS: HadISST1 and Reynolds are interpolated analyses so no GISS quality control is applied.

Homogeneity adjustments

HadCRUT3: Adjustments for transition from wooden to canvas buckets in the early 20th century. Corrections are ramped down from 1939-1941 as the number of non-bucket measurements in the database increased.

NCDC: Statistical adjustment for the transition between buckets and engine intakes, based on a relationship between SSTs and night-time air temperature and on global metadata for the timing of the transition.

GISS: HadISST1 and Reynolds are interpolated analyses so no GISS-specific homogeneity adjustment is applied.

Spatial interpolation, merging and calculating of global average

Merging procedures

HadCRUT3: Gridded land and ocean temperatures merged, weighted according to inverse error estimates on a gridbox-by-gridbox basis where both exist.

NCDC: Land and ocean grid boxes with data merged with a weighting based on the fraction of the grid box having land versus ocean.

GISS: Land air temperature overrides SST in grid boxes where both are available.

Accounting for data void regions

HadCRUT3: No infilling performed. The effects of incomplete sampling are accounted for in the uncertainty model.

NCDC: Empirical Orthogonal Teleconnection Functions used to interpolate land and ocean data separately, with limits on how far interpolation can be made. Areas of sea ice are set to missing.

GISS: A grid of 8000 grid boxes of equal area is used. Time series are changed to series of anomalies. For each grid box, the stations within that grid box and also any station within 1200 km of the centre of that box are combined using the reference station method.

Calculation of global average

HadCRUT3: Average of gridbox area-weighted Northern Hemisphere and Southern Hemisphere values. Avoids over-weighting the better sampled Northern Hemisphere.

NCDC: Area-weighted analysis based on 5x5 degree grid boxes. Recent tests to evaluate (NH+SH)/2 revealed that, for the coverage obtained using EOT functions, the results are nearly identical to (NH+SH)/2.

GISS: A grid of 8000 grid boxes of equal area is used. Anomalies are averaged over the areas 23.6-90N, 23.6N-23.6S and 23.6-90S, then these three averages are averaged with 3:4:3 weighting to represent their area.






HadCRUT3: The uncertainty model also takes into account incorrect or missing adjustments and temporal and measurement sampling errors.

NCDC: The entries above are based on GHCN version 3, which is scheduled to be released in the spring of 2010.

GISS: Information from Hansen et al. 1999, 2001; and from

Code to calculate GISTEMP is available from
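The 3:4:3 weighting listed in the summary table for the GISS global average can be checked against the areas of the three latitude bands. The sketch below is illustrative only: it assumes a spherical Earth, and the function names are hypothetical rather than taken from any published code.

```python
import math


def band_area_fraction(lat_south_deg, lat_north_deg):
    """Fraction of a sphere's surface lying between two latitudes (degrees).

    On a sphere the area between two latitudes is proportional to the
    difference of the sines of those latitudes.
    """
    south = math.sin(math.radians(lat_south_deg))
    north = math.sin(math.radians(lat_north_deg))
    return (north - south) / 2.0


def three_band_global_mean(north_band, tropical_band, south_band):
    """Combine three zonal-band anomaly averages using 3:4:3 weights."""
    return (3.0 * north_band + 4.0 * tropical_band + 3.0 * south_band) / 10.0


if __name__ == "__main__":
    # Area fractions of the three bands named in the table.
    print(band_area_fraction(23.6, 90.0))    # northern band
    print(band_area_fraction(-23.6, 23.6))   # tropical band
    print(band_area_fraction(-90.0, -23.6))  # southern band
```

The tropical band covers close to 40% of the sphere and each of the other two bands close to 30%, which is why the three band averages are combined in the ratio 3:4:3.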

Independence of basic land data and gridding

11. The CRUTEM3 analysis calculates anomalies for each station and therefore requires that there are data for a station (or nearby stations) during the period 1961-1990. Because some stations opened after this period, or closed before it, some short station records could not be included in the CRU analysis.
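The anomaly calculation described in paragraph 11 can be sketched as follows. This is a minimal illustration, not the CRU code: the function name and data layout are hypothetical, the series is assumed to hold values for a single calendar month, and the 15-year threshold is taken from the summary table above. The alternatives mentioned in the text (a WMO normal, or nearby stations) are not modelled.

```python
from statistics import mean

BASE_START, BASE_END = 1961, 1990
MIN_BASE_YEARS = 15  # from the summary table: 15 years in 1961-90


def station_anomalies(series):
    """Return anomalies relative to the station's own 1961-1990 mean.

    series: dict mapping year -> temperature (deg C) for one station and
    one calendar month (an assumed layout for this sketch).
    Returns None when the base period is too poorly sampled to form a
    climatology, which is why some short records cannot be used.
    """
    base = [t for year, t in series.items() if BASE_START <= year <= BASE_END]
    if len(base) < MIN_BASE_YEARS:
        return None
    normal = mean(base)
    return {year: t - normal for year, t in series.items()}
```

A station that opened in 1995, for example, has no values in the base period, so the function returns None and the record is left out.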


12. There are stations that are unique to each analysis, reflecting both differences in methodological approach and personal contacts with potential data providers. Even where the stations are the same, the raw data source, quality control procedures and adjustment protocols applied differ fundamentally between the analyses.


13. However, the set of stations used to update the datasets on a regular basis is more limited and is generally the same for each dataset. These updates come from "CLIMAT" messages exchanged between National Meteorological Services, each of which provides data only for stations within its authority. Periodic efforts are made to improve coverage substantially by incorporating data from other sources; these data are added outside real time as they become available.


14. The IPCC report also included a fourth global land temperature data set, that of Lugina et al. The Japan Meteorological Agency has produced a global average (land with ocean) temperature data set by blending, in a simple way, GHCN data with its COBE SST analysis (Ishii et al., 2005). Figure 1 in Annex B shows the global surface air temperature anomalies (relative to 1961-1990) from these four global analyses.


Other tests of data independence

15. The three main datasets use as much data as possible in order to obtain a more accurate estimate of global temperature. However, various studies have shown that estimated global and hemispheric trends change very little when based on limited subsets of stations (e.g. Parker et al. (2009) and Figure 2 in Annex B).


16. Jones et al. (1997) showed that reliable global trends might be obtained from fewer than 200 well-maintained stations. The subset of rural stations used by Peterson et al. (1999) showed very similar trends to those derived from the full GHCN dataset. Using a worldwide network of about 270 stations, Parker (2006) obtained very similar trends to those produced by Jones and Moberg (2003) from their full network. The reason for this robustness is the geographical coherence of temperature trends (Figures 3.9 and 3.10 of Trenberth et al., 2007).


Independence of methods used to minimise non-climatic effects

17. Temperature records from weather stations can be affected by many non-climatic factors, for example changes in instrumentation, in the time of day at which measurements are taken and in the location of the station. These changes must be accounted for before the data are interpreted as real changes in land surface air temperature. Metadata documenting such changes are significantly incomplete in many regions, and simple photographic evidence is insufficient, especially if it is used solely as a single snapshot. Therefore although valuable, efforts such as can say very little about the long-term homogeneity of the network. Advanced statistical techniques applied to sequences of differences between neighbouring stations, however, can detect and quantify major discontinuities and relative trends in the absence of metadata. Adjustments can then be applied to the faulty series. These techniques can substantially improve the data records. If full metadata were available, even better results could be obtained.
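The idea of using differences between neighbouring stations to find a discontinuity can be sketched as below. This is a deliberately simplified illustration and not the procedure used by any of the groups discussed here: it looks for a single step change against one neighbour, and the function names and the minimum segment length are assumptions of the sketch.

```python
from statistics import mean


def largest_step(candidate, neighbour, min_segment=3):
    """Locate the most likely single step change in a candidate series.

    The climate signal shared with the neighbour cancels in the difference
    series, so a persistent shift in the differences points to a
    non-climatic change at one of the two stations.
    Returns (index, shift): the first index after the step and the change
    in the mean difference across it. index is None if no shift is found.
    """
    diffs = [c - n for c, n in zip(candidate, neighbour)]
    best_index, best_shift = None, 0.0
    for k in range(min_segment, len(diffs) - min_segment + 1):
        shift = mean(diffs[k:]) - mean(diffs[:k])
        if abs(shift) > abs(best_shift):
            best_index, best_shift = k, shift
    return best_index, best_shift


def remove_step(candidate, index, shift):
    """Adjust the later segment so that it lines up with the earlier one."""
    return [v if i < index else v - shift for i, v in enumerate(candidate)]
```

In practice a candidate is compared with several neighbours, so that a shift can be attributed to the correct station, and statistical tests decide whether a shift is large enough to adjust.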


18. The homogeneity adjustments made by CRU (Jones et al., 1986a, b; Jones and Moberg, 2003) were made to only a limited number of stations (~20%) and the sum total of these adjustments has a near zero effect on large-scale temperature averages (Figure 4 of Brohan et al., 2006). In addition, the likely size of errors arising from uncorrected or inadequately corrected stations was estimated and included in the uncertainty calculation (Brohan et al. 2006).


19. GISS and NCDC analyses are both based on the Global Historical Climatology Network (GHCN). Some of the methods used to homogenise data in the GISS and NCDC analyses are the same, but some are different.


20. Adjustments for stations in the United States HCN (USHCN) owing to changes in the time of day of measurements, station moves, and instrumentation changes are the same in both analyses. The homogeneity adjustments applied in the NCDC analysis use a pairwise, neighbour-based approach and result in many more adjustments than are made by CRU.


21. In addition, GISS make an urbanisation adjustment. They use night light data (satellite photos of the Earth at night showing areas that are bright and urban, or dark and rural) to classify stations as urban or rural over the US. For the rest of the world they use metadata on population contained in the GHCN database. They then compare urban sites with nearby rural sites.
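The summary table states that urban stations are adjusted so that their long-term trend matches that of the mean of neighbouring rural stations. A minimal sketch of that idea follows. It assumes a single straight-line trend over equally spaced time steps, which is a simplification made for illustration and should not be read as the GISS procedure; all names are hypothetical.

```python
def linear_trend(values):
    """Least-squares slope of values against their index (per time step)."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    return sxy / sxx


def match_rural_trend(urban, rural_neighbours):
    """Remove the part of an urban trend that exceeds the rural mean trend.

    urban: list of anomalies for the urban station.
    rural_neighbours: list of equally long anomaly lists for rural stations.
    The correction pivots about the middle of the record, so the mean of
    the urban series is left unchanged.
    """
    rural_mean = [sum(vals) / len(vals) for vals in zip(*rural_neighbours)]
    excess = linear_trend(urban) - linear_trend(rural_mean)
    mid = (len(urban) - 1) / 2.0
    return [u - excess * (i - mid) for i, u in enumerate(urban)]
```

After the adjustment the urban series retains its own year-to-year variability but no longer contributes a trend different from that of its rural neighbours.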


22. Different homogenisation and infilling techniques can give different results for individual stations and this is reflected in larger estimated uncertainties in poorly sampled regions (e.g. Brohan et al., 2006). However, despite the different techniques used, agreement between the data sets is good at global and hemispheric scales even at less well observed times (see Figure 1 in Annex B).


Temperatures over the oceans

23. A truly global estimate requires sampling the 71% of the globe covered by oceans. The sea surface datasets used in each analysis similarly exhibit a degree of difference. There are at least six different analyses of sea-surface temperature (HadISST, HadSST2, Kaplan, COBE, ICOADS, ERSST3) that extend for more than 100 years and several additional data sets (e.g. OI v2, plus satellite estimates) that cover the satellite period. As with the land data, there is substantial overlap in the raw data, with most long analyses drawing in situ observations from the ICOADS database. There are differences in quality control, homogenisation and the data reconstruction methods used to fill gaps in the data. In the recent period, trends from in situ data are corroborated by independent estimates from satellite data. The differences between these approaches are being studied by the Global Climate Observing System's (GCOS) SST and Sea Ice (SI) Working Group, whose goal is to understand differences between SST analyses and reconstructions with the aim of producing better, long-term SST climate data records.


Independence in dealing with the data gaps

24. There are large areas of the Earth's surface that are not routinely observed. The analyses differ in the extent to which such gaps in the data are filled and in the way that they are filled.


25. HadCRUT takes the simplest approach. The available data are averaged onto a regular grid. No attempt is made to fill grid boxes where there are no data; instead, the empty boxes are treated as an additional source of uncertainty when area averages, such as the global average, are calculated.
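The approach in paragraph 25, together with the hemispheric averaging listed for HadCRUT in the summary table, can be sketched as follows. This is an illustration only: using the cosine of latitude as a stand-in for grid-box area is an assumption appropriate to a regular latitude-longitude grid, the uncertainty model is not represented, and the function names are hypothetical.

```python
import math


def hemispheric_mean(boxes):
    """Area-weighted mean anomaly over the grid boxes that contain data.

    boxes: list of (centre_latitude_deg, anomaly) pairs, with anomaly set
    to None for an empty box. Empty boxes are skipped, not filled.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, anomaly in boxes:
        if anomaly is None:
            continue  # no infilling: the box simply does not contribute
        weight = math.cos(math.radians(lat))
        weighted_sum += weight * anomaly
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None


def global_mean(boxes):
    """Average the two hemispheric means so that the better sampled
    Northern Hemisphere is not over-weighted."""
    north = hemispheric_mean([b for b in boxes if b[0] > 0])
    south = hemispheric_mean([b for b in boxes if b[0] < 0])
    if north is None or south is None:
        return None
    return (north + south) / 2.0
```

Because empty boxes are skipped, the result reflects only the sampled regions. If the unsampled regions warm faster than the sampled ones, an average of this kind will run cooler than the true global mean.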


26. The GISS land station data are interpolated over data-free regions (including over the oceans) to a maximum distance of 1200 km. This has a particularly large effect over the Arctic and Antarctic where there are few data points and temperature variability is large.
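The effect of a 1200 km interpolation radius can be illustrated with a simple distance-weighted average. The sketch below is not the GISS reference station method named in the summary table. The linear decrease of weight with distance, the Earth radius value and the function names are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius, assumed for this sketch
MAX_DISTANCE_KM = 1200.0   # maximum interpolation distance from the text


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, by the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def box_anomaly(box_lat, box_lon, stations):
    """Estimate an anomaly at a grid-box centre from stations within range.

    stations: list of (lat, lon, anomaly) tuples.
    Weights fall linearly from 1 at the box centre to 0 at 1200 km (an
    assumed weighting). Returns None if no station is within range.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, anomaly in stations:
        distance = great_circle_km(box_lat, box_lon, lat, lon)
        if distance >= MAX_DISTANCE_KM:
            continue
        weight = 1.0 - distance / MAX_DISTANCE_KM
        weighted_sum += weight * anomaly
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None
```

With a radius of this size, a single high-latitude station can supply values for a large surrounding area that would remain empty in an analysis without infilling.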


27. NCDC also uses interpolation to fill in some of the gaps in the data. Areas of sea ice are set to missing. Their method typically fills fewer gaps than the GISS analysis and the global average provided by their analysis generally lies somewhere between GISS and HadCRUT3.


28. The estimated trends over land are robust to the choice of analysis technique (Vose et al., 2005). This paper showed that the NCDC (Smith and Reynolds, 2005) and the HadCRU (Jones and Moberg, 2003) land air temperature analyses yield comparable trends, but that the GISS (Hansen et al., 2001) land-only analysis yields reduced trends in recent decades because its interpolation scheme gives greater emphasis to coastal and island stations. However, see below for a different impact on the blended (land + ocean) dataset in the most recent years.


29. There is growing evidence that the HadCRUT3 blended product under-estimates warming since 1998 because it has on average sampled regions that exhibit less warming than the true global mean over this most recent decade (Simmons et al., 2010). See also the "Evidence from reanalyses and other variables" section below. Over the past decade, temperatures at high northern latitudes have increased.


30. These regions, which are sparsely sampled, are under-represented in the CRUTEM3 analysis. Consequently temperatures in this analysis have run a little cooler in very recent years than either the GISS or NCDC analyses which interpolate over data voids in Siberia and Canada.


Compare and contrast to the troposphere

31. As at the surface, several independent groups have produced datasets of tropospheric temperatures from both satellites and radiosondes (weather balloons). As with the surface record, these groups use substantially overlapping raw data but different methods and assumptions to account for known and suspected non-climatic influences. Figure 4 in Annex B shows that the ensuing uncertainty in global tropospheric trends is substantially larger than that at the surface. The surface data may not be perfect, but our ability to diagnose the global-mean temperature is substantially better at the surface than in the troposphere. This is because the observing system at the surface, taken as a whole, has been much more stable than either the satellite or the weather balloon observing systems, which have seen multiple complex changes over the period of observations.


Evidence from reanalyses and other variables

32. Reanalyses consist of modern-day weather forecast model configurations run on historical observations. They take into account all observational evidence. Surface observations are used only indirectly to inform the soil moisture conditions in the most recent reanalyses and not at all in earlier products. The most recent reanalysis products offer substantial support for the reality of the surface record. Figure 5 in Annex B shows that, when CRUTEM and the reanalysis are similarly sampled, their time series overlap almost exactly. It also illustrates the impact of sampling on the most recent decade, which in all likelihood leads to HadCRUT under-estimating the true global mean.



Annex B



Figure 1: Global average land surface air temperature anomalies (relative to 1961-1990) from four global analyses: CRUTEM3 (black, with the grey area denoting its stated 95% confidence interval), NCDC (red), GISS (blue) and Lugina et al. (green, to 2005 only).




Figure 2: From Parker et al. (2009). The bottom panel shows the locations of stations in two independent samples of data labelled Sample 1 and Sample 2. The samples were chosen such that observations from each sample did not fall into neighbouring grid boxes. The upper panels show how global and hemispheric average temperatures from those two samples compare to one another and to the full station network (solid black line). The differences are consistent within the error ranges of the data.






Figure 3. Global average temperatures, combined land and ocean surface.




Figure 4. Temperature trends from 1979 to 2008 for the surface (green, collected from weather stations and ships / buoys), the "Mid-troposphere" satellite retrieval (blue, measuring from the surface into the lower stratosphere with peak weight at about 8 km, or 5 miles) and equivalent estimates from weather balloons (red).






Figure 5. Comparisons between land surface air temperatures in CRUTEM3 and ERA reanalyses with coincident sampling (left) or complete reanalysis sampling (right). From Simmons et al. (2010).









Brohan, P., J. J. Kennedy, I. Harris, S. F. B. Tett, and P. D. Jones (2006), Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850, J. Geophys. Res., 111, D12106, doi:10.1029/2005JD006548.


Hansen, J., R. Ruedy, M. Sato, M. Imhoff, W. Lawrence, D. Easterling, T. Peterson, and T. Karl (2001), A closer look at United States and global surface temperature change, J. Geophys. Res., 106(D20), 23947-23964, doi:10.1029/2001JD000354.


Ishii, M., A. Shouji, S. Sugimoto, and T. Matsumoto (2005), Objective analyses of sea-surface temperature and marine meteorological variables for the 20th century using ICOADS and the Kobe collection, Int. J. Climatol., 25(7), 865-879.


Jones, P. D., and A. Moberg (2003), Hemispheric and large-scale surface air temperature variations: An extensive revision and an update to 2001, J. Climate, 16, 206-223.


Jones, P. D., S. C. B. Raper, R. S. Bradley, H. F. Diaz, P. M. Kelly, and T. M. L. Wigley (1986a), Northern Hemisphere surface air temperature variations: 1851-1984, J. Clim. Appl. Meteorol., 25, 161-179.


Jones, P. D., S. C. B. Raper, and T. M. L. Wigley (1986b), Southern Hemisphere surface air temperature variations: 1851-1984, J. Clim. Appl. Meteorol., 25, 1213-1230.


Parker, D. E. (2006), A demonstration that large-scale warming is not urban, J. Climate, 19, 2882-2895.


Parker, D. E., P. D. Jones, T. C. Peterson, and J. J. Kennedy (2009), Comment on "Unresolved issues with the assessment of multi-decadal global land surface temperature trends" by Roger A. Pielke Sr. et al., J. Geophys. Res., 114, D05104, doi:10.1029/2008JD010450.


Peterson, T. C., K. P. Gallo, J. Lawrimore, A. Huang, and D. A. McKittrick (1999), Global rural temperature trends, Geophys. Res. Lett., 26(3), 329-332, doi:10.1029/1998GL900322.


Simmons, A. J., K. M. Willett, P. D. Jones, P. W. Thorne, and D. P. Dee (2010), Low-frequency variations in surface atmospheric humidity, temperature, and precipitation: Inferences from reanalyses and monthly gridded observational data sets, J. Geophys. Res., 115, D01110, doi:10.1029/2009JD012442.

Smith, T. M., and R. W. Reynolds (2005), A global merged land and sea surface temperature reconstruction based on historical observations (1880-1997), J. Climate, 18, 2021-2036.


Trenberth, K. E., P. D. Jones, P. Ambenje, R. Bojariu, D. Easterling, A. Klein Tank, D. Parker, F. Rahimzadeh, J. A. Renwick, M. Rusticucci, B. Soden, and P. Zhai (2007), Observations: Surface and Atmospheric Climate Change. Chapter 3 in Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change (Edited by S. Solomon, D. Qin, M. Manning, Z. Chen, M. Marquis, K. B. Averyt, M. Tignor, and H. L. Miller). Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 235-336.


Vose, R. S., D. Wuertz, T. C. Peterson, and P. D. Jones (2005), An intercomparison of trends in surface air temperature analyses at the global, hemispheric, and grid-box scale, Geophys. Res. Lett., 32, L18718, doi:10.1029/2005GL023502.