Monday, Apr 16, 2012

How are the statistics?

I'm not a statistician, but I have been hanging around with statisticians for several years now and I have picked up a certain amount along the way. I was therefore intrigued by the paper by Lohmann et al., which Luning et al. discussed at WUWT a couple of days ago. The Lohmann paper compares climate model output with proxy sea surface temperature reconstructions for the Holocene and concludes that the correlation between the two is poor.

I had a leaf through the paper and was struck by the fact that they seem to have calculated a simple R2 for the correlation. Can any of my statistically qualified readers tell me if this is right? I had thought that both series would be highly autocorrelated and that any correlation measure would therefore be inflated. Shouldn't they correct for autocorrelation?


Reader Comments (26)

Auto-regressive correlation in time series data appears to be the stumbling block for most 'climate scientists'. The last 10 years being the warmest on record would indeed be worthy of note were they not autocorrelated time series observations. What they basically do is claim more information than they should, because the usual independence assumption in typical statistical inference is invalid. Estimates would generally not be affected, but their precision would be biased, leading one to claim more accuracy than is real.

In general the standard error of an estimate is inversely related to root(n), where n is the number of independent observations. You might like to think of n autocorrelated observations as being 'worth' only as much as n/2 or n/3 independent ones. Hence the standard error (for autocorrelated data) should have something like root(n/2) or root(n/3) in it instead of the claimed root(n). The statistic on which inference is based is the ratio of the estimate to its standard error, and if you under-estimate that (divide the estimate by root(n) instead of something smaller) then, hey presto, the statistic is elevated and (false) significance is more likely to be claimed.

Apr 16, 2012 at 10:02 AM | Unregistered CommenterCamp David
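Camp David's point about inflated significance is easy to demonstrate by simulation. The sketch below (Python, chosen here purely for illustration; no code appears in the original thread) correlates pairs of completely independent AR(1) series and counts how often a naive 5% significance threshold is crossed: far more often than the nominal one time in twenty.

```python
import math
import random

random.seed(42)

def ar1(n, phi):
    """Generate an AR(1) series: x_t = phi * x_{t-1} + white noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + random.gauss(0.0, 1.0)
        out.append(x)
    return out

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Two INDEPENDENT but strongly autocorrelated series per trial:
# any "significant" correlation between them is spurious.
n, phi, trials = 100, 0.9, 500
crit = 1.96 / math.sqrt(n)  # rough 5% cut-off for r if the values were independent
hits = sum(abs(pearson_r(ar1(n, phi), ar1(n, phi))) > crit
           for _ in range(trials))
false_positive_rate = hits / trials
# Under genuine independence we'd expect about 0.05 here;
# strong autocorrelation inflates the rate severalfold.
print(false_positive_rate)
```

The 1.96/root(n) threshold is itself only an approximation for independent data, which is exactly the assumption the simulation violates, and that is the point.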

Positive autocorrelation is commonplace in meteorological observations - all it takes is for the proximate cause of an appreciable part of the variability to persist. For example, if the wind keeps blowing from the south in the UK, successive hourly temperatures are likely to be similar and display fairly steady rises and falls through the daily cycle as the sun goes up and down in the sky. One implication is that the series of values of temperature so obtained cannot be regarded as statistically independent.

In industrial process control, the presence of very strong autocorrelation could be helpful in so far as it meant that the current value, for example, provided a very good estimate of the next value of some process observable. In the absence of autocorrelation, the best guess for the next value is not the current value, but rather the overall mean, or projected mean if there is an established trend.

Another implication is that the observed variability in such autocorrelated data is larger than it would be without the autocorrelation. This was handled by inflating the width of process behaviour limits on time series plots - these limits being used to help detect surprisingly large departures from the process mean.

A third implication for formal testing purposes is that a sample of size n from an autocorrelated series is not to be given the same weight as a sample of size n of independent values. A reduced value n' can be computed as a kind of 'effective sample size' for use in the formal tests (where n' < n).

When it comes to correlation between two variables, the presence of autocorrelation is not necessarily an immediate concern. For example, in comparing model time series with observed time series, the interest may simply be in seeing how well they follow each other. In other words, the focus is just on the appearance - the visual effect if you like. Formal tests of R for statistical significance might be another matter, however.

I am very far from being a statistical theoretician so my next remark should be regarded as speculative. If I do a naive substitution of effective sample sizes in the formula for the correlation coefficient between two variables, both with autocorrelation, then I do get an R' < R. I have had a quick google and check on some textbooks which I have to hand, but I have not found any support for this!

Apr 16, 2012 at 10:09 AM | Registered CommenterJohn Shade

I'd worry about the assumption that the trends are linear ahead of worrying about the statistical tests.

Apr 16, 2012 at 10:12 AM | Unregistered CommenterHAS

The 'trend' here is the relationship of the value to its proxy. In fact they both appear to be proxies. It's not the relationship of either over time (that might indeed be non-linear). Non-linearity in the 'trend' here would arise if the value of one differed more or less from the other depending on where on the scale one was. If that's the case then, sure, the simple Pearson correlation will tend to be smaller if that's what they do (assume linearity)? They can of course calculate the R^2 around any curve, so yes, the linear form is an assumption, but they may not have made it?

Apr 16, 2012 at 10:24 AM | Unregistered CommenterCamp David

http://www.youtube.com/watch?v=GiPe1OiKQuk&feature=related

A clip to help a school kid doing their Climate Change homework.

Apr 16, 2012 at 10:32 AM | Unregistered CommenterJamspid

Camp David, not sure if your comment was directed at me, but they compare the linear trend in the proxies with the linear trends in the model output. As they say they assume the temperature trends are linear.

Apr 16, 2012 at 10:42 AM | Unregistered CommenterHAS

I recognise this is a naive question, but why should the temperature trends actually be linear?

Apr 16, 2012 at 10:47 AM | Unregistered CommenterArthur Dent

Simple answer to your simple question (last sentence).
Yes. Of course.

Apr 16, 2012 at 10:57 AM | Unregistered CommenterEvil Denier

Arthur Dent, they probably aren't (and it would have been easy for them to test for this), so before they even start comparing the models and the proxies they are using a statistic (the linear trend from each) that is quite likely not well behaved.

Apr 16, 2012 at 11:02 AM | Unregistered CommenterHAS

Oh. To expand (minimally).
Doesn't affect their conclusion. 'Poor' would get worse - at least for confidence intervals.

Apr 16, 2012 at 11:05 AM | Unregistered CommenterEvil Denier

It must be the season, Judith Curry's latest post looks at the quality of GCM software.

Latimer Alder is scathing:


A bunch of guys teach themselves Fortran. Write a few equations. Get really chuffed when the code actually executes without failure. Get even more chuffed when they discover that they’ve avoided a pitfall that another programmer has fallen into. Run the model a few times then write up a paper (sans any means of reproducibility or practical testability) which – to gain attention and hence citations – will reliably state that it’s all worse than anybody thought and that in 100 years the world will end.

The prediction is, at a 95% confidence level, that the debate will get heated in the next 100 comments.

Apr 16, 2012 at 11:20 AM | Registered CommenterLord Beaverbrook

To HAS.

Frankly the original paper means little to me; it is poorly written and the methods are unclear. Certainly for each trend over time it would be daft to start fitting straight lines. From what I can tell it's how correlated two things are to each other, not how the 'trends' over time correlate. For the latter there would have to be several estimates of whatever these trends are, and that would involve some fit statistic like a slope or something else. The only thing I can imagine (but cannot say, as I don't follow it all) is that they take the two measures, each at the same series of times, and correlate them (as you say, in a linear fashion). That in itself would be a valid analysis, except that they seem to assume (they don't say they corrected for the correlated nature of the data) that the pairs of values (a pair for each time) are independent across time, which they are not. What I think, though I may be wrong, is that we have three columns of data: one the timepoint identifier, the other two the values as measured by different methods, the level of agreement between which is sought. So the R is simply the Pearson correlation of the 2nd and 3rd columns and is not linked to the timepoint id, apart from the fact that it determines the pairing.

I guess this is what comes of not having a statistician look at your stats before you put them in print. If one cannot comprehend what they did then it's pretty worthless research imho.

Apr 16, 2012 at 11:47 AM | Unregistered CommenterCamp David

1. Comparing a model outcome to observations (even proxy observations) can be done with R2 (the squared linear correlation coefficient between the two) even if the data are not linear OVER TIME. You can fit a non-linear model to one set of observations, and with a perfect model you would predict exactly the observed value for each data point; in other words, you would get a perfectly linear relationship in which Yt* (the predicted value for data point t) equals Yt (the observed value for data point t). R2 tells you the amount of variance in the data that is explained by the model, whatever kind of model.

2. The problem is not about the linearity or non linearity of the model, but about the autocorrelation of data. With correlated data your effective sample is smaller than the number of observations, and thus ignoring it would cause R2 to be overestimated.

3. Several statistical techniques are customarily used to take correlated data and autocorrelation into account. One class of approaches is grouped under the heading "multilevel models with random effects". For time series, econometricians have developed a different set of techniques as well, especially under the heading of "cointegration". Some references:
Tom A.B. Snijders and Roel J. Bosker, Multilevel Analysis, 2nd edition. Sage, 2011.
B. Bhaskara Rao (editor), Recent Developments in Nonlinear Cointegration with Applications to Macroeconomics and Finance. 2002.
B. Bhaskara Rao (editor), Cointegration for the Applied Economist, 2nd edition. Palgrave Macmillan, 2007.

Apr 16, 2012 at 4:25 PM | Unregistered CommenterHector M.
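Hector M.'s first point - that R2 between model predictions and observations is meaningful even when the relationship with time is non-linear - can be sketched as follows. This is an illustrative Python example, not anything from the Lohmann paper: the "observed" and "predicted" series here are invented.

```python
import math

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and model-predicted values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sop * sop / (soo * spp)

# A strongly non-linear "truth" over time...
t = [i / 10 for i in range(50)]
observed = [math.exp(x) for x in t]
# ...and a model that tracks it despite the non-linearity
# (here just the truth with a small constant bias):
predicted = [math.exp(x) + 0.1 for x in t]
print(r_squared(observed, predicted))  # essentially 1: variance fully explained
```

The point is that R2 is computed between Yt and Yt*, not between Y and time, so the non-linearity of the series over time does not, by itself, invalidate it.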

The Quenouille adjustment is relevant. Perform a 'find' in:

http://climateaudit.org/2009/02/06/steigs-corrections/
and/or
http://climateaudit.org/2009/02/26/steig-2009s-non-correction-for-serial-correlation/

Apr 16, 2012 at 4:54 PM | Unregistered Commenterigsy

Coming back here after a few hours, I re-read my reminiscing about my days in industrial statistics above, and wonder whether I could be a little briefer. But I followed igsy's lead, and found this:

'However, in another recent paper, Santer et al. (2008) point out that "In the case of most atmospheric temperature series, the regression residuals … are not statistically independent…. This persistence reduces the number of statistically independent time samples." Such a reduction of the effective sample size can cause Ordinary Least Squares (OLS) standard errors and confidence intervals to be too small, and the significance of coefficients to be overstated.'

Which clarifies the implications of the effective sample size n' being smaller than the actual sample size n. Thus if one does a formal test of the statistical significance of an observed R, it is likely to find a lower p-value than the data really deserves. Or if you like, the confidence intervals will be narrower than they would be if some allowance were made for autocorrelation.

I think we're getting there!

Apr 16, 2012 at 6:06 PM | Registered CommenterJohn Shade

And then I re-read the Bish's original post even further above. I don't think you have found anything of concern there.

'The Lohmann paper is a comparison of climate model output with proxy sea surface temperature reconstructions for the Holocene and concludes that the correlation between the two is poor.'

When little correlation has been found, and probably found to be not statistically significant, there is little to be gained by doing fancier computations that can only reinforce that result and serve to clutter up their paper without good cause.

Apr 16, 2012 at 6:27 PM | Registered CommenterJohn Shade

I am surprised commenters have not mentioned the Durbin-Watson test for autocorrelation, which is a standard statistic in the output from regressions using SPSS. In general, a DW well below 2 indicates positive serial correlation and hence a risk of spurious regression. The standard solution is to regress changes in temperature against changes in [CO2] and other variables (to taste). I find that then the DW is OK, but the R2 tends to vanishing point, and p blows out. Result: no statistical significance can be found for changes in [CO2] having any effect on changes in GMT.

That is as it should be, because in the physical experiments of Tyndall (1861), it is nitrogen and oxygen, the components of around 95-99% of the atmosphere, which do not absorb and radiate heat and should therefore be deemed the true GHGs, while CO2 and atmospheric water vapour [H2O] first absorb and then radiate the earth's heat, and therefore do not exert any greenhouse effect. Our blankets are the oxygen and nitrogen, while [CO2] and [H2O] are our exhaust vents through the infrared, at 14-17 μm and 17-95 μm respectively (see Hoyle 1981 and, with Wickramasinghe, 1999). Without them we would fry. However the [H2O] is much more variable annually, unlike the [CO2], and that is why it does have a statistically significant relationship with changes in GMT - and that is why it is expunged from AR4 WG1.

Apr 17, 2012 at 12:18 AM | Unregistered CommenterTim Curtin
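The Durbin-Watson statistic Tim Curtin mentions is simple to compute from regression residuals; for lag-one serial correlation r1 it behaves roughly as DW ≈ 2(1 − r1). A minimal Python sketch (illustrative only, with invented residual series):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic on regression residuals.
    Roughly 2 for uncorrelated residuals, well below 2 for positive
    lag-1 autocorrelation, well above 2 for negative."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Residuals that drift smoothly (strong positive autocorrelation):
drifting = [0.1 * t for t in range(50)]
print(durbin_watson(drifting))     # far below 2

# Residuals that flip sign every step (strong negative autocorrelation):
alternating = [1.0, -1.0] * 25
print(durbin_watson(alternating))  # close to 4
```

In practice the statistic is computed on the residuals of the fitted regression, not on the raw series, which is why it is reported alongside the regression output.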

I was probably a bit cryptic in the comments above.

Well before starting any correlation analysis the researchers estimate trends in the temperature proxies at the relevant locations (see figs 1 & 2). These trends then get used in a subsequent analysis that compares them with the model runs (see fig. 3). The trends at the various locations are plotted against the model results for that location (fig. 4). From that point on the paper tortures the data well beyond what should probably be regarded as acceptable.

Anyhow going right back to the beginning. What if the temperature proxies and model outputs are not linear with time (and I suspect we can guarantee this)? The linear trends that are used throughout the balance of the analysis become difficult to interpret in any physical sense. Exactly what performance of the models are we trying to validate - surely not their capacity to replicate a process that the raw data tells us isn't happening (as it were).

I'd say the initial problem with this paper isn't statistical, it is experimental design. Call me old fashioned but starting out with a clear statement of the model being tested is to be preferred over a randomish walk through the garden picking up stones and losing track of how certain you are about where you are and what you have found.

Anyway it then becomes a statistical problem very quickly, because they failed to do simple testing of their hypotheses about the trend being well behaved as a statistic. I should add that if one is postulating a linear trend then sticking with the original data is pretty straightforward, and the subsequent statistical analysis tells you more about what is going on.

There are some above who I think find comfort in the fact that the second phase of analysis (where they get to the R2 Bishop quotes) ain't a time series. However the temperature trends at different locations are highly likely to be spatially auto-correlated with exactly the same problems. (Again no testing done).

Apr 17, 2012 at 3:57 AM | Unregistered CommenterHAS

Reading the above comments on the statistics I am reminded of the quote from Keith Kloor on 6 April: “Almost everyone that dismisses climate change as a problem does it for ideological or political reasons, not for scientific reasons,” he said. “We scientists need to recognize that.” He obviously has not visited this blog.

Serial correlation in data reduces the effective sample size. An early WMO technical note (No. 79, 1966) on climate change gives a useful rule for calculating effective sample size. If N is the sample size and r1 is the lag-one serial correlation then:

Effective sample size = N (1-r1)/(1+r1)

For annual temperature data, r1 is typically greater than 0.9. This means that the effective sample size of a 100-year record is only around 5 years or less.

Apr 17, 2012 at 10:04 AM | Unregistered CommenterRon
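Ron's WMO rule is a one-liner, and a quick Python sketch (again illustrative; the function name is mine, not the WMO's) reproduces his 100-year example:

```python
def effective_sample_size(n, r1):
    """WMO Technical Note 79 rule: n_eff = n * (1 - r1) / (1 + r1),
    appropriate for an AR(1)-like series with lag-1 serial correlation r1."""
    return n * (1 - r1) / (1 + r1)

# 100 annual values with r1 = 0.9 carry the information of only about 5:
print(effective_sample_size(100, 0.9))  # about 5.26
# With no serial correlation, nothing is lost:
print(effective_sample_size(100, 0.0))  # 100.0
```

As Douglas J. Keenan notes further down the thread, the rule assumes an AR(1) process, so it should be treated as a first approximation rather than a general correction.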

Indeed Ron. Take the tide level at 5 min intervals around high tide and one must surely conclude the end is nigh. Bit like the 2000s being the hottest on record. That, and corruption of the temperature record in the first place. Any pseudo-statistician would be rightly sceptical about any estimation based on a minority of the available data, the majority not being used (censored). The y-variable (temperature - the one the claim is based upon) is dodgy (in technical terms), so how on earth can we model it? We will simply get predictors of the censoring mechanism (man's corruption of the temperature record). The claim, the belief, simply fails at the first hurdle. If you don't know what you're modelling, how are you going to model it, and what does the model mean?

Bona Fide Statisticians would laugh (are laughing) at this farce, that is if it were not so tragic.

Apr 17, 2012 at 10:52 AM | Unregistered CommenterCamp David

'for annual temperature data, r1 is typically greater than 0.9' Ron at 10:04AM

That seems on the high side to me. Can you provide some further references on this?

I just checked it for the Central England Temperatures (CET) annual means (353 years) and got less than 0.4, and for the latest 30 years of the record on their own I got 0.31.

Many thanks for that WMO link, by the way. There is something reassuring about materials that predate the preoccupation of some climate scientists with sounding the tocsin, which I suppose began in the 1970s, took off in the 1980s, and became institutionalised in the IPCC in the 1990s. I note also that Hubert Lamb was an editor or co-author of that note, and that is also reassuring.

[CET data from here: http://www.metoffice.gov.uk/hadobs/hadcet/data/download.html]

Apr 17, 2012 at 11:58 AM | Registered CommenterJohn Shade
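The lag-1 serial correlation John Shade computed for the CET annual means is straightforward to reproduce for any series. A Python sketch (the CET data itself is not included here, so two synthetic series stand in for it):

```python
import math

def lag1_autocorrelation(xs):
    """Lag-1 serial correlation: the series correlated with itself
    shifted by one time step, deviations taken from the overall mean."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[t] - m) * (xs[t + 1] - m) for t in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

# A slowly varying series has r1 near 1...
smooth = [math.sin(t / 20) for t in range(200)]
print(lag1_autocorrelation(smooth))
# ...while a series that flips sign every step has r1 near -1.
print(lag1_autocorrelation([1.0, -1.0] * 50))
```

Feeding the downloaded CET annual means into `lag1_autocorrelation` would give a direct check on the 0.4 figure quoted above.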

Is this 0.9 the AR correlation, so that the correlation of points k time intervals away is rho^k?

Apr 17, 2012 at 12:02 PM | Unregistered CommenterCamp David
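On Camp David's question: for a pure AR(1) process with coefficient rho, the lag-k autocorrelation is indeed rho^k. A quick simulation (Python, illustrative only) checks this empirically:

```python
import math
import random

random.seed(7)

def lag_k_autocorr(xs, k):
    """Sample autocorrelation of a series at lag k."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[t] - m) * (xs[t + k] - m) for t in range(n - k))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

# Simulate a long AR(1) series with rho = 0.9.
rho, n = 0.9, 200_000
x, series = 0.0, []
for _ in range(n):
    x = rho * x + random.gauss(0.0, 1.0)
    series.append(x)

# The lag-k sample autocorrelation should sit close to rho**k.
for k in (1, 2, 3):
    print(k, lag_k_autocorr(series, k))  # near 0.9, 0.81, 0.729
```

Whether the global temperature series is actually AR(1) is a separate question, addressed in the next comment.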

I agree with what John Shade said on Apr 16, 2012 at 6:27 PM.

Note that the formula given by Ron, and alluded to by others, assumes an AR(1) process—and the global temperature series is not an AR(1) process:
http://www.informath.org/media/a42.htm

Apr 17, 2012 at 1:18 PM | Unregistered CommenterDouglas J. Keenan

I stand corrected. The r1 values for the three main global land & ocean data sets are: HadCRUT3 - 0.87, GHCN - 0.90, GISS - 0.89. I should have said "typically around 0.9".

For those who want to follow this up the 1983 edition (WMOno100.pdf) and the 2011 edition (WMO_100_en.pdf) are both available on the internet. As far as I can tell the 1983 edition makes no reference to anthropogenic climate change.

Apr 17, 2012 at 9:54 PM | Unregistered CommenterRon

There are too many statisticians on this thread not thinking about what their statistical tests exist for.

Testing the correlation isn't the issue. The point is if you can't reject an AR model for the temp time series or the locational trends then the whole basis of the study collapses in a screaming heap. The study is predicated on a simple linear model in both cases.

End of story.

Apr 17, 2012 at 10:55 PM | Unregistered CommenterHAS

As far as I can tell the authors are claiming something about a straight-line fit to two series of data (one X and one Y); that they do it twice (with the same X, maybe) is immaterial. Take each individually and the same problems apply. In ordinary cases the (x,y) pairs are independent (clearly not here) and only one variable is subject to error (the y). If you use methodology that assumes these then God knows what you get, but in the absence of God I'll tell you what I think. You will get an underestimate of the slope (because you ignored the error in the x variable) and your estimate of the variance (of any statistic) will be wrong. Good luck with that one.

Apr 19, 2012 at 11:34 AM | Unregistered CommenterCamp David
