[Update 31 October 2014]
The summaries of the Climate Dialogue discussion on the (missing) tropical hot spot are now online (see links below). We have made two versions: an extended and a shorter version.
Both versions can be downloaded as pdf documents:
Summary of the climate dialogue on the (missing) tropical hot spot
Extended summary of the climate dialogue on the (missing) tropical hot spot
The (missing) tropical hot spot is one of the long-standing controversies in climate science. Climate models show amplified warming high in the tropical troposphere due to greenhouse forcing. However, data from satellites and weather balloons don’t show much amplification. What to make of this? Have the models been ‘falsified’, as critics say, or are the errors in the data so large that we cannot conclude much at all? And does it matter if there is no hot spot?
We are really glad that three of the main players in this controversy have accepted our invitation to participate: Steven Sherwood of the University of New South Wales in Sydney, Carl Mears of Remote Sensing Systems and John Christy of the University of Alabama in Huntsville.
Climate Dialogue editorial staff
Rob van Dorland, KNMI
Marcel Crok, science writer
Our Sun went into an inactive cycle in 1998 and each solar cycle since has been diminishing toward a Dalton Minimum. Due to a very active sun in the latter half of the 20th century our oceans have stored this thermal rise. The PDO shifted to cold several years ago and the AMO should shift to cold soon. When solar cycle 25 arrives, expect a major cool-down, ocean levels falling, crop failures and all the bad outcomes of cold weather. With the weaker sun and its solar wind, expect more cosmic-ray interaction forming clouds with additional cooling. Hopefully we won’t have a serious volcanic eruption in these coming days.
For those looking for a clean power source, I will be on the Mother Love Show Wednesday the 17th at 2PM PST time zone http://www.LATALKRADIO.COM How a forgotten energy source Thorium LFTR will change the world. http://www.energyfromthorium.com
Sherwood, Mears and Christy will write a first response to the two other blog posts. These reactions will be published in a few days.
Meanwhile feel free to comment in the public comments section. Be aware that comments are moderated in advance by our moderator at KNMI. This will be done during Dutch daytime hours.
I suspect one of the reasons for the difference between the models (the tropical hotspot) and observations (no hotspot) may result from how poorly climate models simulate sea surface temperatures, primarily in the Pacific.
The following is a model-data comparison of the Pacific sea surface temperature anomaly trends for the past 31 years on a zonal-mean basis. The data is the Reynolds OI.v2 SST, and the models are the multi-model ensemble mean of the CMIP5-archived models—simulation of TOS (Historic/RCP6.0).
The models show a relatively high warming rate in the tropics, but the data show little to no warming. In fact, over this period, the equatorial Pacific has cooled.
And I also suspect the differences between the modeled and observed sea surface temperature trends in the Pacific are caused by the failure of climate models to properly simulate ENSO processes.
The above graph is from this post:
If the models can’t simulate ENSO or Pacific sea surface temperature trends properly, one wonders how they can ever hope to project regional variations in temperature and precipitation.
I’m wondering who even thought this was a good or important point for discussion. The intro text is itself misleading in a number of ways – on the facts and on the history, some of which I believe it has wrong (or at least is overly petty or without appropriate error bars). As far as I can tell from my study of the history of this, the first person to highlight tropical lapse rate changes with the name “hot spot” and claim it was a “fingerprint” of greenhouse warming was Christopher Monckton in an August 2007 article here:
and as can be read there (compared with the rational discussion above), he clearly was very confused by Figure 9.1 in IPCC AR4 WG1. It’s not a “spot”, for one thing – we’re talking about the entire equatorial band around the earth. And it’s hardly “hot” – the troposphere is still cooler than the surface; the issue is the ratio of a small relative temperature change – making it hard to measure (something John Christy seems to refuse to acknowledge in his comments, which refer only to problems in models, not in observations). And it’s very definitely and absolutely and certainly not a “fingerprint” of greenhouse warming – the same pattern is there, as discussed clearly by Mears above, for every source of surface warming.
Furthermore, the ratio of tropical troposphere to surface changes *is* greater than one if you look over relatively short periods of time – as Mears also discussed here, as can be read clearly from Santer’s 2008 paper for instance. So the expected amplification from theory (“models” hardly does the reasoning here justice) seems to be very visible on short time scales. The discrepancy is *ONLY* with regard to the ratio of long-term temperature changes, over periods of greater than a decade.
For example, look at monthly temperature anomalies between a relative low in January 1989 and relative high in February 1998, a large temperature change over less than a decade. From some numbers I looked at a few years ago (apologies if corrections have changed this significantly) we have:
January 1989 to February 1998 change:
Hadley tropical surface temp: 0.970 C (-0.172 to 0.798)
RSS tropical TLT: 1.836 C (-0.522 to 1.314)
RSS tropical TMT: 1.806 C (-0.466 to 1.340)
UAH tropical T2LT: 1.83 C (-0.52 to 1.31)
UAH tropical T2: 1.76 C (-0.48 to 1.28)
so dividing by the 0.97 C surface temperature difference gives an amplification of between 1.81 and 1.89. What explains this clear case of amplification over a little under a decade, but the apparent lack of amplification in the longer-term trends? If there’s no reasonable theoretical explanation for such a difference, the difficulty of observational calibration over such long time periods strongly suggests it’s not the theory that’s wrong.
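The arithmetic can be checked with a short script; the numbers are copied directly from the list above and the dataset labels follow the comment (this is just a sanity check, not new data):

```python
# Temperature changes (C) between January 1989 and February 1998 in the
# tropics, copied from the figures quoted above.
surface = 0.970  # Hadley tropical surface temperature change

tropospheric = {
    "RSS TLT": 1.836,
    "RSS TMT": 1.806,
    "UAH T2LT": 1.83,
    "UAH T2": 1.76,
}

# Amplification = tropospheric change / surface change
ratios = {name: change / surface for name, change in tropospheric.items()}
for name, ratio in ratios.items():
    print(f"{name}: amplification = {ratio:.2f}")

# Reproduces the 1.81 to 1.89 range quoted above.
print(f"range: {min(ratios.values()):.2f} to {max(ratios.values()):.2f}")
```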
So we have a case here of tricky and difficult observations being promoted in a wildly incorrect fashion by the flamboyant Mr. Monckton – and then echoed in venues such as this one – for what reason? The burden of proof here is with the satellite observations, not with basic physical theory. As has been pointed out earlier as well – surface temperature changes are well understood. If the tropical troposphere is warming less than expected, the logic points to higher climate sensitivity, not lower. So this whole mess has only downsides for those trying to avoid action.
First, thanks to all participants for their comments. A number of interesting issues were raised, not just with respect to the physical science of the “tropical hotspot,” but also philosophical issues relating to how we place uncertainties in an appropriate context for model (or observational data) evaluation, and how these things then get translated into what is shared with policymakers at varying levels of confidence.
I was very happy to see an extensive discussion by Steve Sherwood and Carl Mears on the very large uncertainties in the observational datasets, which right now do not provide a robust direct comparison when evaluating whether the tropical troposphere has stayed close to a moist adiabat. Other “proxy” measurements such as those developed from the thermal wind equation (e.g., Allen and Sherwood, 2008) or those looking at the structure of deep convection changes in the tropics (e.g., Johnson and Xie, 2010) are also a good supplement to the topic, because they are independent from the satellite or radiosonde temperature data, and do not suggest a fundamental data-theory-model mismatch. I was also happy to see a discussion by Steve Sherwood on various implications of a real data-model mismatch should it exist. In the next paragraph, I will outline some points where I disagree with Dr. Sherwood on this. Unfortunately, John Christy’s post read like a defense lawyer’s argument on why models stink and why everything is too complex, with only fairly limited substance on the actual issue of the tropical hotspot (and with only limited reference to a large body of literature on observational uncertainty). Nonetheless, several of his points require elaboration.
Steve Sherwood correctly concludes that there is no obvious connection between a tropical hotspot and climate sensitivity. In fact, because the greenhouse effect depends on the temperature difference between the surface and layers aloft, lack of upper-level amplification could actually mean a slightly higher climate sensitivity, since the lack of enhanced infrared emission aloft (with no hotspot) would be compensated for by higher temperatures lower down to restore planetary energy balance. This would be a small effect though, and somewhat counteracted by a weaker water vapor feedback.
I do think Steve oversells his point by saying “nil” however. Large departures from a moist adiabat would signal a fundamental destabilization of the tropical troposphere and have some influence on basically any tropical process involving deep convection. Personally, I tend toward the Sherwood-Mears argument that this is a big problem with observations and probably less of a “real” issue, but to the extent the issue is real, wide communities (e.g., those in hurricane projections) would need to take it seriously. This doesn’t translate trivially into climate change projections or into climate sensitivity.
Much of John Christy’s post involved all sorts of topics not directly related to the problem of the tropical hotspot. As others have pointed out, for example, CO2 itself has little to do with the moist adiabat structure. The tropical hotspot would exist even if the Sun were the root cause of global warming (the stratospheric structure is a different story). Disappointingly, John Christy’s post does not give this impression. Moreover, the discussions about aerosol-cloud interactions, or how much the deep oceans are responsible for heat uptake and the decadal-scale changes in global temperature, are rather important but (IMO) distracting issues from the topic at hand. I think that Carl Mears and Steve Sherwood did a much better job at discussing multiple issues with both the data and models.
It is important to note that no climate modeler regards their model as “the truth” but as testbeds of varying levels of complexity and usefulness, depending on the question being raised. Model evaluation should not be done on the expectation of perfection but on the skill at simulating various features of climate or climate change (e.g., there are countless questions one could raise for Mt. Pinatubo, the LGM, the mid-Holocene, etc). The tropical hotspot is just one of thousands of topics deserving of attention. Other topics like Arctic sea ice, snowpack in Colorado, precipitation changes in the subtropics, etc all are governed by different physical phenomena and need to be evaluated on their own. Sometimes, mismatches between models and observations are expected to arise for a number of good reasons- one of which simply being that the observed temperature record is one single realization of many possible realizations that could have emerged in the last few decades given internal climate variability. Another issue is observational uncertainty, or mis-specified forcings in the model. It is non-trivial to establish a real mis-match, but if one exists, finding why that exists is how interesting science gets done.
After all of this has settled down, those aspects of climate (like, e.g., the water vapor feedback, or summertime cooling following a volcanic eruption) that are robust to multiple datasets/methods/groups, are emerging in observations, have a sound theoretical basis, and are borne out paleoclimatically will be brought forth to policymakers or other groups with strong confidence. Other things that are not yet settled (like what dataset is best for evaluating the tropical hotspot, or whether there is a link between tornadoes and climate change) which do not meet these criteria are not brought forward, or are done so with less confidence. Along the way, the implications for the “bigger picture” (e.g., the attribution of global warming to human activity, or sensitivity) should be kept in mind.
Finally, it is bizarre to me that it is recommended that models should not be used to inform policy makers because of uncertainty (which, by extension, would also have to apply to the radiosondes and satellites in this case). Those seeking information in policy, agriculture, military, insurance, etc. will need to be told about uncertainty in an appropriate way, and those groups I’m sure are used to dealing with uncertainty. Unfortunately, we do not yet have UAH satellite observations of the future, so we need to rely on models to inform outcomes. From this it is only a short step to the next point: uncertainty cuts both ways, and using it to suggest we cannot inform policy on a number of topics is, in my view, not recommended.
A question for John & Carl, as the Satellite Temperature experts.
Could the Hot Spot be hiding in plain sight in the existing data? Follow my chain of thought and see what you think.
The Upper Troposphere channel (TTS, MSU 3, AMSU 7) is showing essentially no warming or cooling. But this is the channel we would expect to see the hot spot appearing in.
TTS has a weighting function in which nearly half the signal (perhaps 40%) originates in the lower stratosphere and the rest in the upper troposphere.
Channel TLS (MSU 4, AMSU 9) is associated with the lower stratosphere. Its weighting function is strongly located in the lower and middle stratosphere, with only a very small percentage (less than 10%) of its signal originating from the upper troposphere. And this channel is showing substantial cooling (more than 0.3 DegC/decade, which is stronger than the warming trend in the lower troposphere). Also, the less commonly cited higher-altitude channels for the rest of the stratosphere are all showing the stratosphere cooling as well.
So, if the stratosphere is cooling strongly then nearly half the signal for the TTS channel, the one that is relevant when looking for the hot spot, is coming from a region of strong cooling. If the overall signal for TTS is showing no warming or cooling, doesn’t this suggest that the part of the TTS signal that originates in the upper troposphere must actually be warming, in order to balance the stratospheric component to give a net flat-line trend?
Isn’t the hot-spot visible in the data if we understand what the data actually means?
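As a sanity check on this balance argument, here is a toy two-layer calculation. The 40% stratospheric weight and the -0.3 DegC/decade stratospheric cooling are the rough figures quoted above; the 60/40 split and a perfectly flat TTS trend are simplifying assumptions, not measured values:

```python
# Toy two-layer decomposition of the TTS channel trend.
# Assumptions (illustrative only): 40% of the TTS signal comes from the
# lower stratosphere, 60% from the upper troposphere, the observed net
# TTS trend is flat, and the lower stratosphere cools at 0.3 DegC/decade.
w_strat, w_trop = 0.4, 0.6
tts_trend = 0.0        # DegC/decade, net TTS channel trend
strat_trend = -0.3     # DegC/decade, lower-stratospheric cooling

# tts_trend = w_strat * strat_trend + w_trop * trop_trend, so:
trop_trend = (tts_trend - w_strat * strat_trend) / w_trop
print(f"implied upper-tropospheric trend: {trop_trend:+.2f} DegC/decade")
```

Under these assumed numbers the upper troposphere would have to warm at +0.20 DegC/decade to keep the net channel flat, which is the point of the question above; the real weighting functions are of course smoother than a two-layer split.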
Could an approach similar to that used to create the TLT synthetic channel, using a differencing algorithm applied to off-nadir samples taken from MSU 3/AMSU 7, be used to synthesize a true upper-troposphere measurement? Alternatively, could an approach similar to that of Fu & Johanson be used to extract such a signal?
That might clear up this question once and for all.
Also a question for John Christy.
In your first graph you have taken the average of the data for the UAH and RSS satellite products for the mid-troposphere channel (TMT). But there is a third team producing such a product from the same channel, the STAR/NESDIS team. Why haven’t you included their analysis? UAH is showing a trend for TMT of 0.04 DegC/decade and RSS of 0.078, giving an average of around 0.06.
However, STAR/NESDIS are reporting a TMT trend of around 0.124 DegC/decade. So an average of all three products would be more like 0.08, a third higher. Also, wouldn’t it be more meaningful to show the range of results produced by all three products? A trend of 0.124 from STAR/NESDIS would match very well with the climate model results, for example. Although all three teams are working hard to try to find the correct result, the range of values obtained still leaves open some significant questions about what is actually happening up there. Obviously all three teams can’t be right, and showing the range of results seems far more meaningful.
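The averages quoted here are easy to verify; the trend values are the ones given in the comment (DegC/decade):

```python
# TMT trends (DegC/decade) from the three groups, as quoted above.
trends = {"UAH": 0.040, "RSS": 0.078, "STAR/NESDIS": 0.124}

two_group_mean = (trends["UAH"] + trends["RSS"]) / 2
three_group_mean = sum(trends.values()) / 3

print(f"UAH+RSS mean:   {two_group_mean:.3f}")    # the ~0.06 quoted above
print(f"all three mean: {three_group_mean:.3f}")  # the ~0.08 quoted above
print(f"spread: {min(trends.values()):.3f} to {max(trends.values()):.3f}")
```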
Also, in using the radiosonde data, what consideration have you given to the well recognized issues with radiative heating effects and the way in which they distort the trends reported by the radiosonde products. Many people have expressed reservations about how much credence can be given to the radiosonde data, particularly at higher altitudes.
Also, the sideways step in the radiosonde data between ground level and 850 hpa looks really suspicious, almost unphysical. If there is an issue with the radiosonde data, with perhaps a bias introduced there, maybe the entire curve needs to be shifted to the right. In which case the radiosondes aren’t as far out of step with the models as they appear.
Also, why haven’t you included the satellite data from the three teams here where applicable? Although the sat’ data covers broader altitude ranges it would still provide a useful point of comparison. Why omit it?
It is worth noting that the statistical test used in Douglass et al. (2008) is obviously inappropriate, as a perfect climate model is almost guaranteed to fail it! This is because the uncertainty is measured by the standard error of the mean, rather than the standard deviation, which falls to zero as the number of models in the ensemble goes to infinity. If we could visit parallel universes, we could construct a perfect climate model by observing the climate on those parallel Earths with identical forcings and climate physics, but which differed only in variations in initial conditions. We could perfectly characterise the remaining uncertainty by using an infinite ensemble of these parallel Earths (showing the range of outcomes that are consistent with the forcings). Clearly, as the actual Earth is statistically interchangeable with any of the parallel Earths, there is no reason to expect the climate on the actual Earth to be any closer to the ensemble mean than any randomly selected parallel Earth. However, as the Douglass et al test requires the observations to lie within +/- 2 standard errors of the mean, the perfect ensemble will fail the test unless the observations exactly match the ensemble mean, as the standard error is zero (because it is an infinite ensemble). Had we used +/- twice the standard deviation, on the other hand, the perfect model would be very likely to pass the test.

Having a test that becomes more and more difficult to pass as the size of the ensemble grows is clearly unreasonable. The spread of the ensemble is essentially an indication of the outcomes that are consistent with the forcings, given our ignorance of the initial conditions and our best understanding of the physics. Adding members to the ensemble does not reduce this uncertainty, but it does help to characterise it.
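The standard-error problem is easy to demonstrate numerically. The sketch below draws a "perfect" ensemble and "observations" from the same distribution and checks how often the observations fall within +/-2 standard errors versus +/-2 standard deviations of the ensemble mean; the unit-normal distribution and trial counts are arbitrary illustrative choices:

```python
import random

random.seed(1)

def perfect_model_test(n_members, n_trials=2000):
    """Draw a 'perfect' ensemble and one extra 'observation' from the same
    distribution, and count how often the observation lies within +/-2
    standard errors and +/-2 standard deviations of the ensemble mean."""
    pass_se = pass_sd = 0
    for _ in range(n_trials):
        ensemble = [random.gauss(0.0, 1.0) for _ in range(n_members)]
        obs = random.gauss(0.0, 1.0)  # statistically interchangeable member
        mean = sum(ensemble) / n_members
        var = sum((x - mean) ** 2 for x in ensemble) / (n_members - 1)
        sd = var ** 0.5
        se = sd / n_members ** 0.5
        pass_se += abs(obs - mean) <= 2 * se
        pass_sd += abs(obs - mean) <= 2 * sd
    return pass_se / n_trials, pass_sd / n_trials

for n in (5, 20, 100):
    se_rate, sd_rate = perfect_model_test(n)
    print(f"n={n:3d}: within 2*SE {se_rate:.2f}, within 2*SD {sd_rate:.2f}")
```

The pass rate of the standard-error test shrinks as the ensemble grows even though the model is perfect by construction, while the standard-deviation test stays near 95%.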
The thing that really concerns me, though, is that Douglass and Christy (2013) discuss their earlier paper quite uncritically, despite the statistical shortcomings of that paper having been widely discussed, both on-line and in the peer-reviewed literature.
Douglass DH, Christy JR, Pearson BD, Singer SF. A comparison of tropical temperature trends with model predictions. Int J Climatol 2008, 27:1693–1701
Douglass, D. and J.R. Christy, 2013: Reconciling observations of global temperature change: 2013. Energy and Env., 24 No. 3-4, 414-419.
The explanation of the physical origin of the tropical hot spot by Mr Mears is based on undisputed physics.
However, in my opinion that does not mean that it is possible to claim that climate, in all its complexity, should behave according to our simplified physical representations. There is for instance no doubt that many climate phenomena behave as chaotic systems, seemingly disobeying physical laws that we would expect to produce linear behaviour.
In the case of the hot spot, it is in my opinion possible to suggest mechanisms that are consistent with the reality we know, and still don’t produce a hotspot.
I would like to bring to your attention a theory that might explain why the hot spot is missing.
It is based on the essence of the tropical thunderstorm: it originates from the temperature difference between the warming surface and the air of the lower troposphere. So it will develop as soon as this temperature difference has reached a certain value. On a hot day that will be earlier than on a cold day, but at the same surface temperature. And it will immediately cool the surface beneath the storm center considerably.
This mechanism can be observed on satellite images.
This would result in vertical temperature profiles of storms that are the same on both warm and cold days. On a hot day, or in a warmer climate, they just start earlier and last longer, transporting a lot more energy to the tropopause, without the necessity of a temperature change at any height. An increase of the volume of air and water vapour that is transported upwards would suffice to transport and store the extra energy, without producing a hot spot.
This theory is explained in more detail on http://www.climatetheory.net/11-analysing-the-missing-tropical-hot-spot/
I have a question about the last paragraph from Sherwood, about where the heat has been going in the past decade. If more of the heat has gone into the oceans, as seems likely, is it then appropriate to relax and think that the oceans are keeping us cool? Or will more heat in the oceans also cause a bit more sea level rise? Also, couldn’t a small change of sea temperature be a big problem for an ecosystem that may have evolved to depend on the past relative stability of ocean temperatures? So it may be that even if atmospheric temperature has not been rising much, we should still worry about the effects on ocean ecosystems and sea level.
The first reactions of the participants to each other’s guest blogs will be published early next week. It was quite difficult to find a period in which all three were at the office. So currently one of them – Christy – is on the road with infrequent internet access.
Meanwhile just continue the discussion.
As I understand it, the basic argument for a tropical hotspot would be the increase in evaporation due to an increase of surface temperature. It should be clear that evapotranspiration does not depend on surface air temperature, but on the surface “skin” temperature, which may differ considerably from the air temperature. We have studied the surface “skin” temperature, using Meteosat data for the period 1982-2006, and find a significant decrease in both land and ocean skin temperature. Why should we then expect a tropical hotspot?
See: Rosema A, Foppes S, van der Woerd J(2013) “Meteosat derived planetary temperature trend 1982-2006”, ENERGY & ENVIRONMENT, Volume 24, No. 3 & 4, 2013, pp 381-395
Actually evaporation is not really relevant for this picture. The mechanisms governing the amount of water vapor in the air (and the non-linear latent heat release via Clausius-Clapeyron) work even if you reduce evaporation a bit, say through reducing wind speed.
@ Chris and Andries
The way Mr Mears explains the hot spot is clearly illustrated in the link I mentioned before:
Using the graph in the link, assuming the same absolute humidity instead of the same relative humidity of the parcel at the surface, a 10 degrees warmer parcel would start ascending along the DALR till 1 km high, and then follow the SALR.
This SALR curve would be very close to the 20C curve, resulting in a complete lack of warming at 10km.
So I don’t think that the (absolute) humidity at the surface can be disregarded.
Carl Mears distinguishes two aspects of model behaviour in the tropics that seem to be conflated in the use of the term ‘hot spot’: the observed rate of warming, and the ratio of warming aloft to that at the surface. Mears’ post focuses on the amplification aspect, whereas John Christy’s focuses on the rate itself.
Regarding the uniqueness of the tropical “hotspot”, the uniqueness arises from the magnitude of the trend, not the amplification with respect to the surface. While it is true that amplification would be observed in response also to increased solar forcing, it’s clear from comparing panels (a) (solar), (c) (GHG) and (f) (all) in the IPCC figure that only GHG’s are expected to have had a sufficiently strong effect to yield the level of warming projected overall. Were there to be a lack of warming, it would be most inconsistent with the GHG simulation.
To ask whether the ‘hotspot is missing’ is evidently too ambiguous a title. I distinguish 4 variants of the question: (a) Is there any observed amplification at all, (b) is there as much as is predicted by climate models, (c) is there any warming trend aloft, and (d) is there as large a trend as is predicted by models.
Here is what I glean from the 3 postings so far:
(a) From his Figure 1, Mears argues that, with long enough data sets, enough of the observational series show troposphere/surface trend ratios greater than 1 to allow us to say that evidence does not refute the expectation of amplification.
(b) Mears’ Figures 1 and 3 (+4) nonetheless show that the amplification rate in models is high relative to the distribution in the observations. Current data sets are too short to say whether the difference is statistically significant or not. I doubt they will ever be long enough. The statistical issues involved in figuring out the distributions of ratios of random numbers get complicated quickly, and I wouldn’t be surprised if the problem is intractable.
(c) None of the authors focused on this question. In the correction to McKitrick, McIntyre and Herman (2010, herein MMH), see Atmospheric Science Letters October 7 2011, we show that, for a 1979-2009 sample, the answer is mostly yes in the LT layer and mostly no in the MT layer. In my work with Tim Vogelsang (under review, herein MV, available at http://econapps-in-climatology.webs.com/MV-revision-April_2013.pdf) we find that over the 1958-2010 interval for HadAT, RICH and RAOBCORE the answer is yes in LT & MT, but if you allow a step-change in 1977 the trends go to zero in both the LT and MT layers. So it’s not a “trend”, it’s a single step that accounts for the change in the mean over the sample.
(d) Christy focused on this question, which I consider the more interesting one as well. Mears brought the issue up in regards to his Figure 2, acknowledging that the observed trends are at the extreme low end of the model distribution. I thought it was unfortunate that he tabled further discussion, despite agreeing that it is the more interesting issue. In MMH we showed that the difference between models and observations is weakly significant over 1979-1999 and significant (p<0.05) over 1979-2009, at both the LT and MT layers. In MV we find the difference over 1958-2010 is significant whether or not a step-change is permitted at 1977, that the data endogenously determine the necessity of a 1977 step-change, and that the rejection of climate models is very strong (p<0.001 in all cases). In both these papers we use robust time series methods that address the shortcomings of the methods in the Douglass et al/Santer et al dispute.
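On the statistical point raised under (b), a small Monte Carlo illustrates why ratios of noisy trend estimates behave badly; all numbers here are invented for illustration and correspond to no particular dataset:

```python
import random

random.seed(0)

# Ratio of two noisy "trend" estimates. The means and spreads below are
# invented for illustration: numerator ~ N(1.5, 0.8), denominator ~ N(1.0, 0.8).
# Whenever the denominator strays near zero the ratio blows up, so the
# sampled distribution of ratios develops very heavy tails.
samples = sorted(
    random.gauss(1.5, 0.8) / random.gauss(1.0, 0.8) for _ in range(100_000)
)
n = len(samples)
median = samples[n // 2]
p5, p95 = samples[n // 20], samples[-(n // 20)]
print(f"median ratio: {median:.2f}")
print(f"5th-95th percentile range: {p5:.1f} to {p95:.1f}")
```

The median is well-behaved, but the percentile range is enormous because the denominator occasionally passes near zero, which is one concrete reason why inference on trend ratios is so much harder than inference on the trends themselves.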
Specialists might find the discussion of (a) and (b) interesting, but it seems to me that it amounts to arguing over a specific aspect of the behavior of climate models that contributes toward, but doesn't fully determine, their overall accuracy and validity. (c) and (d) are interesting because the models are in such clear agreement that, given the underlying assumptions they share in common, there ought to be, not merely this or that ratio of warming aloft-versus-surface, but a lot of warming, period. The failure to observe anything like that much warming means either that multiple independent observational systems are missing the warming that really is there, or models share some biases in common. Since the observational record involves two different systems (radiosondes and MSU) and the average balloon record does not differ from the average MSU series (MMH Table III), the credibility of the observed record merits serious consideration.
Parenthetically, having published numerous papers on problems in the various land surface temperature series and paleoclimate reconstructions, I am aware that people may base their assessment of the reliability of temperature data sets on whether they support (or not) hypotheses that are preferred on a priori bases. So the details of why data sets get deemed to be reliable or not matter. I find the IPCC and CCSP processes implausibly quick to accept the land surface record while dismissing the tropospheric record.
I don't buy Steven Sherwood's argument that a lack of warming in the tropics is ultimately irrelevant either for attribution or sensitivity calculations. The Gaffen et al data cited includes the 1977 Pacific Climate Shift, and I suspect that had the sample ended in, say, 1976, the results would show model trend overprediction, as is the case in MV. His challenge, that those arguing for the importance of the issue need to come up with an alternative model that "agrees just as well with observations" is misplaced, since the starting point of the whole discussion is that the existing models don't agree with the observations.
The Figure shown at the top of the discussion, from the AR4, is a backcast indicating that models projected that a lot of warming should have been observed due to the increase in GHG's over the 20th century. Mears' Figure 2 and Christy's Figure 1 show that relatively little has been observed since 1979 in comparison to model projections, and as I explained, the discrepancies are statistically significant. My work in MV shows very little trend since 1958, only a step change at 1977 that is not predicted by the models and is associated with a different cause. It adds up to a potentially serious inconsistency, to coin a phrase.
Overall, it looks to me like there is enough agreement among GCMs about what "should" be happening in the tropical troposphere if the underlying mechanisms are all well-understood and accurately represented in models, and enough agreement among different data sets that it is either not happening or happening only at a very attenuated level, to take it as read that the models are overstating the expected rate of warming in the tropical troposphere in response to rising GHG levels. I think it will be very important in the years ahead to figure out why this is the case.
Ross McKitrick seems to be implying that we should not trust the surface warming record, and should regard the atmospheric temperature record as a separate measure of climate change which somehow discounts global warming itself. He and others seem unwilling to accept the clear evidence that the surface and near-surface warming records are, collectively (that is including independent ocean surface, near-surface maritime, and 2-meter terrestrial records which at least over the 20th century are all quite consistent) far more trustworthy and solid than the dodgy free-atmosphere trends. I think we all agree that recent warming in the Tropics has been less than we would have expected no matter how it is measured, and I agree this merits further research (and indeed has spurred a flurry of efforts in the last year or two, so rest assured more papers will be coming out looking at this).
I have a question about figures 2 and 4 of the text of JR Christy: how are the observational trends at 1000 hPa determined, and are there values at 950 hPa?
And two comments:
– the declining trend from 1000 hPa to 850 hPa is quite surprising,
– the calibration of Figure 4 at 850 hPa tells a very different story.
Ross McKitrick asserts consistency in the tropospheric observations because “the observational record involves two different systems (radiosondes and MSU) and the average balloon record does not differ from the average MSU series (MMH Table III)”. If ever there was an assertion about scientific observations that tells far more about the asserter than it does about reality, this is surely it. I urge the diligent on this thread to investigate this claim, and what it really implies about the consistency of tropospheric temperature trends, or in more common terminology, what their uncertainty is likely to be, a term that McKitrick notably avoids while talking about “averages”.
This is a response to phi. I think that phi makes a good point, which I’ll try to explain a
little more fully. If we calculated the amplification starting at 850 hPa, instead of the surface (1000 hPa), the trend ratios from John’s figures 2 and 4 would be much more in line with the expectations of moist adiabatic lapse rate theory, at least up to about 250 hPa. So the problem (at least in the radiosonde data) mostly occurs between 1000 hPa and 850 hPa. (We can’t resolve features this small in height using the satellite data.) What might this mean? Some people with more of a climate change denial perspective might claim that the surface data are wrong, but I don’t think this is very likely. What I think is more likely is that the models are getting something wrong about the boundary layer response to global warming. The boundary layer is much more complicated than the free atmosphere above it, so if the models are wrong, it seems more likely to be in this region.
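Carl's point about the reference level can be made concrete with a toy calculation (the trend profile below is purely illustrative, not taken from any dataset):

```python
# Purely illustrative trend profile (degC/decade) -- not real data.
trends = {
    1000: 0.12,  # surface
    850:  0.08,  # weaker than at the surface, as in the radiosonde averages
    500:  0.14,
    300:  0.20,
}

def amplification(profile, ref_level):
    """Ratio of each level's trend to the trend at the chosen reference level."""
    ref = profile[ref_level]
    return {level: t / ref for level, t in profile.items()}

# Referenced to the surface, the upper-level ratios look modest; referenced
# to 850 hPa they are larger and closer to moist-adiabatic expectations.
print(amplification(trends, 1000))
print(amplification(trends, 850))
```

The same upper-level trend yields a much larger amplification ratio when the anomalously weak 850 hPa trend, rather than the surface, is used as the denominator.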
It is indeed surprising that the trend is less at 850 hPa than the surface. Here is one way that it might be happening. Assuming that the lapse rate is determined by the moist adiabatic rate from the surface to the tropopause might be too simplistic. Under most convecting clouds, there is a sub-cloud region where the air is not saturated, and the lapse rate is closer to the dry adiabatic lapse rate, which is much larger. (You can read about this in Kerry Emanuel’s book “Atmospheric Convection”) If the thickness of this layer increased over time fast enough, then the temperature at 850 hPa (above this layer) would have a smaller trend than the surface, and we would get the observed temperature trends. This (I think) would require that the relative humidity at the surface trend downward, and I am not sure how this might occur.
Maybe Steve could comment on this idea — he knows a lot more about tropical convection than I do.
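The deepening sub-cloud layer idea can be sketched numerically (the lapse rates, heights and warming amounts below are idealized round numbers, not observations):

```python
# Idealized numbers, not observations: if the unsaturated sub-cloud layer
# deepens over time, the 850 hPa level (above it) warms less than the
# surface, because more of the column beneath it follows the steep dry
# adiabat instead of the gentler moist adiabat.
G_DRY = 9.8e-3    # dry adiabatic lapse rate, K per m
G_MOIST = 6.0e-3  # a typical moist adiabatic lapse rate, K per m

def temp_at(z, t_surface, z_cloud_base):
    """Temperature at height z with a dry-adiabatic layer below cloud base."""
    dry = min(z, z_cloud_base)
    moist = max(z - z_cloud_base, 0.0)
    return t_surface - G_DRY * dry - G_MOIST * moist

z850 = 1500.0  # rough height of the 850 hPa level, m
# Let the surface warm 0.5 K while the cloud base rises by 100 m:
before = temp_at(z850, 300.0, 500.0)
after = temp_at(z850, 300.5, 600.0)
print(after - before)  # less than the 0.5 K surface warming
```

With these illustrative numbers the 850 hPa level warms substantially less than the surface, which is the qualitative behavior in question.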
Thank you for your interesting answer.
From my side, I would make a different assumption even if I do not know enough about the source of the data and the tropical dynamics to be very assertive.
First, I don't think that surface temperature trends are very reliable. You noted that the boundary layer is complex; one can add that it is also very inhomogeneous in the horizontal plane. In addition, these inhomogeneities are not temporally stable, and a series of puzzles is associated with surface temperatures (various divergences and inconsistencies).
For these reasons, I prefer not to take into account the trend at 1000 hPa and to analyze what happens from 850 hPa upward. It may not be statistically very significant, but in this case the observed (relative) hot spot is even more pronounced than the modeled one (about 25% more at the tropopause).
On the other hand, the modeling of the initial effect of GHG (before feedback) postulates the invariance of the temperature gradient. This assumption is in contradiction with the general theory of heat transfer (flow distribution). In fact, we should expect an initial hot spot (not related to water vapor) in parallel with the increase of the average temperature.
The strong hot spot observed and the overvaluation of the surface warming by models are two features that tend to confirm this theory.
Steven, I am not making a categorical claim that the land surface record should be dismissed entirely and the free atmosphere data should be accepted uncritically. My parenthetical comment was to point out that the research community seems readily to accept that there are possible non-climatic trends in the balloon and MSU records, yet is very resistant to evidence of the same problems in the land data products. The IPCC described biases in the land record as “negligible” and dismissed their possible influence out of hand, though in the most recent draft they seem to have climbed down from this rigid stance somewhat. All data products, including the SST records, have strengths and weaknesses. It strikes me that one of the strengths of the free atmosphere data is that two independent measurement systems operate simultaneously in the same locations. So the fact that, on average, in the tropics, balloons do not disagree with satellites but both disagree with models points to a real mismatch between models and observations.
First comments of Carl Mears on the two other blog posts:
It appears to me that the three of us agree fairly well about the basic facts, but differ on interpretation.
Like Steve, I am somewhat mystified about all the attention given to the tropical hotspot, as I don’t think it is very important for global warming theory, and it is relatively poorly observed. The uncertainties involved in the various types of observations are made worse by focusing on a ratio between two relatively small trend values.
I think all three of us agree that the observed temperature changes in the tropics (and globally) are less than predicted over the last 35 years. John uses this fact to argue that there are fundamental flaws in all climate models, and that their results should be excluded from influencing policy decisions. This goes much too far. First, many imperfect models are used to inform policy makers in many areas, including models of the economy, population growth, environmental toxins, new medicines, traffic flow, etc. As pointed out by a commenter in this thread, policy makers are used to dealing with uncertain predictions. If we throw out all imperfect models, we will be reduced to consulting the pattern of tea leaves on the bottom of our cups to make decisions about the future. Second, as I argue below, there are many possible reasons for this discrepancy, and only a few substantially influence the long-term predictions.
Let’s return from this philosophical aside to a discussion of the difference between recent observations and climate model predictions. In my mind, the possible causes for this disagreement fall into 3 general categories.
Bad Luck. By Bad Luck, I mean that the last decade is cooler than normal due to the random occurrence of some pattern of unforced internal variability. Most climate model simulations exhibit decade-long periods of little or no warming, as shown by Easterling and Wehner (2009). And in general, climate models, even though they tend to have too much year-to-year variability (as mentioned by John in his initial post), often show too little variability on multidecadal and longer scales. There is an interesting discussion of this topic in a recent issue of the AGU newsletter (Lovejoy, 2013). So, for multidecadal time periods, I would expect the real world to be bumpier than a typical model simulation, and much bumpier than the mean of many simulations. Thus I think that there is some possibility that part of the cause of the current discrepancy is just bad luck. Though the time period is getting long enough, and the discrepancy is getting large enough, that we should be able to begin to understand something about what is going on. In other words, even if it is due to a random fluctuation, we should be able to see the fluctuation in other variables or parts of the system, such as heat flux into either the ocean or into space.
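The "bad luck" point is easy to demonstrate with a toy series (the underlying trend and noise amplitudes below are arbitrary choices, not tuned to any model or observation):

```python
import random

random.seed(1)
TREND = 0.02   # hypothetical underlying warming, degC per year
NOISE = 0.15   # hypothetical interannual variability (1-sigma), degC

def decade_trends(n_years=100):
    """OLS trend over each non-overlapping decade of a noisy warming series."""
    series = [TREND * yr + random.gauss(0.0, NOISE) for yr in range(n_years)]
    sxx = sum((x - 4.5) ** 2 for x in range(10))
    trends = []
    for start in range(0, n_years, 10):
        decade = series[start:start + 10]
        ybar = sum(decade) / 10.0
        slope = sum((x - 4.5) * (y - ybar) for x, y in enumerate(decade)) / sxx
        trends.append(slope)
    return trends

# Even with steady underlying warming, individual decade trends scatter
# widely, and some typically show much less warming than the true trend.
trends = decade_trends()
print(min(trends), max(trends))
```

Even this crude setup produces decades whose fitted trend is well below (or above) the imposed warming rate, purely from internal noise.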
Bad Forcings. Forcings are the inputs to the climate system by processes external to Earth’s climate. These include anthropogenic modifications to the atmosphere (CO2, methane, various aerosols), volcanic aerosols, and changes in solar output. If the estimates of these forcings, which are used as input to the climate model simulations, are not correct we can hardly expect the climate model output to be correct. Is there any evidence for incorrect forcings over the past 35 years being used for model input? In fact, there is.
One example is radiative forcing due to stratospheric sulfate aerosols, which are little droplets of sulfuric acid and water that scatter the incoming light from the sun. It is well accepted that increases in stratospheric aerosols warm the stratosphere and cool the surface and troposphere. Both effects can clearly be seen in the MSU temperature record after the colossal eruptions of El Chichon and Pinatubo. These eruptions spewed large amounts of gaseous sulfur into the stratosphere, where it oxidized to form excess levels of sulfate aerosols. These events, and others before 2000, are well represented in the stratospheric aerosol datasets used to drive the 20th century simulations for CMIP-5. After 2000, the level of stratospheric aerosols in the input datasets is allowed to decay to zero. In real life, however, observations indicate that the background level of stratospheric aerosols increased over the 2000-2010 period (Solomon et al, 2011), probably due to a large number of small volcanic eruptions (Neely et al, 2013). The effect is large enough to offset about 25% of the effect of increasing CO2 over this period (Solomon et al, 2011).
Other forcings with possible problems include solar output, stratospheric ozone, and black carbon aerosols. The sun has been in a quiet, low-output phase for longer than expected. This is not included in the CMIP-5 forcings, and thus model results should be expected to be slightly warmer than real life. Temperature changes in the upper troposphere and lower stratosphere have been shown to be very sensitive to the stratospheric ozone concentrations used (Solomon et al, 2012). These effects appear to extend below the tropical tropopause, low enough to affect tropospheric temperature trends and the tropospheric hotspot. The ozone dataset used in the CMIP-5 simulations is the one with the most conservative trends in ozone. If one of the other datasets had been used, the models would have shown less upper tropospheric warming. There are probably other similar problems that I am not aware of.
None of these effects is large enough to explain the model/measurement discrepancies by itself, but each is likely to be part of the cause. The cumulative effect of all of them has not been evaluated.
Bad Model Physics. It is also possible that Bad Model Physics could be part of the cause. Possible causes in this category include problems with cloud feedback, problems with the effects of tropospheric aerosols (and in particular the interaction of aerosols with cloud formation), and poorly-modeled interaction between the atmosphere and ocean. The first two are widely acknowledged to be major contributors to the uncertainty in model predictions. For the third, there is some evidence that heat is being subducted into the ocean at a rate higher than the models expect, though exactly where it is going is less clear (Balmaseda et al., 2013; Levitus et al., 2012). In our own observations of ocean surface winds, we see trends in wind speed in the tropical Pacific that are far larger than those predicted by models. These winds may serve to stir up the ocean and remove heat from the surface. We do not know whether these effects represent part of the response to global warming, or part of a pattern of decadal-scale random fluctuation.
Note that only some of the possible problems with model physics affect the long-term model predictions. Increased heat flux into the ocean only serves to delay the temperature increase (and increase the rate of sea level rise), while an error in cloud feedback could affect the long-term temperature rise.
In summary, there are a large number of possible explanations for the model/measurement discrepancy in recent temperature rise. Only a few of these, such as errors in cloud feedback, affect the long-term predictions; others, such as errors in the natural forcings used as model input or in simulated ocean heat uptake, do not. At this time, we simply do not know the exact cause or causes, but I strongly suspect that it is due to a combination of causes rather than one dominant cause.
Easterling, D. R. and M. F. Wehner, "Is the Climate Warming or Cooling?", Geophysical Research Letters, 36, L08706, doi:10.1029/2009GL037810, 2009.
Lovejoy, S., “What is Climate”, EOS, 94, number 1, January 2013.
Solomon, S., J.S. Daniel, R. R. Neely III, J. P.Vernier, E. G. Dutton, and L. W. Thomason, ‘The Persistently Variable “Background” Stratospheric Aerosol Layer and Global Climate Change’, Science 333, pp 866-870, 2011.
Solomon, S., P. J. Young, and B. Hassler, 'Uncertainties in the evolution of stratospheric ozone and implications for recent temperature changes in the tropical lower stratosphere', Geophysical Research Letters, 39, L17706, doi:10.1029/2012GL052723, 2012.
Balmaseda, M. A., K. E. Trenberth, and E. Kallen, 'Distinctive climate signals in reanalysis of global ocean heat content', Geophys. Res. Lett., 40, doi:10.1002/grl.50382, 2013.
Levitus, S., et al., 'World ocean heat content and thermosteric sea level change (0-2000 m), 1955-2010', Geophys. Res. Lett., 39, L10603, doi:10.1029/2012GL051106, 2012.
First comments of Steven Sherwood on the two other blog posts:
I agree with pretty much everything Carl says, and he’s gone into more detail than I did on the latest results. We agree that the data we have are basically not stable enough over time to distinguish whether a “hot spot” exists or not, or is as prominent as we would expect. We also agree that warming over the past couple of decades is running lower than nearly all CMIP5 models predict it should be, which is perhaps a more worthy “debate” topic and one that I think will get a lot of attention when the IPCC report comes out. The reasons for this are likely due to cooling influences that have not been applied to the models, such as the unprecedented recent solar minimum, the continuing rise in atmospheric aerosol concentrations and the decline in stratospheric water vapour. To some extent it may also be a chance fluctuation that will go the other way in a few years. Finally, it may signal a somewhat low climate sensitivity–but a sensitivity low enough to make global warming cease to be a problem is basically ruled out by other evidence, particularly palaeoclimate evidence.
As to John Christy’s post, I don’t really think he’s being forthright about the uncertainties in the data. His Fig. 1 does not identify what datasets are actually being used, but Carl’s own plots show that the results depend on this. Also, the plot states that the model calculations he compares to are based on scenario “RCP8.5,” but that is a high-emissions future scenario so I am puzzled by why he is doing this — there are historical simulations in CMIP5 that are meant for comparing with observations.
John makes a number of rambling but sometimes interesting points. One is that models have too much interannual variability, which he suggests may be a sign they are too sensitive. He is right that they have too much interannual variability, but they have too *little* decadal variability as compared to paleoclimate data over the Holocene. So by his own reasoning maybe they are actually too insensitive.
His statement that heat is not being sequestered in the oceans is false. A paper last year by Rahmstorf et al. showed that temporary heat storage associated with recent La Niña conditions could explain why warming during the last decade has been slower than in previous decades, and other studies of Earth's heat balance (most importantly papers by Syd Levitus and Murphy et al.) have shown that the heat is indeed appearing in the world's oceans more or less as expected, as far as we can tell.
I agree with John Christy that our models are imperfect (see my own post, which discusses this even more than he does), but not that this implies climate change is necessarily any less of a concern.
I can also respond to one posted comment, on whether heat absorption by the oceans would mean we have less to worry about. The main problem is that this absorption is likely to be temporary.
First comments of John Christy on the two other blog posts:
For some readers of the science of climate, the topic of tropical surface and atmospheric temperature differences has become the “issue that would not die.” We have been discussing this for 15 years and yet resolution has not been achieved, either in the trends themselves or in the physical understanding of the problem. This has become a specific example of the fact that the science of climate change from human causes is not a “settled science.”
Comments on Mears Original Post
The information provided in Mears's comment focuses on a little-used quantity to investigate the tropical temperature. After presenting the information, Mears arrives at the conclusion that the observational data are not yet accurate enough to confirm or refute the magnitude of the model-generated hot spot (i.e. not accurate enough to falsify the dominant model response regarding the enhanced greenhouse effect). I agree with virtually all that Mears writes as background. However, I think the fundamental question examined here may be viewed from a larger perspective that draws on more information and can therefore lead to a less ambiguous conclusion.
Mears focuses on “observations” of a quantity known as the “Temperature Tropical Troposphere” TTT which is not actually observed, but is rather a derived quantity dependent upon the difference of two measurements (temperature of the mid-troposphere (TMT) and lower stratosphere (TLS)). The differencing process tends to increase the error opportunities for the derived product. Observations of the lower stratosphere (TLS) have greater uncertainty, and this contributes to the spread of the TTT results among the various datasets.
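The error-compounding point can be illustrated with simple propagation of independent errors (the 1.1/-0.1 weighting is the commonly cited form of the TTT channel combination, and the per-channel trend uncertainties below are hypothetical):

```python
import math

# A TTT-style product combines two channels, so its error compounds both.
# The 1.1/-0.1 weights follow the commonly cited form of the combination;
# treat them, and the uncertainties below, as assumptions of this sketch.
W_TMT, W_TLS = 1.1, -0.1

def combined_sigma(sigma_tmt, sigma_tls):
    """1-sigma error of W_TMT*TMT + W_TLS*TLS, assuming independent errors."""
    return math.hypot(W_TMT * sigma_tmt, W_TLS * sigma_tls)

# Hypothetical per-channel trend uncertainties, degC/decade:
print(combined_sigma(0.05, 0.10))  # larger than the 0.05 TMT uncertainty alone
```

Because the TMT weight exceeds one and the (noisier) TLS channel contributes its own error, the derived product is necessarily more uncertain than TMT by itself.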
A more direct method that reduces observational error (especially of the more uncertain stratospheric portion) is simply to examine the temperature product of one channel which captures the bulk of the desired signal (mid-to-upper troposphere temperature) and which avoids the compounding of errors that a “difference” of products introduces. (The stratosphere only contributes about 7% to the TMT tropical signal.) Hence in my posting, the comparisons focused on the TMT product about which much more is known. [I calculated and discussed, but did not show by chart, that the results using the lower tropospheric temperature or TLT were very similar to those of TMT.] Mears also tends to discuss those datasets which are the “warmest”, i.e. RSS, STAR and MERRA.
However, there is information on why the datasets differ, and this can be used to infer a more confident assessment (see Christy et al. 2010 and 2011 for more information).
The following are a few examples of the knowledge we have of these datasets. STARv2.0 contains a spurious warming shift on 1 Jan 2001 which will be corrected in the new v3.0 to be released later this year. So, STAR’s results in Mears’s contribution overstate the warming. MERRA is an outlier dataset for temperature trends with problems, including a significant warm shift between 1990 and 1992 probably due to the inability to correct for infrared contamination from Mt. Pinatubo. As a result, MERRA produces the warmest tropospheric trend by far of all observational and reanalysis datasets, being almost +0.10 °C/decade warmer than the average of the balloons. HadAT2, using a more conservative methodology for detecting shifts in balloon measurements, likely has retained spurious upper troposphere/lower stratosphere cooling from radiosonde equipment changes over time which contributes to its relatively “cool” trend. ERA-I appears to be excellent in the lower troposphere, but with the inclusion of aircraft reports after 2002 experienced a spurious warming in the upper troposphere due to the previously too-cool analyzed values in that region (note: this also impacts the RAOBCORE and RICH datasets).
A minor controversy appeared last year when Po-Chedley and Fu (2012) allegedly found an error in our UAH TMT dataset, when in fact the main source of their finding was an incorrect understanding of the UAH merging sequence of the satellites (Christy and Spencer 2013). It is understandable that Mears highlights RSS data since he is the source of the product, but evidence has been published to demonstrate that the RSS TMT product likely has spurious tropical warming due to an apparent overcorrection of the diurnal cycle errors (e.g. Christy et al. 2010). [In a counter-intuitive result, this correction causes the RSS global TLT trend to be cooler than UAH's.] I'm not clear why neither RATPAC (the NOAA balloon dataset) nor JRA-25 (a Japanese reanalysis dataset) was included in Mears's analysis. Both are very near the overall averages shown in my earlier posting and are both cooler than RSS, STAR and MERRA. So, while none of the datasets can claim to be perfect, we can explain many of their differences and, by averaging, reduce the independent errors.
Thus there are clear reasons for not highlighting RSS, MERRA or STAR as observational datasets. Rather, to take a more unbiased approach to the observations, I simply calculated the mean of the two categories of datasets (satellite and balloon separately) to reduce the random error opportunities. In this way the impact of independent errors that lead to trends that are too warm or too cool may be limited. The fact that the tropospheric trends from the averages of two very different and independent sets of monitoring systems, i.e. balloons and satellites, are within 0.01 °C/decade of each other lends confidence to the result. [I did not include STAR due to the known shift in its temperature and the fact that it uses the identical diurnal corrections as RSS – thus it is very similar to RSS but with a known spurious shift. However, even if STAR were included as an "independent" dataset, the significance of the results would not change.]
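The rationale for averaging can be sketched with a toy Monte Carlo (the "true" trend and per-dataset error below are invented for illustration only):

```python
import random
import statistics

random.seed(0)
TRUE_TREND = 0.06   # hypothetical true trend, degC/decade
DATASET_ERR = 0.04  # hypothetical independent 1-sigma error per dataset

def mean_trend(n_datasets):
    """Average trend estimate from n datasets with independent errors."""
    return statistics.mean(
        random.gauss(TRUE_TREND, DATASET_ERR) for _ in range(n_datasets))

# Spread of the estimate over many trials: one dataset vs an average of four.
spread1 = statistics.stdev(mean_trend(1) for _ in range(5000))
spread4 = statistics.stdev(mean_trend(4) for _ in range(5000))
print(spread1, spread4)  # the four-dataset average is markedly less uncertain
```

If (and only if) the dataset errors really are independent, the spread of the mean shrinks roughly as one over the square root of the number of datasets, which is the premise behind averaging the balloon and satellite products separately.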
The simple numbers tell the story and can’t be overlooked. From 73 CMIP-5 model runs, the 1979-2012 mean tropical TMT trend is +0.26 °C/decade. The same trends calculated from observations, i.e. the mean of four balloon and mean of two satellite datasets, are slightly less than +0.06 °C/decade. Tropical TMT is a quantity explicitly tied to the response of models to the enhanced greenhouse effect (or any applied forcing). Because the sample of climate model runs is relatively large (N = 73) we have a very confident assessment of the model-mean value and its error range does not encompass the observations. In addition, the agreement of the means of two independent observational systems further indicates that we have a very good idea of the actual TMT trend. The mean of the models (often used as the “best estimate” in IPCC assessments) and observations differ by +0.20 °C/decade which is highly significant. And, we are not talking about 10 or 15-year trends – this is a 34-year period over which this discrepancy has grown. Regarding the highly significant nature of the differences in my initial posting, I failed to mention the many papers led by Ross McKitrick (e.g. McKitrick et al. 2010, McKitrick et al. 2011 and others) in which they demonstrate with more advanced statistical tools that the models and observations are indeed significantly different regarding tropical tropospheric temperature trends.
All in all there is little to argue with in the posting by Mears as it reflects a typically careful and clinical examination of the issue, and indeed allows for the conclusion I state above. However, the extra information shown above, I believe, enhances the confidence in the observational results that then leads to a more definitive statement that the models, on average, have significantly misrepresented the evolution over the past 34 years of a bulk quantity directly tied to the models’ response to the enhanced greenhouse effect.
Comments on Sherwood’s Original Post
Sherwood provides a more theoretical discussion of the topic, as well as bringing up some different problems with which climate models must also contend. He points to the likely tendency of models to tie the surface trends too tightly to upper tropospheric temperature and I agree. If I read the post correctly, however, Sherwood also expresses the opinions that (1) the observations are too error-prone for any definitive use, and (2) even if they were useful, the large disagreement between observations and models in the tropical troposphere would be largely inconsequential.
I hold very different opinions in that (1) with time and increased understanding, the observations of the troposphere from independent systems are converging to the true answer, and that (2) the fact the average model is accumulating heat in the upper atmosphere at a rate three times faster than the observations has serious implications for representing the entire climate system, including a mis-modeling of the surface temperature (see next paragraph). It also betrays a mis-handling of the hydrologic/convective processes of the troposphere – processes that are fundamental to understanding climate variability and change over time, as Sherwood notes. And, by utilizing TMT as I did, I avoided the uncertainties of the stratospheric correction to TTT that Sherwood rightly points out.
Sherwood displays a plot that shows how water vapor feedback and lapse-rate feedback tend to cancel. It is important to realize that this is generated from model output, which I find difficult to accept as a proxy for the real world. For models in general, water vapor feedback doubles the surface warming. The lapse rate feedback mitigates this somewhat at the surface. Now, if I follow this train of thought correctly, with the idea that the upper temperature trend doesn't matter: if the water vapor feedback in models were zero, is Sherwood claiming there would still be the extra surface warming? The observational evidence suggests the water vapor feedback is weak to non-existent on multi-decadal time scales, which implies less warming than that depicted by models with their strong positive water vapor feedback.
Sherwood indicates that the surface temperature record is robust for climate purposes. I have three comments to make here. First, we and others have shown that the land surface record, as represented by daily mean temperature, is likely contaminated by a warming nighttime trend due to surface development around the world (e.g. Christy et al. 2006, McKitrick and Michaels 2007, Christy et al. 2009, McKitrick 2010, McKitrick and Nierenberg 2010, McNider 2012, Christy 2013.) Secondly, if the surface temperature is the most robust and important metric, we find that the 73 models shown in the earlier post, on average, produce a surface trend in the tropics that is almost twice too warm since 1979 even with the contaminated observational data (+0.19 vs. +0.11 °C/decade). Thus, even with the surface temperature metric, there are problems for models. Thirdly, the fundamental measure of greenhouse warming is the accumulation of joules in the climate system. The surface temperature is woefully inadequate to document this metric as the deep atmosphere and ocean represent the reservoirs that should be monitored to detect changes in this quantity.
When pointing out other model problems Sherwood notes, as an example, that no climate model has replicated the rapid north polar ice loss and that this is an interesting problem. However, none of the models have shown an increasing extent of sea ice in the southern hemisphere either – so we have a problem at both poles for which models have opposing answers and thus opposing issues to solve. There are other such examples of models overwarming the climate. However, as stated in my original post, the importance of the tropical atmospheric temperature is key to model fidelity to the real world because it involves the complicated and ubiquitous interrelationships among the various water components of the climate system.
I did not quite understand Sherwood’s closing comments challenging scientists to come up with a different theory regarding the hot spot. I understand the challenge regarding a new explanation for tropical features, but the implication is that the current theory should reign. However, it is clear that the current theory (as expressed in climate models) fails the test against observations and thus should not be granted any particular meritorious status. The current theory may be close to reality in some aspects but is missing something important since it diverges so far from reality. Other theories have been offered (i.e. negative cloud feedback as mentioned by Sherwood) which in fact more closely match the observed temperature changes.
As I indicated in the original post, I cannot say why any particular model departs so much from reality, but I believe the evidence is clear that the departures are real and significant which then exposes serious problems for the models as long-range forecasting tools.
John R. Christy
University of Alabama in Huntsville
Christy, J.R., W.B. Norris, K. Redmond and K. Gallo, 2006: Methodology and results of calculating central California surface temperature trends: Evidence of human-induced climate change? J. Climate, 19, 548-563.
Christy, J.R., W.B. Norris and R.T. McNider, 2009: Surface temperature variations in East Africa and possible causes. J. Clim. 22, DOI: 10.1175/2008JCLI2726.1.
Christy, J.R., B. Herman, R. Pielke, Sr., P. Klotzbach, R.T. McNider, J.J. Hnilo, R.W. Spencer, T. Chase and D. Douglass, 2010: What do observational datasets say about modeled tropospheric temperature trends since 1979? Remote Sens. 2, 2138-2169. Doi:10.3390/rs2092148.
Christy, J.R., R.W. Spencer and W.B Norris, 2011: The role of remote sensing in monitoring global bulk tropospheric temperatures. Int. J. Remote Sens. 32, 671-685, DOI:10.1080/01431161.2010.517803.
Christy, J.R. and R.W. Spencer, 2013: Comments on “A bias in the midtropospheric channel warm target factor on the NOAA-9 Microwave Sounding Unit.” J. Atmos. Oceanic Techno., 30, 1006-1013. Doi:10.1175/JTECH-D-12-00107.1.
Christy, J.R., 2013: Monthly temperature observations for Uganda. J. Applied Meteor. Clim. (in press).
McKitrick, R.R. and P.J. Michaels, 2007: Quantifying the influence of anthropogenic surface processes and inhomogeneities on gridded global climate data. J. Geophys. Res., 112:D24S09. DOI:10.1029/2007JD008465.
McKitrick, R.R., S. McIntyre and C. Herman, (2010): Panel and multivariate methods for tests of trend equivalence in climate data sets. Atmos. Sci. Lett., 11(4), 270-277. doi: 10.1002/asl.290.
McKitrick, R.R. and N. Nierenberg, 2010: Socioeconomic patterns in climate data. J. Econ. Soc. Meas. 35:149-175. DOI:10.3233/JEM-2010-0336.
McKitrick, R.R., S. McIntyre and C. Herman, (2011): Corrigendum. Atmos. Sci. Lett., 12(4), 386-388. doi: 10.1002/asl.360.
McNider, R.T., G.J. Steeneveld, A.A.M. Holtslag, R.A. Pielke Sr., S. Mackaro, A. Pour-Biazar, J. Walters, U. Nair, and J.R. Christy, 2012: Response and sensitivity of the nocturnal boundary layer over land to added longwave radiative forcing. J. Geophys. Res., 117, D14106, doi:10.1029/2012JD017578.
Po-Chedley, S. and Q. Fu, 2012: A bias in the midtropospheric channel warm target factor on the NOAA-9 Microwave Sounding Unit. J. Atmos. Oceanic. Technol., 29, 646-652.
While it is true that amplification would be observed in response also to increased solar forcing, it’s clear from comparing panels (a) (solar), (c) (GHG) and (f) (all) in the IPCC figure that only GHG’s are expected to have had a sufficiently strong effect to yield the level of warming projected overall.
As far as I can see you haven’t referenced the figure in question but I think I recall from arguments you’ve made previously that you’re talking about this one. You’re either misinterpreting this figure or missing the point. What it shows is the expected temperature change given our understanding of how those different climate drivers have changed historically. The GHG hotspot is larger simply because the model considers that historical GHG forcing should have caused more surface warming than, say, historical solar forcing. If we conjecture that historical solar forcing is actually of a similar magnitude to GHG forcing (as some have) the expected hotspot would be the same. The point is that the hotspot in these models is a function of surface warming, mostly unrelated to the cause of that warming.
As others have indicated, trying to explain a missing hotspot (if it is missing) by invoking biases in the land surface temperature record doesn’t work well. I’ll offer a different perspective than others and simply focus on the magnitudes involved. Land takes up about 23% of surface area in the Tropics (20S to 20N). Even if we were extreme and decided to cut tropical land surface trends by 50%, it would only decrease tropical land+ocean trends by about 10-15%. Furthermore, since the proposed hotspot in mid-tropospheric temperatures is a function of the moist adiabatic lapse rate, it is sea surface temperature change, rather than change over land, which will dominate. So, even a large warming bias in the land surface temperature record would have only a small effect on land+ocean surface trends and would be negligible for expectations of mid-tropospheric temperature trends.
I think there may be scope for discussion of the SST records. Carl Mears states that HadCRUT4 and NCDC tropical land+ocean surface trends are very similar. However, modelling using prescribed SST observations (e.g. as discussed by Isaac Held here) is usually performed using the HadISST1 dataset, which doesn’t feature in any of the major land+ocean records. Whereas the 20S-20N 1979-2012 linear trend in HadSST3, part of HadCRUT4, is 0.10 °C/Dec, in HadISST1 it is 0.055 °C/Dec. Replacing HadSST3 with HadISST1 for the SST portion of HadCRUT4 changes the 20S-20N 1979-2012 trend from 0.115 °C/Dec to 0.080 °C/Dec. This is fairly significant for the surface trend and, because the difference is in the sea surface area, is potentially important for our expectations of mid-tropospheric trends.
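The weighted-average arithmetic above can be checked in a few lines. This is only a back-of-envelope sketch: the 23%/77% land/ocean split and the quoted trends are the inputs, and the land trend is inferred from the quoted totals rather than taken from any dataset.

```python
# Back-of-envelope check (trends in degC/decade, 20S-20N, 1979-2012).
f_land, f_ocean = 0.23, 0.77

total_hadcrut4 = 0.115   # HadCRUT4 land+ocean trend
sst_hadsst3 = 0.10       # HadSST3 (the SST part of HadCRUT4)
sst_hadisst1 = 0.055     # HadISST1

# Implied land trend from the HadCRUT4 total and its SST portion
land = (total_hadcrut4 - f_ocean * sst_hadsst3) / f_land      # ~0.165

# Swap HadSST3 for HadISST1 in the ocean portion
total_swapped = f_ocean * sst_hadisst1 + f_land * land        # ~0.080

# Halving the land trend (the extreme land-bias case discussed earlier)
# changes the combined trend by roughly 15-17% with these particular inputs
reduction = (f_land * 0.5 * land) / total_hadcrut4

print(round(land, 3), round(total_swapped, 3), round(reduction, 2))
```

With these numbers the swap reproduces the quoted 0.115 to 0.080 °C/Dec change.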
I’ve yet to find anything published looking at reconciliation of tropical HadISST1 and HadSST3 trends. Would be grateful if someone here has seen anything relevant.
Regarding the decreased trend at 850hPa compared to the surface, could it be partially related to radiosondes being launched nearly exclusively from land? It has been discussed in numerous papers that part of the land-sea warming contrast is caused by decreased evapotranspiration due to increased CO2. Even on islands this might enhance near-surface warming.
I would guess the CMIP5 comparisons use complete sampling of the tropics (?) though even attempting to mask for radiosonde sampling might not work since GCM grid cells are too coarse for many of the islands used.
A big thank you to the invited experts, Carl Mears, Steven Sherwood, and John Christy, for their detailed essays regarding the tropical hotspot.
As Carl Mears noted in his response to the others: “It appears to me that the three of us agree fairly well about the basic facts, but differ on interpretation.”
In that vein I’d like to focus on questions 1 and 2 from the introduction, which address some basic facts about the hotspot, about which it may be easiest to obtain agreement. I would like to invite all three invited experts to explicitly address these two questions (either with a yes or no, or with some context if desired), which I’ll repeat below.
1) Do the discussants agree that amplified warming in the tropical troposphere is expected?
Carl Mears explicitly addressed the thermo-dynamical cause of a tropical hotspot (I’ll take that as a “yes”) and Steven Sherwood alluded to such. John Christy referred to it only as a model prediction, without addressing the plausibility of the physical underpinning. Do all three of you agree that amplified warming (of an as yet unquantified magnitude) in the tropical troposphere is expected, based on established physics?
2) Can the hot spot in the tropics be regarded as a fingerprint of greenhouse warming?
Carl Mears and Steven Sherwood explicitly stated that enhanced tropospheric warming over the Tropics is not specific to a greenhouse mechanism, but should occur for any surface warming, irrespective of its cause (I’ll take that as a “no”). John Christy referred to the hotspot as a model-predicted consequence of the enhanced greenhouse effect, giving the impression that he regards it as a fingerprint specific for a greenhouse mechanism. John, perhaps you could confirm whether or not you view the hotspot as specific for a greenhouse mechanism?
After getting these issues clarified, we can move on to Q3, on which there seems to be ample disagreement (on whether the enhanced tropical tropospheric warming is significantly different from observations or not).
Regarding the question of feedback between water vapor and the lapse rate:
The anti-correlation between the lapse rate and water vapor feedbacks is well understood physically (see e.g., Ingram, 2010), since outgoing radiation changes are largely determined by the relative humidity structure. This partial cancellation has been known for decades, and alternative ways to set up feedback definitions, such as keeping relative humidity fixed while warming the troposphere (instead of the usual base state in which just the temperature is allowed to adjust and specific humidity is held fixed, see e.g., Held and Shell, 2012), yield insight into the framework behind this cancellation. But I think John Christy is trying to dodge any acknowledgment that we know something about climate.
For the feedback, it’s useful in this context to decompose the tropical water vapor feedback into two components, one of which is the enhanced water vapor that you’d get throughout the column if you warmed the tropical troposphere by a uniform amount. I’ll call that WV(w). The second component would be any “extra water” that you get on top of the uniform-warming assumption by amplifying the upper-tropospheric warming relative to the surface. I’ll call that WV(a). So the total water vapor feedback, WV, would be WV = WV(w) + WV(a).
Let LR be the lapse rate feedback. Ignoring cloud feedbacks, and any other surface albedo changes, the total feedback in the tropics would roughly be feedback = WV + LR = WV(w) + WV(a) + LR.
In general, we’d expect WV(w) > 0 and WV(a) > 0 (positive feedbacks) and LR < 0 (a negative feedback), with |LR| > WV(a). It is the WV(a) component that is partially cancelled by the lapse rate feedback, and since |LR| > WV(a), any departure from the moist adiabat toward something closer to a uniform warming situation would result in a more positive net feedback (since the negative lapse rate feedback is larger in magnitude than this second water vapor contribution). Nonetheless, the total water vapor feedback, bringing in WV(w), makes the sum positive regardless.
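The sign relations in this decomposition can be illustrated numerically. The magnitudes below are invented purely to satisfy WV(w), WV(a) > 0, LR < 0 and |LR| > WV(a); they are not estimates from any model or dataset.

```python
# Hypothetical feedback magnitudes (W/m^2/K), chosen only to satisfy the
# sign relations in the text: WV(w), WV(a) > 0, LR < 0, |LR| > WV(a).
WV_w = 1.2   # water vapor feedback from uniform tropospheric warming
WV_a = 0.6   # extra water vapor from amplified upper-tropospheric warming
LR = -0.8    # lapse rate feedback (negative)

WV = WV_w + WV_a        # total water vapor feedback
total = WV + LR         # ignoring cloud and surface albedo feedbacks

# The amplification piece is more than cancelled by the lapse rate feedback...
assert WV_a + LR < 0
# ...but the uniform-warming piece keeps the total feedback positive.
assert total > 0
print(round(total, 2))  # prints 1.0
```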
There is now clear evidence for positive water vapor feedback in observations, even on longer timescales (see e.g., Soden et al., 2005, Science; Shi and Bates, 2011, JGR), despite Christy’s assertion otherwise. This is also true in the lower troposphere and on interannual timescales, with many papers on this (e.g., Dai et al., 2011, J. Climate, and see some of Andrew Dessler’s papers). Consistent water vapor responses are also seen in response to Pinatubo.
The introductory item glosses over some significant recent papers:
“More papers then started to acknowledge that the consistency of tropical tropospheric temperature trends with climate model expectations remains contentious.[xiv][xv][xvi][xvii]”
[xv] Po-Chedley and Fu 2012 says “It is demonstrated that even with historical SSTs as a boundary condition, most atmospheric models exhibit excessive tropical upper tropospheric warming”.
[xvi] Santer et al 2012 says that models “overestimate warming of troposphere” (and it is amusing that Santer et al 2012 does not cite Santer et al 2008).
[xvii] Thorne et al 2011 says “agreement between models, theory, and observations within the troposphere is uncertain over 1979 to 2003 and nonexistent above 300 hPa. ”
These recent papers would suggest that the question of consistency between observed and modelled tropical temperature trends is in fact not very contentious 🙂
Based on emails from both Steven Sherwood and John Christy, and based on Carl Mears’ blogpost, I can report that all three agree that
1) Yes, amplified warming in the tropical troposphere is expected.
2) No, the hot spot in the tropics is not specific to a greenhouse mechanism.
Notice that I changed the wording of question/statement 2 here, because the word “fingerprint” was interpreted differently by John Christy than how we meant it.
In his email to us, John Christy wrote regarding Q1: “Yes, the hot spot is expected via the traditional view that the lapse rate feedback operates on both short and long time scales.” Regarding Q2 he wrote: “it [the hot spot] is broader than just the enhanced greenhouse effect because any thermal forcing should elicit a response such as the “expected” hot spot.” Further elaborations in the email exchange, e.g. regarding whether to call this a fingerprint, involved interpretations as to the meaning of (a lack of) a hot spot, which we will defer for the moment.
The next issue that we’ll take up is encapsulated in Q3:
3) Is there a significant difference between modelled and observed amplification of surface trends in the tropical troposphere (i.e. between the modelled and the observed hot spot)?
As concluded in the comment above the three discussants agree about the first two basic issues. They all expect tropical amplification and they agree that any forcing in the tropics should generate the hot spot so that the hot spot is not strictly a fingerprint of greenhouse forcing.
Based on their guest blogs and first comments there is much less agreement though about the existence of the hot spot in the observations. In short, Sherwood and Mears state that the uncertainties in the data are so big that one cannot conclude much. Mears for example wrote in his guest blog: “Taken as a whole, the errors in the measured tropospheric data are too great to either prove or disprove the existence of the tropospheric hotspot. Some datasets are consistent (or even in good agreement) with the predicted values for the hotspot, while others are not. Some datasets even show the upper troposphere warming less rapidly than the surface.”
Christy on the other hand is pretty sure that differences between observations and models are significant. According to him this applies both to the absolute warming trends of TMT (Temperature Middle Troposphere) (see his figure 1) and to the amplification (the ratio between warming in the tropical troposphere and the warming at the surface) (see his figure 3).
Mears (see his figures 1, 2 and 3) and Christy go into the most detail about the data they use, and they end up drawing different conclusions. This needs clarification. How is that possible, and can we understand why they come to different conclusions?
Both Mears and Sherwood (and this opinion is also given in several public comments) say that Christy underestimates the uncertainties in the data. Sherwood in his first comments wrote: “As to John Christy’s post, I don’t really think he’s being forthright about the uncertainties in the data.”
1. Which datasets to use?
Christy on the other hand states that he feels pretty confident about the observational trends because he used the average of four radiosonde datasets (RATPAC, RAOBCORE, RICH and HadAT2) and two satellite datasets (RSS and UAH), and these averages are pretty close to each other.
Mears in his figure 2 shows RAOBCORE, RICH and HadAT2, RSS and UAH as well, but not RATPAC. In addition he shows two reanalysis datasets (ERA-Interim and MERRA) and a third satellite dataset (STAR). STAR and MERRA come closest to the modelled amplification in Mears’ figure 2.
Christy in his first comments was critical about some of the datasets used by Mears:
So Christy gives a reason why he doesn’t use ERA, MERRA and STAR. Here a reaction of Mears and Sherwood is wanted.
2. TMT vs TTT
Another issue is that Christy used TMT while Mears used TTT (Temperature of the Tropical Troposphere) where TTT is defined as 1.1*TMT – 0.1*TLS (Temperature Lower Stratosphere). Christy thinks TTT needlessly increases the uncertainties by using a signal from the lower stratosphere:
Based on the six datasets he used Christy concludes the TMT trend since 1979 is “slightly less than +0.06 °C/decade which is a value insignificantly different from zero. The mean TMT model trend is +0.26 °C/decade which is significantly positive in a statistical sense.” He didn’t mention error bars.
Mears and Sherwood didn’t give such a number, maybe because they believe such a number is meaningless given the uncertainties in the data.
It would be informative of course to have their best estimates or guesses as well including uncertainty intervals. Could all three of you give these (trends and error bars)?
In summary, we want to get clear why Christy feels sure observations and models are inconsistent with each other while Sherwood and Mears say the data are too uncertain to draw this conclusion.
Let’s start with the three issues described here and see how far we can get in clearing this up.
I’ll address the question of which datasets to consider in this post, and get to the other questions in later posts.
I think it is dangerous to eliminate specific datasets from consideration based on a limited set of criteria. Most of the temperature trend community have agreed that it is best to show all the datasets so that the analyst can use the spread between datasets to assess how well temperature changes are understood. If we start to throw out datasets as soon as we detect a small flaw, we may be left with datasets with larger, but undetected (so far) flaws. Several groups, including ours, have moved to making a large number of possible datasets to further flesh out the range of reasonable datasets.
John Christy likes to use arguments based on short term trend differences and jumps to throw out datasets — usually those with warmer than average trends. We showed in Mears et al, 2012 that this method is strongly dependent on the comparison datasets used, and that the entire time series should be assessed before drawing conclusions, as opposed to only analyzing one or more segments that are under suspicion. Note that in this paper, we found that when the entire time series was investigated, the STAR 2.0 dataset had short term trends that were closest to those in the various adjusted radiosonde datasets.
In Christy’s last post, he made an argument for excluding STAR V2.0 based on a small positive jump in temperature in 2001, and on the claim that it is the same as RSS, since it uses the same diurnal correction. First, the jump in 2001 is fairly small, and does not change the 34 year trend very much when removed in V3.0 (the global trend in STAR V3.0 TMT will be about 0.015 K/decade lower, but still warmer than RSS). Second, the STAR analysis uses a completely different calibration scheme based on simultaneous nadir overpasses. In the STAR scheme, the satellite calibration is not polluted by errors in the diurnal correction, because it occurs before the diurnal correction in processing. So I really think STAR 2.0 is an independent dataset, and cannot be excluded based on dependence on RSS. Also note that STAR 3.0 TMT will no longer use the RSS diurnal correction, but still shows more warming than RSS TMT.
I do tend to leave out RATPAC. This is because the individual station data are adjusted before 2005, and then not adjusted after 2005. We have shown that when comparing global radiosonde averages to satellites, it is critical to subset the satellite data to the radiosonde locations (Mears et al., 2011). (This is not so important for tropics-only averages because of the smoothness of temperatures in the tropics.)
I tend to de-emphasize reanalysis output, because I think reanalysis is even less ready than the satellite data for use in global temperature trend assessment. In general, the reanalysis projects ingest uncorrected satellite data, and hope that their analysis system can make the needed adjustments. This has certainly not been proven to be the case, and there are many examples of it not working out, e.g. problems with vapor and clouds in the MERRA reanalysis caused by the advent of AMSU brightness temperatures.
None of this changes my overall conclusions:
1. The presence (or not) of the tropospheric hotspot depends on which pair of datasets you use. Thus the result is not statistically significant in the grossest sense.
2. Measured trends in the tropical troposphere are less than all of the modeled trends (or almost all in the case of STAR 2.0). This is an important, statistically significant, and substantial difference that needs to be understood. I addressed this in my last post.
Assessing the value of Microwave Sounding Unit–radiosonde comparisons in ascertaining errors in climate data records of tropospheric temperatures. JOURNAL OF GEOPHYSICAL RESEARCH: ATMOSPHERES Volume 117, Issue D19, 16 October 2012, Carl A. Mears, Frank J. Wentz and Peter W. Thorne
Assessing uncertainty in estimates of atmospheric temperature changes from MSU and AMSU using a Monte-Carlo estimation technique. JOURNAL OF GEOPHYSICAL RESEARCH: ATMOSPHERES Volume 116, Issue D8, 27 April 2011, Carl A. Mears, Frank J. Wentz, Peter Thorne and Dan Bernie
Why use TTT.
In this post, I explain why I use TTT, especially when comparing to radiosondes.
Here is a figure showing the TMT, TTT, and TLS weighting functions over the ocean.
Over land, it is a little different for TMT and TTT, but these differences do not affect my argument.
First, let’s consider TMT, shown in blue. This weighting function peaks in the mid to lower troposphere, but still has considerable weight above 17 km, which is about where the stratosphere starts in the deep tropics. The problem is that the stratosphere is cooling more rapidly than the troposphere is warming, and this cooling tends to cancel some of the tropospheric warming, making the signal harder to see. There is another MSU/AMSU channel, TLS (red curve), which is sensitive to the upper troposphere and lower stratosphere in the tropics (not just the lower stratosphere, as is sometimes assumed from its name). Fu and Johanson proposed a combination of these two channels that removes much of the weight above 17 km from the TMT weighting function. Fu calls this TTT, for Temperature Tropical Troposphere. The TTT weighting function is shown in purple.
First let me address the accusations of increased uncertainty. Let me assume that the TMT product has a trend uncertainty of 0.038 K/decade (from Mears et al, 2011), and that TLS has a trend uncertainty of 0.060 K/decade (this is twice what we found in Mears et al 2011). If we assume these two errors are independent, we find a resulting error of 0.0422 K/decade.
Now, since a lot of the error comes from possible problems with the diurnal adjustment, the errors may not be independent, so this could be worse. But even if we assume the worst case scenario (perfectly anti-correlated errors), we only get an uncertainty of 0.0478 K/decade.
So our procedure has increased the uncertainty, by a factor (in the worst case) of about 1.25. That must be bad, right? No, because the procedure has also increased the signal we are trying to see. By calculating TTT, we have increased the signal in both the RSS and STAR data by a factor of 1.35. For UAH, the factor is even larger, about 1.8. So, by using TTT instead of TMT, we have increased the signal-to-noise ratio in all three satellite cases. This would not be true only if the uncertainty in TLS were HUGE compared to TMT.
Trends 1979-2010 (K/decade)
Dataset TMT TTT Ratio
UAH 0.050 0.091 1.82
RSS 0.117 0.158 1.35
STAR 0.144 0.194 1.35
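The error-propagation and signal-to-noise arithmetic above can be reproduced directly from the TTT definition (TTT = 1.1*TMT - 0.1*TLS) and the quoted numbers; this sketch uses only those inputs.

```python
import math

# Trend uncertainties quoted above (K/decade)
s_tmt, s_tls = 0.038, 0.060

# TTT = 1.1*TMT - 0.1*TLS, so the coefficients scale the two errors
s_independent = math.hypot(1.1 * s_tmt, 0.1 * s_tls)  # independent errors: ~0.0422
s_worst_case = 1.1 * s_tmt + 0.1 * s_tls              # perfectly anti-correlated: 0.0478

noise_gain = s_worst_case / s_tmt    # ~1.26, the "factor of about 1.25"
signal_gain_rss = 0.158 / 0.117      # ~1.35, from the RSS row of the trend table

# The signal grows faster than the worst-case noise, so S/N improves
print(round(s_independent, 4), round(s_worst_case, 4),
      round(signal_gain_rss / noise_gain, 2))
```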
This is not the end of the story. TTT has even more benefits when we are considering radiosonde data. It is fairly well established that the problems with the radiosonde increase at higher altitude, with most indications being that even the homogenized records show spurious cooling at high altitude. The exact level where this problem sets in is not known. By using TTT instead of TMT, the weights for the radiosonde levels above 100 hPa are very much reduced, reducing the contribution to the error from these troublesome levels.
Carl A. Mears, Frank J. Wentz, Peter Thorne and Dan Bernie (2011) Assessing uncertainty in estimates of atmospheric temperature changes from MSU and AMSU using a Monte-Carlo estimation technique. JOURNAL OF GEOPHYSICAL RESEARCH: ATMOSPHERES Volume 116, Issue D8
Fu, Q., and C. M. Johanson (2005), Satellite-derived vertical dependence of tropospheric temperature trends, Geophys. Res. Lett., 32, L10703.
John Christy reasons that if the means of two different subsets of the data are roughly the same, then we know the answer, even if there is large scatter within each subset. That is very interesting: according to that reasoning there is no longer any doubt about equilibrium climate sensitivity, because the average of the models and of various estimates based on past data are each around 3C (e.g., IPCC 2007).
As for his reasons for rejecting various datasets, they seem like subjective, a posteriori rationalisations. Every dataset shows rapid changes somewhere or another which look like they could be artificial, or has some design limitation. Is there any peer-reviewed paper using objective criteria to show that the datasets John rejects are truly worse than the others? My 2008 paper showed that the warming trends from the UAH version of MSU TMT at the time were significantly smaller than those from radiosonde data, in a fairly consistent manner across different parts of the globe, while the other two analyses available at the time were consistent with the sondes (please compare the comprehensive global approach in that paper with the pick-and-choose methods one sometimes sees). This was never either refuted or acknowledged by John who continues to maintain that his products are the ones to believe.
Here is some information on the question of whether observational trends are significant and whether they match model trends; also what individual data sets say versus what they jointly say. These are relevant results from my papers MMH2010 and MV2013 (references at end). Note the final results for MMH2010 were in the Correction of Oct 2011, and MV2013 is under review.
MMH2010 looked at the UAH, RSS, RICH and HadAT series, and for each asked (among other things) if the 1979-2009 trends are significantly different from zero and significantly different from the model mean, at both the LT- and MT-equivalent layers. Each paper uses HAC estimation of the variance-covariance matrices to be fully robust to dependence among the series and over time, but in MMH2010 we had some incomplete data segments, so we also used panel estimators allowing for an AR1 correction, since that method can handle unbalanced panels. In MV2013 all the panels are balanced so we only used the HAC estimators.
(a) The UAH trend (0.078 C/decade) is significantly different from zero at the 10% level, the RSS trend (0.146 C/decade) is significantly different from zero at the 5% level, the HadAT trend (0.091 C/decade) is significantly different from zero at the 10% level and the RICH trend (0.111 C/decade) is significantly different from zero at the 5% level. The 4 series averaged together have a trend (0.105 C/decade) that is significantly different from zero at 5%. So I conclude the data exhibit a trend at the LT layer.
(b) UAH and RSS are individually and jointly significantly different from (i.e. below) the models at the 1% and 5% levels respectively. HadAT and RICH are jointly significantly different from (i.e. below) the models at the 1% level. (We didn’t test them individually.) All 4 series averaged together have a trend significantly different from models at 5%. So I conclude the data are significantly below LT model trends.
(c) UAH and RSS are significantly different from each other at 5%. The MSU series averaged together is not significantly different from the balloon series averaged together (p>0.9). The disagreement within the basic data types is stronger than that across the data types, and is not large compared to the difference between observations and models.
(a) The UAH trend (0.040 C/decade) is insignificant, the RSS trend (0.111 C/decade) is significantly different from zero at 5%, the HadAT trend (0.018 C/decade) is insignificant and the RICH trend (0.025 C/decade) is insignificant. The 4 series averaged together have a trend (0.025 C/decade) that is statistically insignificant (p=0.53). So only RSS exhibits a trend at the MT layer.
(b) UAH and RSS are individually and jointly significantly different from (i.e. below) the models at the 1% and 5% levels respectively. HadAT and RICH are jointly significantly different from (i.e. below) the models at the 1% level. (We didn’t test them individually.) All 4 series averaged together have a trend significantly different from models at 5%. So I conclude the data are significantly below MT model trends.
(c) UAH and RSS are significantly different from each other at 5%. The MSU series averaged together is not significantly different from the balloon series averaged together (p>0.4). The disagreement within the basic data types is stronger than that across the data types, and is not large compared to the difference between observations and models.
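For readers unfamiliar with HAC estimation, the kind of trend test described above can be sketched as an OLS fit with Newey-West standard errors. This is illustrative only: the data below are synthetic AR(1) noise plus a trend, the code is not the MMH2010 panel estimator, and the 0.26 C/decade "model mean" is just the figure quoted earlier in this discussion.

```python
import numpy as np

def trend_with_hac_se(y, t, maxlags=12):
    """OLS trend with a Newey-West (HAC) standard error on the slope,
    robust to autocorrelation in the residuals (Bartlett kernel)."""
    X = np.column_stack([np.ones_like(t), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    Xu = X * u[:, None]
    S = Xu.T @ Xu                        # lag-0 term of the long-run covariance
    for lag in range(1, maxlags + 1):
        w = 1.0 - lag / (maxlags + 1.0)  # Bartlett weight
        G = Xu[lag:].T @ Xu[:-lag]
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    cov = XtX_inv @ S @ XtX_inv          # sandwich estimator
    return beta[1], float(np.sqrt(cov[1, 1]))

# Synthetic monthly anomalies, 1979-2009: 0.10 C/decade trend plus AR(1) noise
rng = np.random.default_rng(0)
n = 372
t = np.arange(n) / 120.0                 # time in decades
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.6 * noise[i - 1] + 0.1 * rng.standard_normal()
y = 0.10 * t + noise

trend, se = trend_with_hac_se(y, t)
z_zero = trend / se                      # test against a zero trend
z_models = (trend - 0.26) / se           # test against a hypothetical model mean
print(f"trend = {trend:.3f} +/- {se:.3f} C/decade")
```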
MV2013 looks at the 1958-2010 interval using 3 balloon series: HadAT, RICH and RAOBCORE, in each case the latest version at the time of the analysis. We derived a HAC estimator robust not only to dependence over time and among series, but also to a step-change at either a known or an unknown point. The data identify a significant step-change at December 1977 when comparing observations to models. Allowing for this step change we get the following results:
(a) HadAT trend is 0.064 C/decade (0.135 without the step change). RICH trend is 0.093 C/decade (0.134 without the step change). RAOBCORE trend is 0.065 C/decade (0.147 without the step change). The trends are all insignificant with the step change (p=.35, .18, .24 resp) and significant without the step change (p<.001 each). So I conclude that evidence for a significant 1958-2010 LT trend in the balloons is not robust to inclusion of a step-change around 1978.
(b) The average balloon trend is significantly different from the average model (i.e. below) at < 0.1% significance without the step-change, and with a step-change at a known date the difference is significant at <0.02%. In tests of individual models, 12 of 23 individually over-predict the trend by a significant margin. If the break date is assumed unknown the difference between balloons and the average model is significant at <0.1%. So I conclude that over the 1958-2010 period the models have a collective tendency to over-predict the LT trend in the balloon data.
(a) HadAT trend is -0.001 C/decade (0.089 without the step change). RICH trend is 0.048 C/decade (0.096 without the step change). RAOBCORE trend is 0.042 C/decade (0.132 without the step change). The trends are all insignificant with the step change (p=.99, .50, .39 resp) and significant without the step change (p<.001 each). So I conclude that evidence for a significant 1958-2010 MT trend in the balloons is not robust to inclusion of a step-change around 1978.
(b) The average balloon trend is significantly different from the average model (i.e. below) at < 0.1% significance without the step-change, and with a step-change at a known date the difference is significant at <0.01%. In tests of individual models, 17 of 23 individually over-predict the trend by a significant margin. If the break date is assumed unknown the difference between balloons and the average model is significant at <0.1%. So I conclude that over the 1958-2010 period the models have a collective tendency to over-predict the MT trend in the balloon data.
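The effect of allowing a level shift can be illustrated on synthetic data. This is a sketch only: it uses plain OLS rather than the HAC-with-break estimator of MV2013, and the 0.3 C step size and noise level are invented for illustration (only the 1977/78 break date comes from the discussion above).

```python
import numpy as np

# Synthetic annual series, 1958-2010: no underlying trend, just a 0.3 C
# level shift at 1978 (mimicking the 1977/78 shift) plus noise.
rng = np.random.default_rng(1)
years = np.arange(1958, 2011)
t = (years - years[0]) / 10.0            # time in decades
shift = (years >= 1978).astype(float)
y = 0.3 * shift + rng.normal(0.0, 0.1, size=years.size)

# Without a step term, the level shift is absorbed into a spurious "trend"
X1 = np.column_stack([np.ones_like(t), t])
b_no_step, *_ = np.linalg.lstsq(X1, y, rcond=None)

# With a step term, the trend estimate collapses toward zero
X2 = np.column_stack([np.ones_like(t), t, shift])
b_step, *_ = np.linalg.lstsq(X2, y, rcond=None)

print(f"trend without step: {b_no_step[1]:.3f}  with step: {b_step[1]:.3f}")
```

This mirrors the pattern in the results above: trends that look significant without the break largely disappear once the step change is allowed for.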
Regarding the comment by Steven Sherwood, the average model sensitivity calculation only tells us something about the central tendency of models. Whether models approximate the real world is precisely the point at issue.
Carl points out that, depending on the surface-troposphere pair one uses, one can observe amplification with altitude, so the result concerning the absence of a hotspot is not significant in a gross sense. My understanding is that this depends to some extent on the use of an earlier version of ERSST which was believed to have a cold bias and later replaced. But in any case, each tropospheric series implies a likely surface trend based on the reciprocal of the amplification factor. The distribution of implied surface trends would be interesting to compare to the distribution of observed trends, as another check on surface data quality.
McKitrick, Ross R., Stephen McIntyre and Chad Herman (2010) “Panel and Multivariate Methods for Trend Comparisons in Climate Data Series” Atmospheric Science Letters Volume 11, Issue 4, pages 270–277, October/December 2010 DOI: 10.1002/asl.290. Correction: October 2011.
McKitrick, Ross R. and Timothy Vogelsang (2013) HAC-Robust Trend Comparisons Among Climate Series with Possible Level Shifts. In review.
As Mears indicates, he, along with Sherwood, is “mystified” that so much attention is drawn to the tropical hot spot, or lack thereof, because they suspect it is not germane to the global warming issue. Others are more than “mystified” and seek to shut off debate completely, which reminds me of the line “… move along now, nothing to see here” used in several movies, including Men In Black and The Naked Gun, where secretive and embarrassed authorities try to divert a curious public from observing an obvious disaster caused by said authorities. Seriously, in my opinion, there IS something critically important to see here.
The “hot spot”, as I stated earlier, represents an integration of much of our understanding of the energy cycle of the climate system. It is the energy cycle that must be well-characterized before attempting to forecast the climate response to a very slight increase in total energy forcing due to the enhanced greenhouse effect. The tropical atmosphere represents about 30% of the global atmospheric mass, plays a significant role in the planetary hydrologic cycle, and is the entry point for about half of the Earth’s solar energy. If the processes that combine to create the observed tropical structure, variations and change are not understood and replicated well, then we cannot claim we know enough about the system to make confident predictions. Thus, I agree with the instigators of this blogpost, by saying “…DON’T move along now, because there IS something to see here.”
I will address issues as I came across them while reading the posts, so this may appear as a set of disjoint paragraphs. But without reading the somewhat boring details of my following comments, in summary, I don’t believe there is disagreement regarding the basic statement that tropical tropospheric trends of observations and models are significantly different. This means we have some serious questions to explore regarding the energy cycle that have not been well-characterized in the climate modeling establishment to date.
My use of a particular CMIP-5 scenario (here RCP8.5) is irrelevant when dealing with the observational period (a question from Sherwood). All of these RCP scenarios used the same forcing to 2006, then continued on until a prescribed forcing was achieved for each scenario. The differing scenarios don’t lead to different results until after 2030 when the lowest level is approached and that scenario’s forcing is fixed at that level. Thus the use of any of the forcing scenarios is fine as long as one deals with pre-2030 output.
The idea that the average of two completely different sets of measurements gives the same result is quite helpful in my view (a question from Sherwood). [As an aside, Sherwood describes the average of the balloons and satellites as “roughly” the same when in fact their average is different by only 0.01 C/decade, so “roughly” is not an accurate characterization.]
To be more specific about the numbers, we can say the trend of the radiosonde-average tropical TMT annual anomalies is +0.047 ± 0.035 °C/decade where the error is large enough to incorporate all four of the realizations. This trend value is smaller than assumed in my original posting because I had erroneously inserted the lower troposphere (TLT) rather than TMT in one of the datasets. It is curious to me that the latest version of RICH displays the warmest trend of the balloon datasets whereas in earlier versions, RAOBCORE was more positive than RICH. I suspect there may be an unwanted artifact that arises from the use of the ECMWF first-guess in the adjustment process. There is an obvious spurious warm trend in RAOBCORE at the 100 hPa pressure level [see light green circle in my original Fig. 2 at 100 hPa where the trend warms dramatically from 150 hPa to 100 hPa when all other evidence indicates the trend should be less at 100 hPa than 150 hPa], but as this has a small impact on TMT (not so TUT retrieval) we shall not address it.
Averaging the satellites (UAH and RSS) we have +0.059 ±0.031 °C/decade where again the error range encompasses both datasets. Recently I have begun to use the average of RSS and UAH as a useful product in climate discussions as this reduces independent errors that are present in each of our datasets (e.g. McKitrick et al. 2010). Of the six datasets used here, RSS displays the warmest trend and RATPAC the coolest (see Christy et al. 2010 and 2011 for quantitative discussions on dataset differences – information which demonstrates that my decisions regarding dataset selection were based on evidence and were not “subjective, a posteriori rationalizations.”). Taking all datasets together, this provides an average trend between +0.05 and +0.06 °C/decade.
[The appeal to the climate sensitivity analogy regarding dataset differences is actually very interesting (Sherwood question) as recent estimates of its value have fallen well below the IPCC AR4 estimate with no less than 10 recent papers indicating central estimates between 1.0 and 2.1 °C. This is evidence that the calculation of climate sensitivity has considerable uncertainty and that IPCC estimates are likely too high for a number of reasons. By comparison, the observational estimates of TMT shown here are highly consistent.]
The comment about selecting or deselecting datasets is a minor issue. First, all four of the updated radiosonde datasets were indeed used, so there is no issue there. Even though there is some dependence of RAOBCORE and RICH, as they are produced by the same investigator, their individual construction processes appear to be sufficiently different that I utilized them as independent realizations.
Secondly, due to obvious issues with the inability to seamlessly incorporate new observing systems into the assimilation process, the Reanalyses (JRA, ERA-I, MERRA) were not used at all (see Sakamoto and Christy 2009 and Christy et al. 2011 – again, the reasons for not using Reanalyses are not “subjective” but based on published information). Additionally, a quick examination of The State of the Climate 2012 (BAMS 2013) indicates the three Reanalyses have much greater spread than the simple observational datasets. Here, regarding Reanalyses, I agree with Mears.
Thirdly, the only observational satellite dataset not utilized was STAR TMT for the reasons stated, i.e. (a) it has a glitch in 2001 that the authors recognize and (b) it uses the same diurnal corrections as RSS, and thus is not an independent realization of satellite temperatures. I included RSS even though it has been demonstrated that the dataset contains a warming shift in the 1990s in the tropics relative to all other datasets (including surface datasets) that suggests errors in the diurnal correction (Christy et al. 2010). Thus, RSS, with a likely spurious warming due to diurnal overcorrections, is accepted, but to compound this by adding STAR (with the same diurnal correction issue) will lead to double counting a documented problem. The differences in the additional adjustments of bias and calibration are small by comparison.
However, if it settles anything, adding STAR into the mix does not change the ultimate conclusion to which I had arrived. Since the STAR shift is known to its authors, I calculated the value relative to RSS (since both STAR and RSS use the same diurnal correction and the AMSUs were in use in 2001, this focuses on the shift apart from other adjustments) as +0.056 °C. Subtracting this shift at 1 Jan 2001, now produces a time series almost identical with RSS with a difference in trend of only +0.004 °C/decade. So, using the three datasets as independent realizations (a poor assumption as noted) we have a satellite result of +0.071 °C/decade.
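Christy’s shift-removal step can be sketched numerically. This is only an illustration with synthetic numbers, not the actual STAR/RSS data, but it shows how subtracting a known step offset at a breakpoint changes the computed trend:

```python
import numpy as np

# Illustration only: a synthetic series standing in for a satellite TMT
# record, not the actual STAR/RSS data.

def trend_per_decade(years, series):
    """OLS slope in deg C per decade."""
    return np.polyfit(years, series, 1)[0] * 10.0

rng = np.random.default_rng(0)
years = np.arange(1979, 2013)
base = 0.006 * (years - 1979) + rng.normal(0, 0.05, years.size)  # ~0.06 C/decade

# Add a step offset at 2001 to mimic a documented merge jump
step = np.where(years >= 2001, 0.056, 0.0)
shifted = base + step

# Subtracting the step at the breakpoint recovers the underlying trend
corrected = shifted - step

print(trend_per_decade(years, shifted))    # inflated by the step
print(trend_per_decade(years, corrected))  # matches the base series
```

The point of the sketch is simply that a one-time step late in a record projects onto the least-squares slope, so removing a documented jump can bring two otherwise divergent products into near-identical agreement.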
Mears states the new STAR product (v3.0) differs from RSS in the tropics by an unspecified amount, but from Mears’ numbers it would appear to be about +0.010 to 0.015 °C/decade warmer than RSS. We have no information about how STAR has applied adjustments, so no independent assessment has yet been performed which may reveal problems. In any case, the calculation done in the previous paragraph with an estimate for STAR v3.0 doesn’t change any of the basic numbers and result. Thus, even if we throw out the radiosonde datasets, a mean satellite trend of +0.07 °C/decade is still highly significantly different from the +0.26 °C/decade found in the models. I don’t see any other conclusion that can be justified and I note that the other authors essentially agree with this finding.
The use of TTT
The issue of TTT vs. TMT is discussed by Mears who prefers TTT. One objection he expresses is the “considerable weight” that the stratosphere exerts on TMT. Actually, that weight is only 7 percent, hardly “considerable” in my opinion. Now, if TTT were a directly-measured quantity, I would agree that it produces a purer tropical tropospheric signal than TMT. However, it is not directly measured and contains error that is larger than TMT alone. Too, one wonders why models should be exempt from getting the tiny portion of the stratosphere that resides in TMT correct to begin with?
With regard to TTT, I calculated the quantity for all 73 CMIP-5 models with the mean of the tropical TTT 1979-2012 trends of +0.32 °C/decade. The mean of the models’ TMT trends was +0.27 °C/decade (this is the mean of the trends whereas the trend of the annual means is +0.26 °C/decade). Throughout the individual comparisons, a consistent result, and an artifact of the retrieval scheme, was that for models, TTT was warmer than TMT by an amount that was inconsistent with the TMT trend. Indeed for many models with a wide range of TMT trends of +0.13 to +0.45, the addition to TMT to calculate TTT was between +0.04 and +0.05 °C/decade. Note too that the trends of TTT from the satellite datasets provided by Mears are also about 0.04 to 0.05°C/decade warmer than TMT.
Therefore, from the models, the average ratio of TTT to TMT was 1.18, meaning very little new information is provided by the retrieval as far as the models go. The satellite ratios are much larger (1.35 to 1.82) than the models simply because the denominators (i.e. observed TMT trends) are so much smaller.
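For readers unfamiliar with how TTT is built, it is a linear combination of channels rather than a direct measurement. A commonly cited form is TTT = 1.1·TMT − 0.1·TLS; the exact coefficients vary between published formulations, so treat the weights below as illustrative. A small sketch shows how stratospheric cooling in TLS makes the TTT trend exceed TMT by a few hundredths of a degree per decade, consistent with the ~0.04-0.05 °C/decade offsets quoted above:

```python
# Illustrative weights for a TTT-style retrieval (hypothetical round
# numbers; published formulations differ in the exact coefficients).

def ttt_trend(tmt_trend, tls_trend):
    # TTT as a linear combination of mid-troposphere (TMT) and lower
    # stratosphere (TLS) trends
    return 1.1 * tmt_trend - 0.1 * tls_trend

tmt = 0.27   # C/decade, roughly the CMIP5-mean TMT trend quoted above
tls = -0.30  # C/decade, an assumed stratospheric cooling trend

ttt = ttt_trend(tmt, tls)
print(ttt)        # ~0.327, i.e. ~0.05 C/decade above TMT
print(ttt / tmt)  # ratio ~1.21
```

Because the stratospheric term is subtracted with a negative trend, the retrieval adds a nearly constant increment to TMT regardless of how large the TMT trend itself is, which is exactly the behaviour described in the model comparisons above.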
So, whatever one thinks about TTT, and I think the errors are larger than estimated by Mears, one still must come to the conclusion that the trend of TMT is a problem for models to replicate. This seems to be accepted by Mears and Sherwood too.
Reader GavinCawley: The results of Douglass et al. 2007, which are actually less remarkable than shown in my initial post here, still stand. The confusion created by reading later papers and blogs is the misunderstanding of the question that was addressed. We asked the question this way, ”If climate models had the same surface temperature trend as the real world, would their upper air temperature trends agree with the real world?” This was clearly stated in the paper. This condition of a required agreement in surface trends allowed us to directly compare the models and observations, and we found that the climate models, on average, were significantly different than observations. We discussed problems with the criticisms of our 2007 paper in Douglass and Christy 2013, such as the improper use of datasets known to be obsolete and the comparison of upper air trends even though surface trends between models and observations were not consistent (i.e. apples to oranges).
Reader Phi: The 1000 hPa temperature is taken from the NCDC surface temperature dataset. We did not have values at 950 hPa.
Reader Paul Matthews: Agreed. There are other papers too that are now pointing out the obvious. I’m wondering how the IPCC AR5 will discuss the issue as the earlier drafts were less than forthcoming.
I think that John and I will continue to disagree about TTT.
First about John’s point about TTT not being “directly measured”. Using this argument, many satellite retrievals would not be acceptable, including, to cite an RSS example, all the wind speed and total column water vapor retrievals from microwave imaging instruments, which are derived from measured radiances. But wait a minute, even the MSU/AMSU measurements are derived from radiances! Which are derived from small currents crossing the PN junction in a detector diode. So none are directly measured! In my view, if you don’t believe in the ideas behind calculating TTT, then you don’t believe in the possibility of atmospheric sounding with microwaves. John’s own TLT product uses a similar “combination of weighted measurements” approach, with the added complication that the measurements are made at different locations or times.
I like TTT because it separates the trends in the troposphere and stratosphere more than TMT does. The trends in these areas have somewhat different drivers (greenhouse gases vs. ozone), different measurement problems when talking about radiosondes (radiosondes are probably more screwed up in the stratosphere than the troposphere), and different forcing problems in the models (ozone vs. well-mixed greenhouse gases).
But, that being said, let’s leave it at John’s statement–
“So, whatever one thinks about TTT, … one still must come to the conclusion that the trend of TMT is a problem for models to replicate. This seems to be accepted by Mears and Sherwood too.”
None of my conclusions would be altered by using TMT instead of TTT.
To summarize my conclusions:
1. The observations are probably not good enough to prove or disprove the presence of the hot spot. This is in part due to the added noise that one gets when calculating the ratio of two small, relatively similar, uncertain numbers.
2. Models are showing much more tropical tropospheric warming than observations.
3. I don’t think errors in the datasets are large enough to account for this discrepancy.
4. There are a lot of possible reasons for this, some having to do with inputs to the models, and some having to do with model physics.
5. I doubt these problems are such that throwing out the idea of anthropogenic global warming is warranted.
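Mears’ first point, that the ratio of two small, uncertain trends is noisy, can be illustrated with a quick Monte Carlo. All numbers here are invented stand-ins (Gaussian, independent trend estimates), not the real dataset uncertainties:

```python
import numpy as np

# Made-up trend estimates with Gaussian, independent errors; only the
# qualitative behaviour matters here.
rng = np.random.default_rng(1)
n = 100_000
surf = rng.normal(0.10, 0.02, n)  # surface trend draws, C/decade
trop = rng.normal(0.13, 0.03, n)  # tropospheric trend draws, C/decade

ratio = trop / surf

rel_err_trop = 0.03 / 0.13                      # ~23% on the numerator alone
rel_err_ratio = np.std(ratio) / np.mean(ratio)  # noticeably larger

# The amplification factor is more uncertain than either trend alone
print(rel_err_trop, rel_err_ratio)
```

The relative errors of numerator and denominator combine (roughly in quadrature, and worse when the denominator approaches zero), so an amplification ratio can remain poorly constrained even when both trends individually look reasonably well determined.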
From my perspective there has been much discussion of observations but little regarding quantification of exactly what it is we should expect to see in these observations. I thought it would be a good idea to offer a review of the main methods and extracted expectations pertaining to this topic:
TLT/Tsurf amplification factor
The trend ratio between a particular vertical-weighting of model air temperature to model land+ocean surface/near-surface temperature. Christy et al. 2010 found an expected TLT/Tsurf amplification factor of ~1.4 using CMIP3 models.
Using CMIP5 1981-2005 TLT trend figures reported by Po-Chedley and Fu 2012 comparing to CMIP5 land+ocean SAT (2m surface-air temperature) data from Climate Explorer I found the average ratio to be 1.33 with 95% spread 1.05-1.75. However, there would be a small bias here if we tried to relate this ratio to observations because Tsurf records tend to use ocean SST data rather than SAT. For the models which also had consistent SST data stored at Climate Explorer I produced combined LandSAT+OceanSST time series using an area ratio of 0.232:0.768. From these I found an average ratio of 1.4, 95% spread 1.15-1.5. Part of the reduced spread seems to be due to the use of SSTs rather than SATs, and part due to dropping models without available consistent data.
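The land/ocean blending described above can be sketched as follows. The series are synthetic and noise-free purely to make the arithmetic visible; only the 0.232:0.768 area weights are taken from the text:

```python
import numpy as np

# Synthetic, noise-free series purely to make the arithmetic visible.
# Only the 0.232 (land) : 0.768 (ocean) area weights come from the text.
LAND_W, OCEAN_W = 0.232, 0.768

def blend(land_sat, ocean_sst):
    return LAND_W * np.asarray(land_sat) + OCEAN_W * np.asarray(ocean_sst)

def trend_per_decade(years, series):
    return np.polyfit(years, series, 1)[0] * 10.0

years = np.arange(1981, 2006)
land = 0.015 * (years - years[0])   # land SAT warming at 0.15 C/decade
ocean = 0.010 * (years - years[0])  # ocean SST warming at 0.10 C/decade
tlt = 0.015 * (years - years[0])    # illustrative TLT at 0.15 C/decade

tsurf = blend(land, ocean)
amp = trend_per_decade(years, tlt) / trend_per_decade(years, tsurf)
print(amp)  # ~1.34, in the ballpark of the ratios quoted above
```

Because SSTs generally trend less steeply than land SAT, swapping ocean SAT for SST lowers the blended surface trend and raises the TLT/Tsurf ratio, which is the small bias the paragraph above is describing.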
Equally we can check TTT/Tsurf amplification factors, again using TTT values from P-CF2012. Against CMIP5 landSAT+oceanSST I find an average 1.65 and range 1.35-1.8.
One caveat I would offer here is that very few CMIP5 models produce surface trends close to observed values. It’s possible that the range may be overconfident or even biased due to them relating to model runs with mostly larger surface trends.
TTT/TLT trend ratio
This was the actual focus of Po-Chedley and Fu 2012. While the TTT and TLT weightings substantially overlap, TTT has greater weighting in the upper troposphere and TLT in the lower troposphere. Therefore the ratio between the two can be considered to reflect the presence and strength of any “hotspot” around the upper troposphere in relation to lower tropospheric warming.
In all CMIP5 models there was a ratio greater than 1 for 1981-2005 trends, meaning that TTT warmed more than TLT in every model, and the average was 1.19.
AMIP prescribed SST simulations
Unlike the fully coupled atmosphere-ocean (and land) CMIP5 model runs, AMIP simulations use atmosphere-only models fed with SSTs from observations. Automatically this means surface trends are a decent match for those which actually occurred, both in terms of magnitude and spatial distribution, so at first glance we might expect that the model output can be compared directly to observed trends as a test of how the model translates surface temperature changes up through the troposphere. However, I found a few details which make me question the value of AMIP simulations in this regard.
Despite using substantially the same basic models as in CMIP5 historical runs, AMIP simulations produce a substantially shifted TLT/Tsurf range, with average TLT/land+oceanSAT moving from 1.33 to 1.55. Now, it might be the case that the specific pattern of observed SST trends tends to cause enhanced tropospheric amplification, at least in the models – that is, after all, why you might want to run these types of simulations rather than trusting a universal scaling factor and it is still within the 1.05-1.8 spread. However, there also appears to be a considerable discrepancy between CMIP5 and AMIP simulations regarding ocean SAT/SST trend ratios – averaging about 1.07 in CMIP5 and 1.15 in AMIP – meaning that TLT/LandSAT+oceanSST ratios average ~1.8 compared to ~1.4 in coupled CMIP5 runs. This suggests to me that the enhanced amplification is really caused by a bug/feature of the AMIP experimental setup (i.e. non-interactive SSTs) which is causing unrealistically rapid trend gradients, particularly in the boundary layer.
This apparent issue makes me doubt the usefulness of AMIP model simulations for the topic in question. I don’t think the outputs can really be considered to be expected values. At the least it needs to be recognised that they are likely biased high for TLT, TMT and TTT trends.
Discrete vertical interval trend comparisons
Basic model air temperature outputs are provided in discrete vertical levels, defined in terms of atmospheric pressure, from the surface to upper stratosphere. We can simply find model trends at each level as John Christy does in his figures 2 and 4. Since satellite MSU/AMSU data doesn’t have the fidelity to properly compare discrete levels in this way, the only observations available for this task are from radiosondes.
At first glance this seems like the most obvious and best way to test model expectations – it avoids the messy overlapping of vertical domains in the satellite data and uncertainties in vertical weighting functions used to produce the model TLT/TMT/TTT equivalents. However, the problem is in assessing whether the radiosonde observations are up to the task. It’s beyond the scope of this little review of expectations to discuss deficiencies in radiosonde observations (not to mention putting me way out of my depth) but it is relevant to suggest how model data should be processed in order to provide like-for-like expectation comparisons with the radiosonde observations. At the least the model data should be masked to cover only the sites from which radiosondes were launched, rather than sampling the whole tropics as I assume was done to produce John Christy’s graph.
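The masking step suggested here might look something like the following sketch. The grid, the station coordinates, and the trend field are all invented for illustration; the point is only that sampling at launch sites generally gives a different mean than the full tropical average:

```python
import numpy as np

# Everything here is invented: a toy tropical model grid, a trend field
# with some spatial structure, and a few hypothetical launch sites.
rng = np.random.default_rng(4)
lats = np.arange(-19.0, 20.0, 2.0)   # 20S-20N model latitudes
lons = np.arange(0.0, 360.0, 2.5)

# Model TMT trend field (C/decade) with random spatial variation
trend = 0.25 + 0.05 * rng.standard_normal((lats.size, lons.size))

# Hypothetical radiosonde launch sites (lat, lon)
stations = [(-6.0, 106.8), (1.3, 103.8), (14.6, 121.0)]

def nearest(grid, value):
    return int(np.argmin(np.abs(grid - value)))

sampled = [trend[nearest(lats, la), nearest(lons, lo)] for la, lo in stations]

print(trend.mean())      # full tropical mean
print(np.mean(sampled))  # station-masked mean, generally different
```

A real comparison would also need to match the radiosondes’ temporal sampling and reporting gaps, but even this crude spatial masking shows why a whole-tropics model mean is not strictly like-for-like with a sparse station network.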
John Christy has offered another comparison, in his original post and latest comment: that of absolute CMIP5 (TMT in this case) trends to relevant satellite and radiosonde observations. It’s difficult to see how this comparison is useful for the discussion at hand given that modelled mid/upper tropospheric trends are so closely linked to surface trends and CMIP5 tropical surface trends are almost all larger than those observed. So, even if mid/upper tropospheric warming was working perfectly consistently with model expectations (relating to the scaling discussed earlier), this comparison might well incorrectly indicate otherwise.
John Christy might argue that surface trend is affected by what happens aloft so it is an appropriate test. While it is very likely the case that surface trends will be affected by the specifics of lapse rate changes and water vapor feedbacks offering this comparison as evidence of something would require an assumption that those combined factors are the dominant reason for lower observed surface trends. Since that is both a major assumption and an, as yet, unfounded one I would suggest this type of comparison contributes more confusion than insight. I think it’s an important topic for discussion why modelled tropical surface trends are generally so much larger than observed, but it’s not this topic.
Regarding John Christy’s statement in his last comment:
The 1000 hPa temperature is taken from the NCDC surface temperature dataset. We did not have values at 950 hPa.
This is problematic because the NCDC product has very different spatial sampling from radiosonde datasets and the NCDC land surface component mostly relates to the 925 hPa level rather than 1000 hPa.
It would make more sense to use the radiosonde 850hPa level as the base reference for comparison with model trend ratios.
Ok, let’s try to summarise again what the discussants said about trends. Let’s start with the satellite trends. A few things are still unclear.
Mears in his latest comment said the following about trends:
Dataset TMT TTT Ratio
UAH 0.050 0.091 1.82
RSS 0.117 0.158 1.35
STAR 0.144 0.194 1.35
Christy in his reaction also mentioned some trend numbers.
“Averaging the satellites (UAH and RSS) we have +0.059 ±0.031 °C/decade”
Now if I take Mears’ numbers, (0.050 + 0.117)/2 gives an average trend of 0.0835, which is higher than the 0.059 that Christy mentioned. What is the reason for this discrepancy?
Also the 0.117 trend is outside the “error range” of 0.059 ± 0.031. So while Christy said that “again the error range encompasses both datasets”, this doesn’t seem to work for the numbers that Mears gave.
Sherwood so far didn’t give trend estimates. Do you accept those of Mears?
In the public comments Ross McKitrick is commenting extensively about the trends. He has two papers about this topic and I therefore add his trend estimates to the list:
“The UAH trend (0.040 C/decade) is insignificant, the RSS trend (0.111 C/decade) is significantly different from zero at 5%”
So according to McKitrick and Vogelsang the UAH trend is insignificant and the RSS trend is significant. This relates to our question 4: What could explain the relatively large difference in tropical trends between the UAH and the RSS dataset?
McKitrick added this statement that is relevant here: “(c) UAH and RSS are significantly different from each other at 5%.”
Do the discussants agree with this conclusion that differences between UAH and RSS are significant?
What could be the reason?
Christy wrote in his latest comment:
I included RSS even though it has been demonstrated that the dataset contains a warming shift in the 1990s in the tropics relative to all other datasets (including surface datasets) that suggests errors in the diurnal correction (Christy et al. 2010).
Mears in his comment didn’t address the difference between UAH and RSS directly, but his comment about STAR is relevant:
“In Christy’s last post, he made an argument for excluding STAR V2.0 based on a small positive jump in temperature in 2001, and that it is the same as RSS, since it uses the same diurnal correction. First, the jump in 2001 is fairly small, and does not change the 34 year trend very much when removed in V3.0 (the global trend in STAR V3.0 TMT will be about 0.015 K/decade lower — but still warmer than RSS).”
So Mears suggests the small jump in 2001 cannot explain the differences between UAH and RSS.
For outsiders these differences are intriguing as on a global scale the trends from UAH and RSS are very close to one another.
What are the possible reasons for the difference between UAH and RSS?
The trend discrepancies are mostly due to referencing different periods – 1979-2010 for Mears, 1979-2012 for Christy.
uah5.6 1979-2010 = 0.048
uah5.6 1979-2012 = 0.03
rss3.3 1979-2010 = 0.114
rss3.3 1979-2012 = 0.089
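The sensitivity of these trends to the end year is easy to demonstrate. The series below is synthetic, tuned only to have a flat-ish tail after 2010 like the observed record:

```python
import numpy as np

# Synthetic annual series: linear warming to 2010, flat afterwards.
def trend_per_decade(years, series):
    return np.polyfit(years, series, 1)[0] * 10.0

years = np.arange(1979, 2013)
series = np.where(years <= 2010,
                  0.010 * (years - 1979),
                  0.010 * (2010 - 1979))

t_to_2010 = trend_per_decade(years[years <= 2010], series[years <= 2010])
t_to_2012 = trend_per_decade(years, series)
print(t_to_2010, t_to_2012)  # the 1979-2010 trend is the larger one
```

Adding two flat years at the end pulls the least-squares slope down, which is why the 1979-2010 and 1979-2012 figures quoted by Mears and Christy differ even for the same dataset.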
To support the idea that RSS and UAH tropical TMT trends are significantly different from each other, the RSS site displays a validation comparison. If you select the TMT and Summary buttons you can see that sampled UAH trends are outside the uncertainty range produced by RSS for the same sampling.
“For outsiders these differences are intriguing as on a global scale the trends from UAH and RSS are very close to one another.”
That’s pretty much true for TLT but there is a reasonable discrepancy for global TMT: 0.078 against 0.046ºC/Dec. That is intriguing though, given that TLT is produced from the same base data as TMT. It suggests the similarity in global TLT is largely due to compensating errors rather than agreement.
In his last post, Marcel asked:
Do the discussants agree with this conclusion that differences between UAH and RSS are significant?
What could be the reason?
The significance question depends on the uncertainty estimates in both the RSS and UAH data. The uncertainty in UAH has not been documented well enough for me to feel comfortable doing analysis with it. Using the uncertainty analysis that we have done using a Monte-Carlo approach, I can say that in general for TLT, UAH is within our 95% confidence interval, while for TMT, it is not. This suggests that for TMT, the RSS/UAH differences may be significant.
We tried to explore the reasons for these differences several years ago. The two main possibilities are differences in the non-linearity correction (the “target factor” — this accounts for errors due to changes in temperature in the hot calibration target) and differences in the diurnal adjustment applied to account for changes in measurement time. The target factor differences are largest for NOAA-09, where UAH uses a very large value for alpha. This is discussed in Po-Chedley and Fu, 2012. The diurnal adjustment is most important for NOAA-11, due to its long life and large measurement time drift. In 2005, UAH made changes to their diurnal adjustment to TLT that brought their results into closer agreement with RSS, but I am not sure if any changes were made to the UAH TMT adjustment. Around that time, there was a paper circulating describing the new UAH diurnal adjustment, but to my knowledge it is not published (John, is this true?). At any rate, the geographic distribution of RSS/UAH TMT differences points to the diurnal cycle as the most likely culprit. The best agreement is in the Southern Hemisphere Extratropics, where the diurnal adjustment is small due to the prevalence of ocean. The largest disagreement is in the tropics, where the diurnal cycle tends to be large. There is also a ramp in the difference time series during the NOAA-11 lifetime. If the main culprit were the target factors, the differences would be more similar in the different regions.
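The diagnostic Mears describes, checking whether the RSS/UAH divergence is concentrated in one satellite’s era rather than spread uniformly, can be sketched like this. Both series are synthetic, with a spurious ramp deliberately confined to a stand-in “NOAA-11 era”:

```python
import numpy as np

# Two synthetic monthly series sharing the same underlying trend; one has
# a spurious ramp confined to a stand-in "NOAA-11 era" (1990-1995).
rng = np.random.default_rng(2)
months = np.arange(1979, 2013, 1 / 12)

signal = 0.01 * (months - 1979)
ramp = np.clip((months - 1990) / 5.0, 0.0, 1.0) * 0.10  # 0.1 C over 5 yr

rss_like = signal + rng.normal(0, 0.05, months.size)
uah_like = signal - ramp + rng.normal(0, 0.05, months.size)

diff = rss_like - uah_like
era = (months >= 1990) & (months < 1995)
post = months >= 1995

era_trend = np.polyfit(months[era], diff[era], 1)[0] * 10
post_trend = np.polyfit(months[post], diff[post], 1)[0] * 10

# The difference series ramps during the drifting satellite's era and is
# flat afterwards; a target-factor error would not localise like this.
print(era_trend, post_trend)
```

Localising the divergence in time (and, in the real analysis, in latitude) is what lets one attribute it to the diurnal drift of a particular satellite rather than to a correction that applies across the whole record.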
As pointed out by Paul S, there is a tool on our website that allows you to look at trend differences between RSS, UAH, and STAR, as well as the radiosonde datasets. The error bars on the RSS data are from our Monte-Carlo Analysis, and include estimates of error in the diurnal adjustment, as well as the subsequent effects of these errors on the intersatellite merging process. The error analysis is documented in our 2011 paper.
Paul S also said, in response to Marcel
Marcel: For outsiders these differences are intriguing as on a global scale the trends from UAH and RSS are very close to one another.
Paul S: That’s pretty much true for TLT but there is a reasonable discrepancy for global TMT: 0.078 against 0.046ºC/Dec. That is intriguing though, given that TLT is produced from the same base data as TMT. It suggests the similarity in global TLT is largely due to compensating errors rather than agreement.
I (Carl) agree with this last sentence.
Po-Chedley, S. and Q. Fu, 2012: A bias in the midtropospheric channel warm target factor on the NOAA-9 Microwave Sounding Unit. J. Atmos. Oceanic. Technol., 29, 646-652.
Mears, C. A., F. J. Wentz, P. Thorne and D. Bernie, (2011) Assessing Uncertainty in Estimates of Atmospheric Temperature Changes From MSU and AMSU Using a Monte-Carlo Estimation Technique, J. Geophys. Res., 116, D08112, doi:10.1029/2010JD014954.
I will be very busy in the next few days so I will respond as best I can now:
From Mears (stamp 11 Sep 6:18 p.m.): As Carl notes, “I think John and I will continue to disagree about TTT.” This is a statement with which I agree, and I do so mainly because TTT has greater error than TMT, not because of the physical profile it attempts to produce.
From Mears (stamp 11 Sep 5:50 p.m.): In a statistical sense one could say TMT for RSS and UAH are significantly different from each other in the tropics. However, one could not say the same regarding the MEAN of UAH and RSS, i.e. both datasets are within error ranges of their common mean value. I often use the mean of RSS and UAH now because whatever spurious warming/cooling there might be in our separate constructions it will likely be minimized in the average. I think Carl’s discussion on the cause is correct – the divergence in the mid-1990s is the key and likely relates to the diurnal adjustments.
When Wentz and Mears discovered the UAH diurnal adjustment error in TLT back in 2005, we corrected the problem with a new diurnal calculation. However, that new calculation had virtually no impact on TMT’s diurnal adjustment since it was not subject to the erroneous cross-swath-subtraction artifact as was TLT. This diurnal correction was based on 3 co-orbiting AMSU instruments. [For those unfamiliar with this adjustment, UAH uses an empirical technique drawn from (admittedly noisy) observations and RSS relies on a climate model simulation. Neither will be perfect, hence an average of the two products I think is the best way to deal with the differences, since the average is within error ranges.] It should be obvious that an accurate representation of the diurnal drift error is not an easy problem to solve.
Ultimately, the divergence of model projections and observations tells us we have much to learn regarding the climate system, and satellite observing systems are critical to improving our knowledge. With so much more to learn, and the apparent relative insensitivity of the climate system to CO2 forcing as demonstrated by very modest temperature trends, I believe we are in a situation to question the presumed outcomes of specific carbon-control proposals which will also have tremendous economic impacts. These outcomes are based on model projections which to this point have low credibility in my view.
Paul S.: Starting the comparison with 850 hPa misses the key point of the relationship of the surface to the troposphere. This “within troposphere”, say 850-200, lapse rate is useful in its own right, but different from the more fundamental issue of the relationship between the surface and troposphere that involves the climate system. Recall that the vast majority of heat flux to the atmosphere occurs at the surface, not 850 hPa. How one partitions that heat (i.e. fluxed back to air with sensible and latent heat vs. being diffused down into the ocean) is critical to the question posed here. So, by examining the surface-troposphere relationship one delves into these tremendously important parts of the climate system.
Paul S.: The validation comparison of RSS in the “Tropics – 30S-30N” uses a very large number of questionable radiosondes outside of the 20S-20N band. [See Christy et al. 2007 and 2010 for comparisons in the band we are discussing. The updated sonde comparisons in Christy et al. 2011 give a slight tip to UAH.] I also notice that the website’s satellite comparisons with RAOBCORE and RICH may be utilizing their older versions. I note too that though UAH lies outside of RSS for TMT in 30S-30N, UAH agrees almost exactly with HadAT2 and IUK. I agree with Carl that the disagreement would be considerably reduced if we could deal with the NOAA-11 (and some NOAA-14) differences. [Note that UAH cuts off NOAA-11 in Aug 1994 when its instrument temperature started swinging wildly, whereas I believe RSS uses NOAA-11 a little further out.]
As we are still discussing datasets, I wanted to raise a point that Sherwood made in his guest blog and that so far hasn’t been discussed.
“I used to think (as do most others) that the radiosondes were wrong, but in Sherwood et al. 2008 we found (to my surprise) that when we homogenised the global radiosonde data they began to show cooling in the lower stratosphere that was very similar to that of MSU Channel 4 at each latitude, except for a large offset that varied smoothly with latitude. Such a smoothly varying and relatively uniform offset is very different from what we’d expect from radiosonde trend biases (which tend to vary at lot from one station to the next) but is consistent with an uncorrected calibration error in MSU Channel 4. If that were indeed responsible, it would imply that there has been more cooling in the stratosphere than anyone has reckoned on, and that the true upper-tropospheric warming is therefore stronger than what any group now infers from MSU data. By the way, our tropospheric data also came out very close to those published at the time by RSS, both in global mean and in the latitudinal variation (Sherwood et al., 2008).”
I wonder what Mears and Christy have to say on this.
Prof. Christy replies to my comment “The results of Douglass et al. 2007, which are actually less remarkable than shown in my initial post here, still stand. The confusion created by reading later papers and blogs is the misunderstanding of the question that was addressed.”
Actually I spotted the error from reading the original paper, the problem is not to do with the question posed in the paper, but the statistical validity of the analysis used to provide an answer.
“We asked the question this way, ”If climate models had the same surface temperature trend as the real world, would their upper air temperature trends agree with the real world?” ”
Which is a reasonable question, but a reliable answer requires a valid statistical test. It is disappointing that Prof. Christy responded to my comment, but without actually addressing the “parallel Earths” thought experiment that explains why the test used in Douglass et al. is clearly inappropriate as a perfect climate model ensemble would be virtually guaranteed to fail it!
Prof. Christy continues: “We discussed problems with the criticisms of our 2007 paper in Douglass and Christy 2013, such as the improper use of datasets known to be obsolete and the comparison of upper air trends even though surface trends between models and observations were not consistent (i.e. apples to oranges).”
The discussion in the 2013 paper does not include a discussion of the validity of the statistical test used, so it fails to address the criticism raised in my comment.
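The “parallel Earths” objection can be made concrete with a small simulation. Everything here is synthetic: even a perfect model ensemble, whose runs are drawn from exactly the same distribution as the “observation”, fails a Douglass-style test most of the time, because that test uses the standard error of the ensemble mean:

```python
import numpy as np

# Synthetic "parallel Earths": model runs and the observation are drawn
# from the SAME distribution, i.e. the model is perfect by construction.
rng = np.random.default_rng(3)
n_models, n_trials = 49, 10_000
mu, sigma = 0.2, 0.1  # true trend and realization-to-realization spread

fails = 0
for _ in range(n_trials):
    runs = rng.normal(mu, sigma, n_models)  # perfect-model ensemble
    obs = rng.normal(mu, sigma)             # one more realization: "Earth"
    se_mean = runs.std(ddof=1) / np.sqrt(n_models)
    # Douglass-style criterion: obs compared against the uncertainty of
    # the ensemble MEAN, which shrinks as 1/sqrt(N)
    if abs(obs - runs.mean()) > 2 * se_mean:
        fails += 1

# A perfect model is rejected most of the time under this criterion.
print(fails / n_trials)
```

Because the ensemble mean’s standard error shrinks as more runs are added while the observation remains a single realization with the full spread sigma, the rejection rate grows with ensemble size, which is the core of the criticism.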
The focus in the dialogue here on instrumental series problems, and particularly issues with the satellite measurements, I think is fascinating. I congratulate the organizers – at least in this instance we seem to have hit on a substantive issue with serious discussion.
I still find John Christy’s argument (and Ross McKitrick’s supporting comments) about averages to be rather disconcerting. In Christy’s latest he writes “Neither will be perfect, hence an average of the two products I think is the best way to deal with the differences, since the average is within error ranges.” However, we are discussing errors of unknown origin. The difference between the two shows there are real observational errors here. It’s possible the two datasets have errors in opposite directions, and the average cancels those out. But it’s equally possible they have errors in the same direction, with one larger than the other; in that case the average does not bring us closer to the truth, though, keeping the large error bars in mind (is Christy quoting only one-sigma standard errors?), it may still be the best we can do. The fact that a third series, STAR, has numbers outside the range spanned by the other two certainly suggests a strong possibility that the truth lies not between but outside that range.
In any case, the impression Christy gives here is one of trying to minimize uncertainties. I wonder what Judith Curry would have to say about it? In actual fact the measurements here are still clearly very uncertain, and are simply not of sufficient quality to say anything about differences between tropical tropospheric amplification in observation and theory – the theory is well within the 90% confidence interval for the ratio of tropospheric to surface warming, and as I noted in my earlier comment, if you look at shorter time intervals the observations actually show clear and consistent amplification. The certainty Christy conveys is meant to imply an error in theoretical models of climate, and therefore that all conclusions about climate change are uncertain. To the contrary, there is no strong evidence here of any problem with the models – and even if there were, uncertainty about climate change is hardly a reason to be certain that nothing bad will happen and that we can therefore ignore it!
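The arithmetic behind this objection about averaging is worth making explicit. A minimal sketch (with invented trend numbers, not the actual UAH/RSS values) shows that averaging only helps when the two errors oppose each other:

```python
# Sketch: averaging two estimates only helps when their errors oppose each
# other. All numbers below are hypothetical, chosen purely for illustration.

truth = 0.20  # "true" trend, deg C / decade (invented)

# Case 1: errors in opposite directions -- the average cancels them.
a, b = truth - 0.05, truth + 0.05
avg_opposite = (a + b) / 2.0

# Case 2: errors in the same direction -- the average keeps the shared bias.
c, d = truth - 0.08, truth - 0.02
avg_same = (c + d) / 2.0

print(f"truth = {truth:.3f}")
print(f"opposite-sign errors: average = {avg_opposite:.3f}, bias = {avg_opposite - truth:+.3f}")
print(f"same-sign errors:     average = {avg_same:.3f}, bias = {avg_same - truth:+.3f}")
```

In the second case the average is still 0.05 °C/decade below the truth, which is the scenario the STAR series hints at.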
There are inconsistencies, but also things that agree, sometimes surprisingly well. As has been said, a ratio based on the values at 850 hPa avoids the serious inhomogeneity between the radiosondes and the SST/station data. The following graph (http://img708.imageshack.us/img708/6844/s8ht.png) is an overlay on figure 2 of John Christy (see the orange curve).
Also shown is an alternative with a corrected surface temperature, which can be justified as follows:
1. It is an extrapolation of the tropospheric data to the surface, based on known atmospheric physics and the results of models.
2. It roughly matches the ocean component of the trend; see the tropical SST from NCDC (http://www.climate4you.com/images/NOAA%20SST-Tropics%20GlobalMonthlyTempSince1979%20With37monthRunningAverage.gif).
3. The high value of the surface trends is due to station data, which are notoriously unreliable (see this pretty amazing example: http://imageshack.us/a/img21/1076/polar2.png).
The tropical hot spot is therefore there, and in relative terms even more pronounced than expected. This should not be too surprising, since the addition of CO2 is not equivalent to a simple forcing but acts (before the temperature feedback) mainly on the lapse rate.
In one of his latest comments Mears summarizes his findings (bold mine):
Mears backed up his first conclusion with the figures 1 and 3 in his guest blog which I will repeat here:
Both these figures deal with the amplification of the warming compared to the surface.
Christy’s comparable figure is this one:
Christy’s conclusion is that even if you look at the trend ratios the differences are significant:
Ross McKitrick, who also has a number of relevant publications on this issue, tends to agree with Mears in this case. In a public comment he wrote:
It would be interesting to hear Christy’s reaction to this. Would he be willing to reconsider his position on this specific issue?
Definition of the hot spot
Having read most of the discussion again I realised there is still some confusion left about the definition of the hot spot. In their contributions Sherwood and Mears clearly referred to the hot spot as “the amplification of the surface warming in the troposphere”.
In our introductory article we were less clear about it, I would say in hindsight. About our figure 1 (repeated below) we wrote: “The expected warming is highest in the tropical troposphere, dubbed the tropical hot spot.”
Here the “hot spot” doesn’t refer to the amplification alone but also to the fact that models expect lots of warming aloft. Ross McKitrick also made a remark about this (my bold):
Do all the participants agree that the term hot spot is also used for the large absolute warming trend that is expected high in the tropics? Should we limit the term to the “amplification”? If so what other term could we use for the high warming rates aloft?
Marcel asks whether by “hot spot” one means the warming aloft, or the difference (or ratio) between warming aloft and at the surface. The problem here is that the “hot spot” concept was not created by scientists (as far as I know) but is a term coined by climate-skeptic bloggers. If one looks at the problem from the point of view of climate physics, it decomposes naturally into two questions: one about lapse rates (which are governed by atmospheric convective processes) and one about global surface temperature (which is controlled by top-of-atmosphere radiative balance and ocean heat uptake). For this reason the focus in the scientific literature (as opposed to the internet) has been on either lapse rates or surface temperatures, and this is the focus I prefer. Obviously it is fair enough to ask whether warming in any particular location is consistent with models or not, if one’s only goal is to falsify models. But if one is trying to understand the system it is better to ask first what is happening at the surface, and then, given that, what is happening in the atmosphere.
This installment of my comments on the “hot spot” truly descends into what we in the USA call “the weeds” and deals only with some side issues. Some information on these side issues is provided below and I hope it will be informative to the reader who has considerable patience.
Carl and I performed our error testing of the satellite data in different ways. Going back to Christy et al. 1995, we indeed tested the parametric ranges of the bias calculation by varying the number of overlapping observations utilized, testing the effect of the magnitude of the noise cut-off parameter, and testing the various options for the sequence of satellites employed to create the backbone. We also tested the reduction in daily noise based on the window-width of the time-filter by latitude. The total impact of the diurnal effect for MT was relatively small (it is larger for other layers). Thus, we did perform several types of variational testing.
But testing parametric uncertainties of a dataset construction process does not get to all of the potential error. If a diurnal adjustment, for example, is fundamentally flawed, the parametric variations around that flaw won’t lead to better understanding. So, the main effort for our error calculations was to employ a completely independent observational dataset for testing – that being radiosondes. Unfortunately, the majority of sonde data records are plagued with numerous, and often uncatalogued, changes through time. Our solution was to select a subset that geographically spanned the tropics to the high latitudes, which, by design, had the most consistent set of instrumentation and methods – the U.S. VIZ subset of 32 stations. This was done at the grid point level using the full radiation code, including humidity effects. Below is a table of the results from Christy et al. 2011 regarding the comparison of UAH, RSS and STAR against the VIZ radiosondes.
Statistical properties of the difference time series between the adjusted VIZ sonde series and each satellite dataset.
[Table: monthly and annual standard deviations of the differences, and monthly and annual r² of the composites, for UAH, RSS and STAR; values not reproduced here.]
With such a comparison (to make a long story short) we were able to generate a set of error characteristics that was not affected by our own subjective notions of parametric uncertainty. We did some of this in our 1992 papers (Spencer and Christy 1992a,b), but did so more thoroughly beginning with Christy et al. 2003. This type of analysis was most recently updated in Christy et al. 2011. We also utilized the 28-station, well-documented Australian network in this paper (results below), but with less success due to some significant changes in their stations that were not consistent in time. Thus each station had to be treated separately, and it was UAH which successfully pinpointed the Australian instrument changes (for which there was documentation) more often than RSS and STAR.
Characteristics of the detection of breakpoints for the 28 Australian sondes.
[Table: monthly standard deviation of the differences (°C), median r over the 28 stations, and annual r² of the composite; values not reproduced here.]
Of the three datasets (UAH, RSS and STAR) the results indicated UAH data achieved the smallest error characteristics relative to both the U.S. VIZ and the Australian radiosondes.
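For readers unfamiliar with these diagnostics, the quantities quoted above — the standard deviation of the monthly difference series and the r² between a satellite series and its radiosonde reference — can be sketched as follows, using synthetic series rather than the actual VIZ or satellite data:

```python
# Sketch of the difference-series diagnostics used to rank satellite datasets
# against a radiosonde reference: std dev of the monthly differences and the
# r^2 between the two series. All data below are synthetic, for illustration.
import math
import random

random.seed(0)
n = 120  # ten years of monthly anomalies

# Hypothetical "sonde" series and a satellite series tracking it with noise.
sonde = [0.3 * math.sin(i / 6.0) + random.gauss(0, 0.2) for i in range(n)]
sat = [s + random.gauss(0, 0.1) for s in sonde]

# Standard deviation of the monthly difference series (sample std dev).
diff = [a - b for a, b in zip(sat, sonde)]
mean_d = sum(diff) / n
std_diff = math.sqrt(sum((d - mean_d) ** 2 for d in diff) / (n - 1))

# r^2 via the Pearson correlation of the two series.
mx = sum(sat) / n
my = sum(sonde) / n
cov = sum((x - mx) * (y - my) for x, y in zip(sat, sonde))
var_x = sum((x - mx) ** 2 for x in sat)
var_y = sum((y - my) ** 2 for y in sonde)
r2 = cov * cov / (var_x * var_y)

print(f"std dev of differences: {std_diff:.3f} deg C")
print(f"monthly r^2:            {r2:.3f}")
```

A smaller standard deviation of the differences, and a higher r², is what "smallest error characteristics relative to the radiosondes" means in practice.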
In Christy et al. 2011 (and earlier papers) we found a clear trend in the difference between the VIZ sondes and both RSS and STAR during the 1990s which was not present relative to UAH. This was during a period in which the VIZ instrumentation was completely consistent. We interpret this to demonstrate a spurious warming due to diurnal correction errors in RSS and STAR (they both used the same diurnal adjustment) for NOAA-11 and NOAA-14. Christy et al. 2010 demonstrated the same result using the more traditional area-average surface and radiosonde datasets in the tropics.
Mears, evidently, does not appreciate these results as I do. He indicates that “The uncertainty in UAH has not been documented well enough for me to feel comfortable in doing analysis with it.” According to the published results above, I could say exactly the same about the RSS dataset. Rather, Mears’ group prefers to use a collection of “adjusted” individual multi-country radiosondes which are, from my perspective, plagued with unknown instrumentation and other changes (Mears et al. 2011). I’ve looked at these sondes individually through the years and many require “adjustments” for undocumented changes whose magnitudes impact the trend to an extent greater than the true trend-signal itself (see Christy and Norris 2004). So I chose not to add such a significant complication into the evaluation methodology. However, I have utilized the tropical average of these radiosonde datasets as a way to minimize their individual errors.
For the reader, this question of “which is better?” ends up being a dilemma because both of our groups can make strong claims to back up our decisions as to why we chose the particular method of testing. It is entirely understandable that the reader would be suspicious of any group whose methodology of evaluation supports the results of that group. So, which dataset is better? In many of my presentations, as done here, I simply utilize the average of UAH and RSS and so bypass the question. By so doing, I essentially assume that UAH and RSS contain an equal amount of error on either side of the truth. This seems reasonable to me. (However, if both datasets contain a systematic error, such as inclusion of what we believe is spurious warming of the NOAA-12 sensor, that error remains in the average.) Averaging the radiosonde datasets (with four members) is reasonable as well.
Reader Phi: (stamped 2013-09-12 16:34:04)
Others have discussed this through the years – i.e. estimating the surface trend by making it consistent with the upper-air profile rather than using the scattered, and often deficient, surface thermometer stations. Doing so gives a result for the 1979-2012 surface trend (in your diagram about +0.035 °C/decade) that is outside of the measurement errors of the observations. There is some information to support the hypothesis that the surface warming over tropical land is misrepresented by the current datasets, since they utilize TMean, which contains the contamination of TMin by surface development (see my papers on East Africa temperatures, Christy et al. 2009, Christy 2013). While I think the current surface datasets show more warming than they should, I don’t think they are off by that much. I’m comfortable with the idea that the complex vertical temperature structure of the tropics contains enough degrees of freedom to allow for departures of the area-average from the strict moist-adiabatic lapse-rate profile. However, if the tropical surface trend is only +0.04 °C/decade, then model projections of the past 35 years are in even greater error than demonstrated in this blog topic.
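The extrapolation being discussed here amounts to dividing an upper-air trend by an assumed amplification factor. A minimal sketch with hypothetical numbers (neither value below is taken from the actual datasets):

```python
# Sketch: inferring a surface trend from an upper-air trend via an assumed
# moist-adiabatic amplification factor. Both numbers are hypothetical.
tropo_trend = 0.05    # deg C / decade, mid-troposphere (invented)
amplification = 1.4   # assumed ratio of tropospheric to surface trend

implied_surface_trend = tropo_trend / amplification
print(f"implied surface trend: {implied_surface_trend:.3f} deg C / decade")
```

Whether such an inferred value is more trustworthy than the station-based surface trend is exactly the point in dispute.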
The issue in Douglass et al. 2007 is straightforward. There were two populations of a defined metric in the tropics (a linear trend) that we compared – trends from observations and trends from a model average. There were several altitude levels for this test so the model averages consisted of comparisons at each level. Our analysis simply compared the MEAN of the trends of the models for significance testing at each altitude. (We used the MEAN because this is often used as the “Best Guess” for IPCC assertions. Again, our study focused on the MEAN.) There were 67 model runs representing 22 different modeling groups in our model population. We chose to be conservative and assigned for the models a sample size of N = 22 rather than 67 even though all 67 were included. We calculated the mean and standard error of the model trends at each level according to the simplest of statistical methods, in which the magnitude of the population mean is estimated from the sample mean with standard errors appropriately calculated. The observations were significantly different (highly significantly different) from the models’ means. Several new publications are appearing which support this conclusion (e.g. Douglass and Christy 2013, Fyfe et al. 2013).
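The test described here reduces to a short computation. A sketch with invented trend values (not the actual Douglass et al. numbers) shows both the standard-error criterion used in the paper and the standard-deviation alternative advocated by its critics:

```python
# Sketch of a Douglass et al. (2007) style test at one altitude level:
# compare an observed trend against the model MEAN using the standard
# error of the mean. All trend values below are invented for illustration.
import math

model_trends = [0.20, 0.24, 0.22, 0.26, 0.18, 0.23, 0.21, 0.25,
                0.19, 0.27, 0.22, 0.24, 0.20, 0.23, 0.25, 0.21,
                0.26, 0.19, 0.22, 0.24, 0.23, 0.20]  # N = 22 "models"
obs_trend = 0.18  # hypothetical observed trend

n = len(model_trends)
mean = sum(model_trends) / n
sd = math.sqrt(sum((t - mean) ** 2 for t in model_trends) / (n - 1))
sem = sd / math.sqrt(n)  # standard error of the MEAN

# Douglass-style criterion: is the observation within ~2 SEM of the mean?
sem_consistent = abs(obs_trend - mean) <= 2 * sem
# Alternative criterion (the one GavinCawley argues for): within ~2 SD?
sd_consistent = abs(obs_trend - mean) <= 2 * sd

print(f"model mean {mean:.3f}, SD {sd:.3f}, SEM {sem:.3f}")
print(f"within 2*SEM of the mean? {sem_consistent}")
print(f"within 2*SD  of the mean? {sd_consistent}")
```

With these invented numbers the observation fails the standard-error test but passes the standard-deviation test, which is why the choice between the two criteria matters so much in this debate.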
The idea of parallel Earths is not needed. To answer the point about calculating the standard-error-of-the-mean by using an infinite number of samples from the population, which then produces a value of zero for the std-err – this answer is correct, i.e. you would have perfect knowledge of the mean since you used all members of the population and thus no error. The simple result of Douglass et al. 2007 is still valid for the limited, restrictive question we asked and answered. We were comparing the MEAN tropospheric trends of the models for the restricted case in which their surface trend was +0.13 °C/decade. As more realizations are added (whether on parallel Earths or not), and because models have a very rigid relationship between the surface and the troposphere, the MEAN of whatever set of models is chosen will be found within a narrow range. This was further demonstrated in Christy et al. 2010. If I understand GavinCawley’s claim, it appears that he/she believes that there should be a very wide range of tropospheric trends for the case in which all of the selected models possess a surface trend of +0.13 °C/decade. I have never seen evidence that would support such a claim, and indeed have seen considerable published evidence that contradicts it. A simple check of the plots I presented in the initial blog indicates as much. [We use the restriction on the surface trend being +0.13 °C/decade for the obvious reason that the tropospheric observations are so constrained as well. Thus we can have a true “apples-to-apples” comparison throughout the atmosphere.]
Christy, J.R., R.W. Spencer and R.T. McNider, 1995. Reducing noise in the MSU daily lower-tropospheric global temperature dataset. J. Climate. 8, 888-896.
Christy, J.R., R.W. Spencer, W.B. Norris, W.D. Braswell and D.E. Parker, 2003. Error estimates of version 5.0 of MSU-AMSU bulk atmospheric temperatures. J. Atmos. Oc. Tech., 20, 613-628.
Christy, J.R., W.B. Norris and R.T. McNider, 2009: Surface temperature variations in East Africa and possible causes. J. Clim. 22, DOI: 10.1175/2008JCLI2726.1.
Christy, J.R., B. Herman, R. Pielke, Sr., P. Klotzbach, R.T. McNider, J.J. Hnilo, R.W. Spencer, T. Chase and D. Douglass, 2010: What do observational datasets say about modeled tropospheric temperature trends since 1979? Remote Sens. 2, 2138-2169. DOI:10.3390/rs2092148.
Christy, J.R., R.W. Spencer and W.B Norris, 2011: The role of remote sensing in monitoring global bulk tropospheric temperatures. Int. J. Remote Sens. 32, 671-685, DOI:10.1080/01431161.2010.517803.
Christy, J.R., 2013, Monthly temperature observations for Uganda. J. App. Meteor. Clim., in press.
Douglass, D.H. and J.R. Christy, 2013. Reconciling observations of global temperature change: 2013. Energy and Env., 24, 415-419.
Fyfe, J.C., N.P. Gillett and F.W. Zwiers, 2013. Overestimated global warming over the past 20 years. Nature Climate Change. 3. 767-769.
Mears, C.A., F.J. Wentz, P. Thorne, D. Bernie, 2011. Assessing uncertainty in estimates of atmospheric temperature changes from MSU and AMSU using a Monte-Carlo estimation technique. J. Geophys. Res. 116, Issue D8, 27.
Prof Christy: You have missed the point. Of course having an infinite sample means that we know the MEAN exactly; however, we know a priori that the observed trend is not going to be identical to the ensemble MEAN, even if the model is perfect.
The ensemble MEAN is an estimate of the forced response of the climate system (i.e. the response of the climate due to changes in the forcings), as the unforced response (i.e. “internal variability” or “weather noise” etc.) is not coherent across model runs and hence will average out. The observed trend however is the result of both the forced response and a single realisation of the unforced response. So we shouldn’t expect them to be identical, even if the model is perfect; the expected difference between the ensemble mean and the observations depends on the plausible variability of the unforced response (which we cannot estimate from a single realisation of the observed climate, but which we could estimate from the spread of the runs from a perfect GCM).
When we compare the observed trend with the GCMs we are comparing ONE realisation of a chaotic process with the MEAN of a set of simulations of that chaotic process. Even if the model producing the simulations is absolutely perfect, there is no reason to expect the realisation we actually observe to be any closer to the MEAN than any of the individual simulations. Hence the correct test for consistency is to determine whether the observed trend could plausibly be a sample from the population of simulated trends (i.e. the standard-deviation test).
As I have pointed out, an infinite ensemble of GCM runs from a GCM with perfect physics and infinite temporal and spatial resolution is almost guaranteed to fail the test used in Douglass et al. (2007). As someone who works in (a branch of) statistics, that seems to me absurd: if the test is reasonable, a perfect model should pass with high probability. Your response does not address that point.
The “standard error” test is the statistics cookbook solution to the problem of comparing means, but that assumes that the corresponding population means should be expected to be identical. In this case they should not.
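This argument can be checked numerically. In the sketch below the "model" is perfect by construction — the "observation" is drawn from the same distribution as the ensemble members — yet the standard-error test still rejects it in the vast majority of trials. All parameters are hypothetical:

```python
# Sketch of the "parallel Earths" thought experiment: draw both the ensemble
# members and the "observation" from the SAME distribution (a perfect model
# by construction) and count how often each test rejects the model.
import math
import random

random.seed(42)

forced_response = 0.2  # hypothetical forced trend, deg C / decade
weather_noise = 0.1    # hypothetical spread from internal variability
n_models = 500         # a large ensemble -> tiny standard error of the mean
n_trials = 500

sem_rejects = sd_rejects = 0
for _ in range(n_trials):
    ensemble = [random.gauss(forced_response, weather_noise) for _ in range(n_models)]
    obs = random.gauss(forced_response, weather_noise)  # perfect model!
    mean = sum(ensemble) / n_models
    sd = math.sqrt(sum((t - mean) ** 2 for t in ensemble) / (n_models - 1))
    sem = sd / math.sqrt(n_models)
    if abs(obs - mean) > 2 * sem:
        sem_rejects += 1  # standard-error test rejects the perfect model
    if abs(obs - mean) > 2 * sd:
        sd_rejects += 1   # standard-deviation test rejects

print(f"2*SEM test rejects the perfect model in {sem_rejects / n_trials:.0%} of trials")
print(f"2*SD  test rejects the perfect model in {sd_rejects / n_trials:.0%} of trials")
```

As the ensemble grows, the SEM shrinks toward zero while the spread of individual realisations does not, so the standard-error test rejects even a perfect model almost every time; the standard-deviation test rejects at roughly the nominal ~5% rate.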
Just a thought: why can’t the missing tropical hotspot be a side effect of a cell caused by the Hadley cell, call it the Stratocell, similarly to how the Ferrel cell is caused? (I’ll describe the NH, the SH is its mirror image.)
Whereas the Ferrel cell sits poleward of the Hadley cell, as a much smaller doughnut sitting on the ground at 30N-60N side by side with the larger doughnut at 0N-30N, the Stratocell is also at 0N-30N but as a very slightly bigger doughnut in the stratosphere encircling the Hadley cell (i.e. above it from the point of view of an observer on the ground at 15N looking up vertically).
Just as the Hadley cell drives the Ferrel cell like one gearwheel driving another touching it, so does it also drive the Stratocell. The Stratocell’s bottom just above the tropopause is driven poleward accompanying (and driven by) the poleward flow of the Hadley cell’s top.
As the top of the Hadley cell approaches 30N it finds territory getting scarce (decreasing perimeter of the increasing latitudes), so to keep Navier and Stokes happy it dives down and flows back to the equator.
The bottom of the Stratocell encounters the same problem but it can’t solve it by diving down the way the Hadley cell does because the Hadley cell is selfish: it needs every bit of room it can get at 30N, in fact the pressure there should be getting larger on that account. So instead the Stratocell solves its space crisis by shooting up where there is no opposition, then over the top and back to the equator.
So now we have one Hadley cell driving two neighbors like touching gearwheels, one beside it, the Ferrel cell, and one sitting on top, the Stratocell. (Actually 6 touching gearwheels altogether when you include the SH, or 8 when the polar cells are counted.)
For the duration of the Stratocell’s ride where it is in contact with the Hadley cell, it continually picks up heat from the top of the Hadley cell. At 30N this heated stratospheric air then rises, bringing the heat with it, though losing temperature due to the lapse rate. On the way back, with no further heat input, it loses heat. By the time it dives down to the equator it has become a refreshing breeze cooling what theory would otherwise have predicted would be the tropical hot spot.
Since this mechanism seems pretty obvious I assume it was considered and discarded decades ago on account of some fatal flaw, such as evidence against any significant poleward flow up there. Nevertheless I’d be interested in seeing the literature where this mechanism is discussed. And if there isn’t any it would be interesting to know what’s wrong with this theory.