Mathematical models for newbies
Reader Andrew sent me his summary of the basics of mathematical models, which I think readers will find useful.
I have been devising mathematical models (simulations) of physical processes for over 20 years, and I just wanted to point out some of the basics that might help people understand these types of models:
1. The physics of the process (to be modelled) may be well understood, but although this helps it is somewhat irrelevant to the accuracy of all but the simplest models (although you will almost certainly not get a good model if you don't understand the physics). Nearly all computer models are based on mathematical formulae, commonly binomial expansions, that are representative of the physical situation. These expansions are typically of the form: A + Bx + Cx² + Dx³ + Ex⁴ . . . and are truncated at some power of x (x representing the physical quantity under investigation; A, B, C etc. are calculated constants). I always tried to make it the x⁴ term, but this could lead (in the 1990s) to excessive calculation times. (One commercial program, still in widespread use, truncates the series at the x² term.) Thus there is always a 'remainder term', or 'residual', which the model will (hopefully be programmed to) attempt to estimate.
2. The problem, or 'domain', over which the model is to be applied (unless trivial) cannot be simulated as a whole. Thus it is divided into small regular shapes (squares, cubes, or more normally now triangles and tetrahedra, that are usually called 'elements') - a process known as 'gridding' or 'meshing', over which the (truncated) equations representing the physical situation can be relatively easily applied. Smaller elements usually produce more accurate results, but the computation time increases - and see 4 below. These can now be used to give a 'spot' or 'node' value for the physical quantity being simulated, for each of these elements (this is somewhat simplistic). A further mathematical process is now used to combine the results for all these elements. A single calculation through each element node within the domain is known as a 'sweep' or, more commonly, an 'iteration'.
3. Many iterations are undertaken until a programmed 'convergence' criterion is met. This is sometimes that the changes in node values between one iteration and the next are all below a certain value, or that the residuals (see 1 above) are all below a certain value. This is generally known as 'convergence'. This process is somewhat easier for a 'static' situation where the physical values to be calculated are constant. If you add a time-based (dynamic) component, i.e. like the atmosphere, to the calculation it usually gets much more complex.
4. I hope it can be seen that this process is 'absolutely riddled' with scope for errors, incorrect assumptions, and erroneous simplifications. Not only that, the whole process can become mathematically unstable, due to interaction between the various steps, leading to the calculations 'exploding' to infinity, or crashing to zero. This is a particular problem with dynamic situations where the calculation 'time-step' can interact with the mesh/grid spacing, leading to the whole model 'falling-over' or collapsing.
5. Even if the model does converge to a solution, it does not mean that this is a correct (or accurate) one. In another commercial program (to that in 1 above), users are warned that an incorrect choice of the element type to be used can lead to solutions that are up to 2000% (yes, two thousand percent) away from the correct value. One big problem with ascertaining the accuracy of computer simulations is that you generally have to have some idea of what the answer should be, so that you can compare the calculated solution with it.
6. Bear in mind that the process (simplistically) outlined above must be undertaken for each physical attribute being investigated, and it can be seen that this is a hugely non-trivial problem (for an atmospheric model).
7. In research work I have found that computer models are a very useful tool for qualitative analysis, but much less so for accurate quantitative analysis. The models I have worked on have generally been used for automated process control - and invariably these require 'calibration' or 'tuning' to real world measurements. Furthermore, these process control models are made so that any calculated solution outside the physically measured range is 'viewed with suspicion'.
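As a rough illustration of points 2, 3 and 5, here is a minimal Python sketch of a one-dimensional 'static' problem (steady heat conduction along a bar) solved by repeated sweeps over the nodes. Everything in it is an assumption made for illustration: the function name `jacobi_1d`, the 11-node mesh, the boundary values and the tolerance. It is not taken from any of the commercial programs mentioned above.

```python
def jacobi_1d(n_nodes=11, left=0.0, right=100.0, tol=1e-6, max_sweeps=100_000):
    """Steady 1-D heat conduction on a uniform mesh, solved by repeated sweeps."""
    values = [0.0] * n_nodes
    values[0], values[-1] = left, right  # fixed boundary values
    for sweep in range(1, max_sweeps + 1):
        new = values[:]
        for i in range(1, n_nodes - 1):
            # each interior node becomes the average of its two neighbours
            new[i] = 0.5 * (values[i - 1] + values[i + 1])
        # convergence criterion: largest change in any node value this sweep
        change = max(abs(a - b) for a, b in zip(new, values))
        values = new
        if change < tol:
            return values, sweep
    raise RuntimeError("no convergence within max_sweeps")


values, sweeps = jacobi_1d()
exact = [100.0 * i / 10 for i in range(11)]  # the known straight-line answer
worst_error = max(abs(v - e) for v, e in zip(values, exact))
print(f"sweeps: {sweeps}, worst error: {worst_error:.2e}")
```

Note that the stopping test only measures how much the answer is still changing, not how far it is from the truth. Here the true answer is known, so the two can be compared; in a real problem it usually is not.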
Reader Comments (90)
Mar 20, 2012 at 2:27 PM Don Pablo de la Sierra
Well, I always thought that double-precision arithmetic had only one use - to tell you how many digits of your single-precision results had any meaning and were not simply accumulated rounding error.
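The idea can be tried out in a few lines of Python. This is only a sketch: Python floats are 64-bit, so single precision is imitated here by rounding every intermediate result to 32 bits with the standard `struct` module, and the helper names `to_single` and `repeated_sum` are made up for the example.

```python
import struct


def to_single(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def repeated_sum(step, count, single):
    """Add `step` to a running total `count` times in the chosen precision."""
    total = 0.0
    if single:
        step = to_single(step)
    for _ in range(count):
        total += step
        if single:
            total = to_single(total)  # keep only what 32 bits can hold
    return total


COUNT = 100_000
single_total = repeated_sum(0.1, COUNT, single=True)
double_total = repeated_sum(0.1, COUNT, single=False)
print(f"single: {single_total!r}")
print(f"double: {double_total!r}")
print(f"difference: {abs(single_total - double_total):.3g}")
```

The digits on which the two totals agree are the ones that can be trusted in the single-precision result; the rest is accumulated rounding error.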
And let's not ignore the fact that modellers fall in love with their creations all too easily...
http://progcontra.blogspot.co.uk/2011/08/i-love-my-model.html
Thinking Scientist
I have some prior experience of helping to construct the geometry and reservoir parameterisation of reservoir models as a petroleum exploration/development geologist, but their limitations were clearly brought home to me when we were trying to get a subtle exploration play drilled on the flanks of a major field. That field had a well established detailed reservoir model which matched production history and well performance and seemed to perform faultlessly for years. Active aquifer drive was present and the model insisted that aquifer drive swept right across our flank exploration prospect. We had extreme difficulty ever getting it approved for drilling by management and partners due to the development team's reservoir model interpretation. It was only by the serendipitous good luck emanating from a complete rig contract cock-up lumbering us with an immediate rig slot commitment that otherwise would have invoked stiff cancellation charges that our seemingly pre-condemned prospect ever got drilled. It proved to be one of the most important new discoveries for the company in years in that basin.
Reading the above suggests that there is considerable modeling experience at large in the world - with models whose "forecasts" have been tested against "reality." It might be interesting to know whether any modelers with the sort of experience identified in the comments above have ever dipped their toes in climate modeling.
Maybe that is an art developed in parallel and without benefit of any of this other experience.
Hi Pharos,
I like that story - I have the reverse story where a field was appraised on 2D seismic with just a few wells. It was a low relief, thin gas reservoir, typically 20 m max thickness, with maybe 10 - 20 m gas column. The reserves were calculated as 750 bcf. The field development plan was approved, the gas processing contract signed and concrete was poured. The first production well was drilled in the middle of the field at the highest point on the structure and had just 1 m of gas in it. Second production well in the north had very poor sand, third production even further north - no sand. Overnight the reserves dropped to 175 bcf and heads rolled (not mine I might add - this cock up preceded me!).
No model ever survives first contact with reality.
John Shade
The programming problems are probably worse than that. Error densities in code have long been studied in aviation to work out acceptable levels of safety in aircraft computers. One of the most important solutions to the impossibility of writing completely error-free code was to have multiple redundancy. Critical computers in aircraft might have triplex redundancy, i.e. a total of four computers. To avoid each computer failing for the same reason they are often programmed by separate teams, even by different sub-contractors.
Unfortunately a study of these suggested that the different teams were often making the same or similar errors.
This is of course relevant to climate "science", given that the only argument each team has left is that others have found similar results using different models - a weak argument in itself, fatally undermined. It appears that the nature of human fallibility, in that if we have all been taught the same basic techniques our errors are likely to be similar (a phenomenon not unknown to pilots, who study past crashes lest they make the same mistake), invalidates that one argument.
There are other fundamental problems other than what is in this article.
The first one, I would sadly say, is that complicated computer models on which a lot of money has been spent impress a lot of people, especially educated people.
This is one of the main reasons that so many people have been convinced that CAGW is about to happen.
The other reason is that reliable computer models cannot be based on assumptions, and that all aspects must be included and understood. This of course includes data from all types of solar influences, ocean cycles, ENSO and correct modeling of clouds and cloud convection. Unless these problems are addressed there is no way that more powerful supercomputers can add any type of improvement to these models.
Because it is not likely that this information can be understood, computer models can’t predict the future. They can only be used as experimental tools.
The best way to understand climate change is to look at historical data, build theoretical models and to do accurate measurements in the real world.
Martin A
That is one use, but unfortunately few people think of using quad (128 bit) precision to do the same for their 64-bit calculations. Most modern computers can do trillions of calculations in a second. My little desk-side PC has four cores running 3.5 gigahertz with a 64-bit instruction set, and a mere 8 gigabytes of dram memory. All for about $700 USD. It wouldn't take very many seconds for that little toy computer to overload a poorly written computational program even in 64-bit precision. At which point I would not be worried about the number of significant digits, but the sign of the answer.
Don Pablo, I've just been running some calculations that fell over at 200 digit precision. They mostly worked fine at 500 digits (I can also run some of the same calculations fully analytically, so I have a half decent check). Variable precision arithmetic is somewhere between a marvelous thing and a recipe for terrible programming.
In the early 1970s there was a serious attempt to quantify the uncertainty of numerical algorithms. Prof. Nickel of Karlsruhe University introduced "triplex numbers", in the form of [lower bound, standard floating-point result, upper bound]. Operations on lower bounds had to be rounded down, operations on upper bounds had to be rounded up (actually it was more complex), so the computation was slowed down severely.
Maybe climate models should be carried out in triplex numbers? The range of parameters would be represented very naturally. The cost would not be in trillions.
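To give a flavour of the idea, here is a small Python sketch of a triplex-style number. It is not Prof. Nickel's actual scheme: instead of switching the hardware rounding mode, it simply pushes each bound outward by one floating-point step with `math.nextafter` (Python 3.9 or later), which is cruder but still guarantees the bounds enclose the true result. The class name `Triplex` and its methods are invented for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplex:
    lo: float   # guaranteed lower bound
    mid: float  # ordinary round-to-nearest result
    hi: float   # guaranteed upper bound

    @staticmethod
    def exact(x):
        """Wrap a float that is taken to be known exactly."""
        return Triplex(x, x, x)

    def __add__(self, other):
        return Triplex(
            math.nextafter(self.lo + other.lo, -math.inf),  # widen downward
            self.mid + other.mid,
            math.nextafter(self.hi + other.hi, math.inf),   # widen upward
        )

    def __mul__(self, other):
        # the extreme products can come from any pairing of the bounds
        products = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Triplex(
            math.nextafter(min(products), -math.inf),
            self.mid * other.mid,
            math.nextafter(max(products), math.inf),
        )

    @property
    def width(self):
        return self.hi - self.lo


total = Triplex.exact(0.0)
tenth = Triplex.exact(0.1)
for _ in range(10):
    total = total + tenth
print(total, total.width)
```

The width of the final interval is the price of the guarantee: it grows with every operation, which gives some feel for how quickly the bounds could open up over a long calculation.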
Gerald Quindry, thanks for reminding me of the Hamming quote,
"The purpose of computing is insight, not numbers."
One of my favorites. I remember seeing it in a mathematical modeling class in the '70s. I told one of my profs about it, and he thought it was exquisitely apropos--he was just grading a homework assignment in a fluid dynamics class where one student had handed in nothing but a 2-inch thick stack of IBM fanfold printer output. No written problem statement, methodology, discussion of results or anything.
The prof gave a score of zero and appended the above quote.
Several years ago I saw a civil engineer on TV (after scenes of bridges collapsing, airplane wings falling off and so on) describe structural engineering as--
"the art of taking materials whose properties we don't understand, forming them into shapes we can't analyze, and subjecting them to loads we can't predict--and doing it so successfully that the public doesn't know how ignorant we are."
The same difficulties probably apply to climate modeling.
Jonathan Jones
All one needs to do is make sure all the data has a magnitude of one with scaling -- something that is almost never done today. Indeed, almost nobody even thinks about the magnitude of their data and happily cross multiply numbers like 154,569,907.00009 by 0.0000000000000897654. That would never work in 32 bit floating point and only a short time in 64-bit.
Of course, given that some of the Climate Scientists™ can't use Excel, I have to wonder about their model codes.
Don Pablo, the calculations I am doing are somewhat unusual. One involved evaluating a function which when described by a Maclaurin series has a constant term and then no further terms below the 486th power. Not physical modelling; a bit more abstract than that.
The IPCC models are bunkum because the basic physics is wrong. Thus instead of using the correct approach which is to place convective plus radiative cooling of the ground equal to the SW energy input as I and all other process engineer trained people are taught, they assume radiation at the S-B level for a black body in a vacuum, add convective and evaporative to that and then make it up by imaginary 'back radiation'.
What's worse, the pyrgeometers that measure the DLR do not even measure the true DLR in equilibrium with the ground IR emission. [It's because the temperature of the body of the instrument is the local air temperature, not the ground temperature.]
As I get into the details of oxymoronic climate science, I see that it is riddled with scientific failure after failure, a subject created by amateurs and number crunchers without a clue about science or engineering.
Frankly, because this is just one of the four scientific failures, it needs closing down then rebuilding from scratch and those who authorised scientific fraud put in jail.
Until a fractal model is developed we'll never even approximate reality. The computing power required is too much though - unless we get another 5 million ps3's out there and linked
Mar 20, 2012 at 4:11 PM | Per Strandberg
Lots of very good points have been made on this thread including those from Per.
My modelling background comes from studying Geophysics and Oceanography before going on to using climate models as a postgrad. The inverse modelling issues explained here are very prominent in Geophysics. A basic example: a gravity survey shows something like a uniform, bell-curve-like high gravity anomaly in a region, which could be interpreted as an anticline of higher density rock which could potentially be an oil reservoir. You can compare with other data and observations and the local geology to see if this is a realistic 'model', as due to the inverse modelling involved there are an infinite number of possible solutions (i.e. it could be a mini black hole a kilometre down creating the gravity anomaly). The average Geophysicist at least gets their model 'ground truthed' by drilling holes to see what is really down there, and if they get the predictions wrong too many times they won't have a very long career. Also drilling a hole will help inform where the model was wrong in the first place.
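The non-uniqueness is easy to demonstrate with the textbook formula for the gravity anomaly over a buried sphere. In the Python sketch below the two bodies, their sizes and their density contrasts are hypothetical numbers chosen only so that both have the same excess mass; the function name `sphere_anomaly` is likewise just for illustration.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def sphere_anomaly(x, depth, radius, density_contrast):
    """Vertical gravity anomaly (m/s^2) at horizontal offset x from a buried sphere.

    Valid for observation points outside the sphere (depth > radius).
    """
    excess_mass = (4.0 / 3.0) * math.pi * radius**3 * density_contrast
    return G * excess_mass * depth / (x * x + depth * depth) ** 1.5


# survey line from -2 km to +2 km, one station every 250 m
offsets = [-2000.0 + 250.0 * i for i in range(17)]

# two hypothetical bodies centred 1 km down with the same excess mass:
# a small dense one and a larger, only slightly dense one
small_dense = [sphere_anomaly(x, 1000.0, 200.0, 800.0) for x in offsets]
large_light = [sphere_anomaly(x, 1000.0, 400.0, 100.0) for x in offsets]

for x, a, b in zip(offsets, small_dense, large_light):
    print(f"{x:8.0f} m  {a:.4e}  {b:.4e}")
```

The two columns agree to rounding error, so no amount of surface gravity data alone can tell these bodies apart. Only other information, such as a drill hole, can.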
The problem with GCM's explaining the physics is that this is really explaining that we understand the Navier Stokes equations for fluid dynamics. Several people have pointed out how even this has lots of problems in modelling even knowing the exact equations being solved. The problem is that the fluid dynamics part is largely irrelevant in predicting 2 degree warming in 100 years time as that is being generated by the energy exchange part of the model. I reckon you could just put a diffusion type term for temperature in place of real dynamics and the answer won't change. Literally the models are parameterised with a more CO2 implies more energy is trapped in the system and is the only real thing that matters.
With regards hindcasting I'm sure someone could take a GCM and put in a solar/Svensmark type parameterisation, using the 'appropriate' forcings and hindcast the 20th century well. Obviously it wouldn't show whatsoever whether the theory was any good.
As a point of discussion what do people think of the http://climateprediction.net/ experiment?? To me this is the ultimate in curve fitting with no predictive ability that you could get. The documentation on the site explains the experiment pretty well. They are tweaking 20 'most unknown' parameters in a hindcast to then find the best values to predict the future. Most of these are water vapour/cloud based which isn't surprising as we don't know how they work and also happen to be critical in the energy balance calculation. Probably my biggest concern is the aerosol type forcing used in the hindcasts as that is basically making up data to force the models to match reality without any real basis to do that. I'd guess any volcanic aerosols put into the model will cause some cooling which is probably necessary to get any significant cooling in a CO2 driven model where CO2 is always going up. As pointed out before you could equally well put in some (fudged if necessary) solar forcing instead of aerosols to help in the hindcast. So much of these forcings and parameterisations are just so arbitrary.
RE: Rob Burton:
" The average Geophysicist at least gets their model 'ground truthed' by drilling holes to see what is really down there and if they get the predictions wrong too many times they won't have a very long career"
Absolutely. I am a geophysicist and my job is to make predictions about the subsurface from non-unique seismic inversion. Many of my methods are similar to things like proxy surveys with tree rings. I need cross-correlation with what I know (well data) against something which is effectively the first derivative (seismic reflectivity). I then make predictions at unmeasured locations using a model/inversion method. Up to this point the similarities with say climate paleo studies and GCM's are very profound.
The fundamental difference is that my predictions are then tested by drilling (a sometimes very expensive) oil well. This is the part missing with climate scientists - they wave their arms about making all sorts of fatuous statements about how they know their model is right and what the uncertainty is but they have no experience/expertise in this area. The only way to test their predictions is to wait 15 - 30 years and see what happens to "global temperatures". Based on predictions made so far, it's not looking too good.
And like you, although I am a geophysicist I actually studied this on an Oceanography course - UCNW.
Rob Burton,
"The problem is that the fluid dynamics part is largely irrelevant in predicting 2 degree warming in 100 years time as that is being generated by the energy exchange part of the model. I reckon you could just put a diffusion type term for temperature in place of real dynamics and the answer won't change. "
Sorry, but no no no no no. Energy is not "trapped in the system".
If you warm air, it rises, and that's fluid dynamics. One of the main weaknesses of climate models is that they don't handle this correctly. Most climate models use a crude fudge for this based on a paper from the 1960s (see regular blog comments from 'quondam'). To get the earth's surface temperature right you need to get all the heat fluxes right, including that due to convection. This may be one of the reasons why the models overestimate the warming.
Jonathan Jones
My advice was for those using floating point hardware instruction sets. If you are doing work like what you are doing, you should be using infinite precision software, which can be used to calculate pi out to several billion places. Years ago, I used SNOBOL for that sort of work, but I haven't heard mention of it for decades. However, I hear from time to time people have added another million or so significant digits to pi, so obviously such software is available.
Enjoy your computer fun. I have an IBM 370/195 VMS system emulator running on my PC. Full bore emulation. It runs the instruction set, OS, VMS, application code, etc, and uses my hard disk as tape and disk storage -- it is set up as the biggest 195 I ever saw, 4 million bytes memory, and 50 million bytes of disk space. The original would fill a large house and require the power of a small town. Now it fits in the back side of my quad core system and hardly takes any resources at all.
The only problem is my iCore 7 based system still runs it too fast. Also, faking all those punch cards is a bit of a bother, but I have a model 029 emulator as well.
Bring back memories of my youth, it does.
Don Pablo, that's pretty much what I am doing: Mathematica has variable precision arithmetic built in. Very handy, though it does make you a bit lazy.
I'm old enough to remember using CP/M and Wordstar at home, and my 4th year project was done on a Norsk Data machine running Sintran (but I never had to learn PLANC). Moved to a SPARC Station 2 for my doctorate and never looked back.
Mar 21, 2012 at 9:54 AM | Paul Matthews
" Sorry, but no no no no no. Energy is not "trapped in the system". "
I totally agree with you Paul. In my simplistic view any attempt to heat the atmosphere is counteracted by the atmosphere itself in trying to convect that heat away from the surface in an attempt to cool itself down again. I think the models have the heat fluxes incorrect too.
To be honest I still don't really understand where the 'basic science' about 1 degree heating for a CO2 doubling comes from. It seems to come from a theoretical heating of the upper atmosphere then projected down the lapse rate from the tropopause, which includes the 2 massive assumptions that the tropopause height and lapse rate don't change. I've still not read a good simple explanation that most importantly includes convection as well as radiation in the explanation.
Again in my simplistic world the atmosphere does everything it can to get rid of any increase in heat at the surface by moving it upwards and towards the poles. Any theoretical increase in surface temperature will just increase the speed of this 'heat engine'.
Don Pablo,
I have a purchase proposal sent to my father by IBM when he was at British Aerospace. The proposal is for an IBM 360. I think the date must be the end of the 1960s or 1970. My father has placed a handwritten note noting the spec for the IBM 370/145 in the front which is dated Sept 1970.
The most incredible thing is the cost. The full price to rent the system per month is quoted as £4,142 15s (fifteen shillings - pre-decimalisation). The outright purchase price is £194,672 5s in 1970!
And the 2314 Multiple disk drive unit is a further monthly rental cost of £2,109. The disk storage was eight drives plus spare to give a capacity of 200 million bytes (!) with a data transfer rate of 312000 bytes per second. The disk drive is, as you say, large - substantially bigger than a large chest freezer!
ThinkingScientist
I forget what the "rent" to run the 360/65 I used at Cornell was, but several hundred dollars per CPU minute. I used to work in the computer center as the statistical consultant -- an interesting job for a young man with many, many female graduate students desperate to get their data analyzed -- but more about that in the pub some night.
We played games with the machine. One of my favorites was playing songs on the 1401N1 Printer. Another was seeing who could get one of those washing machine disk drives to walk the furthest across the computer room floor with repeated disk seeks.
Oh, to be young again and have such expensive toys. Not quite a Ferrari, but then I didn't have to fill the tank.
All these young ones with their iPads, iPhones and Facebooks. They have NO IDEA what fun they missed. All with machines that would today cost many millions of dollars, euros or even pounds.
1403-N1 -- getting old
500 million bytes -- not 50 million bytes -- damn, age is a terrible thing! At least I got the million right.
The largest computational conceit of the climastrology modellers seems to me to be the assumption that the errors of interacting variables cancel out, instead of multiplying. As though combining two 20% likelihoods of error gave you 10% error bounds, instead of the actual 44% or so. Given the number of such factors they play with, I think their "predictions" should be something like 1° warming +/- 10°. ;)
The units in the above are °Kevin, of course.
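The arithmetic behind the "44% or so" can be checked in a couple of lines of Python. The second function is an addition for comparison, not something from the comment: it is the usual first-order rule for independent errors in a product, which adds the relative errors in quadrature. Both function names are made up for the example.

```python
import math


def compounded(relative_errors):
    """Worst case when every factor is off in the same direction: errors multiply."""
    product = 1.0
    for e in relative_errors:
        product *= 1.0 + e
    return product - 1.0


def quadrature(relative_errors):
    """First-order estimate for independent errors in a product."""
    return math.sqrt(sum(e * e for e in relative_errors))


two = [0.20, 0.20]
print(f"compounded: {compounded(two):.0%}")
print(f"quadrature: {quadrature(two):.0%}")
```

Either way the combined error is larger than each individual 20%, never smaller.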
Don Pablo;
The physically largest machine I ever worked on "hands-on" was a Honeywell card-and-tape setup at AECL in Chalk River, as summer student job in the very early '60s. Lots of UR work with program and data decks before you finally got to compile a program deck, etc., etc. Fortran II and Algol, IIRC. Lovely fun!
Correction: APEX, a local upgrade of Algol by a certain Dr. Kennedy & assoc. now that I think of it. Nicest language I ever worked with.
Does anyone remember dropping the 3" stack of 80-column punchcards that was their Fortran program, and having to pick them up off the floor and get them back into order again ?
punksta;
Oh, yah! Off to the Unit Record sorter to get them back in sequence. Sort on least significant card index digit first, then next, etc.
:D
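For anyone who never met a card sorter, the least-significant-digit-first trick is what is now called a radix sort. A minimal Python sketch, assuming each card carries a numeric sequence number (the deck contents and the function name `card_sort` are invented for the example):

```python
import random


def card_sort(deck, digits):
    """Sort cards by sequence number, one column at a time, least significant first."""
    for position in range(digits):
        pockets = [[] for _ in range(10)]  # the sorter's ten digit pockets
        for card in deck:
            digit = (card[0] // 10**position) % 10
            pockets[digit].append(card)  # appending keeps each pass stable
        # gather the pockets in order 0..9 and feed the deck through again
        deck = [card for pocket in pockets for card in pocket]
    return deck


# a 50-card deck numbered 10, 20, ... 500, then "dropped on the floor"
deck = [(seq, f"statement {seq}") for seq in range(10, 510, 10)]
dropped = deck[:]
random.Random(0).shuffle(dropped)

recovered = card_sort(dropped, digits=3)
print(recovered == deck)
```

It works because each pass preserves the order produced by the previous ones, so after the last (most significant) column the whole deck is in sequence. It takes one pass per digit, however many cards were dropped.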
Mar 22, 2012 at 9:52 AM | Punksta
Oh yes, I dropped a 30 cm stack of punchcards for my integrated circuit simulator. After many tries, I found out that the PhilPac simulator did not represent reality. I found the fault cause I was looking for by physically measuring the IC with thin pins and a microscope, "pricking" we called it. You could do that in those days, chips of 3.5 mm². But finding out that a model/simulator is useless to find out something that is not programmed into it already was a lesson I never forgot.
I probably did drop my cards, but I can't remember any instances - this was way back in the early 70s. Our lab was in Gloucestershire, our big computer was in London, and so roughly on alternating weeks, you could send off by van your stack of cards in a small cardboard box, and get back your green fanfold outputs in a big one. A great deal of waste paper was generated by this, although small amounts of it were used for notes and other preps for the next stack of cards. If you were in a hurry, you would send the cards punched with your latest version and hand it over for the van, before you got the box with the output from the previous one.
Good stuff. Thanks for sharing.
What a great thread which I've come to very late due to other concerns. My own experience of the software/hardware boundary includes some punched cards on 'legacy' IBM mainframes even in the early 80s. I began at school around 72 with a dial-up punched tape interface to the computer at a local polytechnic. Once I got the hang of programming I thought up a method to work out the square root of 2 to as many digits of precision as required. Then I learned Newton and Raphson had got there before me. And those guys had really primitive forms of input-output. Fun times.
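For the curious, here is roughly what that looks like today: a Python sketch of Newton-Raphson for the square root of 2, done entirely in integers so that any number of digits can be requested. It is not necessarily the method described above, and the function name `sqrt2_digits` is made up.

```python
def sqrt2_digits(digits):
    """Return floor(sqrt(2) * 10**digits) using Newton-Raphson on integers."""
    target = 2 * 10 ** (2 * digits)  # we want the integer square root of this
    x = 2 * 10**digits               # starting guess, safely above the root
    while True:
        y = (x + target // x) // 2   # Newton-Raphson step for x*x = target
        if y >= x:                   # no further decrease: x is the answer
            return x
        x = y


print(sqrt2_digits(10))
print(sqrt2_digits(50))
```

The number of correct digits roughly doubles with each step, so even a thousand digits takes only a dozen or so iterations.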
Thanks to the many 'real' modellers who have contributed here. As well as loving Dave Bob's story of the usefulness of Hamming's dictum the two most helpful paragraphs for me are from ThinkingScientist:
I think we will need 30 years or more of close correlation with measured global temperature anomaly to have any level of confidence at all in GCMs. Of course we may not need anything like that to reject them as useless for all practical purposes. And in the interim they should be assigned a probability of zero of being able to inform policy makers of anything at all. It's the last point that is not understood by politicians and the 'precautionary principle' crowd advising them. In science the precautionary principle says take no notice until there's been a genuine attempt to falsify. We desperately need our leaders to grasp this.
Punksta,
I never dropped my own card deck, but did do it as a prank on a fellow grad student with his thesis project "deck" I had made of scrap cards. His expression was so heartbreaking that I had to immediately come clean.
Dave Bob;
In relief, did he pound you into the ground like a tent peg?
>;-p
Brian H,
Though I may have deserved it, I recall no actual or threatened physical violence or other retribution.
(After all, he wasn't a climate scientist!)
Dave Bob;
I was more thinking of it as a convenient way of flushing out his adrenaline surge. ;)