Basic statistics.
My son, although an IT specialist, says he cannot follow the maths of my note on Value At Risk. If he can’t, perhaps neither can others. That matters not just for the business of VAR, so I’ll try to explain in as simple a way as I can. This account should not be difficult to understand but you will need to follow it step by step. If you have little mathematics the concepts may appear strange and at the start it may all seem a silly game.
Statistics can mean either the collection of data or its analysis. I have a book of statistics that gives, for example, birth rates for different countries, the size of the prison populations, the number of Nobel Prizes and so on; this is sometimes rather disparagingly called bean counting. I also have a book called Statistical Methods that is about the application of a certain branch of mathematics. I shall deal only with statistics as mathematics.
In statistics the whole group of whatever we are dealing with is called the “population”; often it is literally a population of people but it could be, say, all the apples in cold storage. In statistics the “population” need not be of items at all; it could be a population of averages. But if the population is of items and if your bean counters have counted them all, mathematical statistics simply doesn’t feature. For example, if your firm decides you should all wear company T shirts and it wants to know how many to order of each size, it is likely to ask everyone whether they are small, medium, large or extra large. Everyone in an army is measured so you know what number of which size of boot to have in stock. To try to theorise about how many of each size you “ought” to require is plain daft since you know the answer. With VAR you know the daily changes of the stock market for the last 100 years – all of them – so theorising about what they ought to have been according to some statistical model is idiotic. The whole point of statistics is to tell you something about a population when you only have data for a fraction – a sample.
Statistics started in 18th century France with the study of games of chance. The aristocracy was very interested in gambling. One important consequence was that the idea of something totally unbiased came naturally although there is no such thing in the real world. To suggest a die was biased was to accuse someone of being dishonest and that meant pistols at dawn.
So you have to accept that we start by thinking about a completely unbiased coin tossed in a totally unbiased way. There is no such thing in practice. You may already be inclined to say, “What can tossing mythical coins have to do with the real world?” but stick with it; you’ll be surprised.
If you toss such a coin it is as likely to fall heads (H) as tails (T); or to put it a bit more technically, the probability – chance if you prefer – of a head is one out of two, namely a half, as also is the probability of a tail.
One point I would like to get in here is that there is no such thing as the “law of averages”. If we get nine heads one after the other, the chance of the tenth being a head is still a half. The coin can’t remember it has turned up heads nine times and say to itself “Gosh, it’s time I came down tails”. Coins are not like that. Of course ten heads in a row is very unlikely, but that is viewing the whole sequence after it has ended. Each tossing has nothing to do with the previous tossings – each one is “independent”. But in many real sequences the items are not independent. Anything human beings decide is likely to go in phases. Winning teams tend to go on winning partly because they expect to win – the confidence business. This matter of independence will become important.
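For anyone who would rather see this than take it on trust, here is a small simulation. It is only a sketch: the function name, the number of tossings and the seed are my own choices, and the computer's random numbers stand in for the mythical unbiased coin. It looks at every tossing that comes straight after a run of nine or more heads and counts how often that tossing is a head.

```python
import random

def heads_after_streak(streak=9, tosses=1_000_000, seed=1):
    """Toss a simulated fair coin many times and report the fraction of
    heads among the tossings that immediately follow `streak` or more
    heads in a row, together with how many such tossings there were."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    run = 0                     # length of the current run of heads
    followed = 0                # tossings made right after a long run
    heads_next = 0              # how many of those came down heads
    for _ in range(tosses):
        head = rng.random() < 0.5
        if run >= streak:
            followed += 1
            heads_next += head
        run = run + 1 if head else 0
    return heads_next / followed, followed

fraction, count = heads_after_streak()
print(f"{count} tossings followed nine heads; fraction heads = {fraction:.3f}")
```

The fraction should come out close to a half, which is the point: the coin has no memory.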
Now suppose we toss the coin twice. We might get 2 heads – the probability is, as I trust you can see, a half times a half – that is a quarter. In fact the chance of getting any particular result is a quarter: the chance of two tails; the chance of a head followed by a tail; the chance of a tail followed by a head. So we have 1/4 HH, 1/4 TT and 2/4 mixed (HT/TH).
Now do 3 tossings; we get
1/8 HHH; 3/8 (HHT/HTH/THH);3/8 (TTH/THT/HTT);1/8 (TTT)
This goes 1-3-3-1.
It gets tedious writing all the combinations down but of course there is a formula. It is called the binomial series – series is the mathematical term for a sequence produced according to some rule. The formula is associated with the Swiss mathematician Jacob Bernoulli, whose work on it was published at the beginning of the 18th century.
If we do 8 tossings we get
1-8-28-56-70-56-28- 8- 1
With 10 tossings it begins to look like the bell shaped curve you will often have seen illustrated:
1-10-45-120-210-252-210-120-45-10-1
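If you would rather not write the combinations out by hand, the counts can be generated. The sketch below uses Python's built-in comb function, which gives the number of ways of choosing k items out of n; the function name toss_counts is just mine for illustration.

```python
from math import comb

def toss_counts(n):
    """Number of ways of getting 0, 1, ..., n heads in n tossings."""
    return [comb(n, k) for k in range(n + 1)]

for n in (2, 3, 8, 10):
    row = toss_counts(n)
    # Dividing each count by the total (2 to the power n) gives the
    # probability of that number of heads.
    print(n, "tossings:", "-".join(str(c) for c in row), "total", sum(row))
```

The totals are 4, 8, 256 and 1024: each extra tossing doubles the number of possible sequences.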
If you went on increasing the number of tossings the curve would get smoother and smoother. Here comes in that splendid phrase that occurs in all maths books: “It can be shown that” (i.e. you wouldn’t be able to understand). It can be shown that the curve would eventually become a smooth one described by an equation. The equation is a negative exponential: e raised to the power of minus a squared quantity. You may not know what that means and that does not matter. (The exponential e is in fact very interesting. Like pi, the ratio of the circumference of a circle to its diameter, however many decimals you calculate for it you can always do one more. Pi is 3.1416 etc etc; e is 2.718 etc etc. When people say something increases exponentially they rarely know what they are talking about. e is important in calculus so we’ll leave e alone.)
We can draw an equation in x and y as a graph. y=x would be a straight line through 0,0 at 45deg; when y=1, x=1; y=2,x=2 and so on.
Our negative exponential is the precise bell shape and is somewhat unfortunately called the normal (usually “distribution” is added but that is not important).
The normal is completely fixed by the mean and something called the variance. There is only one normal for any particular mean and variance; the mean just shifts the normal to left or right. We get the variance by squaring the differences of the values from the mean. Look at our 8 tossings: the mean is 4 heads, which turns up 70 times. The 56 is 1 unit from the mean, the 28 is 2 units, the 8 is 3 units and the 1 is 4 units.
So we get 70*0*0 = 0, 56*(-1)*(-1) = 56, 28*(-2)*(-2) = 112, 8*(-3)*(-3) = 72 and 1*(-4)*(-4) = 16, making 256. There is also the other side, making 512 in all. Remember minus*minus = plus. So the sum of squares is 512. We have to divide that by the number of results, which is 256 (70 + 2*56 + 2*28 + 2*8 + 2*1), one for each possible sequence of 8 tossings. That gives the variance, 512/256 = 2. The square root of the variance is the Standard Deviation, usually shown by the Greek letter sigma – an o with a little streamer at the top. By the way the mean – strictly the arithmetic mean, what most people think of as the average; there are other means – is indicated by the Greek letter mu, a u with a tail at the front. In our case the S D is sq rt (512/256) = sq rt 2, which is about 1.4.
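The same sum can be checked by machine. This is a sketch of the calculation for 8 tossings only; the variable names are mine.

```python
from math import comb, sqrt

n = 8
counts = [comb(n, k) for k in range(n + 1)]   # 1, 8, 28, 56, 70, 56, 28, 8, 1
total = sum(counts)                           # 256 possible sequences

# Mean number of heads, weighting each number of heads by how often it occurs.
mean = sum(k * c for k, c in enumerate(counts)) / total

# Sum of the squared differences from the mean, again weighted by the counts.
sum_sq = sum(c * (k - mean) ** 2 for k, c in enumerate(counts))

variance = sum_sq / total
sd = sqrt(variance)
print("mean", mean, "sum of squares", sum_sq, "S D", round(sd, 3))
```

It gives a mean of 4 heads, a sum of squares of 512 and an S D of sq rt 2, about 1.414.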
The uses of the normal.
The normal crops up in various situations. How you use the normal depends on the situation.
The normal gives information about the spread of the items that compose it. Because the S D defines our normal the spread is given in S D units. About 1/3 of the items (32%) are more than 1 S D from the mean; about 5% are more than 2 S D from the mean and about 0.3% more than 3 S D.
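These fractions need not be looked up in tables. For an exactly normal population the fraction more than k S D from the mean, counting both sides, can be had from the complementary error function erfc that comes with Python. The function name fraction_beyond is my own; this is a sketch, not a statistics package.

```python
from math import erfc, sqrt

def fraction_beyond(k):
    """Fraction of an exactly normal population lying more than k S D
    from the mean, counting both tails together."""
    return erfc(k / sqrt(2))

for k in (1, 2, 3):
    print(f"more than {k} S D from the mean: {fraction_beyond(k):.4f}")
```

Remember that these figures hold only if the population really is normal, which is the very thing in question later on.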
Now let us look at the normal and the S D in practice. As I said at the start statistics is about saying something about a population from a sample – about everything from relatively few. Your sample must be a “fair” one. I think you can see more or less what fair means although making a sample that is fair is not as easy as you might think.
In fact the normal applied to individual items – like, say, heights of men – is of little use. No statistician would ever look to the normal to judge how many men were over 6 ft tall. To do so would be perverse. The best guide is the fraction of men in the sample over 6 ft in height. The heights of men – and of women – are in fact very close to the normal. That is interesting but not useful. To use the normal we would have to assume that the population of heights of all men is exactly normally distributed even if our sample heights are not. But our only evidence of whether the distribution is normal comes from the sample! It would be daft to prefer a model for which there is no theoretical basis over the observations. Even then we would have to assume that the mean and S D of our sample are very close to the population values. There is no way of estimating the reliability of the S D of the population values. Clearly the larger the sample (from a normally distributed population) the more accurate the estimate of the S D – but there is no way of knowing by how much. The S D will not get smaller as the sample size increases. It might get a little bigger or a little smaller but essentially the S D of the population does not change with sample size. (When we add an observation we increase the sum of squares but we then divide by one more than before.)
In fact statistics is not used to judge how often a rare event will occur. Statistics argues the opposite way. If a normal was for some unfathomable reason fitted to stock exchange movements no statistician would say that once in a thousand days a fall of X can be expected. The statistician would say “If there is a fall of X there is only one chance in a thousand that the observation came from a normally distributed population with mean mu and S D sigma.” That is the observation is a test of the assumption of normality. If you think about it that is quite sensible. If you get 10 heads in a row you are going to say, “I don’t believe that comes from a normal population with mean zero”. You would not use those words. You would say, “I don’t believe that is an unbiased coin and you are tossing it fairly.” Or more bluntly, “I reckon you are cheating.” Back to pistols at dawn.
So if the normal is little use in understanding the distribution of individual items what use is it? It is very useful with the mean. I will quote my bible - Snedecor’s Statistical Methods –
“Standard deviation of sample means.
How accurate is the mean of a sample in estimating the mean of the whole population? Work on this has produced the most exciting and useful result in the whole of statistical theory; it is part of every statistician’s stock in trade”
If we take a number of samples, the means (of the samples) will be approximately normally distributed about the population mean – the true mean – even if the population is not normally distributed, provided the samples are reasonably large. You can see intuitively that this is likely and it can be shown formally by something called the central limit theorem.
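A small experiment shows the idea. The sketch below is built on assumptions of my own: a made-up population that is far from normal (a lopsided exponential one with mean 1 and S D 1), samples of 50, and 2000 samples in all. If the means behave normally, about two thirds of them should fall within one S D of the mean (that is, 1 divided by sq rt 50) of the true mean.

```python
import random
from math import sqrt

def sample_means(sample_size=50, n_samples=2000, seed=2):
    """Means of repeated samples drawn from a lopsided (exponential)
    population whose true mean is 1 and whose S D is 1."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

means = sample_means()
sd_of_mean = 1 / sqrt(50)       # population S D divided by sq rt of n
within = sum(abs(m - 1.0) <= sd_of_mean for m in means) / len(means)
print("average of the sample means:", round(sum(means) / len(means), 3))
print("fraction within one S D of the mean:", round(within, 3))
```

The individual values are nothing like a bell shape, yet the sample means cluster round the true mean in very nearly the normal way.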
We can then calculate the S D of the estimate of the mean. This is often called the Standard Error (S E) and I think that is useful since you need to be clear about the S D of the population which is not much use and the S D of the mean which is very useful.
The S D is the sq rt of the variance, that is of the sum of the squared differences from the mean divided by n, the number in the sample.
The S E is the S D / sq rt of n. So your estimate of the mean gets better and better as the sample size increases. Make the sample 4n and you halve the S E.
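As a quick check of the halving rule, here is the formula as a sketch in code. The S D of 10 is a made-up figure and the function name is mine.

```python
from math import sqrt

def standard_error(sd, n):
    """S E of the mean: the S D divided by the square root of the sample size."""
    return sd / sqrt(n)

sd = 10.0   # hypothetical S D, chosen only for illustration
for n in (25, 100, 400):
    print("sample of", n, "gives S E", standard_error(sd, n))
```

Going from 25 to 100 to 400 the S E goes 2, 1, 0.5: four times the sample each time, half the S E.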
In statistics technically we argue back to front but what it comes down to is that the true mean is the sample mean +/- the S E. That is, there is about a 68% chance (roughly two in three) that the true mean lies between one S E more and one S E less than the mean of our sample – and a 95% chance of +/- 2 S E.
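To make that concrete, here is a sketch that takes a sample and gives the mean with a range of 2 S E either side. The ten heights are invented for the example, the function name is mine, and the S D is worked out by dividing by n as described above (pstdev in Python does just that; many textbooks divide by n - 1 instead, which makes a noticeable difference with a sample as small as this).

```python
from math import sqrt
from statistics import fmean, pstdev

def mean_with_range(sample, n_se=2):
    """Sample mean with a range of n_se standard errors either side."""
    m = fmean(sample)
    se = pstdev(sample) / sqrt(len(sample))   # S D (dividing by n) over sq rt n
    return m, m - n_se * se, m + n_se * se

# Invented heights in cm, for illustration only.
heights_cm = [172, 168, 181, 175, 169, 178, 174, 171, 177, 165]
m, low, high = mean_with_range(heights_cm)
print(f"mean {m:.1f} cm, roughly 95% range {low:.1f} to {high:.1f} cm")
```

For these figures the mean is 173 cm and the range runs from about 170 to about 176 cm.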
You can do other things with the S E. We can compare the means of different populations; you could decide whether Englishmen were on average taller than Frenchmen. More precisely whether the means were “significantly” different. But a word of warning; significant is an unfortunate term, it means merely statistically significant not important. We might decide that 0.5 cm difference in English and French heights was unimportant but if we took large enough samples we could show that such a difference was statistically significant; that is that the difference was almost certainly real and not just the luck of the samples. And remember the mean is an abstraction; it will rarely apply to an individual. All Frenchmen are not shorter than all Englishmen. You will say “Of course not” but suggesting that blacks might be on average less intelligent than whites or that women on average are not good at physics, gets you into hot water. Even if both are true there are academically brilliant blacks and some fine women physicists. It all depends on the distribution of individual values and stats doesn’t have much to say about that.
I don’t wish to take matters further. Statistics is used by scientists, sociologists and economists who have no statistical training. The result is that too often statistics is used like a cookery book and the wrong recipe is chosen. You can’t know which recipe might apply unless you know the problem. I shall from time to time try to deal with cases where the wrong recipe has been used.