Standard Deviation
Standard deviation is not a simple concept, but it's very important, so don't give up. The formula the PMBOK uses for standard deviation is simple: (P-O)/6. That is, the pessimistic activity estimate minus the optimistic activity estimate, divided by six. The problem is that this in no way, shape, or form produces a measure of standard deviation. So if this isn't really standard deviation, what is it?
The dictionary definition of standard deviation is something like "a quantity calculated to indicate the extent of deviation from the mean or expected value for a group as a whole." Expected value here is the generalization of the mean to any probability distribution, including distributions other than a bell curve, e.g. a chi-squared distribution. Standard deviation is sometimes rendered more succinctly as the root-mean-square deviation: it is the square root of the average of the squared differences from the mean.
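To make the definition concrete, here is a minimal Python sketch that computes a standard deviation straight from the definition: average the squared differences from the mean, then take the square root. The function name and the sample figures are illustrative assumptions, not anything taken from the PMBOK.

```python
import math

def population_sd(values):
    """Square root of the mean squared difference from the mean."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)

# Illustrative data: two estimates of 900 and 1100 hours
print(population_sd([900, 1100]))  # 100.0
```

Note that this divides by the number of values, which makes it the population form. Dividing by one less than the number of values gives the sample form instead.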
Because our distribution has only two points, it exhibits none of the qualities of a Gaussian curve, a beta distribution, a chi-squared distribution, a Poisson distribution, a Bernoulli distribution, a binomial distribution, a geometric distribution, or any other common function. If you only have two data points, you only have a line, not a curve of any kind. Using this formula we have only two values: optimistic and pessimistic. This part of the formula is similar to one used in investing. It's called HML: high minus low. Since these are the high and low figures from a distribution, the difference between them is called an interval. By some definitions it's also the range. Dividing that by six just produces a figure for one sixth, or about 16.7%, of the interval.
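As a rough sketch of what the formula actually computes, the Python below takes the two estimates, forms the range, and divides by six. The function name `pm_sd` and the input figures are hypothetical, chosen only for illustration.

```python
def pm_sd(optimistic, pessimistic):
    """PMBOK-style figure: (P - O) / 6, i.e. one sixth of the range."""
    return (pessimistic - optimistic) / 6

estimate_range = 1100 - 900               # the interval between the two estimates
figure = pm_sd(900, 1100)
print(round(figure, 2))                   # 33.33
print(round(figure / estimate_range, 4))  # 0.1667
```

Whatever the two estimates are, the ratio on the last line stays the same, because the figure is defined as a fixed fraction of the range.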
So what does this figure actually represent? In the examples below I describe a linear increase in the spread between the optimistic and pessimistic figures across a set of 11 examples. This illustrates the relationship between the PM formula SD and the optimistic and pessimistic variables.
Example A:
If O=1000 hours, P=1000 hours.
SD=(1000 - 1000 = 0) ÷ 6 or 0
Calculated SD = 0
Population Standard Deviation = 0
Example B:
If O=900 hours, P=1100 hours.
SD=(1100 - 900 = 200) ÷ 6 or 33.33
Calculated SD = 141.42136
Population Standard Deviation = 100
Example C:
If O=800 hours, P=1200 hours.
SD=(1200 - 800 = 400) ÷ 6 or 66.67
Calculated SD = 282.84271
Population Standard Deviation = 200
Example D:
If O=700 hours, P=1300 hours.
SD=(1300 - 700 = 600) ÷ 6 or 100
Calculated SD = 424.26407
Population Standard Deviation = 300
Example E:
If O=600 hours, P=1400 hours.
SD=(1400 - 600 = 800) ÷ 6 or 133.33
Calculated SD = 565.68542
Population Standard Deviation = 400
Example F:
If O=500 hours, P=1500 hours.
SD=(1500 - 500 = 1000) ÷ 6 or 166.67
Calculated SD = 707.10678
Population Standard Deviation = 500
Example G:
If O=400 hours, P=1600 hours.
SD=(1600 - 400 = 1200) ÷ 6 or 200
Calculated SD = 848.52814
Population Standard Deviation = 600
Example H:
If O=300 hours, P=1700 hours.
SD=(1700 - 300 = 1400) ÷ 6 or 233.33
Calculated SD = 989.94949
Population Standard Deviation = 700
Example I:
If O=200 hours, P=1800 hours.
SD=(1800 - 200 = 1600) ÷ 6 or 266.67
Calculated SD = 1131.37085
Population Standard Deviation = 800
Example J:
If O=100 hours, P=1900 hours.
SD=(1900 - 100 = 1800) ÷ 6 or 300
Calculated SD = 1272.79221
Population Standard Deviation = 900
Example K:
If O=0 hours, P=2000 hours.
SD=(2000 - 0 = 2000) ÷ 6 or 333.33
Calculated SD = 1414.21356
Population Standard Deviation = 1000
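The figures in Examples A through K can be reproduced with a short Python sketch using the standard library's `statistics` module. It assumes that the "Calculated SD" values are the sample standard deviation of the two estimates (dividing by n - 1), which is what the numbers above match. The helper name `example_rows` is hypothetical.

```python
import statistics

def example_rows():
    """Return (O, P, PM SD, sample SD, population SD) for Examples A through K."""
    rows = []
    for optimistic in range(1000, -1, -100):
        pessimistic = 2000 - optimistic               # keeps the mean at 1000
        estimates = [optimistic, pessimistic]
        pm_sd = (pessimistic - optimistic) / 6        # PM formula
        sample_sd = statistics.stdev(estimates)       # divides by n - 1
        population_sd = statistics.pstdev(estimates)  # divides by n
        rows.append((optimistic, pessimistic, pm_sd, sample_sd, population_sd))
    return rows

for o, p, pm, sample, population in example_rows():
    print(f"O={o:4d}  P={p:4d}  PM SD={pm:7.2f}  "
          f"calculated SD={sample:10.5f}  population SD={population:7.1f}")
```

With only two data points, the population SD is always half the range and the sample SD is the range divided by the square root of two, so all three figures are fixed multiples of the same range.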
I've graphed the relationships below.
I have deliberately left the mean in this data set. I did that to show that the range was changing but the mean and sum were not. Because those figures are constant, you might expect either the calculated SD or the PM SD figures to produce a line parallel to the mean. They don't. Standard deviation isn't a constant. It measures spread around the mean, so a high standard deviation indicates only that the data points are spread out over a wide range of values.
DEDUCTIONS:
The PMI SD figure trends upward because it is a fixed fraction of the range: as the range widens, any fixed fraction of it also increases. Correspondingly, the PMI SD and the traditional SD both trend toward the pessimistic estimate. This means that with wider ranges both of these will exhibit bias toward pessimistic estimates.
The population SD trend line remains parallel to the pessimistic estimate curve. This is an unexpected proportionality that may have other consequences.
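The parallel lines can be checked numerically. In this sketch (variable names are illustrative), the gap between the pessimistic estimate and the population SD is computed for each of the examples above. If the two lines are parallel, the gap should be the same every time.

```python
import statistics

gaps = []
for optimistic in range(1000, -1, -100):
    pessimistic = 2000 - optimistic
    population_sd = statistics.pstdev([optimistic, pessimistic])
    gaps.append(round(pessimistic - population_sd, 6))

print(sorted(set(gaps)))  # [1000.0]
```

The constant gap of 1000 is the mean of the data set. With only two points, the population SD is half the range, which is exactly the distance from the mean to the pessimistic estimate.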
This pessimistic trend is a constant irrespective of which figure or figures are adjusted; it is merely tracking the estimates as they move apart. The cause of this behavior isn't complicated. It's because the difference is divided by six, which fixes only the slope of the line. If we had divided by two instead, the result would equal the population SD for these two points and would plot a course parallel to the pessimistic estimate, mirroring the optimistic one.