# Why Project Costs are "Expected" to Overrun

Copyright © Harvey Wilson - Katmar Software
September 2015

It seems that more often than not, final costs on capital and construction projects overrun the approved budgets. The reasons (or excuses?) given for these cost overruns are usually something like:

• Inaccurate or overly optimistic initial estimates
• Movements in "the economy"
• Poor project management
• Poor cost control
• Scope creep
• Incomplete or inaccurate engineering

Indeed, an internet search for "causes and reasons for project cost overruns" will give many articles discussing these causes and a great many more. These reasons are valid in some cases, but the most important reason for project cost overruns, and sadly the cause that is virtually never mentioned, is that there is a fundamental flaw in the way most cost estimates are put together. The explanation for this built-in error depends on some elementary statistics, which are easily understood and which can easily be applied to eliminate the cause of the errors.

This short article will explain the reason why these errors are built into our project cost estimates, and how they cause the cost overruns. It will also discuss the statistical tools available to rectify the situation and prevent the cost overruns. The explanation uses simple, intuitive examples and no complicated statistics or mathematics is required to understand the reasoning or to apply the fix. Because readers will come from a wide variety of backgrounds the pace is quite slow - please bear with us if you have had some training in statistics.

Let us start the example by introducing a single cost element of a project, which has a "best estimate" (or Most Likely Cost) of \$10,000. In all project cost estimates at least some effort is made to attach an accuracy to the estimate. A good way to formalize this is to introduce the concept of "range estimating". In using this technique, rather than simply quoting the Most Likely Cost a bit more information is recorded and a Low Cost (or Minimum Cost) and a High Cost (or Maximum Cost) are also noted. The final bit of information that is required is to note how the probability of the cost estimate varies between the Low Cost and the High Cost.

While there is a whole branch of statistics devoted to probability distributions, for our purposes here it can be kept very simple. We will assume that the cost probability follows a triangular distribution, which is illustrated in Figure 1 below. Using this distribution makes the math easy, and the logic and reasoning used are equally applicable to any of the more complicated probability distributions available.

Figure 1 - Triangular Probability Distribution

In a triangular distribution there is zero probability of the cost being lower than the defined Low Cost. The probability increases linearly from zero at the Low Cost up to a maximum probability at the Most Likely Cost. And then the probability decreases linearly down to zero again at the High Cost.

Following this method, we can now refine our estimate for the single item in our example. We have already specified the Most Likely Cost to be \$10,000. For the purposes of this example we will say that there is no chance that the cost could be more than 10% below the Most Likely Cost, and we define the Low Cost to be \$9,000. In most cases there is more chance of over-spending than of under-spending so we will follow our intuition and say there is no chance of the cost being more than 50% greater than the Most Likely Cost. This makes the High Cost \$15,000.
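For readers who like to see the shape written down, the triangular distribution just described can be expressed as a probability density. The symbols $a$, $c$ and $b$ are labels introduced here for the Low Cost, Most Likely Cost and High Cost; they are not part of the original estimate sheet.

$$
f(x) =
\begin{cases}
\dfrac{2\,(x - a)}{(b - a)(c - a)}, & a \le x \le c \\[1.5ex]
\dfrac{2\,(b - x)}{(b - a)(b - c)}, & c < x \le b \\[1.5ex]
0, & \text{otherwise}
\end{cases}
\qquad a = 9{,}000,\quad c = 10{,}000,\quad b = 15{,}000
$$

The share of the total probability lying at or below the Most Likely Cost is

$$
P(X \le c) = \frac{c - a}{b - a} = \frac{1{,}000}{6{,}000} \approx 16.7\%
$$

so under these assumptions the item would cost more than its Most Likely Cost roughly five times out of six.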

What does this Probability Distribution really mean? We will get to some examples to illustrate it in a minute, but let's try to put it into words first. What it tells us is that if there were some way that we could repeat the purchase of this item over and over again, the frequency of the costs that we would have to pay for this item would follow this distribution. We have to imagine that each purchase is independent of all the others - think of the purchases as occurring in parallel universes and not influencing each other at all.

Now we can look at a numerical example. If we were able to feed all these costs of the repeated purchases back to a central point, and accumulate them into price bands and plot a histogram showing the number of purchases that fell into each price band we would get a graph that looked like this:

Figure 2 - Monte Carlo Simulation of Triangular Distribution

The tallest bar occurs at \$10,000 and this confirms that the Most Likely Cost is \$10,000. For price bands below \$10,000 we find that the frequency (or probability) of the item's cost falling into that band decreases as the cost decreases, and we never find a cost of less than \$9,000. Similarly, the probability for each band decreases as the cost increases above \$10,000, and we never find a cost greater than \$15,000.

This process of simulating the project over and over again is known as Monte Carlo Simulation. The name comes from the casino, where the ball in the roulette wheel falls into one of a fixed number of identical slots. On an American-style wheel, for example, there are 38 slots, so the wheel effectively generates random numbers between 1 and 38, and each slot has an identical probability of 1 in 38 (2.63%) of catching the ball. This would be an example of a uniform probability distribution, but we want costs that follow a triangular distribution. With a little bit of mathematical tweaking we can convert a uniform random distribution to any distribution we want, covering any range we want, and this is what we have done to generate the costs that fit the triangular distribution in the example above.

If you are interested in how this mathematical tweaking is done, please download the free trial version of Project Risk Analysis and read the section on Monte Carlo Simulation in the program's Help.
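As a rough illustration of the kind of tweaking involved, the Python sketch below converts uniform random numbers into triangular costs using the standard inverse-transform technique. This is an assumption about one common way to do it, not a description of how Project Risk Analysis works internally, and the function name `sample_triangular` is made up for this example.

```python
import math
import random


def sample_triangular(low, mode, high, u):
    """Map a uniform random number u in [0, 1] to a triangular cost.

    Hypothetical helper for illustration only (inverse-transform method).
    """
    # Fraction of the total probability that lies below the mode.
    split = (mode - low) / (high - low)
    if u < split:
        # Rising side of the triangle, between low and mode.
        return low + math.sqrt(u * (high - low) * (mode - low))
    # Falling side of the triangle, between mode and high.
    return high - math.sqrt((1.0 - u) * (high - low) * (high - mode))


rng = random.Random(2015)  # fixed seed so the run is repeatable
costs = [
    sample_triangular(9_000, 10_000, 15_000, rng.random())
    for _ in range(100_000)
]

print(f"lowest simulated cost:  {min(costs):,.0f}")
print(f"highest simulated cost: {max(costs):,.0f}")
print(f"mean simulated cost:    {sum(costs) / len(costs):,.0f}")
```

The simulated costs should never fall outside the \$9,000 to \$15,000 range, and their mean should settle close to \$11,333 as the number of samples grows.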

The only other point to note at this stage is that we have marked the Mean Cost of \$11,333 on the histogram. The Mean Cost is what is known in everyday language as the Average Cost. It is calculated by simply adding together all the individual costs and dividing by the number of times the simulation was run. The fact that the Mean Cost is not the same as the Most Likely Cost is important, but we will come back to this a bit later. The Mean Cost is known in statistical terms as the Expected Cost, which is a hint as to why this article has the title it does.
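For a triangular distribution the Mean Cost does not have to be found by simulation. It is simply the average of the three defining costs, which can be checked against the \$11,333 figure quoted above:

$$
\text{Mean Cost} = \frac{\text{Low} + \text{Most Likely} + \text{High}}{3}
= \frac{9{,}000 + 10{,}000 + 15{,}000}{3}
= \frac{34{,}000}{3} \approx 11{,}333
$$

Because the High Cost sits much further from the Most Likely Cost than the Low Cost does, the mean is pulled upwards, to about 13% above the Most Likely Cost.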

All this talk of parallel universes running random imaginary projects may be a bit too nebulous for some, so let us look at a more concrete and intuitive example.

In the graph below we have plotted the weights of a group of 16-year-old boys in a similar way to what was done with the costs above. We have defined weight bands of 5 kg each and then allocated each of the 500 boys' weights to the applicable band. The graph below shows the height of the bars in terms of the number of boys that fell into each band, but these numbers could easily be converted to percentages to put the graph on the same basis as before. The tallest bar tells us that the Most Likely Weight amongst these boys is 64 kg. However, we can also work out the Mean (or Average) Weight by adding up all the individual weights and dividing by 500. The Mean Weight turns out to be 75.9 kg. Hopefully the fact that the Mean Weight is more than the Most Likely Weight is intuitively obvious, because our experience tells us that there is more scope for a boy to be overweight than to be underweight.

Figure 3 - Probability Distribution for 16 y/o Boys' Weights

An example of how this data could be used, and which highlights the importance of the difference between the Most Likely Weight and the Mean (Average) Weight, is to consider an airline calculating how much fuel it needs to fly a plane carrying 300 16-year-old boys. Should it base its calculations on 300 × 64 kg or 300 × 75.9 kg? While it is true that there will probably be more boys in the 64 kg band than in any other band, the total weight of the boys is likely to be close to 300 × 75.9 kg, and this is the load the airline needs to consider (probably with a bit of a safety factor added in).
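Working the two options through shows how large the gap is:

$$
300 \times 64\ \text{kg} = 19{,}200\ \text{kg}
\qquad\text{versus}\qquad
300 \times 75.9\ \text{kg} = 22{,}770\ \text{kg}
$$

Basing the calculation on the Most Likely Weight would understate the load by 3,570 kg, or close to 16% of the realistic total.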

Now we can return to the project costs that we are actually interested in. We will make the cost example more realistic by now having 10 cost items in our project, instead of the single element in the first example. However, to keep the math easy we will assume that while the 10 items are all different from each other, they all follow exactly the same cost distribution. We will take the cost for each item to have a Low Cost of \$9,000, a Most Likely Cost of \$10,000, a High Cost of \$15,000 and each to follow the triangular distribution.

It is important to note that the 10 items are all different, because we must allow the cost of each of the 10 items to be able to move independently of the others. We will now apply our Monte Carlo Simulation to the project of 10 items. Just as we did for the single item, we will "purchase" the items over and over again in simulated versions of the project. In each iteration we would generate costs for each of the 10 items, and then add the 10 costs together to get the total project cost for that iteration.

We do this thousands of times and accumulate the total project costs into price bands just as before. The graph below shows what the probability would be for any particular total cost for our 10 item project.

Figure 4 - Overall Cost Distribution for Project of 10 Items
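A minimal Python sketch of the 10-item simulation described above is given here. It assumes the standard library's `random.triangular` as the cost generator and an arbitrary 50,000 iterations (the text says only "thousands"). The variable and function names are illustrative and are not taken from Project Risk Analysis.

```python
import random

LOW, MODE, HIGH = 9_000, 10_000, 15_000  # per-item costs from the example
N_ITEMS = 10
N_RUNS = 50_000  # assumed iteration count, chosen only for illustration

rng = random.Random(42)  # fixed seed so the run is repeatable


def simulate_project_total(rng):
    """Draw an independent cost for each item and add them up."""
    # Note the argument order of random.triangular: (low, high, mode).
    return sum(rng.triangular(LOW, HIGH, MODE) for _ in range(N_ITEMS))


totals = [simulate_project_total(rng) for _ in range(N_RUNS)]

mean_total = sum(totals) / N_RUNS
naive_estimate = N_ITEMS * MODE  # sum of the Most Likely Costs
share_within_naive = sum(t <= naive_estimate for t in totals) / N_RUNS

print(f"mean total cost:          {mean_total:,.0f}")
print(f"sum of Most Likely Costs: {naive_estimate:,.0f}")
print(f"share of runs at or below that sum: {share_within_naive:.3%}")
```

With these settings the mean total should land close to \$113,333, and only a tiny fraction of the runs should finish at or below \$100,000. The exact figures vary with the seed and the number of iterations.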

The first point to note is that although the individual cost of each item followed a triangular distribution, when they are lumped together the overall distribution is more like the traditional Gaussian or Bell Curve, known in statistics as a Normal Distribution. Again, if you are interested in the detail of how the lumping together of the costs changes the distribution shape, please download the free trial version of Project Risk Analysis and read Lesson 2 in the step-by-step example in the program's Help.

The second point to note is that if we had estimated the total cost of the project as the sum of the Most Likely Costs for each of the elements we would have had a total estimate of 10 x \$10,000, or \$100,000. From Figure 4 above it can be seen that there is a probability of only 0.01% for the total cost to be \$100,000 or less. A project estimated on this basis has virtually no chance of coming in on budget.

The third interesting point is that the highest bar occurs at roughly \$112,700. This is very close to the sum of the Expected Costs of the individual items, which we saw to be \$11,333 each in Figure 2. The small difference between the middle of the highest bar being at \$112,700 and the sum of the individual Mean Costs (which would be \$113,330) is mainly because the width of the bars makes it difficult to read exactly where the highest point is. In fact the Monte Carlo Simulation run which produced Figure 4 calculated the Mean Cost of the 10 item project to be \$113,352.
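The agreement between the simulated Mean Cost and the sum of the individual Mean Costs is not a coincidence. The mean of a sum is always the sum of the means, whatever the shapes of the individual distributions. Writing $X_i$ for the cost of item $i$ (a label introduced here for convenience):

$$
\mathrm{E}\!\left[X_1 + X_2 + \dots + X_{10}\right]
= \sum_{i=1}^{10} \mathrm{E}[X_i]
= 10 \times \frac{34{,}000}{3}
\approx 113{,}333
$$

The \$113,330 quoted above comes from multiplying the rounded per-item figure of \$11,333 by 10, and the simulated \$113,352 differs from the exact value only through random sampling noise.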

The final point to note from this exercise is that the Overall Cost Distribution in Figure 4 allows us to select any probability that we like, for which the total cost of the project will be less than or equal to the simulated project cost. Figure 4 shows that to have a 95% probability for the project to be completed on or below budget the total budget should be set at \$120,400 and this gives us a good basis for deciding how much contingency to allow to limit the probability of a cost overrun.
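The idea of reading a budget off the overall distribution can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions: it reruns the 10-item simulation with the standard library's `random.triangular`, uses an arbitrary 50,000 iterations, and picks the budget as a simple empirical percentile. The helper names `simulate_totals` and `budget_for_confidence` are hypothetical.

```python
import math
import random


def simulate_totals(n_runs, n_items=10, low=9_000, mode=10_000,
                    high=15_000, seed=7):
    """Return simulated total project costs (illustrative helper)."""
    rng = random.Random(seed)
    # random.triangular takes its arguments in the order (low, high, mode).
    return [
        sum(rng.triangular(low, high, mode) for _ in range(n_items))
        for _ in range(n_runs)
    ]


def budget_for_confidence(totals, confidence):
    """Smallest simulated total that `confidence` of the runs stay within."""
    ordered = sorted(totals)
    index = math.ceil(confidence * len(ordered)) - 1
    return ordered[max(index, 0)]


totals = simulate_totals(50_000)
naive_estimate = 10 * 10_000  # sum of the Most Likely Costs

for confidence in (0.50, 0.80, 0.95):
    budget = budget_for_confidence(totals, confidence)
    contingency = budget - naive_estimate
    print(f"{confidence:.0%} confidence: budget {budget:,.0f}, "
          f"contingency over naive estimate {contingency:,.0f}")
```

The 95% budget from a run like this should come out in the region of the \$120,400 read from Figure 4, although the exact value depends on the seed and the number of iterations.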

The take-home message from all of this is that the Likely Cost of the sum of several items is not the same as the sum of the Likely Costs of the individual items. As Figure 4 has illustrated, the Likely Cost of several items taken together is, to a close approximation, the sum of the Expected Costs of the individual items. We saw that this was intuitively true in the example of the 300 boys on the airplane. We had to estimate the total weight of the boys as the sum of the Mean (i.e. Expected) Weights, rather than the Most Likely Weights.

Somehow this lesson, which we take to be obvious in the example of the boys' weights, is not applied to the practice of estimating cost budgets for large projects. Part of the reason it has not been applied to project cost estimating in the past was the difficulty of dealing with a wide variety of cost distributions, finding all the expected costs and combining them. These days, with fast computers and friendly software, there is no reason not to address the problem properly and to avoid the surprises (and excuses) at the end of the project.

If you would like to experiment with applying these principles to some of your own costs please download the free trial version of Project Risk Analysis. This will quickly show you how simple and fast it is to solve the project cost overrun problem. We have also prepared another article on how to use Monte Carlo simulation to estimate the project cost contingency required.