What can the forecasting profession do differently? Well, like most things in life, change will only happen if you demand it. Sharp words about failed predictions are essentially forbidden where they are most needed. So for all the incentives pushing experts to pump up their predictions, there must be a countervailing incentive to tone it down. In reality, there is little accountability for predictions, and while big calls that go bad should damage the reputations of those who make them, they seldom do.
Given how pervasive and influential commodity price predictions are, there is a surprising lack of data on how accurate the forecasts have been and which forecasters have the best track record. According to Philip Tetlock and Dan Gardner – authors of “Superforecasting: The Art and Science of Prediction” – there is a lack of accountability when it comes to financial forecasts. “Every day, the news media deliver forecasts without reporting, or even asking, how good the forecasters who made the forecasts really are,” say Tetlock and Gardner. They continue:
Every day, corporations and governments pay for forecasts that may be prescient or worthless or something in between. And every day, all of us – leaders of nations, corporate executives, investors and voters – make critical decisions on the basis of forecasts whose quality is unknown.
Governments, businesses, investors and individuals don’t demand evidence of accuracy before deciding whether to accept and act on a prediction. Forecasts are routinely made but the results are almost never tracked. As noted earlier, prominent forecasters build reputations not because of their accuracy but because of their skill at telling a compelling story with conviction.
Predict if you want, and rely on predictions if you really need to, but keep a tally of the predictions. Although my review of oil price forecasts over the past ten years was relatively easy to carry out, many of the predictions that pundits make are much more difficult to gauge. Part of the problem is the fuzzy language in which many predictions are expressed, which makes it difficult to tell whether the forecast was right or wrong even after the event.
Karl Popper famously made the observation that the usefulness of a prediction was related to its potential for falsification. “It will rain in London in the future” is a statement that is 100% accurate, but useless when it comes to telling us which day we should carry an umbrella. A statement that it will rain at 10.30am tomorrow is much more useful; it will rain or it will not. If it doesn’t, then we can examine what assumptions were used in the forecast that turned out to be false.
Forecasts are often expressed using ambiguous words like probable, possible and risk, for which there are no agreed definitions, making it impossible to score them afterward. Since the 1960s, the US Intelligence Community has famously struggled with the lack of precision in the words commonly used to express likelihood and chance. Sherman Kent, often described as the “father of intelligence analysis”, was a CIA analyst who recognised the problem of using imprecise statements of uncertainty. In particular, Kent was jolted by how policymakers interpreted the phrase “serious possibility” in a national estimate about the odds of a Soviet attack on Yugoslavia in 1951. After asking around, he found that some thought this meant a 20% chance of attack, while others ascribed an 80% chance to the phrase. Most people were somewhere in the middle.
Investor and author Michael Mauboussin recently posted a survey to gauge how people view probabilistic language. Take ‘real possibility’ as an example. To some people this meant a probability of just 20%, but to others it meant an 80% likelihood. Next time you hear someone say there is a ‘real possibility’ of this or that happening to a particular market, you know they are just trying to make a forecast that no one can hold them to account for.
Remember, hits and misses don’t come with labels. It’s often a matter of perception whether a forecast is deemed to be a hit or a miss, which makes language important. The more ambiguous the wording is, the more a pundit’s prediction can be stretched. And since we want hits, that’s the direction in which things will tend to stretch. As Dan Gardner describes in his book “Future Babble”:
When the notoriously vague Oracle of Delphi was asked by King Croesus of Lydia whether he should attack the Persian Empire, the oracle is said to have responded that if he did he would destroy a great empire. Encouraged, the king attacked and lost.
The same confusing probabilistic terminology is used by pundits, and is then often repeated by many in the financial media to imply that something is much more likely to happen than it actually is. We saw an example of that in a previous chapter when Goldman Sachs suggested that limited spare capacity “could lead to $150-$200 a barrel oil prices” – many in the financial media interpreted “could” as “will”.
In other cases, relatively specific forecasts are matched with an unspecific time frame, which also makes it difficult to score them for accuracy. There is a maxim among professional analysts that cynically confirms the problem: “always predict a price, or a time frame, but never both”. However, in recent years, some commodity market forecasters have been pushed to quantify their forecasts by making specific price predictions over specified time horizons. Many have also embraced uncertainty by offering forecasts in the form of a probability distribution, rather than a point estimate, which is a much more useful and realistic way to think about the future. Commodity market forecasters are catching up with weather forecasters and the US Intelligence Community in trying to estimate the likelihood of a whole range of outcomes, not just the central one.
Percentage forecasts are an important step forward, but the commodity market is still lagging behind in terms of measuring forecast accuracy after the event. The problem with percentage forecasts is working out whether they were accurate even in retrospect. Tetlock and Gardner call this problem “being on the wrong side of ‘maybe’”. To understand the problem, imagine a weather forecaster who says that tomorrow there is a 70% chance of rain. The forecast also implies there is a 30% chance it will not rain. If it doesn’t rain, the forecast was not necessarily wrong in a statistical sense, but it is still likely to be criticised by anyone concentrating on just the most likely outcome.
Although assigning probabilities to particular scenarios is an improvement, it is far from being a panacea. It can give an impression of faux certainty, that all possible outcomes are knowable in advance and have been captured by the forecast. Under what is known as “Knightian uncertainty”, probabilities cannot be assigned to different outcomes because the distribution of possible outcomes is itself unknowable.
The danger with faux certainty is that it might lead market participants to believe something is more likely than it actually is. Consumers of forecasts might think that all of the possible outcomes have been captured by the commodity forecaster and then act based on this supposed “evidence”. Remember, pundits see uncertainty as something that threatens their reputation. Often forecasters will make assumptions in their models that lower the perception of the degree of uncertainty in future prices.
Again, the solution to firming up fuzzy forecasts is to track performance over time. This would weed out those forecasters unable to capture the range of possible outcomes accurately and separate the over-confident from the accurate. Meteorologists pioneered the solution to the probability forecasting problem, which was published by Glenn Brier of the US Weather Bureau in 1950. Brier set out a careful methodology for comparing a set of forecasts expressed as probability distributions with eventual outcomes, and scoring forecasters on a standard scale from zero (complete accuracy) to 2.0 (perfect inaccuracy). The most accurate forecaster is the one whose forecast probability distributions get closest to the distribution of actual out-turns over time. If a forecaster repeatedly predicts a 70% chance of rain, it should rain on approximately 70% of those occasions.
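To make the scale concrete, here is a minimal sketch of a Brier-style score for simple yes/no forecasts (rain or no rain), written in Python. It assumes the two-category form that runs from zero to 2.0, in which the squared error is summed over both the "event" and "no event" categories and then averaged across forecasts. The function name and the sample figures are purely illustrative and do not come from any real forecaster's record.

```python
def brier_score(forecasts, outcomes):
    """Average two-category Brier score: 0.0 is perfect, 2.0 is as wrong as possible.

    forecasts: probabilities (0.0 to 1.0) that the event will happen.
    outcomes:  True if the event happened, False if it did not.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need one outcome for every forecast")
    total = 0.0
    for p, happened in zip(forecasts, outcomes):
        o = 1.0 if happened else 0.0
        # Squared error on "event" plus squared error on "no event".
        total += (p - o) ** 2 + ((1.0 - p) - (1.0 - o)) ** 2
    return total / len(forecasts)


# Hypothetical record: four rain forecasts and what actually happened.
rain_forecasts = [0.7, 0.7, 0.9, 0.2]
rain_outcomes = [True, False, True, False]
print(brier_score(rain_forecasts, rain_outcomes))
```

A forecaster who always says 50% scores 0.5 whatever happens, so a useful forecaster needs to do better than that over a long run of forecasts.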
Verifying accuracy is obviously much easier for weather forecasts, where thousands of fresh forecasts are issued every day and can be compared with thousands of outcomes. Verification is more difficult for subjects like commodity prices but, given how frequently prices are forecast, it is not impossible and would be highly desirable.
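The tally itself need not be complicated. As a rough sketch, assuming each forecast has been reduced to a stated probability and a yes/no outcome (for example, whether a price finished above a named level by a named date), the Python below groups forecasts by their stated probability and reports how often the event actually happened in each group. The function name and the sample record are hypothetical.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Return {stated probability: (number of forecasts, observed frequency)}.

    Stated probabilities are rounded to the nearest 10% so that similar
    forecasts fall into the same group.
    """
    groups = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        groups[round(p, 1)].append(1 if happened else 0)
    return {
        p: (len(hits), sum(hits) / len(hits))
        for p, hits in sorted(groups.items())
    }


# Hypothetical record: ten "70% likely" calls, of which seven came true.
stated = [0.7] * 10
happened = [True] * 7 + [False] * 3
for prob, (count, freq) in calibration_table(stated, happened).items():
    print(f"stated {prob:.0%}: {count} forecasts, event occurred {freq:.0%} of the time")
```

A well-calibrated forecaster shows observed frequencies close to the stated probabilities in every group. With only a handful of forecasts per group the comparison is noisy, which is one reason a long track record matters.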
Brier scoring price forecasts could also bring important benefits for commodity markets. The aim would be not just to identify the most accurate forecasters, those most worth paying attention to, but to improve the accuracy of all forecasts by subjecting them to rigorous analysis after the event. Weather forecasts have improved enormously over the last fifty years because they have been subjected to rigorous analysis. It is far less obvious that forecasts for commodity prices and other financial markets have become any better.
In “Black Box Thinking”, Matthew Syed describes how the airline industry actively promotes the sharing of mistakes and failures in order to help propel the safety of the industry forward. Other professions are not so good at looking at past mistakes, learning from them and improving their processes and models to make the future better. Syed describes how psychotherapists gauge whether their treatments are effective, not by observing the patient with objective data over a long period of time, but by observing them in the clinic. This approach is prone to all kinds of biases, both from the patient and the psychotherapist. It also provides no feedback on the lasting impact of the treatment and hence no opportunity to learn from success or from failure.
There is no reason why learning from mistakes and failures should not be part of the job description of the analysts involved in commodity forecasting. Capturing data is not a problem (whether that be prices, futures market activity, etc), and information on market events is well publicised (for example, changes in interest rates, currency movements or political instability). All it needs is a change in attitude.