Black Swans, Models and Reality
And Nassim Nicholas Taleb
Nassim Nicholas Taleb’s two books, Fooled by Randomness and The Black Swan, are undoubtedly two of the most provocative and interesting books I have come across in my life. I am still ploughing through them, but the second book in particular, The Black Swan, has made waves and is apparently now (as usual, in retrospect) cited as required reading on Wall Street. The books are highly eclectic and traverse a remarkably diverse landscape that includes psychology, finance, evolution, mathematics, philosophy, history, sociology, economics and many other disciplines. It would be impossible to review them in a limited space. But if I wanted to capture their essence, it would be by saying that Taleb alerts readers to the patterns that human beings see in the randomness inherent in the world, and to the models, both mental and practical, that they build to account for this randomness.
Since his book came out, Taleb has become a minor celebrity and has been interviewed by Charlie Rose and Stephen Colbert. His books sell in large numbers in the far corners of the world. The reason Taleb has suddenly become such a big deal is in part that he seems, at least philosophically, to have predicted the financial crisis of 2008, which unfolded the year after The Black Swan came out. One of the firms he advised turned a profit of more than 100 million dollars in 2008 when others were close to losing the clothes on their backs. Taleb has now emerged as one of the most profound soothsayers and philosophers of our times, a “scholar of randomness”, although his message seems to be more modest: models are not designed to account for rare or Black Swan events, which may have monumental impact. The analogy deals with the assured belief that people had in the past about all swans being white. When the continent of Australia was discovered and black swans were observed in flocks (a fact which my dad, who is currently in Australia, corroborated), there was a paradigm shift. Similarly, a model, any model, that is built on the basis of White Swan events will fail to foresee Black Swans.
Unfortunately, as Taleb explains, it’s the Black Swans that dictate the direction in which the world proceeds. It’s the rare event that is the watershed, the event that changes everything. And it’s exactly the rare event that models don’t encapsulate. This fact spells their doom.
To support his theory, Taleb cites many Black Swan events from history and politics. For example, if you lived in 1913, you would hardly have foreseen the gargantuan event of 1914 which would forever change the world. If you lived in 1988, you would scarcely have comprehended the epoch-making events of 1989. One of my favourite parts of the book concerns the Turkey Analogy. Imagine you are a turkey who is fed by the butcher every day for 364 days of the year. You are happy, you know the butcher loves you, your economics and accounts departments are happy, and they start to think this is the way of the world for you. Now comes the 365th day. You die. Just when your expectation levels reach their most optimistic, your destiny reaches its lowest point. But on day 364, right before day 365, you were 100% certain that you had a lifetime of bountiful happiness ahead of you. Day 365 was exactly contrary to the expectations of your finance department. Day 365 was the Black Swan which you did not anticipate. And yet it was that single Black Swan day that proved fateful for you, and not the earlier 364 days of well-fed bliss. According to Taleb, this is what most of us, and especially the derivatives wizards on Wall Street, are: happy, deluded turkeys.
In any case, one of the most important discussions in Taleb’s books concerns the fallacy of model building. He claims that the models that Wall Street used, the models that raked in billions and Nobel Prizes, were fundamentally flawed, in part because they were not built to accommodate Black Swans. That made me think about models in science and how they relate to other models. Science has always progressed through model building; some models such as relativity or quantum theory have come as close to modeling “reality” as anything else, in spite of us still not understanding what they actually mean. As in the case of other models, more data helps to validate or reject the model hypothesis. However, it is dangerous to use a model if you are not sure how it works. In some ways we are always playing a high-risk game. In Taleb’s mind, the derivatives and other frameworks that the genius quants used were like a plane whose workings they did not understand. When you build a plane, you should always keep the possibility of a storm, a rare event, in mind. Aerospace engineers do this. The quants apparently did not.
Since my own field crucially involves building and investigating models, I can vouch for the fact that model building is always a tricky exercise and can easily become seductive and misleading. We all, irrespective of the fields we work in, want models that will help us to predict the next important data point. And yet models are always elusive; they always leave us with an uncomfortable, or falsely comfortable, feeling that our thing is the real thing. Several factors contribute to this problem, not least an inadequate exposure to statistics courses in college. But two things have especially stood out for me as common flaws in any kind of model.
1. Correlation does not mean causation:
I believe that children should be taught to recite this 10 times in the evening the way they are taught to recite multiplication tables. Mistaking correlation for causation is one of the biggest mistakes that all of us all too commonly make. This fallacy pervades our thinking in science, philosophy and politics, and every single day we see it running rampant in our daily conversations, in our sermons to our children, and in newspaper articles. Even if many of us are aware of it in theory, we cannot avoid it in practice. The correlation vs causation problem is so omnipresent that it keeps serving us generous doses of what may be completely unreal but what we would construe as reality. There are several examples of mistaking correlation for causation; for instance, the correlation between the number of breeding storks in Germany and the number of new births might readily lend credence to children’s beliefs about where babies come from. Mistake correlation for causation and you are off on a trajectory orthogonal to that of reality.
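To make the stork example concrete, here is a minimal sketch in Python of one common way such a correlation can arise without any causal link: a hidden third variable that drives both quantities. Everything in it is made up for illustration. The hidden variable (called rural here), the noise levels and the sample size are my assumptions, not real stork or birth data.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, zs):
    """What is left of ys after removing its best straight-line dependence on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]

def simulate(n=2000, seed=0):
    """Return (raw correlation, correlation after controlling for the hidden driver)."""
    rng = random.Random(seed)
    # Hypothetical hidden driver, e.g. how rural a district is (an assumption).
    rural = [rng.gauss(0, 1) for _ in range(n)]
    # Storks and births each depend on the hidden driver, never on each other.
    storks = [r + rng.gauss(0, 0.5) for r in rural]
    births = [r + rng.gauss(0, 0.5) for r in rural]
    raw = pearson(storks, births)
    controlled = pearson(residuals(storks, rural), residuals(births, rural))
    return raw, controlled

raw, controlled = simulate()
print(f"storks vs births:                   r = {raw:.2f}")
print(f"same, after accounting for 'rural': r = {controlled:.2f}")
```

With these made-up noise levels the raw correlation should come out strong, while the correlation that remains once the hidden driver is accounted for should be close to zero. The catch, of course, is that in real life nobody hands you the hidden variable.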
But let’s face one big truth about models that attempt to correlate and explain: most of them are in fact not designed to represent “reality”. Models don’t care as much about causation as they do about accurate correlation and prediction. While this may sound like shooting ourselves in the foot, it often saves us a lot of time, not to mention philosophizing. We use models not because they are “real” but because they work. This is a very important point, and I believe that most human beings who mistake correlation for causation nevertheless keep on operating in everyday life under the fallacy because it saves them time and allows them to make quick decisions, even if many of them are wrong. Part of this gut reaction probably stems from our evolutionary history, in which our mind was essentially programmed to seek patterns. This pattern-finding quality would have been indispensable in our cave-dwelling days; it was probably safe to always assume that a dark forest meant the presence of wild creatures who could easily attack and eat you. People who paused to take their time and tried to distinguish between correlation and causation may well have been weeded out of the gene pool. However, the quality is less than useful in an increasingly information-driven, complex world full of quick multitasking, money management and unconscious data processing. In such a world, those who carefully reflect on the fallacy and use that reflection to temper their decisions might seem slow, but might prosper more in the long term.
2. The Problem of Overfitting:
Models can always fit data if arbitrary changes are made to their parameters and enough parameters are used. As a friend once paraphrased the great mathematician John von Neumann in a past comments thread, “Give me five parameters and I can fit an elephant to a curve. Give me six and I can make him dance”. (The version usually attributed to von Neumann is “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk”.) This is overfitting. In overfitting, models can do a stellar job of accounting for known data, but fail miserably to predict new data, which is after all what they should be able to do. Modelers are frequently misled by high correlation coefficients that may not mean much and may well lead them down the wrong path. There are several methods to avoid overfitting, none of which, however, always works. One way is to leave out some of the data when fitting the model and see how well the resulting fit predicts the data that was left out. Another is to do a careful study and actually determine the minimal number of parameters necessary to fit the model to the data, and no more. Occam’s Razor is quite handy in such circumstances. Statistics is another, a simple remedy being to try to ascertain whether the result you are getting is due to chance alone (p values, F tests and all that).
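The leave-some-data-out check is easy to sketch in a few lines of Python. The data below is invented purely for illustration (a straight-line trend plus a small alternating wobble), and the two competing models are my own choice: a 2-parameter straight line and a 6-parameter polynomial forced through all six fitted points. Treat it as a toy demonstration of the idea rather than a recipe.

```python
def fit_line(xs, ys):
    """Least-squares straight line (2 parameters); returns a predictor function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_exact_polynomial(xs, ys):
    """Polynomial through every point (one parameter per point), in Lagrange form."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def mean_abs_error(model, xs, ys):
    """Average absolute gap between the model's predictions and the data."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

# Made-up data: trend y = 2x + 1 plus an alternating wobble of +/- 0.5.
xs = list(range(8))
ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]

# Fit on the first six points, hold out the last two.
train_x, train_y = xs[:6], ys[:6]
test_x, test_y = xs[6:], ys[6:]

line = fit_line(train_x, train_y)
poly = fit_exact_polynomial(train_x, train_y)

for name, model in [("2-parameter line", line), ("6-parameter polynomial", poly)]:
    fitted = mean_abs_error(model, train_x, train_y)
    held_out = mean_abs_error(model, test_x, test_y)
    print(f"{name}: error on fitted data = {fitted:.2f}, "
          f"error on held-out data = {held_out:.2f}")
```

The polynomial reproduces the six fitted points perfectly, because it has faithfully modeled the wobble along with the trend, and then misses the two held-out points by a far wider margin than the humble straight line does. A perfect fit to known data says very little about the next data point.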
So how can all this affect our perception of Black Swans, of rare events that shake the whole foundation? Well, let’s say I want to model the digestive process in my stomach, and I assume that food in my stomach is digested by little dwarves shuttling food to one another, breaking it down with hammers and chisels, and finally passing it on to goblins who open the gates to my blood supply and let it in. While this model might sound completely nonsensical, it could make perfect sense if I tweak the assumptions to explain observed facts. In fact religious people use exactly the same strategy: invoke assumptions to explain known facts. If facts don’t follow from assumptions, simply modify the assumptions to explain the new facts. Using this potentially endless set of assumptions, the dwarf-goblin model I constructed for digestion could work perfectly. One of the most important lessons I learnt from my advisor in graduate school is to always question the assumptions in any study. If the assumptions themselves are suspect, then all the conclusions that follow from them may be wrong, no matter how elegant or logical the train of thought sounds. In all of the above scenarios, we have neglected the one annoying factor that could help to disillusion us of the entire edifice: evidence. Hard facts found from observation, interrogation and experiment. That is in fact what prevents science, and indeed most of everyday life, from going down the same road as religion, and that is why anything goes in religion and therefore eventually nothing goes. In science and most of everyday life, one can do experiments to test assumptions. These experiments will then shed light on the ground reality of assumptions. Cut open a stomach, and my fantasy of dwarves and goblins playing with each other will come crumbling down in an instant.
Needless to say, because of its pronounced emphasis on and indeed essential adherence to assumptions that cannot be tested, religion is the ultimate bad model.
And now we should understand that this is precisely the reason why models that don’t model reality carry the dangerous liability of not being able to predict monumental rare events. Suppose you have a model that has simply ignored a real factor operative under the circumstances. The model works pretty well when the contribution of that factor is marginal. But now, suddenly, the marginal factor becomes a factor of overriding importance. Since we don’t even know that the factor exists, it is as much of an outlier in our model as anything can ever be. We fail to predict it. And if it is a Black Swan, it may then negate the importance of everything that we apparently predicted well before. We can forget about prediction; at this stage we will be left with our heads spinning, completely clueless about what went wrong, because of our obliviousness to a factor whose existence we are not aware of. We won’t even be able to do retrospective explanation.
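Here is a toy illustration of that scenario in Python. The “true” process, its coefficients and the hidden factor h are all hypothetical, chosen only to make the point visible: the modeler sees x, never sees h, and fits a straight line that looks excellent for as long as h stays near zero.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def true_process(x, h):
    """Hypothetical reality: depends on a visible input x and a hidden factor h."""
    return 2.0 * x + 50.0 * h

# Calm regime: the hidden factor hovers near zero, so its contribution is marginal.
calm_x = list(range(1, 11))
calm_h = [0.0 if x % 2 else 0.02 for x in calm_x]
calm_y = [true_process(x, h) for x, h in zip(calm_x, calm_h)]

# The modeler does not know h exists and fits y against x alone.
intercept, slope = fit_line(calm_x, calm_y)

def model(x):
    return intercept + slope * x

worst_calm_error = max(abs(model(x) - y) for x, y in zip(calm_x, calm_y))

# The rare event: an unremarkable x, but the hidden factor jumps from ~0 to 1.
shock_x, shock_h = 11, 1.0
shock_error = abs(model(shock_x) - true_process(shock_x, shock_h))

print(f"worst error while the hidden factor is marginal: {worst_calm_error:.2f}")
print(f"error when the hidden factor takes over:         {shock_error:.2f}")
```

Nothing in the calm data hints at the problem. The small misfits look like ordinary noise, and no amount of refitting on x alone would reveal that a second factor was there all along.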
But finally, as much as we know these pitfalls, in practice we are much less careful, since we want to get practical, applicable results and could not care less if our model represented conditions on Mars. The bottom line is that all of us are playing a game when we use models, in science, finance or any other discipline. As in other games, we are fine as long as we win. One of Taleb’s messages is that we should at least be able to assess the impact of losing, something which he asserts the quants have significantly underestimated. If the impact is a complete game changer, then we should know when to get out of the game. We tend to forget that the models that we have don’t represent reality. We use them because they work, and it’s the reality of utility that produces the illusion of reality. Slightly modifying a quote by the great Pablo, models are the lies that help us to conceal the truth.
Note: The short Charlie Rose interview with Taleb is worth watching.
Rajesh said
February 8 2009 @ 3:50 pm
Good one, but how do you account for such black swans? Not all data is experimental. In fact, most of the data we have for constructing these models in the real world is observational. Unless we have a significant number of observations in each event category, we can always get wrong parameter estimates. I mean, statistics works that way, right? Based on events that occurred in the past, we try to model them and predict future data points.