The well-known Minard diagram (see its larger version here) shows the losses suffered by Napoleon’s mighty army in 1812-1813 during his invasion of Russia. Six variables are famously plotted in 2D: the size of the army, its location (latitude and longitude), time, direction of movement, and the outdoor temperature. The line width illustrates the size of the army at various points in time, while the temperature chart at the bottom suggests that temperature was the cause of the decrease in army size. This innovative multivariate display on a two-dimensional surface tells a story that can be understood immediately. BTW, see it re-plotted here using D3 technology.
However, is this really a great example of cleverly-found causation or a case of misinterpreted correlation? Without doing an in-depth analysis of the data, I can only point out a couple of things.
During Napoleon’s invasion of Russia, the weather was not cold all of the time. For example, the major battle of the war, the Battle of Borodino (fought near Moscow, about halfway into his journey), took place in early September, when the weather was pretty warm, as can be seen in the historical painting that follows.
You can also see in the diagram that the size of Napoleon’s army started declining immediately after Napoleon crossed the Niemen (or Neman) river earlier that summer (in June), when the temperatures were obviously well above freezing.
Other potentially important factors such as 1) the distance traveled, 2) increasing fighting intensity, 3) outbreaks of illness, 4) depletion of supplies, and some others are not accounted for in Minard’s analysis, but could have had the same or similar effect on the army’s losses. His “feature selection” process was not very rigorous.
I spent about 20 mins and managed to roughly capture the distance Napoleon’s army traveled from one location to another and then matched it with the corresponding army size numbers from Minard’s diagram. This is what I got:
The x-axis of this chart represents the cumulative distance traveled (in km) and the y-axis corresponds to Napoleon’s army size. As the distance traveled increases, the army size declines roughly exponentially from approximately 400,000 down to around 10,000. The temperature remained above zero until about the halfway point (1000-1250 km) and then dropped below zero, but the overall fit to the data remains good regardless of the temperature.
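For readers who want to try this themselves, here is a minimal sketch of how an exponential decline in army size against distance could be fitted, using a plain least-squares line through log(size). The points below are synthetic placeholders, not the values I read off Minard’s diagram: they are generated from the two endpoints mentioned above (about 400,000 and about 10,000) and an assumed total distance of 2250 km, which I picked only because the halfway point is quoted as 1000-1250 km. The function name `fit_exponential_decay` is made up for this illustration.

```python
import math

def fit_exponential_decay(distances_km, army_sizes):
    """Fit size = n0 * exp(-k * distance) by least squares on log(size)."""
    logs = [math.log(s) for s in army_sizes]
    n = len(distances_km)
    mean_x = sum(distances_km) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_km)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(distances_km, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (n0, decay rate k per km)

# ASSUMPTION: total distance of 2250 km (midpoint of 2 x 1000-1250 km).
TOTAL_KM = 2250.0
# Decay rate implied by the two endpoints quoted in the text.
k_true = math.log(400_000 / 10_000) / TOTAL_KM

# SYNTHETIC placeholder points lying exactly on that curve.
# Replace these with the distances and sizes read off the diagram.
distances = [i * TOTAL_KM / 8 for i in range(9)]
sizes = [400_000 * math.exp(-k_true * d) for d in distances]

n0, k = fit_exponential_decay(distances, sizes)
half_distance = math.log(2) / k  # km over which the army halves

print(f"n0 = {n0:.0f}, k = {k:.5f} per km, halves every {half_distance:.0f} km")
```

With real data the points will scatter around the fitted line, and it is worth checking whether the points from the cold part of the journey fall systematically below it. If they do not, that is the "fit remains good regardless of the temperature" observation in numeric form.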
I think I can now propose a pretty strong alternative hypothesis: the temperature had relatively little impact on the decline of Napoleon’s army, and the distance it traveled was a much more important factor. Distance alone is sufficient to explain what happened, without even involving such factors as temperature.
Let me conclude now that, while Minard’s diagram is a very creative and innovative way to represent data, the conclusion it offers seems to be unjustified. In my opinion, by suggesting that cold weather was to blame for Napoleon’s catastrophe, it offered the French society of its time a plausible and comfortable explanation. But if we look at the data more closely, it seems that the huge distance traveled, not the temperature, was what undid Napoleon’s plans.
C’est la vie.