When and why is variance sometimes used instead of standard deviation?

From a practical standpoint, you can think of it like this (it is by no means a theoretical/academic approach).

You want to know how far the data points are spread from the average value. There are many ways to measure this; one of them is to look at how far each number is from the average and add all of these distances together. With this approach you run into a problem: some of the data points are smaller than the average and some are larger, so if you just took SUM(point - average), they would even out. So you make that "even out" effect disappear by squaring the differences, so the negative values become positive. Now you add those up and have a pretty good metric; divide it by the number of data points (because you do not want the size of the data set to play a role here) and you get the variance. It turns out that this approach has many mathematical advantages when deriving other properties (for example, compared to taking absolute values instead of squares to get rid of the negatives), but that is not the point at this moment.
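Here is a minimal sketch of that computation in Python, using a small made-up dataset. Note this is the "divide by n" population variance described above, not the sample variance that divides by n - 1:

```python
# Made-up dataset; its average happens to be 5.0.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

average = sum(data) / len(data)

# Raw differences "even out": positives and negatives cancel to (roughly) zero.
deviations = [x - average for x in data]
print(sum(deviations))   # ~0.0, so this alone is useless as a spread measure

# Squaring removes the sign; averaging the squares gives the variance.
squared = [d ** 2 for d in deviations]
variance = sum(squared) / len(data)
print(variance)          # 4.0 for this dataset (in squared units)
```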

So now you have a pretty good number representing the spread of the data, but it is not very useful when looking at the data itself, because it is all squared (the variance is in squared units). If you want it to be useful when looking at the original data, you need to get it back to the same dimension, so you simply take the square root. That "cancels out" the squaring and gives you a number that is directly comparable to your dataset.
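And the follow-up step, again just a sketch with the same made-up numbers: taking the square root of the variance brings the spread back into the same units as the data.

```python
import math

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
average = sum(data) / len(data)
variance = sum((x - average) ** 2 for x in data) / len(data)

# The square root undoes the squaring, so the result is in the same units
# as the original data points and can be compared against them directly.
std_dev = math.sqrt(variance)
print(average, variance, std_dev)   # 5.0, 4.0 (squared units), 2.0 (original units)
```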

TLDR: Variance is very useful when deriving mathematical properties, simply because calculating with squares is easier in this context (for example, variances of independent variables simply add). Standard deviation is useful because it can be compared directly with the actual data (variance is in squared units, so it cannot be used intuitively for this).
