ELI5: Why is the 95% confidence interval a measure of data significance in scientific studies? (i.e. why not higher or lower?)

As someone who worked with both physical science and social science data throughout my Geography undergraduate degree, I might be able to lend a (relatively) uninformed hand.

I believe it helps to think about what statistical significance is actually showing you, i.e. how likely is it that the pattern/correlation/statistical result you see would have arisen purely by chance? That is to say, the higher the confidence level you test at, the more confident you can be that your result DIDN'T occur by pure chance. This is because these tests compare your actual results with what would be expected if your null hypothesis were true (i.e. what you would find if your hypothesis were not the case and there were no real effect).
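
If it helps, here is a rough sketch of that comparison in Python. The coin-flip setup (60 heads out of 100) and the function name two_sided_p_value are made up for illustration and aren't from any particular study. The null hypothesis is "the coin is fair", and the code adds up the probability of every outcome at least as lopsided as the one observed.

```python
from math import comb


def two_sided_p_value(heads: int, flips: int) -> float:
    """Chance, under a fair coin, of a head count at least as far
    from flips/2 as the one actually observed."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count every possible sequence whose head count is at least as extreme.
    extreme_sequences = 0
    for k in range(flips + 1):
        if abs(k - expected) >= observed_gap:
            extreme_sequences += comb(flips, k)
    # Each of the 2**flips sequences is equally likely under the null.
    return extreme_sequences / 2 ** flips


# Hypothetical observation: 60 heads in 100 flips.
p = two_sided_p_value(60, 100)
print(f"p-value: {p:.4f}")
print("passes the 95% bar" if p < 0.05 else "does not pass the 95% bar")
```

For this made-up example the p-value comes out a little above 0.05, so a result that looks fairly lopsided still falls just short of the 95% bar.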

I think the selection of the 95% point is partially arbitrary. Within statistical analysis there is a whole range of confidence levels that you can use, such as 90%, 97.5% or 99%. Using a range can actually be useful in showing how the significance of your data changes: "it was significant at the 90% confidence level, but not the 95%". Potentially you lack data, or your sample size isn't big enough etc., which can direct future improvements in your study. 95 serves as a relatively "easy" number: a whole number that is divisible by 5, relatively high, equivalent to 19 times out of 20, etc. Bear in mind that some disciplines do use different significance thresholds, such as particle physics and manufacturing: things requiring precision with direct real-world consequences.
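
To make the "significant at 90%, but not 95%" idea concrete, here is a small sketch using Python's standard library. The z-score of 1.8 is a hypothetical test result and the function names are my own; it assumes a two-sided test against a normal distribution.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def critical_z(level: float) -> float:
    """Two-sided cutoff a z-score must exceed at the given confidence level."""
    return STANDARD_NORMAL.inv_cdf(1 - (1 - level) / 2)


def two_sided_p_from_z(z: float) -> float:
    """Chance of a z-score at least this far from zero if there is no effect."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def significant_at(z: float, levels=(0.90, 0.95, 0.99)) -> dict:
    """Map each confidence level to whether the result clears it."""
    p = two_sided_p_from_z(z)
    return {level: p < 1 - level for level in levels}


# Hypothetical result: a z-score of 1.8.
for level, passed in significant_at(1.8).items():
    print(f"{level:.0%}: cutoff {critical_z(level):.2f}, significant: {passed}")
```

The same result clears the lower bar and fails the higher ones, which is the kind of thing that can point you towards collecting more data.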

I think another reason 95% is commonly chosen is that it acknowledges how most data is complicated by noise and a certain randomness, whilst still setting a high bar for statistical significance. Things tend to work in trends as opposed to direct relationships all the time. Take smoking: in theory you can do it for 40 years and live to 100, but you are more likely to die earlier of lung cancer or similar. If you studied this, you would thus never see a perfect correlation with perfect significance: there could be random factors or unknown genetic influences affecting your correlation, whilst issues with the design of your study itself, such as sampling error, might be biasing your results slightly. As such, 95% serves as a compromise: it's evidently high, but acknowledges the role of chance present in most places. A result that clears it is treated as significant enough, but not perfectly so.

Why not lower? Likely because your results then become less reliable, past a point that current researchers consider unacceptable. If a large number of studies can demonstrate 95% confidence, then yours should probably pass this bar too; otherwise your results fall into a zone of the normal distribution curve that is increasingly likely to occur by "natural chance".
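
A quick simulation can show what "less reliable" means here. This is only a sketch under simplifying assumptions: every simulated study has no real effect at all, and its test statistic is drawn straight from a standard normal distribution. The function name, the number of studies and the seed are arbitrary choices of mine.

```python
import random
from statistics import NormalDist


def false_positive_rate(level: float, studies: int = 20_000, seed: int = 0) -> float:
    """Fraction of no-effect studies that still look 'significant'
    when tested two-sided at the given confidence level."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    cutoff = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Under the null, each study's z-score is pure noise around zero.
    hits = sum(1 for _ in range(studies) if abs(rng.gauss(0, 1)) > cutoff)
    return hits / studies


for level in (0.80, 0.90, 0.95, 0.99):
    rate = false_positive_rate(level)
    print(f"{level:.0%} level: {rate:.3f} of no-effect studies pass")
```

The fraction printed for each level should land close to 1 minus that level, so at 80% roughly one in five no-effect studies would still look significant, versus roughly one in twenty at 95%.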

/r/explainlikeimfive Thread