Last year, CERN released 300 terabytes of Large Hadron Collider data. Why does particle physics use so much data?

Other answers have already covered the mechanics of how the LHC produces so much data, but the real reason it needs to record such an ungodly number of data points is that it's looking for a needle in a haystack.

The LHC is a proton-proton collider, and proton-proton collisions are messy. Protons are particles with internal structure (they contain quarks and gluons), and as such each proton-proton collision is a little bit different. Each collision produces a shower of high-energy particles, which rapidly decay into other particles, which decay further, and so on. Finally, the more stable particles make it into the detector, where they are recorded.

Using these recorded particles, scientists (with the help of computers) try to work back up the decay chain to figure out what particles were created in the collisions. There are two problems here. First, as noted above, each proton-proton collision is different. This means they need to work out not just which particles were created in each collision but also how the collision itself happened, which requires much higher precision than simply reconstructing the particles that were created.
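To make "working back up the decay chain" a bit more concrete, here is a toy sketch of the basic trick (this is not actual LHC reconstruction software, and all the numbers are invented for illustration): you add up the energy and momentum of two detected decay products and compute the invariant mass of the pair. If the two really came from the same parent particle, that mass clusters around the parent's mass; random pairings just form a smooth background.

```python
import math

def invariant_mass(p1, p2):
    """Invariant mass of a two-particle system.

    Each particle is given as a four-momentum (E, px, py, pz).
    Pairs that come from the same decay pile up at the parent's mass;
    unrelated pairs spread out into a smooth background.
    """
    E = p1[0] + p2[0]
    px = p1[1] + p2[1]
    py = p1[2] + p2[2]
    pz = p1[3] + p2[3]
    m_squared = E**2 - (px**2 + py**2 + pz**2)
    return math.sqrt(max(m_squared, 0.0))

# Made-up example: two photons (units of GeV), back-to-back for simplicity.
# Their combined invariant mass comes out to 125 GeV, which is roughly what
# a Higgs-to-two-photon candidate would look like.
photon1 = (62.5,  62.5, 0.0, 0.0)
photon2 = (62.5, -62.5, 0.0, 0.0)
print(invariant_mass(photon1, photon2))  # 125.0
```

Real reconstruction has to do this kind of thing for many particles at once, in events where the detector sees hundreds of tracks, which is part of why so much computing and so much data are needed.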

The second problem is that most of the particles produced are already well known, and because of its messy nature the LHC creates a LOT of them. The particles we're actually interested in are much, much rarer and sit on top of this large background. To spot them, scientists need very high-precision data, which means milking the machine for every last data point they can get. It also means that, to collect that data, the LHC has been designed with a very high collision rate, and a higher collision rate in turn means even more data.
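To put rough numbers on that (these are order-of-magnitude figures for illustration, not official specs for any particular experiment or run): bunches of protons cross inside the detectors about 40 million times per second, a recorded event is on the order of a megabyte, and the experiments can only afford to write a tiny, triggered fraction of collisions to disk.

```python
# Back-of-envelope sketch with assumed, order-of-magnitude numbers.
bunch_crossing_rate = 40e6   # ~40 million bunch crossings per second
raw_event_size_mb   = 1.0    # ~1 MB of detector readout per event (assumed)
trigger_output_rate = 1000   # events per second actually kept (assumed)
seconds_per_year    = 1e7    # rough amount of LHC running time in a year

# What it would take to keep everything (impossible in practice):
untriggered_tb_per_s = bunch_crossing_rate * raw_event_size_mb / 1e6

# What actually gets stored after the trigger discards ordinary collisions:
stored_pb_per_year = trigger_output_rate * raw_event_size_mb * seconds_per_year / 1e9

print(f"Without a trigger: ~{untriggered_tb_per_s:.0f} TB every second")
print(f"After the trigger: ~{stored_pb_per_year:.0f} PB per year")
```

Even after throwing away the overwhelming majority of collisions, you're still left with petabytes per year, which is how a public release measured in hundreds of terabytes ends up being only a slice of what the machine records.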
