Why is "Countdown" compression unfeasible?

This comment was posted to reddit on Feb 06, 2015 at 10:23 pm and was deleted within 1 hour(s) and 27 minutes.

Why is "Countdown" compression unfeasible?

What you find is that if you choose a bunch of blocks of random data and try to compress them, most of their "compressed" forms are larger than the data you put into it. Obviously not all numbers take the form 2ⁿ + 1, but let's say that all numbers can take the form of floor(a^b + 1), where a and b don't have to be integers like in your example. For most numbers, the amount of data needed to represent a and b is larger than the amount of data needed to represent the original number.

It's impossible to have a lossless compression algorithm that take any arbitrary input and always give an output that takes less storage space. The reason is that every compressed block of data when put into the decompression algorithm will yield exactly one uncompressed block of data. Since the uncompressed data is larger than the compressed block of data, there are more possible uncompressed values than there are compressed values. So either there are some uncompressed values that don't have compressed representations, or some outputs of your "compression" algorithm require more storage space than their "uncompressed" value.

Real world compression algorithms do what they do by taking advantage of the fact that they know something about their inputs, but have huge swaths of data for which they will perform terribly and be worse than the uncompressed form. That's why if you use jpeg to compress an image with a lot of sharp edges you get artifacts - there is no jpeg-compressed block of data that you can give to the jpeg decompression algorithm that will give you your original image with lots of sharp edges. It does an admirable job of giving you a compressed block of data that will get close though (with the differences being the artifacts). Huffman coding will do a great job of compressing things like ASCII text, but if you put in random data the dictionary will usually be bigger than your input. Video compression works great on normal videos, which don't usually change a whole lot between frames, but if you feed in static and demand losslessness your video file is going to be huge.

If you want more information /r/askprogramming might be a better place to post

/r/askscience Thread

Why is "Countdown" compression unfeasible?

Recently removed from /r/askscience

More Random Comments