Came across this "fact" while browsing the net. I call bullshit. Can science confirm?

Calculating the probability this way gets you close for relatively large problems like this one, but it isn't correct. Your method calculates the probability of picking a pair of birthdates 253 separate times and getting two different dates every time, as if each pair were independent of all the others. That is equivalent to finding the probability of rolling a 365-sided die 253 times and never rolling a 1. Unfortunately, the number you get isn't the one the problem asks for.
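To see how close (and how wrong) that shortcut is, here is a rough Python sketch. It assumes the 253 comes from the number of pairs among 23 people, which is the usual setup for this "fact" but isn't spelled out in the post; the variable names are mine.

```python
from math import comb

PEOPLE = 23   # assumption: 253 is the number of pairs among 23 people
DAYS = 365

pairs = comb(PEOPLE, 2)  # 23 * 22 / 2 = 253 pairs

# The shortcut: treat every pair as an independent 364/365 chance of not matching.
# Same as rolling a 365-sided die 253 times and never rolling a 1.
shortcut_no_match = (364 / 365) ** pairs

# The exact value: each new person must avoid every birthday already taken.
exact_no_match = 1.0
for already_taken in range(PEOPLE):
    exact_no_match *= (DAYS - already_taken) / DAYS

print(f"pairs:    {pairs}")
print(f"shortcut: {shortcut_no_match:.4f}")
print(f"exact:    {exact_no_match:.4f}")
```

The two values land within about a percentage point of each other, which is why the shortcut looks convincing at this scale.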

It's easier to explain with a smaller scale problem. Rather than people with birthdates, let's say you have four four-sided dice and you want to calculate the probability of rolling at least two the same. It's the same problem as the birthday problem, but with much smaller numbers. Right off the bat, we know there are exactly 4^4, or 256, ways the dice can land. Any probability concerning this problem must be expressible as some number of those 256 ways the dice can land, or more simply put, as a fraction with a denominator of 256. In this problem, there are exactly 24 ways the dice can land without any repeated numbers (every way you can arrange the numbers 1 through 4), which means a probability of 24/256 that no two dice match (and a 232/256 probability that at least two do). Note that the numerators directly correspond to the number of ways the dice can fall.
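If you don't want to take the counting on faith, the dice case is small enough to brute-force. A quick Python sketch (names are just for illustration) that lists every roll and counts the ones with no repeats:

```python
from itertools import product

SIDES = 4  # faces per die
DICE = 4   # number of dice rolled

# Every possible outcome of rolling DICE dice with SIDES faces each.
outcomes = list(product(range(1, SIDES + 1), repeat=DICE))

# Outcomes where all dice show different numbers (no pair matches).
all_distinct = [roll for roll in outcomes if len(set(roll)) == DICE]

total = len(outcomes)
no_match = len(all_distinct)
at_least_one_match = total - no_match

print(total, no_match, at_least_one_match)  # 256 24 232
```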

If we try to solve the problem via the method you used in your post, you would have 6 different ways to pair the dice (3+2+1), and each pair would have a 3/4 chance of not matching. Multiplying it all together would give you (3/4)^6, or 3^6 / 4^6. This fraction (729/4096) cannot be expressed in 256ths, which means that whatever it is counting, it is not ways the dice can fall. In addition, the probability given by this method is nearly double the actual probability of rolling four different numbers (17.8% vs 9.375%). Since the figure we want in the end is 100% minus this answer (82.2% vs 90.625%), the difference doesn't feel as big, but it helps make it clear that the solution isn't accurate.
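The same comparison with exact fractions, as a small Python sketch (variable names are mine), shows both the size of the gap and the fact that the shortcut's answer doesn't fit into 256ths:

```python
from fractions import Fraction
from math import comb

pairs = comb(4, 2)               # 6 ways to pair up four dice
naive = Fraction(3, 4) ** pairs  # the shortcut: (3/4)^6
exact = Fraction(24, 256)        # counted directly from the 256 outcomes

# If naive were a count of dice outcomes, naive * 256 would be a whole number.
naive_in_256ths = naive * 256

print(naive, float(naive))            # 729/4096 0.177978515625
print(exact, float(exact))            # 3/32 0.09375
print(naive_in_256ths)                # 729/16
print(naive_in_256ths.denominator == 1)  # False
```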

The reason this method doesn't work is that the pairs don't all have a 3/4 chance of not matching. If you choose to find the probability by checking every single pair, you have to group the pairs into sets rather than lumping them all together. For the dice problem, you could check the first die against the other three dice. Each of the other three would have a 3/4 chance of not matching. However, once you compare the second die against the third and fourth dice, the probability of not matching is no longer 3/4. If you've made it past the first round of pairs, then none of the dice match the first die's number. This means that the remaining dice can only be one of three numbers, so the probability of not matching is 2/3. The following round leaves only two numbers and a probability of 1/2 for a non-match. In the end, you end up with a fraction for each pair, but they aren't all the same:

(3/4)^3 * (2/3)^2 * (1/2)^1

The first set (probability 3/4) includes all pairs with the first die, the second set (probability 2/3) includes all remaining pairs with the second die, and the third set (probability 1/2) contains the last pair. This fraction reduces fully to 3/32, which can also be expressed exactly as 24/256.
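Here is a rough Python sketch of that grouped-pairs calculation, generalized so it can be run on the birthday numbers too. The function name and the choice of 23 people are my own assumptions for illustration, not something taken from the posts above.

```python
from fractions import Fraction

def no_match_probability(items, values):
    """Chance that `items` dice (or people) all land on different `values`."""
    if items > values:
        return Fraction(0)  # more items than values forces a match
    prob = Fraction(1)
    for k in range(1, items):
        # Round k: compare item k against each of the (items - k) later items.
        # Surviving earlier rounds has already ruled out k - 1 values.
        per_pair = Fraction(values - k, values - k + 1)
        prob *= per_pair ** (items - k)
    return prob

dice = no_match_probability(4, 4)
print(dice, dice == Fraction(24, 256))  # 3/32 True

# Assumption: 23 people, 365 equally likely birthdays.
birthdays = no_match_probability(23, 365)
print(f"{float(1 - birthdays):.4f}")  # chance at least two people share a birthday
```

For each later die, the per-round fractions telescope (3/4 * 2/3 * 1/2 = 1/4, and so on), which is why the grouped product agrees with simply counting the arrangements.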

There is a mostly correct solution using this method above by /u/lookmeat along with a good explanation. There is a slight error due to the number of factors involved in using this method. When typing the problem into the calculator, the exponents got mixed up and an extra term was included. I provided another way to solve it with cleaner/less work below the post.
