Theorems that seem really powerful/fundamental but never do anything "in real life"

IDK about Deep Sets specifically, but generally,

ML frequently works by modeling high-dimensional probability distributions, so each individual event usually occurs with very small probability. Since many practical computations amount to multiplying probabilities together, most algorithms work with logprobs (the logarithms of these probability masses) and sum them, rather than multiplying the actual values, to avoid precision underflow (i.e. where the actual values are so small they can't be represented accurately in the machine's floating-point representation).
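A minimal Python sketch of why (the probabilities here are made-up numbers, just to show the underflow):

    import math

    probs = [1e-30] * 20          # hypothetical per-event probabilities

    product = 1.0
    for p in probs:
        product *= p              # 1e-600 is below the double-precision range
    print(product)                # 0.0 -- underflowed

    log_product = sum(math.log(p) for p in probs)
    print(log_product)            # about -1381.55, perfectly representable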

There are occasions where you want to sum the underlying probabilities, e.g. when computing a marginal event. In those cases you temporarily convert the logprobs back to regular probabilities for summing -- but only after shifting them by the largest logprob, so that precision is preserved in the most significant quantities: this is known as the logsumexp trick.
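A sketch of the trick, again with hypothetical logprob values: computing log(sum(exp(lp))) naively would underflow, since exp(-1000) is 0.0 in double precision, but subtracting the maximum first keeps the largest term at exp(0) = 1.

    import math

    def logsumexp(logprobs):
        m = max(logprobs)
        return m + math.log(sum(math.exp(lp - m) for lp in logprobs))

    logprobs = [-1000.0, -1000.5, -1001.0]   # hypothetical joint-event logprobs
    print(logsumexp(logprobs))               # about -999.32, the marginal logprob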
