How does sorting by "Relevance" work? How does a computer determine what's relevant and what isn't?

This comment was posted to reddit on Jun 09, 2016 at 10:20 am and was deleted within 14 minutes.

Right now reddit is beta testing a new relevance algorithm. If you enable beta features in the options, you can see it as "relevance2" in the checkbox highlighted in OP's screenshot.

They're asking for user feedback because, as you said, relevance is a fuzzy thing. Not even humans agree on whether result A is more relevant than result B, and turning that into a mathematical definition a computer can use is even trickier.

So, yeah: you compute statistics such as how many times the search term appears, which part of the page it appears in (title, body, etc.), how many clicks the result gets, and so on.
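A minimal sketch of the "count the term, weighted by where it appears" idea. The field names and weights (title counts 3x, body 1x) are made-up assumptions for illustration, not what reddit actually uses, and click counts are left out here:

```python
import re
from collections import Counter

# Hypothetical field weights: a match in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def text_score(query, page):
    """For each field, count occurrences of the query terms and multiply by the field's weight."""
    terms = tokenize(query)
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(page.get(field, "")))
        score += weight * sum(counts[t] for t in terms)
    return score

pages = [
    {"title": "Relevance sorting explained", "body": "How search engines rank results."},
    {"title": "Cat pictures", "body": "Relevance? Not here. Just cats."},
]
ranked = sorted(pages, key=lambda p: text_score("relevance sorting", p), reverse=True)
print([p["title"] for p in ranked])  # ['Relevance sorting explained', 'Cat pictures']
```

Both pages mention "relevance", but the first one matches both terms in its title, so it scores 6 versus 1. Real engines use more refined formulas (TF-IDF, BM25 and friends), but the intuition is the same.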

> most of these techniques delve a lot on the field of Machine Learning

That's correct, but in this case it's a bit trickier than usual. Machine learning works well when the agent can measure the success rate of its actions. For example, if a high percentage of chess games in which the knight is moved first end in a win, then moving the knight first is probably a good opening (it's just a silly example; I have no idea whether that's actually a good move IRL).

But with searches, if a user clicks on a result and then closes the browser tab, how can the sorting algorithm know whether that was what the user was looking for? Basically through the number of clicks, but that's a very unreliable metric: users often open search results just to see what they are. We could speculate that a user stops searching after finding the right result (so the last link clicked would get the credit), but this is also terribly unreliable, since it might just mean the user gave up.
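Here's a rough sketch contrasting the two click heuristics above on the same toy data. The session log is invented for illustration; the point is that "every click is a vote" and "only the last click counts" can rank the same results differently, and neither knows whether the user was actually satisfied:

```python
from collections import Counter

# Hypothetical search sessions: the query plus the results clicked, in order.
sessions = [
    ("python sort", ["a.com", "b.com", "c.com"]),
    ("python sort", ["a.com", "c.com"]),
    ("python sort", ["a.com"]),
    ("python sort", ["b.com", "c.com"]),
]

def raw_clicks(sessions):
    """Every click counts as a vote, including 'just peeking' clicks."""
    votes = Counter()
    for query, clicks in sessions:
        for url in clicks:
            votes[(query, url)] += 1
    return votes

def last_click(sessions):
    """Only the last click in a session counts, assuming the user stopped
    because they found what they wanted (they may simply have given up)."""
    votes = Counter()
    for query, clicks in sessions:
        if clicks:
            votes[(query, clicks[-1])] += 1
    return votes

print(raw_clicks(sessions))
print(last_click(sessions))
```

With raw clicks, a.com and c.com tie at 3 votes each; with the last-click rule, c.com wins 3 to 1 and b.com gets nothing.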

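Also note that scores are not tied only to the page but also to the search keywords: a page can be a great result for one query and a poor one for another. Eventually, keeping these statistics for every (query, page) pair might require a lot of storage space, which may not be justified by the relatively small improvements.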
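A tiny sketch of why the storage grows: if click statistics are keyed by (query, page) rather than by page alone, one page gets a separate entry for every distinct query that led to it. The click log below is hypothetical:

```python
from collections import defaultdict

# Hypothetical click log of (query, clicked page) pairs.
click_log = [
    ("python sort", "docs.python.org/sorting"),
    ("sort list python", "docs.python.org/sorting"),
    ("python sort", "stackoverflow.com/q/1"),
    ("cat pictures", "example.com/cats"),
]

# Scores are keyed by the (query, page) pair, not by the page alone.
scores = defaultdict(int)
for query, page in click_log:
    scores[(query, page)] += 1

distinct_pages = {page for _, page in click_log}
print(len(distinct_pages), "pages,", len(scores), "stored (query, page) scores")
```

This prints "3 pages, 4 stored (query, page) scores". The same sorting page needs two entries because two different phrasings led to it; in the worst case the table grows with (distinct queries) × (pages).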

From that we can speculate that there will never be a universally "best" relevance algorithm.

Recently removed from /r/askscience