New player reform system heads into testing

An 80% f1-score would be pretty bad; that's not even very good for the volume of data Riot probably has.

But a 95% f1-score is a lot less "bad" than what you're thinking, since those numbers are only talking about players that Riot/previous Tribunal cases believe should be permabanned, not the entire playerbase:

  • For every 20 "should-be-permabanned" players, 19 would be banned
  • For every 20 "bans" the system hands out, 1 would be erroneous. That doesn't necessarily mean that the banned player is "innocent". They just may not have been "bad" enough. Of course, they also could have been completely innocent. (There was one somewhat famous incident where a player who constantly verbally harassed themselves was permabanned through Tribunal.)
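To make the "for every 20" reading concrete, here is a minimal Python sketch of how precision, recall, and f1 fall out of raw counts. The counts (19 deserved bans, 1 erroneous ban, 1 missed player) are just the 20-player illustration above, and the function name is made up for this example.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = true_pos / (true_pos + false_pos)  # share of bans that were deserved
    recall = true_pos / (true_pos + false_neg)     # share of bannable players caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# 20 bans handed out: 19 deserved, 1 erroneous.
# 20 bannable players: 19 caught, 1 missed.
precision, recall, f1 = precision_recall_f1(true_pos=19, false_pos=1, false_neg=1)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.95 recall=0.95 f1=0.95
```

When precision and recall are equal, the f1-score is that same value, which is why a "balanced" 95% f1 can be read as 95% on both.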

Let's suppose 1% of Riot's 27 million daily players are toxic enough to be permabanned (seems high to me, but whatever), and Riot took a balanced f1-score of 95% (rather than adjusting the system to be more precise with lower recall):

(pardon any minor math errors, but this should be pretty correct)

    • The system would submit a total of 269,330 bans
    • Of those, 256,500 would be correctly targeted towards toxicity (95% recall)
    • 12,830 would be false-positives (95% precision)
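For anyone who wants to re-run the numbers, here is a rough Python sketch of the same estimate. The function and key names are invented for illustration; the inputs are the 27 million players, 1% toxic share, and 95% precision/recall assumed above. Using the strict definition of precision (correct bans divided by all bans), the false-positive count comes out a little higher than the 12,830 quoted above, so treat those figures as ballpark numbers.

```python
def expected_bans(players, toxic_share, precision, recall):
    """Estimate ban counts for a classifier at a given precision and recall."""
    toxic = players * toxic_share   # players who "should" be permabanned
    correct = toxic * recall        # true positives: toxic players who get banned
    missed = toxic - correct        # false negatives: toxic players left unbanned
    # precision = correct / (correct + wrong)  =>  wrong = correct * (1 - p) / p
    wrong = correct * (1 - precision) / precision  # false positives
    return {
        "total_bans": round(correct + wrong),
        "correct_bans": round(correct),
        "wrong_bans": round(wrong),
        "missed_toxic": round(missed),
    }

estimate = expected_bans(players=27_000_000, toxic_share=0.01,
                         precision=0.95, recall=0.95)
print(estimate)
# prints: {'total_bans': 270000, 'correct_bans': 256500, 'wrong_bans': 13500, 'missed_toxic': 13500}
```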

If the false-positives were ENTIRELY random and 90% of the player base were entirely "innocent", while 9% were "somewhat toxic":

  • 11,547 innocent players (0.048% of the innocent player base) are banned
  • 1,283 "somewhat toxic" players (0.048% of the somewhat toxic) are banned
  • 13,500 toxic players remain unbanned (5% of the 270,000 toxic players)
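The "entirely random" case can be sketched in Python as below: every non-toxic player gets the same chance of a wrong ban, so each group's share is proportional to its size. The function name and group labels are made up for this example. A strictly proportional split lands slightly off the rounder 90/10 split used above, but the per-player rate is the same 0.048%.

```python
def spread_uniformly(false_positives, group_sizes):
    """Split false positives across groups in proportion to group size."""
    total = sum(group_sizes.values())
    rate = false_positives / total  # same per-player chance in every group
    per_group = {name: round(size * rate) for name, size in group_sizes.items()}
    return rate, per_group

players = 27_000_000
groups = {
    "innocent": players * 90 // 100,       # 24,300,000 players
    "somewhat_toxic": players * 9 // 100,  # 2,430,000 players
}
rate, wrongly_banned = spread_uniformly(12_830, groups)
print(f"{rate:.3%}", wrongly_banned)
# prints: 0.048% {'innocent': 11664, 'somewhat_toxic': 1166}
```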

However, these systems are much more likely to misclassify a "neutral" (somewhat toxic) player as a "negative" (toxic) than to misclassify a "positive" (innocent) player as a "negative". My personal experience says that, regardless of population sizes, neutrals misclassified as negatives occur in a 2:1 ratio compared to positives misclassified as negatives.

This means that a better estimation would be:

  • 4,277 innocent players would be banned (0.018% of the innocent player base)
  • 8,553 "somewhat toxic" players would be banned (0.35% of the "somewhat toxic" base)
  • 13,500 toxic players would remain unbanned
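The 2:1 split is easy to check with a few lines of Python. This is only a sketch: the 2:1 ratio is the rule of thumb from personal experience mentioned above, not a measured property of Riot's system, and the function name is invented for the example.

```python
def split_by_ratio(false_positives, somewhat_to_innocent=2.0):
    """Split false positives between the two non-toxic groups by a fixed ratio."""
    somewhat = false_positives * somewhat_to_innocent / (somewhat_to_innocent + 1)
    innocent = false_positives - somewhat
    return round(innocent), round(somewhat)

innocent_banned, somewhat_banned = split_by_ratio(12_830)

# Group sizes assumed above: 90% and 9% of 27 million players.
innocent_rate = innocent_banned / 24_300_000
somewhat_rate = somewhat_banned / 2_430_000

print(innocent_banned, somewhat_banned)
# prints: 4277 8553
print(f"{innocent_rate:.3%} {somewhat_rate:.2%}")
# prints: 0.018% 0.35%
```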

The 4,277 innocent players still had to be reported: perhaps they had a shitty game, the reporter clicked them by accident, or friends reported them as a prank.

If the system requires at least one "ranked restriction" or "chatlog restriction" in order to hand out a ban, then even false-positive "innocent" players would probably not be auto-banned by the system (unless the innocent player in question is either REALLY unlucky or is actually a reformed-toxic player).

In any case, those 4,277 players "can" be justified as a necessary evil for banning thousands and thousands of toxic players, if, for example, more than 4,277 players would otherwise quit LoL due to toxicity (not saying I agree with that statement, but it could be said).

Not to mention:

  • if the system was tuned to be more precise but with lower recall, it'd be even fewer than 4,277 innocents (if they used f0.5, then it might be only 2,150 innocent, but 27,000 unbanned toxics)
  • if the % of "permabannable" players is actually 0.1% of the total population, not 1%, there would be only about 430 innocent players, few enough that Player Support could manually unban them.
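For reference, here is a small Python sketch of the general F-beta score, which is what "f0.5" refers to: beta below 1 weights precision more heavily than recall. The "precise" operating point (97.5% precision, 90% recall) is a hypothetical picked only to illustrate the trade-off; nothing here is known about how Riot actually tunes the system.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta < 1 favors precision, beta > 1 favors recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

operating_points = {
    "balanced": (0.95, 0.95),   # the balanced case discussed above
    "precise": (0.975, 0.90),   # hypothetical: fewer wrong bans, more missed toxics
}

for name, (p, r) in operating_points.items():
    print(f"{name}: f1={f_beta(p, r, 1.0):.3f} f0.5={f_beta(p, r, 0.5):.3f}")
```

Under f1 the balanced point scores higher, while under f0.5 the more precise point wins, which is the sense in which optimizing for f0.5 trades unbanned toxic players for fewer wrongly banned innocents.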

Hopefully that gives you a clearer insight into what these numbers can mean. (And please note that 95% precision/recall is totally doable if the minority class is well-defined with consistent behavior. I would say this is the case for LoL: there are several "types" of clearly toxic behavior, whether it is verbal abuse at strangers, intentional feeding with Zeals + Revive/TP, etc.)
