QUOTE (Wasp @ Mar 20 2018, 08:55 AM) It doesn't work[/quote]
How did you determine that it "doesn't work"? What's your metric?
It seems like your definition of "work" is "people generally agree that it's accurate", which is subjective and unfalsifiable.
Now, Baker at least knew that the effectiveness of a ranking system is something you can actually quantify: does the model accurately predict who is going to win? At the end of the day, that's all that matters. Someone might have an incredible "skill" that no one else can duplicate -- say I'm the world champion at MrK's probe-killing contest and can deprobe twice as fast as anyone else. If that skill doesn't contribute to the outcome of the game, what good is it?
And according to Baker, Allegskill had ridiculous predictive power, something north of 90% accuracy. I never saw the data myself, and I strongly suspect he validated on the same data set he trained on, which is a big statistical no-no, or otherwise overfit the data. Either that, or 90% of games really are so stacked that the weaker team can't even win 10% through sheer dumb luck. Honestly, I would be surprised if any system, no matter how sophisticated, could predict the outcome 60% of the time; there's just too much random variation and nonlinear behavior that can't be modeled except to first order. But even 60% would still be a meaningful demonstration of accuracy.
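To make the validation point concrete, here's roughly what an honest test looks like. This is only a sketch: I'm assuming a hypothetical game log of (winning team, losing team) pairs of player names in chronological order, and I'm using a dirt-simple Elo-style updater as a stand-in for whatever Allegskill actually does. The part that matters is the split: accuracy gets measured only on games the model never saw while fitting.

CODE
from collections import defaultdict

# Hypothetical game log format (not the real dataset): each game is
# (winning_team, losing_team), each team a list of player names,
# and the games are in chronological order.
def team_rating(ratings, team):
    return sum(ratings[p] for p in team) / len(team)

def holdout_accuracy(games, k=32, train_frac=0.8):
    split = int(len(games) * train_frac)
    train, test = games[:split], games[split:]   # chronological split, no peeking

    ratings = defaultdict(lambda: 1500.0)        # Elo-style stand-in, NOT Allegskill

    # Fit ratings on the training games only.
    for winners, losers in train:
        rw, rl = team_rating(ratings, winners), team_rating(ratings, losers)
        expected = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))  # model's P(winners win)
        for p in winners:
            ratings[p] += k * (1.0 - expected)
        for p in losers:
            ratings[p] -= k * (1.0 - expected)

    # Score on the held-out games: how often did the higher-rated team actually win?
    correct = sum(team_rating(ratings, w) > team_rating(ratings, l) for w, l in test)
    return correct / len(test)
[/CODE]

Score the same model on the games it was fit on and you'll get the inflated 90%-style number; the held-out tail is the honest one.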
QUOTE because it violates the underlying principle that the outcome of the game must be the sole responsibility of the players being ranked. The god like hand of the commander and the inconsistency of the rock placement and varying tech paths that greatly limit the comparable samples, completely disassociates almost every player of each game from outcome.[/quote]
This chestnut never dies, does it?
Please show me anywhere in the TrueSkill paper where the statement "the outcome of the game is assumed to be the sole responsibility of the players being ranked" appears. Or any semantically equivalent statement, such as "TrueSkill cannot work if there is any element of chance present in the game".
The only assumption the paper makes is that a player's per-game performance is Gaussian-distributed around their underlying skill. What's the source of the random variation? Doesn't matter. Maybe the opponent picked an opening line you weren't familiar with. Maybe you spawned far away from that gun you like. Maybe you got screwed by the rocks, or by the commander. Maybe you just didn't eat your Wheaties that morning. TrueSkill is honey badger, it doesn't give a $#@!.
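That one assumption is small enough to write down. A toy sketch with made-up numbers -- beta here is just the spread of the per-game noise, nothing else: performance is skill plus Gaussian noise, and the predicted win probability depends only on the skill gap relative to that noise, never on what caused it.

CODE
import math

def win_probability(skill_a, skill_b, beta=4.0):
    """P(A outperforms B) when performance = skill + Gaussian noise of spread beta.

    beta lumps together every source of per-game variation -- rocks, commander,
    spawn luck, skipped Wheaties. The model never asks where the noise came from.
    """
    # The performance difference A - B is Gaussian with mean (skill_a - skill_b)
    # and variance 2 * beta**2, so P(diff > 0) = Phi((skill_a - skill_b) / (beta * sqrt(2))).
    return 0.5 * (1.0 + math.erf((skill_a - skill_b) / (2.0 * beta)))

# Made-up example: a modest skill edge wins often, not always.
print(win_probability(28.0, 25.0))  # ~0.70 with beta=4
[/CODE]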
The closest you come to having a point is this: people get to choose their own teams, so IF a person is better than the ranking system at predicting the outcome of a game, then the ability and willingness to stack becomes a "skill" that TrueSkill is unable to distinguish from the other skills that contribute toward winning a game.
QUOTE The fact that we are able to balance games on names alone and from that we can see that the numbers are off, we know that the current ranking system is neither useful or used and we are quite capable of balancing teams based upon our knowledge of what each of us will do in game. That is why I think it is the proper path to take to categorize players by what they do and rank on those categories. No nonsensical math is needed.[/quote]
Again, this is not a "fact", because there is no data behind it.
Honestly, the best thing we can do is just open source the data and let people come up with their own ranking systems. People can be as sophisticated as they want, or they can just always predict whatever side the boxset joins. Let's put them head to head and measure whose predictions are the most accurate.
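The harness for that comparison is almost trivial. Another sketch, assuming a hypothetical game record format (not any real dataset): the only contract is "given the rosters before the game starts, pick a winner", and the follow-the-boxset baseline is one line.

CODE
# Hypothetical game record (not the real dataset):
#   {"team_a": [...names...], "team_b": [...names...],
#    "winner": "a" or "b", "boxset_side": "a" or "b"}

def score(predictor, games):
    """Fraction of games where the predictor picked the actual winner."""
    return sum(predictor(g) == g["winner"] for g in games) / len(games)

def follow_the_boxset(game):
    # The lazy baseline: always pick whichever side the boxset joined.
    return game["boxset_side"]

def make_rating_predictor(ratings):
    # Wrap any per-player rating table into a predictor.
    def predict(game):
        avg = lambda team: sum(ratings.get(p, 1500.0) for p in team) / len(team)
        return "a" if avg(game["team_a"]) >= avg(game["team_b"]) else "b"
    return predict

# Head to head on the same held-out games:
#   score(follow_the_boxset, test_games)
#   score(make_rating_predictor(my_ratings), test_games)
[/CODE]

Any system that can't beat the follow-the-boxset baseline on held-out games isn't telling us anything.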