AllegSkill

Xynth · Post by **Xynth** » Tue Mar 20, 2018 7:44 pm

BT wanted this ranking algorithm because he hoped for an influx of new players to Allegiance who would enjoy a more modern, progress-based gaming experience (see Fortnite, for example). We obviously did not achieve that influx, and the old vets still rule the day. The ability to measure an Allegiance e-peen is important to a large portion of the player base. I don't have the bandwidth to implement a new AllegSkill, but if anyone has questions about how to wire it into the current system, let me know. It should be fairly straightforward to plug it in and leverage Steam stats to do it.

cashto · Post by **cashto** » Tue Mar 20, 2018 8:38 pm

Wasp wrote (Mar 20 2018, 08:55 AM): "It doesn't work"

How did you determine that it "doesn't work"? What's your metric?

It seems like your definition of "work" is "people generally agree that it's accurate", which is subjective and unfalsifiable.

Now, Baker at least knew that the effectiveness of a ranking system was something you could actually quantify: does the model make accurate predictions of who is going to win? At the end of the day, that's all that matters. Someone might have incredible "skills" that no one else can duplicate. For example, let's say I was the world champion at MrK's probe-killing contest and could deprobe twice as fast as anyone else. If that doesn't contribute toward the outcome of the game, what good is that skill?

And according to Baker, AllegSkill had ridiculous predictive power, something north of 90% accuracy. I never saw the data myself, and I strongly suspect that he used the same data set for validation that he used for training, which is a big statistical no-no. Either that, or 90% of games really are so stacked that the weaker team can't win even 10% of the time through sheer dumb luck, or he overfit the data. Honestly, I would be surprised if any system, no matter how sophisticated, could predict the outcome 60% of the time; there's just so much random variation and nonlinear behavior that can't be modeled except to first order. But even 60% would still be a meaningful demonstration of accuracy.
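To make the training-versus-validation point concrete, here is a minimal Python sketch on purely synthetic data. Every name in it (`fit_memorizer`, `coin_flip_games`, and so on) is hypothetical and has nothing to do with Baker's actual code. A "model" that simply memorizes every game it was trained on scores near 100% on those same games, even when the outcomes are pure coin flips, but only about 50% on later games it never saw. Only the held-out number says anything about predictive power.

```python
import random
from typing import Callable, Dict, List, Tuple

Team = Tuple[str, ...]
Game = Tuple[Team, Team, bool]        # (team_a, team_b, team_a_won)
Predictor = Callable[[Team, Team], bool]


def chronological_split(games: List[Game], test_fraction: float = 0.2) -> Tuple[List[Game], List[Game]]:
    """Hold out the most recent games; the model never sees them while fitting."""
    cut = int(len(games) * (1 - test_fraction))
    return games[:cut], games[cut:]


def accuracy(predict: Predictor, games: List[Game]) -> float:
    """Fraction of games whose recorded winner matches the prediction."""
    if not games:
        raise ValueError("no games to score")
    return sum(predict(a, b) == a_won for a, b, a_won in games) / len(games)


def fit_memorizer(train: List[Game]) -> Predictor:
    """A deliberately overfit 'model' that memorizes every matchup it was trained on."""
    seen: Dict[Tuple[Team, Team], bool] = {(a, b): a_won for a, b, a_won in train}
    # Unseen matchups fall back to a fixed guess: "team A wins".
    return lambda a, b: seen.get((a, b), True)


def coin_flip_games(n: int, seed: int = 0) -> List[Game]:
    """Hypothetical synthetic data: random 5v5 teams whose outcome is pure luck."""
    rng = random.Random(seed)
    pool = [f"pilot{i}" for i in range(30)]
    games = []
    for _ in range(n):
        players = rng.sample(pool, 10)
        games.append((tuple(sorted(players[:5])), tuple(sorted(players[5:])), rng.random() < 0.5))
    return games


train_games, held_out = chronological_split(coin_flip_games(500))
model = fit_memorizer(train_games)
print(f"accuracy on training games: {accuracy(model, train_games):.2f}")  # near-perfect, meaningless
print(f"accuracy on held-out games: {accuracy(model, held_out):.2f}")     # roughly a coin flip
```

Splitting chronologically (fit on older games, score on newer ones) is one reasonable choice for a ranking system, since it mirrors how the ratings would actually be used to predict upcoming games.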

"...because it violates the underlying principle that the outcome of the game must be the sole responsibility of the players being ranked. The godlike hand of the commander, the inconsistency of the rock placement, and the varying tech paths that greatly limit the comparable samples completely dissociate almost every player of each game from the outcome."

This chestnut never dies, does it?

Please show me anywhere in the TrueSkill paper where the statement "the outcome of the game is assumed to be the sole responsibility of the players being ranked" appears. Or any semantically equivalent statement, such as "TrueSkill cannot work if there is any element of chance present in the game".

The only assumption the paper makes is that a person's performance is randomly distributed around their skill in a Gaussian distribution. What's the source of the random variation? Doesn't matter. Maybe the opponent picked an opening line you weren't familiar with. Maybe you spawned far away from that gun you like. Maybe you got screwed by the rocks, or by the commander. Maybe you just didn't eat your Wheaties that morning. TrueSkill is honey badger, it doesn't give a $#@!.
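For reference, that Gaussian assumption leads to a simple win-probability formula: P(A wins) = Φ((Σμ_A − Σμ_B) / sqrt(n·β² + Σσ²)), where n is the total number of players. The Python sketch below assumes the usual TrueSkill setup (each player has a rating μ ± σ, per-game performance noise β with the common default of 25/6, team performance is the sum of player performances, draws ignored); the function names are illustrative and not taken from any particular library. One way to read the "honey badger" point: extra randomness, whether from rocks, commanders, or skipped Wheaties, behaves like a larger β. It pulls predictions toward 50% rather than breaking the model.

```python
import math
from typing import Sequence, Tuple

Rating = Tuple[float, float]  # (mu, sigma) for one player
BETA = 25 / 6                 # common TrueSkill default for per-game performance noise


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(team_a: Sequence[Rating], team_b: Sequence[Rating],
                    beta: float = BETA) -> float:
    """P(team A's total performance beats team B's), ignoring draws.

    Each player's performance ~ Normal(mu, sigma^2 + beta^2); a team's performance is
    the sum over its players, so the means add and the variances add.
    """
    delta_mu = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    n_players = len(team_a) + len(team_b)
    variance = (n_players * beta ** 2
                + sum(sigma ** 2 for _, sigma in team_a)
                + sum(sigma ** 2 for _, sigma in team_b))
    return normal_cdf(delta_mu / math.sqrt(variance))


strong = [(30.0, 1.0)] * 5
weak = [(25.0, 1.0)] * 5
print(win_probability(strong, weak))                 # strong team favored, but not certain
print(win_probability(strong, weak, beta=2 * BETA))  # noisier game: closer to a coin flip
```

With β doubled, the clearly stronger team is still favored, just less confidently, which is exactly what "random variation in performance" means in this model.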

The only place where you may have stumbled onto something even approximately approaching a point is that people get to choose their own teams. IF a person is better than the ranking system at predicting the outcome of the game, then the ability and willingness to stack becomes a "skill" that TrueSkill is unable to distinguish from the other skills that contribute towards winning a game.

"The fact that we are able to balance games on names alone, and can see from that that the numbers are off, shows that the current ranking system is neither useful nor used, and that we are quite capable of balancing teams based on our knowledge of what each of us will do in game. That is why I think the proper path is to categorize players by what they do and rank them within those categories. No nonsensical math is needed."

Again, this is not a "fact", because there is no data behind it.

Honestly, the best thing we can do is just open-source the data and let people come up with their own ranking systems. People can be as sophisticated as they want, or they can just always predict whichever side the BoxSet joins. Let's put them head to head and measure whose is best.
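A head-to-head contest like this could be scored along the following lines. This Python sketch is an assumption about how one might do it, not an existing tool: each predictor returns a probability that team A wins, and all predictors are scored on the same games by accuracy and by mean log loss (which also rewards being well calibrated, not just picking the winner). The record field `boxset_side`, the 0.75 confidence, and the toy game records are all made up for illustration.

```python
import math
from typing import Callable, Dict, List, Mapping, Tuple

GameRecord = Mapping[str, object]          # hypothetical shape of one published game record
Predictor = Callable[[GameRecord], float]  # returns P(team A wins)


def log_loss(p: float, a_won: bool, eps: float = 1e-15) -> float:
    """Penalty for one prediction; clamped so a confident miss is costly but finite."""
    p = min(max(p, eps), 1 - eps)
    return -math.log(p if a_won else 1 - p)


def leaderboard(predictors: Dict[str, Predictor],
                games: List[GameRecord]) -> List[Tuple[str, float, float]]:
    """Score every predictor on the same games, best (lowest mean log loss) first."""
    rows = []
    for name, predict in predictors.items():
        probs = [predict(g) for g in games]
        outcomes = [bool(g["a_won"]) for g in games]
        # A probability above 0.5 counts as picking team A.
        acc = sum((p > 0.5) == won for p, won in zip(probs, outcomes)) / len(games)
        loss = sum(log_loss(p, won) for p, won in zip(probs, outcomes)) / len(games)
        rows.append((name, acc, loss))
    return sorted(rows, key=lambda row: row[2])


predictors: Dict[str, Predictor] = {
    "coin flip": lambda g: 0.5,
    # Hypothetical field and confidence: back whichever side the BoxSet joined.
    "follow the BoxSet": lambda g: 0.75 if g["boxset_side"] == "A" else 0.25,
}

games = [  # toy records, not real data
    {"a_won": True, "boxset_side": "A"},
    {"a_won": False, "boxset_side": "B"},
    {"a_won": True, "boxset_side": "B"},
]
for name, acc, loss in leaderboard(predictors, games):
    print(f"{name:18} accuracy={acc:.2f} log_loss={loss:.3f}")
```

Any ranking system, simple or sophisticated, only has to be wrapped as a function from a game record to a probability to enter the same contest.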

Wasp · Post by **Wasp** » Tue Mar 20, 2018 8:50 pm

phoenix1 wrote (Mar 20 2018, 03:02 PM): "AllegianceSkill tracked command rank separately from normal rank."

No, it did not. You cannot distinguish the games that were won because the commander was solely responsible and remove all the others from rank adjustment. That system commingles results and completely ignores the underlying rule on which it bases its logic.

phoenix1 wrote (Mar 20 2018, 03:02 PM): "Also, as important as techrock placement is to a team's success, that variance affects both teams roughly equally."

No, it does not. Your premise assumes that the same two commanders will compete across enough rock-variation samples to weed out the influence those variations had on the outcome, while still ignoring the commingled results of the players on each team. There will never be enough "interesting" samples to factor out all of those variables and determine skill.

phoenix1 wrote (Mar 20 2018, 03:02 PM): "Finally, the ranking system was designed by statisticians who published their system in an MIT Press journal and worked together with Sgt Baker to tweak the system for Allegiance."

This is even more of the nonsense that Baker was trying to sell. You point at the source for AllegSkill (TrueSkill) and then ignore the fact that you are not adhering to the rules dictated by that source! Then you go on to proclaim how applicable TrueSkill is to a game like Allegiance, and base that proclamation on someone else's credentials? It's either applicable or not, regardless! And that depends solely on the underlying rule of the Bayesian principle: that the players being ranked MUST be the SOLE responsible party for the OUTCOME of the game.

phoenix1 wrote (Mar 20 2018, 03:02 PM): "The biggest flaw in AllegSkill was lack of sample size... which is definitely not Baker's fault. Probably the fault of commercial airline pilots making paragraph-long posts about how they know more about statistics and algorithms than two guys with PhDs in computer science and a guy with a PhD in statistics and whatever qualifications Baker had."

More nonsense. You endorse Baker's project and base that endorsement on the credentials of others who have no association with AllegSkill whatsoever. You post the credentials of others, point at statistical formulas, and then try to associate those things with what Baker did in an attempt to give it validity.

Post by **zombywoof** » Tue Mar 20, 2018 9:00 pm

cashto wrote (Mar 20 2018, 01:38 PM): "The only assumption the paper makes is that a person's performance is randomly distributed around their skill in a Gaussian distribution. What's the source of the random variation? Doesn't matter. Maybe the opponent picked an opening line you weren't familiar with. Maybe you spawned far away from that gun you like. Maybe you got screwed by the rocks, or by the commander. Maybe you just didn't eat your Wheaties that morning. TrueSkill is honey badger, it doesn't give a $#@!."

"The only way you may have stumbled on something even approximately approaching a point is that people get to choose their own teams, so that IF a person is better than the ranking system at predicting the outcome of the game, then the ability and willingness to stack becomes a "skill" that TrueSkill is unable to distinguish from other skills that contribute towards winning a game."

I wish there had been more traction to use autobalance in games. If we'd stuck with autobalance, I think in the long term we would have had fewer stacking issues.

Autobalance, of course, had its problems (namely that I'd rather have two tens and a five nanning my bbr than a single 25), but those could have been reasonably fixed. It could have been set so that at no point does one team have more than 3x the pilots of the other team, and so that, after a game has begun, pilots ranked "below average" are forced to join the stacked team while pilots ranked "above average" are forced to join the weaker team.
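The join rule described above could look roughly like this. It is a hypothetical Python sketch, not the actual autobalance code: it treats a team's strength as the sum of its pilots' ranks, takes "average" to mean the average rank of pilots already in the game, and enforces the 3x pilot-count cap before the skill preference.

```python
from typing import List


def choose_team(joiner: float, team_a: List[float], team_b: List[float],
                max_ratio: float = 3.0) -> str:
    """Return "A" or "B" for a pilot joining a game in progress (hypothetical rule).

    Below-average pilots go to the stronger ("stacked") side, above-average pilots go
    to the weaker side, and no side may end up with more than max_ratio times the
    other side's pilot count.
    """
    everyone = team_a + team_b
    average = sum(everyone) / len(everyone) if everyone else joiner
    # Assumption: team strength is simply the sum of its pilots' ranks.
    stronger, weaker = ("A", "B") if sum(team_a) >= sum(team_b) else ("B", "A")
    preferred = stronger if joiner < average else weaker
    fallback = weaker if preferred == stronger else stronger

    def fits(side: str) -> bool:
        mine, theirs = (team_a, team_b) if side == "A" else (team_b, team_a)
        return len(mine) + 1 <= max_ratio * len(theirs)

    # The pilot-count cap takes priority over the skill preference.
    for side in (preferred, fallback):
        if fits(side):
            return side
    # Neither side satisfies the cap (e.g. a team is empty): fill the smaller side.
    return "A" if len(team_a) <= len(team_b) else "B"


print(choose_team(5, [20, 20, 20], [10, 10, 10]))   # low-ranked joiner
print(choose_team(30, [20, 20, 20], [10, 10, 10]))  # high-ranked joiner
```

With teams ranked [20, 20, 20] and [10, 10, 10], a joiner ranked 5 lands on the stronger side A and a joiner ranked 30 lands on the weaker side B. Summing ranks is the crudest possible strength measure; a real implementation would more likely use the rating system's own win probability.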

But there's a problem with this plan, namely that as a community we've developed a culture and an idea of what it means to join games of Allegiance. People don't seem to consider that there is literally no major competitive game on earth in which the primary game mode allows you to choose which "team" to be on outside of squad play. Sure, you can duo with your buddy in League of Legends, but the primary game mode for the past, oh, something like fifteen years has been "let our automated computer system pick your opponent."

At the end of the day, I've noticed that it's only the stackers who complained about being "forced" onto a team. There are occasionally good pilots who don't feel like dealing with it, but my memory of Weed joining wasn't "Weed looks for the stacked team" but rather "Weed looks at the team that's being stacked against and says, '$#@! it, I guess I'm going ham and winning this game for those noobs.'" Same with Dome.

Hell, that's how I got to be reasonable at this game: not giving a $#@! who I fly for or with, and trying to find a way to win regardless. Sure, sometimes I don't feel like putting in the effort to drag a bunch of voobs kicking and screaming to victory, but I enter many of my games not with a mindset of "gee, this game is stacked, guess there's nothing I can do" but with "gee, this game is stacked, I wonder if I can turn the tide."

In fact, one of my favorite ways to play this game is BoxSet style: I love sitting on Mumble with SumV (and pretty much just SumV, though when Zruty was around he was fun to hang with and was always welcome to join us). We'll pick a team, often the one with the weakest commander, and just do stupid @#(! like camp the enemy garrison in our basic scouts until the enemy resigns. Sure, I'd miss that if we went to straight autobalance games, but nothing was stopping anyone from pulling stupid @#(! just like what BoxSet does weekly, except their own stubborn refusal to suck it up and get to work. Pick 10 random players, distribute them randomly across the teams, and I guarantee there'll be at least two people on each team who are good enough to BoxSet, never mind using an autobalance system that specifically recognizes when players are doing that and rewards them by increasing their rank.

Or we can have a system that considers Raum and TenForward the best players around.

Or we can do what Wasp suggests and just have everyone sort of "know" who the best players are and that'll prevent stacking because that's exactly how that panned out back in the early 2000s.

(BTW SumV might actually get my vote for "best active player" because no matter what the gamestate is I always know where he's going to be: where he's needed.)

Wasp · Post by **Wasp** » Tue Mar 20, 2018 9:02 pm

cashto wrote (Mar 20 2018, 04:38 PM): "Again, this is not a "fact", because there is no data behind it."

It is a fact, because we are capable of balancing games regardless of the numbers the current ranking system provides. "Balanced" is defined here as a game that lasts an hour or more.

Mastametz · Post by **Mastametz** » Tue Mar 20, 2018 9:10 pm

Lack of a real ranking system makes it virtually impossible for newer or less experienced commanders to balance teams, since that requires YEARS of in-depth knowledge of every individual player and their capabilities, and it makes balancing difficult for everyone beyond that, especially given the existence of hiders/alternate callsigns.

It's pretty ridiculous that COMMANDER feedback from COMMANDERS who PLAY AND COMMAND THE GAME is forgone in favor of dev theorycrafting.

Just do what the $#@! the commanders say.

Wasp · Post by **Wasp** » Tue Mar 20, 2018 9:17 pm

Sheriff Metz wrote (Mar 20 2018, 05:10 PM): "Lack of a real ranking system makes it virtually impossible for newer/less experienced commanders to balance teams based on the required YEARS of in-depth knowledge on every individual player and their capabilities, and difficult for everyone beyond that. Especially with consideration to the existence of hiders/alternate callsigns..."

Which is why I think that personal information needs to be pinned to the player so that when they're hiding, you can see what they're capable of. Three categories with individual ranks.

Mastametz · Post by **Mastametz** » Tue Mar 20, 2018 9:18 pm

Wasp wrote (Mar 20 2018, 02:17 PM): "Which is why I think that personal information needs to be pinned to the player so that when they're hiding, you can see what they're capable of. Three categories with individual ranks."

That vastly overcomplicates this process.

Wasp · Post by **Wasp** » Tue Mar 20, 2018 9:22 pm

Sheriff Metz wrote (Mar 20 2018, 05:18 PM): "That vastly overcomplicates this process."

How so? We can easily rank command with AllegSkill, since the outcome is 99% influenced by commanders. We can easily rank "whore" capability based on whore stats, and we can easily rank "utility" based on the things utility players do (nan, probe, prox, scout, etc.).
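The three-category proposal could be represented with a per-player record like the one below. This is a hypothetical Python sketch: the field names, the use of kills as the "whore stat", counting each utility action equally, and the exponential moving average with weight 0.2 are all assumptions. The command rating is left as a placeholder for whatever outcome-based model (such as AllegSkill) would feed it.

```python
from dataclasses import dataclass
from typing import Dict

# Support actions named in the post; counting each one equally is an assumption.
UTILITY_ACTIONS = ("nan", "probe", "prox", "scout")


@dataclass
class PlayerProfile:
    """Hypothetical per-callsign record with three separately ranked categories."""
    callsign: str
    command_rating: float = 25.0  # placeholder, updated elsewhere by an outcome-based model
    whore_score: float = 0.0      # smoothed kills per game (assumed "whore stat")
    utility_score: float = 0.0    # smoothed support actions per game
    alpha: float = 0.2            # weight given to the newest game in the moving average

    def record_pilot_game(self, kills: int, actions: Dict[str, int]) -> None:
        """Fold one game's stats into the two pilot categories via an exponential moving average."""
        utility = sum(actions.get(name, 0) for name in UTILITY_ACTIONS)
        self.whore_score = (1 - self.alpha) * self.whore_score + self.alpha * kills
        self.utility_score = (1 - self.alpha) * self.utility_score + self.alpha * utility


profile = PlayerProfile("ExamplePilot")
profile.record_pilot_game(kills=10, actions={"nan": 3, "probe": 2})
print(profile)
```

Because the record is keyed to the player rather than the callsign shown in the lobby, the same three numbers could in principle be displayed for a pilot who is hiding.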

cashto · Post by **cashto** » Tue Mar 20, 2018 9:24 pm

Wasp wrote (Mar 20 2018, 01:50 PM): "You point at the source for AllegSkill (TrueSkill) and then ignore the fact that you are not adhering to the rules dictated by that source! Then you go on to proclaim how applicable TrueSkill is to a game like Allegiance, and base that proclamation on someone else's credentials? It's either applicable or not, regardless! And that depends solely on the underlying rule of the Bayesian principle: that the players being ranked MUST be the SOLE responsible party for the OUTCOME of the game."

Again, this is not a rule of TrueSkill. You're making it up.

Random chance is part of many games where TrueSkill is used. In fact, you could say random chance is an element of any game, even chess; otherwise, weaker players would always lose to stronger players, and they don't.

Sure, P1's point is a naked appeal to authority, and honestly, I never regarded Baker as the brightest bulb in the box. But he at least got the principle right: the accuracy of a ranking system is something that can be measured. So let's measure it.

phoenix1 wrote (Mar 20 2018, 02:00 PM): "I wish there had been more traction to use autobalance in games. If we'd stuck with autobalance, I think in the long term we would have had fewer stacking issues.

"But there's a problem with this plan, namely that as a community we've developed a culture and an idea of what it means to join games of Allegiance. People don't seem to consider that there is literally no major competitive game on earth in which the primary game mode allows you to choose which "team" to be on outside of squad play. Sure, you can duo with your buddy in League of Legends, but the primary game mode for the past, oh, something like fifteen years has been "let our automated computer system pick your opponent.""

I think the cure is worse than the disease here. Even in Halo you can do an Xbox party with your friends and roflstomp whoever you want. Playing with friends (and against enemies) is part of the fun of the game.

Wasp wrote (Mar 20 2018, 02:02 PM): "It is a fact, because we are capable of balancing games regardless of the numbers the current ranking system provides. "Balanced" is defined here as a game that lasts an hour or more."

This is a terrible metric. Hour-long games are more reflective of how the cores are designed, which factions are being played, and whether either team has the will to win or just enjoys whoring. Arguably, hour-long games are the cancer that kills Allegiance, as everyone winds up exhausted at the end and unwilling to play another game, or unwilling to join a game in progress they won't see to the end. Basically, any game that ends in a bomb rush is by definition not a balanced game?