ranking system

Lykourgos
Posts: 1001
Joined: Tue Jan 11, 2005 8:00 am
Location: Portland

Post by Lykourgos »

Mainly, make sure you talk to Saxy. I'm OK with data and with models, but I'm an undergrad and he's an expert. Ada and tmc would also be good people to talk with, especially Ada, because he's just the right sort of obsessive compulsive for this sort of thing.

Tell you what though: if someone happens to PM me an algorithm, I have a fun test in mind for it. I'll make an array of "player" objects, each with an int Trueskill and an int Rank, then put it through a loop in which each step represents a game: balance the game teams according to Trueskill, output a slightly randomized result based on Trueskill, and let the algorithm update Rank from that result. Repeat a few thousand times, then calculate the magnitude of the residual abs(Trueskill-Rank).
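A minimal sketch of the test described above, not anyone's actual implementation. Here `true_skill` is the hidden value the simulation knows (the post's "Trueskill" variable, not the Microsoft system). The logistic win curve, the starting rank of 1500, the random 10-player subset per game, and `placeholder_update` (a team-average Elo step standing in for whatever algorithm gets PM'd) are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Player:
    true_skill: float      # hidden skill, known only to the simulation
    rank: float = 1500.0   # the ranking algorithm's current estimate


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def win_probability(diff, scale=400.0):
    """Logistic curve: chance that the side ahead by `diff` points wins."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


def balanced_teams(players):
    """Sort by true skill and deal players out in snake order (A, B, B, A, ...)."""
    ordered = sorted(players, key=lambda p: p.true_skill, reverse=True)
    team_a = [p for i, p in enumerate(ordered) if i % 4 in (0, 3)]
    team_b = [p for i, p in enumerate(ordered) if i % 4 in (1, 2)]
    return team_a, team_b


def placeholder_update(winners, losers, k=16.0):
    """HYPOTHETICAL stand-in for the algorithm under test: team-average Elo."""
    gap = mean(p.rank for p in winners) - mean(p.rank for p in losers)
    delta = k * (1.0 - win_probability(gap))
    for p in winners:
        p.rank += delta
    for p in losers:
        p.rank -= delta


def simulate(n_players=20, per_game=10, n_games=3000, seed=1,
             update=placeholder_update):
    rng = random.Random(seed)
    players = [Player(true_skill=rng.gauss(1500.0, 200.0))
               for _ in range(n_players)]
    for _ in range(n_games):
        # Random subset each game so the balanced teams are not always identical.
        team_a, team_b = balanced_teams(rng.sample(players, per_game))
        gap = (mean(p.true_skill for p in team_a)
               - mean(p.true_skill for p in team_b))
        # Slightly randomized result driven by true skill.
        if rng.random() < win_probability(gap):
            update(team_a, team_b)
        else:
            update(team_b, team_a)
    return players


def mean_residual(players):
    """Average magnitude of abs(true_skill - rank) across all players."""
    return mean(abs(p.true_skill - p.rank) for p in players)
```

To try a different algorithm, pass it as `update=` to `simulate` and compare `mean_residual` between candidates; a smaller residual means the ranks ended up closer to the hidden skills.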
Bard
Posts: 4263
Joined: Tue Jan 24, 2006 8:00 am
Location: Within your command center, enacting fatal attacks upon your conscripts

Post by Bard »

quackdamnyou wrote (Dec 12 2007, 12:17 AM): Well I only meant to say, please don't feel that this discussion has to dictate the timing of the project, its release, or implementation.
It won't.

You should have heard the discussions that some of the people involved in this had over TS and IRC when the idea was in its infancy about 18 months ago. This thread is nothing compared to the little I heard, and I was rarely present.

It's worked out because they've mostly kept a level head and kept re-evaluating what they're doing based on the statistics while enlisting some outside opinions for perspective. I'd say that keeping your eyes focused on your work while you have a task at hand and looking for perspective once you've finished a task so you can revise if necessary is a proven model.

As TB said, this has never really been a secret, but I think the approach was right. Dropping a mostly functional product into the community's lap isn't generally a good idea. Although it's necessary for things like Alleg R* beta testing, it's completely detrimental to generating statistical data, because gamers in general search out loopholes.

Relax guys. This isn't going to be a donkey punch.
Omnia Mutantur, Nihil Interit.
Papsmear
Posts: 4810
Joined: Sun Jul 06, 2003 7:00 am
Location: Toronto, Canada

Post by Papsmear »

When the new ranking system is implemented, will the ranks be based on in-game points? If so, the point values in the game should be changed as well.
sgt_baker
Posts: 1510
Joined: Wed Oct 20, 2004 7:00 am
Location: London, UK.

Post by sgt_baker »

Hi folks. Nice to see the plethora of questions and feedback that has accumulated over the last 24 hours. I'm able to dedicate the next couple of days to this thread and the stats issue in general.

I've mulled over how best to present the maths, systems and theories behind ranking and specific systems in layman's terms, and have decided to start at the beginning. I will also attempt to answer specific questions as I post. Next post: Why is Helo broken and can it be repaired? (sorry MrC ;) )

B
Granary Sergeant Baker - Special Bread Service (Wurf - 13th Oct 2011)
MrChaos
Posts: 8352
Joined: Tue Mar 21, 2006 8:00 am

Post by MrChaos »

Fragtzack wrote (Dec 12 2007, 12:26 AM): You're right, this thread is not about Helo. My bad for posting. Suggest you stop sharing development ideas with the community. Every talk about ranks is going to branch out to talk about the current system until something is done, imo.
A better reply, Frag, is that what to do in the short term about HELO, and the history behind it, doesn't belong here. Does it work? The answer is: not as intended.
Lykourgos wrote (Dec 12 2007, 12:39 AM): Mainly, make sure you talk to Saxy. I'm OK with data and with models, but I'm an undergrad and he's an expert. Ada and tmc would also be good people to talk with, especially Ada, because he's just the right sort of obsessive compulsive for this sort of thing.

Tell you what though: if someone happens to PM me an algorithm, I have a fun test in mind for it. I'll make an array of "player" objects, each with an int Trueskill and an int Rank, then put it through a loop in which each step represents a game: balance the game teams according to Trueskill, output a slightly randomized result based on Trueskill, and let the algorithm update Rank from that result. Repeat a few thousand times, then calculate the magnitude of the residual abs(Trueskill-Rank).
I've talked with Saxy and tried to enlist his help early on, but IIRC RL kept him away from this endeavour. We've taken the step of actually running tests on our results and seeing if the ranks could predict the game outcome. I encourage you to wait a bit to see the results and their underpinning, but I see no issues with providing you the algorithm... assuming it hasn't broken into bits yet again for some new whistle or weird wtf Visual Studio bug ;). I'm hesitant to explain details, to allow the Mad Professor Baker to get the bits right. You're going to get detailed answers and, not to spoil Baker's surprise, most likely graphs too.
quackdamnyou wrote (Dec 12 2007, 01:17 AM): Well I only meant to say, please don't feel that this discussion has to dictate the timing of the project, its release, or implementation.
Yeah QDY. We aren't the ones who will make the big decisions, or even decide how they get implemented. I've got a life, as do the others, and endless debate isn't going to happen... at least for me. A healthy discussion, sure, but if it's a chance to vent your spleen endlessly, I'll take a seat.

Papsmear wrote (Dec 12 2007, 05:50 AM): When the new ranking system is implemented, will the ranks be based on in-game points? If so, the point values in the game should be changed as well.
Nope, it will not be based on game points. People like points, and stats, and shiny jpeg medals, me included, and I'd be happy to see a category that said HTTs/Bombers/Caps spotted by probes, but this endeavour did nothing to address the points or the keeping of in-game statistics.
sgt_baker wrote (Dec 12 2007, 07:10 AM): Hi folks. Nice to see the plethora of questions and feedback that has accumulated over the last 24 hours. I'm able to dedicate the next couple of days to this thread and the stats issue in general.

I've mulled over how best to present the maths, systems and theories behind ranking and specific systems in layman's terms, and have decided to start at the beginning. I will also attempt to answer specific questions as I post. Next post: Why is Helo broken and can it be repaired? (sorry MrC ;) )

B
It's the "wtf do we do with HELO" storm that was starting here, and honestly it isn't something we should be addressing here. Say SOMETHING :lol:
Raveen wrote (Dec 12 2007, 07:29 AM): Practising lecturing eh Baker :D
He loves wearing the tweed coat with the elbow patches, the pipe clenched between his teeth, and using the nasal upper crust voice.... and mocking my midwestern accent ;)
Ssssh
BlackViper
Posts: 6993
Joined: Thu Aug 07, 2003 7:00 am
Location: Green Bay, WI

Post by BlackViper »

I am going to move this to the misc dev forum. I am going to prune anything that does not directly relate to the post. This is NOT a dictatorship type move, just cleaning up a topic that has gotten off track. :)
Always in the Shadows...
sgt_baker
Posts: 1510
Joined: Wed Oct 20, 2004 7:00 am
Location: London, UK.

Post by sgt_baker »

Why is Helo broken and can it be repaired?

At 7:27am on September 22nd 2005, Pook began collecting game statistics using ASGS and TAG. The goal of the exercise was to use a derivative of the Elo system to generate ranks for players and then, eventually, use said ranks as the input data for an automated balancing system. The reasoning behind this effort is relatively easy to digest: despite everybody's efforts to the contrary, the community as a whole has a tendency to organise games which are stacked to a lesser or greater extent. This is not to suggest that stacks were or are always contrived and deliberate, since numerous factors play upon the development of a stack, many of which are a result of natural trends in human behaviour combined with the nuances and complexities inherent in Allegiance's (quite wonderful) structure of game play.

At this juncture it is worth noting that Pook and the admin team should not be held responsible for any perceived failure of our ranking systems so far. The subject of ranking is vast and incredibly complex, especially when attempting to rank individual players in a team-based game such as Allegiance. Pook has freely admitted that he has neither the time nor the expertise to embark upon a fully fledged investigation into this field, and as such was forced, in good faith, to utilise the best tools available to him in order to implement a ranking system. Any failure of ranking to date is entirely a result of the improper implementation and poor design of Elo/Helo.

So, what exactly is the problem with Helo?

Allegiance Helo is a system on top of a system on top of a system, with Prof. Arpad Elo's rating system sitting at the bottom of the pile. (They always seem to have funky names. In fact I'm contemplating renaming AllegSkill... "Bob" is my anti-weird-name-o-mat retort ;) ) Elo was developed to statistically measure the relative skill of players of 1 vs 1 competitions, which is something it does rather well. Since its inception it has been adopted as the principal ranking system by the United States Chess Federation and a number of other sporting bodies. It is important to note, however, that Elo was only ever intended to measure the performance and skill of individuals in competition with other individuals.

Given the obvious popularity of the Elo system it has become the starting point for numerous other ranking systems, the most notable in this context being the ranking system developed for HALO. I'm not entirely certain of the development history behind HALO-Elo, but it is an attempt to address team-based play. This is where things begin to come unstuck. Team play presents numerous problems for the developer of any ranking system, since for any given player one is now attempting to measure the interactions between multiple parties, both those on the opposing team and those on the friendly team. From a mathematical point of view the complexity of this problem increases exponentially as team sizes increase. The number of potential interactions for a 10 vs 10 game is truly astonishing. HALO-Elo's approach to solving this problem is essentially "Just average everything and it'll be fine". This is quite literally the blunt implement approach, and completely ignores practically all the pertinent issues when considering team play.

Allegiance Helo is a derivative of HALO-Elo which attempts to address issues specific to Alleg, such as newbie status and people dropping from games. Again, from a maths point of view, the proverbial instruments used to address these issues generate as many problems as they solve.

(From this point forth we'll assume that Elo is a sound ranking system when used in its intended context.)

Before getting stuck into the specifics of the problems that arise in the HALO and Helo approach, we need to understand how Elo goes about measuring a player's skill level. Elo is a statistical system that attempts to estimate a player's skill level based on whether the player won or lost a match, and on the assumed skill level of the opposing player. It does so by making the assumption that, for any given match, the outcome contains a certain amount of information regarding the players' true skill levels, and that this information is useful despite the fact that we've not observed the players actually playing the game. This, just to be clear, is an aspect of information theory. The 'total amount' of information contained in a win/loss outcome is relatively small, so systems such as Elo use a statistical approach to gather these small pieces of information from each match and collate them into something useful from the perspective of a human being, which in our case is a player's rating. It is worth noting that, using some non-trivial maths, it is possible to calculate how many game outcomes one must observe for any given ranking system before one can assert the accuracy of a player's rank. We now have our first important point:
Game outcomes contain real and useful information regarding a player's skill.
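For readers who want to see the mechanics, here is a minimal sketch of the standard 1 vs 1 Elo update: an expected score from the rating gap, then a correction proportional to how surprising the result was. The formula is the usual textbook one; the K-factor of 32 and the function names are assumptions chosen for illustration, not values taken from Helo.

```python
def expected_score(rating_a, rating_b):
    """Probability-like score Elo expects player A to achieve against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after a single 1 vs 1 game.

    k is the K-factor: how much one result is allowed to move a rating.
    The value 32 is an illustrative assumption.
    """
    actual = 1.0 if a_won else 0.0
    delta = k * (actual - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses: the update is zero-sum.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Evenly matched players: the winner takes half of K.
    print(elo_update(1500.0, 1500.0, a_won=True))
    # An upset moves the ratings much further than an expected result.
    print(elo_update(1500.0, 1900.0, a_won=True))
    print(elo_update(1900.0, 1500.0, a_won=True))
```

The size of each adjustment is the "small piece of information" in action: an expected win barely moves the ratings, while an upset moves them a lot.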
Now that we've established some of the background I bet you're all wanting to hear why H(a)elo performs poorly in Allegiance. As I've already mentioned, the system(s) is based entirely on abstract information contained in a win/loss game outcome. This information is tenuous at best, and it is reasonable to assume that any derivative of Elo should strive to maintain the quality of the information contained in a win/loss. This is exactly where H(a)elo comes unstuck. For the purposes of illustrating this we'll introduce two new terms: "Information leak" and "Information creep". (Yes, I've just made them up, but they suit our particular train of thought.)

Information Leak

Information leak is literally that: a situation where the useful information contained in a win/loss, already preciously rare, is somehow lost, discarded or ignored. Given the tenuous and abstract nature of the information, any loss is incredibly detrimental to the ranking system since it places us in a position where we're required to observe a greater number of game outcomes in order to calculate a rank with any given degree of confidence. There are a number of ways in which useful information leaks from Helo, and we'll cover one of them here: the newbie helper function.

The NHF was implemented in an effort to provide some degree of 'alleg age' functionality to Helo. Players below rank 15 are deducted fewer points for a loss than they would otherwise have been deducted in a pure HALO-Elo implementation. It can be said that by artificially increasing a player's rank outside of the statistical framework of a ranking system (in this case by dumbing down losses), the information contained in that player's loss is being discarded, or is leaking from the system.

A good analogy for what's happening here (sorry for dumbing this down) is to consider each player to have two information-buckets. One bucket is a loss bucket, where all the information concerning losses is kept, and the other a win bucket. Over time each bucket fills with information and the ranking system is able to calculate an ever more accurate rank for the player. In the context of our analogy, the NHF punches 15 small holes in the bottom of a newbie's loss bucket. For every rank the newbie gains, one of those holes is plugged. By the time the newbie has become a rank 15 voobie, loads of information has leaked out of the holes in our loss bucket into the aether, never to be seen again.
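The bucket analogy can be put into a few lines of code. This is only a sketch: the post gives the rank-15 cutoff and the "one hole plugged per rank" picture, but not the actual NHF formula, so the linear scaling in `nhf_loss_factor` is a HYPOTHETICAL stand-in and the function names are made up for illustration.

```python
NEWBIE_CEILING = 15  # from the post: the helper applies below rank 15


def nhf_loss_factor(rank):
    """HYPOTHETICAL: fraction of the normal loss that is actually deducted.

    Mirrors the analogy of 15 holes with one plugged per rank gained:
    rank 0 keeps none of the loss, rank 15 and above keeps all of it.
    The real NHF formula is not given in the thread.
    """
    if rank >= NEWBIE_CEILING:
        return 1.0
    return max(rank, 0) / NEWBIE_CEILING


def apply_loss(rank, full_loss):
    """Return (points deducted, points leaked) for one lost game."""
    deducted = full_loss * nhf_loss_factor(rank)
    return deducted, full_loss - deducted


if __name__ == "__main__":
    # A rank 3 newbie who "should" lose 10 points keeps most of them.
    print(apply_loss(3, 10.0))
    # A rank 15 player gets no help, so nothing leaks.
    print(apply_loss(15, 10.0))
```

Every leaked point is loss information the ranking system never gets to use, which is why more games are then needed to reach the same confidence in a newbie's rank.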


Information Creep

Information creep is of similar detriment to ranking as information leak, yet is slightly different in how it manages to munge the information. An example of information creep is HALO-Elo's "just average it and everything will be fine" approach to team based games. The effect of the averaging is that information from one player's win/loss outcome manages to creep into the win/loss buckets of the other players on their team. This can quite literally be thought of as less skilled players 'stealing' a little bit of an expert player's rank every time their team wins a game. It is quite easy to envisage how this might mess around with the accuracy of any given ranking system.
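A minimal sketch of how averaging lets information creep between teammates. The post only characterises HALO-Elo as "just average everything", so the exact update below (Elo applied to team averages, with the same adjustment handed to every member) is an assumption for illustration, as are the K-factor and the example ratings.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for side A against side B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def team_average_update(team, opponents, team_won, k=32.0):
    """ASSUMED team-average scheme: one Elo step on the two team averages,
    with the identical delta applied to every player on `team`."""
    avg_team = sum(team) / len(team)
    avg_opponents = sum(opponents) / len(opponents)
    actual = 1.0 if team_won else 0.0
    delta = k * (actual - expected_score(avg_team, avg_opponents))
    return [rating + delta for rating in team]


if __name__ == "__main__":
    # One expert carrying four newbies against five average players.
    carried = [2000.0, 1000.0, 1000.0, 1000.0, 1000.0]
    average_side = [1200.0] * 5
    print(team_average_update(carried, average_side, team_won=True))
```

Both teams average 1200, so the system sees an even match and hands every winner the same reward. The four 1000-rated players are credited exactly as much as the expert, whatever each of them actually contributed, which is the creep described above.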


Can Helo be salvaged?

Yay! A simple answer: No. To salvage Helo and turn it into a sound ranking system would mean scrapping the entire system and starting from scratch with Elo as our base. Fortunately for us, we needn't go to all that effort. Microsoft TrueSkill is, in fact (and quite ironically), a third-generation derivative of Elo, but differs from Helo in that it has been developed by experts in the fields of statistics and ranking systems. AllegSkill is Microsoft TrueSkill, differing only in name.

We'll address how AS works, and the broader issue of ranking and stats within the Alleg community, in another post.

B
Granary Sergeant Baker - Special Bread Service (Wurf - 13th Oct 2011)
sgt_baker
Posts: 1510
Joined: Wed Oct 20, 2004 7:00 am
Location: London, UK.

Post by sgt_baker »

P.S. This isn't intended to be some sort of lecture. I'm just trying to condense a year's worth of research into something relatively easy to digest :)
Granary Sergeant Baker - Special Bread Service (Wurf - 13th Oct 2011)
Raveen
Posts: 9104
Joined: Wed Mar 16, 2005 8:00 am
Location: Birmingham, UK

Post by Raveen »

Baker, would you be averse to me adding these posts to the wiki as you post them, as a permanent reference on AS and HELO?
Spidey: Can't think of a reason I'd need to know anything
jgbaxter
Posts: 2181
Joined: Mon Apr 25, 2005 7:00 am

Post by jgbaxter »

BV, nice pruning, you should be a gardener. :)

Moving the thread to the sub-forum will vastly reduce the attention it gets from people. :wacko:

Baker, nice post. :cool:

Sounds like you are saying AllegSkill will use the game data from all previous games. I must certainly be missing something, because surely that's not to be the case? :huh:


EDIT: BV, you found it by the link I left, didn't you? :P
Last edited by jgbaxter on Wed Dec 12, 2007 3:25 pm, edited 1 time in total.
n.b. I may not see a forum post replied to me or a pm sent to me for weeks and weeks...