Why is HELO broken and can it be repaired?

From FreeAllegiance Wiki
The original post this page is based on can be found here
Author's note: The author prefers to use 'flawed' in place of 'broken'. The choice of title was a direct response to the common in-game cry of "Helo is broken!" --Sgt_Baker 15:08, 15 December 2007 (CST)

At 7:27am on September 22nd 2005, Pook began collecting game statistics using ASGS and TAG. The goal was to use a derivative of the Elo system to generate ranks for players and eventually use these ranks as the input data for an automated balancing system. The reasoning behind this effort is relatively easy to digest: despite everyone's efforts to the contrary, the community has a tendency to organize games which are stacked to a lesser or greater extent. This is not to suggest that stacks are always contrived and deliberate. Numerous factors play upon the development of a stack, many of which are a result of natural trends in human behavior combined with the nuances and complexities inherent in Allegiance's (quite wonderful) structure of game play.

At this juncture it is worth noting that Pook and the admin team should not be held responsible for any perceived failure of our ranking systems so far. The subject of ranking is vast and incredibly complex, especially when attempting to rank individual players in a team-based game such as Allegiance. Pook has freely admitted that he has neither the time nor the expertise to embark upon a fully fledged investigation into this field and as such was forced, in good faith, to utilize the best tools available to him to implement a ranking system. Any failure of ranking to date is entirely a result of the improper implementation and poor design of Elo/Helo.


So, what exactly is the problem with Helo?

Allegiance Helo is a system on top of a system on top of a system, with Professor Arpad Elo's rating system sitting at the bottom of the pile. (They always seem to have funky names. In fact I'm contemplating renaming AllegSkill... "Bob" is my anti-weird-name-o-mat retort :D) Elo was developed to statistically measure the relative skill of players in 1 vs 1 competitions and it does so rather well. Since its inception it has been adopted as the principal ranking system by the United States Chess Federation and a number of other sporting bodies. It is important to note, however, that Elo was only ever intended to measure the performance and skill of individuals in competition with other individuals.

Given the obvious popularity of the Elo system it has become the starting point for numerous other ranking systems, the most notable in this context being the ranking system developed for HALO. I'm not entirely certain of the development history behind HALO-Elo, but it is an attempt to address team-based play. This is where things begin to come unstuck. Team play presents numerous problems for the developer of any ranking system, since for any given player one is now attempting to measure the interactions between multiple parties, both those on the opposing team and those on the friendly team. From a mathematical point of view the complexity of this problem increases exponentially as team sizes increase. The number of potential interactions for a 10 vs 10 game is truly astonishing. HALO-Elo's approach to solving this problem is essentially "Just average everything and it'll be fine". This is quite literally the blunt implement approach, and completely ignores practically all the pertinent issues when considering team play.

Allegiance Helo is a derivative of HALO-Elo which attempts to address issues specific to Alleg, such as newbie status and people dropping from games. Again, from a maths point of view, the proverbial instruments used to address these issues generate as many problems as they solve.


From this point forth we'll assume that Elo is a sound ranking system when used in its intended context.

Before getting stuck into the specifics of the problems that arise in the Halo and Helo approach, we need to understand how Elo goes about measuring a player's skill level. Elo is a statistical system that attempts to estimate a player's skill level based on whether a player won or lost a match, and the assumed skill level of the opposing player. It does so by making the assumption that for any given match, the outcome of the match contains a certain amount of information regarding the players' true skill levels, and that this information is useful despite the fact that we've not observed the players actually playing the game. This, just to be clear, is an aspect of information theory. The 'total amount' of information contained in a win/loss outcome is relatively small, so systems such as Elo use a statistical approach to gather these small pieces of information from each match and collate them into something useful from the perspective of a human being, which in our case is a player's rating. It is worth noting that, using some non-trivial maths, it is possible to calculate how many game outcomes one must observe for any given ranking system before one can assert the accuracy of a player's rank. We now have our first important point:

  • Game outcomes contain real and useful information regarding a player's skill.
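
For readers who have never seen it written down, here is a minimal sketch of how plain 1 vs 1 Elo turns a single win/loss into a rating change. The expected-score formula is the standard Elo one. The K-factor of 32 and the function names are assumptions chosen for illustration, not values taken from Helo or ASGS.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Score (between 0 and 1) that Elo expects player A to achieve against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Nudge a rating toward the observed result (actual: 1 for a win, 0 for a loss).

    k is an assumed K-factor; real systems pick their own value.
    """
    return rating + k * (actual - expected)


# Two evenly rated players: the winner gains exactly what the loser gives up.
a, b = 1500.0, 1500.0
e_a = expected_score(a, b)
new_a = update_rating(a, e_a, 1.0)        # A won
new_b = update_rating(b, 1.0 - e_a, 0.0)  # B lost
print(e_a, new_a, new_b)
```

Each match moves a rating only by a small step, which is why many outcomes have to be collected before a rank means much.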

Now that we've established some of the background I bet you're all wanting to hear why H(a)elo performs poorly in Allegiance. As I've already mentioned, the system(s) is based entirely on abstract information contained in a win/loss game outcome. This information is tenuous at best, and it is reasonable to assume that any derivative of Elo should strive to maintain the quality of the information contained in a win/loss. This is exactly where H(a)elo comes unstuck. For the purposes of illustrating this we'll introduce two new terms: "Information leak" and "Information creep". (Yes, I've just made them up, but they suit our particular train of thought.)

Information Leak

Information leak is literally that: a situation where the already precious and rare useful information contained in a win/loss is somehow lost, discarded or ignored. Given the tenuous and abstract nature of the information, any loss is incredibly detrimental to the ranking system since it places us in a position where we're required to observe a greater number of game outcomes in order to calculate a rank with any given degree of confidence. There are a number of ways in which useful information leaks from Helo, and we'll cover one of them here: The newbie helper function (NHF).

The NHF was implemented in an effort to provide some degree of 'Alleg Age' functionality to Helo. Players below rank 15 are deducted fewer points for a loss than they would otherwise have been deducted in a pure HALO-Elo implementation. It can be said that by artificially increasing a player's rank outside of the statistical framework of a ranking system (in this case by dumbing down losses), the information contained in that player's loss is being discarded, or is leaking from the system.

A good analogy for what's happening here (sorry for dumbing this down) is to consider each player to have two information-buckets. One bucket is a loss bucket, where all the information concerning losses is kept, and the other a win bucket. Over time each bucket fills with information and the ranking system is able to calculate an ever more accurate rank for the player. In the context of our analogy, the NHF punches 15 small holes in the bottom of a newbie's loss bucket. For every rank the newbie gains, one of those holes is plugged. By the time the newbie has become rank 15, loads of information has leaked out of the holes in our loss bucket into the aether, never to be seen again.
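
The bucket analogy can be sketched in a few lines. This is purely illustrative: the text only says that players below rank 15 lose fewer points, so the linear scaling (one "hole" plugged per rank) and every name below are assumptions, not the actual NHF formula.

```python
NEWBIE_RANK_LIMIT = 15  # from the text: the NHF applies below rank 15


def loss_scale(rank: int) -> float:
    """Hypothetical fraction of the normal loss penalty that is actually applied.

    Assumed linear: rank 0 keeps none of the penalty, rank 15 and above keep all of it.
    """
    if rank >= NEWBIE_RANK_LIMIT:
        return 1.0
    return max(rank, 0) / NEWBIE_RANK_LIMIT


def applied_loss(rank: int, full_penalty: float) -> float:
    """Points actually deducted after the (assumed) newbie scaling."""
    return full_penalty * loss_scale(rank)


def leaked_loss(rank: int, full_penalty: float) -> float:
    """Points the unmodified system would have deducted but the NHF throws away."""
    return full_penalty - applied_loss(rank, full_penalty)


# A rank 3 newbie who should have lost 10 points.
print(applied_loss(3, 10.0), leaked_loss(3, 10.0))
```

Whatever the real scaling looks like, the leaked portion is loss information that never reaches the player's rating.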

Information Creep

Information creep is of similar detriment to ranking as information leak, yet is slightly different in how it manages to munge the information. An example of information creep is HALO-Elo's "just average it and everything will be fine" approach to team-based games. The effect of the averaging is that information from one player's win/loss outcome manages to creep into the win/loss buckets of the other players on their team. This can quite literally be thought of as less skilled players 'stealing' a little bit of an expert player's rank every time their team wins a game. It is quite easy to envisage how this might mess around with the accuracy of any given ranking system.
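
A rough sketch of what "just average it" can look like in practice follows. It assumes the simplest possible scheme: average each team's ratings, run one ordinary Elo update on the two averages, and hand the same change to every teammate. The real HALO-Elo and Helo formulas are not given here, so the function names, the K-factor of 32 and the player names are all hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def team_average_update(team: dict, opponents: dict, team_won: bool, k: float = 32.0) -> dict:
    """Assumed averaging scheme: one Elo update on team averages, shared by everyone."""
    avg_team = sum(team.values()) / len(team)
    avg_opp = sum(opponents.values()) / len(opponents)
    actual = 1.0 if team_won else 0.0
    delta = k * (actual - expected_score(avg_team, avg_opp))
    # Every teammate receives the identical change, whatever their own rating is.
    return {name: rating + delta for name, rating in team.items()}


team = {"expert": 2000.0, "novice": 1000.0}
opponents = {"x": 1500.0, "y": 1500.0}
updated = team_average_update(team, opponents, team_won=True)
print(updated)
```

Under this assumed scheme the expert and the novice are moved by exactly the same amount, so the outcome says nothing about who on the team actually earned it.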


Can Helo be salvaged?

Yay! A simple answer: No.

To salvage Helo and turn it into a sound ranking system would mean scrapping the entire system and starting from scratch with Elo as our base. Fortunately for us, we needn't go to all that effort. Microsoft TrueSkill is, in fact (and quite ironically), a third-generation derivative of Elo, but differs from Helo in that it has been developed by experts in the fields of statistics and ranking systems. AllegSkill is Microsoft TrueSkill, differing only in name.

We'll address how AllegSkill works, and the broader issue of ranking and stats within the Alleg community, in the articles below.

AllegSkill
About: AllegSkill · FAQ · Interim FAQ · Gaining ranks · Whore rating · more...
Technical Details: Commander's ranking · Player's ranking · Stack rating · AllegBalance