{{Stub}}
{{AllegSkill}}


AllegSkill is a system for rating the skill of Allegiance players based on their overall performance in-game.  AllegSkill is based on the [http://research.microsoft.com/en-us/projects/trueskill/default.aspx Trueskill] system developed by [http://research.microsoft.com/default.aspx Microsoft Research] (who also developed Allegiance) with some notable additions.  The term 'AllegSkill' is intended to refer to the entire system, which includes additional statistics, and Microsoft Research should not be held responsible for differences when and where they occur.  This wiki will deal principally with the technical aspects of the system.  For a layman's explanation of the basics, please see the [http://research.microsoft.com/en-us/projects/trueskill/details.aspx Trueskill] site.


==How it works==
You have two numbers keeping track of your rank: [[Mu]], '''&mu;''', and [[Sigma]], '''&sigma;'''. &mu; is your rating (the average skill you have exhibited across all the games you have played), and &sigma; is the <i>uncertainty</i> about your rating. After you play a game your &mu; goes up if you win, down if you lose. Your &sigma; always goes down.


The amount your &mu; and &sigma; go up/down is modified by &sigma; (the more certain the system is about your rank, the less each game will affect it) and by how 'surprising' the game outcome was (a newbie beating a veteran is quite surprising and will have a greater effect on the ranks than if the vet beat the newb).


AllegSkill realises that whether a team wins or loses is highly dependent on the skill of both the commanders and their team, and the algorithms used represent this. Consequently there are separate skill ratings for commanders and pilots.


==Technical details==


===Mu & Sigma===


Collectively, mu and sigma define a [http://en.wikipedia.org/wiki/Normal_distribution normal distribution] for the skill of a player: mu is the average skill of the player over all games played, and sigma is the uncertainty around mu. Uncertainty is a fundamental part of the AllegSkill system and deserves greater explanation, so we have plotted the curves of three players with different skill ratings:


[[Image:MuSigmaDifferenceGraph.png|right]]
#A <font color="#ff0000">newbie player</font>.
#*Average rating (25) and high uncertainty (8.333).
#A <font color="#0000FF">skilled player that has played only a little</font>.
#*Higher rating (36) and moderate uncertainty (4).
#An <font color="#00AA00">average player that has played lots</font>.
#*Average rating (25) and low uncertainty (2).


The horizontal axis represents the rating, and the vertical axis the probability. A simple way of interpreting these graphs is to think that the player's "true rank" lies <i>somewhere</i> between the two end points of the curve, and that the higher the curve is, the more likely the true rating is at that point.

So, to interpret our three players' curves:
#The red line touches the mu-axis at 0 and 50.  This means that the AllegSkill system believes that the player could have any skill between these two values.
#*All new players start out with this skill rating.
#The blue line touches 22 and 50, so this player is either just below average, or the best expert in the game.
#The green curve hits the mu-axis between 18 and 32, but the true rating is most likely to be near 25.

===Stable uncertainty===
Note that players stabilise with a &sigma; of approximately 1. This results in a skill distribution even 'tighter' than that represented by the <font color="#00AA00">green</font> player.

By itself, &sigma; would eventually decrease to zero. However, '''&gamma;''' is the dynamics variable which prevents sigma from ever reaching zero, and it in turn determines how quickly mu can increase or decrease once sigma has stabilised. If we discover that sigma-stabilised ratings are moving too slowly to reflect genuine changes in skill, we will increase gamma. It is because of gamma that players' sigmas stabilise near 1 rather than continuing to fall.

===Your rank===
The rank that is displayed in-game is known as your '''Conservative rank'''. Roughly, it is where your curve touches the axis on the left - the system is 99% sure that your "true rank" isn't any <i>less</i> than your conservative rank.

The formula is
<div align="center"><math>\text{Conservative rank} = \mu - 3 \times \sigma</math></div>

This means that newbies, who start with (&mu;, &sigma;) = (25, 8.33), will have a conservative rank of zero. As they play more games the uncertainty about their rank goes down, so their conservative rank more closely resembles their rating - which will hopefully be close to their actual skill by the time they have played that many games!

====Leaderboard/Ingame ranks====
The ranks shown on the leaderboard and in game are multiplied by 0.6, so we have ranks up to 30 instead of ranks up to 50. This is a temporary measure to ease the migration from Helo's 0-30 system to AllegSkill's 0-50 system.  Once the various non-ASGS components of Allegiance (game servers etc.) have been updated to fully support AllegSkill (with additional features such as a new autobalance system), we will switch over to 0-50 ranks across the board. The 'rank' column on the leaderboard is currently also multiplied by 0.6, so it should accurately reflect the rank displayed in-game.

The formula is
<div align="center"><math>\text{Leaderboard rank} = ( \mu - 3 \times \sigma ) \times 0.6</math></div>

===The update formulae===
What follows is the simplest incarnation of the Trueskill update algorithm, as used for commander ratings.  We've provided as much information as is sensible, and we only assume that the reader is familiar with (or able to look up) the [http://en.wikipedia.org/wiki/Error_function error function] (<math>\text{erf}</math>).

After the game is played the commanders' ranks are updated as follows:
{| class="wikitable" border="1" cellspacing="0"  cellpadding="5" align="center"
!
! style="text-align:center"| Mu, &mu;
! style="text-align:center"| Sigma, &sigma;
|-
! Winner
| align="center" | <math>\mu'_w=\mu_w+\frac{\sigma_w^{2}}{c}\cdot V_\text{win}\left( \frac{\mu_w-\mu_l}{c},\frac{\varepsilon }{c} \right)</math>
| align="center" | <math>\sigma'_w=\sqrt{\sigma_w^{2}\left( 1-\frac{\sigma_w^{2}}{c^{2}}\cdot W_\text{win}\left( \frac{\mu_w-\mu_l}{c},\frac{\varepsilon }{c} \right) \right)+\gamma ^{2}}</math>
|-
! Loser
| align="center" | <math>\mu'_l=\mu_l-\frac{\sigma_l^{2}}{c}\cdot V_\text{win}\left( \frac{\mu_w-\mu_l}{c},\frac{\varepsilon }{c} \right)</math>
| align="center" | <math>\sigma'_l=\sqrt{\sigma_l^{2}\left( 1-\frac{\sigma_l^{2}}{c^{2}}\cdot W_\text{win}\left( \frac{\mu_w-\mu_l}{c},\frac{\varepsilon }{c} \right) \right)+\gamma ^{2}}</math>
|-
! Draws
| Placeholder
| Placeholder
|}

where '''&beta;''', '''&gamma;''' and <big>'''&epsilon;'''</big> are constants:
:<math>\beta = \frac{25}{6} </math> is the standard deviation of a player's performance

:<math>\gamma = \frac{25}{300}</math> is the dynamics variable, which prevents sigma from ever reaching zero

:<math>\varepsilon \simeq 0.08</math> is derived empirically from the percentage of games which result in a draw, currently ~1.01%.

'''c''' is a variable that expresses the general uncertainty of the system:
:<math>c=\sqrt{2\beta ^{2}+\sigma_w^{2}+\sigma_l^{2}}</math>

and '''V<sub>win</sub>''' and '''W<sub>win</sub>''' are TrueSkill functions based on
#The [http://en.wikipedia.org/wiki/Normal_distribution normal distribution] (with a mean of zero and variance of one), and more precisely its [http://en.wikipedia.org/wiki/Probability_density_function probability density function]:
#::<math>\text{PDF}(x):=\tfrac{1}{\sqrt{2\pi}} \text{e}^{-\frac{x^2}{2}}</math>
#The [http://en.wikipedia.org/wiki/Cumulative_distribution_function cumulative distribution function] of the normal distribution:
#::<math>\text{CDF}(y):=\tfrac{1}{2}\left(1 + \text{erf}\left( \tfrac{1}{\sqrt{2}} y\right)\right>.
</math>

No idea what they are? Don't worry, they are just scary maths that stats dudes use to try and represent a large group of unknowns based on a small number of samples.  In our case the unknowns represent every tiny detail of your in-game actions, and the samples are the game outcomes.
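For readers who prefer code to notation, the PDF and CDF defined above are one-liners using only Python's standard library (a sketch; the function names are ours, not part of any AllegSkill code):

```python
import math

def pdf(x):
    """Probability density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cdf(y):
    """Cumulative distribution of the standard normal, via erf."""
    return 0.5 * (1 + math.erf(y / math.sqrt(2)))
```

Note that `math.erf` is exactly the error function the formulae rely on, so no external statistics library is needed.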
=== TrueSkill functions ===
Developed by Microsoft, the Trueskill update functions are:
 
:<math>V_\text{win}(t,\varepsilon) := \frac{ \text{PDF}(t-\varepsilon)}{\text{CDF}(t-\varepsilon) }</math>
 
 
:<math>W_\text{win}(t,\varepsilon) := V_\text{win}(t,\varepsilon) \cdot \left( V_\text{win}(t,\varepsilon )+t-\varepsilon  \right)</math>
 
 
There are also two special versions of <math>V</math> and <math>W</math> when draws take place.
 
:<math>V_\text{draw}(t,\varepsilon ):=\frac{\text{PDF}(-\varepsilon -t)-\text{PDF}(\varepsilon -t)}{\text{CDF}(\varepsilon -t)-\text{CDF}(-\varepsilon -t)}</math>
 
 
:<math>W_\text{draw}(t,\varepsilon ):=V_\text{draw}^{2}(t,\varepsilon )+\frac{(\varepsilon -t)\cdot \text{PDF}(\varepsilon -t)+(\varepsilon +t)\cdot \text{PDF}(\varepsilon +t)}{\text{CDF}(\varepsilon -t)-\text{CDF}(-\varepsilon -t)}</math>
 
 
So, what are '''t''' and '''&epsilon;''' used in these formulae? Well, if you know your maths, you will realise they can be anything that you choose to 'pass into' the functions. In our case we are letting
 
:::<math>t = \frac{\mu_w-\mu_l}{c}</math> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; and passing in &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <math>\frac{\varepsilon }{c}</math> &nbsp; in place of &nbsp; <math>\varepsilon</math>
 
 
The <math>V</math> and <math>W</math> functions are the core of the Trueskill system, and vary depending on whether the game resulted in a win or a draw.  In both instances, positive values for <math>t</math> represent an unsurprising outcome:  the winner was more skilled than the loser.  Positive values result in the functions returning small values, which in turn result in small <math>\mu</math> and <math>\sigma</math> updates. The converse is also true:  Negative values for <math>t</math> represent a surprising outcome, and result in large updates.
 
 
=== AllegSkill example ===
This scenario pits a newbie commander (<math>\,\!\mu_A = 25;  \sigma_A = 8.333...</math>, i.e. average rating, high uncertainty) against a slightly more experienced commander (<math>\,\!\mu_B = 32;  \sigma_B = 5</math>, i.e. higher rating, medium uncertainty).
 
====Number crunching====
Now let's fire up our favourite computer algebra system and do the calculations, assuming the experienced commander, B, won.
 
 
:<math>c = \tfrac{5}{6}\sqrt{186} \simeq 11.4</math>
 
 
:<math>\mu '_{B}=32+\tfrac{0.09195636321\sqrt{186}\sqrt{2}}{\sqrt{\pi }} = \cdots \simeq 33.0</math> <!--- 33.00064106 --->
 
 
:<math>\sigma '_{B}=\sqrt{\tfrac{3601}{144}-\tfrac{2.758690898\sqrt{2}\left( \tfrac{0.5701294519\sqrt{2}}{\sqrt{\pi }}+0.04463650105\sqrt{186} \right)}{\sqrt{\pi }}} = \cdots \simeq 4.8</math> <!--- 4.760851650 --->
 
 
:<math>\mu '_{A}=25-\tfrac{0.2554343423\sqrt{186}\sqrt{2}}{\sqrt{\pi }} = \cdots \simeq 22.2</math> <!--- 22.22044149 --->
 
 
:<math>\sigma '_{A}=\sqrt{\tfrac{10001}{144}-\tfrac{21.28619519\sqrt{2}\left( \tfrac{0.5701294519\sqrt{2}}{\sqrt{\pi }}+0.04463650105\sqrt{186} \right)}{\sqrt{\pi }}} = \cdots \simeq 7.2</math> <!--- 7.168423552 --->
 
 
Now let's run the same scenario in reverse, with commander A winning.
 
 
:<math>c = \tfrac{5}{6}\sqrt{186} \simeq 11.4</math>
 
 
:<math>\mu '_{A}=25+\tfrac{0.6919672626\sqrt{186}\sqrt{2}}{\sqrt{\pi }} = \cdots \simeq 32.5</math> <!--- 32.52977643 --->
 
 
:<math>\sigma '_{A}=\sqrt{\tfrac{10001}{144}-\tfrac{57.66393855\sqrt{2}\left( \tfrac{1.544470930\sqrt{2}}{\sqrt{\pi }}+0.04568607959\sqrt{186} \right)}{\sqrt{\pi }}} = \cdots \simeq 6.4</math> <!--- 6.435916375 --->
 
 
:<math>\mu '_{B}=32-\tfrac{0.2491082145\sqrt{186}\sqrt{2}}{\sqrt{\pi }} = \cdots \simeq 29.3</math> <!--- 29.28928048 --->
 
 
:<math>\sigma '_{B}=\sqrt{\tfrac{3601}{144}-\tfrac{7.473246435\sqrt{2}\left( \tfrac{1.544470930\sqrt{2}}{\sqrt{\pi }}+0.04568607959\sqrt{186} \right)}{\sqrt{\pi }}} = \cdots \simeq 4.6</math> <!--- 4.623224911 --->
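The number crunching above can be reproduced with a short script. This is a sketch using only Python's standard library; `update_win` is our own name for the winner/loser update from the formulae table, with the article's constants for &beta;, &gamma; and &epsilon;:

```python
import math

# Constants from the article (EPS is the empirical draw margin, ~0.08)
BETA = 25 / 6
GAMMA = 25 / 300
EPS = 0.08

def pdf(x):
    # Standard normal probability density function.
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cdf(y):
    # Standard normal cumulative distribution function.
    return 0.5 * (1 + math.erf(y / math.sqrt(2)))

def update_win(mu_w, sigma_w, mu_l, sigma_l):
    """Return updated (mu, sigma) pairs for the winner and the loser."""
    c = math.sqrt(2 * BETA**2 + sigma_w**2 + sigma_l**2)
    t, eps = (mu_w - mu_l) / c, EPS / c
    v = pdf(t - eps) / cdf(t - eps)          # V_win
    w = v * (v + t - eps)                    # W_win
    mu_w2 = mu_w + sigma_w**2 / c * v
    mu_l2 = mu_l - sigma_l**2 / c * v
    sigma_w2 = math.sqrt(sigma_w**2 * (1 - sigma_w**2 / c**2 * w) + GAMMA**2)
    sigma_l2 = math.sqrt(sigma_l**2 * (1 - sigma_l**2 / c**2 * w) + GAMMA**2)
    return (mu_w2, sigma_w2), (mu_l2, sigma_l2)

# Commander B (32, 5) beats commander A (25, 8.333...)
(b_mu, b_sigma), (a_mu, a_sigma) = update_win(32, 5, 25, 25 / 3)
print(round(b_mu, 1), round(b_sigma, 1))   # 33.0 4.8
print(round(a_mu, 1), round(a_sigma, 1))   # 22.2 7.2
```

The conservative ranks follow as `mu - 3 * sigma`, which reproduces the rounded values in the results table below.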
 
==== Results ====
 
Let's compare the possible scenarios. Data is shown in the (mu, sigma) form.
 
{| class="wikitable" border="1" cellspacing="0"  cellpadding="5" align="center"
|
|align="center"| <b>Commander A</b><br> (Newbie)
|align="center"| <b>Commander B</b><br> (Vet)
|-
! Before the game
|align="center"| (25.0 , 8.33)<br>[[Conservative Rank|Rank]]: '''(0)'''
|align="center"| (32.0 , 5.0)<br>[[Conservative Rank|Rank]]: '''(17)'''
|-
! Commander A wins
|align="center"| (32.5 , 6.4)<br>[[Conservative Rank|Rank]]: '''(13)'''
|align="center"| (29.3 , 4.6)<br>[[Conservative Rank|Rank]]: '''(16)'''
|-
! Commander B wins
|align="center"| (22.2 , 7.2)<br>[[Conservative Rank|Rank]]: '''(1)'''
|align="center"| (33.0 , 4.8)<br>[[Conservative Rank|Rank]]: '''(19)'''
|}
 
 
As you can see, when the uncertainty is high, ranks can change ''quickly''. But if you look at the players' ratings (the first number in brackets), you will see the variability is much less pronounced.
 
 
When the outcome closely matches expectations (vet beats newb), the changes are small:
*Commander A loses a little rating, &mu;, but the system's confidence in his rating has increased (&sigma; has decreased): after playing a game, AllegSkill knows more about this player.
**Given the way [[Conservative Rank|the conservative rank]] is calculated, this ultimately results in a higher rank. This effectively replaces ELO's and HELO's newbie modifiers (the modifiers that allowed newbies to gain ranks faster than they lost them).
*Commander B receives a boost from his victory, but since his &sigma; is lower the change to his rating, &mu;, is not as big.
**This is controlled by the <math>\sigma^2/c</math> factor in the updating formulae.
 
 
When AllegSkill receives a surprising outcome there are much bigger variations:
*Commander A, the (0) comm, gains a whopping 13 ranks (6 of which come from the drop in sigma).
** This is because AllegSkill received very <i>significant</i> information about commander A.
*Commander B loses a bit of rating, &mu;, but that is somewhat limited by his lower &sigma;.
**The &sigma; reduction (gain in certainty about accuracy of commander B's rank) is also smaller in this scenario: a loss is less significant than a win.
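The 'stable uncertainty' behaviour described earlier can also be seen in simulation. The sketch below (our own helper, reimplementing the winner/loser update from the formulae table with the article's constants) plays a long series of games between two evenly matched commanders, alternating the winner, and shows sigma settling near 1 instead of shrinking to zero:

```python
import math

BETA, GAMMA, EPS = 25 / 6, 25 / 300, 0.08

def pdf(x):
    # Standard normal probability density function.
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cdf(y):
    # Standard normal cumulative distribution function.
    return 0.5 * (1 + math.erf(y / math.sqrt(2)))

def update_win(winner, loser):
    """Apply one game's update; players are (mu, sigma) pairs."""
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = math.sqrt(2 * BETA**2 + sig_w**2 + sig_l**2)
    t, eps = (mu_w - mu_l) / c, EPS / c
    v = pdf(t - eps) / cdf(t - eps)
    w = v * (v + t - eps)
    return ((mu_w + sig_w**2 / c * v,
             math.sqrt(sig_w**2 * (1 - sig_w**2 / c**2 * w) + GAMMA**2)),
            (mu_l - sig_l**2 / c * v,
             math.sqrt(sig_l**2 * (1 - sig_l**2 / c**2 * w) + GAMMA**2)))

# Two fresh commanders trading wins for 500 games.
a, b = (25.0, 25 / 3), (25.0, 25 / 3)
for game in range(500):
    if game % 2 == 0:
        a, b = update_win(a, b)
    else:
        b, a = update_win(b, a)
print(round(a[1], 2))  # sigma has stabilised well below its starting 8.33
```

Without the &gamma;&sup2; term added inside the square root each game, sigma would keep shrinking toward zero and stabilised ratings would become almost immovable.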


==Related Articles==
