Posted: Sat Dec 29, 2007 10:49 pm
by MrChaos
Ksero wrote (Dec 29 2007, 05:15 PM):
Baker,

First you say that helo ratings can be inflated by stacking. We've covered that already, and MrChaos said there were more reasons for why helo ratings are inaccurate. That's what I wanted him to elaborate on.

Secondly, you ask why helo rewards stacking when it was designed to prevent that behaviour. Helo is not designed to prevent stacking. Helo is just designed for measuring skill. Helo + autobalance can prevent stacking. I don't think AllegSkill can prevent stacking by itself, without using autobalance.

Then you say that when a high-ranking player goes head-to-head with a lower-ranked player and the higher-ranked player wins, his rank is reduced. That sounds preposterous to me. I tried the calculator with the following figures:
player |    Before     |     After
       | mu     sigma  | mu      sigma
Alice  | 30     8.333  | 30.362  8.106
Bob    | 5      8.333  | 4.638   8.106

Draw probability: 1 %
As expected, winning increases the winner's rating.
Again, go to the development forum and read Baker's explanation. I'd like to present this in the manner we feel is best in these forums rather than having 300 separate discussions at once and confusing things even more. Baker's afk atm [ time for a Saturday piss up ;) ] but we are hard at work on the next installment of wtf are you two stinkers blathering on about anyway.

I'll say this last bit here since I don't want others to get the wrong impression about your most excellent question, or any idea that I'm not wanting to answer you. It is quite possible for a severely stacked game to actually cause a player's rank to go DOWN, since while Mu increases microscopically the Sigma [ uncertainty ] actually goes up [ stacking throws a shadow on your actual ability, causing uncertainty to increase ]. I'll leave it to you to peek at the equation and figure out the details.

MrChaos

edit: the explanation above assumes the severe stack won... God help their rank if they actually lose *shudder*
edit2: I am so bad at providing links! Here is a brief MSResearch explanation... under "Q: Well there must be a bug in the system cause I jumped into a 4 person race with 3 lower ranked individuals, won the race and my position in the league I was in dropped about 50 spots."

Posted: Sun Dec 30, 2007 12:50 am
by CronoDroid
I choose to stack (when I do) when there's a huge difference in commander skill between the two comms, and I'm pretty sure most people do the same.

Then again, I anti-stack much more than I stack. Either that or I just lose a lot of games.

Posted: Sun Dec 30, 2007 1:25 am
by sgt_baker
Surely Ksero!?

Have you forgotten how TS works? Mr Baker is absolutely tired of the random, deliberate and general WTF the ranking question causes.

Posted: Sun Dec 30, 2007 1:27 am
by sgt_baker
IN OTHER WORDS: YOU, OF ALL PEOPLE, SHOULD KNOW BETTER.

Posted: Sun Dec 30, 2007 1:54 pm
by sgt_baker
Ksero wrote (Dec 29 2007, 10:15 PM):
       | mu     sigma  | mu      sigma
Alice  | 30     8.333  | 30.362  8.106
Bob    | 5      8.333  | 4.638   8.106
To elaborate:

I was drunk last night and my latent annoyance with the wiki-genius factor is far more likely to result in flames under such circumstances. (This isn't directed specifically at Ksero.) It only requires a basic understanding (i.e. read the $#@!ing website) of Trueskill to realise that a highly ranked player would have a relatively low sigma. (The conservative rank for 30/8.3333 is (5).) Additionally, the standard range of ranks in Trueskill is 0 - 50. Thus, a Helo (30) is roughly equivalent to a Trueskill (50/1 - ConRank = (47)) and a Helo (0) is a Trueskill (25/8.33333 - ConRank = (0)).

Code: Select all

          mu    sigma  ConRank   mu        sigma    ConRank
Alice | 50.0    1.0    47     -> 50.002    1.003    46.993 |    (winner)
Bob   | 25.0    8.333  0      -> 24.858    8.191    0.285  |    (loser)
Draw probability: 1.01%
Match quality: 2.9%
The technical bit:

I. A player's conservative rank is calculated thus: ConRank = mu - 3 * sigma. This formula is only ever used to generate a single ranking figure for human consumption without the consumer having to worry about the relationship between mu and sigma. The rough analogy for a conservative ranking figure is 'we are 99% certain the player is not less skilled than this'.

II. For every game played a player's sigma tends towards zero. As sigma tends towards zero the changes to mu also tend towards zero. To prevent sigma becoming zero, thus preventing any further mu updates, sigma is increased by a small amount for every game played. This increase is known as the dynamics factor. In a standard Trueskill setup it is 0.08333...

III. Standard Trueskill behaviour dictates that the less surprising a game outcome (i.e. a supervet beating a newb), the smaller the rank updates made to both winner and loser. The opposite is also true.

IV. In the above example the game outcome is not surprising. Consequently the updates made to Alice's mu and sigma are very small. In extreme cases dynamics > -1 * delta sigma (so sigma shows a net increase) and delta mu < 3 * (delta sigma + dynamics), hence delta Conservative Rank < 0.

Q.E.D.
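
For anyone who wants to check the figures above, here is a minimal Python sketch of the two-player update as described in the published TrueSkill material. It is not the AllegSkill code. The names `update_1v1` and `conservative_rank` are made up for illustration, and the constants are assumptions: beta = 25/6 (the usual default), dynamics = 25/300 = 0.08333... applied to the variance (sigma^2 + dynamics^2) before the update, and a draw probability of 1%.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal: pdf, cdf, inv_cdf

MU0 = 25.0
SIGMA0 = MU0 / 3            # 8.333..., initial sigma
BETA = SIGMA0 / 2           # assumed default performance spread (25/6)
DYNAMICS = SIGMA0 / 100     # assumed dynamics factor, 0.08333...


def conservative_rank(mu, sigma, k=3.0):
    """ConRank = mu - k * sigma (point I above)."""
    return mu - k * sigma


def update_1v1(winner, loser, draw_probability=0.01):
    """Return ((mu, sigma) winner, (mu, sigma) loser) after one decisive game."""
    (mu_w, sigma_w), (mu_l, sigma_l) = winner, loser

    # Point II: add the dynamics factor so sigma never collapses to zero.
    var_w = sigma_w ** 2 + DYNAMICS ** 2
    var_l = sigma_l ** 2 + DYNAMICS ** 2

    # Total spread of the performance difference between the two players.
    c = sqrt(2 * BETA ** 2 + var_w + var_l)

    # Draw margin implied by the assumed draw probability.
    eps = STD_NORMAL.inv_cdf((draw_probability + 1) / 2) * sqrt(2) * BETA

    # Point III: t is large when the win was expected, so v and w are small.
    t = (mu_w - mu_l - eps) / c
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)
    w = v * (v + t)

    new_w = (mu_w + var_w / c * v, sqrt(var_w * (1 - var_w / c ** 2 * w)))
    new_l = (mu_l - var_l / c * v, sqrt(var_l * (1 - var_l / c ** 2 * w)))
    return new_w, new_l


# Supervet (50 / 1) beats a brand-new player (25 / 8.333...).
alice, bob = update_1v1((50.0, 1.0), (MU0, SIGMA0))
print("Alice:", alice, "ConRank:", conservative_rank(*alice))
print("Bob:  ", bob, "ConRank:", conservative_rank(*bob))
```

Under these assumptions the sketch should land close to both tables in this thread: Alice's mu creeps up while her sigma grows slightly, so her ConRank dips just under 47, and calling `update_1v1((30.0, SIGMA0), (5.0, SIGMA0))` gives roughly Ksero's 30.362 / 8.106 for the winner.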


A question:

Why on god's green earth would I make a statement if it were so readily proven to be untrue? I'm honestly starting to believe that people regard me as some sort of bumbling idiot. I appreciate that the mathematical workings of Trueskill are difficult to understand, but I fail to grasp why I should take anyone seriously if they've neglected to gain even the most basic understanding of the subject at hand. In this particular instance I'm doubly shocked because Ksero actually wrote a very nearly working implementation of Trueskill in Python long before I was involved with the stats project. I could go on... :)

Posted: Sun Dec 30, 2007 9:21 pm
by cashto
sgt_baker wrote (Dec 30 2007, 05:54 AM):
Why on god's green earth would I make a statement if it were so readily proven to be untrue? I'm honestly starting to believe that people regard me as some sort of bumbling idiot. I appreciate that the mathematical workings of Trueskill are difficult to understand, but I fail to grasp why I should take anyone seriously if they've neglected to gain even the most basic understanding of the subject at hand. In this particular instance I'm doubly shocked because Ksero actually wrote a very nearly working implementation of Trueskill in Python long before I was involved with the stats project. I could go on... :)
Because you didn't make it clear that you were talking about conrank going down, rather than mu (which is what Ksero assumed you meant by "rank").

I think it's a reasonable assumption. Conrank isn't an essential part of the mathematics of Trueskill, it's just a convenient way to package up mu and sigma for human consumption. The choice of 3 sigmas is arbitrary. One could publish mu - 1 * sigma, or mu itself. Mu represents the "best guess" at what the player's true rank is; I could see how someone could interpret "rank" as "mu".

Posted: Mon Dec 31, 2007 11:21 am
by sgt_baker
cashto wrote (Dec 30 2007, 09:21 PM):
Because you didn't make it clear that you were talking about conrank going down, rather than mu (which is what Ksero assumed you meant by "rank").

I think it's a reasonable assumption. Conrank isn't an essential part of the mathematics of Trueskill, it's just a convenient way to package up mu and sigma for human consumption. The choice of 3 sigmas is arbitrary. One could publish mu - 1 * sigma, or mu itself. Mu represents the "best guess" at what the player's true rank is; I could see how someone could interpret "rank" as "mu".
I concede that the lack of absolute clarity could lead to this misunderstanding. My bad.

Mu is the centre of the Gaussian which represents the average of all player performances. Without reference to sigma, which is the standard deviation around mu, mu itself is a relatively useless figure for measuring skill. A practical example of this would be to compare the mus of a new player and a very experienced, yet absolutely average player (assuming the distribution of mus for the entire community is centred around 25). New players start with the values Mu = 25 and Sigma = 8.333... which corresponds to a 99% belief that the new player's skill rating is somewhere in the range 0 - 50 (99% of player performances are expected to lie within 3 standard deviations of mu). Our experienced yet average vet, who may have played hundreds of games and is clearly more likely to contribute to a win than the newb, might have the values Mu = 25 and Sigma = 1. If one arbitrarily chooses to ignore sigma, how would one differentiate between these players?

Another important aspect of any ranking system is the ability for its users to readily identify newbies. The choice of 3 for the k factor (Conrank = mu - k * sigma) is far from an arbitrarily chosen number. The initial figure chosen for sigma (where initial mu = 25) is 25/3. This harks back to our logical statement regarding our initial belief in the new player's true skill. When one calculates the conservative rank for a newbie one arrives at 25 - 3 * 8.333... = 0. Therein we automatically have our 'newbie helper function' without ever having to explicitly manipulate mu or sigma, or track the player's age/number of games played.

I hope this goes some way towards demonstrating why using a figure other than 3, either for calculating the initial sigma or as the k-factor in the ConRank formula, would have a profound effect on our mathematical assessment of the very meaning of 'new player', and would also render the ConRank function somewhat useless.
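
A quick way to see the point about k is to compute the conservative rank of the two example players for a few choices of k. This is only an illustrative sketch: `conservative_rank` is a made-up helper name, and the vet's numbers are simply the Mu = 25, Sigma = 1 from the example above.

```python
def conservative_rank(mu, sigma, k=3.0):
    """ConRank = mu - k * sigma."""
    return mu - k * sigma


INITIAL_MU = 25.0
INITIAL_SIGMA = INITIAL_MU / 3      # 8.333..., the starting uncertainty

newbie = (INITIAL_MU, INITIAL_SIGMA)   # never played a game
average_vet = (25.0, 1.0)              # hundreds of games, dead average

# k = 0 publishes mu itself, k = 1 is one sigma, k = 3 is the ConRank used here.
for k in (0, 1, 3):
    newbie_rank = conservative_rank(*newbie, k=k)
    vet_rank = conservative_rank(*average_vet, k=k)
    print(f"k={k}: newbie {newbie_rank:6.2f}   average vet {vet_rank:6.2f}")
```

With k = 0 the two players are indistinguishable at 25. With k = 3 the newbie sits at (essentially) 0 and the vet at 22, and it is only because the initial sigma is 25/3 that the newbie lands on zero.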

:)

B

Posted: Mon Dec 31, 2007 11:25 am
by Loriana
i am confused >.<

Posted: Mon Dec 31, 2007 1:17 pm
by mcwarren4
It's too bad, but I just read the Trueskill material in full and this is similar to what I was wanting to work on two years ago when we were originally working on ranking systems. Perhaps I should have been more pushy, but Pook seemed to have a good start going. I use similar metrics to measure the skill of money management firms for my clients. In the end, we don't necessarily 'rank' them; rather, given how you need a portfolio to behave, you build a portfolio of managers that, when put together in certain weights, should statistically behave in a certain manner with reasonable certainty.

Posted: Mon Dec 31, 2007 3:45 pm
by Checkmate
I tend to judge whether a game is stacked by the skill of the team...not necessarily helo, although I have antistacked games where the helo difference was over 100. You can always tell a stacked game when you have 5+ good vets sitting in Noat for 10 minutes just to join blue team. In people's defence, I will say that early on in a game the teams may be even, and then the team you are on may get stacked....that is clearly not your fault.

Some people claim that they don't like the commander of one side, so they stack them. Poor excuse to me.

I try to anti-stack as often as possible...if teams are even I will join whatever faction I like more. But for the most part I try to join the side who needs a vet. When I join I look to see what role is not being filled and I work on that (does team need probing, miner d, bombing, etc...).

The fact is that games are more fun and interesting when the teams are even. Most people can use their good judgement to determine which team to join to make a good game. There are a handful of religious stackers though, who I almost NEVER will see join a team perceived as weaker. I just make sure to pod them :-)

OH, almost forgot...Happy New Year everyone :-)