ranking system

Grim_Reaper_4u
Posts: 356
Joined: Wed Jul 30, 2003 7:00 am
Location: Netherlands

Post by Grim_Reaper_4u »

We'll see what you guys come up with but any system that solely relies on win/loss statistics is inherently flawed.

Taking into account commander skill is dangerous too, since newb comms are most often just hand puppets for 1 or 2 vets on their team who are actually commanding the team (so modifiers for comm skill are easy to abuse by putting a newb comm in the saddle, telling him exactly what to do and placing his cons for him).

Since alleg skills can usually be put into just a few categories (scouting, dogfighting, bombing, etc.) and these skills are preferably present in both teams in equal measure, it stands to reason that ideally some kind of algorithm be used that balances these skill categories across both teams. Having 1 team full of scout whores fight a team full of int whores isn't much fun ;) It should be relatively easy to find each player's "specialty" or multiple specialties and use this to create balanced teams (or at least to prevent blatant stacks): if [int whore sum].[team1] > [int whore sum].[team2] then int whore X can't play on team1.
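A minimal sketch of that join rule, assuming each player carries a hypothetical per-category score. The category names, the numbers and the `can_join` helper are illustrative assumptions, not anything taken from real Alleg stats:

```python
from collections import defaultdict


def category_sums(team):
    """Sum each skill category over all players on a team.

    `team` maps a player name to a dict of {category: score}.
    The scores are hypothetical; how to measure them is left open.
    """
    totals = defaultdict(float)
    for skills in team.values():
        for category, score in skills.items():
            totals[category] += score
    return totals


def can_join(player_skills, team, other_team):
    """Block a specialist from joining the team that already has the
    larger sum in that player's strongest category."""
    specialty = max(player_skills, key=player_skills.get)
    return category_sums(team)[specialty] <= category_sums(other_team)[specialty]


# Illustrative data only.
team1 = {"a": {"int": 8, "scout": 2}, "b": {"int": 6, "scout": 3}}
team2 = {"c": {"int": 5, "scout": 7}}
newcomer = {"int": 9, "scout": 1}

print(can_join(newcomer, team1, team2))  # team1 already leads on "int"
print(can_join(newcomer, team2, team1))  # team2 trails on "int"
```

This only looks at the single strongest category per player; a player with several specialties would need a rule for combining them, which the post leaves open.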

Typically wins/losses should be important for commanders but not so much for players (and giving players points even if they lost will reduce stacking, although it might not be good for teamwork ;) ). However, using kills/dmg nanned/ships eyed by our probes/miners killed/cons killed/bases killed/etc. stats might put a lot of strain on the servers, which have to process this @#(!load of events daily.


Post specifics ffs so we can stop this guessing :D
MrChaos
Posts: 8352
Joined: Tue Mar 21, 2006 8:00 am

Post by MrChaos »

Grim_Reaper_4u wrote (Jan 13 2008, 02:56 PM): We'll see what you guys come up with but any system that solely relies on win/loss statistics is inherently flawed.
Opinions on what we do with the information are welcome, but a blatant "my opinion is better than yours" just makes me go *meh*.

Grim_Reaper_4u wrote: Taking into account commander skill is dangerous too, since newb comms are most often just hand puppets for 1 or 2 vets on their team who are actually commanding the team (so modifiers for comm skill are easy to abuse by putting a newb comm in the saddle, telling him exactly what to do and placing his cons for him).
Really? The idea that newbie comms are statistically relevant to 1,000,000-plus lines of game data is just plain silly.

Grim_Reaper_4u wrote: Since alleg skills can usually be put into just a few categories (scouting, dogfighting, bombing, etc.) and these skills are preferably present in both teams in equal measure, it stands to reason that ideally some kind of algorithm be used that balances these skill categories across both teams.
Provide the algorithm, and btw keep providing it for each core and each iteration of it. Don't bother me with some half-assed points-equal chart. Roll up those shirt sleeves and do the tremendous amount of work. Please provide your work and theories too.

Grim_Reaper_4u wrote: Having 1 team full of scout whores fight a team full of int whores isn't much fun ;) It should be relatively easy to find each player's "specialty" or multiple specialties and use this to create balanced teams (or at least to prevent blatant stacks): if [int whore sum].[team1] > [int whore sum].[team2] then int whore X can't play on team1.

Typically wins/losses should be important for commanders but not so much for players (and giving players points even if they lost will reduce stacking, although it might not be good for teamwork ;) ). However, using kills/dmg nanned/ships eyed by our probes/miners killed/cons killed/bases killed/etc. stats might put a lot of strain on the servers, which have to process this @#(!load of events daily.
Post specifics ffs so we can stop this guessing :D
Grim

Please give me your year's worth of work that proves your point, otherwise I'm afraid your words are nothing more than hot air.

MrChaos <--- turn to be annoyed
Ssssh
sgt_baker
Posts: 1510
Joined: Wed Oct 20, 2004 7:00 am
Location: London, UK.

Post by sgt_baker »

Grim_Reaper_4u wrote (Jan 13 2008, 07:56 PM): We'll see what you guys come up with but any system that solely relies on win/loss statistics is inherently flawed.


I know exactly where you're coming from, Grim. The problem with points-based stats, as mentioned previously, is that the 'algorithm' is incredibly difficult to get right. Ask some of the old AZers (you may be one of them, I just dunno who's who) and they'll tell ya stories about whole teams refusing to bomb due to the crappy scoring for bombing, people dropping rather than suffer a PK etc etc. That list goes on and on. Secondly, there are tangible skill factors which are nigh on impossible to rate in terms of points. Situational awareness is one such skill.

Ironically, my initial involvement with the stats project was to lead an effort to unite the numerous and various points stats into just such a rank. The most practical method for doing so, whereby one could relate, say, nanning to whoring, involved using neural networks. Even with such a flexible tool it rapidly became apparent that the sheer complexity of the game and the number of potential interactions between its players render the problem one of mind-boggling magnitude, even when using neural nets. Please don't think that it's an unexplored path. :)

Finally, and I'll not make this point lightly:

Why would Microsoft, the very people who used to run a points-based ranking system in alleg, abandon the points-based approach in favour of Trueskill for *all* their modern multiplayer games on XBox Live? Surely this question must have occurred to you. It is of some amusement to me that this community (or certain members thereof) has adopted a position of 'we know better than the people who developed 'our' game'. If you believe that a win/loss system is inherently flawed I suggest you grab a PhD in stats and take the issue up with Microsoft Research. Better still, why not just develop your own system?

Sadly, I think neither of the above is even remotely likely, so whether you 'believe' in the efficacy of Bayesian inference or not, Trueskill and the mathematical fundaments it is built upon (which, incidentally pop up all over RL - just ask yourself how you know that taking paracetamol is safe) are here to stay. This has got to be the twentieth post making generally the same point. I've yet to see anything even vaguely proving that win/loss systems are flawed.

P.S. Commander skill isn't taken into account when balancing teams or when updating players' ranks - it's only there to enable well-balanced comms to start with.
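For anyone who wants to see what a Bayesian win/loss update looks like in practice, here is a minimal sketch of the two-team, no-draw update as I understand it from the TrueSkill technical report. It is not the code used for Alleg. The function names are made up for illustration, the constants are the report's defaults, and draws, partial play and the dynamics (drift) term are deliberately left out:

```python
import math

MU0 = 25.0        # default prior mean skill
SIGMA0 = MU0 / 3  # default prior uncertainty
BETA = MU0 / 6    # per-player performance noise


def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update(winners, losers):
    """Update two teams after a decisive game.

    Each team is a list of (mu, sigma) tuples, one per player.
    Team performance is modelled as the sum of player performances.
    """
    players = winners + losers
    c2 = len(players) * BETA ** 2 + sum(s * s for _, s in players)
    c = math.sqrt(c2)
    # How surprising the result was: a large positive t means the favourite won.
    t = (sum(m for m, _ in winners) - sum(m for m, _ in losers)) / c
    v = pdf(t) / cdf(t)   # size of the mean shift
    w = v * (v + t)       # shrink factor for the variance, between 0 and 1

    def adjust(team, sign):
        return [
            (m + sign * (s * s / c) * v, s * math.sqrt(1 - (s * s / c2) * w))
            for m, s in team
        ]

    return adjust(winners, +1), adjust(losers, -1)


# Two fresh players, 1 vs 1: the winner's mu moves up, both sigmas shrink.
new_w, new_l = update([(MU0, SIGMA0)], [(MU0, SIGMA0)])
print(new_w, new_l)
```

Note that each player's shift scales with their own uncertainty (sigma squared) and with how unexpected the result was, which is why an upset moves ranks more than a favourite winning.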

B
Granary Sergeant Baker - Special Bread Service (Wurf - 13th Oct 2011)
sgt_baker
Posts: 1510
Joined: Wed Oct 20, 2004 7:00 am
Location: London, UK.

Post by sgt_baker »

Hehehe.

MrC and I have this Jekyll and Hyde thing going on... guess today it's my turn with the fluffy stick.
MrChaos
Posts: 8352
Joined: Tue Mar 21, 2006 8:00 am

Post by MrChaos »

Dear Grim

I'm usually much, much nicer :doh: What Baker said :lol:

[ just it got on my tits for some reason so at least you get an award for " Getting on MrChaos' Tits " ]

< medal is in the mail Grim >

MrChaos
Ssssh
Grim_Reaper_4u
Posts: 356
Joined: Wed Jul 30, 2003 7:00 am
Location: Netherlands

Post by Grim_Reaper_4u »

Relax Chaos, I'm not here to get on your tits (unless they look a lot better than I imagine they do) ;) Actually, 50% of my job involves data mining and statistical analysis, so I'm not a complete newb at this ;) (the dozens of academic papers I did the analyses for have been published in A journals worldwide). If you tell us exactly what your system looks like and what it uses for its rank, then I'll be more than happy to shoot holes in your theory (backed up with solid reasons why it is flawed, of course ;) ).

Since I'm currently only guessing what your system looks like, it's quite useless to comment on it beyond the aforementioned generalizations (which I actually do stand by and can defend).

So take my comments as friendly advice and do with them what you like. :)
Grimmwolf_GB
Posts: 3711
Joined: Wed Jul 02, 2003 7:00 am
Location: Germany

Post by Grimmwolf_GB »

With Grim_Reaper, no advice is friendly. :D
sgt_baker
Posts: 1510
Joined: Wed Oct 20, 2004 7:00 am
Location: London, UK.

Post by sgt_baker »

Grim_Reaper_4u wrote (Jan 14 2008, 07:47 AM): If you tell us exactly what your system looks like and what it uses for its rank, then I'll be more than happy to shoot holes in your theory (backed up with solid reasons why it is flawed, of course ;) ).

Since I'm currently only guessing what your system looks like, it's quite useless to comment on it beyond the aforementioned generalizations (which I actually do stand by and can defend).
I get the distinct impression you've not been following as closely as might be desired.

Microsoft Trueskill

Trueskill technical report
Last edited by sgt_baker on Mon Jan 14, 2008 10:13 am, edited 1 time in total.
Grim_Reaper_4u
Posts: 356
Joined: Wed Jul 30, 2003 7:00 am
Location: Netherlands

Post by Grim_Reaper_4u »

So you are seriously considering using trueskill in its team-based form?

OK then, I thought you might adapt the free-for-all version, where you would use the points which the players earned in a game to determine their rank (maybe after modifying the points for win/loss or something).

Let me give you my opinion on using trueskill in that form for Alleg, and the possible implications/ways of cheating. I'm gonna make it point based so it's easier for you to comment on each point. I don't have a problem with trueskill for use in free-for-alls or even small balanced games, however:

1) The way I see trueskill working (correct me if I'm wrong, because I don't own an Xbox and can't find online how the matchmaking works): peeps only get ranked on win/loss (given the typical 2-team alleg environment), but here's the big catch, if I read the documentation correctly (I could be wrong though):

a) Teams consist of equally ranked players, so a rank 32 could never join a game with only rank 12s? (Yeah, that will work for 200,000 players, but creating games only for similarly ranked players in alleg will be a little harder ;) )

b) Does MS have multiplayer games with 25 per side that use trueskill? Can you calculate how long it will take to get a reasonably accurate rank with 25-per-side games when an 8vs8 environment takes 91 games according to MS? (And no: if I'm correct about only similarly ranked players being allowed to play, we cannot use old stats.)

2) Could you tell me if Xbox Live trueskill games usually have 80% of the team join after game start, and how you think you can address early/late game stack changes?

3) In what way will trueskill be different from HELO, in that it rewards people for choosing the right team even if they are skill-less? (This is connected to the fact that players of vastly different skill play on the same team: with 25 players per side, a few slackers who know which team to choose can easily pick the winning team and join it without contributing, and yet without hurting that team's chances of success.)

4) In what way will trueskill reward people who always anti-stack and thus lose more than they should? I used to rank 19th in alleg and was ranked much higher than a @#(!load of players who are much better than me, just because they anti-stacked a hell of a lot more. A win/loss system will never fix this without truly balanced teams IMHO.

5) How will you counter the famous "<5 minute drop doesn't hurt my rank" hellenus trick?

6) How will you deal with the fact that only a small % of the players play the whole game, and that the game match % sometimes changes dramatically during Alleg games because we don't restrict the rank of peeps who enter a game? (related to 1a)

7) Will you enforce some kind of autobalance?

Once you realize you have problems I can brainstorm with you about possible solutions, but if you're just gonna play the "statistician without a clue what the real world looks like" then I'd rather not waste time on that. I have to deal daily with mathematicians/statisticians who never think past (or about) the restrictions of their models, and I don't feel like doing that in my spare time too ;)

To me trueskill appears to work best in situations where similarly ranked players play 1vs1 or free-for-alls in short games where virtually all players play the full length of the game. It will probably work well in environments with smallish teams (<10) where the team is made up of similarly ranked players. From what I've seen, trueskill will not be very good in an Alleg environment because Alleg is a bit too complex for such a "simple" win/loss system :)

And guys: the fact that I spent time reading up on the trueskill stuff and bothered to post here means I wanna help you build a good ranking system, so try not to be offended if I don't agree with some of the choices you guys have made. Just see me as the devil's advocate who critically reviews your work and prevents too much groupthink. (BTW I hope not all of you have a background in stats/math because that's a recipe for failure. Get a few peeps from other backgrounds too; you'd be surprised at the fresh ideas and much-needed common sense they might bring in ;) )
AaronMoore
Posts: 471
Joined: Sun May 06, 2007 3:09 pm
Location: Australia

Post by AaronMoore »

sgt_baker wrote (Jan 14 2008, 10:10 AM): I get the distinct impression you've not been following as closely as might be desired.

Microsoft Trueskill

Trueskill technical report
Thanks for these links Sgt. Baker, I am reading through them and it sounds like it has great promise!

What does it take to implement this in Alleg? Will there need to be a client update or is it all server code?

You guys are doing a great job, and if it works as well as MS is publishing, we will see more of the back and forth pushes, critical events changing balance etc. that we love. `yt

:D