Ranking system accuracy

LANS
Posts: 1030
Joined: Wed Feb 24, 2010 5:17 am
Location: Toronto, Canada

Post by LANS »

Does anyone have a metric on how accurate TrueSkill/AllegSkill is with confidence in predicting the winner of a match? I don't need it for anything alleg-related.

I know the Alleg statistics are heavily skewed by the small community effect, but even a rough idea would be useful.
Psychosis
Posts: 4218
Joined: Wed Oct 27, 2004 7:00 am
Location: California

Post by Psychosis »

It's accurate enough for the people who play regularly. The smurfs like me, who don't have time to sink into that many matches, are the dark horses that you have to watch out for.

Luckily enough for most of you, I don't use my hiders, so you can remember to buy me a frigging gunship (or TF bomber).
Camaro
Posts: 2418
Joined: Sat Jan 28, 2006 8:00 am

Post by Camaro »

No one uses gunships anymore. It's a lost art. :(
MrChaos
Posts: 8352
Joined: Tue Mar 21, 2006 8:00 am

Post by MrChaos »

LANS wrote (Dec 4 2013, 11:51 PM): Does anyone have a metric on how accurate TrueSkill/AllegSkill is with confidence in predicting the winner of a match? I don't need it for anything alleg-related.

I know the Alleg statistics are heavily skewed by the small community effect, but even a rough idea would be useful.
Yes, the value has been shared numerous times with the community, but I will let the person who did the work give it. Talk to SgtBaker (edit: the devil can be in the details, and he knows them).
I would contend that a small community decreases the time it takes the system to decide a rank; there is no inherent skew in a small vs. large community, IF the system is properly implemented of course.

LANS do this:

Recreate TrueSkill at home (that part isn't as hard as it seems; I think there's already written stuff now to reduce the workload... hell, just crib it from the coding done for Allegiance)
Get the database for all of the games played from TigerEye pkk (that part has proven elusive in the past)
Pass them through the system, allowing it to gather ranks for everyone (remember that time played plays an extremely important part in things)
For ALL games, given there was the reboot a while back (ranking is not a reward system blah blah blah)
Once done, run the games and find when it guessed right and wrong, recording each team's mu and sigma for review and analysis
Realize this really isn't telling you much tbh because:
a) You did not check the database for accuracy (let alone orthogonality) removing the obvious clinkers
b) Decide and filter on what constitutes an actual team size and length of time for a "real game"
c) Contend with the fluid nature of the game that allows sides to heavily skew, rank wise, in a heartbeat (after all you want game prediction not rank accuracy)
d) etc etc etc
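
For anyone who wants to try the steps above, here is a rough sketch of the replay loop in Python. It assumes plain two-team TrueSkill with no draws and the conventional default constants, not AllegSkill itself, and it ignores time played entirely. The game record layout (team A, team B, did A win), the names `replay` and `min_team_size`, and the choice to drop filtered games completely are all made up for illustration. It logs the predicted win probability per game rather than each team's mu and sigma.

```python
import math
from collections import defaultdict

# Conventional TrueSkill defaults; AllegSkill's real constants may differ.
MU0 = 25.0
SIGMA0 = MU0 / 3.0
BETA = SIGMA0 / 2.0     # per-player performance noise
TAU = SIGMA0 / 100.0    # skill drift added before every game


def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def new_ratings():
    """Map player name -> (mu, sigma), starting everyone at the defaults."""
    return defaultdict(lambda: (MU0, SIGMA0))


def win_probability(ratings, team_a, team_b):
    """Probability that team_a beats team_b under the current ratings."""
    players = list(team_a) + list(team_b)
    c = math.sqrt(sum(ratings[p][1] ** 2 + BETA ** 2 for p in players))
    delta = sum(ratings[p][0] for p in team_a) - sum(ratings[p][0] for p in team_b)
    return cdf(delta / c)


def update(ratings, winners, losers):
    """Closed-form TrueSkill update for two teams with no draws."""
    players = list(winners) + list(losers)
    var = {p: ratings[p][1] ** 2 + TAU ** 2 for p in players}
    c = math.sqrt(sum(var.values()) + len(players) * BETA ** 2)
    t = (sum(ratings[p][0] for p in winners)
         - sum(ratings[p][0] for p in losers)) / c
    v = pdf(t) / cdf(t)
    w = v * (v + t)
    for p in players:
        sign = 1.0 if p in winners else -1.0
        mu = ratings[p][0] + sign * var[p] / c * v
        sigma = math.sqrt(var[p] * (1.0 - var[p] / (c * c) * w))
        ratings[p] = (mu, sigma)


def replay(games, min_team_size=1):
    """Replay games in order; return (accuracy, log of (p_a, a_won))."""
    ratings = new_ratings()
    correct = scored = 0
    log = []
    for team_a, team_b, a_won in games:
        if min(len(team_a), len(team_b)) < min_team_size:
            continue  # hypothetical filter for what counts as a "real game"
        p_a = win_probability(ratings, team_a, team_b)
        if p_a != 0.5:  # an exact coin flip is not counted as a prediction
            scored += 1
            correct += (p_a > 0.5) == a_won
        log.append((p_a, a_won))
        if a_won:
            update(ratings, team_a, team_b)
        else:
            update(ratings, team_b, team_a)
    accuracy = correct / scored if scored else float("nan")
    return accuracy, log
```

Feed `replay` a chronological list such as `[(["a", "b"], ["c", "d"], True), ...]` and it hands back the fraction of games called correctly plus the per-game log. Because each prediction is made before the game's result is applied, the accuracy is out-of-sample, but points a) through d) still apply to whatever number comes out.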

How you look at the data and what you do with it is very important, as I am quite sure you already know; that is all I'm getting at. I'm quite sure a deep think will bring up more stuff.

Somewhere in my endless blathering are several links to incredibly detailed and well-written websites on the whole matter of TrueSkill.

I am not trying to start a debate, be rude, or anything else, just answering your question (kind of) is all. I for one would be incredibly interested in the matter... so by all means please share the progress with the community.

Your Pal in Numbers
MrChaos


P.S. It will make Duckie cranky but soldier on as he will quickly become distracted and start snapping pictures of his naughty bits to sell on Craig's List once again
Last edited by MrChaos on Thu Dec 05, 2013 11:46 am, edited 1 time in total.
Ssssh
djrbk
Posts: 2341
Joined: Tue Jul 01, 2008 5:51 am

Post by djrbk »

I agree with Psyche. It's fairly accurate. The problem, though, is that it lacks significant data on pilots, since there has been a marked drop in people's playtime since the reset. So, while some players like tenforward are probably gauged about right at roughly a 6-8, there are so many more veteran players who are around that level (or lower) too because they have, say, under 50 hours in. Meaning having him on your team might be "costly" in terms of what 6-8 skill points can get you now, but he performs as I'd expect from a 6-8 during the era of 1000-strong leaderboards.

Hell, I think I only have about 70 hours logged since the reset. Pre-reset I was a 13, with around 1600 hours logged, sitting somewhere in the lower echelon of the top 100. That felt about right. Now I'm at something like an 11, but in the top 10-15? The stats program is less interesting now due to lack of data, but the math behind it always felt decently solid.
djrbk
Posts: 2341
Joined: Tue Jul 01, 2008 5:51 am

Post by djrbk »

Ohh yeah, and as for your initial question, I believe I saw it posted by Sgt Baker that through their algorithms they were able to accurately predict the winners of matches ~95% of the time.
Psychosis
Posts: 4218
Joined: Wed Oct 27, 2004 7:00 am
Location: California

Post by Psychosis »

Camaro wrote (Dec 5 2013, 02:01 AM): No one uses gunships anymore. It's a lost art. :(
I think that this image states why there are problems with the ranking, and with not buying me gunships.
LANS
Posts: 1030
Joined: Wed Feb 24, 2010 5:17 am
Location: Toronto, Canada

Post by LANS »

djrbk wrote (Dec 5 2013, 01:22 PM): Ohh yeah, and as for your initial question, I believe I saw it posted by Sgt Baker that through their algorithms they were able to accurately predict the winners of matches ~95% of the time.

Thank you.

@MrC: Thanks, but I'm not going to do that with alleg games anytime soon. If I end up actually needing to implement TrueSkill (it was part of a discussion on statistical ranking systems I was having), I'd use results data from whatever system it was implemented in as a measurement of accuracy and work from there.

I wouldn't have much problem recreating my own trueskill system, and it sounds like a reasonably fun project I might do if I have time one day, but it goes on the list of stuff I'm never going to get around to with the things I'm never going to build, movies I'm never going to watch, books I'm never going to read, games I'm never going to play, places I'm never going to go and people I'm never going to see.

TrueSkill accuracy came up in a discussion I was having with someone regarding different ways of ranking skill in team games, and I figured I'd use alleg as a rough example. He pointed me at World of Tanks' WN7 system; I haven't spent much time looking at it yet.


Edit: @dj: I haven't played alleg in, I dunno how long (probably a year, I think I installed it last summer for about 5 minutes when pkk was helping me dig through some directx issues to use it as a diagnostic). What reset?
Last edited by LANS on Thu Dec 05, 2013 7:40 pm, edited 1 time in total.
djrbk
Posts: 2341
Joined: Tue Jul 01, 2008 5:51 am

Post by djrbk »

They reset all the statistics/player stats on the leaderboard roughly a year ago, around when they pushed a new launcher/system (CSS) and got rid of the old ASCS launcher.
MrChaos
Posts: 8352
Joined: Tue Mar 21, 2006 8:00 am

Post by MrChaos »

What Dj said, better than I would have done... errm, I think the prediction rate was a bit lower, but I am speaking from a year or two after the last efforts and also from a sieve-like memory.

Also

Righto LANS, I think I got to blathering about something I'd like to know about versus the actual question :blush:


On the same theme LANS spoke to regarding the movies-I'll-never-watch type of thing, I'd actually be up for getting ahold of the database to do the blah blah blah above. As promised to Pook, I nuked my version of the database lo those years ago when I stepped out of the picture, and reacquiring it would mean starting from scratch... *rubs his wounds from the journey tentatively and thoughtfully*

Hugs For All
MrChaos
Ssssh