Does anyone have a metric on how accurate TrueSkill/AllegSkill is with confidence in predicting the winner of a match? I don't need it for anything alleg-related.
I know the Alleg statistics are heavily skewed by the small community effect, but even a rough idea would be useful.
Ranking system accuracy
Yes, the value has been shared numerous times with the community, but I will let the person who did the work give it. Talk to SgtBaker (edit: as the devil can be in the details and he knows them).
LANS wrote:QUOTE (LANS @ Dec 4 2013, 11:51 PM) Does anyone have a metric on how accurate TrueSkill/AllegSkill is with confidence in predicting the winner of a match? I don't need it for anything alleg-related.
I know the Alleg statistics are heavily skewed by the small community effect, but even a rough idea would be useful.
I would contend that a small community decreases the time it takes the system to settle on a rank; there is no inherent skew in a small vs. large community, IF the system is properly implemented of course.
LANS, do this:
Recreate TrueSkill at home (that part isn't as hard as it seems; I think there's already written stuff now to reduce the workload... hell, just crib it from the coding done for Allegiance)
Get the database of all the games played from TigerEye or pkk (that part has proven elusive in the past)
Pass them through the system, allowing it to gather ranks for everyone (remember that time played plays an extremely important part in things)
Do this for ALL games, given there was the reboot a while back (ranking is not a reward system blah blah blah)
Once done, run the games and find where it guessed right and wrong, recording each team's mu and sigma for review and analysis
Realize this really isn't telling you much tbh, because:
a) You have not checked the database for accuracy (let alone orthogonality) or removed the obvious clinkers
b) You still have to decide, and filter on, what team size and length of time constitute a "real game"
c) You have to contend with the fluid nature of the game, which allows sides to skew heavily, rank-wise, in a heartbeat (after all, you want game prediction, not rank accuracy)
d) etc etc etc
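A minimal sketch of the scoring step in the recipe above, assuming each team's pre-game mu and sigma are already on record. The `GameRecord` type, the helper names, and the `is_real_game` thresholds (5 players a side, 10 minutes) are hypothetical. The win-probability formula is the commonly used Gaussian approximation with TrueSkill's default beta of 25/6; nothing here is confirmed to match what AllegSkill actually does:

```python
import math
from dataclasses import dataclass

# TrueSkill's default performance spread (mu=25, sigma=25/3, beta=25/6).
# Whether AllegSkill uses the same value is an assumption.
BETA = 25.0 / 6.0


@dataclass
class GameRecord:
    """Hypothetical record of one game: pre-game ratings plus the outcome."""
    team_a: list      # list of (mu, sigma) pairs, one per player
    team_b: list      # list of (mu, sigma) pairs, one per player
    minutes: float    # game length
    a_won: bool       # True if team A won


def win_probability(team_a, team_b, beta=BETA):
    """Approximate P(team A beats team B), ignoring draws."""
    delta = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    players = list(team_a) + list(team_b)
    variance = sum(s * s for _, s in players) + len(players) * beta ** 2
    # Standard normal CDF evaluated at delta / sqrt(variance).
    return 0.5 * (1.0 + math.erf(delta / math.sqrt(2.0 * variance)))


def is_real_game(game, min_players=5, min_minutes=10.0):
    """Point (b): filter on team size and game length (thresholds are made up)."""
    smallest_team = min(len(game.team_a), len(game.team_b))
    return smallest_team >= min_players and game.minutes >= min_minutes


def prediction_accuracy(games, **filters):
    """Return (fraction of kept games called correctly, number of kept games)."""
    kept = [g for g in games if is_real_game(g, **filters)]
    if not kept:
        return None, 0
    # A game counts as a hit when the favoured side (p > 0.5) actually won.
    hits = sum((win_probability(g.team_a, g.team_b) > 0.5) == g.a_won for g in kept)
    return hits / len(kept), len(kept)
```

Calling `prediction_accuracy(games, min_players=8)` reruns the count with a stricter idea of a "real game", which is an easy way to see how much the headline number depends on the filter. It only scores the ratings as recorded before each game, so it does nothing about point (c), sides changing mid-game.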
How you look at the data and what you do with it is very important, as I am quite sure you already know; that is all I'm getting at. I'm quite sure a deep think will bring up more stuff.
Somewhere in my endless blathering are several links to incredibly detailed and well-written websites on the whole matter of TrueSkill.
I am not trying to start a debate, be rude, or anything else, just answering your question (kind of) is all. I for one would be incredibly interested in the matter... so by all means please share the progress with the community.
Your Pal in Numbers
MrChaos
P.S. It will make Duckie cranky but soldier on as he will quickly become distracted and start snapping pictures of his naughty bits to sell on Craig's List once again
Last edited by MrChaos on Thu Dec 05, 2013 11:46 am, edited 1 time in total.
Ssssh
I agree with Psyche. It's fairly accurate. The problem, though, is that it lacks significant data on pilots, since there has been a marked diminishment in people's playtime since the reset. Sooo, while some players like tenforward are probably gauged about right at a 6-8(?), there are so many more veteran players who are around that level (or lower) too because of having, say, sub-50 hours in. Meaning having him on your team might be "costly" in terms of what 6-8 skill points can get you now, but he performs as what I'd expect from a 6-8 during the era of 1000-strong leaderboards.
Hell, I think I only have about 70 hours logged since the reset. Pre-reset I was a 13, with around 1600 hours logged, sitting somewhere in the lower echelon of the top 100. That felt about right. Now I'm at something like an 11, but in the top 10-15? The stats program is less interesting now due to lack of data, but the math behind it always felt decently solid.
djrbk wrote:QUOTE (djrbk @ Dec 5 2013, 01:22 PM) Ohh yeah, and as for your initial question, I believe I saw it posted by Sgt Baker that through their algorithms they were able to accurately predict the winners of matches ~95% of the time.
Thank you.
@MrC: Thanks, but I'm not going to do that with alleg games anytime soon. If I end up actually needing to implement TrueSkill (it was part of a discussion on statistical ranking systems I was having), I'd use results data from whatever system it was implemented in as a measurement of accuracy and work from there.
I wouldn't have much problem recreating my own trueskill system, and it sounds like a reasonably fun project I might do if I have time one day, but it goes on the list of stuff I'm never going to get around to with the things I'm never going to build, movies I'm never going to watch, books I'm never going to read, games I'm never going to play, places I'm never going to go and people I'm never going to see.
TrueSkill accuracy came up in a discussion I was having with someone regarding different ways of ranking skill in team games, and I figured I'd use alleg as a rough example. He pointed me at World of Tanks' WN7 system; I haven't spent much time looking at it yet.
Edit: @dj: I haven't played alleg in, I dunno how long (probably a year, I think I installed it last summer for about 5 minutes when pkk was helping me dig through some directx issues to use it as a diagnostic). What reset?
Last edited by LANS on Thu Dec 05, 2013 7:40 pm, edited 1 time in total.
What Dj said, better than I would have done... errm, I think the prediction was a bit lower on getting it right, but I am speaking from a year or two after the last efforts and also from a sieve-like memory.
Also
Righto LANS, I think I got to blathering on about something I'd like to know versus the actual question.
On the same theme that LANS spoke to regarding the movies-I'll-never-watch type of thing, I'd actually be up for getting ahold of the database to do the blah blah blah above. As promised to Pook, I nuked my version of the database lo those years ago when I stepped out of the picture, and reacquiring it would be a start-from-scratch affair... *rubs his wounds from the journey tentatively and thoughtfully*
Hugs For All
MrChaos



