QUOTE (Ksero @ Jul 13 2006, 02:43 PM) Consider another situation: If you want to see if a particular dice is weighted, you can roll the dice a hundred times and see how often each face comes up. So one way to test it would be to take two teams and let them play against each other 100 times. Then compare the win-percentages to what ELO predicted. But that's not feasible. Instead of throwing the same dice many times, we're throwing a new dice each time, since every pickup game is unique.
But what if we can group similar dice... I mean games... together? For example, we could check all the games where the estimated outcome was between 75-25 and 65-35. Then calculate how many games were won by the underdogs. That would be one way to measure the accuracy of ELO. If it deviates significantly from 30, then we should become suspicious.
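A rough sketch of the bucketing idea from the quote, in Python. The data format (a list of `(p_favorite, favorite_won)` pairs) and the function name `underdog_rate` are made up for illustration, not anything the ELO system actually exposes. It collects the games whose predicted favorite win chance falls between 65% and 75%, then compares the observed underdog win rate with the rate the predictions imply, using a simple binomial z-score as one possible way to judge "significantly".

```python
import math

def underdog_rate(games, lo=0.65, hi=0.75):
    """Compare observed and predicted underdog wins within one prediction bucket.

    games: iterable of (p_favorite, favorite_won) pairs (hypothetical format).
    Returns (n, observed, expected, z), or None if the bucket is empty.
    """
    bucket = [(p, won) for p, won in games if lo <= p <= hi]
    n = len(bucket)
    if n == 0:
        return None
    upsets = sum(1 for _, won in bucket if not won)
    observed = upsets / n
    # Average underdog win chance implied by the predictions in this bucket.
    expected = sum(1 - p for p, _ in bucket) / n
    # Standard error of a proportion, assuming independent games.
    se = math.sqrt(expected * (1 - expected) / n)
    z = (observed - expected) / se
    return n, observed, expected, z

# Made-up example: 100 games predicted at 70-30, 30 of them upsets.
sample = [(0.7, True)] * 70 + [(0.7, False)] * 30 + [(0.9, True)] * 10
print(underdog_rate(sample))
```

A |z| well above 2 would be a reason to get suspicious. A small |z| only says the data in that bucket is consistent with the predictions, not that the predictions are right.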
:iluv: Ksero
The predicted outcomes aren't like dice (plural) at all, since we use the evaluation criterion, ranking, to affect the individual rankings, and those rankings are also our predictive model. It's not a WTFMrCS© (What the $#@! is MrChaos Saying), honestly, and it's at the heart of the matter.
The classic example given for die throwing is this one: any number on a thrown die has a 1 in 6 chance of coming up, no matter how many times the die has been thrown or how often that number has appeared previously.
Thrown enough times, each number from 1 to 6 will come up about 1/6 of the time with a fair/balanced die. On any individual throw, all bets are off, so to speak. Follow?
Offer even money that 1 will be rolled on the next throw, even after 1 has been thrown ten times in a row, and I'll take the other side of that bet every time (assuming a fair die).
Checking for a weighted die requires a statistical approach: mean, standard deviation, confidence interval. That's because it is all but CERTAIN that you WILL throw 100 1s in a row if you throw a fair, balanced die enough times.
There is NO reason it is any less likely to happen on throws 1 to 100 than on throws 20000 to 20099.
More data means more confidence, but even 1,000,000 1s in a row doesn't CONCLUSIVELY prove anything. You can be pretty darn confident, but you don't KNOW.
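For anyone who wants to poke at the numbers, here is a minimal Python sketch of one standard way to check a die: a chi-square goodness-of-fit test against equal face frequencies. This is my choice of test, not the only one that fits the mean/confidence-interval approach above, and the function names are invented for the example. The value 11.07 is the usual 5% critical value for 5 degrees of freedom.

```python
import random
from collections import Counter

# Chi-square critical value for 5 degrees of freedom at the 5% level.
CRITICAL_5DF_05 = 11.07

def chi_square_fair_die(rolls):
    """Chi-square statistic of observed face counts against a fair six-sided die."""
    expected = len(rolls) / 6
    counts = Counter(rolls)
    return sum((counts.get(face, 0) - expected) ** 2 / expected
               for face in range(1, 7))

def streak_probability(length):
    """Chance that one specific window of `length` fair throws is all 1s."""
    return (1 / 6) ** length

rolls = [random.randint(1, 6) for _ in range(600)]
stat = chi_square_fair_die(rolls)
print("statistic:", stat, "suspicious at 5%:", stat > CRITICAL_5DF_05)

# The same tiny number for throws 1 to 100 as for throws 20000 to 20099.
print("P(100 ones in a given window):", streak_probability(100))
```

Even a fair die will land above the critical value about 1 time in 20, which is the point: the test gives you confidence, never certainty.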
Taking some rather good advice, STFU, and waiting for more discussion.
MrChaos <---- squadded up today