New algorithm for autobalance=auto

MrChaos · Post by **MrChaos** » Fri Jul 30, 2010 2:36 pm

the.ynik

Before you implement anything that has such a profound effect on the end user's gaming experience, you 100% need to validate it at every level. Currently a number of people are screaming go go go and wow! Is it the best thing since sliced bread, or does it have the potential to be a ginormous mess? I am being absolutely rational and honest when I say: I don't know, and the.ynik, if you're honest with yourself, neither do you.

Do the work to validate it, and if you miss R6 making sure it works as well as you hope, then why not R6.1? I'd actually encourage you to keep working on it even if the TS AB has the ability to go live... which would be news to me too, btw. If it's superior, prove it; or maybe there are aspects of it that go together with Allegiance like peas and carrots.

We had to add things to TrueSkill, which turned it into AllegSkill, to deal with the uniquely Allegiance experience (MSR, the makers of this game and, ironically enough, of TrueSkill, wanted Baker to publish his work, btw), and maybe we need to do the same with TS AB. Yes, this means real work for someone, but that's how one makes changes that are actually good ones.

DasSmiter

The response rambles because the peanut gallery rambles all over the place. If I'm not super nice and don't explain things to the nth degree, I'm a troll and/or, worse, someone standing in the way of a better idea. The funny thing is that the.ynik was me three-plus years ago. I didn't take no for an answer. I got the data, gathered the crew, accepted that the idea might never get implemented, and when the time came, buried my ego to see that the idea got the consideration it needed so it MIGHT make it to your hard drive. Is his idea good? *shrug* Who knows; it's early days, in my view. But please, by all means, if he is that passionate about it, find out. The "just implement it" approach.... Jesus H Christ, how many more times do we have to relearn that lesson?

Answer today: no more times
Answer tomorrow: hopefully the same exact thing

Thanks for listening
MrChaos

Xynth

"Let's just try it" got us to 450 people on the leaderboard.

MrChaos · Post by **MrChaos** » Fri Jul 30, 2010 2:57 pm

Phantom032 wrote:QUOTE (Phantom032 @ Jul 30 2010, 09:09 AM) Oh btw, since I didn't respond to MrC yet (I felt no need to read his too-large-to-be-useful post earlier):
Leaderboard: 450 players left. At this time last year there were still well over 600.

Answer: "Just do it" is one of the reasons why we have 450 people on the leaderboard.

QUOTE Imagine you just lost a quarter of your blood and are still bleeding.
Should you rush to stop the bleeding, or first do a detailed statistical analysis of your chances of survival depending on how you move?[/quote]

Answer: Imagine you broke your leg and then put your head in a wood chipper to relieve the pain. Stop with the analogies; that's my department.

QUOTE I'm not saying planning things out is bad. In fact, I like detailed, well-thought-out plans. They tend to work, that's why.
You say we risk changing the ranks by changing the autobalance. Sure thing.[/quote]

Answer: Then you did read the thread you sly puss you

To help you out: it will change things, but no one knows to what degree, and you'd best find out before you implement, without proper testing, another change that AGAIN drives off more people.

QUOTE Just like Autobalance 2 would make a difference by allowing more stacking.
What makes you so sure the Autobalance 1 ranks are PERFECTLY CORRECT? Do you HONESTLY believe every AS-equal game is a game of equal teams?
If you do, I really misjudged you, as that's just plain stupid. It would take a mu/sigma balancing algorithm to GET the ranks to fit better; and even then an EQUAL game would NOT be a game of equal RANK.[/quote]

Answer: Autobalance=1? If you mean balance=1, that's the traditional gameplay for Allegiance, and it requires diligence from the commander to keep a stack from forming. Is this the best approach EVAH!? Well, no, and if you took a moment to put down your pitchfork and torch, you'd know I agree 100%.

What I want, and I've never been shy about stating it: universal NOAT, forced autobalance, and if you don't like things, start a new game. If you meant that one uber player in a 6v6 small game obviously imbalances things.... OK, for a nickel I'll buy it. What do you do when you have a bunch of low ranks and, say, Champy? No soup for Champy? Ahhh, we didn't think about that, now did we? And that's why implementing it first and asking questions later is a piss-poor choice.

QUOTE You are so afraid to change anything that you don't even notice that if nothing is done ALLEG IS DONE FOR.
PS: Currently on 'Main': 3 [34] vs 6 [23]. Ab = auto. You should be crying now.[/quote]

Answer: At this point I'm not sure what that means, but I think you're going for "Allegiance is going to die and I'm a big old dickhead"... duly noted and filed for future consideration... if it makes you feel better, my ex-wife would probably agree with the latter.

Thanks for the response
MrChaos

edit: totally unreadable without the quoting, my bad

MrChaos · Post by **MrChaos** » Fri Jul 30, 2010 3:09 pm

That's the 1PM floor show everyone, you can get your parking tickets validated at the door.

All the words are there, and I leave it to others to decide whether to listen to me or not. I've got hope for the.ynik; Phantom032, not so much. No hard feelings, I've been called worse, and see you all in game, pretty please with sugar on top.

Good Nite Everybody and Thank You for Coming!
MrChaos

the.ynik · Post by **the.ynik** » Fri Jul 30, 2010 3:32 pm

MrChaos wrote:QUOTE (MrChaos @ Jul 30 2010, 04:36 PM) Before you implement anything that has such a profound effect on the end user's gaming experience, you 100% need to validate it at every level. Currently a number of people are screaming go go go and wow! Is it the best thing since sliced bread, or does it have the potential to be a ginormous mess? I am being absolutely rational and honest when I say: I don't know, and the.ynik, if you're honest with yourself, neither do you.

The problem is: autobalance interacts with player behavior. Unlike ranks, it cannot be tested with existing data, because we don't know how players would behave if they were allowed to join a different team. Would they follow autobalance's suggestion? Would they try to misuse my "flexibility" to continue stacking? Would they have to wait more or less before they can join a side? Would they stop playing altogether (because they cannot join their favorite team)?

All we can do is ask the community: "given teams A and B, where do you think player C should join to balance the game?" and see whether the algorithm's output matches the community's expectation. That's what I was trying to do with my last question in the original post (and yes, I've done this myself for games of different sizes and stacks).
And of course we can just try it out on R6 beta Wednesdays - because it replaces an option that nobody is using today, I don't see how it can do much harm. Game commanders can still set autobalance=1 if this doesn't work out. In fact, my other suggestion of letting newbies join both sides is much more problematic in this regard (but was accepted without much discussion).
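The "ask the community" check can be scored mechanically: record each scenario (team A, team B, joining player C) together with the side most respondents chose, then count how often a candidate rule agrees. This is a minimal sketch; the scenario format, the made-up ranks and picks, and the plain sum-of-ranks rule standing in as the candidate are all assumptions, not the.ynik's algorithm or real survey answers.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A candidate balancing rule: (team A ranks, team B ranks, joiner rank) -> "A" or "B".
Rule = Callable[[Sequence[int], Sequence[int], int], str]


@dataclass
class Scenario:
    team_a: List[int]    # ranks already on team A (made-up values)
    team_b: List[int]    # ranks already on team B (made-up values)
    joiner: int          # rank of player C
    community_pick: str  # "A" or "B": the side most respondents said C should join


def sum_of_ranks_rule(team_a: Sequence[int], team_b: Sequence[int], joiner: int) -> str:
    """Placeholder candidate: join the side with the lower rank total (joiner's rank unused)."""
    return "A" if sum(team_a) <= sum(team_b) else "B"


def agreement_rate(rule: Rule, scenarios: List[Scenario]) -> float:
    """Fraction of scenarios where the rule's answer matches the community's answer."""
    if not scenarios:
        return 0.0
    hits = sum(rule(s.team_a, s.team_b, s.joiner) == s.community_pick for s in scenarios)
    return hits / len(scenarios)


examples = [
    Scenario(team_a=[20, 15], team_b=[10, 8, 5], joiner=7, community_pick="B"),
    Scenario(team_a=[9, 9], team_b=[5, 5, 5], joiner=5, community_pick="A"),
]
print(agreement_rate(sum_of_ranks_rule, examples))  # 0.5: the rule ignores that A is a player short
```

Swapping in another rule with the same signature gives a like-for-like comparison on the same set of scenarios.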

If you have any other idea of how to validate my (or any) approach to post-launch autobalance, please let me know.

HSharp · Post by **HSharp** » Fri Jul 30, 2010 3:40 pm

Raveen wrote:QUOTE (Raveen @ Jul 30 2010, 03:02 PM) I do not know for sure but I think this is the reason it was suggested that you hold back.

As you say ASGS handing over the AS information cannot be done (at least not easily if I understand it which I don't). However there's a successor system to ASGS in development (CSS, it has a forum and everything, it's hardly a secret).

The last update on that forum dates from June 13th; it might not be a secret, but it's certainly not open.

QUOTE So getting fully operational AB into the game is a case of waiting for CSS, however long that will be. If it's only going to be a month or so then it's probably not worth your time and effort.[/quote]

From what I can tell there is a basic algorithm already thought up and implementation doesn't look that tricky, not to mention that even if implemented as suggested it gives server operators the ability to switch between autobalance methods.

QUOTE Also there's a PR aspect to this whole issue. There's a general perception that AB sucks and I would fear that your implementation, better though it would no doubt be, would tarnish the concept as and when a better solution rolls up.[/quote]

I don't think anyone who has experienced AB as it is now can possibly think it does anything but suck. This isn't a call to bring the new implementation straight into gameplay ASAP, but to put it into beta testing where it can be tested. This isn't some half-baked idea that is going straight to players to use right away! It's a half-baked idea which will be tested in the fiery ovens of the beta server!

the.ynik · Post by **the.ynik** » Fri Jul 30, 2010 4:27 pm

HSharp wrote:QUOTE (HSharp @ Jul 30 2010, 05:40 PM) From what I can tell there is a basic algorithm already thought up and implementation doesn't look that tricky, not to mention that even if implemented as suggested it gives server operators the ability to switch between autobalance methods.

Turkey already implemented it; take a look at the patch at http://trac.alleg.net/ticket/192.
It just needs testing.
And maybe some code cleanup, especially in the area between "//OK this is hideous" and "//thank God that's over"

QUOTE Also there's a PR aspect to this whole issue. There's a general perception that AB sucks and I would fear that your implementation, better though it would no doubt be, would tarnish the concept as and when a better solution rolls up.[/quote]

I'm fully aware that it will suck; but hopefully it'll suck less than the existing autobalance. I don't think it would tarnish the concept; even the crappy autobalance=1 gets used over the autobalance=N/A option.

QUOTE (HSharp @ Jul 30 2010, 05:40 PM) I don't think anyone who has experienced AB as it is now can possibly think it does anything but suck. This isn't a call to bring the new implementation straight into gameplay ASAP, but to put it into beta testing where it can be tested. This isn't some half-baked idea that is going straight to players to use right away! It's a half-baked idea which will be tested in the fiery ovens of the beta server![/quote]

Exactly, let's get this into testing for next Beta Wednesday™.

Btw MrChaos: we CAN test rank-based autobalance on the beta servers because the server does have ranks (even though players can put in any number they want).

MrChaos · Post by **MrChaos** » Fri Jul 30, 2010 5:30 pm

the.ynik wrote:QUOTE (the.ynik @ Jul 30 2010, 10:32 AM) The problem is: autobalance interacts with player behavior. Unlike ranks, it cannot be tested with existing data, because we don't know how players would behave if they were allowed to join a different team. Would they follow autobalance's suggestion? Would they try to misuse my "flexibility" to continue stacking? Would they have to wait more or less before they can join a side? Would they stop playing altogether (because they cannot join their favorite team)?

All we can do is ask the community: "given teams A and B, where do you think player C should join to balance the game?" and see whether the algorithm's output matches the community's expectation. That's what I was trying to do with my last question in the original post (and yes, I've done this myself for games of different sizes and stacks).
And of course we can just try it out on R6 beta Wednesdays - because it replaces an option that nobody is using today, I don't see how it can do much harm. Game commanders can still set autobalance=1 if this doesn't work out. In fact, my other suggestion of letting newbies join both sides is much more problematic in this regard (but was accepted without much discussion).

If you have any other idea of how to validate my (or any) approach to post-launch autobalance, please let me know.

Only the rarest of rare games have autobalance on. You do, of course, mean maximum team imbalance=1, which is of course not autobalance at all. I'm not sure whether the proponents are purposely using the term autobalance=1 to cloud the discussion; I hope not.

Anyhow, the endless old games played with autobalance off and reduced player choice on, to the tune of 100,000s of games, fortunately give you glimpses into common player behavior. They show beyond a shadow of a doubt that if you don't watch players like a hawk, they will stack their collective balls off. You can watch, by time played, the stacks coming and going: the little buggers will stack early game, mid game, and late game, and just stack for pure hate. That's one way to see how player behavior and stacking occur... I'm not sure whether you can generate useful statistical values from it; ANOVA and its ilk ftw.

If you don't think players are doing it intentionally... I have anecdotal evidence that "stealth stacking" happens on purpose, on both the player side and the commander side, when the community raises a stink about it, as it does from time to time (by this I mean they are less blatant: players coordinate and hide the stack once the game is clearly in hand), but *meh*, you know what I think about this whole anecdotal approach, now don't we. That's what happens when you spend months dinking with data, though: you start to see patterns that aren't obvious at first, and then you need to check yourself using statistical tools.

All people, at some point, will purposely go to the team they think will win; the frequency of this versus trying to keep gameplay fair is the 64 million dollar question. The answer is: don't allow them the chance at all. Like when Mom let you break the candy bar and let your sister choose which half: the players see this as equally fair but still kinda sucky. There is literature on this topic, btw. Anyone who has played over the ten years of Allegiance can tell you, ugh, again this word, subjectively, that stacking is ridiculously high if you allow it to occur without any type of check. MSR did research specifically on the subject when they implemented their balancing system. 168 pages of PowerPoint fun ftw.

As for maximum team imbalance=1 and counting on the commanders not to allow a stack (the current method) *shrug*, it's a @#(!ty way to run a railroad indeed. So you're sharp enough to see that, and figured $#@! it, I want my turn driving the train, how bad can it be anyway? Pretty bad, given all the cargo strewn about from the previous train wrecks that took this exact same approach, in this area alone.

As for the whole fiery crucible business of beta... there are no ranks, gents, so, ah, yeah for the beta idea. Validating the code only proves that you didn't make an oopsie in your pants writing the code; it won't tell you much in this case, now will it? Hmm. OK, so you can test the code and nothing more ~yp (because bogus ranks are obviously no way to do a proper test, and people won't do it right).

TurkeyXIII · Post by **TurkeyXIII** » Fri Jul 30, 2010 5:53 pm

MrChaos wrote:QUOTE (MrChaos @ Jul 30 2010, 08:33 AM) I've been around for at least four distinct ranking-system changes, the autobalance implementation, and repeated attempts to deal with rookie integration in the game. Each and every time code goes live without an actual check of its implications, *shakes head* bad things happen.

This isn't a ranking system, though. This doesn't even touch AllegSkill, just sort of peers curiously at it from a distance.

MrChaos wrote:QUOTE (MrChaos @ Jul 30 2010, 08:33 AM) TurkeyXIII: I once fiddled with the mus and sigmas to see whether I could use AS to predict the winning probability of games. The main problem is newbies: they have a mu of 25, which is higher than plenty of people who know what they're doing, so balancing algorithms based on it will often put them on the lower-ranked team. Also, their crazy-high sigma increases the sigma of the whole team, so such a system would think that newbies joining would even a game out, regardless of which team they join.

Conservative rank is more accurate imo. ynik's algorithm is a work of art. Consider putting the weights into a .txt file in the server artwork so it can be modified without a code change.

TurkeyXIII: That's not what I meant, but it's not a bad idea... some reciprocal of sigma as a weighting factor... But depending on the brains behind it, it wouldn't be any less arbitrary than ynik's method, and ASGS/CSS would need to pass two rank values instead of just one.

Please indulge me.

First, the term "conservative rank" means taking the player's rank and treating it as a one-tailed estimate rather than a two-tailed one. It slides the collective and individual Mu leftward in the distribution, for the displayed rank only: you consider only the bit to the left, and this depresses ranks ever so slightly.
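In TrueSkill-style systems the one-tailed "conservative" figure is usually written as Mu minus k times Sigma, a lower bound rather than the mean. This sketch assumes k = 3, the common TrueSkill convention; the multiplier and any scaling AllegSkill applies to its displayed rank are assumptions here. Because the shift is k times Sigma, it is small for settled players and large for newcomers.

```python
def conservative_rank(mu: float, sigma: float, k: float = 3.0) -> float:
    """One-tailed skill estimate: the mean shifted left by k standard deviations.

    k = 3 is an assumed default (the usual TrueSkill convention), not a confirmed AllegSkill value.
    """
    return mu - k * sigma


veteran = conservative_rank(30.0, 2.0)      # 24.0: small sigma, small shift
rookie = conservative_rank(25.0, 25.0 / 3)  # about 0.0: TrueSkill default prior, large shift
print(veteran, rookie)
```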

TS AB doesn't use just Mu but also Sigma. Their "crazy high sigma" and AllegSkill Mu of 15 are EXACTLY the right way to handle their introduction to the team. I'm not sure what you fiddled with and how (I'm aware you're a numbers guy; no slight meant, I just mean I wasn't there), but in my experience there is no team-choice bias of the kind you mention.

When Mu and Sigma are compared to ranks, there is no question that ranks are less accurate, and I leave it to you to do the math. It's a one hundred percent lock. The information is on the website, in papers, and in other links all over sigs, the wiki, and elsewhere.

I have an example using the wiki's equations for team ranks to support this crazy claim, but it's mostly made moot by this post:

sgt_baker wrote:QUOTE (sgt_baker @ Jul 30 2010, 03:16 AM) We have some post-launch balance algorithms which perform better than any system based purely on (MyRank) alone, yet I'm not absolutely happy with them. If someone were to demonstrate to me that there might be developments in this area, I'd be happy to re-engage in the research required to develop said algorithms to fruition.

So even though I didn't see it, somebody else has come close at least. Good to know... although now I'm curious as to what it involves.

the.ynik · Post by **the.ynik** » Fri Jul 30, 2010 6:02 pm

MrChaos wrote:QUOTE (MrChaos @ Jul 30 2010, 07:30 PM) Only the rarest of rare games have autobalance on. You do, of course, mean maximum team imbalance=1, which is of course not autobalance at all. I'm not sure whether the proponents are purposely using the term autobalance=1 to cloud the discussion; I hope not.

Sorry, I think I started using 'autobalance=1' to refer to 'maximum team imbalance=1'. I did this because of the '#autobalance 1' chat command; I didn't want to confuse anybody.
You're correct that this doesn't really balance anything. But the same goes for the existing 'autobalance' (sum of ranks [originally ELO ranks]).
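To make the two existing options concrete as described here: "maximum team imbalance=1" only compares head counts, while the old autobalance only compares rank totals. The sketch assumes those two readings; the function names and example ranks are hypothetical, and the actual server code may treat ties, team limits, and more than two teams differently.

```python
from typing import List, Sequence


def allowed_by_max_imbalance(teams: Sequence[Sequence[int]], max_imbalance: int = 1) -> List[int]:
    """Indices of teams a new player may join so head counts differ by at most max_imbalance."""
    sizes = [len(t) for t in teams]
    allowed = []
    for i in range(len(teams)):
        after = sizes.copy()
        after[i] += 1  # head counts if the new player joins team i
        if max(after) - min(after) <= max_imbalance:
            allowed.append(i)
    return allowed


def pick_by_rank_sum(teams: Sequence[Sequence[int]]) -> int:
    """Index of the team with the lowest total rank (ties go to the lower index)."""
    totals = [sum(t) for t in teams]
    return totals.index(min(totals))


# Hypothetical two-team game: team 0 is two veterans, team 1 is three low-ranked players.
teams = [[20, 18], [6, 5, 4]]
print(allowed_by_max_imbalance(teams))  # [0]: head count alone sends the joiner to the stronger side
print(pick_by_rank_sum(teams))          # 1: rank totals are 38 vs 15
```

With two veterans facing three low ranks, the head-count rule can only offer the veterans' side, while the rank-sum rule on its own would keep feeding players to the weaker side however lopsided the head count becomes. Neither looks at both dimensions.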

MrChaos wrote:QUOTE (MrChaos @ Jul 30 2010, 07:30 PM) Anyhow, the endless old games played with autobalance off and reduced player choice on, to the tune of 100,000s of games, fortunately give you glimpses into common player behavior. They show beyond a shadow of a doubt that if you don't watch players like a hawk, they will stack their collective balls off. You can watch, by time played, the stacks coming and going: the little buggers will stack early game, mid game, and late game, and just stack for pure hate. That's one way to see how player behavior and stacking occur... I'm not sure whether you can generate useful statistical values from it; ANOVA and its ilk ftw.

If you don't think players are doing it intentionally... I have anecdotal evidence that "stealth stacking" happens on purpose, on both the player side and the commander side, when the community raises a stink about it, as it does from time to time (by this I mean they are less blatant: players coordinate and hide the stack once the game is clearly in hand), but *meh*, you know what I think about this whole anecdotal approach, now don't we. That's what happens when you spend months dinking with data, though: you start to see patterns that aren't obvious at first, and then you need to check yourself using statistical tools.

Yes, there are people who stack on purpose. And I've got the impression that the more imbalanced the game becomes, the less willing people are to join the "losing" team (where "losing" is just the impression they get from looking at the teams). Most people only accept anti-stacking if the game is almost even anyway (or if they assume the game will be over soon and they can reduce their 'stack rating' without taking much of a rank penalty).
And then we also have people creating stacks by leaving their team once something goes wrong for them. No autobalance will ever be able to fix that.

(The above information is just my experience playing PUGs during the last year; I don't have any hard data)

But what I don't know, and what we can't figure out from past data, is how those people will behave in a system that prevents stacking. Will they fly for the less skilled team (once they figure out they cannot stack anymore), or will they stop playing altogether?
The only way to figure this out is to try it in real games. Not even beta tests can help here (unless we force everybody to play beta for a week, or something like that).

MrChaos wrote:QUOTE (MrChaos @ Jul 30 2010, 07:30 PM) All people, at some point, will purposely go to the team they think will win; the frequency of this versus trying to keep gameplay fair is the 64 million dollar question. The answer is: don't allow them the chance at all. Like when Mom let you break the candy bar and let your sister choose which half: the players see this as equally fair but still kinda sucky. There is literature on this topic, btw. Anyone who has played over the ten years of Allegiance can tell you, ugh, again this word, subjectively, that stacking is ridiculously high if you allow it to occur without any type of check. MSR did research specifically on the subject when they implemented their balancing system. 168 pages of PowerPoint fun ftw.

I don't like "don't allow them the chance at all". There are valid reasons for choosing a particular team, like preferring one faction or trying to fly with a specific other player. This should be allowed as long as it doesn't lead to stacking.
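One way to read "allowed as long as it doesn't lead to stacking": honor the player's preferred team unless joining it pushes a balance measure past a tolerance, and otherwise send them to the side that currently needs help. This is a minimal sketch; the strength measure (plain rank totals, ignoring head count), the tolerance value, and the function name are hypothetical placeholders, not the metric in Turkey's patch.

```python
from typing import Sequence


def choose_team(teams: Sequence[Sequence[int]], joiner: int, preferred: int,
                tolerance: int = 10) -> int:
    """Return the team the joiner may join.

    The preferred team is kept if, after joining, the gap between the highest and
    lowest team rank total stays within `tolerance` (a hypothetical threshold).
    Otherwise the joiner goes to the team whose rank total is currently lowest.
    """
    totals = [sum(t) for t in teams]
    after = totals.copy()
    after[preferred] += joiner  # rank totals if the preference is honored
    if max(after) - min(after) <= tolerance:
        return preferred
    return totals.index(min(totals))


teams = [[12, 10], [11, 9]]                        # rank totals 22 vs 20
print(choose_team(teams, joiner=5, preferred=0))   # 0: 27 vs 20 stays within tolerance
print(choose_team(teams, joiner=25, preferred=0))  # 1: 47 vs 20 would be a stack
```

The design choice is that the player's choice is the default and the override only kicks in when the gap would exceed the threshold, so faction preferences and flying with friends survive in close games.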

MrChaos wrote:QUOTE (MrChaos @ Jul 30 2010, 07:30 PM) As for maximum team imbalance=1 and counting on the commanders not to allow a stack (the current method) *shrug*, it's a @#(!ty way to run a railroad indeed. So you're sharp enough to see that, and figured $#@! it, I want my turn driving the train, how bad can it be anyway? Pretty bad, given all the cargo strewn about from the previous train wrecks that took this exact same approach, in this area alone.

You should see this as cleaning up (a bit) the mess left behind by the ELO autobalance. It's an improvement, though probably only a small one.

MrChaos wrote:QUOTE (MrChaos @ Jul 30 2010, 07:30 PM) As for the whole fiery crucible business of beta... there are no ranks, gents, so, ah, yeah for the beta idea. Validating the code only proves that you didn't make an oopsie in your pants writing the code; it won't tell you much in this case, now will it? Hmm. OK, so you can test the code and nothing more ~yp (because bogus ranks are obviously no way to do a proper test, and people won't do it right).

Yes, beta testing only works for showing that the algorithm implemented by Turkey matches what I implemented in JavaScript. I have anecdotal evidence that the latter would have prevented some stacks from forming (I put values from running games into my webapp). We have plenty of examples showing that my algorithm makes a better choice of where newbies should join to reduce the stack than 'max imbalance 1' or the old ELO autobalance (sum of ranks).

What we don't have is a simulation of what happens to the 'gameplay experience' - but given that we cannot simulate future player behavior, such a thing is impossible.

MrChaos · Post by **MrChaos** » Fri Jul 30, 2010 6:23 pm

Turkey

I'm aware how the math behind AllegSkill works since, well, I helped develop it. Can I make mistakes? Sure. Misremember? Sure, too. I'm not at all catching why you're linking me to the article I helped Baker write for the wiki in the early days. What am I missing? Again: I know you're a numbers guy, and I respect that. I didn't see the behavior you did regarding the zero-rank player always going to the lower-ranked team. I was wondering what numbers you put into it and/or how you arrived at that conclusion.

The Mu for a rookie is 15 in AllegSkill due to legacy issues, but let's take 25 for the sake of discussion. The Mu is 25 and the Sigma is 8.33, and when plugged into the TS AB algorithm it very elegantly handles their contribution to the team's chances to win (in a word, it says *meh*). It's been literally years now, so I don't remember whether we or they put in protections against newbie stacking (all newbs on one team under AB), for example, but there are some qualifiers in the algorithm for sure... *sigh* it has been a while, gents, and I didn't code it.
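For reference, the standard TrueSkill team model turns Mu and Sigma into a win chance (ignoring draws) by summing each side's Mu, adding every player's Sigma² plus a per-player performance variance β², and applying the normal CDF. The sketch uses the defaults quoted above (Mu 25, Sigma of about 8.33) and β = 25/6, the TrueSkill default; it illustrates the published TrueSkill formula, not MSR's TS AB code, its qualifiers, or AllegSkill's legacy Mu of 15. `best_side_for` is a hypothetical helper that simply keeps the prediction closest to 50%.

```python
import math
from typing import List, Tuple

Player = Tuple[float, float]  # (mu, sigma)

BETA = 25.0 / 6  # TrueSkill default performance spread; an assumption for Allegiance


def win_probability(team_a: List[Player], team_b: List[Player], beta: float = BETA) -> float:
    """P(team A beats team B) under the TrueSkill team model, ignoring draws."""
    delta_mu = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    variance = (len(team_a) + len(team_b)) * beta ** 2 + sum(s ** 2 for _, s in team_a + team_b)
    z = delta_mu / math.sqrt(variance)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


def best_side_for(newcomer: Player, team_a: List[Player], team_b: List[Player]) -> str:
    """Hypothetical helper: pick the side that leaves the prediction closest to 50%."""
    p_if_a = win_probability(team_a + [newcomer], team_b)
    p_if_b = win_probability(team_a, team_b + [newcomer])
    return "A" if abs(p_if_a - 0.5) <= abs(p_if_b - 0.5) else "B"


rookie = (25.0, 25.0 / 3)
team_a = [(30.0, 2.0), (28.0, 2.5)]
team_b = [(22.0, 2.0), (20.0, 3.0)]
print(round(win_probability(team_a, team_b), 3))
print(best_side_for(rookie, team_a, team_b))
```

The rookie's large Sigma widens the denominator, so on its own it pulls the prediction toward 50%; their Mu still counts in the numerator, which is why they land on the weaker side here.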

If you mean am I happy with a pure TS AB implementation: no way, José. I actually see an opportunity to possibly work with the.ynik and others on aspects of his idea. I'm probably guilty of riling everyone up in my attempt to explain myself, and now Bard wants to boil me in oil as a result.

If CSS is about to go live, then wait, please wait, and then let's look at aspects of the.ynik's work, MSR's TS AB, and of course crabby old Baker's too. I'm not sure which parts of the.ynik's work dovetail with the earlier work and which don't. You can check the results, and beta testing with accurate ranks is good too.

Again, AB and any qualifiers will absolutely affect rank. Positively or negatively, it all depends, but the less you allow players to stack, AND the more accurately you sort them, the better the rank will work for all.

Again, it's the "let's just do it and release it live" approach I've got concerns about, and this is being done the exact same way.
MrChaos