Friday, August 5, 2011

Warhammer: Against Nikephoros' Metric System

So there's a fellow going by the nom de guerre of Nikephoros, writing on the blog Bringer of Victory. He and his Warhammer "metric" are gaining popularity. I believe this article to be his original post on the subject. Lately I've come to realize that I was unfairly dismissive of the execution of his attempt to design a Warhammer 40,000 evaluation metric for list-building. In part this is because I did not address it fairly and objectively the first time I reviewed it, and in part this is because the apparent popularity of this metric requires that it be seriously and objectively refuted.

If it sounds like I'll be bending some of my arguments to fit some pre-conceived notions in suggesting that Nikephoros' metric (hereafter NM) needs refutation, that's because I'm going to be using the traditional essay format of stating my thesis, explaining my reasoning, and then reiterating my thesis as the conclusion to an argument formed by my reasoning. And as any good scholastic philosopher knows, if you cannot escape the conclusions of your argument, go back and edit the introduction so that you're arguing for those conclusions!

In this case I've come to the conclusion that NM is flawed in both general principles and detail, even if it is both well-meaning and heroic in the attempt.

To summarize NM as honestly and charitably as I can, NM is about correlating expected performance with an empirical record of wins on the tabletop. The expected performance is measured along four axes: Dead Marines from optimal shooting ('DMS'), Dead Marines on an optimal charge ('DMCC'), Dead Rhinos per game ('DRPG'), and Dead Land Raiders per game ('DLRPG'). These numbers are interpreted on a point-by-point basis, so while one weapon may have raw power on its side, a cheaper one may cause more damage in proportion to its points cost. These numbers are aggregated on a unit-by-unit basis for armies at a particular point level. Nikephoros uses the top four lists from the 2010 Nova Open to generate scores.

So let's go back to how one wins games in Warhammer 40,000. One way of winning any game is to eliminate your opponent entirely. Another way is to win more kill-points. A third way is to capture more objectives. The first way definitely depends on your army's ability to kill the other player's army, and the other player's ability to both kill your units and avoid getting their own killed. Likewise you don't have to reach a certain number of kill points so much as get more than your opponent. Tournaments like the NOVA, for instance, even defer winning by two kill points or less to secondary victory conditions. Finally, winning objectives is a matter of having more objectives held on the turn the game ends, which again is also accomplished by contesting objectives as well as killing units.

Don't get me wrong, killing stuff, especially Space Marines, is important in Warhammer 40,000. But if you only consider killing Space Marines, and then only certain Space Marines (and vehicles), and don't consider the utility of glancing hits given the new rules for cumulative damage effects, and make unsupportable assumptions about optimal positioning in a zero-sum game, then you're no longer doing useful math.

So besides generating these metrics from highly artificial and unusual (aka optimal) conditions, against a disproportionately represented material asset, Nikephoros doesn't even generate accurate numbers. Units armed with Missile Launchers, for example, apparently have no effect on AV14 Land Raiders, which is, strictly speaking, false. In game theory terms it's like arguing for rock in Rock-Scissors-Paper because lots of people like rock, and rock does well when people take scissors. In other words this isn't game theory, it's the same old bad statistics that have misled people about the optimal tactics and strategies in Warhammer 40,000 time and again, not because the numbers themselves are inaccurate, but because they have not been properly contextualized according to the game theoretic concepts that the rules of the game actually employ.
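To make the Rock-Scissors-Paper analogy concrete, here is a minimal Python sketch. The payoff table and the names `payoff`, `expected_payoffs`, `always_rock` and `uniform` are my own illustrative assumptions, not anything taken from NM. It scores each pure move against an opponent's mixed strategy in a zero-sum game.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """Zero-sum payoff: +1 for a win, -1 for a loss, 0 for a draw."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected_payoffs(opponent_mix):
    """Expected payoff of each pure move against a mixed strategy."""
    return {
        move: sum(prob * payoff(move, other) for other, prob in opponent_mix.items())
        for move in MOVES
    }

# A player committed to rock, and a player mixing evenly over all three moves.
always_rock = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}
uniform = {move: Fraction(1, 3) for move in MOVES}

print(expected_payoffs(always_rock))
print(expected_payoffs(uniform))
```

Against the committed rock player, paper scores a guaranteed win, while against the even mix no move does better than any other. Rock only looks good if you assume the opponent keeps obligingly picking scissors.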

So while Nikephoros is on the right track in advocating the application of game theory to the game of Warhammer, he is not actually applying the right parts of game theory, and is thus failing to provide adequate context to test the value of truth-bearing statements about the game. I've heard mention that people admire Nikephoros for being a good scientist, and certainly his approach is actually a very good one if you're planning on testing some natural phenomenon. However, he and his followers should not fool themselves into believing that attempting to treat Warhammer units like major league managers treat baseball players is in any way realistic or useful. Warhammer 40,000 is much more like one of the toy-games of mathematicians than a complex phenomenon like major league baseball.

I mean, obviously NM is useful in the sense that it gets people thinking, which is better than not-thinking and gives me an opportunity to harp on about the right way to do these things. But NM is not useful in the sense that it gets people thinking the right way, which is about using the material defined by an army list on the space and time of the board to achieve the game's Nash Equilibrium.

So, to review my thesis: I said that I would show how NM is wrong in both detail and general principles. I have shown how NM ignores the general principles of the game, concentrating on killing models at the expense of winning the game, since killing models is neither necessary nor sufficient for winning the game. I have shown how NM is wrong in detail, citing both the unjustifiable assumption of optimal conditions and the lack of computation of marginal results such as glancing hits. These results are no mere mistakes of calculation; they are indicative of severe theoretical problems underlying the NM, in both detail and principle.

I would like to add that NM also fails to show how time, material, and position act to synergize and 'de-synergize' (can't think of a better negation for synergize...) the kill-numbers that it generates. Finally, and this isn't Nikephoros' fault considering how difficult it is to collect and appraise play-testing feedback, NM isn't applied in a rigorous empirical fashion to a sufficiently large sample, as if empirical testing could tell us anything about winning strategies for a game like Warhammer 40,000.

Put another way, if you're going to tell me that a unit of six Long Fangs with five Missile Launchers will be expected to kill exactly 0 Land Raiders per game while killing more than 8 Rhinos per game, I'm going to dismiss you out of hand for never actually reading the fucking rules.
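For anyone who wants to check the dice, here is a rough Python sketch of the per-shot odds. It rests on assumptions I am stating rather than quoting from NM: krak missiles at Strength 8, Long Fangs hitting on 3+, armour penetration resolved as Strength plus a D6 (equal to the armour value is a glance, higher is a penetrating hit), Rhino front armour of 11, and NM's own idealisation of five turns of fire with no casualties. The function name `armour_roll_odds` is hypothetical.

```python
from fractions import Fraction

def armour_roll_odds(strength, armour, to_hit=Fraction(4, 6)):
    """Return (glance, penetrate) chances per shot.

    Assumes penetration is Strength + D6: equal to the armour value
    glances, anything higher penetrates. to_hit defaults to a 3+.
    """
    glancing_faces = sum(1 for d in range(1, 7) if strength + d == armour)
    penetrating_faces = sum(1 for d in range(1, 7) if strength + d > armour)
    return to_hit * Fraction(glancing_faces, 6), to_hit * Fraction(penetrating_faces, 6)

SHOTS = 5 * 5  # five launchers firing for five turns, no losses (an idealisation)

for name, armour in (("Rhino", 11), ("Land Raider", 14)):
    glance, penetrate = armour_roll_odds(8, armour)
    print(name, "glance:", glance, "penetrate:", penetrate,
          "expected glances:", float(SHOTS * glance))
```

Under those assumptions the launchers can never penetrate AV14, but every shot still has a one in nine chance of glancing it. Once cumulative damage from glancing hits is on the table, a flat zero for Land Raiders is not what the dice say.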

Sorry Nikephoros, but you're doing it wrong, and misleading people.

9 comments:

  1. The problem to me looks something like this:
    http://xkcd.com/793/

    NM's system simplifies 40k.
    It might get Long Fangs' (and other units') scores wrong, it makes wild assumptions (optimum conditions) and doesn't take tactics into account.

    But a few things are true:
    Firstly, it treats all units the same. While Long Fangs get an inflated DRPG and a low DLRPG, so does every similar unit, and every unit gets optimum conditions too, so it doesn't throw the metric off by as much as you want.

    Secondly, tactics are already accounted for. Because it's based around 4-0 lists, all the lists it creates a figure for are ones which can successfully complete all the objectives you stated. This becomes significant because what NM ISN'T is what you've taken it as, a tool to build armies with. What it is, is a way of showing the relative firepower of competitive armies, and saying your army needs to be in this ballpark to be competitive. Min/maxing along the lines of NM would be, as you say, tactically stupid because the aim of 40k isn't to score kills in optimum conditions, but that isn't the point of NM.

    I think you're looking at this wrong. You're looking at it as "the optimal tactics and strategies in Warhammer 40,000" and how to reach them. It was never meant to be that; it was meant to be a way of seeing if a list is competitive, and seeing why other lists weren't. To have missed that point I can only assume you either haven't been reading NM or have a grudge against him.

  2. I believe you have the wrong end of the statistical stick with regards to the 40k metrics measurement. Most of your arguments *are* correct in my opinion, but they are based on a flawed assumption that NM is a method of determining ‘how good is your army?’
    As far as my understanding (and use of) this system goes, it is a set of best-case tests of what your army is capable of doing; 5.9 inches away from the enemy, on an empty field, with no interruptions from your opponent / enemy, with perfectly random dice and a suitably large sample number (n>1000).
    It is a *theoretical* statistical measurement of what your army can achieve in a perfect situation. As such, it fails as a measurement of how good your army list is. It doesn’t take into account your opponent, terrain, win conditions, or how hung-over you may be that morning.
    What it does do is give you an idea of where you may be lacking, if you have the tools for the job, comparing different army lists to see which can be stronger. Of course it doesn’t take into account ephemera such as mobility, survivability, age / author of codex / personal taste / play style. As long as you recognise that the scope and value of the NM method is limited to only the aforementioned criteria, then it is a worthwhile exercise.
    As such, I think you are mistaken in claiming that Nikephoros is doing it wrong. He is doing it right, as long as you recognise what ‘it’ is. As far as I’ve read, he is not misleading people; people are not recognising its high value as a specific tool, and trying to use this as, or incorrectly interpret it as, a way of measuring how good your army is.

  3. So, finally two well-written comments that are actually worth posting. So first let me say: Thank you.

    Secondly, both your opinions are wrong.

    If a system does not analyze optimal strategies and tactics, lists and what you do with them, then it says nothing about whether a particular strategy (aka 'list') is competitive or not. Calculating optimal killing power isn't simply insufficient for evaluating a strategy's competitiveness, it's downright misleading.

    While you can win by wiping out your opponent's army, killing power is tangential to how the game works and to what makes a list competitive. I'd unpack this notion more, but I have a series on analyzing the game entitled "Warhammer Basics" that explains it at length. Maybe I'll write another article...

  4. While the DMS/DMCC are one turn statistics and I agree with the simplifications inherent in them, the DRPG/DLRPG are:
    1. five turn, no losses (sic!) game
    2. CC and shooting combined
    3. not actual wrecked results but penetrations - it dismisses glances completely.

    These three points combined make the two statistics complete nonsense - no army is able to shoot _and_ CC enemy vehicles for five turns with all its units and without any losses.

    Instead I proposed a set of three pairs of statistics:

    DMS/CC
    DRS/CC
    DLRS/CC

    where DRS, DRCC, DLRS, DLRCC are one turn based and take into account glancing to death.

    You can read it at the Nikephoros blog - I even gave a link to a spreadsheet with calculations for an ork army.

    http://nike40k.blogspot.com/2011/07/40k-metrics-frequently-asked-questions.html

  5. Alright, here are the breakdowns of three competitive lists, and two non-competitive ones.

    List 1:
    DMS: 25.05
    DMCC: 24.82
    DRPG: 64.80
    DLRPG: 19.27

    List 2:

    DMS: 22.21
    DMCC: 15.15
    DRPG: 82.35
    DLRPG: 33.74

    List 3:
    DMS: 28.61
    DMCC: 14.53
    DRPG: 27.03
    DLRPG: 6.23


    List 4:
    DMS: 19.28
    DMCC: 28.29
    DRPG: 59.18
    DLRPG: 17.04

    List 5:
    DMS: 12.58
    DMCC: 38.54
    DRPG: 45.48
    DLRPG: 27.73

    You can probably see which are the good and bad lists, if you know what's important in tournament play in 40k. (1,2 and 4 if you can't, or don't want to guess) These lists are all different, they focus on different ways of winning. The bad lists? They're clear because they have 2 scores significantly out of the acceptable range.

    The metric shows that:
    You need 50+ DRPG
    DLRPG is far less important than DRPG, but still matters.

    All this data perfectly fits the picture of tournament play. You need potentially 50+ dead rhinos a game in optimum conditions because your Long Fangs can only pop 5 rhinos, not 8, in 5 turns: they're going to lose a few men, and there are smoke launchers, cover saves, etc.

    And you know what, most of the conclusions you can draw are really obvious. There is more AV11/12 than AV14, so it's more important to pop AV11/12.

    Ironically, NM supports your conclusion: killing power isn't all.

    Look at List 6:
    DMS: 18.90
    DMCC: 45.75
    DRPG: 80.00
    DLRPG: 31.04

    That (competitive) list kills 64 marines a turn, List 2 only kills 37, while List 5, an uncompetitive one, kills 51. Killing marines, as you've said all along, isn't the be-all and end-all of 40k, and the metrics prove that, which I'd think you'd be happy with.

    List 2 incidentally has the lowest dead marines, but the highest dead rhinos.

    So what do the metrics show? Stopping rhinos from getting troops to objectives, table quarters or whatever, is more important than killing marines, and in fact a successful list might be looking at how few marines a turn it can get away with killing and still be successful.

    NM seems to perfectly break the game down, supporting your conclusion that "killing doesn't win 40k, objectives do". You seem to want this metric to fail.

    The system is limited, I might not agree with all of Nike's analysis, but the numbers and the metric aren't the problem, it's the way they're being analyzed.
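The figures quoted in the comment above are easy to tabulate. The sketch below is only a reading aid under stated assumptions: the scores are copied from the six lists as posted, the 50-point DRPG floor is the commenter's own rule of thumb, and the names `LISTS`, `DRPG_FLOOR` and `summarize` are hypothetical.

```python
# Scores as quoted in the comment (DMS, DMCC, DRPG, DLRPG).
LISTS = {
    "List 1": {"DMS": 25.05, "DMCC": 24.82, "DRPG": 64.80, "DLRPG": 19.27},
    "List 2": {"DMS": 22.21, "DMCC": 15.15, "DRPG": 82.35, "DLRPG": 33.74},
    "List 3": {"DMS": 28.61, "DMCC": 14.53, "DRPG": 27.03, "DLRPG": 6.23},
    "List 4": {"DMS": 19.28, "DMCC": 28.29, "DRPG": 59.18, "DLRPG": 17.04},
    "List 5": {"DMS": 12.58, "DMCC": 38.54, "DRPG": 45.48, "DLRPG": 27.73},
    "List 6": {"DMS": 18.90, "DMCC": 45.75, "DRPG": 80.00, "DLRPG": 31.04},
}

DRPG_FLOOR = 50.0  # the commenter's "You need 50+ DRPG" rule of thumb

def summarize(scores):
    """Combine shooting and charge kills, and check the DRPG floor."""
    return {
        "marines": round(scores["DMS"] + scores["DMCC"], 2),
        "meets_drpg": scores["DRPG"] >= DRPG_FLOOR,
    }

summary = {name: summarize(scores) for name, scores in LISTS.items()}
for name, row in summary.items():
    print(name, row["marines"], "DRPG ok" if row["meets_drpg"] else "DRPG low")
```

On these six data points the DRPG floor alone separates the lists the commenter calls competitive (1, 2, 4 and 6) from the other two, while combined marine kills do not. Six lists is a tiny sample, so this shows what the quoted numbers say and nothing more.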

  6. When I created the metric, I didn't do it in order to compare list A to list B in order to see which list was "better." I wish people didn't try to use it for that.

    I think the best example of it being used properly is as follows.

    Johnny Space Wolf player creates a mech-wolves list and struggles to win against players of similar playskill. Upon applying the metric to his list, and comparing the scores to tournament-proven successful mech-wolves lists, he sees that he has far too low a DRPG and far too much DMCC. He then adjusts his list to spend fewer points on CC ability and more on anti-light mech shooting. Consequently, his list performs better (or doesn't!) and he is either happy with the results or he postulates further changes. This is how I would use it, and how I do use it.

    Secondly, you bring up the lack of defense taken into account. I've addressed this before multiple times, but I think the most succinct way to put it is thus: most lists from the same codex will have similar defensive traits. Some may have a few more infantry bodies, some a few more tanks. But it will take similar firepower to kill Marine List A as it will Marine List B, even if one of the lists is pretty bad, and the other pretty good.

    Lastly, look at the lists people submit to Stelek and Kirby to review. What is the common flaw almost every time? Lack of appropriate firepower. It is very rare for someone to submit a flawed list and for the criticism to end up being "this list has plenty of guns but isn't defensive enough." I've never heard that, actually.

  7. This type of analysis is not misleading so long as it fully describes itself. The metric itself measures very specific things, i.e. the killing power of the army at optimal range. So you can suggest (as I have) that the metric does not in fact accurately determine the killing power of an army, but that would be a critique of the metric itself. It seems you are making a second critique, however, which is that even if the metric did accurately measure the killing power of an army, this measure is not related to actual success on the battlefield. You believe that the killiness of an army is NOT causally related to its success. I propose that it is possible to use very basic statistical techniques to determine how much of the variation in outcomes is predicted by the measured killiness of the army.

  8. I might as well post a comment in reply to Nikephoros' comment. Let's see if we can manage to communicate, shall we?

    Nikephoros: here's the thing, quantifying the ability to kill Tactical Space Marines in optimal conditions with shooting and close combat, combined with the ability to penetrate and destroy both Rhinos and Land Raiders, does not adequately describe whether your army has enough firepower.

    Quite simply your metric abstracts away the game, and Warhammer 40,000 is a game, not a shooting gallery. The fact that you are facing another player makes your metric so inadequate as to be misleading.

    The answer that the metric at least demonstrates whether an army list could kill enough given optimal conditions is unfortunately bullshit, to follow Frankfurt's definition. The game can be won by wiping out your opponent's army, but as DashofPepper found out one day, that only works if you have a co-operative opponent.

    Basically what you're doing is a work-intensive version of Tommy Tyranid calculating how many Space Marines his unit of Genestealers can kill in a single round of combat. It's fun to know the odds, but without the context in terms of the game (time, position, material, victory conditions, etc) it is useless.

    In fact it is worse than useless because it distracts people from the important part of the game, placing tools before goals.

    Rather than comparing idealized kill-ratios, players should be considering the conditions under which they will be scored the winner when the game ends, whenever it ends. Then you work backwards each turn of the game to deployment, and then finally to the material that you plan to deploy; your army list (aka 'teh build').

    Killing stuff is just one tool in a player's toolbox for winning the game. If you want to evaluate it, then as a player you need to calculate the comparative effects of your units against a variety of units, not just Space Marines, and under a variety of conditions.

    Fortunately we live in an age of spreadsheets so that you can make your calculations without taking your eye off the ball. Not limiting your calculations to Tactical Space Marines means that you won't be caught off-guard by armies that exist outside of your artificially small sample.

    Unlike baseball, we can model all of the possibilities in Warhammer 40,000 with combinatoric accuracy, rather than relying on misleading and misapplied statistical models. It's a simple game in many, many ways.

  9. My biggest gripe is the small sampling pool that he uses to measure the effectiveness of a unit. What happens when your opponent has no Rhinos but two or even three Land Raiders? What happens when your opponent fields a non-MEQ army? You are left out in the wind with a bunch of numbers that have no value to them. What happens when you don't get the charge and are subjected to the charge instead? Once again you are left out in the cold, as NM has abandoned you. For units that have defensive rules/characteristics like counter charge, how are they factored into the equation? They aren't, as it only accounts for 4 small aspects/units in a game that is filled with numerous facets and nuances.

