How to test Outlaw Rogue - Frustrating, nickel-slot-machine of a spec.

enchlol

Legend
TL;DR I wanted to contribute useful data to the community. I failed to do so and instead wrote a bunch of words that go nowhere. If you have hard data to share or tips on how to make testing more reliable, please let me know.

Evening all,
I've been messing around with Rogue specs for a while now, working on a PvE-focused project, which naturally made me curious about weapon enchants. I hear things thrown out occasionally about this or that weapon / enchant combo being BiS, but I never see anybody share data on how they came to that conclusion. So I decided to test for myself and publish my findings. (It goes poorly.)

I decided to try to narrow down the best weapon + enchant combo for each spec. To start, I planned to try a number of different weapons un-enchanted for each spec, then take whichever weapon did the highest damage and test Ele Force against + Damage to determine the best combo for each spec. (I chose this order to minimize the number of enchants I would need to burn testing.)

Where I went wrong was starting the test with Outlaw. While RNG is prevalent in WoW, this spec takes it to a whole new level.

Leaving aside the RNG that all classes have, this spec has:
  • 30% chance to apply Instant Poison
  • 35% chance to hit twice with Sinister Strike
  • 30% chance to proc Main Gauche
  • 20% increased chance to crit if you use Between the Eyes (I didn't)

Yet, I pressed on.

Methodology
  • Test each weapon with no enchant.
  • Do 5 runs of 4 minutes each on the Raiders Training Dummy and record dps at the end of each.
  • Additionally, note the damage on 5 point evis (I planned on weighting this lower than dps).
  • Only use Slice and Dice and Evis as finishers. Prioritizing SnD. I didn't want to introduce even more variability by using Between the Eyes.
  • Use the same off-hand weapon for all tests. No damage enchant.
  • Use static, plus-stat trinkets (no on-proc or on-use effects).

The Weapons
Blue Moks
5 point evis tooltip = 231
Run 1 = 147 dps
Run 2 = 150 dps
Run 3 = 128 dps
Run 4 = 143 dps
Run 5 = 130 dps
Average = 139.6 dps

Revenger
5 point evis tooltip = 222
Run 1 = 137 dps
Run 2 = 140 dps
Run 3 = 133 dps
Run 4 = 130 dps
Run 5 = 139 dps
Average = 135.8 dps

Ced's Carver
5 point evis tooltip = 225
Run 1 = 145 dps
Run 2 = 138 dps
Run 3 = 133 dps
Run 4 = 142 dps
Run 5 = 143 dps
Average = 140.2 dps

Soulrender's (Socketed)
5 point evis tooltip = 227
Run 1 = 135 dps
Run 2 = 139 dps
Run 3 = 146 dps
Average = 140.0 dps (but only 3 runs)

At this point, whatever buff I had when I started this ordeal in Org (red dragon icon?) fell off. I hadn't noticed it when I started my test, but I decided I couldn't continue or I'd be comparing apples to oranges.

Weapons still to test
Coldrage Dagger, Green Cudgel (for science)

And of course, I haven't started testing the Assassination or Subtlety specs.

Conclusion
After banging away on a target dummy for well over an hour, I still don't think I have a solid answer. The averages of the runs other than Revenger's are all very close. Mok's has the highest tooltip damage and had the highest dps run. It also had the lowest dps run. I don't know if I just low-rolled the hell out of that 128 dps run, or if Mok's genuinely is more variable than the other weapons for some reason (which would call into question the high-end run as well).
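One way to put a number on "more variable" is to look at the spread of each weapon's runs next to its average. This is only a sketch using Python's standard library on the dps figures listed above; the dictionary and function names are made up for illustration.

```python
import statistics

# DPS readings copied from the runs listed above (4-minute runs, no enchant).
runs = {
    "Blue Moks": [147, 150, 128, 143, 130],
    "Revenger": [137, 140, 133, 130, 139],
    "Ced's Carver": [145, 138, 133, 142, 143],
    "Soulrender's": [135, 139, 146],  # only 3 runs
}


def summarize(samples):
    """Return run count, mean, sample standard deviation, and range."""
    return {
        "n": len(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),  # sample (n-1) standard deviation
        "min": min(samples),
        "max": max(samples),
    }


summary = {weapon: summarize(dps) for weapon, dps in runs.items()}

for weapon, s in summary.items():
    print(f"{weapon:14s} n={s['n']} mean={s['mean']:.1f} "
          f"sd={s['stdev']:.1f} range={s['min']}-{s['max']}")
```

On these numbers Mok's comes out with a standard deviation of about 10 dps, against roughly 4 to 6 dps for the other weapons. With only 3 to 5 runs per weapon, those spread estimates are themselves very rough.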

I'll edit this post if I get more data but I'm curious what methods everybody else uses when testing these things.

Cheers

Changelog:
Added averages based on @Monkeyflip's excellent suggestion.
 
Great work on your data with different types of specs!
It surprises me that there isn't that much of a difference if you do an average on all the different weapons that you've used.

Maybe you can do some different charts and calculations in the future. I think that would make for a much deeper test, and the results may depend on different variables. Or maybe not; we can only guess until someone runs several more tests.
 
It occurs to me that rather than just relying on the simple dps number, I could control for most of the confounding factors. That would require recording actual crit % vs expected crit % and adjusting accordingly. Similarly, I'd record the total number of strikes and actual poison applications vs expected poison applications (what counts as a strike?). Same for Main Gauche. And lastly, I could drop the Quick Draw talent (that kind of nerfs combo point generation but is probably the easiest way to control for it).
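A rough sketch of the "actual vs expected" comparison: treat each strike as an independent roll at the listed proc chance and see how far the observed proc count sits from the expected one. The function name and the run counts below are hypothetical, and the normal approximation to the binomial is an assumption that only holds up with a decent number of strikes.

```python
import math


def proc_luck(attempts, procs, chance):
    """Compare an observed proc count with its binomial expectation.

    Returns (expected_procs, z_score). A z-score near 0 means the run was
    about average; beyond roughly +/-2 it was unusually lucky or unlucky.
    Uses the normal approximation to the binomial distribution.
    """
    expected = attempts * chance
    spread = math.sqrt(attempts * chance * (1 - chance))
    return expected, (procs - expected) / spread


# Hypothetical run: 200 Sinister Strikes, 84 of which hit twice
# (35% is the chance listed in the first post).
expected, z = proc_luck(attempts=200, procs=84, chance=0.35)
print(f"expected {expected:.0f} procs, observed 84, z = {z:.2f}")
```

A run with a large z-score on any of the procs could be flagged or down-weighted rather than corrected exactly, which sidesteps having to model how each proc feeds into total dps.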

I'll end up doing more tests, but not tonight.
 
I mean, you can run a t-test on this data and it probably won’t qualify as statistically significant.
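For anyone who wants to check, here is a minimal sketch of Welch's t-test (the unequal-variance version) on the Mok's and Revenger runs from the first post, using only the standard library. It returns the t statistic and approximate degrees of freedom rather than a p-value; the function name is made up for illustration.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


moks = [147, 150, 128, 143, 130]
revenger = [137, 140, 133, 130, 139]
t, df = welch_t(moks, revenger)
print(f"t = {t:.2f}, df = {df:.1f}")
```

This gives t of about 0.78 with a little over 5 degrees of freedom. The two-sided 5% cutoff at 5 degrees of freedom is roughly 2.57, so the gap between those two weapons is well inside the noise.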

In general with noisy data the solution is… just collect more data. Getting 10 sets of 10 minutes per condition should give you a signal.
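To get a feel for how much "more data" means, a standard back-of-the-envelope formula estimates the runs needed per weapon to detect a given dps gap. This is a sketch under loud assumptions: it uses a normal approximation, a per-run standard deviation of 10 dps (roughly the spread of the Mok's runs in the first post), and an assumed true gap of 4 dps. The function name and defaults are made up for illustration.

```python
import math
from statistics import NormalDist


def runs_needed(stdev, gap, alpha=0.05, power=0.80):
    """Approximate runs per weapon to detect a mean dps difference of `gap`.

    Normal-approximation formula for comparing two means with equal spread:
    n = 2 * ((z_alpha + z_power) * stdev / gap) ** 2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # desired detection rate
    return math.ceil(2 * ((z_alpha + z_power) * stdev / gap) ** 2)


# Assumed inputs: 10 dps run-to-run spread, 4 dps true gap between weapons.
print(runs_needed(stdev=10, gap=4))
```

Under those assumptions the estimate is about 99 runs per weapon. Longer runs should tend to have a smaller run-to-run spread, and the required count falls with the square of that spread, so run length matters as much as run count.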

You also want to reduce any variance on the testing side: write a stable /cast sequence macro, automate clicks with a simple python mouse controller, etc.
 
TBH PvE at 20 isn't challenging enough to justify DPS benchmarks. Bosses die fast enough that dps isn't worth worrying about.
 