Well I wasn't going to explicitly get into which formula is correct, but if you really want to...
Originally Posted by
zealous
Would you expect to get significantly different results for either a multiplicative duration or an additive attack rate model fitted to your data?
The multiplicative duration model is:
duration = A * (1 - a) * (1 - b) * (1 - c) * ... + B
with A and B being constants, and a, b, c, etc. being the alacrity modifiers (so haste boost 1 would be 0.15, jorgundal's collar would be 0.1, etc.). Converted into swings per unit time (i.e. per minute), it then becomes:
swings per minute = 1 / [ A * (1 - a) * (1 - b) * (1 - c) * ... + B ]
However, for my additive model, the swings per minute is:
swings per minute = C * [1 + D*(a + b + c + ...) ]
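For concreteness, here's a minimal sketch of the two models in Python (the function names, the example modifier values, and the choice of seconds as the duration unit are mine, not part of the original formulas; the constants are the THF values used further down):
Code:
def duration_multiplicative(mods, A, B):
    # Multiplicative duration model: duration = A * (1-a)*(1-b)*... + B
    product = 1.0
    for m in mods:
        product *= (1.0 - m)
    return A * product + B

def spm_additive(mods, C, D):
    # Additive attack-rate model: swings per minute = C * (1 + D * (a + b + ...))
    return C * (1.0 + D * sum(mods))

# Example: haste spell (0.15) plus a 25% haste boost, with the THF constants from below
mods = [0.15, 0.25]
print(60.0 / duration_multiplicative(mods, 0.498012, 0.174479))  # multiplicative, as swings/min
print(spm_additive(mods, 87.2, 1.0))                             # additive, swings/min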
These two formulae are mutually incompatible; that is, given constants A and B for the first equation, it is generally impossible to find C and D for the second equation that match the first for arbitrary values of a, b, c. In other words, it is generally impossible to find A, B, C, and D such that the following holds:
1 / [ A * (1 - a) * (1 - b) * (1 - c) * ... + B ] = C * [1 + D*(a + b + c + ...) ]
for arbitrary a, b, c, etc. Or at least, I wasn't able to. If you find a solution, let me know.
So one of the two is correct and the other is not (or possibly both are wrong). In other words, there is a fundamental difference in the structure of the models, which is what I said before. What remains, then, is to see which one can actually match the data.
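If you want to see this numerically, here's a quick sketch (entirely my own illustration, with arbitrary example constants): fix A and B, then let a least-squares routine pick the best possible C and D over a grid of modifier combinations. The residuals never collapse to zero, which is the structural mismatch in concrete form.
Code:
import numpy as np
from scipy.optimize import least_squares

A, B = 0.5, 0.17  # arbitrary example constants for the multiplicative model

# A grid of (enhancement bonus, haste boost) combinations to compare over
combos = [(a, b) for a in (0.0, 0.10, 0.15) for b in (0.0, 0.15, 0.30, 0.45)]

def spm_mult(a, b):
    # swings per minute from the multiplicative duration model (duration in seconds)
    return 60.0 / (A * (1 - a) * (1 - b) + B)

def residuals(params):
    C, D = params
    return [C * (1 + D * (a + b)) - spm_mult(a, b) for a, b in combos]

fit = least_squares(residuals, x0=[85.0, 1.0])
print(fit.x)                      # the best C and D the additive form can manage
print(np.abs(fit.fun).max())      # largest leftover mismatch -- it does not reach zero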
In the multiplicative duration model, using A = 0.498012 and B = 0.174479, this is what I get for the duration for two-handed fighting, or THF (let me know if you got something different):
Code:
enh_bon 0% 15% 20% 25% 30%
none 0.67249 0.59779 0.57289 0.54799 0.52309
collar 0.62269 0.55546 0.53305 0.51064 0.48823
haste 0.59779 0.53429 0.51313 0.49196 0.47080
By contrast, this is what I get by using my model above (i.e. C = 87.2 and D = 1), converted into duration (in seconds):
Code:
enh_bon 0% 15% 20% 25% 30%
none 0.68807 0.59832 0.57339 0.55046 0.52929
collar 0.62552 0.55046 0.52929 0.50968 0.49148
haste 0.59832 0.52929 0.50968 0.49148 0.47453
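For anyone who wants to regenerate those two grids, here's roughly how I'd do it (assuming rows of none / collar / haste spell and columns of 0% / 15% / 20% / 25% / 30% haste boost, matching the layout of the TWF tables further down):
Code:
# Reproduce both THF duration grids above from the quoted constants
enh_bonuses = [0.0, 0.10, 0.15]             # none, jorgundal's collar, haste spell
haste_boosts = [0.0, 0.15, 0.20, 0.25, 0.30]

A, B = 0.498012, 0.174479                   # multiplicative duration constants
C, D = 87.2, 1.0                            # additive swings-per-minute constants

print("multiplicative durations:")
for enh in enh_bonuses:
    print(" ".join(f"{A * (1 - enh) * (1 - hb) + B:.5f}" for hb in haste_boosts))

print("additive durations:")
for enh in enh_bonuses:
    print(" ".join(f"{60.0 / (C * (1 + D * (enh + hb))):.5f}" for hb in haste_boosts))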
If we now take the percentage error, that is:
(model's predicted value - data's value) / data's value
then for the multiplicative model above, it becomes:
Code:
enh_bon 0% 15% 20% 25% 30%
none -3.385% 0.229% -0.413% -0.358% -0.962%
collar -0.370% 1.001% 0.302% 0.425% -0.158%
haste -0.368% 1.159% 0.402% 0.032% -0.897%
while for the linear additive model, it is:
Code:
enh_bon 0% 15% 20% 25% 30%
none -1.147% 0.319% -0.325% 0.092% 0.212%
collar 0.083% 0.092% -0.406% 0.238% 0.508%
haste -0.279% 0.212% -0.272% -0.066% -0.111%
Now, neither model does well when there are no boosts at all (the upper-left cell). However, as explained in my post, due to the testing methodology, the resolution of the measurement was 1 video frame (0.1 sec), so that's the measurement error. The average testing length was 17.6 seconds, so this works out to a percentage error of 0.57%. Thus an error below this may simply be due to the limits of the testing method, but an error above this is not explained by the measurement methodology and is due to modeling error, lag, computer processing load, etc. The multiplicative duration model has no fewer than 4 out of 14 (non-unboosted) values that are above this measurement error, while all the predicted values for the additive model (except the unboosted case, which was disregarded) are within it.
"Now wait" you say, "maybe there was something wrong, or your computer is really crappy and unsteady when you made those videos, and all that bad stuff...the difference wasn't that much beyond the measurement error" and you'd have a point. The problem is that there were only two kinds of boosts used for THF, an enhancement bonus (from haste spell or the collar) and the bonus from the haste action boost. However, the difference between an additive model and a multiplicative model is going to be at the edges of the situations, in other words, a linear function and a nonlinear function will be relatively similar in normal circumstances (where you made the fit), and differ as you move away from them. This is roughly the justification for using small-angle approximations and more generally, Taylor series expansion, that nonlinear and linear (or, polynomial) functions will be closely approximate within a given (small) range.
The key, then, is to add more boosts (or have very few boosts) -- more generally, to cover a wide range of boosts -- in order to get to the edges of the situations. Fortunately, because of the tempest bug, this can be done. If you go back to my post, you'll see that for THF, the tested range was from 0% to 45% of the base attack speed, whereas for TWF, the range was 0% to 55% -- or actually, 0% to 66% if you include the fact that TWF gets more benefit from alacrity bonuses. Unfortunately, I haven't gotten around to filling out the rest of the tempest chart yet. However, I can insert the tempest data in with the rest of the TWF data, in swings per minute:
Code:
enh_bon 0% temp10% 15% 20% 25% 30% t+30%
none 86.7 97.1 101.8 107.9 112.9 118.0 128.3
collar 96.6 107.3 112.9 117.4 122.7 127.8 138.7
haste 102.1 112.9 118.0 122.7 128.3 133.3 143.5
where "temp10%" refers to the tempest alacrity 10% bonus but no haste boosts, the percentages refer to the amount of haste boost, and "t+30%" refers to tempest alacrity along with 30% haste boost. If you prefer the data in terms of duration:
Code:
enh_bon 0% temp10% 15% 20% 25% 30% t+30%
none 0.69167 0.61786 0.58929 0.55625 0.53125 0.50833 0.46750
collar 0.62143 0.55938 0.53125 0.51111 0.48889 0.46944 0.43250
haste 0.58750 0.53125 0.50833 0.48889 0.46750 0.45000 0.41818
How this data was arrived at is explained in my first post; I just combined the TWF and TWF tempest data sets.
To fit this data using a multiplicative duration model, the best I could come up with was:
A = 0.546725
B = 0.120909
Let me know if you can find better values for the constants A and B. I ignored the duration for the unboosted case in my best fit. The model's predictions ended up being (in duration):
Code:
enh_bon 0% temp10% 15% 20% 25% 30% t+30%
none 0.66763 0.61296 0.58563 0.55829 0.53095 0.50362 0.46535
collar 0.61296 0.56376 0.53915 0.51455 0.48995 0.46535 0.43090
haste 0.58563 0.53915 0.51592 0.49268 0.46945 0.44621 0.41368
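If anyone wants to double-check those constants outside of Excel, note that the duration formula is linear in A and B, so an ordinary least-squares solve finds them directly. Here's a sketch of how I'd set that up (my own code, assuming the modifier assignments described above and a plain least-squares objective on the durations, which may not be exactly the objective the Excel solver used):
Code:
import numpy as np

# Observed TWF durations in seconds (the table above); rows = none/collar/haste
obs = np.array([
    [0.69167, 0.61786, 0.58929, 0.55625, 0.53125, 0.50833, 0.46750],
    [0.62143, 0.55938, 0.53125, 0.51111, 0.48889, 0.46944, 0.43250],
    [0.58750, 0.53125, 0.50833, 0.48889, 0.46750, 0.45000, 0.41818],
])

enh = [0.0, 0.10, 0.15]   # none, collar, haste spell
# modifiers per column: 0%, temp10%, 15%, 20%, 25%, 30%, t+30% (= tempest 10% + boost 30%)
cols = [(), (0.10,), (0.15,), (0.20,), (0.25,), (0.30,), (0.10, 0.30)]

# duration = A * prod(1 - m) + B  is linear in (A, B)
design, y = [], []
for i, e in enumerate(enh):
    for j, c in enumerate(cols):
        if i == 0 and j == 0:
            continue                         # skip the unboosted cell, as in the post
        factor = (1 - e) * np.prod([1 - m for m in c])
        design.append([factor, 1.0])
        y.append(obs[i, j])

(A, B), *_ = np.linalg.lstsq(np.array(design), np.array(y), rcond=None)
print(A, B)                                  # compare with A = 0.546725, B = 0.120909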
Using the above TWF formula (C = 86.6 and D = 1.2), converted into duration (in seconds), I get:
Code:
enh_bon 0% temp10% 15% 20% 25% 30% t+30%
none 0.69284 0.61861 0.58715 0.55874 0.53295 0.50944 0.46814
collar 0.61861 0.55874 0.53295 0.50944 0.48792 0.46814 0.43303
haste 0.58715 0.53295 0.50944 0.48792 0.46814 0.44990 0.41737
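As a sanity check on those C and D values: since swings per minute = C + (C*D)*(sum of modifiers), a straight-line regression of the swings-per-minute table against the modifier sums recovers them. A sketch (my own setup; the quoted constants came from the fit in my first post, so they won't necessarily match to every decimal):
Code:
import numpy as np

# Observed TWF swings per minute (the earlier table); rows = none/collar/haste
spm = np.array([
    [ 86.7,  97.1, 101.8, 107.9, 112.9, 118.0, 128.3],
    [ 96.6, 107.3, 112.9, 117.4, 122.7, 127.8, 138.7],
    [102.1, 112.9, 118.0, 122.7, 128.3, 133.3, 143.5],
])

enh = [0.0, 0.10, 0.15]                            # none, collar, haste spell
col = [0.0, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40]    # t+30% counted as 0.10 + 0.30
sums = np.array([[e + c for c in col] for e in enh])

# spm = C * (1 + D*sum) = C + (C*D)*sum  -->  a straight line in the modifier sum
slope, intercept = np.polyfit(sums.ravel(), spm.ravel(), 1)
C = intercept
D = slope / C
print(C, D)                                        # should land near C = 86.6, D = 1.2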
If you again take the percentage error, you'll find that for the multiplicative duration model, the error for each scenario is:
Code:
enh_bon 0% temp10% 15% 20% 25% 30% t+30%
none -3.475% -0.792% -0.621% 0.367% -0.056% -0.928% -0.461%
collar -1.363% 0.783% 1.488% 0.673% 0.217% -0.873% -0.369%
haste -0.319% 1.488% 1.492% 0.776% 0.416% -0.842% -1.076%
For the additive model, it's:
Code:
enh_bon 0% temp10% 15% 20% 25% 30% t+30%
none 0.170% 0.121% -0.362% 0.448% 0.321% 0.218% 0.136%
collar -0.454% -0.113% 0.321% -0.327% -0.199% -0.279% 0.121%
haste -0.059% 0.321% 0.218% -0.199% 0.136% -0.023% -0.193%
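If you want to regenerate those error tables yourself, it's just the percentage-error formula from earlier applied cell by cell. A small sketch using one row of the tables above (the predicted grids can be built the same way as in the earlier snippets):
Code:
import numpy as np

def pct_error(predicted, observed):
    # (model's predicted value - data's value) / data's value, in percent
    predicted, observed = np.asarray(predicted), np.asarray(observed)
    return 100.0 * (predicted - observed) / observed

# e.g. the "haste" row: observed TWF durations vs the additive model's predictions
observed  = [0.58750, 0.53125, 0.50833, 0.48889, 0.46750, 0.45000, 0.41818]
predicted = [0.58715, 0.53295, 0.50944, 0.48792, 0.46814, 0.44990, 0.41737]
print(np.round(pct_error(predicted, observed), 3))   # compare with the last row above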
For the TWF case, you can see that even disregarding the unboosted case (which was not used to find the optimal A and B), there are no fewer than 13 out of the remaining 20 scenarios in which the discrepancy between the model's predicted value and the data is greater than the measurement error -- some by nearly 1.5%, which would represent a 2.6-frame difference from the actual data (easily noticeable when the measurement's resolution is 1 frame). By contrast, the linear model matches all observed scenarios within the measurement error -- and as a bonus, predicts the unboosted amount as well.
Furthermore, the errors in the multiplicative model are not random -- they are negative at the low end (few boosts), positive in the middle (medium boosts), and negative again at the high end (lots of boosts), which is exactly what you'd expect if you try to fit a nonlinear curve to a linear set of data (consider what happens if you try to fit an exponential curve to some linear data: the errors would go one direction at the ends, and the opposite direction in the middle). As a side note, remember that though I gave the data in terms of the duration of a swing, in terms of swings per minute the additive model is linear with respect to the alacrity modifiers whereas the multiplicative model is basically hyperbolic (nonlinear). I thought maybe there's some kind of test to determine whether the model you're using is fundamentally different from the data, but I can't think of it off the top of my head.
Returning to the unboosted attack speed: other than loading up on boosts, the other edge is to have no boosts at all. In this case, the linear additive model for TWF fits the data pretty well even for the unboosted case, while THF was off by -1.147%. However, that was because I explicitly disregarded that data point and treated it as a special case (and did not try to fit it). If I had chosen to include it instead, the formula for THF would have been 86.6215 * (1 + 1.0307*sum of alacrity modifiers), and the resulting percentage error (in terms of duration) would have been:
Code:
enh_bon 0% 15% 20% 25% 30%
none -0.487% 0.586% -0.170% 0.145% 0.171%
collar 0.471% 0.145% -0.446% 0.110% 0.299%
haste -0.014% 0.171% -0.399% -0.273% -0.393%
Note that now all errors are within the measurement error. For the 0.586% case (which is slightly above the quoted average measurement error of 0.57%), that particular test ran 167 frames, so its own measurement error was 1/167 = 0.599%, and the 0.586% falls within it. So, had I chosen to, both of my formulas (THF and TWF) could have included the unboosted attack speed and still remained within the bounds of the measurement error.
By contrast, if the multiplicative model had tried to take into account the unboosted attack speed without the "fudge factor" of 0.95, then fully 8 out of the 15 THF scenarios would have ended up outside the measurement error, indicating modeling problems. And for TWF, it would have been 15 out of 21 scenarios exceeding the measurement error, with multiple scenarios being around 1.8% off from the observed values. So treating the unboosted THF attack speed separately was a luxury for my model -- it was not necessary within the bounds of measurement error, but I preferred a better match to the rest of the data -- whereas it was absolutely necessary for the multiplicative model to treat unboosted THF and TWF as separate cases, since it was an "edge" case. This is why the original formulation had that "fudge factor" of 0.95 thrown into the mix whenever alacrity bonuses are involved -- it effectively uses one value of A for the unboosted case and a different A in all the other cases (i.e. when there are alacrity bonuses) in order to fit the data, whereas in my linear additive model no changes were needed -- I just preferred to lay it out separately.
The last thing I wanted to point out about the THF unboosted scenario is that a percentage error of -1.147% was significant enough for me to treat it as a special case, while for TWF, the percentage error of the multiplicative model exceeds this amount 4 times -- 5 if you count the unboosted case. And for THF, the multiplicative model doesn't exceed it (except in the unboosted case, of course) but approaches it (errors beyond +/-0.9%) 4 times, yet is considered to fit "pretty much perfectly" with the data. That this difference was big enough for me to make it a special case while being small enough to count as fitting "pretty much perfectly" for the multiplicative model speaks volumes about how accurately each model can match the experimental results.
So the multiplicative model is demonstrably off from the observed results, beyond what can be accounted for by experimental measurement error, by as much as a quarter-second over the length of a haste boost (the average testing length was around 17.6 seconds), while the linear model fits all the current data to within 0.1 seconds (the limit of the testing). You can draw your own conclusions from there.
You are welcome to see if you can find A and B for the multiplicative model that fit the data better. It's always possible that Excel found the wrong local optimum when I ran the solver routine.
Originally Posted by
zealous
F-statistic looks good, R-squared looks good, residuals looks good.
Wow, the model fits pretty much perfectly, then this must truly be the real deal!???
There's no need to appeal to more complicated statistical tests such as the F-test in this case, since interpreting them (i.e. what values are good or bad) depends on the situation. For example, the R-squared may have looked pretty good, but the model's error is demonstrably higher than what is accounted for by measurement error, indicating problems with the modeling -- which the R-squared value won't tell you. After all, you probably got something like R^2 = 0.998 for the multiplicative model (at least I did in the first post), which usually indicates a pretty good fit; however, it's still not good enough in this case due to the precision of the measurement. And if you did the same analysis with the additive model, I'd bet you'd find that by those tests it looks even better.
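For what it's worth, here's roughly how an R-squared number falls out of the TWF duration tables above, and why a value that close to 1 can still hide cell errors larger than the frame resolution (my own sketch; the 0.998 quoted above came from the first post's fit, so this won't necessarily reproduce that exact figure):
Code:
import numpy as np

def r_squared(observed, predicted):
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Observed TWF durations vs the multiplicative model's predictions (tables above)
obs = [0.69167, 0.61786, 0.58929, 0.55625, 0.53125, 0.50833, 0.46750,
       0.62143, 0.55938, 0.53125, 0.51111, 0.48889, 0.46944, 0.43250,
       0.58750, 0.53125, 0.50833, 0.48889, 0.46750, 0.45000, 0.41818]
mult = [0.66763, 0.61296, 0.58563, 0.55829, 0.53095, 0.50362, 0.46535,
        0.61296, 0.56376, 0.53915, 0.51455, 0.48995, 0.46535, 0.43090,
        0.58563, 0.53915, 0.51592, 0.49268, 0.46945, 0.44621, 0.41368]
print(r_squared(obs, mult))   # comes out around 0.99 even though several cells miss by more than 0.57%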
Originally Posted by
zealous
I'd say it being reasonable and generally applicable are two quite important points too.
I personally think that the additive model is a lot easier to work with than trying to fit in a hyperbolic formula (which is what the multiplicative model ends up being). It's much more intuitive to think that a 15% haste spell and a 20% haste boost mean 15% + 20% = 35% more DPS than to spend the time figuring out what the multiplicative model works out to.