Whenever your predictions are that far off from the results, there is a problem in how you are designing and assessing changes.
Maybe your playtesting needs to involve skilled players rather than staff. You may be in an echo chamber. You may not be comparing apples to, well, earlier versions of apples.