Quantifying the Benefits of Refactoring Work

When last we talked about estimating and prioritizing code maintenance, we'd left it as an exercise for the reader to devise a method to compare the relative size of the benefit from two different refactorings. In other words, you can work on A or B; which one is likely to give a greater benefit? I've had an idea for how to do this. It even includes some satisfying math.

Start with the premise that, when estimating the size of an effort, it is sufficient (nay, preferable) to compare values in an abstract unit ("story points") that does not map directly to any real-world unit (such as "hours"). When you try to estimate effort in a real-world unit, people get hung up on the wrong details. Better to use an abstract unit that lets you compare the relative sizes of two efforts. Then you can prioritize them ("Let's do the smaller one."), and over time you develop a predictive measure of how many units your team can deliver. Can we create a similarly abstract yet useful unit for comparing benefits?

You undertake a refactoring because you want to make the code better. The benefit from this work comprises two pieces: how much you're going to improve an area of the codebase, and how important that area is. Your impact will be made up of how much "betterness" you can impart, and how much the betterness will matter.

You've probably had a similar experience: You find an area of the code that makes you frankly itch to improve it. You could dramatically increase its beauty. But it's not really used in very many places, and business needs hardly ever drive you to change it, so its beauty or lack thereof is actually rather insignificant. On the other hand, there's a class that makes you queasy every time you have to interact with it, but it's used everywhere, so changing it would be a huge, risky undertaking. It stays ugly, despite being so important.

There are a number of intuitive and emotional influences in those decisions, and they interact with each other in additive and multiplicative ways. This is a good place to apply some rigor, to get the emotions out of the way and compare options more objectively. You apply a similar rigor when you tackle a difficult decision by actually writing down the pros and cons in two columns on a piece of paper, so that you can see how the two sides stack up. So let's apply that to comparing the possible benefits from two refactorings. We only have time to do one, so we're trying to decide which one to do.

Consider a questionnaire, with pairs of questions, asking about what the code might be like after you're done.

  1. Future proofing:

    a. How much easier will it be to change it the next time?

    b. How often do we get asked to change this area?

  2. Regression proofing:

    a. How much better will our test suite be able to prevent defects in this area?

    b. How business-critical is it that we don't introduce defects in this area?

  3. Avoiding risk:

    a. How likely are we to succeed without creating problems?

    b. How tolerant is our business of risk in this area?

  4. Improving satisfaction and increasing velocity:

    a. How likely are we to reduce tech support incidents by this work?

    b. How many of our tech support incidents can be attributed to this area?

And so on, with questions that are specific to your own team. Collaborate with your team to create the questions, so that they represent the team's decisions. Note how each pair of questions covers how much improvement you expect and how much that improvement will matter. Also see that for each question, a more emphatic answer is a good thing. For example, the avoiding-risk question asks how likely we are to succeed, not how risky the task is. Questions are phrased so that the more you say "Yes, a lot," the stronger the endorsement in favor of doing the work.

Now, depending on your bent, this next bit will remind you either of a prioritization matrix or a Cosmo quiz. Nevertheless, for each question in your questionnaire, answer with a 1, 3, or 5, representing "barely," "some," or "a lot." Then multiply each pair of a and b answers and add up the products: (1a * 1b) + (2a * 2b) + (3a * 3b) + ... In this way, you represent the multiplicative relationship between How Much Better and How Much Does It Matter.
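The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bunit_score`, the topic labels, and the sample answers are all made up for this example, and your team's questionnaire will have its own pairs.

```python
from typing import Dict, Tuple

# Allowed answers: 1 = "barely", 3 = "some", 5 = "a lot".
VALID_ANSWERS = {1, 3, 5}


def bunit_score(answers: Dict[str, Tuple[int, int]]) -> int:
    """Sum the products of each (how much better, how much it matters) pair.

    `answers` maps a topic label to its pair of answers. The labels are
    only for readability; they do not affect the score.
    """
    total = 0
    for topic, (how_much_better, how_much_it_matters) in answers.items():
        for value in (how_much_better, how_much_it_matters):
            if value not in VALID_ANSWERS:
                raise ValueError(f"{topic}: answer must be 1, 3, or 5, got {value}")
        total += how_much_better * how_much_it_matters
    return total


# Hypothetical answers for one candidate refactoring (illustrative values only).
example_answers = {
    "future proofing": (5, 1),            # much easier to change, rarely changed
    "regression proofing": (3, 3),
    "avoiding risk": (1, 5),
    "satisfaction and velocity": (3, 1),
}

print(bunit_score(example_answers))  # 5 + 9 + 5 + 3 = 22
```

Notice how the first pair contributes only 5 despite the big improvement: a large "how much better" is held down by a small "how much does it matter," which is exactly the multiplicative effect described above.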

We need a unit for these scores. I'm thinking "bunits" (BYOO-nits) because they are a unit of benefit and of beauty. So if you have two refactoring tasks on the table during your sprint planning, and Refactoring A has an effort of 13 story points and a benefit of 17 bunits, while Refactoring B has an effort of 8 story points and a benefit of 23 bunits, you can lean towards choosing Refactoring B, which costs less and promises more. As with all matrices of this type, if that decision completely flouts your intuition, then discuss it with your team. Maybe your intuition can be calmed, or maybe the questionnaire is failing to cover an important aspect of your work and needs some additional questions.
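Here is one way to sketch that comparison in Python, using the numbers from the example. The `Candidate` class and the `lean_towards` helper are hypothetical names for illustration. The rule shown is an assumption on my part: it picks a winner only when one option is no worse on both effort and benefit, and otherwise returns nothing, leaving the call to the team.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    effort_points: int   # estimated effort, in story points
    benefit_bunits: int  # questionnaire score, in bunits


def dominates(x: Candidate, y: Candidate) -> bool:
    """True if x costs no more and benefits no less than y, and is strictly better on one."""
    no_worse = (x.effort_points <= y.effort_points
                and x.benefit_bunits >= y.benefit_bunits)
    strictly_better = (x.effort_points < y.effort_points
                       or x.benefit_bunits > y.benefit_bunits)
    return no_worse and strictly_better


def lean_towards(x: Candidate, y: Candidate) -> Optional[Candidate]:
    """Return the clearly better candidate, or None when it is a real trade-off."""
    if dominates(x, y):
        return x
    if dominates(y, x):
        return y
    return None  # no clear winner: time for a team discussion


refactoring_a = Candidate("Refactoring A", effort_points=13, benefit_bunits=17)
refactoring_b = Candidate("Refactoring B", effort_points=8, benefit_bunits=23)

choice = lean_towards(refactoring_a, refactoring_b)
print(choice.name if choice else "No clear winner; discuss with the team")
# prints "Refactoring B"
```

When one option costs more but also delivers more, this sketch deliberately declines to choose. You could rank by bunits per story point instead, but that treats the two abstract units as more precise than they probably are.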

The real benefit of a refactoring is proven only over time. Keep your previous bunit estimates on hand, see how those refactorings actually turned out, and use that history in your comparisons and trade-offs when selecting refactorings to undertake. Add this technique as another tool to help you understand the parameters of your decision, but never to override your own good sense.

This is a technique for quantifying the benefit of work that does not have a direct dollar ROI. Why quantify benefit, especially in a unit that has no analog in the real world? Two reasons. First, if you're going to prioritize your to-do list, every item needs some sense of effort and some sense of reward. It just makes sense to do easy things with lots of benefit before hard things that barely matter. Second, asking yourself questions like the above imparts rigor to your decision-making process. We are seekers of beauty, and want desperately to clean up whatever icky thing we looked at most recently. Surveying the choices and comparing relative benefits prods us to make sound business decisions, instead of scratching an itch.