What is refactoring and what is only modifying code?

With Martin Fowler's definition in mind,

Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior.

... I think you are clearly right.

They also suggested things like changing data structures (like a Java LinkedList to an ArrayList), changing algorithms (using merge sort instead of bubble sort), and even rewriting large chunks of code as refactoring.

Changing an algorithm to something much faster is obviously not refactoring, because external behaviour is changed! (Then again, if the effect is never noticeable, perhaps you could call it refactoring after all - and also premature optimisation. :-)
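Whether an algorithm swap counts as refactoring depends on whether running time is treated as part of "external behavior". A minimal Python sketch of the question's sorting example (the function names are illustrative, not from the question): both functions return the same result for the same input, so a test on outputs cannot tell them apart. Only the running time differs.

```python
def bubble_sort(items):
    """O(n^2) sort; returns a new sorted list."""
    result = list(items)
    for end in range(len(result) - 1, 0, -1):
        for i in range(end):
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
    return result


def merge_sort(items):
    """O(n log n) sort; returns a new sorted list."""
    items = list(items)
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One of the two tails is empty; the other is already sorted.
    return merged + left[i:] + right[j:]


data = [5, 2, 9, 1, 5, 6]
assert bubble_sort(data) == merge_sort(data)
```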

This is a pet peeve of mine; it's annoying when people use the term sloppily - I've even come across some who might casually use refactoring for basically any kind of change or fix. Yeah, it's a hip and cool buzzword and all, but there's nothing wrong with plain old terms like change, rewrite or performance improvement. We should use those when appropriate, and reserve refactoring for cases when you are truly just improving the internal structure of your software. Within a development team, especially, having a common language for accurately discussing your work does matter.


Martin Fowler's "Refactoring: Improving the Design of Existing Code" is perhaps THE reference:

Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which is "too small to be worth doing". However, the cumulative effect of each of these transformations is quite significant. By doing them in small steps you reduce the risk of introducing errors. You also avoid having the system broken while you are carrying out the restructuring - which allows you to gradually refactor a system over an extended period of time.

Refactoring goes hand-in-hand with unit testing. Write tests before you refactor and then you have a confidence level in the refactoring (proportional to the coverage of the tests).
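As a rough illustration of "tests first, then refactor", here is a small Python sketch. The report function and its input format are made up for the example. The checks are written against the old code and then run unchanged against the refactored version.

```python
# Before: parsing, summing and formatting are tangled in one loop.
def report_before(lines):
    total = 0
    for line in lines:
        name, amount = line.split(",")
        total += int(amount)
    return "TOTAL: " + str(total)


# After: the parsing step is extracted into its own function.
def parse_amount(line):
    _name, amount = line.split(",")
    return int(amount)


def report_after(lines):
    total = sum(parse_amount(line) for line in lines)
    return f"TOTAL: {total}"


# Written before touching the code; pins the current behavior.
def check(report):
    assert report([]) == "TOTAL: 0"
    assert report(["a,1", "b,2"]) == "TOTAL: 3"
    assert report(["x,-5"]) == "TOTAL: -5"


check(report_before)  # passes before the refactoring
check(report_after)   # same checks, untouched, pass afterwards
```

The confidence you get is only as good as the cases in `check`; inputs it never exercises (malformed lines, say) could still behave differently.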

A good reference is: Information about Refactoring


To give my view:

Small, incremental changes that leave the code in a better state than it was found

Definitely Yes: "Cosmetic" changes that are not directly related to features (i.e. it's not billable as a change request).

Definitely No: Rewriting large chunks clearly violates the "small, incremental" part. Refactoring is often used as the opposite of a rewrite: instead of doing it again, improve the existing.

Definitely Maybe: Replacing data structures and algorithms is somewhat of a border case. The deciding difference here IMO is the small steps: be ready to deliver, be ready to work on another case.


Example: Imagine you have a Report Randomizer Module that is slowed down by its use of a vector. You've profiled that vector insertions are the bottleneck, but unfortunately the module relies on contiguous memory in many places so that when using a list, things would break silently.

Rewriting would mean throwing the Module away and building a better and faster one from scratch, just picking some pieces from the old one. Or writing a new core, then fitting it into the existing dialog.

Refactoring would mean taking small steps to remove the pointer arithmetic, so that the switch becomes possible. Maybe you even create a utility function wrapping the pointer arithmetic, replace direct pointer manipulation with calls to that function, then switch to an iterator so that the compiler complains about places where pointer arithmetic is still used, then switch to a list, and then remove the utility function.
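The example above is C++, but the same sequence of steps can be sketched in Python as a loose analogue. Here direct slicing stands in for the pointer arithmetic, `collections.deque` stands in for the list, and all function names are hypothetical. Each step keeps the same checks passing, so you could stop and ship after any of them.

```python
from collections import deque
from itertools import islice


# Step 0: original code. Slicing ties every caller to list
# (a deque does not support slicing).
def first_reports_v0(reports, count):
    return reports[:count]


# Step 1: wrap the container-specific access in one helper.
def take(reports, count):
    """Return the first `count` items of any iterable as a list."""
    return list(islice(reports, count))


# Step 2: replace direct slicing with calls to the helper.
def first_reports_v1(reports, count):
    return take(reports, count)


# Step 3: with slicing gone, the container can be switched.
def add_front(queue, report):
    queue.appendleft(report)  # O(1) on a deque; list.insert(0, x) is O(n)
    return queue


data = ["a", "b", "c"]
assert first_reports_v0(data, 2) == ["a", "b"]          # before
assert first_reports_v1(data, 2) == ["a", "b"]          # after step 2
assert first_reports_v1(deque(data), 2) == ["a", "b"]   # after step 3
```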


The idea behind it is that code gets worse on its own. When fixing bugs and adding features, quality decays in small steps - the meaning of a variable subtly changes, a function gets an additional parameter that breaks isolation, a loop gets a bit too complex, etc. None of these is a real bug, and you can't name a line count that makes the loop too complex, but you hurt readability and maintainability.

Similarly, changing a variable name or extracting a function isn't a tangible improvement on its own. But all together, they fight the slow erosion.

Like a wall of pebbles where every day one falls to the ground. And every day, one passerby picks it up and puts it back.


Fowler draws a clean line between changes to code that do, and those that do not, affect its behavior. He calls those that do not, "refactoring". This is an important distinction, because if we divide our work into refactoring and non-refactoring code modification activities (Fowler calls it "wearing different hats"), we can apply different, goal-appropriate techniques.

If we are making a refactoring, or behavior-preserving code modification:

  • all our unit tests should pass before and after the modification
  • we should not need to modify any tests, or write any new ones
  • we expect cleaner code when we are done
  • we do not expect new behavior

If we are making a behavior-changing code modification:

  • we expect new behavior
  • we should write new tests
  • we may get dirtier code when we are done (and should then refactor it)
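The two checklists above can be sketched in Python. The pricing function, the tax rate and the discount rule are invented for the illustration, and amounts are in integer cents to keep the arithmetic exact. The refactoring step leaves the existing test alone; the behavior change needs a new one.

```python
# Starting point: works, but the names and the magic number are unclear.
def price_v0(q, u):
    t = q * u
    return t + t * 20 // 100


# Refactoring hat: rename and name the constant. No new behavior.
TAX_PERCENT = 20


def price_v1(quantity, unit_cents):
    net = quantity * unit_cents
    return net + net * TAX_PERCENT // 100


# Feature hat: a bulk discount is new behavior, so it gets a new test.
def price_v2(quantity, unit_cents):
    net = quantity * unit_cents
    if quantity >= 10:
        net = net * 90 // 100  # 10% off for ten or more items
    return net + net * TAX_PERCENT // 100


# Existing test: passes before and after the refactoring, unmodified.
assert price_v0(2, 1000) == 2400
assert price_v1(2, 1000) == 2400

# Behavior change: the old test still holds, and a new test covers the feature.
assert price_v2(2, 1000) == 2400
assert price_v2(10, 1000) == 10800
```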

If we lose sight of this distinction, then our expectations for any given code modification task are muddled and complex, or at any rate more muddled and more complex than if we are mindful of it. That is why the word and its meaning are important.
