What can you do to a legacy codebase that will have the greatest impact on improving the quality?

This is a GREAT book.

If you don't like that answer, then the best advice I can give would be:

  • First, stop making new legacy code[1]

[1]: Legacy code = code without unit tests and therefore an unknown

Changing legacy code without an automated test suite in place is dangerous and irresponsible. Without good unit test coverage, you can't possibly know what effect those changes will have. Feathers recommends a "stranglehold" approach where you isolate areas of code you need to change, write some basic tests to verify basic assumptions, make small changes backed by unit tests, and work out from there.

NOTE: I'm not saying you need to stop everything and spend weeks writing tests for everything. Quite the contrary: just test around the areas you need to change and work out from there.
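To make the "write some basic tests to verify basic assumptions" step concrete, here is a minimal sketch in Python. The function `legacy_shipping_cost` is a made-up stand-in for whatever untested routine you need to touch; it is not from the book or from any real codebase. The tests simply pin down what the code does today, so a later change that alters the behavior shows up as a failure.

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    # Hypothetical stand-in for an untested legacy routine you must change.
    cost = 5.0 + 1.5 * weight_kg
    if express:
        cost = cost * 2
    if weight_kg > 20:
        cost = cost + 10
    return cost


class ShippingCostCharacterization(unittest.TestCase):
    """Records current behavior, whether or not that behavior is 'right'."""

    def test_standard_parcel(self):
        self.assertEqual(legacy_shipping_cost(2, False), 8.0)

    def test_express_parcel(self):
        self.assertEqual(legacy_shipping_cost(2, True), 16.0)

    def test_heavy_parcel_gets_surcharge(self):
        self.assertEqual(legacy_shipping_cost(30, False), 60.0)
```

Saved in a file, this can be run with `python -m unittest`. Once these pass, you make one small change, rerun, and only then widen the tested area.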

Jimmy Bogard and Ray Houston did an interesting screencast on a subject very similar to this: http://www.lostechies.com/blogs/jimmy_bogard/archive/2008/05/06/pablotv-eliminating-static-dependencies-screencast.aspx


I work with a legacy 1M LOC application written and modified by about 50 programmers.

* Remove unused code

Almost useless... just ignore it. You won't get a big return on investment (ROI) from that one.

* Remove duplicated code

Actually, when I fix something I always search for duplicates. If I find some, I either extract a generic function or add a comment at each occurrence noting the duplication (sometimes the effort of writing a generic function isn't worth it). The main idea is that I hate doing the same thing more than once. Another reason is that there's always someone (could be me) who forgets to check for the other occurrences...

* Add unit tests to improve test coverage where coverage is low

Automated unit tests are wonderful... but if you have a big backlog, the task itself is hard to promote unless you have stability issues. Go with the part you are working on and hope that in a few years you have decent coverage.

* Create consistent formatting across files

IMO the difference in formatting is part of the legacy. It gives you a hint about who wrote the code or when it was written. This can give you some clues about how to behave in that part of the code. Reformatting isn't fun, and it doesn't add any value for your customer.

* Update 3rd party software

Do it only if there are really nice new features or the version you have is not supported by the new operating system.

* Reduce warnings generated by static analysis tools

It can be worth it. Sometimes a warning hides a potential bug.
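As one illustration of a warning that hides a real bug, here is a small Python sketch. The function names are invented for the example. Pylint, for instance, reports the first function's mutable default argument as `dangerous-default-value`; the list is created once and shared by every call, so state leaks between callers.

```python
def add_tag_buggy(tag, tags=[]):
    # The default list is created once, at definition time, and reused.
    tags.append(tag)
    return tags


def add_tag_fixed(tag, tags=None):
    # A fresh list is created on each call that does not pass one in.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```

Calling `add_tag_buggy("a")` and then `add_tag_buggy("b")` returns a list containing both tags on the second call, while the fixed version returns only the tag it was given.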


I'd say 'remove duplicated code' pretty much means you have to pull code out and abstract it so it can be used in multiple places - this, in theory, makes bugs easier to fix because you only have to fix one piece of code, as opposed to many pieces of code, to fix a bug in it.
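A minimal sketch of that idea in Python, with hypothetical names throughout: suppose the same whitespace-and-capitalization cleanup had been copy-pasted into two formatting routines. Pulling it into one helper means a bug in the cleanup is fixed once, for every caller.

```python
def normalize_name(raw):
    # Shared helper (hypothetical) replacing two copy-pasted cleanups:
    # trims the ends, collapses inner whitespace, capitalizes each word.
    return " ".join(raw.strip().split()).title()


def format_customer(raw_name):
    return "Customer: " + normalize_name(raw_name)


def format_supplier(raw_name):
    return "Supplier: " + normalize_name(raw_name)
```

Both callers now go through `normalize_name`, so nobody has to remember to hunt down the other copy when the rule changes.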