Standard methods of debugging

My approach varies based on my familiarity with the system at hand. Typically I do something like:

  1. Replicate the failure, if at all possible.
  2. Examine the fail state to determine the immediate cause of the failure.
  3. If I'm familiar with the system, I may have a good guess about to root cause. If not, I start to mechanically trace the data back through the software while challenging basic assumptions made by the software.
  4. If the problem seems to have a consistent trigger, I may manually walk forward through the code with a debugger while challenging implicit assumptions that the code makes.

Tracing the root cause is, of course, where things can get hairy. This is where having a dump (or better, a live, broken process) can be truly invaluable.

I think that the key point in my debugging process is challenging pre-conceptions and assumptions. The number of times I've found a bug in that component that I or a colleague would swear is working fine is massive.

I've been told by my more intuitive friends and colleagues that I'm quite pedantic when they watch me debug or ask me to help them figure something out. :)


Consider getting hold of the book "Debugging" by David J Agans. The subtitle is "The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems". His list of debugging rules — available in a poster form at the web site (and there's a link for the book, too) is:

  • Understand the system
  • Make it fail
  • Quit thinking and look
  • Divide and conquer
  • Change one thing at a time
  • Keep an audit trail
  • Check the plug
  • Get a fresh view
  • If you didn't fix it, it ain't fixed

The last point is particularly relevant in the software industry.


When I'm up against a bug that I can't get seem to figure out, I like to make a model of the problem. Make a copy of the section of problem code, and start removing features from it, one at a time. Run a unit test against the code after every removal. Through this process your will either remove the feature with the bug (and hence, locate the bug), or you will have isolated the bug down to a core piece of code that contains the essence of the problem. And once you figure out the essence of the problem, its a lot easier to fix.