Debugging and the Scientific Method

Debugging and the Scientific Method

Once a program reaches a stable state in terms of architecture, design and functionality, debugging becomes the predominant activity in the development process (unless the architecture or design were flawed to begin with). The largest part of the debugging process is to find the root cause for errors, whereas the actual fixing is usually a much simpler activity: Many bugs are fixed with one line changes, or even one character changes.

Verifying Assumptions

My approach to understanding why a program fails is to identify a set of hypotheses, and then test one after another. There are a couple of important points in this process: Testing can be done via a various mechanisms, such as debugging, logging, modifying input data, modifying program code, modifying timings, or changing the execution environment. Each test should produce unambiguous results, and sequential verification is important in order to allow interpretation of those results. The hypotheses need to be simple enough and non-overlapping to reduce the set of possible root causes reliably, trying to verify too much at the same time can be quite difficult.

Reducing the Debugging Work

The actual verification process (implementing and running the tests) is usually simple but time consuming, which means that I try to automate as much as possible. But the true way to reduce debugging work is to identify the right set of hypotheses, and this is unfortunately also the toughest part of debugging. It is usually a mix of domain knowledge, intuition and a systematic approach which pay off. Domain knowledge can unfortunately only be gained by actually working with the program code, it is very rare that a problem is generic enough that it can be solved without deeper understanding about the code. Intuition however is another key component altogether: It allows to make assumptions without the deepest possible knowledge, rather, deliberate ignorance helps to prevent digging in too deep (after all, understanding a system down to the metal takes a long time), and produce hypotheses e.g. based on previous experience faster. There is of course the risk that such intuition based hypotheses can lead nowhere, and I have had many many situations where I declared victory too early only. Humility is therefore a very important parter to intuition!

Why Debuggers are Bad

Formulating a hypothesis why a program is broken requires thinking and knowledge. Using a debugger usually increases the knowledge about the inner workings of a program, but it can easily stand in the way of the thinking part when coming up with ideas about error root causes. Debuggers are good to verify mini-hypotheses when stepping through code, but they are very cumbersome at getting to the big picture and testing more complex problems (plus they are worthless for testing timing issues). My impression is that they are a popular tool because they tend to give immediate results on these mini-hypotheses, which gives the illusion of making progress. For me, far more important is to read code, work with a peer, and use tracing and logging to understand the problem.

Deliberate Breakage

One of my favourite approaches to produce hypotheses is to take the program in the two states of working and broken, and reduce the difference in implementation between them, until the smallest difference is identified as the root cause. In cases where a bug is the result of changes due to ongoing implementation work, this is very simple, git for example allows to bisect a series of changes and mark working ones and broken ones until the offending change is found.

This approach can be extended by looking at different definitions for working and broken. As an example, working does not necessarily mean that it has to apply to the same program which is broken, but it can also mean a completely different program which however shares some design or implementation with the broken program. In relation to Blunt, I have been able to test some assumptions about the packet flow by checking Blunt 1 versus Blunt 2, even though both are internally very different.

It is important to keep a very open mind when looking for these instances of a working system, having them is the only way to provide a solid foundation for further debugging, and to keep sanity. If everything is broken, it is a pretty rocky road to make any progress. It is not impossible though, and in those cases, I usually start removing parts of the broken program until at least something works, and then work backwards by adding code.

Now, how about Blunt?

All of the above also applies to reverse engineering, which is still the biggest chunk of work to get Blunt running. Reverse engineering is in some sense simpler than debugging since the code is known to work, and understanding it is achieved by disassembly, or black box testing techniques to verify assumptions about how the code works. I noticed that persistence and thoroughness really pays off in this area, and it seems that Blunt is very near to finally work as designed :) Stay tuned for more!

2011-11-05