In a previous post, my colleague described our experiment on how well tests can convey the intention of the code. Tests describe how the code behaves when called from the outside. An additional approach is to communicate through the code itself.
To understand the code, at least the following two questions have to be answered:
- How does the code work?
- What is the reason behind the way the code is implemented?
Challenge
As long as the code is readable, it is possible to deduce its meaning. Improving readability is a common technique to help the reader. This includes using descriptive names, reducing complexity or hiding implementation details until they are absolutely necessary to understand the problem.
On the other hand, deducing why exactly this implementation was chosen is an impossible task without the combined knowledge (or lack thereof) of all implementors. One of the missing pieces is the set of assumptions. Our code is full of them. Consider the following example:
void print(char* text)
{
    printf("program says %s", text);
}
In this function the writer assumes that:
- the text is a valid pointer
- the text is zero terminated
- this program can write to stdout, i.e. is a console app
- the reader speaks English
Or something nastier:
void* allocateBuffer(size_t size)
{
    void* buffer = malloc(size);
    if (!buffer) {
        printf("expect a segmentation fault!");
    }
    return buffer;
}
Here the writer assumes that malloc always returns either NULL or a pointer to dereferenceable memory. This is not always the case:
If size is zero, the return value depends on the particular library implementation (it may or may not be a null pointer), but the returned pointer shall not be dereferenced.
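One way to close this gap is to state the size assumption in the code itself. The following is a minimal sketch, not the original author's code: the name allocateNonEmptyBuffer and the decision to treat a zero size as a caller error are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical variant of allocateBuffer that states its assumption about
 * size instead of leaving the outcome to the library implementation. */
void *allocateNonEmptyBuffer(size_t size)
{
    /* Assumption made explicit: a zero-sized request is treated as a caller
     * error, so it is rejected before malloc gets to decide what to return. */
    if (size == 0) {
        return NULL;
    }

    /* From here on, a non-NULL result points to at least `size` bytes and
     * NULL means the allocation failed. */
    return malloc(size);
}
```

With this variant, a NULL return always means "no usable buffer", regardless of how the library handles zero-sized requests.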
Assumptions that are not stated explicitly in the code sooner or later lead to hard-to-discover bugs.
Solution approaches
Comments are the quick and dirty way of writing down assumptions. They are the easiest to read, but they are never enforced and tend to diverge from the code with every edit made to it. However, it is better to read “should never come here” and hear the alarm bells ringing than to see nothing but whitespace.
Some of the assumptions can be documented and verified through tests, at varying levels of detail. Unit tests are most effective for assumptions with little or no context, such as verifying that only non-NULL pointers are passed to a function. For more global assumptions, integration or acceptance tests can be used. Together they ensure that no change to the codebase breaks the assumptions made earlier. The drawback of unit tests is that they live apart from the code they test, forcing the reader to gather the information by searching for direct or indirect references to it.
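As a rough illustration of a unit test that documents a low-context assumption, consider the sketch below. It uses the standard assert macro as a stand-in for a real test framework. The function formatMessage, its return convention and the test names are hypothetical and only serve the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical function under test: writes "program says <text>" into out.
 * Returns the number of characters written, or -1 if an assumption about
 * the arguments is violated or the output does not fit. */
int formatMessage(char *out, size_t outSize, const char *text)
{
    if (out == NULL || text == NULL || outSize == 0) {
        return -1;
    }
    int written = snprintf(out, outSize, "program says %s", text);
    if (written < 0 || (size_t)written >= outSize) {
        return -1;
    }
    return written;
}

/* Documents the assumption "text is a valid pointer". */
void testRejectsNullText(void)
{
    char out[32];
    assert(formatMessage(out, sizeof out, NULL) == -1);
}

/* Documents the expected output for well-formed input. */
void testFormatsValidText(void)
{
    char out[32];
    assert(formatMessage(out, sizeof out, "hello") == 18);
    assert(strcmp(out, "program says hello") == 0);
}
```

Calling both test functions from a test runner's main is enough to execute them. Note that the tests sit in their own functions, away from the code they describe, which is exactly the drawback mentioned above.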
When new code is written, assertions help to document how the API is meant to be used. Since they are executed not only during the test phase, they can also catch wrong assumptions the authors made about the runtime environment. However, writing down every possible assumption can quickly clutter the code with repeated statements like “assume pointer x is not NULL”, reducing the readability and usefulness of this technique.
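Applied to the print example from earlier, assertions might look like the following sketch. The name printChecked and the choice to return the printf result are assumptions made for this illustration. The standard assert macro is compiled out when NDEBUG is defined, so the checks only run in builds that leave it enabled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant of print that states two of its assumptions.
 * Returns the number of characters printf reports as written. */
int printChecked(const char *text)
{
    /* API assumption: callers must pass a valid string. */
    assert(text != NULL);

    int written = printf("program says %s", text);

    /* Runtime-environment assumption: the program can write to stdout.
     * printf returns a negative value if an output error occurred. */
    assert(written >= 0);

    return written;
}
```

The remaining assumptions from the list, zero termination and the language of the reader, cannot be expressed as a simple assertion, which hints at the limits of this technique.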
Conclusion
None of the approaches shown is new. Each one excels at a different aspect, so all of them have to be used to get the most information out of the code. Their domains partially overlap, so it is possible to choose the approach depending on the situation, e.g. replacing assertions with unit tests in time-critical code. One niche that none of them currently fills is the description of global assumptions, such as the cultural background of the users.
Posted by vasilikvockin 