Communication Through Code

In a previous post, my colleague described our experiment on how well tests can convey the intention of the code. Tests describe how the code behaves when called from the outside. An additional approach is to communicate through the code itself.

To understand the code, at least the following two questions have to be answered:

  • How does the code work?
  • What is the reason behind the way the code is implemented?

Challenge

As long as the code is readable, it is possible to deduce its meaning. Improving readability is a common technique to help the reader. This includes using descriptive names, reducing complexity or hiding implementation details until they are absolutely necessary to understand the problem.

On the other hand, deducing why exactly this implementation was chosen is an impossible task without the combined knowledge (or lack thereof) of all implementors. One of the missing pieces is the set of assumptions behind the code. Our code is full of them. Consider the following example:

void print(char* text)
{
  printf("program says %s", text);
}

In this function the writer assumes that:

  • the text is a valid pointer
  • the text is zero terminated
  • this program can write to stdout, i.e. is a console app
  • the reader speaks English

Or something nastier:

void* allocateBuffer(size_t size)
{
  void* buffer = malloc(size);
  if (!buffer) {
    printf("expect a segmentation fault!");
  }
  return buffer;
}

Here the writer assumes that malloc always returns either NULL or a pointer to dereferenceable memory. This is not always the case:

If size is zero, the return value depends on the particular library implementation (it may or may not be a null pointer), but the returned pointer shall not be dereferenced.

Assumptions that are not stated explicitly in the code sooner or later lead to hard-to-discover bugs.
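One way to make the zero-size assumption explicit is to decide in the code what such a request means. The following is only a minimal sketch: the name allocateBufferChecked and the decision to reject a size of zero are assumptions made for illustration, not part of the original function.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical variant of allocateBuffer: a zero-size request is rejected
 * up front, so a non-NULL result always points to usable memory. */
void* allocateBufferChecked(size_t size)
{
  if (size == 0) {
    return NULL;  /* never forward size 0 to malloc */
  }
  return malloc(size);  /* NULL on failure, usable memory otherwise */
}
```

With this variant the caller only has to handle two cases, NULL or a buffer of at least one byte, regardless of how the library treats malloc(0).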

Solution approaches

Comments are the quick and dirty way of writing down assumptions. They are the easiest to read, but they are never enforced and tend to diverge from the code with every edit made to it. Still, it is better to read “should never come here” and hear the alarm bells ringing than to see nothing but whitespace.

Some of the assumptions can be documented and verified through tests, at varying levels of detail. Unit tests are most effective for assumptions with little or no context, like verifying that only non-NULL pointers are passed to a function. For more global assumptions, integration or acceptance tests can be used. Together they ensure that changes to the codebase do not break assumptions made earlier. The drawback of unit tests is that they live apart from the code they test, forcing the reader to gather the information by searching for direct or indirect references to it.
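As a rough illustration of a unit test that pins down a low-context assumption, consider the sketch below. The function formatMessage and its test are hypothetical; the function formats into a caller-supplied buffer instead of printing, simply because that makes the result easy to check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical function under test. Returns 0 on success and -1 if an
 * argument is invalid or the formatted text does not fit into out. */
int formatMessage(char* out, size_t outSize, const char* text)
{
  if (out == NULL || text == NULL || outSize == 0) {
    return -1;
  }
  int written = snprintf(out, outSize, "program says %s", text);
  return (written >= 0 && (size_t)written < outSize) ? 0 : -1;
}

/* The test states the assumptions as executable checks. */
void testFormatMessage(void)
{
  char out[32];

  /* Valid input produces the expected text. */
  assert(formatMessage(out, sizeof out, "hello") == 0);
  assert(strcmp(out, "program says hello") == 0);

  /* NULL pointers are rejected instead of being dereferenced. */
  assert(formatMessage(out, sizeof out, NULL) == -1);
  assert(formatMessage(NULL, sizeof out, "hello") == -1);

  /* A buffer that is too small is reported, not silently truncated. */
  assert(formatMessage(out, 4, "hello") == -1);
}
```

The test documents the contract, but a reader of formatMessage still has to find testFormatMessage to learn about it, which is the drawback described above.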

When new code is written, assertions help to document how the API is meant to be used. Since they are executed not only during the test phase, they can also catch wrong assumptions the authors made about the runtime environment. However, writing down every possible assumption can quickly clutter the code with repeated statements like “assume pointer x is not NULL”, reducing the readability and usefulness of this technique.
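Applied to the print example from above, assertions could look roughly like this. It is a sketch under assumptions: the name printChecked and the int return value are additions for illustration, and whether a failed write should really abort the program is a design decision the original post does not make.

```c
#include <assert.h>
#include <stdio.h>

/* Sketch of print() with two of its assumptions written as assertions.
 * Returns the number of characters written (added for illustration). */
int printChecked(const char* text)
{
  assert(text != NULL);  /* API contract: callers pass a valid string */

  int written = printf("program says %s", text);

  assert(written >= 0);  /* environment: stdout is expected to be writable */
  return written;
}
```

Keep in mind that the standard assert macro is disabled when NDEBUG is defined, so these checks only run in builds that keep assertions enabled.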

Conclusion

None of the approaches shown is new. Each one excels at a particular aspect, so to get the most information out of the code they all have to be used. Their domains partially overlap, so it is possible to choose the approach depending on the situation, e.g. replacing assertions with unit tests for time-critical code. One niche currently not filled by any of them is the description of global assumptions like the cultural background of the users.

8 thoughts on “Communication Through Code”

  1. What about separate system documentation to nail down the global assumptions? In many projects we have documents like “design decisions” and “business rules” that try to keep that kind of information from being forgotten.

    • David, Miq, thank you for the comments.

      Yes, I think storing global assumptions or some greater design decisions in a separate system is a good practice. Depending on the platform it enables you to create rich content and link topics together.

      What I am missing is the link between the documentation and the code (in both directions) and a kind of marker that draws my attention to the linked documentation after I have edited the corresponding code. Without these features it takes some effort to find the relevant design decision and update it when it changes. The closest thing we do is putting issue numbers into the commit messages and automatically publishing the code changes in the corresponding issue tracking system.

      @David
      You mention coding guidelines. These are a good example of constraints that document the structure of a system. Expressed as something like checkstyle rules they can even be enforced.

      In the end, I think, it all boils down to the statement that I do not know any good tool that helps me to see all necessary decisions (big or small) that resulted in the particular piece of code and that allows me to validate and correct them at the same time.

  2. Thanks for the post, I find communication through code extremely important but difficult.

    Like Miq, I also had to think of communicating design decisions: For local, technical decisions that do not affect other parts functionally, I try to use concise comments directly in the code where I make the decision, listing the alternatives (other data structures/algorithms/design patterns) and their corresponding trade-offs. But often the comments still get large and difficult to understand for others (and for myself after just a couple of weeks). Any advice on that?

    For assumptions and constraints in general, there are some further approaches: Coding guidelines (e.g., forbidding null altogether), richer type systems and libraries for more expressive assertions (up to full design by contract).

  3. First of all, thanks for the post! I think using your own types can at least ensure that your assumptions for input parameters are not broken. Nonetheless, a lazy programmer could try to change your type and break the assumptions this way…
    In my opinion, it would be great to have an annotation “synchronizedDocuLink” before every method with a link to the docu (with design assumptions) in a kind of wiki. Then, when you start editing a method that carries this annotation, a plugin should tell you: “Probably the author(s) of this code had something in mind as they wrote it. We recommend to read it now -> Read/ I already know, thx”. Either when you save your changes or when you commit them (assuming you commit at least after every method changed), you should be asked to update the wiki docu via a quick link.
    Would this idea help to solve your problem? Is there anything available like this?
    An extension to the idea would be to make it possible to link every line of code to a paragraph in a wiki and to integrate bidirectional links between wiki and code. So you could use the context menu/shortcut in your IDE to learn what the author thought when he wrote this line of code.
    I’m looking forward to hearing your thoughts on my idea!

    • Many heavyweight CASE tools like Enterprise Architect or Visual Paradigm also support code documentation traceability. If you are using Eclipse, maybe http://www.cs.wm.edu/semeru/traceclipse offers what you are looking for.

      I think the current trend is not to add explicit links between the artefacts, but to use information retrieval to automatically detect links. Traceclipse seems to support that, too.

      I have read multiple times that LOC documentation traceability is overkill, e.g. in http://fileadmin.cs.lth.se/cs/Education/Examensarbete/Rapporter/2009/2009-02_Rapport.pdf.

      • Thx David for your thoughts and hints. I rethought it and think now that it would be great to approach from the requirements side. Imagine you have a requirement, e.g. “Create a website with three buttons, one red, one yellow, one green. The website background color is white in the beginning. If you click one of the buttons, the website background changes to the clicked button’s color.”
        Then you would select “Create a website” in your requirements tracker, hit a shortcut and create an HTML file. Then you select the next requirement (three buttons), hit a shortcut and enter your button code. Your IDE records the code that you write and links it to the selected requirement. If you are in the code, you can use a shortcut to display the related requirement. And if you’re in the requirements tracker, you can display the related code. This way you write only as much code as you need to get the requirement done and you can quickly look at the requirements for the code you are about to change. So you have LOC traceability but with a requirement level of effort, or even less, as the requirements are already written. What do you think?
        But I think it’s not what Vasili asked for. For assumptions, using your own data types with telling names is the best thing I can think of. Linked to the requirements, that can give hints as to why you think that e.g. a postal code is a 5 digit code.

    • It is a good idea to use annotations. They provide good granularity and already have compiler and IDE support. Javadoc’s @see tag seems to have similar properties.

      Combined with your ‘Remind me’-plugin and traceability tools David mentioned it is quite the solution I imagined.
