4 Tips for better CMake

We are doing one of those list posts again! This time, I will share some tips and insights on better CMake. Number four will surprise you! Let’s hop right in:

Tip #1

model dependencies with target_link_libraries

I have written about this before, and this is still my number one tip on CMake. In short: Do not use the old functions that force properties down the file hierarchy such as include_directories. Instead set properties on the targets via target_link_libraries and its siblings target_compile_definitions, target_include_directories and target_compile_options and “inherit” those properties via target_link_libraries from different modules.

Tip #2

always use find_package with REQUIRED

Sure, having optional dependencies is nice, but skipping on REQUIRED is not the way you want to do it. In the worst case, some of your features will just not work if those packages are not found, with no explanation whatsoever. Instead, use explicit feature-toggles (e.g. using option()) that either skip the find_package call or use it with REQUIRED, so the user will know that another lib is needed for this feature.

Tip #3

follow the physical project structure

You want your build setup to be as straight forward as possible. One way to simplify it is to follow the file system and and the artifact structure of your code. That way, you only have one structure to maintain. Use one “top level” file that does your global configuration, e.g. find_package calls and CPack configuration, and then only defers to subdirectories via add_subdirectory. Only for direct subdirectories though: if you need extra levels, those levels should have their own CMake files. Then build exactly one artifact (e.g. add_executable or add_library) per leaf folder.

Tip #4

make install() an option()

It is often desirable to include other libraries directly into your build process. For example, we usually do this with googletest for our unit test. However, if you do that and use your install target, it will also install the googletest headers. That is usually not what you want! Some libraries handle this automagically by only doing the install() calls when they are the top level project. Similar to the find_package tip above, I like to do this with an option() for explicit user control!

Generating done

That is it for today! I hope this is helps and we will all see better CMake code in the future.

Advertisements

About API astonishments

Nowadays we developers tend to stand on the shoulders of giants: We put powerful building-blocks from different libraries together to build something worth man-years in hours. Or we fill-in the missing pieces in a framework infrastructure to create a complete application in just a few days.

While it is great to have such tools in the form of application programmer interfaces (API) at your disposal it is hard to build high quality APIs. There are many examples for widely used APIs, good and bad. What does “bad API” mean? It depends on your view point:

Bad API for the API user

For the application programmer a bad API means things like:

  • Simple tasks/use cases are complicated
  • Complex tasks are impossible or require patching
  • Easy to misuse producing bugs

A very simple real life example of such an API is a C++ camera API I had to use in a project. Our users were able to change the area of interest (AOI) of the picture to produce images consisting of only a part of full resolution images. Our application did crash or not work as expected without obvious reasons. It took many hours of debugging to spot the subtle API misuse that could be verified be reading the documentation:

The value of camera.Width.GetMax() changed instead of being constant! The reason is that AOI was meant and not the sensor resolution width. The full resolution width we actually wanted is obtained by calling camera.WidthMax.GetValue(). This kind of naming makes the properties almost undistinguishable and communicates nothing of the implications. Terms like AOI or sensor width or full resolution just do not appear in this part of the API.

Small things like the example above may really hurt productivity and user experience of an API.

Bad API for the API programmer

API programmers can easily produce APIs that are bad for themselves because they take away too much freedom away resulting in:

  • Frequent breaking changes
  • API rewrites
  • Unimplementable features
  • Confusing, not fitting interfaces

Design your interfaces small and focused. Use types in the interface that leave as much freedom as possible without hurting usability (see Iterable vs. Collection vs. List vs. ArrayList for example). Try to build composable and extendable types because adding types or methods is less of a problem than changing them.

Conclusion

Developers should put extra care in interfaces they want to publish for others to use. Once the API is out there breaking it means angry users. Be aware that good API design is hard and necessary for a painless evolution of an API. Consider reading books like “Practical API Design” or “Build APIs You Won’t Hate” if you want to target a wider audience.

A good name will shine forever

Naming things is supposedly one of the two hard things in Computer Science. Here are some tips on naming for programmers.

Getters

In the Java world property accessors are traditionally prefixed with “get” and “set”, the Java bean convention:

person.getFirstName()

Code becomes more pleasant to read if you omit the “get” prefix:

person.firstName()

Of course, you can do this only if you don’t use a framework that depends on the convention to recognize properties via reflection (like some OR mappers, for example).

What about setters? I rarely write setters anymore. If you design your classes as immutable types you don’t need setters. Even if your class has mutable state you probably want to control this state via methods more specific to the domain of the problem. Also, the more you apply the tell, don’t ask principle the less you will find the need for getters.

Brevity vs. verbosity

There were times when it was common to see mass variable declarations like the following at the beginning of a function:

int i, j, k, l, m, n;
float a, b, c, u, v, x, y, z;

Fortunately, times have changed for the better, and most programmers are aware that descriptive naming is important. However, some programmers do over-compensate. Length of an identifier is not a virtue by itself.

The Objective-C Cocoa framework is famous for overly long method names:

[array objectAtIndex:index]

Parts of Objective-C were inspired by Smalltalk. But in Smalltalk the same method is called at:

[array at:index]

This is a reasonably sufficient name for such a common functionality in programming.

Here’s another example: If the concept of a measurement station is very prevalent in the domain of your project then it’s ok to call instances just station instead of measurementStation if it’s the only kind of station in the domain.

Yes, the IDE does auto-complete long names. However, readability of the code decreases if the reader has to scan the same long-winded names over and over again:

MeasurementStation measurementStation = new MeasurementStation();
Measurement measurement = measurementStation.startMeasurement();

Often you can find names that are more to the point than longer descriptions, e.g. acquire instead of takeOwnershipOf. (source)

Hungarian notation and friends

The famous Hungarian notation is no longer in widespread use. However, there are variations of it that I would recommend against as well for the sake of readability. For example, bookList or bookArray can be simply books. Another variation would be conventions like myField or m_field for member variables. If you need these notations to determine the origin of a variable, then your scopes are probably too big, i.e. your methods, functions or classes are too long. Additionally, IDEs and editors for programmers can highlight these different scopes anyway. Other examples for unnecessary Hungarian-style notation are IFoo for interfaces, EFoo for enums or the infamous FooImpl.

Screaming constants

There is really no need for constants and enum values to constantly SCREAM at you and other readers. This SCREAMING_CASE convention has its origin in C, where constants used to be defined as macros when the const keyword wasn’t introduced yet, and it later found its way into other programming language ecosystems. Names for constants and enum values are not more important than other identifiers and don’t have to be spelled differently. Try it, you will enjoy the newfound silence in your code.

Conclusion

These are some tips to improve readability of code through better names. Some of these tips go against traditional conventions, so you should discuss them with your team before applying them. Consistency within an existing code base can definitely be more important. But if you have the freedom you should definitely give them a try.

Evolvability of Code: Uniform Access Principle

Most programmers like freedom. So there are many means of hiding implementations in modern programming languages, e.g. interfaces in Java, header files in C/C++ and visibility modifiers like private and protected in most object-oriented languages. Even your ordinary functions or public class interface gives you the freedom to change the implementation without needing to touch the clients. Evolvability in this sense means you can change and refine your implementations without requiring others, namely clients of your code, to change.

Changing the class interface or function signatures within a project is often possible and feasible, at least if you have access to all client code and use powerful refactoring tools. If you published your code as a library or do not want to break all client code or forcing them to adapt to your changes you have to consider your interface code to be fixed. This takes away some of your precious freedom. So you have to design your interfaces carefully with evolability in mind.

Some programming languages implement the uniform access principle (UAP) that eases evolvability in that it allows you to migrate from public attributes to properties/method calls without changing the clients: Read and write access to the attribute uses the same syntax as invoking corresponding methods. For clarification an example in Python where you may start with a class like:

class Person(object):
  def __init__(self, name, age):
    self.name = name
    self.age = age

Using the above class is trivial as follows

>>> pete = Person("pete", 32)
>>> print pete.age
32
# a year has passed
>>> pete.age = 33
>>> print pete.age
33

Now if the age is not a plain value anymore but needs checking, like always being greater zero or is calculated based on some calendar you can turn it to a property like so:

class Person(object):
  def __init__(self, name, age):
    self.name = name
    self._age = age

  @property
  def age(self):
    return self._age

  @age.setter
  def age(self, new_age):
    if new_age < 0:
      raise ValueError("Age under 0 is not possible")
    self._age = new_age

Now the nice thing is: The above client code still works without changes!

Scala uses a similar and quite concise mechanism for implementing the UAP wheres .NET provides some special syntax for properties but still migration from public fields easily possible.

So in languages supporting the UAP you can start really simple with public attributes holding the plain value without worrying about some potential future. If you later need more sophisticated stuff like caching, computation of the value, validation or even remote retrieval you can add it using language features without touching or bothering clients.

Unfortunately some powerful and widespread languages like Java and C++ lack support for UAP. Changing a public field to a more complex property means the introduction of getter and setter methods and changing all clients. Therefore you see, especially in Java, many data classes littered with trivial getter and setter pairs doing nothing interesting and introducing unnecessary bloat to maintain the evolvability of the code.

Why I’m not using C++ unnamed namespaces anymore

Well okay, actually I’m still using them, but I thought the absolute would make for a better headline. But I do not use them nearly as much as I used to. Almost exactly a year ago, I even described them as an integral part of my unit design. Nowadays, most units I write do not have an unnamed namespace at all.

What’s so great about unnamed namespaces?

Back when I still used them, my code would usually evolve gradually through a few different “stages of visibility”. The first of these stages was the unnamed-namespace. Later stages would either be a free-function or a private/public member-function.

Lets say I identify a bit of code that I could reuse. I refactor it into a separate function. Since that bit of code is only used in that compile unit, it makes sense to put this function into an unnamed namespace that is only visible in the implementation of that unit.

Okay great, now we have reusability within this one compile unit, and we didn’t even have to recompile any of the units clients. Also, we can just “Hack away” on this code. It’s very local and exists solely to provide for our implementation needs. We can cobble it together without worrying that anyone else might ever have to use it.

This all feels pretty great at first. You are writing smaller functions and classes after all.

Whole class hierarchies are defined this way. Invisible to all but yourself. Protected and sheltered from the ugly world of external clients.

What’s so bad about unnamed namespaces?

However, there are two sides to this coin. Over time, one of two things usually happens:

1. The code is never needed again outside of the unit. Forgotten by all but the compiler, it exists happily in its seclusion.
2. The code is needed elsewhere.

Guess which one happens more often. The code is needed elsewhere. After all, that is usually the reason we refactored it into a function in the first place. Its reusability. When this is the case, one of these scenarios usually happes:

1. People forgot about it, and solve the problem again.
2. People never learned about it, and solve the problem again.
3. People know about it, and copy-and-paste the code to solve their problem.
4. People know about it and make the function more widely available to call it directly.

Except for the last, that’s a pretty grim outlook. The first two cases are usually the result of the bad discoverability. If you haven’t worked with that code extensively, it is pretty certain that you do not even know that is exists.

The third is often a consequence of the fact that this function was not initially written for reuse. This can mean that it cannot be called from the outside because it cannot be accessed. But often, there’s some small dependency to the exact place where it’s defined. People came to this function because they want to solve another problem, not to figure out how to make this function visible to them. Call it lazyness or pragmatism, but they now have a case for just copying it. It happens and shouldn’t be incentivised.

A Bug? In my code?

Now imagine you don’t care much about such noble long term code quality concerns as code duplication. After all, deduplication just increases coupling, right?

But you do care about satisfied customers, possibly because your job depends on it. One of your customers provides you with a crash dump and the stacktrace clearly points to your hidden and protected function. Since you’re a good developer, you decide to reproduce the crash in a unit test.

Only that does not work. The function is not accessible to your test. You first need to refactor the code to actually make it testable. That’s a terrible situation to be in.

What to do instead.

There’s really only two choices. Either make it a public function of your unit immediatly, or move it to another unit.

For functional units, its usually not a problem to just make them public. At least as long as the function does not access any global data.

For class units, there is a decision to make, but it is simple. Will using preserve all class invariants? If so, you can move it or make it a public function. But if not, you absolutely should move it to another unit. Often, this actually helps with deciding for what to create a new class!

Note that private and protected functions suffer many of the same drawbacks as functions in unnamed-namespaces. Sometimes, either of these options is a valid shortcut. But if you can, please, avoid them.

The rule of additive changes

Change is in the nature of software development. Most difficult aspects of the craft revolve around dealing with change. How does one keep software extensible? How do you adapt to new business requirements?

With experience comes the intuition that some kind of changes are more volatile than other changes. For example, it is often safer to add a new function or type to an application than change an existing one.

This is because adding something new means that it is not already strongly connected to the rest of the application. Or at least that’s the assumption. You have yet to decide how the new component interacts with the rest of the application. Usually this is done by a, preferably small, incision in the innards of your software. The first change, the adding, should not break anything. If anything, the small incision should be the only dangerous aspect of the change.

This is as very important concept: adding should not break things! This is so important, I want to give it a name:

The Rule of Additive Changes

Adding something to a well-designed software system should not break existing functionality. Exceptions should be thoroughly documented and communicated.

Systems should always be designed and tought so that the rule of additive changes holds. Failure to do so will lead to confusing surprises in the best cases, and well hidden bugs in worse cases.

The rule is nothing new, however: it’s a foundation, an axiom, to many other rules, such as the Liskov Substitution Principle:

Inheritance

Quoting from Wikipedia:

“If S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program”

This relies on subtyping as an additive change: S works at least as good as any T, so it is an extension, an addition. You should therefore design your systems in a way that the Liskov Substition Principle, and therefore the rule of additive changes, both hold: An addition of a new type in a hierarchy cannot break anything.

Whitelists vs. Blacklists

Blacklists will often violate the rule of additive changes. Once you add a new element to the domain, the domain behind the blacklist will change as well, while the domain behind a whitelist will be unaffected. Ultimately, both can be what you want, but usually, the more contained change will break less – and you can still change the whitelist explicitly later!

Note that systems that filter classes from a hierarchy via RTTI or, even more subtle, via ask-interfaces, are blacklists. Those systems can break easily when new types are introduces to a hierarchy. Extra care needs to be taken to make sure the rule of addition holds for these systems.

Introspection and Reflection

Without introspection and reflection, programs cannot know when you are adding a new type or a new function. However, with introspection, they can. Any additive change can also be an incision point. Therefore, you need to be extra careful when designing systems that use introspection: They should not break existing functionality for adding something.

For example, adding a function to enable a specific new functionality is okay. A common case of this would be to adding a function to a controller in a web-framework to add a new action. This will not inferfere with existing functionality, so it is fine.

On the other hand, adding a member to a controller should not disable or change functionality. Adding a special member for “filtering” or some kind of security setting falls into this category. You think you’re merely adding something, but in fact you are modifying. A system that relies on such behavior therefore violates the rule of additive changes. Decorating the member is a much better alternative, as that makes it clear that you are indeed modifying something, which might break existing functionality.

Not all languages or frameworks provide this possibility though. In that case, the only alternative is good communication and documentation!

Refactoring

Many engineers implicitly assume that the rule of additive changes holds. In his book “Working Effectively With Legacy Code”, Micheal Feathers proposes the sprout and wrap techniques to change legacy software. The underlying technique is the same for both: formulating a potentially breaking change as mostly additive, with only a small incision point. In the presence of systems that do not follow the rule of additive changes, such risk minimization does not work at all. For example, adding additional function can break a system that relies heavily on introspection – which goes against all intuition.

Conclusion

This rule is not a new concept. It is something that many programmers have in their head already, but possibly fractured into lots of smaller guidelines. But it is one overarching concept and it needs a name to be accessible as such. For me, that makes things a lot clearer when reasoning about systems at large.

Every time you write a getter, a function dies

Don’t be too alarmed by the title. Functions are immortal concepts and there’s nothing wrong with a getter method. Except when you write code under the rules of the Object Calisthenics (rule number 9 directly forbids getter and setter methods). Or when you try to adhere to the ideal of encapsulation, a cornerstone of object-oriented programming. Or when your code would really benefit from some other design choices. So, most of the time, basically. Nobody dies if you write a getter method, but you should make a concious decision for it, not just write it out of old habit.

One thing the Object Calisthenics can teach you is the immediate effect of different design choices. The rules are strict enough to place a lot of burden on your programming, so you’ll feel the pain of every trade-off. In most of your day-to-day programming, you also make the decisions, but don’t feel the consequences right away, so you get used to certain patterns (habits) that work well for the moment and might or might not work in the long run. You should have an alternative right at hands for every pattern you use. Otherwise, it’s not a pattern, it’s a trap.

Some alternatives

Here is an incomplete list of common alternatives to common patterns or structures that you might already be aware of:

  • if-else statement (explicit conditional): You can replace most explicit conditionals with implicit ones. In object-oriented programming, calling polymorphic methods is a common alternative. Instead of writing if and else, you call a method that is overwritten in two different fashions. A polymorphic method call can be seen as an implicit switch-case over the object type.
  • else statement: In the Object Calisthenics, rule 2 directly forbids the usage of else. A common alternative is an early return in the then-block. This might require you to extract the if-statement to its own method, but that’s probably a good idea anyway.
  • for-loop: One of the basic building blocks of every higher-level programming language are loops. These explicit iterations are so common that most programmers forget their implicit counterpart. Yeah, I’m talking about recursion here. You can replace every explicit loop by an implicit loop using recursion and vice versa. Your only limit is the size of your stack – if you are bound to one. Recursion is an early brain-teaser in every computer science curriculum, but not part of the average programmer’s toolbox. I’m not sure if that’s a bad thing, but its an alternative nonetheless.
  • setter method: The first and foremost alternative to a state-altering operation are immutable objects. You can’t alter the state of an immutable, so you have to create a series of edited copies. Syntactic sugar like fluent interfaces fit perfectly in this scenario. You can probably imagine that you’ll need to change the whole code dealing with the immutables, but you’ll be surprised how simple things can be once you let go of mutable state, bad conscience about “wasteful” heap usage and any premature thought about “performance”.

Keep in mind that most alternatives aren’t really “better”, they are just different. There is no silver bullet, every approach has its own advantages and drawbacks, both shortterm and in the long run. Your job as a competent programmer is to choose the right approach for each situation. You should make a deliberate choice and probably document your rationale somewhere (a project-related blog, wiki or issue tracker comes to mind). To be able to make that choice, you need to know about the pros and cons of as much alternatives as you can handle. The two lamest rationales are “I’ve always done it this way” and “I don’t know any other way”.

An alternative for get

In this blog post, you’ll learn one possible alternative to getter methods. It might not be the best or even fitting for your specific task, but it’s worth evaluating. The underlying principle is called “Tell, don’t Ask”. You convert the getter (aka asking the object about some value) to a method that applies a function on the value (aka telling the object to work with the value). But what does “applying” mean and what’s a function?

191px-Function_machine2.svgA function is defined as a conversion of some input into some output, preferably without any side-effects. We might also call it a mapping, because we map every possible input to a certain output. In programming, every method that takes a parameter (or several of them) and returns something (isn’t void) can be viewed as a function as long as the internal state of the method’s object isn’t modified. So you’ve probably programmed a lot of functions already, most of the time without realizing it.

In Java 8 or other modern object-oriented programming languages, the notion of functions are important parts of the toolbox. But you can work with functions in Java since the earliest days, just not as convenient. Let’s talk about an example. I won’t use any code you can look at, so you’ll have to use your imagination for this. So you have a collection of student objects (imagine a group of students standing around). We want to print a list of all these students onto the console. Each student object can say its name and matriculation number if asked by plain old getters. Damn! Somebody already made the design choice for us that these are our duties:

  • Iterate over all student objects in our collection. (If you don’t want to use a loop for this you know an alternative!)
  • Ask each student object about its name and matriculation number.
  • Carry the data over to the console object and tell the console to print both informations.

But because this is only in our imagination, we can go back in imagined time and eliminate the imagined choice for getters. We want to write our student objects without getters, so let’s get rid of them! Instead, each student object knows about their name and matriculation number, but cannot be asked directly. But you can tell the student object to supply these informations to the only (or a specific) method of an object that you give to it. Read the previous sentence again (if you’ve not already done it). That’s the whole trick. Our “function” is an object with only one method that happens to have exactly the parameters that can be provided by the student object. This method might return a formatted string that we can take to the console object or it might use the console itself (this would result in no return value and a side effect, but why not?).  We create this function object and tell each student object to use it. We don’t ask the student object for data, we tell it to do work (Tell, don’t Ask).

In this example, the result is the same. But our first approach centers the action around our “main” algorithm by gathering all the data and then acting on it. We don’t feel pain using this approach, but we were forced to use it by the absence of a function-accepting method and the presence of getters on the student objects. Our second approach prepares the action by creating the function object and then delegates the work to the objects holding the data. We were able to use it because of the presence of a function-accepting method on the student objects. The absence of getters in the second approach is a by-product, they simply aren’t necessary anymore. Why write getters that nobody uses?

We can observe the following characteristics: In a “traditional”, imperative style with getters, the data flows (gets asked) and the functionality stays in place. In a Tell, don’t Ask style with functions, the data tends to stay in place while the functionality gets passed around (“flows”).

Weighing the options

This is just one other alternative to the common “imperative getter” style. As stated, it isn’t “better”, just maybe better suited for a particular situation. In my opinion, the “functional operation” style is not straight-forward and doesn’t pay off immediately, but can be very rewarding in the long run. It opens the door to a whole paradigm of writing sourcecode that can reveal inherent or underlying concepts in your solution domain a lot clearer than the imperative style. By eliminating the getter methods, you force this paradigm on your readers and fellow developers. But maybe you don’t really need to get rid of the getters, just reduce their usage to the hard cases.

So the title of this blog post is a bit misleading. Every time you write a getter, you’ve probably considered all alternatives and made the informed decision that a getter method is the best way forward. Every time you want to change that decision afterwards, you can add the function-accepting method right alongside the getters. No need to be pure or exclusive, just use the best of two worlds. Just don’t let the functions die (or never be born) because you “didn’t know about them” or found the style “unfamiliar”. Those are mere temporary problems. And one of them is solved right now. Happy coding!