C++ modules example

Two weeks back, I blogged about C++ modules, and why we so desperately need them. Now I have actually played with the implementation in Visual Studio 2017, and I want to share my findings.

The Files

My example consists of four files in two “components”, i.e. one library and one executable. The executable only has one file, main.cpp:

import pets;
import std.core;
import std.memory;

int main()
  std::unique_ptr<Pet> pet = std::make_unique<Dog>();
  std::cout << "Pet says: "
    << pet->pet() << std::endl;

The library consists of three files. First is pet.cpp, which contains the abstract base class for all pets:

import std.core;
module pets.pet;

export class Pet
  virtual char const* pet() = 0;

Then there is dog.cpp – our only concrete implementation of that base class (yes, I’m not a cat person).

module pets.dog;
import std.core;
import pets.pet;

export class Dog : public Pet
  char const* pet() override;

char const* Dog::pet()
  return "Woof!";

Notice they each define their own submodule. Finally, there is interface.cpp, which just cobbles those submodules together into one single “parent” module:

module pets;

export module pets.pet;
export module pets.dog;

You can get the full source code including the CMake setup at our github repository. I was not able to get the standard library path setup automated so far, so you probably want to adjust that.


There are no headers at all, which was one of my goals of laying it out like this. I think that alone means an enormous increase in productivity.

The information that was previously in the header files is now in .ifc files that the microsoft compiler generates automatically from the module definitions.
When trying this out, a couple of things stood out to me:

  • Intellisense does not work with the new keywords yet.
  • The way I used it, interface.cpp needs to be compiled after pet.cpp and dog.cpp, so that the appropriate .ifc file exists. Having an order dependency like that within a single library is a new challenge.
  • I was not able to use the standard lib in the library. That would compile, but not link.
  • Not having to duplicate the function declaration feels very strange in C++.
  • There are a lot of paradigm changes required. For example, include paths are a thing of the past – you will need to configure correct module search paths in the future.
  • We will need to get the naming straight: right now, “modules” is already used as a “distinct software component”. The new meaning is similar, but still competes with it. since the granularity is no longer so flexible. I already started using “components” as a new word for the former.

What are your experiences with modules so far? Do you have another way of composing modules? I really like to hear about it! I think the biggest challenge right now is how to use these new possibilities to improve the design of bigger C++ projects.

C++ modules and why we need them desperately

When I was interviewing for my job here at the Softwareschneiderei, I was asked a question:

“If you had one wish for what to add to C++, what would that be?”.

I vividly remember not having to give a lot of thought to answer that: modules. And now, it seems modules for C++ are finally materializing. About damn time.

The Past: Hello Preprocessor, my old friend

C++ has a problem with scalability. Traditionally, the only real way to use code from another compile unit, is to use header files and “use” them via preprocessor #include directives. This requires splitting your code into a header and implementation file, which requires duplicating a lot of information. And it does not even work for a lot of code. Templates need to be in the header, and a lot of modern C++ code is template code. This is descreasing uniformity and coherence of the code.
When resolving #include directives, the preprocessor really only copy-and-paste code from one file into another. Since this is a transitive process, the actual code that gets analyzed by the compiler quickly becomes huge.

Hello World?

Do this little experiment: write a simple C++ “Hello World!” program, and look at the preprocessed output.

#include <iostream>

int main()
  std::cout << "Hello, World!" << std::endl;
  return 0;

I preprocessed this simple version with Visual Studio 2017. The output was about 50500 lines! That’s more than 7200x. Now repeat that while including something from Boost. Still wonder why compilation is so slow?

Pay for what you use?

So if you include a header, you not only get the things you want from it, but also everything else. That means all the other contents of the header and all the headers it includes transitively. Usually, the number of transitively included headers counts over 10000 very quickly. This goes directly against C++’s design mantra: pay only for what you use.

The code that gets included is usually orders of magnitude more than the actual contents of your .cpp file, even in examples not as contrived as the “Hello, World!” above.

This means a lot of extra code for your toolchain to analyze.
And the work is duplicated for each compile unit.
This is obviously slow.

Leaks everywhere..

But it also means your modules are leaking. For example its dependencies: Some of your users will inadvertently use the code that you use, and if you change your dependency, they will break. How often have you used std::runtime_error without actually including stdexcept? Many C++ programmers do not even know which header a particular stdlib feature is located in. Not their fault really – it’s hard enough to memorize the contents alone without their locations in an arbitrary M:N mapping.

But dependencies are not the only things that are leaking. By exposing individual headers, you make clients dependent on the physical structure of your program as well. Moving one type from one header to another? You can not do that, unless you want to break a couple of clients.

Current workarounds

The C++ community has had different approaches on how to deal with the fallout.

  • Forward declarations and the PIMPL idiom let you break the transitive dependencies.
    But a forward declaration is a very subtle code duplication, and a PIMPL even creates runtime overhead.
  • Unity builds tackle problem of resolving your include graph multiple times, but at the cost of an obscure extension to your build system and negative impacts for incremental builds.
  • Meta-headers tackle to problem of more clearly defined module boundaries, but they make the compile time worse and make it harder to explore modules.

It’s a catch 22.

Tool support

Because macros leak in and out of headers, semantic analysis becomes very hard. In fact, a tool needs to understand the program in its entirety, including all source and build files to properly refactor. After all, each define given on a command line, or even each reordering (!) of #include files could potentially alter the semantics completely. Every line of code in a header can change its meaning completely depending on its context.

There are also techniques that abuse this feature, i.e. cross-includes, where an include does something based on a previous #define. Granted, only a small percentage of code is usually directly affected by such subtleties, but there is currently no way to properly isolate from it. That is why refactoring and introspection tools for other languages are so much better.

State of the union

The modules proposal is spearheaded by Gabriel Dos Reis at Microsoft. There’s an in-progress implementation of it since Visual Studio 2015, and they are still regularly updating it, so the most recent one is in VS 2017. If you want to know more, have a look at this video.

My C++ Tool Belt

I suspect that every developer has a “tool belt” that he or she uses to be productive. By that I mean a collection of tools, libraries and whatever else helps. With a few exceptions, these tool belts will probably be language specific, or at least platform specific. As my projects updated their compilers and transitioned to C++11 and beyond, my C++ tool belt changed quite a bit. Since things like threading, smart pointers and functional abstractions where added to the standard library, those are now already included by default. Today I wanna write about what is in my modernized C++11 tool belt.

The Standard Library

Ever since the tr1 extensions, the standard library has progressed into becoming truly powerful and exceptional. The smart pointers, containers, algorithms are much more language extensions than “just” a library, and they play perfectly with actual language features, such as lambdas, auto and initializer lists.


fmtlib provides placeholder-based text formatting a la Python’s String.format. There have been a few implementations of this idea over the years, but this is the first where I think that it might just dethrone operator<< overloading for good. It's fast, stable, portable and has a nice API.
I begin to miss this library the moment I need to work on a project that does not have it.
The next best thing is Qt’s QString::arg mechanism, with slightly inferior API, a less inclusive license, and a much bigger dependency.


Logging is a powerful tool, both for software development and maintenance. Chances are you are going to need it at one point. spdlog is my favorite choice for this task. It uses fmtlib internally, which is just another plus point. It’s simple, fast and very nice to use due to reuse of fmtlib’s formatting. I usually just include this in my projects and get the included fmtlib for free.


This one is actually part of the most recent C++17, but since that is not widely available yet (meaning not many projects have adopted it), I’m going to list it explicitly. There are also a few alternative implementations, such as the one in Boost or akrzemi1’s single-header variant.
Unlike many other programming languages, C++ has a relatively high emphasis on value types. While reference types usually have a built-in “not available” state (a.k.a. nullptr, NULL, Nothing or nil), an optional can transport intent much clearer. For value types, however, it’s absolutely mandatory to have an optional type. Otherwise, you just end up wrapping the value in a pointer just to make it optional.
Do not, however, fall into the trap of using optional for error handling. It’s not made for that, and other abstractions, such as expected are much better for that.


There is really only one choice when it comes to build tools, and that’s CMake. It’s got its own bunch of weaknesses, but the goods far outweight the bads. With the target_ functions, it’s actually quite nice and scales really well to bigger projects. The main downside here is that it still does not play nice with some tools, most notably visual studio. CLion and QtCreator fare much better. Then again, CMake enables the use of other tools easily, such as clang-tidy.

A word on Boost

Boost is no longer the must-have it once was. Much of the mandated functionality has already been incorporated into the standard library. It is no longer a requirement for a sane C++ project. On the contrary, boost is notoriously huge and somewhat cumbersome to integrate. Boost is not a library, it is a collection of libraries, therefore you can still decide whether to use Boost on a library by library basis. However, much of that is viral, and using a small part of Boost will easily drag in a few hundreds of other Boost headers. The libraries I tend to include most often are Boost.Utility (for boost::noncopyable) and Boost.Filesystem. The former is obviously easy to do without Boost, especially with = delete; and the latter is a part of the standard library since C++17. I hope to be doing the majority of my projects without it in the future. Boost was a catalyst for most of the C++ progress in recent years. It slowly becoming obsolete, either by being integrated into the standard or it’s idioms no longer being needed, is just a sign of its own success.

My honorable mentions are Qt and the stb single file libraries. What are your go-to tools?

The Great Rational Explosion

A Dream to good to be true

A few years back I was doing mostly computational geometry for a while. In that field, floating point errors are often of great concern. Some algorithms will simply crash or fail when it’s not taken into account. Back then, the idea of doing all the required math using rationals seemed very alluring.
For the uninitiated: a good rational type based on two integers, a numerator and a denominator allows you to perform the basic math operations of addition, subtraction, multiplication and division without any loss of precision. Doing all the math without any loss of precision, without fuzzy comparisons, without imperfection.
Alas, I didn’t have a good rational type available at the time, so the thought remained in the realm of ideas.

A Dream come true?

Fast forward a couple of years to just two months ago. We were starting a new project and set ourselves the requirement of not introducing floating point errors. Naturally, I immediately thought of using rationals. That project is written in java and already using jscience, which happens to have a nice Rational type. I expected the whole thing to be a bit slower than math using build-in types. But not like this.
It seemed like a part that was averaging about 2000 “count rate” rationals was extremely slow. It seemed to take about 13 seconds, which we thought was way too much. Curiously, the problem never appeared when the count rate was zero. Knowing a little about the internal workings of rational, I quickly suspected the summation to be the culprit. But the code was doing a few other things to, so naturally my colleagues demanded proof that that was indeed the problem. Hence I wrote a small demo application to benchmark the problem.

The code that I measured was this:

Rational sum = Rational.ZERO;
for (final Rational each : list) {
    sum = sum.plus(each);
return sum;

Of course I needed some test data, that I generated like this:

final List<Rational> list = new ArrayList<>();
for (int i=0; i<2000; ++i) {
    list.add(Rational.valueOf(i, 100));
return list;

This took about 10ms. Bad, but not 13s catastrophic.

Now from using rational numbers in school, we remember that summing up numbers with equal denominators is actually quite easy. You just leave the denominator as is and add the two numerators. But what if the denominators are different? We need to find a common multiple of the two denominators before we can add. Usually we want the smallest such number, which is called the lowest common multiple (lcm). This is so that the numbers don’t just explode, i.e. get larger and larger with each addition. The algorithm to find this is to just multiply the two numbers and divide by their greatest common divisor (gcd). Whenever I held the debugger during my performance problems, I’d see the thread in a function called gcd. The standard algorithm to determine the gcd is the Euclidean Algorithm. I’m not sure if jscience uses it, but I suspect it does. Either way, it successively reduces the problem via a division to a smaller instance.

What does this all mean?

This means that much of the complexity involved happens only when there’s variation in the denominator. Looking at my actual data, I saw that this was the case for our problem. The numbers were actually close to one, but with the numerator and the denominator each close to about 4 million. This happened because the counts that we based this data on where “normalized” by a time value that was close, but not equal to one. So let’s try another input sequence:

final Random randomGenerator = new Random();
final List<Rational> list = new ArrayList<>();
for (int i=0; i<2000; ++i) {
    list.add(Rational.valueOf(4000000, 4000000 + randomGenerator.nextInt(2000)));
return list;

That already takes 10 seconds. Wow. Here’s the rational number it produced:


I kid you not, that’s over 10000 digits! In the editor I’m writing this in, that’s roughly 3 pages. No wonder it took that long. Let’s use even more variation in the numbers:

final Random randomGenerator = new Random();
final List<Rational> list = new ArrayList<>();
for (int i=0; i<2000; ++i) {
    list.add(Rational.valueOf(4000000 + randomGenerator.nextInt(5000),
            4000000 + randomGenerator.nextInt(20000)));

Now that already takes 16 s, with about 14000 digits. Oh boy. Now the maximum number of values I expected to do this averaging for was about 4000, so let’s scale that up:

final Random randomGenerator = new Random();
final List<Rational> list = new ArrayList<>();
for (int i=0; i<4000; ++i) {
    list.add(Rational.valueOf(4000000 + randomGenerator.nextInt(5000),
            4000000 + randomGenerator.nextInt(20000)));
return list;

That took 77 seconds! More than 4 times as long as for half the amount of data. The resulting number has over 26000 digits. Obviously, this scales way worse than linear.

An Explanation

By now it was pretty clear what was happening: The ever so slightly not-1 values were causing an “explosion” in the denominator after all. When two denominators are coprime, i.e. their greatest common divisor is 1, the length of the denominators just adds up. The effect also happens when the gcd is very small, such as 2 or 3. This can happen quite a lot with huge numbers in a sufficiently large range. So when things go bad for your input data, the length of the denominator just keeps growing linearly with the number of input values, making each successive addition slower and slower. Your rationals just exploded.


After this, it became apparent that using rationals was not a great idea after all. You should be very careful when doing series of additions with them. Ironically, we were throwing away all the precision anyways before presenting the number to a user. There’s no way for anyone to grok a 26000 digit number anyways, especially if the result is basically 4000.xx. I learned my lesson and buried the dream of perfect arithmetic. I’m now using fixed point arithmetic instead.

Integration Tests with CherryPy and requests

CherryPy is a great way to write simple http backends, but there is a part of it that I do not like very much. While there is a documented way of setting up integration tests, it did not work well for me for a couple of reasons. Mostly, I found it hard to integrate with the rest of the test suite, which was using unittest and not py.test. Failing tests would apparently “hang” when launched from the PyCharm test explorer. It turned out the tests were getting stuck in interactive mode for failing assertions, a setting which can be turned off by an environment variable. Also, the “requests” looked kind of cumbersome. So I figured out how to do the tests with the fantastic requests library instead, which also allowed me to keep using unittest and have them run beautifully from within my test explorer.

The key is to start the CherryPy server for the tests in the background and gracefully shut it down once a test is finished. This can be done quite beautifully with the contextmanager decorator:

from contextlib import contextmanager

def run_server():

This allows us to conviniently wrap the code that does requests to the server. The first part initiates the CherryPy start-up and then waits until that has completed. The yield is where the requests happen later. After that, we initiate a shut-down and block until that has completed.

Similar to the “official way”, let’s suppose we want to test a simple “echo” Application that simply feeds a request back at the user:

class Echo(object):
    def echo(self, message):
        return message

Now we can write a test with whatever framework we want to use:

class TestEcho(unittest.TestCase):
    def test_echo(self):
        with run_server():
            url = ""
            params = {'message': 'secret'}
            r = requests.get(url, params=params)
            self.assertEqual(r.status_code, 200)
            self.assertEqual(r.content, "secret")

Now that feels a lot nicer than the official test API!

Let’s talk about C++

It’s almost time for the holidays again. A time to reminisce. A time for family. A time for community.

Us software developers seem like an odd folk. We spend endless hours tinkering with our machines and gadgets. It appears like a lonely profession to outsiders. And it can be. Sometimes we have to get in The Zone to solve our tasks and problems. Other times we need to have sword fights. But sometimes we just have to meet other developers.

I’m not talking about your 10 o’clock daily standup or agile flavor-of-the-month meeting with other departments. Those are great. But sometimes it just has to be us programmers, as tech people.

Let’s talk about cool and tricky algorithms. Let’s talk about the latest and greatest language features that make all code some much cooler. Let’s talk which editor is the greatest. All the technical details.
It’s not necessarily the most important and essential aspect of our craft, no. But it’s kind of like the seasoning to a well cooked meal. It’s flavor and character. It’s fun.

I’m the C++ guy. It’s not the only one of my specialties, but kind of what I got a bit of a reputation for. And I like to talk about it. So far, this was either limited to colleagues and friends or “out there” on IRC, stackoverflow or other online communities. But I want to extend that and be a more active member of the local community.
David Farago had the great idea to create a platform for this in Karlsruhe: The C++ User Group Karlsruhe. He asked me to kindly extend an invitation. The kick-off is next month, right at the start of the new year, on the 11th of January, with one meeting scheduled every month. I think this is a perfect time to do this. C++ is in a great place right now. The language is evolving in a very positive way and the ecosystem is looking better and better.
So if you’re in any way interested meeting other local C++ people, please join us. I’m very much looking forward to meeting you guys!

Why I’m not using C++ unnamed namespaces anymore

Well okay, actually I’m still using them, but I thought the absolute would make for a better headline. But I do not use them nearly as much as I used to. Almost exactly a year ago, I even described them as an integral part of my unit design. Nowadays, most units I write do not have an unnamed namespace at all.

What’s so great about unnamed namespaces?

Back when I still used them, my code would usually evolve gradually through a few different “stages of visibility”. The first of these stages was the unnamed-namespace. Later stages would either be a free-function or a private/public member-function.

Lets say I identify a bit of code that I could reuse. I refactor it into a separate function. Since that bit of code is only used in that compile unit, it makes sense to put this function into an unnamed namespace that is only visible in the implementation of that unit.

Okay great, now we have reusability within this one compile unit, and we didn’t even have to recompile any of the units clients. Also, we can just “Hack away” on this code. It’s very local and exists solely to provide for our implementation needs. We can cobble it together without worrying that anyone else might ever have to use it.

This all feels pretty great at first. You are writing smaller functions and classes after all.

Whole class hierarchies are defined this way. Invisible to all but yourself. Protected and sheltered from the ugly world of external clients.

What’s so bad about unnamed namespaces?

However, there are two sides to this coin. Over time, one of two things usually happens:

1. The code is never needed again outside of the unit. Forgotten by all but the compiler, it exists happily in its seclusion.
2. The code is needed elsewhere.

Guess which one happens more often. The code is needed elsewhere. After all, that is usually the reason we refactored it into a function in the first place. Its reusability. When this is the case, one of these scenarios usually happes:

1. People forgot about it, and solve the problem again.
2. People never learned about it, and solve the problem again.
3. People know about it, and copy-and-paste the code to solve their problem.
4. People know about it and make the function more widely available to call it directly.

Except for the last, that’s a pretty grim outlook. The first two cases are usually the result of the bad discoverability. If you haven’t worked with that code extensively, it is pretty certain that you do not even know that is exists.

The third is often a consequence of the fact that this function was not initially written for reuse. This can mean that it cannot be called from the outside because it cannot be accessed. But often, there’s some small dependency to the exact place where it’s defined. People came to this function because they want to solve another problem, not to figure out how to make this function visible to them. Call it lazyness or pragmatism, but they now have a case for just copying it. It happens and shouldn’t be incentivised.

A Bug? In my code?

Now imagine you don’t care much about such noble long term code quality concerns as code duplication. After all, deduplication just increases coupling, right?

But you do care about satisfied customers, possibly because your job depends on it. One of your customers provides you with a crash dump and the stacktrace clearly points to your hidden and protected function. Since you’re a good developer, you decide to reproduce the crash in a unit test.

Only that does not work. The function is not accessible to your test. You first need to refactor the code to actually make it testable. That’s a terrible situation to be in.

What to do instead.

There’s really only two choices. Either make it a public function of your unit immediatly, or move it to another unit.

For functional units, its usually not a problem to just make them public. At least as long as the function does not access any global data.

For class units, there is a decision to make, but it is simple. Will using preserve all class invariants? If so, you can move it or make it a public function. But if not, you absolutely should move it to another unit. Often, this actually helps with deciding for what to create a new class!

Note that private and protected functions suffer many of the same drawbacks as functions in unnamed-namespaces. Sometimes, either of these options is a valid shortcut. But if you can, please, avoid them.