Keeping connections alive with libcurl

libcurl is quite a comfortable option to transfer files across a variety of network protocols, e.g. HTTP, FTP and SFTP.

It’s really easy to get started: downloading a single file via http or ftp takes only a couple of lines.

Drip, drip..

But as with most powerful abstractions, it is a bit leaky. While it does an excellent job of hiding such steps as name resolution and authentication, these steps still “leak out” by increasing the overall run-time.

In our case, we had five dozen FTP servers and we needed to repeatedly download small files from all of them. To make matters worse, we only had a small time window of 200ms for each transfer.

Now FTP is not the most simple protocol. Essentially, it requires the client to establish a TCP control connection, that it uses negotiate a second data connection and initiate file transfers.

This initial setup phase needs a lot of back and forth between server and client. Naturally, this is quite slow. Ideally, you would want to do the connection setup once and keep both the control and the data connection open for subsequent transfers.

libcurl does not explicitly expose the concept of an active connection. Hence you cannot explicitly tell the library not to disconnect it. In a naive implementation, you would download multiple files by simply creating an easy session object for each file transfer:

for (auto file : FILE_LIST)
{
  std::vector<uint8_t> buffer;
  auto curl = curl_easy_init();
  if (!curl)
    return -1;
  auto url = (SERVER+file);
  curl_easy_setopt(curl, CURLOPT_URL,
    url.c_str());
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION,
    appendToVector);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA,
    &buffer);
  if (curl_easy_perform(curl) != CURLE_OK)
    return -1;

  process(buffer);
  curl_easy_cleanup(curl);
}

That does indeed reset the connection for every single file.

Re-use!

However, libcurl can actually keep the connection open as part of a connection re-use mechanism in the session object. This is documented with the function curl_easy_perform. If you simply hoist the easy session object out of the loop, it will no longer disconnect between file transfers:

auto curl = curl_easy_init();
if (!curl)
  return -1;

for (auto file : FILE_LIST)
{
  std::vector<uint8_t> buffer;
  auto url = (SERVER+file);
  curl_easy_setopt(curl, CURLOPT_URL, 
    url.c_str());
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, 
    appendToVector);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, 
    &buffer);
  if (curl_easy_perform(curl) != CURLE_OK)
    return -1;

  process(buffer);
}
curl_easy_cleanup(curl);

libcurl will now cache the active connection in the session object, provided the files are actually on the same server. This improved the download timings of our bulk transfers from 130ms-260ms down to 30ms-40ms, quite the enormous gain. The timings now fit into our 200ms time window comfortably.

Advertisements

Evolution of programming languages

Programming languages evolve over time. They get new language features and their standard library is extended. Sounds great, doesn’t it? We all know not going forward means your go backward.

But I observe very different approaches looking at several programming ecosystems we are using.

Featuritis

Java and especially C# added more and more “me too” features release after release making relatively lean languages quite complex multi-paradigm languages. They started object oriented and added generics, functional programming features and declarative programming (LINQ in C#) and different UI toolkits (AWT, Swing, JavaFx in Java; Winforms, WPF in C#) to the mix.

Often the new language features add their own set of quirks because they are an afterthought and not carefully enough designed.

For me, this lack of focus makes said language less attractive than more current approaches like Kotlin or Go.

In addition, deprecation often has no effect (see Java) where 20 year old code and style still works which increases the burden further . While it is great from a business perspektive in that your effort to maintain compatibility is low it does not help your code base. Different styles and old ways of doing something tend to remain forever.

Revolution

In Grails (I know, it is not a programming language, but I has its own ecosystem) we see more of a revolution. The core concept as a full stack framework stays the same but significant components are changed quite rapidly. We have seen many changes in technology like jetty to tomcat, ivy to maven, selenium-rc to geb, gant to gradle and the list goes on.

This causes many, sometimes subtle, changes in behaviour that are a real pain when maintaining larger applications over many years.

Framework updates are often a time-consuming hassle but if you can afford it your code base benefits and will eventually become cleaner.

Clean(er) evolution

I really like the evolution in C++. It was relatively slow – many will argue too slow – in the past but it has picked up pace in the last few years. The goal is clearly stated and only features that support it make it in:

  • Make C++ a better language for systems programming and library building
  • Make C++ easier to teach and learn
  • Zero-Cost abstractions
  • better Tool-support

If you cannot make it zero-cost your chances are slim to get your feature in…

C at its core did not change much at all and remained focused on its merits. The upgrades mostly contained convenience features like line comments, additional data type definitions and multithreading.

Honest evolution – breaking backwards compatibility

In Python we have something I would call “honest evolution”. Python 3 introduced some breaking changes with the goal of a cleaner and more consistent language. Python 2 and 3 are incompatible so the distinction in the version number is fair. I like this approach of moving forward as it clearly communicates what is happening and gets rid of the sins in the past.

The downside is that many systems still come with both, a Python 2 and a Python 3 interpreter and accompanying libraries. Fortunately there are some options and tools for your code to mitigate some of the incompatibilities, like the __future__ module and python-six.

At some point in the future (expected in 2020) there will only support for Python 3. So start making the switch.

A Tale of Two Languages

Recently, I presented my mysteriously titled talk “A Tale of Two Languages” at our local C++ user group. Before the talk, I was not really sure whether it would resonate with my audience. But it did, and helped to engage people in a healthy discussion about how to use C++.

Essentially, the talk was about how I am using two different modes or dialects of C++ to write and maintain applications. The title suggests two languages – and it sure can be thought of that way, but for now I’m using the word “modes” to distinguish it from the term programming languages.

You write the application in one mode, while keeping the style relatively easy. In the other mode, you make sure that you can write easy and efficient code in the other, while leveraging the full power of C++.

I call the first application mode and the second library mode.

Library mode

As I said before, this the power mode. One of C++’s design paradigms is self-extension. You are extending the language from the language itself. It’s a very powerful mechanism, the same one that drives the standard library. It’s also why C++ does not have the need for a built-in string type.

This power is a bit of a double-edged sword. On the one hand, it allows you to adapt the language specifically for your needs, for example with application specific value types. For a 3D application, a well designed 2d vector or point type will make your code easier and probably faster. On the other hand, a badly designed type on this level will haunt your application for years to come. I have seen both.

That’s a simple example though. More powerful primitives, such as domain-specific-language like constructs also belong into this mode. In general, things in this mode are less discoverable and less maintainable, but they strive to improve discoverability, efficiency and maintainability on the other side. As a consequence, this code needs more stringent documentation and specification.

Application mode

This is the mode that you use to write the majority of your application. Application mode is all about agility and leveraging opportunities. You intentionally restrict yourself in order to keep your development speed up. Simplicity trumps most other qualities in this mode. If you need another quality to be the defining factor, for example because you need some code to be a little more complex in order to run faster, you should put it into library mode instead.

Unlike code in library mode, this code needs to speak for itself. Therefore, documentation is usually nothing but a duplication.

One important aspect is that this code should be devoid of all subtleties.

Parameter passing and its consequences

That last bit is especially uncommon in C++, where most decisions are really a catch-22. Hence the resulting code hints at the struggles endured while writing it.

For example, to write an efficient function in C++, you need to decide whether to pass each parameter by value, or by reference, or by a pointer. The decision on which to use depends largely on your implementation, i.e. what you are doing with the parameter after it was passed to the function. That usually couples your implementation too tightly to its interface and degrades programmer productivity by giving too many options.

Using a shared-ownership smart-pointer such as std::shared_ptr by default is a good middle ground here. It does the right thing most of the time and is not to far off at most other times. Many other mainstream languages, such as python, go this route. Some frameworks, such as Qt, use that semantic as well.
Like const-correctness, passing all parameters in a std::shared_ptr is viral. Object thus passed need to be created on a the stack, preferably with std::make_shared. You will also store those smart pointers in other objects, so shared_ptr will have quite a lot of screen space. Therefore I usually make an alias:

template using Ptr = std::shared_ptr;

If it’s going to be the default, it should not clutter your code. Since objects are transported in a Ptr by default, they usually do not even need a copy constructor or other “value-like” semantics. These objects are less about maintaining invariants, and more about implementing abstract interfaces and bundling functionality in maintainable chunks. I usually use boost::noncopyable to mark them, though Herb Sutter’s Metaclasses proposal could make this even nicer.
Note that you can still promote them to value types in library mode, should the need arise. But they will become more costly to maintain.

Other simplifications

There are plenty of other things to avoid in application mode. Writing templated types makes your code inherently non-local and dependent on a type that can be anywhere. Note that instantiation of templates from library mode is fine – at that point, all the facts are known.

Another thing that makes your code non-local, and therefore unfitting for application mode, is overloading. Especially in the presence of ADL. For example, which functions are in your actual overload set depends on which headers you include and which using-directives and declarations are active. Sometimes, that is desirable. But not in application mode.

Resolution

Since using this “two modes” approach, I have found that my productivity is much higher – even in older code that went through a lot of evolution. The code does not actually get a lot slower, even with all the smart pointers. In fact, I am sure that I could only optimize a few cases because the design in application mode is a lot more flexible, and the structure more visible.

Do most language make false promises?

Some years ago I stumbled over this interesting article about C being the most effective of programming language and one making the least false promises. Essentially Damien Katz argues that the simplicity of C and its flaws lead to simple, fast and easy to reason about code.

C is the total package. It is the only language that’s highly productive, extremely fast, has great tooling everywhere, a large community, a highly professional culture, and is truly honest about its tradeoffs.

-Damien Katz about the C Programming language

I am Java developer most of the time but I also have reasonable experience in C, C++, C#, Groovy and Python and some other languages to a lesser extent. Damien’s article really made me think for quite some time about the languages I have been using. I think he is right in many aspects and has really good points about the tools and communities around the languages.

After quite some thought I do not completely agree with him.

My take on C

At a time I really liked the simplicity of C. I wrote gtk2hack in my spare time as an exercise and definitely see interoperability and a quick “build, run, debug”-cycle as big wins for C. On the other hand I think while it has a place in hardware and systems programming many other applications have completely different requirements.

  • A standardized ABI means nothing to me if I am writing a service with a REST/JSON interface or a standalone GUI application.
  • Portability means nothing to me if the target system(s) are well defined and/or covered by the runtime of choice.
  • Startup times mean nothing to me if the system is only started once every few months and development is still fast because of hot-code replacement or other means.
  • etc.

But I am really missing more powerful abstractions and better error handling or ressource management features. Data structures and memory management are a lot more painful than in other languages. And this is not (only) about garbage collection!

Especially C++ is making big steps in the right direction in the last few years. Each new standard release provides additional features making code more readable and less error prone. With zero cost abstractions at the core of language evolution and the secondary aim of ease of use I really like what will come to C++ in the future. And it has a very professional community, too.

Aims for the C++11 effort:

  • Make C++ a better language for systems programming and library building
  • Make C++ easier to teach and learn

-Bjarne Stroustup, A Tour of C++

What we can learn from C

Instead of looking down at C and pointing at its flaws we should look at its strengths and our own weaknesses/flaws. All languages and environments I have used to date have their own set of annoyances and gotchas.

Java people should try building simple things and having a keen eye on dependencies especially because the eco system is so rich and crowded. Also take care of ressource management – the garbage collector is only half the deal.

Scala and C++ people should take a look at ABI stability and interoperability in general. Their compile times and “build, run, debug”-cycle has much room for improvement to say the least.

C# may look at simplicity instead of wildly adding new features creating a language without opinion. A plethora of ways implementing the same stuff. Either you ban features or you have to know them all to understand code in a larger project.

Conclusion

My personal answer to the title of this blog: Yes, they make false promises. But they have a lot to offer, too.

So do not settle with the status quo of your language environment or code style of choice. Try to maintain an objective perspective and be aware of the weaknesses of the tools you are using. Most platforms improve over time and sometimes you have to re-evaluate your opinion regarding some technology.

I prefer C++ to C for some time now and did not look back yet. But I also constantly try different languages, platforms and frameworks and try to maintain a balanced view. There are often good reasons to choose one over the other for a particular project.

 

C++ modules example

Two weeks back, I blogged about C++ modules, and why we so desperately need them. Now I have actually played with the implementation in Visual Studio 2017, and I want to share my findings.

The Files

My example consists of four files in two “components”, i.e. one library and one executable. The executable only has one file, main.cpp:

import pets;
import std.core;
import std.memory;

int main()
{
  std::unique_ptr<Pet> pet = std::make_unique<Dog>();
  std::cout << "Pet says: "
    << pet->pet() << std::endl;
}

The library consists of three files. First is pet.cpp, which contains the abstract base class for all pets:

import std.core;
module pets.pet;

export class Pet
{
public:
  virtual char const* pet() = 0;
};

Then there is dog.cpp – our only concrete implementation of that base class (yes, I’m not a cat person).

module pets.dog;
import std.core;
import pets.pet;

export class Dog : public Pet
{
public:
  char const* pet() override;
};

char const* Dog::pet()
{
  return "Woof!";
}

Notice they each define their own submodule. Finally, there is interface.cpp, which just cobbles those submodules together into one single “parent” module:

module pets;

export module pets.pet;
export module pets.dog;

You can get the full source code including the CMake setup at our github repository. I was not able to get the standard library path setup automated so far, so you probably want to adjust that.

Discussion

There are no headers at all, which was one of my goals of laying it out like this. I think that alone means an enormous increase in productivity.

The information that was previously in the header files is now in .ifc files that the microsoft compiler generates automatically from the module definitions.
When trying this out, a couple of things stood out to me:

  • Intellisense does not work with the new keywords yet.
  • The way I used it, interface.cpp needs to be compiled after pet.cpp and dog.cpp, so that the appropriate .ifc file exists. Having an order dependency like that within a single library is a new challenge.
  • I was not able to use the standard lib in the library. That would compile, but not link.
  • Not having to duplicate the function declaration feels very strange in C++.
  • There are a lot of paradigm changes required. For example, include paths are a thing of the past – you will need to configure correct module search paths in the future.
  • We will need to get the naming straight: right now, “modules” is already used as a “distinct software component”. The new meaning is similar, but still competes with it. since the granularity is no longer so flexible. I already started using “components” as a new word for the former.

What are your experiences with modules so far? Do you have another way of composing modules? I really like to hear about it! I think the biggest challenge right now is how to use these new possibilities to improve the design of bigger C++ projects.

C++ modules and why we need them desperately

When I was interviewing for my job here at the Softwareschneiderei, I was asked a question:

“If you had one wish for what to add to C++, what would that be?”.

I vividly remember not having to give a lot of thought to answer that: modules. And now, it seems modules for C++ are finally materializing. About damn time.

The Past: Hello Preprocessor, my old friend

C++ has a problem with scalability. Traditionally, the only real way to use code from another compile unit, is to use header files and “use” them via preprocessor #include directives. This requires splitting your code into a header and implementation file, which requires duplicating a lot of information. And it does not even work for a lot of code. Templates need to be in the header, and a lot of modern C++ code is template code. This is descreasing uniformity and coherence of the code.
When resolving #include directives, the preprocessor really only copy-and-paste code from one file into another. Since this is a transitive process, the actual code that gets analyzed by the compiler quickly becomes huge.

Hello World?

Do this little experiment: write a simple C++ “Hello World!” program, and look at the preprocessed output.

#include <iostream>

int main()
{
  std::cout << "Hello, World!" << std::endl;
  return 0;
}

I preprocessed this simple version with Visual Studio 2017. The output was about 50500 lines! That’s more than 7200x. Now repeat that while including something from Boost. Still wonder why compilation is so slow?

Pay for what you use?

So if you include a header, you not only get the things you want from it, but also everything else. That means all the other contents of the header and all the headers it includes transitively. Usually, the number of transitively included headers counts over 10000 very quickly. This goes directly against C++’s design mantra: pay only for what you use.

The code that gets included is usually orders of magnitude more than the actual contents of your .cpp file, even in examples not as contrived as the “Hello, World!” above.

This means a lot of extra code for your toolchain to analyze.
And the work is duplicated for each compile unit.
This is obviously slow.

Leaks everywhere..

But it also means your modules are leaking. For example its dependencies: Some of your users will inadvertently use the code that you use, and if you change your dependency, they will break. How often have you used std::runtime_error without actually including stdexcept? Many C++ programmers do not even know which header a particular stdlib feature is located in. Not their fault really – it’s hard enough to memorize the contents alone without their locations in an arbitrary M:N mapping.

But dependencies are not the only things that are leaking. By exposing individual headers, you make clients dependent on the physical structure of your program as well. Moving one type from one header to another? You can not do that, unless you want to break a couple of clients.

Current workarounds

The C++ community has had different approaches on how to deal with the fallout.

  • Forward declarations and the PIMPL idiom let you break the transitive dependencies.
    But a forward declaration is a very subtle code duplication, and a PIMPL even creates runtime overhead.
  • Unity builds tackle problem of resolving your include graph multiple times, but at the cost of an obscure extension to your build system and negative impacts for incremental builds.
  • Meta-headers tackle to problem of more clearly defined module boundaries, but they make the compile time worse and make it harder to explore modules.

It’s a catch 22.

Tool support

Because macros leak in and out of headers, semantic analysis becomes very hard. In fact, a tool needs to understand the program in its entirety, including all source and build files to properly refactor. After all, each define given on a command line, or even each reordering (!) of #include files could potentially alter the semantics completely. Every line of code in a header can change its meaning completely depending on its context.

There are also techniques that abuse this feature, i.e. cross-includes, where an include does something based on a previous #define. Granted, only a small percentage of code is usually directly affected by such subtleties, but there is currently no way to properly isolate from it. That is why refactoring and introspection tools for other languages are so much better.

State of the union

The modules proposal is spearheaded by Gabriel Dos Reis at Microsoft. There’s an in-progress implementation of it since Visual Studio 2015, and they are still regularly updating it, so the most recent one is in VS 2017. If you want to know more, have a look at this video.

My C++ Tool Belt

I suspect that every developer has a “tool belt” that he or she uses to be productive. By that I mean a collection of tools, libraries and whatever else helps. With a few exceptions, these tool belts will probably be language specific, or at least platform specific. As my projects updated their compilers and transitioned to C++11 and beyond, my C++ tool belt changed quite a bit. Since things like threading, smart pointers and functional abstractions where added to the standard library, those are now already included by default. Today I wanna write about what is in my modernized C++11 tool belt.

The Standard Library

Ever since the tr1 extensions, the standard library has progressed into becoming truly powerful and exceptional. The smart pointers, containers, algorithms are much more language extensions than “just” a library, and they play perfectly with actual language features, such as lambdas, auto and initializer lists.

fmtlib

fmtlib provides placeholder-based text formatting a la Python’s String.format. There have been a few implementations of this idea over the years, but this is the first where I think that it might just dethrone operator<< overloading for good. It's fast, stable, portable and has a nice API.
I begin to miss this library the moment I need to work on a project that does not have it.
The next best thing is Qt’s QString::arg mechanism, with slightly inferior API, a less inclusive license, and a much bigger dependency.

spdlog

Logging is a powerful tool, both for software development and maintenance. Chances are you are going to need it at one point. spdlog is my favorite choice for this task. It uses fmtlib internally, which is just another plus point. It’s simple, fast and very nice to use due to reuse of fmtlib’s formatting. I usually just include this in my projects and get the included fmtlib for free.

optional

This one is actually part of the most recent C++17, but since that is not widely available yet (meaning not many projects have adopted it), I’m going to list it explicitly. There are also a few alternative implementations, such as the one in Boost or akrzemi1’s single-header variant.
Unlike many other programming languages, C++ has a relatively high emphasis on value types. While reference types usually have a built-in “not available” state (a.k.a. nullptr, NULL, Nothing or nil), an optional can transport intent much clearer. For value types, however, it’s absolutely mandatory to have an optional type. Otherwise, you just end up wrapping the value in a pointer just to make it optional.
Do not, however, fall into the trap of using optional for error handling. It’s not made for that, and other abstractions, such as expected are much better for that.

CMake

There is really only one choice when it comes to build tools, and that’s CMake. It’s got its own bunch of weaknesses, but the goods far outweight the bads. With the target_ functions, it’s actually quite nice and scales really well to bigger projects. The main downside here is that it still does not play nice with some tools, most notably visual studio. CLion and QtCreator fare much better. Then again, CMake enables the use of other tools easily, such as clang-tidy.

A word on Boost

Boost is no longer the must-have it once was. Much of the mandated functionality has already been incorporated into the standard library. It is no longer a requirement for a sane C++ project. On the contrary, boost is notoriously huge and somewhat cumbersome to integrate. Boost is not a library, it is a collection of libraries, therefore you can still decide whether to use Boost on a library by library basis. However, much of that is viral, and using a small part of Boost will easily drag in a few hundreds of other Boost headers. The libraries I tend to include most often are Boost.Utility (for boost::noncopyable) and Boost.Filesystem. The former is obviously easy to do without Boost, especially with = delete; and the latter is a part of the standard library since C++17. I hope to be doing the majority of my projects without it in the future. Boost was a catalyst for most of the C++ progress in recent years. It slowly becoming obsolete, either by being integrated into the standard or it’s idioms no longer being needed, is just a sign of its own success.

My honorable mentions are Qt and the stb single file libraries. What are your go-to tools?