Keeping connections alive with libcurl

libcurl is quite a comfortable option to transfer files across a variety of network protocols, e.g. HTTP, FTP and SFTP.

It’s really easy to get started: downloading a single file via http or ftp takes only a couple of lines.

Drip, drip..

But as with most powerful abstractions, it is a bit leaky. While it does an excellent job of hiding such steps as name resolution and authentication, these steps still “leak out” by increasing the overall run-time.

In our case, we had five dozen FTP servers and we needed to repeatedly download small files from all of them. To make matters worse, we only had a small time window of 200ms for each transfer.

Now FTP is not the most simple protocol. Essentially, it requires the client to establish a TCP control connection, that it uses negotiate a second data connection and initiate file transfers.

This initial setup phase needs a lot of back and forth between server and client. Naturally, this is quite slow. Ideally, you would want to do the connection setup once and keep both the control and the data connection open for subsequent transfers.

libcurl does not explicitly expose the concept of an active connection. Hence you cannot explicitly tell the library not to disconnect it. In a naive implementation, you would download multiple files by simply creating an easy session object for each file transfer:

for (auto file : FILE_LIST)
{
  std::vector<uint8_t> buffer;
  auto curl = curl_easy_init();
  if (!curl)
    return -1;
  auto url = (SERVER+file);
  curl_easy_setopt(curl, CURLOPT_URL,
    url.c_str());
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION,
    appendToVector);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA,
    &buffer);
  if (curl_easy_perform(curl) != CURLE_OK)
    return -1;

  process(buffer);
  curl_easy_cleanup(curl);
}

That does indeed reset the connection for every single file.

Re-use!

However, libcurl can actually keep the connection open as part of a connection re-use mechanism in the session object. This is documented with the function curl_easy_perform. If you simply hoist the easy session object out of the loop, it will no longer disconnect between file transfers:

auto curl = curl_easy_init();
if (!curl)
  return -1;

for (auto file : FILE_LIST)
{
  std::vector<uint8_t> buffer;
  auto url = (SERVER+file);
  curl_easy_setopt(curl, CURLOPT_URL, 
    url.c_str());
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, 
    appendToVector);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, 
    &buffer);
  if (curl_easy_perform(curl) != CURLE_OK)
    return -1;

  process(buffer);
}
curl_easy_cleanup(curl);

libcurl will now cache the active connection in the session object, provided the files are actually on the same server. This improved the download timings of our bulk transfers from 130ms-260ms down to 30ms-40ms, quite the enormous gain. The timings now fit into our 200ms time window comfortably.

Advertisements

Using PostgreSQL with Entity Framework

The most widespread O/R (object-relational) mapper for the .NET platform is the Entity Framework. It is most often used in combination with Microsoft SQL Server as database. But the architecture of the Entity Framework allows to use it with other databases as well. A popular and reliable is open-source SQL database is PostgreSQL. This article shows how to use a PostgreSQL database with the Entity Framework.

Installing the Data Provider

First you need an Entity Framework data provider for PostgreSQL. It is called Npgsql. You can install it via NuGet. If you use Entity Framework 6 the package is called EntityFramework6.Npgsql:

> Install-Package EntityFramework6.Npgsql

If you use Entity Framework Core for the new .NET Core platform, you have to install a different package:

> Install-Package Npgsql.EntityFrameworkCore.PostgreSQL

Configuring the Data Provider

The next step is to configure the data provider and the database connection string in the App.config file of your project, for example:

<configuration>
  <!-- ... -->

  <entityFramework>
    <providers>
      <provider invariantName="Npgsql"
         type="Npgsql.NpgsqlServices, EntityFramework6.Npgsql" />
    </providers>
  </entityFramework>

  <system.data>
    <DbProviderFactories>
      <add name="Npgsql Data Provider"
           invariant="Npgsql"
           description="Data Provider for PostgreSQL"
           type="Npgsql.NpgsqlFactory, Npgsql"
           support="FF" />
    </DbProviderFactories>
  </system.data>

  <connectionStrings>
    <add name="AppDatabaseConnectionString"
         connectionString="Server=localhost;Database=postgres"
         providerName="Npgsql" />
  </connectionStrings>

</configuration>

Possible parameters in the connection string are Server, Port, Database, User Id and Password. Here’s an example connection string using all parameters:

Server=192.168.0.42;Port=5432;Database=mydatabase;User Id=postgres;Password=topsecret

The database context class

To use the configured database you create a database context class in the application code:

class AppDatabase : DbContext
{
  private readonly string schema;

  public AppDatabase(string schema)
    : base("AppDatabaseConnectionString")
  {
    this.schema = schema;
  }

  public DbSet<User> Users { get; set; }

  protected override void OnModelCreating(DbModelBuilder builder)
  {
    builder.HasDefaultSchema(this.schema);
    base.OnModelCreating(builder);
  }
}

The parameter to the super constructor call is the name of the configured connection string in App.config. In this example the method OnModelCreating is overridden to set the name of the used schema. Here the schema name is injected via constructor. For PostgreSQL the default schema is called “public”:

using (var db = new AppDatabase("public"))
{
  var admin = db.Users.First(user => user.UserName == "admin")
  // ...
}

The Entity Framework mapping of entity names and properties are case sensitive. To make the mapping work you have to preserve the case when creating the tables by putting the table and column names in double quotes:

create table public."Users" ("Id" bigserial primary key, "UserName" text not null);

With these basics you’re now set up to use PostgreSQL in combination with the Entity Framework.

 

Evolution of programming languages

Programming languages evolve over time. They get new language features and their standard library is extended. Sounds great, doesn’t it? We all know not going forward means your go backward.

But I observe very different approaches looking at several programming ecosystems we are using.

Featuritis

Java and especially C# added more and more “me too” features release after release making relatively lean languages quite complex multi-paradigm languages. They started object oriented and added generics, functional programming features and declarative programming (LINQ in C#) and different UI toolkits (AWT, Swing, JavaFx in Java; Winforms, WPF in C#) to the mix.

Often the new language features add their own set of quirks because they are an afterthought and not carefully enough designed.

For me, this lack of focus makes said language less attractive than more current approaches like Kotlin or Go.

In addition, deprecation often has no effect (see Java) where 20 year old code and style still works which increases the burden further . While it is great from a business perspektive in that your effort to maintain compatibility is low it does not help your code base. Different styles and old ways of doing something tend to remain forever.

Revolution

In Grails (I know, it is not a programming language, but I has its own ecosystem) we see more of a revolution. The core concept as a full stack framework stays the same but significant components are changed quite rapidly. We have seen many changes in technology like jetty to tomcat, ivy to maven, selenium-rc to geb, gant to gradle and the list goes on.

This causes many, sometimes subtle, changes in behaviour that are a real pain when maintaining larger applications over many years.

Framework updates are often a time-consuming hassle but if you can afford it your code base benefits and will eventually become cleaner.

Clean(er) evolution

I really like the evolution in C++. It was relatively slow – many will argue too slow – in the past but it has picked up pace in the last few years. The goal is clearly stated and only features that support it make it in:

  • Make C++ a better language for systems programming and library building
  • Make C++ easier to teach and learn
  • Zero-Cost abstractions
  • better Tool-support

If you cannot make it zero-cost your chances are slim to get your feature in…

C at its core did not change much at all and remained focused on its merits. The upgrades mostly contained convenience features like line comments, additional data type definitions and multithreading.

Honest evolution – breaking backwards compatibility

In Python we have something I would call “honest evolution”. Python 3 introduced some breaking changes with the goal of a cleaner and more consistent language. Python 2 and 3 are incompatible so the distinction in the version number is fair. I like this approach of moving forward as it clearly communicates what is happening and gets rid of the sins in the past.

The downside is that many systems still come with both, a Python 2 and a Python 3 interpreter and accompanying libraries. Fortunately there are some options and tools for your code to mitigate some of the incompatibilities, like the __future__ module and python-six.

At some point in the future (expected in 2020) there will only support for Python 3. So start making the switch.

Quick and dirty is a skill

Being clean coders we build our software based on quality and reflect on how we do it. We set internal standards in code and UIs, we write tests, we polish.
But there are times when all this focus on quality is obstructive. Times when we need to learn something. For example: at a start of a project when fundamental questions like is it feasible, how should this interaction work, what’s the right order of steps are unanswered, learning needs to be as cheap as possible.
Here quick and dirty is important. The problem is our ego. We want to polish it, we want to build real software with a sound structure. But quality takes time. The problem is quality is not important when answering the fundamental project questions, learning is. May be a mockup in Powerpoint is enough? (not even writing code? ugh). A simple sketch on a piece of paper. Or maybe just a quick demo hacked together in an afternoon.
I know these suggestions may insult our pride. But we need to focus on what’s important: sometimes that’s quality, sometimes that’s speed.
Decades ago when I started coding, quick and dirty wasn’t a problem. Everything I wrote was quick and dirty. I was learning all the time. Over time I got better at developing software, structuring applications and building robust systems. But quick and dirty was lost along the way.
When you write something for the purpose of learning it can happen that you are wrong and all the code has to be thrown away. If it was just 2 hours patching something together that’s okay, but what if you spent a whole week? Just like writing quality software, quick and dirty is a skill in itself and as with other skills we need to practice it.
But beware this is not only a problem at the start of a project: often as developers we tend to overthink something, we plan for every possible outcome, imagine scenarios with weirdly acting users or systems. This is the time to stop and implement something to learn. To get feedback. Not to overanalyse or overdesign. Just release something and test it with real users, it doesn’t need to be part of the software in production, just use a demo or a staging environment. But if you need to learn something, focus on that, not on quality.

The project jugglers

By Usien (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia CommonsIn this blog post, I will shine some light on a feature of our company that is often met with disbelief: How five developers can work on twenty projects at once without being stressed. I try to use the metaphor of a juggler, though I know nothing about juggling other than it can be done. I cannot hold more than one object in the air at any given time and even that is an optimistic estimate. But I’ve seen jugglers keep six to eight objects flying with seemingly no effort. We do this with software development projects.

A layman theory of juggling

A good metaphor can be applied from start to finish. I’ve probably chosen a bad metaphor, but it gives the right initial impression: Every developer at our company leads several projects at once. He (or she) keeps the projects alive and in the “green zone”, the ratio of remaining budget, time and scope (read: work left to be done) that promises little to no trouble in the foreseeable future.

In order to juggle without visible effort, you probably need to practice a lot. You probably drop your objects a lot. You probably need to watch the objects fly in the beginning.

In our case, we needed a lot of practice to reach our level of confidence. We lead development projects for up to 17 years now. Each developer finishes between four and seven projects per year. That’s up to a hundred projects to gain experience from. But we couldn’t drop (read: fail) a lot of projects, because it directly hurt our bottom line. Just imagine you want to learn how to juggle, but all you got are expensive ming vases that you bought from your own money, without insurance. That’s how it feels to “experiment” with projects. So we play it safe and only accept projects we know we can handle. And we watch our projects fly, very very closely. In fact, we have a dedicated position, the “project manager”, with the one duty to periodically ask a bunch of questions to assure that the project is still in the green zone.

Draw the trajectory

Every object that you can juggle has its own characteristics of how it behaves once it is in the air. A good juggler can feel its trajectory and grab it in just the right moment before it would fall out of reach. The trick to keep a software development project in the green zone is the same: Get a hold on it before it ventures too far in an unfortunate direction, which is the natural tendency of all projects. The project lead has to periodically apply effort to keep the project afloat. But when is the right time to invest this effort? Spend it too soon and it has only minimal effect. Spend it too often and you’ll exhaust your power and your budget. But because every project has its own trajectory, too, and you can’t afford to let it slip, you should make the trajectory as visible as possible. You should draw it!

We’ve experimented with a lot of tools and visualizations. The one setup that works best for us is a low technology, high visibility approach. The project lead takes one half of a whiteboard and draws a classic burn-down chart or a variation (we often use a simple vertical progress bar). This chart is hand-drawn and rather crude, but big enough that everybody can see it. It is updated at least every time the project manager comes around to ask his or her questions. One of the questions actually is: “Is this chart up to date?”. The remaining budget of your project needs to be available at a glance, from across the hall. The project lead needs to “feel” this budget. And if it makes him or her nervous, it’s high time to determine the remaining scope and calendar time of the project once again.

In doing this positioning by triangulation on a regular schedule, the project lead draws the trajectory of the project for all three axes and can probably interpolate its future course. He can then apply effort to nudge (or yank, if things got worse fast) the project back on track.

Without the visibly drawn trajectory, your project lead is like a juggler in the dark, tossing unknown objects in the air and hoping that they’ll fall in place somehow.

Know your limit

As the complete noob to juggling that I am, I imagine that jugglers have a secret dress code like martial artists (watch their belts!), where other jugglers can read how many objects they can hold up at once. Something like buttons on the vest or the length of a scarf. So the beginning 3-objects juggler bows in awe to the master 12-juggler, who himself is star-struck by the mighty 18-juggler that happens to attend the same meet-up.

In our company, this “dress code” would be based on the number of projects you are leading. And just like with the jugglers, it is important that you know your current limit. There is no use in over-extending yourself, if you accidentally let one project slip, the impact is big enough that you’ll fail your other projects, too. Just like the juggler loses his or her rhythm, you’ll lose your “flow”.

The most important part of juggling many projects is that you always juggle one less than you are capable of. You need reaction time if one of them topples over. A good juggler can “rescue” the situation with subtle speedup or extra movements because the delay between necessary actions allows for it. A good project lead has emergency reserves to spend without compromising other projects.

There is nothing wrong to start with two projects and add more later on when you are more confident. But don’t start with only one. You can only form habits of resource sharing if you share from the beginning. Even I can pose as a competent 1-juggler, but the lowest bar to juggling has to be two objects.

Box your time

Again, I know nothing about juggling. But from a mathematical viewpoint, juggling is “just” an exercise in timeboxing. If you have four objects in the air, in an arc that requires one second to go around, you’ll be able to spend a quarter second (250 ms) of attention to each object on each rotation. The master 12-juggler from above can only afford 1/12th or 80 milliseconds for each object. If he takes longer for one object, the next one will suffer. If he has no time reserves, a jam will build up and ultimately break the routine.

So, as a project lead, you need to apply timeboxes on all of your projects. They don’t need to be of equal size, but small enough that you can multiplex between your projects fast and often enough. A time box is a fixed-size amount of time that you allot to a specific task. The juggler uses a time box to put the next falling object flying back up. In our lunch break, we allot 60 minutes to food, beverages and some amount of walking. And if the process of eating takes longer than usual (I’m a chronic slow-eater, I can’t help it)? Then I have to go back partially hungry because I’ll end my lunch break on time. That’s the most important characteristic of a time box: You either succeed “in time” or interrupt or even cancel the task. The juggler will drop the one iffy object instead of risking a complete breakdown of the arc. You need to let your problematic task go (for the moment) instead of spreading the problem onto your other projects, too.

We’ve found that the amount of “one workday” is the most natural and easiest to manage time box. So we try our best to partition our week in the granularity of days and not our days in the granularity of hours. One aspect that helps tremendously is to have different physical locations for different projects. So you can be physically present “in the project” or “too far away at the moment” from the project. You plan your work week in locations as much as you plan it in project time boxes. The correlation of workdays, locations and projects is so strong that it doesn’t even seem to be timeboxing or project multiplexing. You just happen to be in the right place to work on project X for today. This is how you can juggle up to five projects without having to compromise all that much (provided you have a five-day work week).

If you can’t physically relocate your work, at least try to have a fixed schedule for your projects, like the “project A monday” or the “project X friday”. This might also mean to postpone emerging issues with project X until next friday. You need to build up skills to negotiate these delays with your customers. If your customers can dictate your schedule, you’ll get torn to shreds in no time. It’s friday or no day for issues on project X – at least as a good start for heavy bargaining. But that’s a whole topic for another blog post. Please leave a comment if you are interested to hear more about it.

Stuff your box

The “one workday” time box has a strong implication: Every little thing you do for a project takes one day. That doesn’t mean you should work for five minutes on project A, completing the task, and then stare into the air and twiddle your thumbs. It means that you should accumulate enough tasks for project A that you can spend the better half of the day on the known tasks and the remaining time on the unknown problems that arise on the way. In the evening, you should be able to finish your work for project A with a feeling of closure. You can put project A aside until next week (or whenever your next cycle is). You can concentrate on project B tomorrow and project C the next day. Both projects didn’t bother you today (well, perhaps a bit, but you only acknowledged some e-mails and deferred any real thoughts on it until you enter their timebox).

Perpetual closure

The feeling of closure at the end of a successful work day is the most important thing that keeps you composed. You’ve done your thinking for project A this week and will think of project B tomorrow. But now, you can rest.

This must be the feeling that the juggler experiences with each object that goes up again. It is out of sight and only needs attention after it has nearly completed its arc again. And now for the next object, one at a time…

 

A Tale of Two Languages

Recently, I presented my mysteriously titled talk “A Tale of Two Languages” at our local C++ user group. Before the talk, I was not really sure whether it would resonate with my audience. But it did, and helped to engage people in a healthy discussion about how to use C++.

Essentially, the talk was about how I am using two different modes or dialects of C++ to write and maintain applications. The title suggests two languages – and it sure can be thought of that way, but for now I’m using the word “modes” to distinguish it from the term programming languages.

You write the application in one mode, while keeping the style relatively easy. In the other mode, you make sure that you can write easy and efficient code in the other, while leveraging the full power of C++.

I call the first application mode and the second library mode.

Library mode

As I said before, this the power mode. One of C++’s design paradigms is self-extension. You are extending the language from the language itself. It’s a very powerful mechanism, the same one that drives the standard library. It’s also why C++ does not have the need for a built-in string type.

This power is a bit of a double-edged sword. On the one hand, it allows you to adapt the language specifically for your needs, for example with application specific value types. For a 3D application, a well designed 2d vector or point type will make your code easier and probably faster. On the other hand, a badly designed type on this level will haunt your application for years to come. I have seen both.

That’s a simple example though. More powerful primitives, such as domain-specific-language like constructs also belong into this mode. In general, things in this mode are less discoverable and less maintainable, but they strive to improve discoverability, efficiency and maintainability on the other side. As a consequence, this code needs more stringent documentation and specification.

Application mode

This is the mode that you use to write the majority of your application. Application mode is all about agility and leveraging opportunities. You intentionally restrict yourself in order to keep your development speed up. Simplicity trumps most other qualities in this mode. If you need another quality to be the defining factor, for example because you need some code to be a little more complex in order to run faster, you should put it into library mode instead.

Unlike code in library mode, this code needs to speak for itself. Therefore, documentation is usually nothing but a duplication.

One important aspect is that this code should be devoid of all subtleties.

Parameter passing and its consequences

That last bit is especially uncommon in C++, where most decisions are really a catch-22. Hence the resulting code hints at the struggles endured while writing it.

For example, to write an efficient function in C++, you need to decide whether to pass each parameter by value, or by reference, or by a pointer. The decision on which to use depends largely on your implementation, i.e. what you are doing with the parameter after it was passed to the function. That usually couples your implementation too tightly to its interface and degrades programmer productivity by giving too many options.

Using a shared-ownership smart-pointer such as std::shared_ptr by default is a good middle ground here. It does the right thing most of the time and is not to far off at most other times. Many other mainstream languages, such as python, go this route. Some frameworks, such as Qt, use that semantic as well.
Like const-correctness, passing all parameters in a std::shared_ptr is viral. Object thus passed need to be created on a the stack, preferably with std::make_shared. You will also store those smart pointers in other objects, so shared_ptr will have quite a lot of screen space. Therefore I usually make an alias:

template using Ptr = std::shared_ptr;

If it’s going to be the default, it should not clutter your code. Since objects are transported in a Ptr by default, they usually do not even need a copy constructor or other “value-like” semantics. These objects are less about maintaining invariants, and more about implementing abstract interfaces and bundling functionality in maintainable chunks. I usually use boost::noncopyable to mark them, though Herb Sutter’s Metaclasses proposal could make this even nicer.
Note that you can still promote them to value types in library mode, should the need arise. But they will become more costly to maintain.

Other simplifications

There are plenty of other things to avoid in application mode. Writing templated types makes your code inherently non-local and dependent on a type that can be anywhere. Note that instantiation of templates from library mode is fine – at that point, all the facts are known.

Another thing that makes your code non-local, and therefore unfitting for application mode, is overloading. Especially in the presence of ADL. For example, which functions are in your actual overload set depends on which headers you include and which using-directives and declarations are active. Sometimes, that is desirable. But not in application mode.

Resolution

Since using this “two modes” approach, I have found that my productivity is much higher – even in older code that went through a lot of evolution. The code does not actually get a lot slower, even with all the smart pointers. In fact, I am sure that I could only optimize a few cases because the design in application mode is a lot more flexible, and the structure more visible.

Text editing tricks

Multiple cursors

Recently I was in a pair programming session, when I noticed that there were three cursors blinking in the editor window of the IDE. I initially assumed it was a bug in the IDE (which is not that uncommon), but it acutally turned out it was a feature. In IntelliJ based IDEs you can place multiple cursors in the editor window, start typing and the typed text gets inserted at all of these positions. To achieve this press Shift+Alt while clicking the mouse to position the cursor carets.

When I start using a new IDE I usually look up and memorize the shortcuts for refactoring operations like “rename” or “extract method” or other essential operations like “quick fix”.

But of course there are many other neat tricks for advanced text editing in most IDEs and programming editors. Here are some of them, in this case for IntelliJ based IDEs.

Rectangular selection

Rectangular selection

This one comes in handy when you have to select a column of text, for example a common prefix of keys in a properties file. I initially knew this type of selection from Vim, but many other programming editors allow it too. For a rectangular selection in IntelliJ based IDEs press Ctrl+Shift+Alt (Shift+Alt+Cmd on Mac) and drag the mouse pointer.

Multiselection

Similar to the multiple cursors feature mentioned at the beginning of this post, you can also select multiple parts of a text at once and then cut, copy or delete them. To achieve this multi-selection press Shift+Alt while selecting text.

Multiple selction

Extending selection

By pressing Ctrl+W repeatedly within a fragment of code the selection will progressively extend. First the current expression under the cursor is selected, then the surrounding code block, then the code block surrounding this code block, etc.

Extending selection

Conclusion

Take some time and look up some of the more advanced text editing capabilities of your IDE or text editor. Adopt them if you find them helpful and share them with your colleagues, for example in a pair programming session.