Keep your ovens clean

Let’s assume for a moment that you are a baker, producing different types of pastries in your small bakery. The production process is always the same: prepare the dough, put it in the oven, wait some time and retrieve the most delicious buns or bread. If we can abstract the real baking process to these steps, it’s the same as with software: prepare the sourcecode, put it in the compiler, wait some time and retrieve the most delicious binary or executable. There is only one difference: The oven of the baker is a self-contained, closed system, while our compilers require a distinct system setup around them in order to produce anything edible. The oven is independent from the kitchen around it, the compiler is depedent on the environment. To finish the analogy, what would a baker say if he can’t bake bread in his oven unless he nurtures a certain type of yeast in his kitchen?

A most unpleasant case

While developing a platform dependent application recently, we met a most unpleasant case of build dependency on a third-party library. It was an old dynamic link library (DLL) that requires registering in the windows registry. There was no other way than to register the DLL using the regsrv32 utility. If you didn’t do this, the build process would abort with an error stating unmet dependencies. If you ran the resulting program on a machine without registered DLL, it would crash with a runtime error complaining about the missing registry entry. And by the way, there are two totally independent regsrv32 utilities on a 64-bit windows system, one for 32-bit and one for 64-bit registrations. No, the name of the latter one isn’t regsvr64, that would be way too easy.

We accepted the fact that you need to prepare your system if you want to run the program, but we quarreled a lot with the nuisance that you need to alter your system just to build the software. This process of alteration is called snowflaking in the DevOp mentality and it’s not a desired activity. We would need to alter every build machine in our continuous integration cluster that comes into contact with the project. And we would need to de-snowflake them again afterwards, because this kind of tinkering adds up to inscrutable side-effects very fast.

A practicable workaround

We found a way around the abovementioned snowflaking for our build servers. It’s not a solution, it’s only a workaround, as it solves the immediate problem but produces some lesser problems on the way. Let’s look at what we did.

At first, our situation could be described with this module diagram:

dependency1We couldn’t modify the problematic DLL itself, it was a given binary. But we could wrap it in our own DLL. Wrapping less pleasant things into something you can control is a proven technique even in baking, by the way. We now had a system layout that looks like this:

dependency2Nothing gained so far, just that we now have a layer outside our system that can provide the functionality of the DLL and is actually under our control. The wrapper really does nothing on its own but to forward each call to the DLL. To profit from this indirection, we need to introduce another module, like this:

dependency3The second module provides the same interface as the first, but does nothing, not even forwarding anywhere. It’s a complete stub, just there to be uncomplicated during the build process. The goal is to build the system using this “empty” DLL and then replace it with the “problematic” DLL afterwards. The only question is: how do we build the problematic DLL? Here’s the workaround part of the solution: We actually had to compile the problematic DLL on a snowflaked system and add it to the project repository. Good thing our target system’s specification is known, so we only need to do this for one platform. Because we are reasonably sure that the DLL interface will not change over time (it had every opportunity in the last ten years and didn’t use it), we can assume that the interface of our two wrapper DLLs also won’t change. So it’s not too problematic to check in a precompiled binary that needs to satisfy an interface that’s reproduced with every build cycle. Still, we need to keep an eye on the method signatures of our two wrapper DLLs. If one of them changes, the modification needs to be replicated on the other wrapper, too. It’s a classic duplication.

When we balanced the duplication in the interfaces of the two wrapper DLLs against the snowflaking of every CI and developer machine, we found our aversion against snow outweighing the other negative aspects. Your mileage may vary.

Conclusion

We kept our build ovens clean by introducing a wrapping layer around the problematic depedency and then using the benefits of indirection by switching to a non-problematic stub during the build cycle. The technique is very old, but still use- and powerful.

Be(a)ware of Laziness

Let’s assume we have a simple JavaScript “class” called Module. Each instance of the class has a name, a start() method and a stop() method to manage its lifecycle:

function Module(name) {
    this.name = name;
    console.log("Creating " + this.name);
}
Module.prototype.start = function() {
    console.log("Starting " + this.name);
};
Module.prototype.stop = function() {
    console.log("Stopping " + this.name);
};

We want to create a couple of instances with the names “a”, “b” and “c”. At the beginning of the program we want to start each module, and at the end of the program we want to stop each module. For the creation of the instances we use a map() function call on the names array:

var names = ["a", "b", "c"];
var modules = names.map(function(name) {
    return new Module(name);
});
modules.forEach(function(module) {
    module.start();
});
// do something
modules.forEach(function(module) {
    module.stop();
});

The output is as intended:

Creating a
Creating b
Creating c
Starting a
Starting b
Starting c
Stopping a
Stopping b
Stopping c

Now we want to port this code to C#. The definition of the class is straight-forward:

class Module
{
    private readonly String name;

    public Module(string name)
    {
        this.name = name;
        Console.WriteLine("Creating " + name);
    }

    public void Start()
    {
        Console.WriteLine("Starting " + name);
    }

    public void Stop()
    {
        Console.WriteLine("Stopping " + name);
    }
}

The map() function is called Select() in .NET:

var names = new List<string>{"a", "b", "c"};
var modules = names.Select(
                 name => new Module(name));

foreach (var module in modules)
{
    module.Start();
}

foreach (var module in modules)
{
    module.Stop();
}

But when we run this program, we get a completely different output:

Creating a
Starting a
Creating b
Starting b
Creating c
Starting c
Creating a
Stopping a
Creating b
Stopping b
Creating c
Stopping c

Each module is created twice, and the creation calls are interleaved with the start() and stop() calls.

What has happened?

The answer is that .NET’s Select() method does lazy evaluation. It does not return a new list with the mapped elements. It returns an IEnumerable instead, which evaluates each mapping operation only when needed. This is a very useful concept. It allows for the chaining of multiple operations without creating an intermediate list each time. It also allows for operations on infinite sequences.

But in our case it’s not what we want. The stopped instances are not the same as the started instances.

How can we fix it?

By appending a .ToList() call after the .Select() call:

var modules = names.Select(
        name => new Module(name)).ToList();

Now the IEnumerable gets evaluated and collected into a list before the assignment to the modules variable.

So be aware of whether your programming language or framework uses lazy or eager evaluation for functional collection operations to avoid running into subtle bugs. Other examples of tools based on the concept of lazy evaluation are the Java stream API or the Haskell programming language. Some languages support both, for example Ruby since version 2.0:

range.collect { |x| x*x }
range.lazy.collect { |x| x*x }

Object slicing – breaking polymorphic objects in C++

C++ has one pitfall called “object slicing” alien to most programmers using other object-oriented languages like Java. Object slicing (fruit ninja-style) occurs in various scenarios when copying an instance of a derived class to a variable with the type of (one of) its base class(es), e.g.:

#include <iostream>

// we use structs for brevity
struct Base
{
  Base() {}
  virtual void doSomething()
  {
    std::cout << "All your Base are belong to us!\n";
  }
};

struct Derived : public Base
{
  Derived() : Base() {}
  virtual void doSomething() override
  {
    std::cout << "I am derived!\n";
  }
};

static void performTask(Base b)
{
  b.doSomething();
}

int main()
{
  Derived derived;
  // here all evidence that derived was used to initialise base is lost
  performTask(derived); // will print "All your Base are belong to us!"
}

Many explanations of object slicing deal with the fact, that only parts of the fields of derived classes will be copied on assignment or if a polymorphic object is passed to a function by value. Usually this is ok because most of the time only the static type of the Base class is used further on. Of course you can construct scenarios where this becomes a problem.

I ran into the problem with virtual functions that are sliced off of polymorphic objects, too. That can be hard to track down if you are not aware of the issue. Sadly, I do not know of any compilers that issue warnings or errors when passing/copying polymorphic objects by value.

The fix is easy in most cases: Use naked pointers, smart pointers or references to pass your polymorphic objects around. But it can be really hard to track the issue down. So try to define conventions and coding styles that minimise the risk of sliced objects. Do not avoid using and passing values around just out of fear! Values provide many benefits in correctness and readability and even may improve performance when used with concrete classes.

Edit: Removed excess parameters in contruction of derived. Thx @LorToso for the comment and the hint at resharper C++!

VB.NET for Java Developers – Updated Cheat Sheet

The BASIC programming language (originally invented at Dartmouth College in 1964) and Microsoft share a long history together. Microsoft basically started their business with the licensing of their BASIC interpreter (Altair BASIC), initially developed by Paul Allan and Bill Gates. Various dialects of Microsoft’s BASIC implementation were installed in the ROMs of many home computers like the Apple II (Applesoft BASIC) or the Commodore 64 (Commodore BASIC) during the 1970s and 1980s. A whole generation of programmers discovered their interest for computer programming through BASIC before moving on to greater knowledge.

BASIC was also shipped with Microsoft’s successful disk operating system (MS-DOS) for the IBM PC and compatibles. Early versions were BASICA and GW-BASIC. Original BASIC code was based on line numbers and typically lots of GOTO statements, resulting in what was often referred to as “spaghetti code”. Starting with MS-DOS 5.0 GW-BASIC was replaced by QBasic (a stripped down version of Microsoft QuickBasic). It was backwards compatible to GW-BASIC and introduced structured programming. Line numbers and GOTOs were no longer necessary.

When Windows became popular Microsoft introduced Visual Basic, which included a form designer for easy creation of GUI applications. They even released one version of Visual Basic for DOS, which allowed the creation of GUI-like textual user interfaces.

Visual Basic.NET

The current generation of Microsoft’s Basic is Visual Basic.NET. It’s the .NET based successor to Visual Basic 6.0, which is nowadays known as “Visual Basic Classic”.

Feature-wise VB.NET is mostly equivalent to C#, including full support for object-oriented programming, interfaces, generics, lambdas, operator overloading, custom value types, extension methods, LINQ and access to the full functionality of the .NET framework. The differences are mostly at the syntax level. It has almost nothing in common with the original BASIC anymore.

Updated Cheat Sheet for Java developers

A couple of years ago we published a VB.NET cheat sheet for Java developers on this blog. The cheat sheet uses Java as the reference language, because today Java is a lingua franca that is understood by most contemporary programmers. Now we present an updated version of this cheat sheet, which takes into account recent developments like Java 8:

Creating a GPS network service using a Raspberry Pi – Part 2

In the last article we learnt how to install and access a GPS module in a Raspberry Pi. Next, we want to write a network service that extracts the current location data – latitude, longitude and altitude – from the serial port.

Basics

We use Perl to write a CGI-script running within an Apache 2; both should be installed on the Raspberry Pi. To access the serial port from Perl, we need to include the module Device::SerialPort. Besides, we use the module JSON to generate the HTTP response.

use strict;
use warnings;
use Device::SerialPort;
use JSON;
use CGI::Carp qw(fatalsToBrowser);

Interacting with the serial port

To interact with the serial port in Perl, we instantiate Device::SerialPort and configure it according to our hardware. Then, we can read the data sent by our hardware via device->read(…), for example as follows:

my $device = Device::SerialPort->new('...') or die "Can't open serial port!";
//configuration
...
($count, $result) = $device->read(255);

For the Sparqee GPSv1.0 module, the device can be configured as shown below:

our $device = '/dev/ttyAMA0';
our $baudrate = 9600;

sub GetGPSDevice {
 my $gps = Device::SerialPort->new($device) or return (1, "Can't open serial port '$device'!");
    $gps->baudrate($baudrate);
    $gps->parity('none');
    $gps->databits(8);
    $gps->stopbits(1);
    $gps->write_settings or return (1, 'Could not write settings for serial port device!');
    return (0, $gps);
}

Finding the location line

As described in the previous blog post, the GPS module sends a continuous stream of GPS data; here is an explanation for the single components.

$GPGSA,A,3,17,09,28,08,26,07,15,,,,,,2.4,1.4,1.9*.6,1.7,2.0*3C
$GPRMC,031349.000,A,3355.3471,N,11751.7128,W,0.00,143.39,210314,,,A*76
$GPGGA,031350.000,3355.3471,N,11751.7128,W,1,06,1.7,112.2,M,-33.7,M,,0000*6F
$GPGSA,A,3,17,09,28,08,07,15,,,,,,,2.6,1.7,2.0*3C
$GPGSV,3,1,12,17,67,201,30,09,62,112,28,28,57,022,21,08,55,104,20*7E
$GPGSV,3,2,12,07,25,124,22,15,24,302,30,11,17,052,26,26,49,262,05*73
$GPGSV,3,3,12,30,51,112,31,57,31,122,,01,24,073,,04,05,176,*7E
$GPRMC,031350.000,A,3355.3471,N,11741.7128,W,0.00,143.39,210314,,,A*7E
$GPGGA,031351.000,3355.3471,N,11741.7128,W,1,07,1.4,112.2,M,-33.7,M,,0000*6C

We are only interested in the information about latitude, longitude and altitude, which is part of the line starting with $GPGGA. Assuming that the first parameter contains a correctly configured device, the following subroutine reads the data stream sent by the GPS module, extracts the relevant line and returns it. In detail, it searches for the string $GPGGA in the data stream, buffers all data sent afterwards until the next line starts, and returns the buffer content.

# timeout in seconds
our $timeout = 10;

sub ExtractLocationLine {
    my $gps = $_[0];
    my $count;
    my $result;
    my $buffering = 0;
    my $buffer = '';
    my $limit = time + $timeout;
    while (1) {
        if (time >= $limit) {
           return '';
        }
        ($count, $result) = $gps->read(255);
        if ($count <= 0) {
            next;
        }
        if ($result =~ /^\$GPGGA/) {
            $buffering = 1;
        }
        if ($buffering) {
            my $part = (split /\n/, $result)[0];
            $buffer .= $part;
        }
        if ($buffering and ($result =~ m/\n/g)) {
            return $buffer;
        }
    }
}

Parsing the location line

The $GPGGA-line contains more information than we need. With regular expressions, we can extract the relevant data: $1 is the latitude, $2 is the longitude and $3 is the altitude.

sub ExtractGPSData {
    $_[0] =~ m/\$GPGGA,\d+\.\d+,(\d+\.\d+,[NS]),(\d+\.\d+,[WE]),\d,\d+,\d+\.\d+,(\d+\.\d+,M),.*/;
    return ($1, $2, $3);
}

Putting everything together

Finally, we convert the found data to JSON and print it to the standard output stream in order to write the HTTP response of the CGI script.

sub GetGPSData {
    my ($error, $gps) = GetGPSDevice;
    if ($error) {
        return ToError($gps);
    }
    my $location = ExtractLocationLine($gps);
    if (not $location) {
        return ToError("Timeout: Could not obtain GPS data within $timeout seconds.");
    }
    my ($latitude, $longitude, $altitude) = ExtractGPSData($location);
    if (not ($latitude and $longitude and $altitude)) {
        return ToError("Error extracting GPS data, maybe no lock attained?\n$location");
    }
    return to_json({
        'latitude' => $latitude,
        'longitude' => $longitude,
        'altitude' => $altitude
    });
}

sub ToError {
    return to_json({'error' => $_[0]});
}

binmode(STDOUT, ":utf8");
print "Content-type: application/json; charset=utf-8\n\n".GetGPSData."\n";

Configuration

To execute the Perl script with a HTTP request, we have to place it in the cgi-bin directory; in our case we saved the file at /usr/lib/cgi-bin/gps.pl. Before accessing it, you can ensure that the Apache is configured correctly by checking the file /etc/apache2/sites-available/default; it should contain the following section:

ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
<Directory "/usr/lib/cgi-bin">
    AllowOverride None
    Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
    Order allow,deny
    Allow from all
</Directory>

Furthermore, the permissions of the script file have to be adjusted, otherwise the Apache user will not be able to execute it:

sudo chown www-data:www-data /usr/lib/cgi-bin/gps.pl
sudo chmod 0755 /usr/lib/cgi-bin/gps.pl

We also have to add the Apache user to the user group dialout, otherwise it cannot read from the serial port. For this change to come into effect the Raspberry Pi has to be rebooted.

sudo adduser www-data dialout
sudo reboot

Finally, we can check if the script is working by accessing the page <IP address>/cgi-bin/gps.pl. If the Raspberry Pi has no GPS reception, you should see the following output:

{"error":"Error extracting GPS data, maybe no lock attained?\n$GPGGA,121330.326,,,,,0,00,,,M,0.0,M,,0000*53\r"}

When the Raspberry Pi receives GPS data, they should be given in the browser:

{"longitude":"11741.7128,W","latitude":"3355.3471,N","altitude":"112.2,M"}

Last, if you see the following message, you should check whether the Apache user was correctly added to the group dialout.

{"error":"Can't open serial port '/dev/ttyAMA0'!"}

Conclusion

In the last article, we focused on the hardware and its installation. In this part, we learnt how to access the serial port via Perl, wrote a CGI script that extracts and delivers the location information and used the Apache web server to make the data available via network.

Declaration-site and use-site variance explained

A common question posed by programming novices who have their first encounters with parametrized types (“generics” in Java and C#) is “Why can’t I use a List<Apple> as a List<Fruit>?” (given that Apple is a subclass of Fruit) Their reasoning usually goes like this: “An apple is a fruit, so a basket of apples is a fruit basket, right?”

Here’s another, similar, example:

Milk is a dairy product, but is a bottle of milk a dairy product bottle? Try putting a Cheddar cheese wheel into the milk bottle (without melting or shredding the cheese!). It’s obviously not that simple.

Let’s assume for a moment that it was possible to use a List<Apple> as a List<Fruit>. Then the following code would be legal, given that Orange is a subclass of Fruit as well:

List<Apple> apples = new ArrayList<>();
List<Fruit> fruits = apples;
fruits.add(new Orange());

// what's an orange doing here?!
Apple apple = apples.get(0);

This short code example demonstrates why it doesn’t make sense to treat a List<Apple> as a List<Fruit>. That’s why generic types in Java and C# don’t allow this kind of assignment by default. This behaviour is called invariance.

Variance of generic types

There are, however, other cases of generic types where assignments like this actually could make sense. For example, using an Iterable<Apple> as an Iterable<Fruit> is a reasonable wish. The opposite direction within the inheritance hierarchy of the type parameter is thinkable as well, e.g. using a Comparable<Fruit> as a Comparable<Apple>.

So what’s the difference between these generic types: List<T>, Iterable<T>, Comparable<T>? The difference is the “flow” direction of objects of type T in their interface:

  1. If a generic interface has only methods that return objects of type T, but don’t consume objects of type T, then assignment from a variable of Type<B> to a variable of Type<A> can make sense. This is called covariance. Examples are: Iterable<T>, Iterator<T>, Supplier<T>inheritance
  2. If a generic interface has only methods that consume objects of type T, but don’t return objects of type T, then assignment from a variable of Type<A> to a variable of Type<B> can make sense. This is called contravariance. Examples are: Comparable<T>, Consumer<T>
  3. If a generic interface has both methods that return and methods that consume objects of type T then it should be invariant. Examples are: List<T>, Set<T>

As mentioned before, neither Java nor C# allow covariance or contravariance for generic types by default. They’re invariant by default. But there are ways and means in both languages to achieve co- and contravariance.

Declaration-site variance

In C# you can use the in and out keywords on a type parameter to indicate variance:

interface IProducer<out T> // Covariant
{
    T produce();
}

interface IConsumer<in T> // Contravariant
{
    void consume(T t);
}

IProducer<B> producerOfB = /*...*/;
IProducer<A> producerOfA = producerOfB;  // now legal
// producerOfB = producerOfA;  // still illegal

IConsumer<A> consumerOfA = /*...*/;
IConsumer<B> consumerOfB = consumerOfA;  // now legal
// consumerOfA = consumerOfB;  // still illegal

This annotation style is called declaration-site variance, because the type parameter is annotated where the generic type is declared.

Use-site variance

In Java you can express co- and contravariance with wildcards like <? extends A> and <? super B>.

Producer<B> producerOfB = /*...*/;
Producer<? extends A> producerOfA = producerOfB; // legal
A a = producerOfA.produce();
// producerOfB = producerOfA; // still illegal

Consumer<A> consumerOfA = /*...*/;
Consumer<? super B> consumerOfB = consumerOfA; // legal
consumerOfB.consume(new B());
// consumerOfA = consumerOfB; // still illegal

This is called use-site variance, because the annotation is not placed where the type is declared, but where the type is used.

Arrays

The variance behaviour of Java and C# arrays is different from the variance behaviour of their generics. Arrays are covariant, not invariant, even though a T[] has the same problem as a List<T> would have if it was covariant:

Apple[] apples = new Apple[10];
Fruit[] fruits = apples;
fruits[0] = new Orange();
Apple apple = apples[0];

Unfortunately, this code compiles in both languages. However, it throws an exception at runtime (ArrayStoreException or ArrayTypeMismatchException, respectively) in line 3.

C++ inheritance for Java developers

Both Java and C++ are modern programming languages with native support for object oriented programming (OOP). While similar in syntax and features there are a bunch of differences in implementation and (default) behaviour which can be surprising for Java developers learning C++ or vice versa. In my post I will depict the basics of inheritance and polymorphism in C++ and stress the points Java developers may find surprising.

Basic inheritance or user defined types

Every Java programmer knows that all classes have java.lang.Object at the root of their inheritance hierarchy. This is not true for C++ classes: They do not share some common base class and do not inherit more or less useful methods like toString() oder clone(). C++ classes do not have any member functions besides the constructor and destructor by default. In Java all instance methods are virtual by default, meaning that if a subclass overrides a method from a superclass only the overridden version will be visible and callable on all instances of the subclass. We will see the different behaviour in the following example:

#include <iostream>
#include <memory>

using namespace std;

class Parent
{
public:
  void myName() { cout << "Parent\n"; }
  virtual void morph() { cout << "Base class\n"; }
};

class Child : public Parent
{
public:
  void myName() { cout << "Child\n"; }
  virtual void morph() { cout << "Child class\n"; }
};

int main()
{
  // initialisations more or less equivalent to
  // Parent parent = new Parent(); etc. in Java
  unique_ptr parent(new Parent);
  unique_ptr parentPointerToChild(new Child);
  unique_ptr child(new Child);
  parent->myName(); // prints Parent as expected
  parent->morph(); // prints Base class as expected
  parentPointerToChild->myName(); // surprise: prints Parent!
  parentPointerToChild->morph(); // prints Child class as expected
  child->myName(); // prints Child as expected
  child->morph(); // prints Child class as expected
  return 0;
}

The difference to Java becomes visible in line 29 where we call to an instance of Child using the type Parent. Since myName() is not virtual the implementation in the parent class is not overridden but only shadowed by the subclass. Depending on the type of our variable on which we call the method either the parent method or the child method (line 31) is invoked.

Access modifiers in Java and C++ are almost identical as both are offering public, protected and private. As there are no packages in C++ protected restricts access to child classes only.

Different syntax and fewer keywords

There are no keywords for interface or abstract class in C++ but the concepts are supported by pure virtual functions and multiple inheritance. So a class that defines a virtual function without a body like:


class Interface
{
public:
  virtual void m() = 0;
};

becomes abstract and cannot be instanciated. Subclasses must provide implementations for all pure virtual functions or become abstract themselves. A class having exclusively pure virtual functions serves as the equivalent of an Java interface in C++. Since C++ supports inheritance from multiple classes you can easily use these features for interface segregation (see ISP). Most people advise against multiple implementation inheritance although C++ makes that possible, too.

Class interface control

One problem with inheritance in Java is that you always inherit all non-private methods of the base classes. This can lead to large and sometimes unfocused interfaces when using inheritance blindly. Delegation gives you more control about your classes interface and means less coupling but requires a bit more writing effort. While you can (and should most of the time) do the same in C++ it offers more control to your classes interface using private/protected inheritance where you can decide on an per-method-basis which functions to expose and what virtual functions you would like to override:

class ReuseMe
{
public:
    void somethingGreat();
    void somethingSpecial();
};

// does expose somethingGreat() but hides sometingSpecial()
class PrivateParent : private ReuseMe
{
public:
    using ReuseMe::somethingGreat;
};

Children also gain access to their parents protected members.

Using super class functions and delegating constructors

Calling the functions of the parent class or other constructors looks a bit different in C++, as there is no super keyword and this is only a pointer to the current instance (please excuse the contrived example…):

class CoordinateSet
{
public:
    string toString() {
        stringstream s;
        std::copy(coordinates.begin(), coordinates.end(), std::ostream_iterator<double>(s, ","));
        return s.str();
    }
protected:
    vector<double> coordinates;
};

class Point : public CoordinateSet
{
public:
    // delegating constructor
    Point() : Point(0, 0) {}
    Point(double x, double y)
    {
        coordinates.push_back(x);
        coordinates.push_back(y);
    }
    // call a function of the parent class
    void print() { cout << CoordinateSet::toString() << "\n"; }
};

I hope my little guide helps some Java people getting hold of C++ inheritance features and syntax.