How we distribute our backups geographically

We are a software development company, so all of our most valuable assets are constantly endangered by hardware failure. We regularly do risk assessments with regard to data security and over the years have created a fine-tuned system of duplication and doubled duplication to prevent data loss. Those assessments aren’t really complicated: you basically sit down, relax and think about your deepest fears on a certain topic. Then you write them down and act on avoiding or circumventing them. Here’s an example of some results:

  • No data transfer over unsecured internet connections
  • No single point of failure
  • No single area of failure

The last result is of particular interest today: We want to prevent data loss in case of an “area-based disaster”, like a whole-building fire or a meteorite impact. Well, to be clear on the meteorite scenario, it is both highly improbable and dangerous. If the meteorite happens to be just a bit bigger than average, we won’t worry about backups anymore because we all live within a certain perimeter around our company. Yes, worst-case scenarios are always morbid.

Stages of data-loss prevention

We have several measures in place to prevent data loss. Technologies like RAID drives and processes like daily backups with several copies of those backups make sure that we always have at least one copy of all important data, even in the most drastic locally confined disaster. But to adhere to the first rule that no data transfer may happen over unsecured internet connections, and to make sure that an internet connection isn’t a single point of failure that may compromise data security, we had to come up with a way to distribute our backups physically without much effort.

The backup export disks

Our system relies on three facts:

  • Small and resilient hard drives with high capacity are affordable
  • Every employee’s home can be a unique backup storage location
  • If we take turns, the effort is low for everybody, but high enough to be effective

So we bought a “backup export disk” for every employee. It’s a 2.5″ USB-powered hard drive with enough storage capacity to hold our most important data. All export disks are registered with the backup distribution system, which can provide them with the most current backup as soon as they are connected. There is also a little “backup export token” that gets passed from employee to employee in a predetermined order. The token is just a piece of cardboard that says “tag, you are it!”.

Our backup export process

So what do you have to do when you find the “backup export token” on your desk? Just five easy steps:

  • Bring your backup export disk the next day (this is the hardest part: remembering to pack the disk at home)
  • Plug it into the backup distribution system (a specific computer that is switched off and has a USB cable attached) and switch the computer on
  • Wait for the system to do its job. This will take a while, but you’ll get an e-mail at completion, so just wait for the e-mail to arrive
  • Unplug the backup export disk and take it back home (store it in a dry and safe place)
  • Forward the backup export token to the next employee in line

That’s all there is to the obvious process. Some more things happen behind the scenes, but the process mostly relies on the effect of repetition by several operators.

Simple and effective

This process ensures that our backup gets “exported” at least thrice a week to different locations. All in all, we store our backup in at least five locations with a maximum age of two weeks. The system scales up (or down) easily, so the process won’t change even if we double or triple the location count or the export frequency. And the data on an individual disk cannot be read by a finder or thief, as it is secured by strong encryption, so there is no need to restrict physical access to a disk at its storage location (like using a safe) or to fret if a disk gets lost.

Decentralized, but supervised

Every time a backup export disk is connected to the backup distribution system, the disk’s health figures and remaining space are reported to the administrators. Using this information, we can also reconstruct the distribution history and fetch the most current disk in an emergency. If a disk shows its age, it gets replaced by a new one without much effort. We only need to register the new disk with the backup distribution system and associate it with an employee so that the e-mail is sent to the right person.
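
To picture the bookkeeping, here is a minimal sketch in C++. It is not our actual backup distribution system: the ExportDisk record, its fields and the mostCurrentDisk function are hypothetical names invented for this illustration. The idea is only that every connection updates a record per disk, and in an emergency we ask for the healthy disk with the newest backup:

```cpp
#include <ctime>
#include <string>
#include <vector>

// Hypothetical record kept for every registered backup export disk.
struct ExportDisk {
    std::string owner;       // employee who stores the disk at home
    std::time_t lastExport;  // time of the last completed export, 0 if never
    bool healthy;            // summary of the reported health figures
};

// Returns the healthy disk that holds the most recent backup,
// or nullptr if no usable disk is registered.
const ExportDisk* mostCurrentDisk(const std::vector<ExportDisk>& disks) {
    const ExportDisk* best = nullptr;
    for (const ExportDisk& disk : disks) {
        if (!disk.healthy || disk.lastExport == 0) {
            continue;  // skip worn-out disks and disks that were never filled
        }
        if (best == nullptr || disk.lastExport > best->lastExport) {
            best = &disk;
        }
    }
    return best;
}
```

The returned pointer refers to an element of the given list, so it is only valid as long as that list is alive and unchanged.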


By entrusting our employees with the core mechanics of keeping the backups distributed and automating the rest, we reached a level of data security that even protects against area-wide disaster scenarios.

The work experience improvement budget (“Kreativbudget”)

We at the Softwareschneiderei are a small team of software developers working in a founder-owned company. We have been developing software for 15 years now and have experimented with a lot of management ideas and concepts. We can conclude that a lot of things don’t work for us while others are highly effective. There is no guarantee that anything we do works anywhere else, so don’t expect wonders just because it works wonders for us. But we are willing to share nearly every detail of our management style, and here is another bit of it: the “creativity budget”.

I already blogged about this idea five years ago, but it’s still a good (and fairly uncommon) idea, so why not do it again? The name “creativity budget” (“Kreativbudget” in German) is actually really bad, but it stuck and we cannot realistically change it anymore. A more fitting name would be “work experience improvement budget” or something similar. The core of the idea is simple: Every employee can spend a certain amount of money every year to improve his/her own work experience. The investment doesn’t need to be profitable and the improvement doesn’t need to be effective. Whatever was bought, the employee never needs to justify it. It’s just company money that the employee can rule over to improve the company in his/her fashion.

The actual ruleset is fairly simple: In recent years, the amount was defined to be 1000 EUR per year for each employee, regardless of actual job (development or administration, for example). Our students could invest half the amount (500 EUR). You don’t need to buy coffee or food, your work computer or laptop from it: all the basics are provided outside of the budget. You shouldn’t spend the budget on silly things just to get rid of it, but if you have an idea – even a crazy one – and think, “hey, that would be cool to have”, you just need to create a “purchase order” issue in our administration issue tracker and flag it as “on creativity budget”. We will buy it right away, without further discussion.

Why the creativity budget?

The most competent person to improve the work experience of an employee is that very employee. Every hurdle we impose between employees and their improvement ideas, like bureaucratic overhead or reviews, will only damage the improvement effect, but not improve the financial situation of the company. Our financial situation is directly linked to the productivity and happiness of all employees, so we would actually damage it by trying to go cheap. Not spending money won’t buy us happiness. And remember, we are a small company. The maximum amount of all creativity budgets combined is still only a small percentage of our total revenue (under 2%). If we can improve our total revenue just a little bit, it is totally worth it. But why speculate? We have hard numbers from the last dozen years that show that it works for us.

What did the budget gain us?

The most important gain is making room for errors. If you have to plead with higher-ups and convince them of an improvement, it had better come with convincing figures and a realistic chance of success. If not, you are the moron who suggested it. Using our budget, we can try crazy things and never need to explain ourselves. If it doesn’t work – who cares? If it works – well, you were the first, now we need to implement it for everyone.
We try things earlier. New technologies like solid state disks were frowned upon in the beginning – how long do they last, etc. We tried them early and were convinced sooner than most (but that’s another blog post).
We don’t calculate improvements first. One of the most common reasons to refuse a new idea is the worry “what if everybody wants one?”. That’s the fear of upscaling paired with the fear of failure. What if the idea works and is a huge improvement and nobody wants it? We would rather err on the side of monetary loss than on the side of productivity loss.

But what did it gain us precisely?

Well, to answer that, I have to present to you the three categories of improvements we identified (without limiting the budget to them!):

  • Hardware: A certain piece of technology believed to make work easier or more enjoyable. Examples are computer mice (everyone has a favorite mouse), keyboards, monitor upgrades (if the default two 24″ monitors aren’t enough), SSDs (before we got rid of spindle disks) or even your favorite computer brand. It gained us fine-tuned workplaces that fit perfectly with the developers using them – no “one size fits all”.
  • Software: A computer program that you’d like to use even if that requires license costs. Examples are IDEs, editors, version control clients or even screenshot utilities. Don’t get me wrong – we had all these things before, but mostly as open source products. If you want a commercial counterpart of a piece of software, you don’t have to argue. It made our software landscape more diverse and introduced some products for the whole company – SmartGit is the example of choice.
  • Wetware: An activity you’d like to undertake – in the professional context of your job. You want to visit that certain conference? Have paid training on a specific topic? This category introduced us to some conferences that are worth revisiting and some we’ve already forgotten again. We got training and went to workshops, without any upfront filtering or “strategic planning”.

We’ve gained a lot of agility in pursuing technical excellence, each of us on his/her own course. We gained the insight that “work experience” is something we can directly influence and steer. It makes already self-confident employees even more confident. And it relieves the boss from important, but highly individual micro-management (but that’s just my own personal gain from it all).


By giving every employee the power to improve his/her direct work experience, we improved our overall experience even more. In all these years, we never used up the budgets completely, but the effect is very noticeable. We acted on impulse, tried things out, reflected and adopted them if worthwhile. And it was very worthwhile indeed. Currently, we are discussing the idea of doubling or even tripling the budget per year to see where it leads us.

Recap of the Schneide Dev Brunch 2014-12-14

In mid-December, we held another Schneide Dev Brunch, a regular brunch on a Sunday, only that all attendees want to talk about software development and various other topics. If you bring a software-related topic along with your food, everyone has something to share. The brunch was well attended and we didn’t even think about using the roof garden (cold and rainy). There were lots of topics and chatter. As always, this recapitulation tries to highlight the main topics of the brunch, but cannot reiterate everything that was said. If you were there, you will probably find this list incomplete:

International brunch

We tried to establish a video conference with a guest from San Francisco and had tested the technical setup beforehand. But we didn’t succeed, mostly because of a sudden Christmas party on the US side. So we can’t really say if the brunch character is preserved even if you join us in the middle of the (local) night.

How much inheritance do you use?

One question was how inheritance is used in the initial development of systems. Is it a pre-planned design feature or something that helps to resolve difficult programming situations in an ad-hoc manner? How deep are the inheritance levels?
The main response was that inheritance is seldom used upfront. The initial implementations are mostly free of class hierarchies. Inheritance is often used after the fact to extract abstractions (or generalizations) from the code. The hierarchies mostly grow “upwards” from the concrete level to abstract superclasses.
Another use case of inheritance is the handling of special cases with further specialization through subclasses. The initial class is modified just enough to enable proper insertion of the new code in its own subclass.
A third use case of inheritance, upfront this time, was proposed with regard to the domain model. Behavioural typing is a common motivation for the usage of inheritance in the model, in contrast to the technical usage of inheritance to solve non-domain problems. At the domain level, inheritance resembling a “behaves-like” relation can be the most powerful expression of actual connections between types.
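
The first two use cases can be sketched in a few lines of C++. The example is made up for this recap and all class names are hypothetical: imagine two concrete exporters that were written first and contained the same loop, until the common part was extracted “upwards” into an abstract superclass. The second subclass then handles its special case by overriding only a small hook:

```cpp
#include <string>
#include <vector>

// Extracted after the fact: what both concrete exporters had in common.
class Exporter {
public:
    virtual ~Exporter() {}

    // The shared loop that used to be duplicated in both subclasses.
    std::string exportAll(const std::vector<std::string>& values) const {
        std::string result;
        for (std::size_t i = 0; i < values.size(); ++i) {
            if (i > 0) {
                result += separator();
            }
            result += format(values[i]);
        }
        return result;
    }

protected:
    virtual std::string separator() const = 0;

    // Hook for special cases; the default leaves the value untouched.
    virtual std::string format(const std::string& value) const {
        return value;
    }
};

class CsvExporter : public Exporter {
protected:
    std::string separator() const override { return ","; }
};

// The special case lives in its own subclass and only overrides the hook.
class QuotedListExporter : public Exporter {
protected:
    std::string separator() const override { return ", "; }
    std::string format(const std::string& value) const override {
        return "\"" + value + "\"";
    }
};
```

With the values a and b, CsvExporter produces a,b while QuotedListExporter puts each value in double quotes and separates them with a comma and a space.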

Book review “Analysis patterns”

The discussion about inheritance led to questions about domain models and their expression through formal notation. An example about accounts resulted in a short review of the book “Analysis Patterns” by Martin Fowler, first published in 1996. The book introduces its own notation for models to be able to express the interrelations without being dragged down to the implementation level. UML isn’t suited as it’s a notation from the technical domain. Overall, the book seems to be mostly overlooked and under-appreciated. It contains a lot of valuable wisdom in the area of domain analysis, an activity that has to be done upfront in any larger project. This “upfront activity” characteristic might have led to it being ignored in most agile processes. The book is a perfect companion to Eric Evans’ “Domain-Driven Design”.

Book review “Agile!”

Another book review of this brunch was a deep review of Bertrand Meyer’s book “Agile! The Good, the Hype and the Ugly”. The book is the written opinion of Mr. Meyer with regard to all current agile processes and very polarizing as such – he does state his points clearly. But it’s also a very well-researched assessment of nearly all aspects of agile software development. You might want to argue with certain conclusions, but you’ll have to admit that Mr. Meyer knows what he’s talking about and got his facts right (even if his temper shines through sometimes). This book is the perfect companion to all the major agile books you’ve read. It serves as a counter-balance to the dogmatic views that sometimes come across. And it serves as an (albeit personal) rating of all agile practices, a gold mine for every project manager out there. The book itself is rather short with some reiterations (you’ll get the major points, even if you skip some pages) and written in an informal tone, so it’s an easy read as long as you’re neutral towards the topic.
When we reviewed the rating of agile practices on a big whiteboard, ranging from ugly to brilliant, it didn’t take long until discussions started. If nothing else, this book will help you review your practices and beliefs.

Embedded Agile on the rise

The next topic was related to agile software development, too. In the large field of embedded software development, adoption of agile practices has lagged behind substantially. This has many reasons, of which we discussed a few, but the more interesting observation was that this is changing. While there is still a considerable lack of literature for embedded software overall, the number of publications advocating modifications to the agile processes to fit the intricacies of embedded software development is steadily increasing.
A similar trend can be observed in the user experience community (think: user interface designers), termed “Lean UX”.

Mobile game presentation

A long-awaited highlight of this brunch was the presentation of a mobile game under development by one attendee. It’s a cool-looking jump-and-run game in the tradition of Super Mario, with lots of gimmicks and innovative effects. The best part of the presentation was the gameplay, controlled by the developer from behind the device, upside down and with live commentary. The game is developed in a platform-agnostic manner using several frameworks and suitable coding habits. Right now, it’s in its final phase of development and will be released soon. I don’t want to spoil too much beforehand and invite Martin (the author) to add a comment below with links leading to more information.

A change in the Dev Brunch mechanics

The last topic on our agenda was a short review of the Dev Brunch series in the last years. In 2013, we introduced the extra “workshop events”, which were changed into the “game nights” in 2014. We want to return to more serious topics in 2015 and revive the workshops. Attendees (and future ones) are invited to suggest which workshops they would like to see. The Dev Brunch itself will be formalized further by introducing a steady pace of bi-monthly dates.


As usual, the Dev Brunch contained a lot more chatter and talk than listed here. The number of attendees makes for a unique experience every time. We are looking forward to the next Dev Brunch at the Softwareschneiderei. And as always, we are open to guests and future regulars. Just drop us a note and we’ll invite you over next time.

TANGO device server step-by-step tutorial

Now that we have learned about TANGO in general and the architecture of device servers, it is time to get our hands dirty. Here is a step-by-step tutorial for making your software remotely accessible as TANGO devices.

We will develop a small C++ class that can provide us the current time and date as a string and then build a device server that makes our functionality available over TANGO to remote clients. Our plain C++ project structure looks like this:


Here are our CMake build files:

cmake_minimum_required(VERSION 2.8)



and for the TimeProvider


add_library(time TimeProvider.cpp)

add_executable(timeprovider main.cpp)
target_link_libraries(timeprovider time)

And the C++ sources for our standalone application:

// TimeProvider.h
#include <string>

class TimeProvider
{
public:
    TimeProvider() {}

    // Returns the current local date and time as a string.
    const std::string now();
};


// TimeProvider.cpp
#include "TimeProvider.h"

#include <ctime>

const std::string TimeProvider::now()
{
    time_t now = time(0);
    struct tm timeInfo;
    char timeString[100];
    // copy the result because localtime() returns a pointer to shared static storage
    timeInfo = *localtime(&now);
    strftime(timeString, sizeof(timeString), "%Y-%m-%d %X", &timeInfo);
    return timeString;
}


// main.cpp
#include <iostream>
#include "TimeProvider.h"

int main()
{
    TimeProvider tp;
    std::cout << tp.now() << std::endl;
    return 0;
}
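
Because the TimeProvider has no TANGO dependency, its formatting logic can be checked without any control system running. The following is only a sketch under assumptions: formatUtc is a hypothetical helper that is not part of the tutorial sources. It takes the point in time as a parameter and formats it as UTC with a fixed pattern, so the result depends neither on the clock nor on the time zone or locale of the machine:

```cpp
#include <ctime>
#include <string>

// Hypothetical helper (not part of the tutorial sources): formats the given
// point in time as UTC so that the result is the same on every machine.
std::string formatUtc(std::time_t pointInTime)
{
    // copy the result because gmtime() returns a pointer to shared static storage
    std::tm parts = *std::gmtime(&pointInTime);
    char buffer[100];
    std::strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", &parts);
    return buffer;
}
```

A call like formatUtc(0) yields 1970-01-01 00:00:00, which makes the function easy to cover with a unit test before it is wrapped in a device server.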

Next we create a new subdirectory “TimeDevice” and add it to our top-level CMakeLists.txt along with the TANGO package lookup:

pkg_check_modules(TANGO tango>=7.2.6 REQUIRED)


In this newly created directory we now run the Pogo application with pogo TimeDevice from our TANGO installation to generate our device server skeleton:

[Screenshot: Pogo “Create Device” dialog]

and add the attribute:

[Screenshot: Pogo “Add Attribute” dialog]

so the result looks like:


Now we need to add the generated sources to our CMake build like this:



# this is needed because of wrong generation of include statements
# you may correct them in generated code because they are in protected regions



add_executable(time_device_server ${SOURCES})

As the last step, we implement the code for the CurrentTime attribute like this:

void TimeDevice::read_CurrentTime(Tango::Attribute &attr)
{
	DEBUG_STREAM << "TimeDevice::read_CurrentTime(Tango::Attribute &attr) entering... " << endl;
	/*----- PROTECTED REGION ID(TimeDevice::read_CurrentTime) ENABLED START -----*/

    attr_CurrentTime_read = new Tango::DevString;
    TimeProvider timeProvider;
    // string_dup() copies the characters, so the temporary std::string may go away
    *attr_CurrentTime_read = Tango::string_dup(timeProvider.now().c_str());
    //	Set the attribute value; "true" hands the memory over to TANGO for release
    attr.set_value(attr_CurrentTime_read, 1, 0, true);

	/*----- PROTECTED REGION END -----*/	//	TimeDevice::read_CurrentTime
}

For other correct implementations of string attributes see the documentation on the TANGO website.
Now we should end up with a ready-to-run TANGO device server executable.

If you structure your project with foresight, you can integrate your drivers or services into your TANGO control system with very little effort. In the next post we will show how to add a device server to a TANGO database and use its facilities like device properties for configuration or Jive for inspection of a device.

Feel free to download the full source code of this tutorial.

How I find the source of bugs

You know the situation: a user calls or emails you to tell you your program has a problem. When you are lucky, they list some steps they believe they did to reproduce the behaviour. When you are really lucky, those steps are the right ones.
In some cases you even get a stack trace in the logs. High fives all around. You follow the steps, the problem shows and you get the exact position in the code where things go wrong. Now is a great time to write a test which executes the steps and shows the bug. If the bug isn’t data dependent, you normally can nail it with your test. If it is dependent on the data in the production system, you have to find the minimal set of data constraints which causes the problem. The test fails, fixing the code should make it green and fix the problem. Done.
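
Such a test can be tiny. The following C++ sketch is an invented example, not code from the system in question: average is a hypothetical function whose reported bug was a meaningless result (a division by zero) whenever the production data contained an empty measurement series. The test replays the problem with the minimal data that triggers it:

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical function under test, shown here with the fix applied:
// an empty series is rejected instead of being divided by zero.
double average(const std::vector<double>& values) {
    if (values.empty()) {
        throw std::invalid_argument("cannot average an empty series");
    }
    double sum = 0.0;
    for (double value : values) {
        sum += value;
    }
    return sum / values.size();
}

// The regression test uses the minimal data constraint that caused the
// problem: a series without any values.
bool emptySeriesIsRejected() {
    try {
        average(std::vector<double>());
    } catch (const std::invalid_argument&) {
        return true;   // the fix is in place
    }
    return false;      // the bug is back
}
```

Before the fix, emptySeriesIsRejected() returns false; afterwards it returns true and keeps guarding against the same bug.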
But there are cases where the problem is caused not by the last action but sometime before. If the data does not reflect that, the problem is buried in the layers in between, like caches, in-memory structures or the particular state the system is in.
Here, knowledge of the frameworks used or of the system in question helps you to trace back the flow of state and data leading to the position of the stack trace.
Sometimes the steps do not reproduce the behaviour. But most of the time the steps are an indicator of how to reproduce the problem. The stack trace should give you enough information. If it doesn’t, take a look at your log. If this also does not show enough information to find the steps, you should improve your logging around the position of the stack trace. Wait for the next instance or try your own luck, and then you should have enough information to find the real problem.
But what if you have no stack trace? No position to start your hunt? No steps to reproduce? Just a message like: “after some days I got an empty transmission happening every minute”. After some days. No stack trace. No user actions. Just a log message that says: starting transmission. No error. No further info.
You got nothing. Almost. You got a message with three bits of information: transmission, every minute and empty.
I start with transmission. Where does the system transmit data? Good for the architecture but bad for tracing: the transmission is decoupled from the rest of the system by a message bus. Next.
Every minute. How does the system normally start recurring processes? With Quartz, a scheduler for Java. Looking at the configuration of the live system, no process is triggered anywhere near every minute. There must be another place. Searching the code and the log, another message indicates a running process: a watchdog. This watchdog puts a message on the bus if it detects a problem, which is then sent by the transmission process. Bingo. But why is it empty?
Now the knowledge about the facilities the system uses comes into play: UMTS. Sometimes the transmission rate is so low that the connection does not transfer any packets. The receiving side records a transmission but gets no data.
Most of the time the problem can be found in your own code, but all code has bugs. If, after looking at your code, you suspect that a framework you use has a bug, search the bug database of the framework. Hopefully the bug is listed there and already fixed.

The four rules of data safety

One of the most dangerous objects to handle is a gun. No wonder there are strict and understandable rules on how to handle guns safely. The Canadians have The Four Firearm ACTS, but for this blog entry, I will cite the Four Rules stated by Captain Ira L. Reeves right before the First World War and restated by Colonel Jeff Cooper:

  1. All guns are always loaded
  2. Never let the muzzle (the business end of a gun) cover anything you are not willing to destroy
  3. Keep your finger off the trigger until your sights are on the target
  4. Be sure of your target and what is beyond it

Even if you accidentally break one rule (for example, rule 3 is often blatantly disobeyed on television), there are still enough precautions in place to keep you (and everybody around you) relatively safe. The rules are meant to instill a certain amount of respect for the gun into the owner so that offloading of responsibility isn’t possible any more, as in the line “I know this gun is unloaded, so it’s probably mighty fun to point it at somebody”.

The guns of software development

In software development, the most dangerous objects we can handle are user-created data or inputs. To mitigate the risks we take when we accept inputs from our users (and most software would be pretty useless otherwise), we have the concept of validation: Before anything else may happen with the data, it needs to be validated, meaning “proved to be free of danger”. Improper input validation is so prevalent in software development that it has its own CWE number (CWE-20) and ranks number 1 on the Top 25 list of “most dangerous programming errors”.

There are some concepts ready to help us tackle this task. The most promising is taint checking, which treats all input as dangerous and therefore unworthy of further usage unless proven otherwise. Taint checking reminds you to validate, but not how to validate, and unfortunately it isn’t available in most programming languages. What we need is a language-agnostic set of rules that shape our behaviour in a way that we can’t make the most common mistakes of validation. It seems that gun owners have tried the same and succeeded. So let’s formulate our Four Rules of data safety, inspired by the gun rules.

Our four rules

  1. All data always contains malicious aspects
  2. Never accept input for modules you cannot afford to have hacked
  3. Leave input data alone until you actually want to use it
  4. Be sure what aspects to validate and how to do it properly

This is just a starting ground for discussion, let’s call it the first version of the Four Rules. Here is my motivation for each rule:

All data always contains malicious aspects

Most users of most systems are in no way harmful. But if they attempt to harm a system, it had better be prepared. Problem is, even with a thorough validation in your current context, there is always the possibility that your attacker plays a rail shot, entering the system here, but causing damage somewhere else. A good example of this practice was images with JavaScript code in their metadata. An adequate validation of uploaded images would check for a valid image format, but not mind the “dead content” in the meta tags. A browser would later discover the JavaScript and execute it – a classic cross-site scripting attack. Never treat any data as fully validated. If you know that your particular code is vulnerable to a specific threat, let’s say a zero value in a variable used as a divisor, validate once more against this threat. This practice is also contained in the idea of defensive programming.
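
The divisor example could look like the following C++ sketch. The function itemsPerPage and its parameters are hypothetical and only serve as an illustration. The caller may already have validated the input, but this code knows its own weak spot and checks again right where the division happens:

```cpp
#include <stdexcept>

// Hypothetical example: this code breaks on a zero divisor, so it validates
// against exactly that threat once more, no matter what happened upstream.
int itemsPerPage(int totalItems, int pageCount) {
    if (totalItems < 0) {
        throw std::invalid_argument("totalItems must not be negative");
    }
    if (pageCount <= 0) {
        throw std::invalid_argument("pageCount must be positive");
    }
    return totalItems / pageCount;
}
```

Rejecting negative values as well is a deliberate choice in this sketch: it rules out the less obvious overflow that an integer division of the smallest int by -1 would cause.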

Never accept input for modules you cannot afford to have hacked

Behind this rule lies a simple truth: Everything that can be hacked will be hacked, given enough time. The only protection against any hack is no access at all (as in “some air between network cable and network card”). If, for example, you run a certificate authority and absolutely cannot risk losing your secret private key, the machine using this key must not be connected to any network. If your database contains data much too valuable to be “stolen”, the database shouldn’t be accessible directly – and all access needs to be validated beforehand. You need to think about a pragmatic compromise for your scenario when following this rule, but you’ve been warned.

Leave input data alone until you actually want to use it

This was the most difficult rule for me to decide on. The rationale is that even the slightest bit of validation is actually usage of the input. Given enough knowledge about the validation, an attacker could possibly attack the system by abusing weaknesses in the validation itself (see rule 1 for inspiration). Any contact with input data is dangerous, even when it happens with the best intentions. The downside is that you won’t have a stronghold security architecture, where a mighty wall separates the danger zone from friendly territory (or tainted from cleaned data). Remember that even persisting the input data is using it in some form.

Be sure what aspects to validate and how to do it properly

If the time has come to use the input and to validate it right before, you need to think deeply about the threats you want to eliminate. Just like with guns, where real bullets (as opposed to “television bullets”) won’t stop at the shooter’s convenience, your validation has consequences beyond an immediate gain of security. A common error is the rushed countermeasure, when you think of a specific threat and immediately try to abolish it. Take your time and think deeply! For example, if your users can enter way too high values, it’s of no use to constrain the input field length, because direct web requests and notations like “1E9” are still possible. But converting an input string to a number to check its value might not be the smartest idea, either. Not long ago, you could crash nearly every application by entering a certain “number of death”. Following this rule requires experience and lots of reading, learning and thinking. And even then, there’s always somebody smarter than you, so ultimately, you should plan your system under the impression of rule 2.
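
To make the example with the too high values a bit more concrete, here is one possible approach as a C++ sketch. It is meant as a starting point for discussion, not as the one correct validation; the function parseBoundedInt and its parameters are hypothetical. It inspects the characters before any numeric conversion takes place, avoids floating point parsing altogether and only then checks the range:

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical validator: accepts only plain decimal digits and a value
// within [minimum, maximum]. Notations like "1E9", signs, whitespace and
// trailing garbage are rejected before any numeric conversion happens.
// On failure, result is left untouched.
bool parseBoundedInt(const std::string& input, long minimum, long maximum,
                     long& result) {
    if (input.empty() || input.size() > 18) {
        return false;  // nothing to parse, or far too long for a sane value
    }
    for (char c : input) {
        if (c < '0' || c > '9') {
            return false;  // only ASCII digits are allowed
        }
    }
    errno = 0;
    char* end = nullptr;
    long value = std::strtol(input.c_str(), &end, 10);
    if (errno != 0 || *end != '\0') {
        return false;  // out of range for long, or not consumed completely
    }
    if (value < minimum || value > maximum) {
        return false;
    }
    result = value;
    return true;
}
```

With the bounds 0 and 100, the input 42 is accepted, while 1E9, 101, an empty string and -5 are all refused. Rejecting the minus sign is a simplification of this sketch; a validator for fields that allow negative numbers needs a deliberate rule for it. And even a careful validator like this one does not replace rule 2.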

As stated, this is just a starting point to try to formulate rules for data validation that provide a behaviour framework that avoids the most common mistakes and pitfalls. I’m highly interested to hear your thoughts about this topic. Please leave a comment below – but be gentle with the comment validation algorithm.

TANGO device server architecture

In my previous post I explained the basics of TANGO and why you probably want to use TANGO for the development of a distributed system. Now I would like to explain how to build and design a TANGO device server. There are several best practices and even a comprehensive and ever-evolving guide you should definitely have a look at.

General Approach

I like to think about TANGO as a thin wrapper around some software object. That means almost all logic and hardware/platform dependent stuff is implemented in the software object, which should provide all services the TANGO wrapper needs. Usually you will design an opinionated library that supports your use cases and encapsulates platform, hardware and driver issues while leaving out the stuff you do not need.

The opinionated library has no dependencies on TANGO and can be used in different clients independently of TANGO. The TANGO device classes mostly delegate to the library and manage just the TANGO-specific things like device state, synchronisation, allowed methods and so on.
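
The split can be sketched without any TANGO headers. Everything in the following C++ example is hypothetical: Shutter stands for the opinionated library, and ShutterDevice is only a stand-in for a device class, which in a real server would be generated by Pogo and use the TANGO API instead of the plain enum and exception shown here:

```cpp
#include <stdexcept>

// The opinionated library: plain C++, usable without TANGO.
class Shutter {
public:
    Shutter() : open_(false) {}
    void open() { open_ = true; }
    void close() { open_ = false; }
    bool isOpen() const { return open_; }

private:
    bool open_;
};

// Stand-in for a TANGO device class. It only keeps track of the device
// state, checks whether a command is allowed and delegates to the library.
class ShutterDevice {
public:
    enum State { CLOSED, OPENED };

    ShutterDevice() : state_(CLOSED) {}

    void commandOpen() {
        if (state_ == OPENED) {
            throw std::logic_error("Open is not allowed in state OPENED");
        }
        shutter_.open();   // all real work happens in the library
        state_ = OPENED;
    }

    void commandClose() {
        shutter_.close();
        state_ = CLOSED;
    }

    State state() const { return state_; }

private:
    Shutter shutter_;
    State state_;
};
```

Because Shutter knows nothing about the wrapper, it can be unit tested on its own or used from a command line tool.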

TANGO Server Architecture

As said before, the TANGO device that makes use of the software component, which was developed with TANGO in mind, contains only short methods doing parameter conversion and some TANGO bookkeeping and life-cycle management. The design of the server is an interesting topic in its own right, though. Often it pays off to implement several devices in one (or more) TANGO servers that perform different tasks and provide special interfaces to their clients.

For example, a multi-axis motor controller could export one device per axis, so clients can move the axes independently in a natural fashion by denoting the respective axis by its device name. Alongside, there may be a controller device that provides access to controller functionality not specific to a single axis, like a “stop all axes” command. Sometimes it is helpful to let the axis devices talk to the controller device and not directly to the component you are trying to expose via TANGO. That way you can, for example, synchronise access to the component with TANGO framework functionality on the controller device.
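
The structure of such a server can be sketched as follows. The C++ example uses no TANGO API at all: MotorController and AxisDevice are hypothetical stand-ins, and a std::mutex takes the place of the synchronisation that the TANGO framework would provide on the controller device:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical controller: the single place that talks to the component,
// so all access can be synchronised here.
class MotorController {
public:
    explicit MotorController(std::size_t axisCount)
        : positions_(axisCount, 0.0), moving_(axisCount, false) {}

    void moveTo(std::size_t axis, double position) {
        std::lock_guard<std::mutex> lock(mutex_);
        positions_.at(axis) = position;
        moving_.at(axis) = true;
    }

    // Functionality that is not specific to a single axis.
    void stopAll() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t i = 0; i < moving_.size(); ++i) {
            moving_[i] = false;
        }
    }

    bool isMoving(std::size_t axis) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return moving_.at(axis);
    }

    double position(std::size_t axis) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return positions_.at(axis);
    }

private:
    mutable std::mutex mutex_;
    std::vector<double> positions_;
    std::vector<bool> moving_;
};

// Stand-in for one device per axis; it only knows its axis number and
// forwards every request to the controller.
class AxisDevice {
public:
    AxisDevice(MotorController& controller, std::size_t axis)
        : controller_(controller), axis_(axis) {}

    void move(double position) { controller_.moveTo(axis_, position); }
    bool isMoving() const { return controller_.isMoving(axis_); }

private:
    MotorController& controller_;
    std::size_t axis_;
};
```

Clients address an axis through its own device, while a stopAll() on the controller affects every axis at once.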

For imaging systems like CCD cameras or other detectors, additional devices for image transformations, persisting the images or additional buffering may be a good decision. Such devices can be made largely independent of the actual hardware or imaging system, which makes for nice reuse and pluggable functionality.

So it is good to think about the different tasks and aspects your TANGO server should perform and separate them into specialised devices. That should make each device itself clearer and enables specialised service interfaces for different clients. Your devices become easier to use and many parts may be even reusable. We try to standardise on device interfaces every time we identify general abstractions. That makes it much easier for the clients to work with your exposed TANGO devices.