Keep your ovens clean

Let’s assume for a moment that you are a baker, producing different types of pastries in your small bakery. The production process is always the same: prepare the dough, put it in the oven, wait some time and retrieve the most delicious buns or bread. If we can abstract the real baking process to these steps, it’s the same as with software: prepare the source code, put it in the compiler, wait some time and retrieve the most delicious binary or executable. There is only one difference: the oven of the baker is a self-contained, closed system, while our compilers require a distinct system setup around them in order to produce anything edible. The oven is independent of the kitchen around it; the compiler is dependent on the environment. To finish the analogy: what would a baker say if he couldn’t bake bread in his oven unless he nurtured a certain type of yeast in his kitchen?

A most unpleasant case

While developing a platform-dependent application recently, we met a most unpleasant case of build dependency on a third-party library. It was an old dynamic link library (DLL) that required registration in the Windows registry. There was no other way than to register the DLL using the regsvr32 utility. If you didn’t do this, the build process would abort with an error stating unmet dependencies. If you ran the resulting program on a machine without the registered DLL, it would crash with a runtime error complaining about the missing registry entry. And by the way, there are two totally independent regsvr32 utilities on a 64-bit Windows system, one for 32-bit and one for 64-bit registrations. No, the name of the latter one isn’t regsvr64, that would be way too easy.

We accepted the fact that you need to prepare your system if you want to run the program, but we quarreled a lot with the nuisance that you need to alter your system just to build the software. This kind of alteration is called snowflaking in the DevOps mentality, and it’s not a desired activity. We would need to alter every build machine in our continuous integration cluster that comes into contact with the project. And we would need to de-snowflake them again afterwards, because this kind of tinkering adds up to inscrutable side effects very fast.

A practicable workaround

We found a way around the above-mentioned snowflaking for our build servers. It’s not a solution, only a workaround, as it solves the immediate problem but produces some lesser problems along the way. Let’s look at what we did.

At first, our situation could be described with this module diagram:

[Module diagram: our system with a direct dependency on the problematic DLL]

We couldn’t modify the problematic DLL itself; it was a given binary. But we could wrap it in our own DLL. Wrapping less pleasant things into something you can control is a proven technique, even in baking, by the way. We now had a system layout that looked like this:

[Module diagram: the wrapper DLL between our system and the problematic DLL]

Nothing gained so far, just that we now have a layer outside our system that can provide the functionality of the DLL and is actually under our control. The wrapper really does nothing on its own but forward each call to the DLL. To profit from this indirection, we need to introduce another module, like this:

[Module diagram: the stub DLL as a drop-in replacement for the wrapper DLL]

The second module provides the same interface as the first, but does nothing, not even forwarding anywhere. It’s a complete stub, just there to be uncomplicated during the build process. The goal is to build the system using this “empty” DLL and then replace it with the “problematic” DLL afterwards. The only question is: how do we build the wrapper around the problematic DLL in the first place? Here’s the workaround part of the solution: we actually had to compile it on a snowflaked system and add it to the project repository. Good thing our target system’s specification is known, so we only need to do this for one platform. Because we are reasonably sure that the DLL interface will not change over time (it had every opportunity in the last ten years and didn’t use it), we can assume that the interface of our two wrapper DLLs also won’t change. So it’s not too problematic to check in a precompiled binary that needs to satisfy an interface that’s reproduced with every build cycle. Still, we need to keep an eye on the method signatures of our two wrapper DLLs. If one of them changes, the modification needs to be replicated on the other wrapper, too. It’s a classic duplication.
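To illustrate the shape of this indirection, here is a heavily simplified sketch in C#; the real project used native DLLs, and all names below are made up for illustration only:

// The common interface both wrapper DLLs expose to the rest of the system.
public interface ILegacyLibrary
{
    void Initialize();
    double Measure();
}

// The "problematic" wrapper: forwards every call to the registered legacy DLL.
// It is compiled once on a snowflaked machine and checked into the repository.
public class ForwardingLegacyLibrary : ILegacyLibrary
{
    public void Initialize() { /* forward to the legacy DLL */ }
    public double Measure() { /* forward to the legacy DLL */ return 0.0; }
}

// The "empty" stub: same interface, no behaviour.
// The build servers link against this one and never need the registry entry.
public class StubLegacyLibrary : ILegacyLibrary
{
    public void Initialize() { }
    public double Measure() { return 0.0; }
}

During the build, the system is linked against the stub; on the target machine, the forwarding wrapper (together with the registered DLL) takes its place.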

When we balanced the duplication in the interfaces of the two wrapper DLLs against the snowflaking of every CI and developer machine, we found our aversion to snow outweighing the other negative aspects. Your mileage may vary.

Conclusion

We kept our build ovens clean by introducing a wrapping layer around the problematic dependency and then using the benefits of indirection by switching to a non-problematic stub during the build cycle. The technique is very old, but still useful and powerful.

Recap of the Schneide Dev Brunch 2015-08-09

Two weeks ago, we held another Schneide Dev Brunch, a regular brunch on the second Sunday of every other (even) month, only that all attendees want to talk about software development and various other topics. So if you bring a software-related topic along with your food, everyone has something to share. The brunch was well attended this time, with enough stuff to talk about. As usual, a lot of topics and chatter were exchanged. This recapitulation tries to highlight the main topics of the brunch, but cannot reiterate everything that was spoken. If you were there, you will probably find this list inconclusive:

News on Docker

Docker is the hottest topic among developers and operators in 2015. No wonder we started chatting about it the minute we sat down. There are currently two interesting platform projects that provide runtime services for Docker: Tutum (commercial) and Rancher (open source). We all noted the names and will check them out. The next interesting fact was that Docker is programmed in the Go language. The team probably one day decided to give it a go.

Air Conditioning

We all experienced the hot spell this summer and observed that work in the traditional sense is impossible beyond 30° Celsius. Why there are still so few air-conditioned offices in our region is beyond our grasp, especially since it’s possible to power the air conditioning with green electricity and let solar power deal with the problem that, well, the sun brought us. In 2015 alone, at least two work weeks were lost to the heat. The productivity gain from cooling should outweigh the costs.

License Management

We talked about how different organisations deal with the challenge of software license management. Nearly every big company has a tool that does essentially the same license management but has its own cool name. Other than that, bad license management is such a great productivity killer that even air conditioning wouldn’t offset it.

Windows 10

Even if we are largely operating system agnostic, the release of Windows 10 is hot news. A few of our participants have already tried it and concluded that “it’s another Windows”. A rather confusing aspect is the split system settings. And you have to forgo the Cortana assistant if you want to avoid the data gathering.

Patch Management

A rather depressing topic was the discussion about security patches. I’ll just repeat two highlights: a substantial number of servers on the internet are still vulnerable to the Heartbleed attack, and if a car manufacturer starts a big recall campaign with cost-free replacements, less than 10 percent of the eligible cars are actually fixed on average. This explicitly includes safety-critical issues. That shouldn’t excuse us as an industry for our own shortcomings, and it’s not reassuring to see that other industries face the same problems.

Self-Driving Cars

We digressed into the future hype topic of self-driving cars. I can’t reiterate the complete discussion, but we agreed that those cars will hit the streets within the next ten years. The first use case will be freight transport, because the cargo doesn’t mind if the driver is absent and efficiency matters a lot in logistics. Plus, machines don’t need breaks. Ok, those were enough puns on the topic. Sorry.

Tests on Interfaces

An interesting question was how to build tests that can ensure a class or interface contract. Much like regression tests for recently broken functionality, compatibility tests should deal with backward compatibility issues in the interface. Turns out, the Eclipse foundation gave the topic some thoughts and came up with an exhaustive list of aspects to check. There are even some tools that might come in handy if you want to compare two versions of an API.

API Design

When the topic of API Design came up, some veterans of the Schneide Events immediately mentioned the API Design Fest we held in November 2013 to get our noses bloody on API design. Well, bleed we did. The most important take-away from the Fest was that if you plan to publish an API that can endure some years in production while being enhanced and improved, you just shouldn’t do it. Really, don’t do it, it’s probably a bad idea and you lack the required skill without even knowing it. If you want to know, participate or even host an API Design Fest.

And if you happen to design a web-based API, you might abandon backward compatibility by offering several distinct “versions” of the API of a service. The version is included in the API URL (think of a path segment like /v2/) and acts more like a name than a version. This will ease your burden a bit. A nice reference resource might also be the PayPal API style guide.

Let’s just agree that API design is really hard and should not be done until it’s clear you don’t suffer from Dunning-Kruger effect symptoms too much.

Performance Tests

We talked about the most effective setup of performance tests. There were a lot of ideas and we cornerstoned the topic around this:

  • There was a nearly heroic effort from the Eclipse development team to measure their IDE performance, especially to compare different versions of the IDE. The Eclipse Test & Performance Tools Platform (TPTP) was (as in: discontinued) a toolkit of interesting approaches to the topic. The IDE itself was measured by performance fingerprints like this example from 2011. As far as we know, all those things ceased to exist.
At the last Java Forum Stuttgart, there was a talk about performance testing by an experienced tester who loved to give specific advice. The slides can be viewed online in German (well, not really, but the talk was).
  • The book Release It! has a lot of insights to this topic. It’s one of the bigger books on the pragmatic bookshelf.
  • The engineers at Netflix actually did a lot of thinking about the topic. They came up with Hystrix, a resilience library aimed at making it easier to prevent complete system blackouts. They also came up with Chaos Monkey, a service that makes it easier to have a complete system blackout. If we can say anything about Netflix, it is that they definitely approach their problems from the right angle.

Company Culture

Leaking over from the previous topic about effective performance-related measures, we talked about different company cultures, especially in regard to centralized human resources departments and works councils (German: Betriebsrat). We agreed that it is very difficult to maintain a certain culture during continued growth. We also agreed that culture trickles down from top management.

OpenGL

The last topic of this Dev Brunch was the rendering of text or single characters in OpenGL. By using signed distance fields, you can render text more crisply while still using only cheap computation instructions. There is a paper from Valve on the topic that highlights the benefits and gives a list of additional reading. It’s always cool to learn about something simple that actually improves things.

Epilogue

As usual, the Dev Brunch contained a lot more chatter and talk than listed here. The number of attendees makes for a unique experience every time. We are looking forward to the next Dev Brunch at the Softwareschneiderei. And as always, we are open for guests and future regulars. Just drop us a notice and we’ll invite you over next time.

Physical Quantities in C#

Scientific applications usually perform lots of calculations with physical quantities. If you do not represent them properly in your code you run the risk of mixing them up. For example, it’s easy to add a value in meters to a value in kilometers if you simply work with variables of primitive types like double or decimal. It can help to encode the unit in the variable name, e.g. massInKilogram, but it’s much better to let the type system handle it.

So here’s some C# code which models a generic physical quantity and its unit:

public abstract class Quantity<T> where T : Quantity<T>, new()
{
    private decimal value;

    public decimal In(Unit<T> unit)
    {
        return value / unit.Factor;
    }

    public string ToStringIn(Unit<T> unit)
    {
        return string.Format("{0} {1}", In(unit), unit.Text);
    }

    public class Unit<Q> where Q : Quantity<Q>, new()
    {
        public decimal Factor { get; private set; }
        public string Text { get; private set; }

        public Unit(string representation, decimal factor)
        {
            Text = representation;
            Factor = factor;
        }

        public static Q operator *(decimal value, Unit<Q> unit)
        {
            var quantity = new Q();
            quantity.value = value * unit.Factor;
            return quantity;
        }
    }
}

With this base class we can easily implement some quantities:

class Duration : Quantity<Duration>
{
    public static readonly Unit<Duration> Millisecond = new Unit<Duration>("ms", 1e-3M);
    public static readonly Unit<Duration> Second = new Unit<Duration>("s", 1M);
    public static readonly Unit<Duration> Minute = new Unit<Duration>("min", 60M);
}

class Mass : Quantity<Mass>
{
    public static readonly Unit<Mass> Milligram = new Unit<Mass>("mg", 1e-3M);
    public static readonly Unit<Mass> Gram = new Unit<Mass>("g", 1M);
    public static readonly Unit<Mass> Kilogram = new Unit<Mass>("kg", 1e+3M);
}

And this is how they are used:

var t = 30 * Duration.Second;
Console.WriteLine(t.ToStringIn(Duration.Minute));
Console.WriteLine(t.ToStringIn(Duration.Second));
Console.WriteLine(t.ToStringIn(Duration.Millisecond));

var m = 10 * Mass.Gram;
Console.WriteLine(m.ToStringIn(Mass.Milligram));
Console.WriteLine(m.ToStringIn(Mass.Gram));
Console.WriteLine(m.ToStringIn(Mass.Kilogram));

The Unit class overloads the multiplication operator. Instead of calling a constructor directly, we use multiplication with a unit to create new instances of a quantity.

The type system prevents nonsense like this:

// does not compile
(10 * Mass.Gram).In(Duration.Minute);

You will probably want to overload more operators of the quantity classes depending on your use case. You can also overload operators to produce instances of new quantities:

Velocity v = (3.5M * Length.Kilometer) / (10 * Duration.Minute);

class Velocity : Quantity<Velocity>
{
    public static readonly Unit<Velocity> MeterPerSecond = new Unit<Velocity>("m/s", 1M);
}

class Length : Quantity<Length>
{
    public static readonly Unit<Length> Meter = new Unit<Length>("m", 1M);
    public static readonly Unit<Length> Kilometer = new Unit<Length>("km", 1e+3M);

    public static Velocity operator /(Length s, Duration t)
    {
        return (s.In(Length.Meter) / t.In(Duration.Second)) * Velocity.MeterPerSecond;
    }
}
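You will probably also want simple arithmetic within one quantity. As a sketch (not part of the original code), an addition operator could be added to the Quantity<T> base class like this:

public static T operator +(Quantity<T> left, Quantity<T> right)
{
    // Both operands store their value in the base unit of the quantity,
    // so the raw values can simply be added.
    var sum = new T();
    sum.value = left.value + right.value;
    return sum;
}

With this in place, (500 * Mass.Gram) + (1 * Mass.Kilogram) yields a Mass of 1.5 kilograms, and adding a Mass to a Duration remains a compile error.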

If you do not want to hand craft your quantities you might want to check out existing libraries for working with quantities like QuantityTypes.

Packaging kernel modules/drivers using DKMS

Hardware drivers on Linux need to fit the running kernel. When drivers you need are not part of the distribution in use, you need to build and install them yourself. While this may be ok to do once or twice, it soon becomes tedious doing it after every kernel update.

The Dynamic Kernel Module Support (DKMS) may help in such a situation: the module source code is installed on the target machine and can be rebuilt and installed automatically when a new kernel is installed. While veterans may be willing to manually maintain their hardware drivers with DKMS, end users do not care about the underlying system that keeps their hardware working. They want to manage their software updates using the tools of their distribution, and everything should be working automagically.

I want to show you how to package a kernel driver as an RPM package that hides all the complexities of DKMS from the user. This requires several steps:

  1. Preparing/patching the driver (aka kernel module) to include dkms.conf and follow the required conventions of DKMS
  2. Creating an RPM spec file to install the source, pull in the tool chain and integrate the module source with DKMS

While there is native support for RPM packaging in DKMS I found the following procedure more intuitive and flexible.

Preparing the module source

You need at least a small file called dkms.conf to describe the module source to the DKMS system. It usually looks like this:

PACKAGE_NAME="menable"
PACKAGE_VERSION=3.9.18.4.0.7
BUILT_MODULE_NAME[0]="menable"
DEST_MODULE_LOCATION[0]="/extra"
AUTOINSTALL="yes"

Also make sure that the source tarball extracts into the directory /usr/src/$PACKAGE_NAME-$PACKAGE_VERSION! If you do not like /usr/src as a location for your kernel modules, you can configure it in /etc/dkms/framework.conf.

Preparing the spec file

Since we are not building and packaging a binary but installing the source code, which is then registered, built and installed on the target machine, the spec file looks a bit different than usual: we have no build step. Instead we just install the source tree and potentially additional files like udev rules or documentation, and perform all DKMS work in the post-install and pre-uninstall scripts. All that means that we build a noarch RPM and depend on dkms, the kernel sources and a compiler.

Preparation section

Here we unpack and patch the module source, e.g.:

Source: %{module}-%{version}.tar.bz2
Patch0: menable-dkms.patch
Patch1: menable-fix-for-kernel-3-8.patch

%prep
%setup -n %{module}-%{version} -q
%patch0 -p0
%patch1 -p1

Install section

Basically we just copy the source tree to /usr/src in our build root. In this example we have to install some additional files, too.

%install
rm -rf %{buildroot}
mkdir -p %{buildroot}/usr/src/%{module}-%{version}/
cp -r * %{buildroot}/usr/src/%{module}-%{version}
mkdir -p %{buildroot}/etc/udev/rules.d/
install udev/10-siso.rules %{buildroot}/etc/udev/rules.d/
mkdir -p %{buildroot}/sbin/
install udev/men_path_id udev/men_uiq %{buildroot}/sbin/

Post-install section

In the post-install script of the RPM we add our module to the DKMS system, then build and install it:

occurrences=$(/usr/sbin/dkms status | grep "%{module}" | grep "%{version}" | wc -l)
if [ "$occurrences" -eq 0 ];
then
    /usr/sbin/dkms add -m %{module} -v %{version}
fi
/usr/sbin/dkms build -m %{module} -v %{version}
/usr/sbin/dkms install -m %{module} -v %{version}
exit 0

Pre-uninstall section

We need to remove our module from DKMS if the user uninstalls our package to leave the system in a clean state. So we need a pre-uninstall script like this:

/usr/sbin/dkms remove -m %{module} -v %{version} --all
exit 0

Conclusion

Packaging kernel modules using DKMS and RPM is not really hard and provides huge benefits to your users. There are some little quirks like the post-install and pre-uninstall scripts, but after you get that working, you (and your users) are rewarded with a great, fully integrated experience. You can use the full spec file of the driver in the above example as a template for your driver packages.

Universal skills every software developer can benefit from

Disclaimer: I develop software. Professionally for almost 15 years. These are some skills that helped and help me and I think they could help any software developer.

Debug

I cannot tell you how many times debugging saved me. I debug with print statements, with IDEs, with command line debuggers and with my brain. Understanding how a system works is crucial. Which parts are connected and which are not. Asking what if. And asking what happened.

Profile

If things go slow, I need to know why. Users and stakeholders expect a certain speed. And rightly so. But beware: if you optimize for one scenario, others might suffer. Profiling and optimizations are a thing of priority: which tasks should be fast and which can be slow.

Sketch

If I work with or for others, understanding each other and the models and concepts they use is essential. Sketching helps me to illustrate my view, my understanding of their view and the misunderstandings between us. Even when not communicating with others I can communicate with myself. Sketching a model of what is in my head or what I plan helps me reason about it. You don’t need to be a master artist, simple shapes like lines, rectangles, circles and arrows get you a long way.

Concepts (domain and technical)

Everybody thinks in concepts and models, whether they come from a technical or a user domain. In my daily work I need to understand, develop, extract and communicate concepts. Concepts come from very different places: code has concepts, domains have concepts, our profession has concepts and all kinds of people have concepts. Concepts form the base of my communication.

Budgeting

Time is limited. Concentration is limited. Constraints in a project help me to focus. To be pragmatic. But they also push me to plan and to estimate. I need to develop a notion of how long a feature takes, how important it is, how risky.

Evaluate

In my work I constantly evaluate. From small scale: which implementation is better, to large scale: which technology, which architecture should I use. To evaluate, I need to know the goals and the criteria. My experience helps me and hinders me. I know that no evaluation can be objective. Everyone has their personal favorites (and dislikes). Some things can only be seen afterwards. So I have to remind myself not to take too long with the evaluation and to start using the result.

Talk

With other developers I can talk in IT lingo. With designers I need to use words from design. With users and stakeholders I speak so that they understand. My job is not only writing code. My job is to explain my job to others. If they do not understand me, it is my fault, not theirs. I do not need to bother them with every detail, but sometimes they are the only ones who can decide. I need to tell them what their options are and what the consequences of each of them are – in their words.

Plan and prepare

There are two kinds of people: the ones who like to prepare and the ones who like to improvise. I am in the middle. Some things can be prepared and planned. It is useful when you can move work ahead of time or when you have (more) options when you need to improvise. Don’t overplan. Remember: a plan is there to be changed.

Improvise

During my career I face new situations every now and then. I cannot plan for them, or I didn’t. When I am in a client meeting, in a demo presentation or at the production system and something doesn’t go as planned, I need to do something. Sometimes right now. It helps me to have an emergency mode. In these situations I focus on what I have: my brain and my voice, and on what I can (maybe) get: help from others, a pencil and paper, time. And sometimes I need to say: I am sorry.

Lead / own

If I work on an issue, I need to own it. If I lead a project, I need to own it. My career: I need to own it. I am the one who is responsible. That does not mean I have to perform the work myself. I also need to know when I am not the right person for the job. But I need to decide. The work, the project, the career is not a boat which drifts in a giant ocean, I need to take the paddle and use it.

Collaborate

I don’t work alone. I have teammates. I have clients. My goal is to work with them towards a common goal. For this I need to collaborate. To delegate. To talk and to ask. To lead and to follow.

Define goals

Goals are measurable. I can ask: did I reach that goal? And answer with yes or no. Often we define something like: I want to get better at X as a goal. But that isn’t a goal. Think of a goal as a destination, not a direction. It is important to strike the balance between focusing too little or too much on our goals. But without even knowing what the goals are we just wander around.

Reflect and how to get feedback

I confess: I cannot live without feedback for too long. Am I on my way to the goal? Is this code any good? Was this the right decision? Do I make progress? Does it work? Reflection and feedback are stepping stones for me. A base from which I can move to higher mountains.

Ask

Asking is hard. Asking for help is hard. Asking for things you don’t (and maybe should) know is hard. But it helps immensely in learning. Be curious. Asking so that the other understands your question and you get the answers you need, needs practice. So: feel free to ask :)

Be(a)ware of Laziness

Let’s assume we have a simple JavaScript “class” called Module. Each instance of the class has a name, a start() method and a stop() method to manage its lifecycle:

function Module(name) {
    this.name = name;
    console.log("Creating " + this.name);
}
Module.prototype.start = function() {
    console.log("Starting " + this.name);
};
Module.prototype.stop = function() {
    console.log("Stopping " + this.name);
};

We want to create a couple of instances with the names “a”, “b” and “c”. At the beginning of the program we want to start each module, and at the end of the program we want to stop each module. For the creation of the instances we use a map() function call on the names array:

var names = ["a", "b", "c"];
var modules = names.map(function(name) {
    return new Module(name);
});
modules.forEach(function(module) {
    module.start();
});
// do something
modules.forEach(function(module) {
    module.stop();
});

The output is as intended:

Creating a
Creating b
Creating c
Starting a
Starting b
Starting c
Stopping a
Stopping b
Stopping c

Now we want to port this code to C#. The definition of the class is straight-forward:

class Module
{
    private readonly String name;

    public Module(string name)
    {
        this.name = name;
        Console.WriteLine("Creating " + name);
    }

    public void Start()
    {
        Console.WriteLine("Starting " + name);
    }

    public void Stop()
    {
        Console.WriteLine("Stopping " + name);
    }
}

The map() function is called Select() in .NET:

var names = new List<string>{"a", "b", "c"};
var modules = names.Select(
                 name => new Module(name));

foreach (var module in modules)
{
    module.Start();
}

foreach (var module in modules)
{
    module.Stop();
}

But when we run this program, we get a completely different output:

Creating a
Starting a
Creating b
Starting b
Creating c
Starting c
Creating a
Stopping a
Creating b
Stopping b
Creating c
Stopping c

Each module is created twice, and the creation calls are interleaved with the Start() and Stop() calls.

What has happened?

The answer is that .NET’s Select() method does lazy evaluation. It does not return a new list with the mapped elements. It returns an IEnumerable instead, which evaluates each mapping operation only when needed. This is a very useful concept. It allows for the chaining of multiple operations without creating an intermediate list each time. It also allows for operations on infinite sequences.
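To see the benefit, consider this small sketch (not part of the original example; it assumes the usual System and System.Linq using directives): the chained query describes three operations, but no element is computed and no intermediate list is created until the sequence is enumerated.

var numbers = Enumerable.Range(1, 1000000);

// Nothing is evaluated yet: Where, Select and Take only build up the query.
var query = numbers
    .Where(n => n % 2 == 0)
    .Select(n => n * n)
    .Take(3);

// Only this enumeration triggers the evaluation, element by element,
// without ever materializing an intermediate collection.
foreach (var n in query)
{
    Console.WriteLine(n); // 4, 16, 36
}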

But in our case it’s not what we want. The stopped instances are not the same as the started instances.

How can we fix it?

By appending a .ToList() call after the .Select() call:

var modules = names.Select(
        name => new Module(name)).ToList();

Now the IEnumerable gets evaluated and collected into a list before the assignment to the modules variable.

So be aware of whether your programming language or framework uses lazy or eager evaluation for functional collection operations to avoid running into subtle bugs. Other examples of tools based on the concept of lazy evaluation are the Java stream API and the Haskell programming language. Some languages support both, for example Ruby since version 2.0:

range.collect { |x| x*x }      # eager: evaluates immediately and returns an array
range.lazy.collect { |x| x*x } # lazy: returns a lazy enumerator, evaluated on demand

Drawing Graphs with Circular Vertices

Graph drawing is a common algorithmic problem applicable to many fields besides software; common examples include the analysis of social networks or the visualization of biological graphs. In this article, I describe a simple extension to a standard force-directed algorithm that allows drawing graphs with circular vertices.

Force-directed algorithms or spring embedders place vertices by assigning forces according to the edges connecting the vertices. These algorithms are intuitive, able to yield solutions of high quality in a reasonable amount of time and can be applied to most kinds of graphs. For conciseness, most fundamentals are omitted; a basic knowledge of this kind of algorithm is presumed.

Original algorithm

We employ a standard force-directed algorithm as a basis, that is, the algorithm of Fruchterman and Reingold. It assumes vertices to be point-shaped and defines two forces for influencing vertices: an attractive force f_attr that pulls connected vertices towards each other and a repulsive force f_rep that disperses the vertices by repelling them from each other. The absolute value of the forces can be computed as follows:

  • f_attr(u, v) = distance(u, v)² / k
  • f_rep(u, v) = k² / distance(u, v)

The directions of the forces are determined from the positions of the vertices, given as two-dimensional vectors; for two vertices, the directions of repulsion and attraction are opposite. The complete force affecting a vertex v is computed by adding up the repulsive forces for all other vertices and the attractive forces for all connected vertices. As shown in the following figure, k describes the distance between two connected vertices at which the attractive and repulsive forces are in equilibrium.

[Figure: the original force functions and their equilibrium at distance k]

The factor k is a constant and usually chosen according to the area of the drawing. If the distance between two vertices shrinks towards zero, the repulsive force grows infinitely. Similarly, for two connected vertices the attractive force grows with the distance between them. More information about the original algorithm can be found in the paper “Graph Drawing by Force-directed Placement”.

This approach works fine for point-shaped vertices, however, it cannot deal with two-dimensional vertices: It cannot ensure that vertices do not overlap, but only prevents their centers from touching. Next, I will present a slight modification of the algorithm in order to enable the handling of circular vertices.

Modified algorithm

Our goal is to ensure a minimum distance between all vertices so that their borders can neither touch nor overlap. Additionally, in contrast to the original algorithm, where the distance between vertices depends on a constant, the distance between vertices should be determined by their size, that is, smaller vertices may be placed closer to each other than larger vertices.

As illustrated in the next figure, one way to meet the first requirement is to adjust the force functions. If the repulsive force grows infinitely as the distance shrinks towards a certain value, this value poses a lower bound for the distance between two vertices.

[Figure: the redefined force functions with a lower bound on the distance]

Furthermore, in order to fulfill the second requirement, the minimum and preferred distance functions are parameterized with the radii of the vertices as follows:

  • d_min(u, v) = r_u + r_v + b_min · (c_min · min(r_u, r_v) + c_max · max(r_u, r_v))
  • d_pref(u, v) = r_u + r_v + b_pref · (c_min · min(r_u, r_v) + c_max · max(r_u, r_v))

The addition of the radii r_u and r_v ensures that vertices placed at the minimum distance never overlap. The constant b_x controls the size of the buffer between two vertices relative to their radii, as shown in the figure below. Finally, the constants c_min and c_max weigh the influence of the radii; this allows us to place a small vertex closer to a large vertex than another large vertex. The factors b_x, c_min and c_max should be positive, and c_min and c_max should add up to one.
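To make the distance functions concrete, here is a small C# sketch that follows the formulas above; the constant values are arbitrary placeholders, not values from our implementation:

using System;

static class VertexDistances
{
    // Example values only: b_min, b_pref, c_min and c_max are tuning constants,
    // with c_min + c_max = 1.
    private const double BufferMin = 0.2;   // b_min
    private const double BufferPref = 0.6;  // b_pref
    private const double WeightMin = 0.3;   // c_min
    private const double WeightMax = 0.7;   // c_max

    private static double WeightedRadii(double ru, double rv)
    {
        return WeightMin * Math.Min(ru, rv) + WeightMax * Math.Max(ru, rv);
    }

    // d_min(u, v): the lower bound below which the repulsive force grows infinitely
    public static double Minimum(double ru, double rv)
    {
        return ru + rv + BufferMin * WeightedRadii(ru, rv);
    }

    // d_pref(u, v): the preferred distance between the two vertices
    public static double Preferred(double ru, double rv)
    {
        return ru + rv + BufferPref * WeightedRadii(ru, rv);
    }
}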

[Figure: the preferred distance and buffer between two circular vertices]

On this basis the force functions can be redefined. Let d_actual(u, v) be the actual distance between two vertices u and v, and let d_norm(u, v) be that distance normalized by the minimum distance:

  • d_actual(u, v) = |pos(v) - pos(u)|
  • d_norm(u, v) = d_actual(u, v) - d_min(u, v)

This results in the following force functions, which comply with the criteria specified before:

[Equations: the redefined repulsive force f_rep and attractive force f_attr]

Finally, this leads us to the pseudocode of the complete algorithm for laying out a graph G = (V, E):

procedure Layout(G = (V, E))
    initialize temperature t
    for i = 1 to n do
        for each v in V do
            disp(v) = 0
            for each u in V do
                if u ≠ v then
                    Δ = pos(v) - pos(u)
                    disp(v) = disp(v) + Δ/|Δ| · f_rep(u, v)
        for each (u, v) in E do
            Δ = pos(v) - pos(u)
            disp(v) = disp(v) - Δ/|Δ| · f_attr(u, v)
            disp(u) = disp(u) + Δ/|Δ| · f_attr(u, v)
        for each v in V do
            pos(v) = pos(v) + disp(v)/|disp(v)| · min(|disp(v)|, t)
        t = cool(t)

Applications

The drawings produced by this algorithm are not perfect; however, it is possible to employ it in a more complex context. For example, we used it to visualize the structure of software; an example can be seen in the figure below. The techniques behind this other layout algorithm are described in detail here.

[Figure: visualization of the structure of a software project]