This is the beginning of a long tale about an examined software project. It is too long to tell in one blog post, so I cut it in three parts. The first part will describe the initial situation and high-level observations. The second part will dive deep into the actual source code and reveal some insights from there. The third part wraps everything together and gives some hints on how to avoid being examined with such a negative result.
We made contact with a software product, lets call it “the application”, that was open for adoption. The original author wanted to get rid of it, yet it was a profitable asset. Some circumstances in this tale are altered to conceal and protect the affected parties, but everything else is real, especially on the technical level.
You can imagine the application as being the coded equivalent to a decommissioned aircraft carrier (coincidentally, the british Royal Navy tries to sell their HMS Invincible right now). It’s still impressive and has its price, but it will take effort and time to turn it around. This tale tells you about our journey to estimate the value that is buried in the coded equivalent of old rusted steel, hence the name “scrap metal code” (and this entry’s picture).
Some basic facts about the application: The software product is used by many customers that need it on a daily basis. It is developed in plain Java for at least three years by a single developer. The whole project is partitioned in 6 subprojects with references to each other. There are about 650 classes with a total of 4.5k methods, consisting of 85k lines of code. There are only a dozen third-party dependencies to mostly internal libraries. Each project has an ant build script to create a deployable artifact without IDE interference. On this level, the project seems rather nice and innocent. You’ll soon discover that this isn’t the truth.
Read the last paragraph again and look out for anything that might alert you about the fives major failures that I’m about to describe. In fact, the whole paragraph contains nothing else but a warning. We will look at five aspects of the project in detail: continuity, modularization, size, dependencies and build process. And we won’t discover much to keep us happy. The last paragraph is the upmost happiness you can get from that project.
You’ve already guessed it: Not a single test. No unit test, zero integration tests and no acceptance test other than manually clicking through the application guided by the user manual (which we only hoped would exist somewhere). No persisted developer documentation other than generated APIDoc, in which the only human-written entries were abbreviated domain specific technical terms. We could also only hope that there is a bug tracker in use or else the whole project history would be documented in a few scrambled commit messages from the SCM (one thing done right!).
The whole project was an equally distributed change risk. The next part will describe some of the inherent design flaws that prohibited changes from having only local effects. Every feature could possibly interact with every other piece of code and would probably do so if you keep trying long enough.
It’s no use ranting about something that isn’t there. Safety measures to ensure the continuity of development on the application just weren’t there. FAIL!
The six modules are mostly independent, but have references to types in other modules (mostly through normal java imports). This would not cause any trouble, if the structure of the references was hierarchical, with one module on top and other modules only referencing moduls “higher” in the hierarchy. Sadly, this isn’t the case, as there is a direct circular dependency between two modules. You can almost see the clear hierarchical approach that got busted on a single incident, ruining the overall architecture. You cannot use Eclipse’s “project dependencies” anymore, but have to manually import “external class folders” for all projects now. The developer has forsaken the clean and well supported approach for a supposable short-term achievement, when he needed class A of module X in the context of module Y and didn’t mind the extra effort to think about a refactoring of the type and package structure. What could have been some clicks in your IDE (or an automatic configuration) will now take some time to figure out where to import which external folder and what to rebuild first because of the cycle. FAIL!
The project isn’t giantic. Let’s do some math to triangulate our expectations a bit. One developer worked for three years to pour out nearly 90k lines of code (with build scripts and the other stuff included). That’s about 30k lines per year, which is an impressive output. He managed to stuff these lines in 650 classes, so the average class has a line count of 130 lines of code. Doesn’t fit on a screen, but nothing scary yet. If you distribute the code evenly over the methods, it’s 19 lines of code per method (and 7 methods per class). Well, there I get nervous: twenty lines of code in every method of the system is a whole lot of complexity. If a third of them are getter and setter methods, the line count rises to an average of 26 lines per method. I don’t want every constructor i have to use to contain thirty lines of code!
To be sure what code complexity we are talking about, we ran some analysis tools like JDepend or Crap4j. The data from Crap4j is very explicit, as it categorizes each method into “crappy” or “not crappy”, based on complexity and test coverage (not given here). We had over 14 percent crappy methods, in absolute numbers roughly 650 crappy methods. That is one crappy method per class. The default percentage gamut of Crap4j ends at 15 percent, the bar turns red (bad!) over 5 percent. So this code is right at the edge of insanity in terms of accumulated complexity. If you want to know more about this, look forward to the next parts of this series.
Using the CrapMap, we could visualize the numerical data to get an overview if the complexity is restricted to certain parts of the application. You can review the result as a picture here. Every cell represents a method, the green ones are okay while the red ones are not. The cell size represent the actual complexity of the method. As you can see, the “overly complex code syndrome” is typically for virtually all the code. Whenever a method isn’t a getter or setter (the really tiny dark green square cells), it’s mostly too complex. Additional numbers we get from the Crap4j metric are “Crap” and “Crap Load”, stating the amount of “work” necessary to tame a code base. Both values are very high given the class and method count.
All the numbers indicate that the code base is bloated, therefore constantly using the wrong abstraction level. Applying non-local changes to this code will require a lot of effort and discipline from an experienced developer. FAIL!
The project doesn’t use any advanced mechanism of dependency resolution (like maven or ant ivy). All libraries are provided alongside the source code. This isn’t the worst option, given the lack of documentation.
A quick search for “*.jar” retrieves only a dozen files in all six modules. That’s surprisingly less for a project of this size. Further investigation shows some inconvenient facts:
- Some of these libraries are published under commercial licenses. This cannot always be avoided, but it’s an issue if the project should be adopted.
- Most libraries provide no version information. At least a manifest entry or an appended version number in the filename would help a lot.
- Some libraries are included multiple times. They are present for every module on their own, just waiting to get out of sync. With one library, this has already happened. It’s now up to the actual classpath entry order on the user’s machine how this software will behave. The (admittedly non-present) unit tests would not safeguard against the real dependency, but the local version of the library, which could be newer or older.
As there is no documentation about the dependencies, we can only guess about their scope: Maybe the classes are required at compile time but optional at runtime? The best bet is to start with the full set and accept another todo entry on the technical debt list. FAIL!
But wait, for every module, there is a build script. A quick glance shows that there are in fact four build scripts for every module. All of them are very similar with minor differences like which configuration file gets included and what directory to use for a specific fileset. Nothing some build script configuration files couldn’t have handled. Now we have two dozen build scripts that all look suspiciously copy&pasted. Running one reveals the next problem: All these files contain absolute paths, as if the “works on my machine award” was still looking for a winner. When we adjusted the entries, the build went successful. The build script we had to change was a messy collection of copied code snippets (if you want to call ant’s XML dialect “code”). You could tell by the different formatting, naming and solution finding styles. But besides being horribly mangled, the build included code obfuscation and other advanced topics. Applied to the project, it guaranteed that no stacktrace from any user would ever contain useful information for anybody, including the project’s developers. FAIL!
Lets face the facts: The project behind the application fails on every aspect except delivering value to the current customers. While the latter is the most important ingredient of a successful project, it cannot be the only one and is only sustainable for a short period of time. The project suffers from the lonely superhero syndrome: one programmer knows everything (and can defend every design decision, even the ridiculous ones) and has no incentive to persist this knowledge. And the project will soon suffer from the truck factor: The superhero programmer will not be available anymore soon.
There are a lot of take-away lessons from this project, but I have to delay them until part three. In the next part, we’ll discover the inner mechanics and flaws of the code base.