Monitoring long running operations in Oracle databases

We regularly work with database tables with hundreds of millions of entries. Some operations on these tables can take a while. These are not necessarily queries, but operations that prepare the data so that queries run fast, for example the creation of materialized views or indexes.

The problem with most SQL tools is: once you run your SQL statement you have no indication of how long it will take to complete the operation. No progress bar and no display of the remaining time. Will it take minutes or hours?

Oracle databases have a nice feature I learned about recently that can answer these questions. Operations that take longer than 6 seconds to complete are considered “long operations” and get an entry in a special view called V$SESSION_LONGOPS.

This view contains not only the currently running long operations but also the history of completed ones. You can query the status of the current long operations like this:

SELECT * FROM V$SESSION_LONGOPS 
  WHERE time_remaining > 0;

This view contains columns like

  • TARGET (table or view on which the operation is carried out)
  • SOFAR (units of work done so far)
  • TOTALWORK (total units of work)
  • ELAPSED_SECONDS (number of elapsed seconds from the start of the operation)

Based on these values the view offers another column, which contains the estimated remaining time in seconds: TIME_REMAINING.

This remaining time is really just an estimate, because it assumes long running operations to be linear, which is not necessarily true. Also some SQL statements can spawn multiple consecutive operations, e.g. first a “Table Scan” operation and then a “Sort Output” operation, which will only become visible after the first operation has finished. Nevertheless I found this feature quite helpful to get a rough idea of how long I will have to wait or to inform decisions such as whether I really want to perform an operation until completion or if I want to cancel it.
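
To make the linearity assumption concrete, here is a minimal sketch of such an extrapolation in Python. It only mirrors the idea described above using the SOFAR, TOTALWORK and ELAPSED_SECONDS values; the function names are made up for illustration, and it is not claimed to be the exact formula Oracle uses internally.

```python
def estimate_remaining_seconds(sofar, totalwork, elapsed_seconds):
    """Linearly extrapolate the remaining time from the work done so far.

    Assumes every unit of work takes the same time, which is exactly
    the simplification that makes the result only a rough estimate.
    """
    if sofar <= 0:
        return None  # no progress yet, nothing to extrapolate from
    remaining_units = max(totalwork - sofar, 0)
    return elapsed_seconds * remaining_units / sofar


def percent_done(sofar, totalwork):
    """Progress in percent, or None if the total is unknown."""
    if totalwork <= 0:
        return None
    return 100.0 * sofar / totalwork
```

With 25 of 100 units done after 10 seconds, the sketch yields 30 seconds of remaining time. If the second half of the work is slower than the first, the real wait will be longer.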

Personas: The great misunderstanding

Reminder: What are personas?

Personas were first described by Alan Cooper in his groundbreaking book “The Inmates Are Running the Asylum”:

Our most effective tool is profoundly simple: Develop a precise description of our user and what he wishes to accomplish.

He goes on to define personas as “hypothetical archetypes of actual users” and states that personas “are defined by their goals”.
One of the key points here is that personas are never made up but are grounded in research. They are used to provide condensed information about the result of the user research. Another take away is that a persona description should include its goals.

The misunderstanding

In recent times some designers dumped personas because they are 1) imaginary and 2) defined by attributes that leave out causality. The problem here is that personas are often seen as a collection of mere demographic data (like age, job, income, …). But this only describes marketing personas not the personas imagined by Alan Cooper. As seen in his books the data of a persona is never made up but inferred from user research. Also demographics play only a minor role in creating personas, citing Mr. Cooper again:

Personas are segmented along ranges of user behaviour, not demographics or buying behaviour.

So the behaviour of our users defines the persona not any demographic trait.

The causality mentioned in the criticism misses a vital part of a persona: the scenario. Personas go hand-in-hand with scenarios (by Alan Cooper, About Face):

Persona-based scenarios are concise narrative descriptions of one or more personas using a product or service to achieve specific goals.

and

Scenario content and context are derived from information gathered during the Research phase and analyzed during the Modeling phase

So with these scenarios personas describe the context and the goals and behaviours of our users.

As we see with the criticism, the context, goals and motivations of our users are important. Personas and scenarios should not be made up but condensed from research. They are used to say ‘no’ to decisions in the process of designing. A word of warning: do not abstract your persona too far away from your users. One goal of personas is to build empathy. If your personas are too artificial, your empathy will suffer. Also I like how Jeff Patton uses research findings: for him they are like vacation photos; if you’ve been there, they are a reminder of what happened.

Consensus

The criticism largely comes from designers favoring the jobs-to-be-done (JTBD) methodology. Jobs-to-be-done is a framework to analyse and describe why a user hires a product or service to get something done. It provides a very useful perspective on the context and behaviour of users. Both approaches (personas and jobs) can be combined. Where personas provide a human connection, jobs provide a contextual one. Shahrzad Samadzadeh provides a sketch of how both can be combined with the help of a journey map. All three methods help to balance each other: the personas help to avoid making the jobs too analytic, the jobs help to ground and limit the personas in research valuable to the problem at hand, and journeys can bring it all together.

Packaging Python projects for Debian/Ubuntu

Deployment of software using built-in software management tools is very convenient and provides a nice user experience (UX) for the users. For debian-based linux distributions like Ubuntu packaging software in .deb-packages is the way to go. So how can we prepare our python projects for packaging as a deb-package? The good news is that python is supported out-of-the-box in the debian package build system.

Alternatively, you can use the distutils-extension stdeb if you do not need complete flexibility in creating the packages.

Basic python deb-package

If you are using setuptools/distutils for your python project, debian packaging consists of editing the package metadata and adding --with python to the rules file. For a nice headstart we can generate templates of the debian metadata files using two simple commands (the dh-make package is needed for dh_make):

# create a tarball with the current project sources
python setup.py sdist
# generate the debian package metadata files 
dh_make -p ${project_name}_${version} -f dist/${project_name}-${version}.tar.gz 

You have to edit at least the control-file, the changelog and the rules-file to build the python package. In the rules-file the make-target % is the crucial point and should include the flag to build a python project:

# main packaging script based on dh7 syntax
%:
	dh $@ --with python

After that you can build the package by issuing dpkg-buildpackage.

The caveats

The debian packaging system is great at complaining about non-conformant aspects of your package. It demands digital signatures, correct file and directory names including version strings etc. Unfortunately it is not very helpful when you make packaging mistakes resulting in empty, incomplete or broken packages.

Issues with setup.py

The setup.py build script has to reside on the same level as the debian-directory containing the package metadata. The packaging tools will not tell you if they could not find the setup script. In addition it will always run setup.py using python 2, even if you specified --with python3 in the rules-file.
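
Since the packaging tools stay silent about a misplaced setup script, a small pre-flight check can catch the problem before the build. The following is only a sketch: the function name is made up, and the list of checked files (control, changelog, rules) is taken from the files mentioned above rather than from any official requirement list.

```python
from pathlib import Path


def find_layout_problems(project_root):
    """Return a list of human-readable problems; empty means the layout looks fine."""
    root = Path(project_root)
    problems = []

    # setup.py has to sit on the same level as the debian directory
    if not (root / "setup.py").is_file():
        problems.append("setup.py is missing next to the debian directory")

    debian_dir = root / "debian"
    if not debian_dir.is_dir():
        problems.append("debian directory is missing")
    else:
        # the metadata files this article edits by hand
        for name in ("control", "changelog", "rules"):
            if not (debian_dir / name).is_file():
                problems.append(f"debian/{name} is missing")
    return problems
```

Running it on the project root before dpkg-buildpackage and aborting on a non-empty result gives you the error message the tools do not.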

Packaging for specific python versions

If you want better control over the target python versions for the package you should use Pybuild. You can do this by a little change to the rules-file, e.g. a python3-only build using Pybuild:

# main packaging script based on dh7 syntax
%:
	dh $@ --with python3 --buildsystem=pybuild

For pybuild to work it is crucial to add the needed python interpreter(s) besides the mandatory build dependency dh-python to the Build-Depends of the control-file, for python3-only it could look like this:

Build-Depends: debhelper (>=9), dh-python, python3-all
...
Depends: ${python3:Depends}

Without the dh-python build dependency pybuild will silently do nothing. Getting the build dependencies wrong will create incomplete or broken packages. Take extra care of getting this right!
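
Because a missing build dependency fails silently, it can be worth checking the control-file before building. Below is a rough sketch of such a check. The function names are hypothetical, the default list of required packages is an assumption that matches the python3-only example above, and the parsing is deliberately simple (it does not handle alternatives written with "|").

```python
import re


def build_depends(control_text):
    """Return the package names listed in the Build-Depends field."""
    names = []
    in_field = False
    for line in control_text.splitlines():
        if line.lower().startswith("build-depends:"):
            in_field = True
            value = line.split(":", 1)[1]
        elif in_field and line[:1] in (" ", "\t"):
            value = line  # continuation line of the same field
        else:
            in_field = False
            continue
        for entry in value.split(","):
            # drop version constraints such as "(>=9)" and arch qualifiers
            name = re.sub(r"[\(\[<].*", "", entry).strip()
            if name:
                names.append(name)
    return names


def missing_pybuild_depends(control_text, required=("dh-python", "python3-all")):
    """List the required build dependencies that the control text lacks."""
    present = set(build_depends(control_text))
    return [name for name in required if name not in present]
```

An empty result means the dependencies named in this article are present; it does not prove the package will build correctly.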

Conclusion

Debian packaging looks quite intimidating at first because there are so many ways to build a package. Many different tools can ease package creation but also add confusion. Packaging python software is done easily if you know the quirks. The python examples from the Guide for Debian Maintainers are certainly worth a look!

The personal economics of programming languages

Recently, one of my students asked a good question about what programming languages I would recommend learning. His ideal language would be “syntactically ugly, but giving insights that are universal to programming”. My first reaction was to answer that he had just described Perl, but that was too easy an answer. So I tried to define the basics about programming languages, starting with the personal economics.

Economics of programming languages

An organization that wants to produce a piece of software needs to answer a lot of questions like “what programming language will be best suited for the task?”. Often, these questions get diluted and rather sound like “what programming language should we stipulate for all our projects, now and forever?”. That’s when politics and economics overlap and intermingle. We can leave this problem for the organizations to solve themselves. But if we scale the question down to an individual programmer – you, what influences are there to find an answer to the question “what programming languages should I learn?”.

I try to answer with the concept of utility: Learn those languages that, over a reasonable time, yield the most “utility”. There are at least two types of utility in our profession: money and joy. You can learn a programming language because your job requires it (money) or because you are curious and/or dig its particularities (joy). Most of the time, a specific programming language contains a mixture of both utilities for you. How you rate those utilities is up to you and probably varies from situation to situation. If you start a private fun project, picking the boring mainstream language from work might get things done faster, but when would you want the fun to be over sooner?

Let me give two extreme examples for this concept:

  • If you start to learn COBOL now, chances are high that you will achieve two things: You will be disgusted by the language and the existing codebase, but delighted by the salary and job security. COBOL is a high money-utility programming language. It ranks low in any survey or statistics about programming languages, but is widely used in big business today and tomorrow. You might refer to https://blog.hackerrank.com/the-inevitable-return-of-cobol/ for more information.
  • If you start to learn Esterel now, you might experience two things over time: an epiphany about how flawed our concept of time is in most programming languages and an existential crisis because your brain isn’t capable of wrapping itself around most source code. Whatever comes first will define your learning success. There are virtually no jobs that require Esterel (even if some might benefit from it) and you can only program and build so many bicycle computers in your spare time (this is a typical introduction project to Esterel). Esterel is a pure joy-utility programming language. You can claim to be proficient in synchronous programming afterwards, but nobody will know what that even is.

A third type of utility

But I think that there might be a third type of utility for personal learning choices based on economics: The stirrup iron utility. Knowledge of some programming languages isn’t useful from a money-driven viewpoint and may lack enjoyability, but it serves as a door-opener to more enjoyable or sellable languages. It serves as an interim utility because it doesn’t have value in itself, but serves as a multiplier for either the money or joy utility. To rate the value of this utility to your career, you need to be clear about your career goals, especially your anticipated skill portfolio.

Skill portfolio shapes

Modern recruitment differentiates between several skill portfolio shapes, most notably the “I” and “T” shape:

  • Programmers with “I”-shaped skill portfolios are experts in one specific field of programming. They might, for example, be the best C# programmer you’ve ever met. But they flop around like a fish out of water once they need to use another programming language. They will choose their familiar tools for every problem that needs to be solved and will solve it fast if possible or
  • Programmers with “T”-shaped skill portfolios have knowledge across all fields of programming, albeit limited, and drilled down into one field specifically. Why they chose to master their field can mostly be explained with the money or joy utility. They probably gained their broad knowledge base by using stirrup irons.

If you happen to know what’s expected from you until your retirement (let’s say you chose to program in COBOL), the “I”-shape is a viable and efficient strategy to manage your skill portfolio. There is nothing wrong with this approach (as long as it works).

If you have a hunch that you don’t have the capability to invest in broad knowledge, the “I”-shaped skill portfolio is your logical choice. It takes a lot to be able to come to a self-assessment that shows your limitations. It’s a good thing to know your limits and build a career within them. A lot of programmers don’t know their limits and burn out, because not meeting the requirements produces a lot of stress (on both sides). Better be yourself than over-promise and under-deliver constantly.

The “T”-shape means that you need to invest your time wisely. And we are not talking “work time” only, but “life time”, because you’ll probably need to spend your spare time working on your portfolio, too. Becoming a “jack of all trades” programmer is an endeavour of at least ten years without any possibility to shortcut. You need to select your jobs in accordance with your learning strategy and always be receptive to opportunities. You need to improve your learning abilities. You need to do so much at once that I suggest you start by watching Cory House’s talk about “Becoming an Outlier”. He’s spot on with so many things.

Stirrup iron programming languages

There are some programming languages that can be seen as the archetypes of a whole class of languages. Most knowledge of these archetypes can be directly applied or transferred to each language in the class. It’s the language’s concepts that are the real benefit. If you understand the synchronous programming aspect in Esterel, you’ll recognize it straight away in languages like LabView or SIGNAL. It may even just be a part of the other language (like in many multi-paradigm programming languages), but it will be familiar to you.

So what are some stirrup iron languages?

That’s a tough question and I want to place it out there. Can you drop a comment and name the programming language that had the most peculiar influence on your knowledge? I would like to refer to the book Seven Languages In Seven Weeks from the Pragmatic Bookshelf. It covers Ruby, Io, Prolog, Scala, Erlang, Clojure and Haskell. Do you agree with that selection? I would like to hear from you.

There are some ideas about this topic already: The talk “The Future of Programming” from Bret Victor (if you don’t know this guy already, please watch his legendary “Inventing on Principle” too). Richard Astbury presents three “new” hot programming languages (with matching outfits) in his talk “The State of the Art”. And Robert C. Martin is sure to have found “The Last Programming Language”.

One thing is sure: We should train the next generations of programmers in those stirrup iron languages, so they can quickly grasp the language flavour of the year. This is mostly done already, of course, but the students inevitably complain about the “weird” choices. So we need to explain upfront the economics of programming languages.

And, in a lighter tone at the end, there is always the ongoing competition for the worst programming language ever.

My C++ Tool Belt

I suspect that every developer has a “tool belt” that he or she uses to be productive. By that I mean a collection of tools, libraries and whatever else helps. With a few exceptions, these tool belts will probably be language specific, or at least platform specific. As my projects updated their compilers and transitioned to C++11 and beyond, my C++ tool belt changed quite a bit. Since things like threading, smart pointers and functional abstractions were added to the standard library, those are now already included by default. Today I wanna write about what is in my modernized C++11 tool belt.

The Standard Library

Ever since the tr1 extensions, the standard library has progressed into becoming truly powerful and exceptional. The smart pointers, containers, algorithms are much more language extensions than “just” a library, and they play perfectly with actual language features, such as lambdas, auto and initializer lists.

fmtlib

fmtlib provides placeholder-based text formatting a la Python’s str.format. There have been a few implementations of this idea over the years, but this is the first where I think that it might just dethrone operator<< overloading for good. It's fast, stable, portable and has a nice API.
I begin to miss this library the moment I need to work on a project that does not have it.
The next best thing is Qt’s QString::arg mechanism, with slightly inferior API, a less inclusive license, and a much bigger dependency.
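
For readers who have not seen the Python side of that comparison, here is a small sketch of the placeholder style in question. It shows plain Python only; fmtlib's fmt::format uses a closely related "{}"-based syntax, but the details of the C++ API are not reproduced here.

```python
# Positional placeholders are filled in order.
message = "{} has {} items".format("cart", 3)

# Indexed placeholders can reorder or repeat arguments.
swapped = "{1} before {0}".format("a", "b")

# A format spec after the colon controls presentation, here two decimals.
price = "{:.2f} EUR".format(3.14159)

print(message)
print(swapped)
print(price)
```

The appeal over chained operator<< calls is that the whole output shape is visible in one string, with the values listed separately.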

spdlog

Logging is a powerful tool, both for software development and maintenance. Chances are you are going to need it at one point. spdlog is my favorite choice for this task. It uses fmtlib internally, which is just another plus point. It’s simple, fast and very nice to use due to reuse of fmtlib’s formatting. I usually just include this in my projects and get the included fmtlib for free.

optional

This one is actually part of the most recent C++17, but since that is not widely available yet (meaning not many projects have adopted it), I’m going to list it explicitly. There are also a few alternative implementations, such as the one in Boost or akrzemi1’s single-header variant.
Unlike many other programming languages, C++ has a relatively high emphasis on value types. While reference types usually have a built-in “not available” state (a.k.a. nullptr, NULL, Nothing or nil), an optional can transport intent much more clearly. For value types, however, it’s absolutely mandatory to have an optional type. Otherwise, you just end up wrapping the value in a pointer just to make it optional.
Do not, however, fall into the trap of using optional for error handling. It’s not made for that, and other abstractions, such as expected, are much better for that.

CMake

There is really only one choice when it comes to build tools, and that’s CMake. It’s got its own bunch of weaknesses, but the goods far outweigh the bads. With the target_ functions, it’s actually quite nice and scales really well to bigger projects. The main downside here is that it still does not play nice with some tools, most notably Visual Studio. CLion and QtCreator fare much better. Then again, CMake enables the use of other tools easily, such as clang-tidy.

A word on Boost

Boost is no longer the must-have it once was. Much of the mandated functionality has already been incorporated into the standard library. It is no longer a requirement for a sane C++ project. On the contrary, Boost is notoriously huge and somewhat cumbersome to integrate. Boost is not a library, it is a collection of libraries, therefore you can still decide whether to use Boost on a library by library basis. However, much of that is viral, and using a small part of Boost will easily drag in a few hundred other Boost headers. The libraries I tend to include most often are Boost.Utility (for boost::noncopyable) and Boost.Filesystem. The former is obviously easy to do without Boost, especially with = delete; and the latter is a part of the standard library since C++17. I hope to be doing the majority of my projects without it in the future. Boost was a catalyst for most of the C++ progress in recent years. It slowly becoming obsolete, either by being integrated into the standard or its idioms no longer being needed, is just a sign of its own success.

My honorable mentions are Qt and the stb single file libraries. What are your go-to tools?

Analyzing iOS crash dumps with Xcode

The best way to analyze a crash in an iOS app is to reproduce it directly in the iOS simulator in debug mode or on a local device connected to Xcode. Sometimes you have to analyze a crash that happened on a device that you do not have direct access to. Maybe the crash was discovered by a tester who is located in a remote place. In this case the tester must transfer the crash information to the developer and the developer has to import it in Xcode. The iOS and Xcode functionalities for this workflow are a bit hidden, so the following step-by-step guide may help.

Finding the crash dumps

iOS stores crash dumps for every crash that occurred. You can find them in the Settings app in the deeply nested menu hierarchy under Privacy -> Analytics -> Analytics Data.

There you can select the crash dump. If you tap on a crash dump you can see its contents in a JSON format. You can select this text and send it to the developer. Unfortunately there is no “Select all” option, you have to select it manually. It can be quite long because it contains the stack traces of all the threads of the app.

Importing the crash dump in Xcode

To import the crash dump in Xcode you must save it first in a file with the file name extension “.crash”. Then you open the Devices dialog in Xcode via the Window menu.

To import the crash dump you must have at least one device connected to your Mac, otherwise you will find that you can’t proceed to the next step. It can be any iOS device. Select the device to open the device information panel.

Here you find the “View Device Logs” button, which opens the Device Logs dialog.

To import the crash dump into this dialog select the “All Logs” tab and drag & drop the “.crash” file into the panel on the left in the dialog.

Initially the stack traces in the crash dump only contain memory addresses as hexadecimal numbers. To resolve these addresses to human readable symbols of the code you have to “re-symbolicate” the log. This functionality is hidden in the context menu of the crash dump.

Now you’re good to go and you should finally be able to find the cause of the crash.

About API astonishments

Nowadays we developers tend to stand on the shoulders of giants: We put powerful building-blocks from different libraries together to build something worth man-years in hours. Or we fill-in the missing pieces in a framework infrastructure to create a complete application in just a few days.

While it is great to have such tools in the form of application programmer interfaces (API) at your disposal it is hard to build high quality APIs. There are many examples for widely used APIs, good and bad. What does “bad API” mean? It depends on your view point:

Bad API for the API user

For the application programmer a bad API means things like:

  • Simple tasks/use cases are complicated
  • Complex tasks are impossible or require patching
  • Easy to misuse producing bugs

A very simple real life example of such an API is a C++ camera API I had to use in a project. Our users were able to change the area of interest (AOI) of the picture to produce images consisting of only a part of full resolution images. Our application crashed or did not work as expected for no obvious reason. It took many hours of debugging to spot the subtle API misuse that could be verified by reading the documentation:

The value of camera.Width.GetMax() changed instead of being constant! The reason is that AOI was meant and not the sensor resolution width. The full resolution width we actually wanted is obtained by calling camera.WidthMax.GetValue(). This kind of naming makes the properties almost indistinguishable and communicates nothing of the implications. Terms like AOI or sensor width or full resolution just do not appear in this part of the API.

Small things like the example above may really hurt productivity and user experience of an API.
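
To illustrate how naming alone could have prevented the confusion, here is a small hypothetical model in Python. It is not the real camera API: the class, the property names and the rule that the maximum AOI width shrinks with the horizontal offset are all assumptions made for this sketch.

```python
class Camera:
    """Hypothetical camera model; names are illustrative, not the real API."""

    def __init__(self, sensor_width):
        self._sensor_width = sensor_width
        self._aoi_offset_x = 0

    @property
    def sensor_width(self):
        """Full resolution width of the sensor; never changes."""
        return self._sensor_width

    @property
    def max_aoi_width(self):
        """Largest AOI width that still fits at the current offset."""
        return self._sensor_width - self._aoi_offset_x

    def set_aoi_offset_x(self, offset):
        """Move the left edge of the AOI; this changes max_aoi_width."""
        if not 0 <= offset < self._sensor_width:
            raise ValueError("offset outside of sensor")
        self._aoi_offset_x = offset
```

With names like these, a caller who wants the constant full resolution reaches for sensor_width, and the fact that max_aoi_width depends on the current AOI is at least hinted at by its name.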

Bad API for the API programmer

API programmers can easily produce APIs that are bad for themselves because they take away too much freedom, resulting in:

  • Frequent breaking changes
  • API rewrites
  • Unimplementable features
  • Confusing, not fitting interfaces

Design your interfaces small and focused. Use types in the interface that leave as much freedom as possible without hurting usability (see Iterable vs. Collection vs. List vs. ArrayList for example). Try to build composable and extendable types because adding types or methods is less of a problem than changing them.
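
The type advice carries over to other languages as well. As a rough sketch in Python (the function names are made up for illustration), compare a parameter that demands a concrete list with one that only asks for something iterable:

```python
from typing import Iterable, List


def mean_strict(values: List[float]) -> float:
    """Demands a list because it calls len(), although a single pass would do."""
    return sum(values) / len(values)


def mean_flexible(values: Iterable[float]) -> float:
    """Accepts anything iterable: lists, tuples, sets, generators."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    if count == 0:
        raise ValueError("mean of an empty iterable is undefined")
    return total / count
```

The flexible variant promises less about its input, so callers can pass a generator and the implementation can later change without breaking anyone who relied on list-specific behaviour.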

Conclusion

Developers should put extra care into interfaces they want to publish for others to use. Once the API is out there, breaking it means angry users. Be aware that good API design is hard and necessary for a painless evolution of an API. Consider reading books like “Practical API Design” or “Build APIs You Won’t Hate” if you want to target a wider audience.