Evolution of programming languages

Programming languages evolve over time. They get new language features and their standard library is extended. Sounds great, doesn’t it? We all know not going forward means your go backward.

But I observe very different approaches looking at several programming ecosystems we are using.

Featuritis

Java and especially C# added more and more “me too” features release after release making relatively lean languages quite complex multi-paradigm languages. They started object oriented and added generics, functional programming features and declarative programming (LINQ in C#) and different UI toolkits (AWT, Swing, JavaFx in Java; Winforms, WPF in C#) to the mix.

Often the new language features add their own set of quirks because they are an afterthought and not carefully enough designed.

For me, this lack of focus makes said language less attractive than more current approaches like Kotlin or Go.

In addition, deprecation often has no effect (see Java) where 20 year old code and style still works which increases the burden further . While it is great from a business perspektive in that your effort to maintain compatibility is low it does not help your code base. Different styles and old ways of doing something tend to remain forever.

Revolution

In Grails (I know, it is not a programming language, but I has its own ecosystem) we see more of a revolution. The core concept as a full stack framework stays the same but significant components are changed quite rapidly. We have seen many changes in technology like jetty to tomcat, ivy to maven, selenium-rc to geb, gant to gradle and the list goes on.

This causes many, sometimes subtle, changes in behaviour that are a real pain when maintaining larger applications over many years.

Framework updates are often a time-consuming hassle but if you can afford it your code base benefits and will eventually become cleaner.

Clean(er) evolution

I really like the evolution in C++. It was relatively slow – many will argue too slow – in the past but it has picked up pace in the last few years. The goal is clearly stated and only features that support it make it in:

  • Make C++ a better language for systems programming and library building
  • Make C++ easier to teach and learn
  • Zero-Cost abstractions
  • better Tool-support

If you cannot make it zero-cost your chances are slim to get your feature in…

C at its core did not change much at all and remained focused on its merits. The upgrades mostly contained convenience features like line comments, additional data type definitions and multithreading.

Honest evolution – breaking backwards compatibility

In Python we have something I would call “honest evolution”. Python 3 introduced some breaking changes with the goal of a cleaner and more consistent language. Python 2 and 3 are incompatible so the distinction in the version number is fair. I like this approach of moving forward as it clearly communicates what is happening and gets rid of the sins in the past.

The downside is that many systems still come with both, a Python 2 and a Python 3 interpreter and accompanying libraries. Fortunately there are some options and tools for your code to mitigate some of the incompatibilities, like the __future__ module and python-six.

At some point in the future (expected in 2020) there will only support for Python 3. So start making the switch.

Advertisements

Do most language make false promises?

Some years ago I stumbled over this interesting article about C being the most effective of programming language and one making the least false promises. Essentially Damien Katz argues that the simplicity of C and its flaws lead to simple, fast and easy to reason about code.

C is the total package. It is the only language that’s highly productive, extremely fast, has great tooling everywhere, a large community, a highly professional culture, and is truly honest about its tradeoffs.

-Damien Katz about the C Programming language

I am Java developer most of the time but I also have reasonable experience in C, C++, C#, Groovy and Python and some other languages to a lesser extent. Damien’s article really made me think for quite some time about the languages I have been using. I think he is right in many aspects and has really good points about the tools and communities around the languages.

After quite some thought I do not completely agree with him.

My take on C

At a time I really liked the simplicity of C. I wrote gtk2hack in my spare time as an exercise and definitely see interoperability and a quick “build, run, debug”-cycle as big wins for C. On the other hand I think while it has a place in hardware and systems programming many other applications have completely different requirements.

  • A standardized ABI means nothing to me if I am writing a service with a REST/JSON interface or a standalone GUI application.
  • Portability means nothing to me if the target system(s) are well defined and/or covered by the runtime of choice.
  • Startup times mean nothing to me if the system is only started once every few months and development is still fast because of hot-code replacement or other means.
  • etc.

But I am really missing more powerful abstractions and better error handling or ressource management features. Data structures and memory management are a lot more painful than in other languages. And this is not (only) about garbage collection!

Especially C++ is making big steps in the right direction in the last few years. Each new standard release provides additional features making code more readable and less error prone. With zero cost abstractions at the core of language evolution and the secondary aim of ease of use I really like what will come to C++ in the future. And it has a very professional community, too.

Aims for the C++11 effort:

  • Make C++ a better language for systems programming and library building
  • Make C++ easier to teach and learn

-Bjarne Stroustup, A Tour of C++

What we can learn from C

Instead of looking down at C and pointing at its flaws we should look at its strengths and our own weaknesses/flaws. All languages and environments I have used to date have their own set of annoyances and gotchas.

Java people should try building simple things and having a keen eye on dependencies especially because the eco system is so rich and crowded. Also take care of ressource management – the garbage collector is only half the deal.

Scala and C++ people should take a look at ABI stability and interoperability in general. Their compile times and “build, run, debug”-cycle has much room for improvement to say the least.

C# may look at simplicity instead of wildly adding new features creating a language without opinion. A plethora of ways implementing the same stuff. Either you ban features or you have to know them all to understand code in a larger project.

Conclusion

My personal answer to the title of this blog: Yes, they make false promises. But they have a lot to offer, too.

So do not settle with the status quo of your language environment or code style of choice. Try to maintain an objective perspective and be aware of the weaknesses of the tools you are using. Most platforms improve over time and sometimes you have to re-evaluate your opinion regarding some technology.

I prefer C++ to C for some time now and did not look back yet. But I also constantly try different languages, platforms and frameworks and try to maintain a balanced view. There are often good reasons to choose one over the other for a particular project.

 

Packaging Python projects for Debian/Ubuntu

Deployment of software using built-in software management tools is very convenient and provides a nice user experience (UX) for the users. For debian-based linux distributions like Ubuntu packaging software in .deb-packages is the way to go. So how can we prepare our python projects for packaging as a deb-package? The good news is that python is supported out-of-the-box in the debian package build system.

Alternatively, you can use the distutils-extension stdeb if you do not need complete flexibility in creating the packages.

Basic python deb-package

If you are using setuptools/distutils for your python project debian packaging consists of editing the package metadata and adding --with python to the rules file. For a nice headstart we can generate templates of the debian metadata files using two simple commands (the debhelper package is needed for dh_make:

# create a tarball with the current project sources
python setup.py sdist
# generate the debian package metadata files 
dh_make -p ${project_name}_${version} -f dist/${project_name}-${version}.tar.gz 

You have to edit at least the control-file, the changelog and the rules-file to build the python package. In the rules-file the make-target % is the crucial point and should include the flag to build a python project:

# main packaging script based on dh7 syntax
%:
	dh $@ --with python

After that you can build the package issueing dpkg-buildpackage.

The caveats

The debian packaging system is great in complaining about non-conformant aspects of your package. It demands digital signatures, correct file and directory names including version strings etc. Unfortunately it is not very helpful when you make packaging  mistakes resulting in empty, incomplete or broken packages.

Issues with setup.py

The setup.py build script has to reside on the same level as the debian-directory containing the package metadata. The packaging tools will not tell you if they could not find the setup script. In addition it will always run setup.py using python 2, even if you specified --with python3 in the rules-file.

Packaging for specific python versions

If you want better control over the target python versions for the package you should use Pybuild. You can do this by a little change to the rules-file, e.g. a python3-only build using Pybuild:

# main packaging script based on dh7 syntax
%:
	dh $@ --with python3 --buildsystem=pybuild

For pybuild to work it is crucial to add the needed python interpreter(s) besides the mandatory build dependency dh-python to the Build-Depends of the control-file, for python3-only it could look like this:

Build-Depends: debhelper (>=9), dh-python, python3-all
...
Depends: ${python3:Depends}

Without the dh-python build dependency pybuild will silently do nothing. Getting the build dependencies wrong will create incomplete or broken packages. Take extra care of getting this right!

Conclusion

Debian packaging looks quite intimidating at first because there are so many ways to build a package. Many different tools can ease package creation but also add confusion. Packaging python software is done easily if you know the quirks. The python examples from the Guide for Debian Maintainers are certainly worth a look!

Self-contained projects in python

An important concept for us is the notion of self-containment. For a project in development this means you find everything you need to develop and run the software directly in the one repository you check out/clone. For practical reasons we most of the time omit the IDE and the basic runtime like Java JDK or the Python interpreter. If you have these installed you are good to go in seconds.

What does this mean in general?

Usually this means putting all your dependencies either in source or object form (dll, jar etc.) directly in a directory of your project repository. This mostly rules out dependency managers like maven. Another not as obvious point is to have hardware dependencies mocked out in some way so your software runs without potentially unavailable hardware attached. The same is true for software services somewhere on the net that may be unavailable, like a payment service for example.

How to do it for Python

For Python projects this means not simply installing you dependencies using the linux package manager, system-wide pip or other dependency management tools but using a virtual environment. Virtual environments are isolated Python environments using an available, but defined Python interpreter on the system. They can be created by the tool virtualenv or since Python 3.3 the included tool venv. You can install you dependencies into this environment e.g. using pip which itself is part of the virtualenv. Preparing a virtual env for your project can be done using a simple shell script like this:

python2.7 ~/my_project/vendor/virtualenv-15.1.0/virtualenv.py ~/my_project_env
source ~/my_project_env/bin/activate
pip install ~/my_project/vendor/setuptools_scm-1.15.0.tar.gz
pip install ~/my_project/vendor/six-1.10.0.tar.gz
...

Your dependencies including virtualenv (for Python installations < 3.3) are stored into the projects source code repository. We usually call the directory vendor or similar.

As a side note working with such a virtual env even remotely work like charm in the PyCharm IDE by selecting the Python interpreter of the virtual env. It correctly shows all installed dependencies and all the IDE support for code completion and imports works as expected:

python-interpreter-settings

What you get

With such a setup you gain some advantages missing in many other approaches:

  • No problems if the target machine has no internet access. This would be problematic to classical pip/maven/etc. approaches.
  • Mostly hassle free development and deployment. No more “downloading the internet” feeling or driver/hardware installation issues for the developer. A deployment is in the most simple cases as easy as a copy/rsync.
  • Only minimal requirements to the base installation of developer, build, deployment or other target machines.
  • Perfectly reproducable builds and tests in isolation. You continuous integration (CI) machine is just another target machine.

What it costs

There are costs of this approach of course but in our experience the benefits outweigh them by a great extent. Nevertheless I want to mention some downsides:

  • Less tool support for managing the dependencies, especially if your are used to maven and friends and happen to like them. Pip can work with local archives just fine but updating is a bit of manual work.
  • Storing (binary) dependencies in your repository increases the checkout size. Nowadays disk space and local network speeds make mostly irrelevant, especially in combination with git. Shallow-clones can further mitigate the problem.
  • You may need to put in some effort for implementing mocks for your hardware or third-party software services and a mechanism for switching between simulation and the real stuff.

Conclusion

We have been using self-containment to great success in varying environments. Usually, both developers and clients are impressed by the ease of development and/or installation using this approach regardless if the project is in Java, C++, Python or something else.

Integration Tests with CherryPy and requests

CherryPy is a great way to write simple http backends, but there is a part of it that I do not like very much. While there is a documented way of setting up integration tests, it did not work well for me for a couple of reasons. Mostly, I found it hard to integrate with the rest of the test suite, which was using unittest and not py.test. Failing tests would apparently “hang” when launched from the PyCharm test explorer. It turned out the tests were getting stuck in interactive mode for failing assertions, a setting which can be turned off by an environment variable. Also, the “requests” looked kind of cumbersome. So I figured out how to do the tests with the fantastic requests library instead, which also allowed me to keep using unittest and have them run beautifully from within my test explorer.

The key is to start the CherryPy server for the tests in the background and gracefully shut it down once a test is finished. This can be done quite beautifully with the contextmanager decorator:

from contextlib import contextmanager

@contextmanager
def run_server():
    cherrypy.engine.start()
    cherrypy.engine.wait(cherrypy.engine.states.STARTED)
    yield
    cherrypy.engine.exit()
    cherrypy.engine.block()

This allows us to conviniently wrap the code that does requests to the server. The first part initiates the CherryPy start-up and then waits until that has completed. The yield is where the requests happen later. After that, we initiate a shut-down and block until that has completed.

Similar to the “official way”, let’s suppose we want to test a simple “echo” Application that simply feeds a request back at the user:

class Echo(object):
    @cherrypy.expose
    def echo(self, message):
        return message

Now we can write a test with whatever framework we want to use:

class TestEcho(unittest.TestCase):
    def test_echo(self):
        cherrypy.tree.mount(Echo())
        with run_server():
            url = "http://127.0.0.1:8080/echo"
            params = {'message': 'secret'}
            r = requests.get(url, params=params)
            self.assertEqual(r.status_code, 200)
            self.assertEqual(r.content, "secret")

Now that feels a lot nicer than the official test API!

Remote development with PyCharm

PyCharm is a fantastic tool for python development. One cool feature that I quite like is its support for remote development. We have quite a few projects that need to interact with special hardware, and that hardware is often not attached to the computer we’re developing on.
In order to test your programs, you still need to run it on that computer though, and doing this without tool support can be especially painful. You need to use a tool like scp or rsync to transmit your code to the target machine and then execute it using ssh. This all results in painfully long and error prone iterations.
Fortunately, PyCharm has tool support in its professional edition. After some setup, it allows you do develop just as you would on a local machine. Here’s a small guide on how to set it up with an ubuntu vagrant virtual machine, connecting over ssh. It work just as nicely on remote computers.

1. Create a new deployment configuration

In the Tools->Deployment->Configurations click the small + in the top left corner. Pick a name and choose the SFTP type.
add_server

In the “Connection” Tab of the newly created configuration, make sure to uncheck “Visible only for this project”. Then, setup your host and login information. The root path is usually a central location you have access to, like your home folder. You can use the “Autodetect” button to set this up.

connection
For my VM, the settings look like this.

On the “Mappings” Tab, set the deployment path for your project. This would be the specific folder of your project within the root you set on the previous page. Clicking the associated “…” button here helps, and even lets you create the target folder on the remote machine if it does not exist yet.

2. Activate the upload

Now check “Tools->Deployment->Automatic Upload”. This will do an upload when you change a file, so you still need to do the initial upload manually via “Tools->Deployment->Upload to “.

3. Create a project interpreter

Now the files are synced up, but the runtime environment is not on the remote machine. Go to the “Project Interpreter” page in File->Settings and click the little gear in the top-right corner. Select “Add Remote”.

remote_interpreter
It should have the Deployment configuration you just created already selected. Once you click ok, you’re good to go! You can run and debug your code just like on a local machine.

Have fun developing python applications remotely.

Getting Shibboleth SSO attributes securely to your application

Accounts and user data are a matter of trust. Single sign-on (SSO) can improve the user experience (UX), convenience and security especially if you are offering several web applications often used by the same user. If you do not want to force your users to big vendors offering SSO like google or facebook or do not trust them you can implement SSO for your offerings with open-source software (OSS) like shibboleth. With shibboleth it may be even feasible to join an existing federation like SWITCH, DFN or InCommon thus enabling logins for thousands of users without creating new accounts and login data.

If you are implementing you SSO with shibboleth you usually have to enable your web applications to deal with shibboleth attributes. Shibboleth attributes are information about the authenticated user provided by the SSO infrastructure, e.g. the apache web server and mod_shib in conjunction with associated identity providers (IDP). In general there are two options for access of these attributes:

  1. HTTP request headers
  2. Request environment variables (not to confuse with system environment variables!)

Using request headers should be avoided as it is less secure and prone to spoofing. Access to the request environment depends on the framework your web application is using.

Shibboleth attributes in Java Servlet-based apps

In Java Servlet-based applications like Grails or Java EE access to the shibboleth attributes is really easy as they are provided as request attributes. So simply calling request.getAttribute("AJP_eppn") will provide you the value of the eppn (“EduPrincipalPersonName”) attribute set by shibboleth if a user is authenticated and the attribute is made available. There are 2 caveats though:

  1. Request attributes are prefixed by default with AJP_ if you are using mod_proxy_ajp to connect apache with your servlet container.
  2. Shibboleth attributes are not contained in request.getAttributeNames()! You have to directly access them knowing their name.

Shibboleth attributes in WSGI-based apps

If you are using a WSGI-compatible python web framework for your application you can get the shibboleth attributes from the wsgi.environ dictionary that is part of the request. In CherryPy for example you can use the following code to obtain the eppn:

eppn = cherrypy.request.wsgi_environ['eppn']

I did not find the name of the WSGI environment dictionary clearly documented in my efforts to make shibboleth work with my CherryPy application but after that everything was a bliss.

Conclusion

Accessing shibboleth attributes in a safe manner is straightforward in web environments like Java servlets and Python WSGI applications. Nevertheless, you have to know the above aspects regarding naming and visibility or you will be puzzled by the behaviour of the shibboleth service provider.