Do most language make false promises?

Some years ago I stumbled over this interesting article about C being the most effective of programming language and one making the least false promises. Essentially Damien Katz argues that the simplicity of C and its flaws lead to simple, fast and easy to reason about code.

C is the total package. It is the only language that’s highly productive, extremely fast, has great tooling everywhere, a large community, a highly professional culture, and is truly honest about its tradeoffs.

-Damien Katz about the C Programming language

I am Java developer most of the time but I also have reasonable experience in C, C++, C#, Groovy and Python and some other languages to a lesser extent. Damien’s article really made me think for quite some time about the languages I have been using. I think he is right in many aspects and has really good points about the tools and communities around the languages.

After quite some thought I do not completely agree with him.

My take on C

At a time I really liked the simplicity of C. I wrote gtk2hack in my spare time as an exercise and definitely see interoperability and a quick “build, run, debug”-cycle as big wins for C. On the other hand I think while it has a place in hardware and systems programming many other applications have completely different requirements.

  • A standardized ABI means nothing to me if I am writing a service with a REST/JSON interface or a standalone GUI application.
  • Portability means nothing to me if the target system(s) are well defined and/or covered by the runtime of choice.
  • Startup times mean nothing to me if the system is only started once every few months and development is still fast because of hot-code replacement or other means.
  • etc.

But I am really missing more powerful abstractions and better error handling or ressource management features. Data structures and memory management are a lot more painful than in other languages. And this is not (only) about garbage collection!

Especially C++ is making big steps in the right direction in the last few years. Each new standard release provides additional features making code more readable and less error prone. With zero cost abstractions at the core of language evolution and the secondary aim of ease of use I really like what will come to C++ in the future. And it has a very professional community, too.

Aims for the C++11 effort:

  • Make C++ a better language for systems programming and library building
  • Make C++ easier to teach and learn

-Bjarne Stroustup, A Tour of C++

What we can learn from C

Instead of looking down at C and pointing at its flaws we should look at its strengths and our own weaknesses/flaws. All languages and environments I have used to date have their own set of annoyances and gotchas.

Java people should try building simple things and having a keen eye on dependencies especially because the eco system is so rich and crowded. Also take care of ressource management – the garbage collector is only half the deal.

Scala and C++ people should take a look at ABI stability and interoperability in general. Their compile times and “build, run, debug”-cycle has much room for improvement to say the least.

C# may look at simplicity instead of wildly adding new features creating a language without opinion. A plethora of ways implementing the same stuff. Either you ban features or you have to know them all to understand code in a larger project.

Conclusion

My personal answer to the title of this blog: Yes, they make false promises. But they have a lot to offer, too.

So do not settle with the status quo of your language environment or code style of choice. Try to maintain an objective perspective and be aware of the weaknesses of the tools you are using. Most platforms improve over time and sometimes you have to re-evaluate your opinion regarding some technology.

I prefer C++ to C for some time now and did not look back yet. But I also constantly try different languages, platforms and frameworks and try to maintain a balanced view. There are often good reasons to choose one over the other for a particular project.

 

Packaging Python projects for Debian/Ubuntu

Deployment of software using built-in software management tools is very convenient and provides a nice user experience (UX) for the users. For debian-based linux distributions like Ubuntu packaging software in .deb-packages is the way to go. So how can we prepare our python projects for packaging as a deb-package? The good news is that python is supported out-of-the-box in the debian package build system.

Alternatively, you can use the distutils-extension stdeb if you do not need complete flexibility in creating the packages.

Basic python deb-package

If you are using setuptools/distutils for your python project debian packaging consists of editing the package metadata and adding --with python to the rules file. For a nice headstart we can generate templates of the debian metadata files using two simple commands (the debhelper package is needed for dh_make:

# create a tarball with the current project sources
python setup.py sdist
# generate the debian package metadata files 
dh_make -p ${project_name}_${version} -f dist/${project_name}-${version}.tar.gz 

You have to edit at least the control-file, the changelog and the rules-file to build the python package. In the rules-file the make-target % is the crucial point and should include the flag to build a python project:

# main packaging script based on dh7 syntax
%:
	dh $@ --with python

After that you can build the package issueing dpkg-buildpackage.

The caveats

The debian packaging system is great in complaining about non-conformant aspects of your package. It demands digital signatures, correct file and directory names including version strings etc. Unfortunately it is not very helpful when you make packaging  mistakes resulting in empty, incomplete or broken packages.

Issues with setup.py

The setup.py build script has to reside on the same level as the debian-directory containing the package metadata. The packaging tools will not tell you if they could not find the setup script. In addition it will always run setup.py using python 2, even if you specified --with python3 in the rules-file.

Packaging for specific python versions

If you want better control over the target python versions for the package you should use Pybuild. You can do this by a little change to the rules-file, e.g. a python3-only build using Pybuild:

# main packaging script based on dh7 syntax
%:
	dh $@ --with python3 --buildsystem=pybuild

For pybuild to work it is crucial to add the needed python interpreter(s) besides the mandatory build dependency dh-python to the Build-Depends of the control-file, for python3-only it could look like this:

Build-Depends: debhelper (>=9), dh-python, python3-all
...
Depends: ${python3:Depends}

Without the dh-python build dependency pybuild will silently do nothing. Getting the build dependencies wrong will create incomplete or broken packages. Take extra care of getting this right!

Conclusion

Debian packaging looks quite intimidating at first because there are so many ways to build a package. Many different tools can ease package creation but also add confusion. Packaging python software is done easily if you know the quirks. The python examples from the Guide for Debian Maintainers are certainly worth a look!

Self-contained projects in python

An important concept for us is the notion of self-containment. For a project in development this means you find everything you need to develop and run the software directly in the one repository you check out/clone. For practical reasons we most of the time omit the IDE and the basic runtime like Java JDK or the Python interpreter. If you have these installed you are good to go in seconds.

What does this mean in general?

Usually this means putting all your dependencies either in source or object form (dll, jar etc.) directly in a directory of your project repository. This mostly rules out dependency managers like maven. Another not as obvious point is to have hardware dependencies mocked out in some way so your software runs without potentially unavailable hardware attached. The same is true for software services somewhere on the net that may be unavailable, like a payment service for example.

How to do it for Python

For Python projects this means not simply installing you dependencies using the linux package manager, system-wide pip or other dependency management tools but using a virtual environment. Virtual environments are isolated Python environments using an available, but defined Python interpreter on the system. They can be created by the tool virtualenv or since Python 3.3 the included tool venv. You can install you dependencies into this environment e.g. using pip which itself is part of the virtualenv. Preparing a virtual env for your project can be done using a simple shell script like this:

python2.7 ~/my_project/vendor/virtualenv-15.1.0/virtualenv.py ~/my_project_env
source ~/my_project_env/bin/activate
pip install ~/my_project/vendor/setuptools_scm-1.15.0.tar.gz
pip install ~/my_project/vendor/six-1.10.0.tar.gz
...

Your dependencies including virtualenv (for Python installations < 3.3) are stored into the projects source code repository. We usually call the directory vendor or similar.

As a side note working with such a virtual env even remotely work like charm in the PyCharm IDE by selecting the Python interpreter of the virtual env. It correctly shows all installed dependencies and all the IDE support for code completion and imports works as expected:

python-interpreter-settings

What you get

With such a setup you gain some advantages missing in many other approaches:

  • No problems if the target machine has no internet access. This would be problematic to classical pip/maven/etc. approaches.
  • Mostly hassle free development and deployment. No more “downloading the internet” feeling or driver/hardware installation issues for the developer. A deployment is in the most simple cases as easy as a copy/rsync.
  • Only minimal requirements to the base installation of developer, build, deployment or other target machines.
  • Perfectly reproducable builds and tests in isolation. You continuous integration (CI) machine is just another target machine.

What it costs

There are costs of this approach of course but in our experience the benefits outweigh them by a great extent. Nevertheless I want to mention some downsides:

  • Less tool support for managing the dependencies, especially if your are used to maven and friends and happen to like them. Pip can work with local archives just fine but updating is a bit of manual work.
  • Storing (binary) dependencies in your repository increases the checkout size. Nowadays disk space and local network speeds make mostly irrelevant, especially in combination with git. Shallow-clones can further mitigate the problem.
  • You may need to put in some effort for implementing mocks for your hardware or third-party software services and a mechanism for switching between simulation and the real stuff.

Conclusion

We have been using self-containment to great success in varying environments. Usually, both developers and clients are impressed by the ease of development and/or installation using this approach regardless if the project is in Java, C++, Python or something else.

Integration Tests with CherryPy and requests

CherryPy is a great way to write simple http backends, but there is a part of it that I do not like very much. While there is a documented way of setting up integration tests, it did not work well for me for a couple of reasons. Mostly, I found it hard to integrate with the rest of the test suite, which was using unittest and not py.test. Failing tests would apparently “hang” when launched from the PyCharm test explorer. It turned out the tests were getting stuck in interactive mode for failing assertions, a setting which can be turned off by an environment variable. Also, the “requests” looked kind of cumbersome. So I figured out how to do the tests with the fantastic requests library instead, which also allowed me to keep using unittest and have them run beautifully from within my test explorer.

The key is to start the CherryPy server for the tests in the background and gracefully shut it down once a test is finished. This can be done quite beautifully with the contextmanager decorator:

from contextlib import contextmanager

@contextmanager
def run_server():
    cherrypy.engine.start()
    cherrypy.engine.wait(cherrypy.engine.states.STARTED)
    yield
    cherrypy.engine.exit()
    cherrypy.engine.block()

This allows us to conviniently wrap the code that does requests to the server. The first part initiates the CherryPy start-up and then waits until that has completed. The yield is where the requests happen later. After that, we initiate a shut-down and block until that has completed.

Similar to the “official way”, let’s suppose we want to test a simple “echo” Application that simply feeds a request back at the user:

class Echo(object):
    @cherrypy.expose
    def echo(self, message):
        return message

Now we can write a test with whatever framework we want to use:

class TestEcho(unittest.TestCase):
    def test_echo(self):
        cherrypy.tree.mount(Echo())
        with run_server():
            url = "http://127.0.0.1:8080/echo"
            params = {'message': 'secret'}
            r = requests.get(url, params=params)
            self.assertEqual(r.status_code, 200)
            self.assertEqual(r.content, "secret")

Now that feels a lot nicer than the official test API!

Remote development with PyCharm

PyCharm is a fantastic tool for python development. One cool feature that I quite like is its support for remote development. We have quite a few projects that need to interact with special hardware, and that hardware is often not attached to the computer we’re developing on.
In order to test your programs, you still need to run it on that computer though, and doing this without tool support can be especially painful. You need to use a tool like scp or rsync to transmit your code to the target machine and then execute it using ssh. This all results in painfully long and error prone iterations.
Fortunately, PyCharm has tool support in its professional edition. After some setup, it allows you do develop just as you would on a local machine. Here’s a small guide on how to set it up with an ubuntu vagrant virtual machine, connecting over ssh. It work just as nicely on remote computers.

1. Create a new deployment configuration

In the Tools->Deployment->Configurations click the small + in the top left corner. Pick a name and choose the SFTP type.
add_server

In the “Connection” Tab of the newly created configuration, make sure to uncheck “Visible only for this project”. Then, setup your host and login information. The root path is usually a central location you have access to, like your home folder. You can use the “Autodetect” button to set this up.

connection
For my VM, the settings look like this.

On the “Mappings” Tab, set the deployment path for your project. This would be the specific folder of your project within the root you set on the previous page. Clicking the associated “…” button here helps, and even lets you create the target folder on the remote machine if it does not exist yet.

2. Activate the upload

Now check “Tools->Deployment->Automatic Upload”. This will do an upload when you change a file, so you still need to do the initial upload manually via “Tools->Deployment->Upload to “.

3. Create a project interpreter

Now the files are synced up, but the runtime environment is not on the remote machine. Go to the “Project Interpreter” page in File->Settings and click the little gear in the top-right corner. Select “Add Remote”.

remote_interpreter
It should have the Deployment configuration you just created already selected. Once you click ok, you’re good to go! You can run and debug your code just like on a local machine.

Have fun developing python applications remotely.

Getting Shibboleth SSO attributes securely to your application

Accounts and user data are a matter of trust. Single sign-on (SSO) can improve the user experience (UX), convenience and security especially if you are offering several web applications often used by the same user. If you do not want to force your users to big vendors offering SSO like google or facebook or do not trust them you can implement SSO for your offerings with open-source software (OSS) like shibboleth. With shibboleth it may be even feasible to join an existing federation like SWITCH, DFN or InCommon thus enabling logins for thousands of users without creating new accounts and login data.

If you are implementing you SSO with shibboleth you usually have to enable your web applications to deal with shibboleth attributes. Shibboleth attributes are information about the authenticated user provided by the SSO infrastructure, e.g. the apache web server and mod_shib in conjunction with associated identity providers (IDP). In general there are two options for access of these attributes:

  1. HTTP request headers
  2. Request environment variables (not to confuse with system environment variables!)

Using request headers should be avoided as it is less secure and prone to spoofing. Access to the request environment depends on the framework your web application is using.

Shibboleth attributes in Java Servlet-based apps

In Java Servlet-based applications like Grails or Java EE access to the shibboleth attributes is really easy as they are provided as request attributes. So simply calling request.getAttribute("AJP_eppn") will provide you the value of the eppn (“EduPrincipalPersonName”) attribute set by shibboleth if a user is authenticated and the attribute is made available. There are 2 caveats though:

  1. Request attributes are prefixed by default with AJP_ if you are using mod_proxy_ajp to connect apache with your servlet container.
  2. Shibboleth attributes are not contained in request.getAttributeNames()! You have to directly access them knowing their name.

Shibboleth attributes in WSGI-based apps

If you are using a WSGI-compatible python web framework for your application you can get the shibboleth attributes from the wsgi.environ dictionary that is part of the request. In CherryPy for example you can use the following code to obtain the eppn:

eppn = cherrypy.request.wsgi_environ['eppn']

I did not find the name of the WSGI environment dictionary clearly documented in my efforts to make shibboleth work with my CherryPy application but after that everything was a bliss.

Conclusion

Accessing shibboleth attributes in a safe manner is straightforward in web environments like Java servlets and Python WSGI applications. Nevertheless, you have to know the above aspects regarding naming and visibility or you will be puzzled by the behaviour of the shibboleth service provider.

Making CherryPy Application WSGI compatible

When choosing a micro web framework evolving it to fit your needs is key. As CherryPy is one of our choices I want to show you how to evolve it in terms of web server. Of course you can use the embedded CherryPy web server in development and for small sites. It is fast enough for many use cases and supports important features like SSL so you may come a long way just using it. There are several reasons to put your CherryPy behind a tried and trusted native web server like Apache or nginx:

  • Consistent production environment using different application servers (e.g. for Java and Python) using a powerful and uniform frontend
  • Many options and possibilites using Apache modules
  • Well known and understood environment for administrators
  • Separation of web-facing http server concerns and your web application
  • Improved performance and security

Making CherryPy a WSGI-compatible

The good news is that CherryPy application objects are already a WSGI-compliant application. So creating a wsgi.py like the following will enable integration with mod_wsgi of Apache:

def application(environ, start_response):
    cherrypy.tree.mount(MyApp(), script_name=None, config=None)
    return cherrypy.tree(environ, start_response)

Integrating with Apache’s mod_wsgi

It is quite easy to integrate a Python WSGI application with apache using mod_wsgi. If the module is present you just need to add some directives telling Apache where to mount the wsgi application defined by your wsgi.py script:

WSGIScriptAlias /my_app /path/to/wsgi.py
# May be required to allow your web app using libraries installed on the system
<Directory /usr/lib/python2.7/site-packages/ >
    Order deny,allow
    Allow from all
    Require all granted
</Directory>

After you have such a setup working properly you can consult the mod_wsgi documentation on how to improve in regards to threading, script reloading etc.

Configuring the WSGI-app

Many web applications need some form of configuration. Your application should not make assumptions on its install location or some directory structure. Generally speaking, an application should never assume that it can use relative path names for accessing the filesystem. Also access to operating system environment variables is dangerous because the application may run in different contexts. But we can specify WSGI-environment variables in the web servers’ configuration. An easy and safe way is to provide the configuration directory and other values using WSGI-environment variables that we can specify in the mod_wsgi configuration:

WSGIScriptAlias /my_app /path/to/wsgi.py
SetEnv configuration_dir /etc/my_shiny_web_app
...

We can access the wsgi-environment in python like so:

def application(environ, start_response):
    configdir = environ['configuration_dir']
    cherrypy.config.update(os.path.join(configdir, 'global.conf'))

    cherrypy.tree.mount(MyApp(), config=os.path.join(configdir, 'my_app.conf'))
    return cherrypy.tree(environ, start_response)

Note: Because your web app can be mounted to other locations than “/” on the the web server your application should not hard-code absolute links and the like. They all will be dead if your app is mounted at a different location.