Packaging Python projects for Debian/Ubuntu

Deploying software with the system's built-in package management tools is very convenient and provides a nice user experience (UX). For Debian-based Linux distributions like Ubuntu, packaging software as .deb packages is the way to go. So how can we prepare our Python projects for packaging as a deb package? The good news is that Python is supported out of the box in the Debian package build system.

Alternatively, you can use the distutils-extension stdeb if you do not need complete flexibility in creating the packages.

Basic python deb-package

If you are using setuptools/distutils for your Python project, Debian packaging consists of editing the package metadata and adding --with python to the rules file. For a nice head start we can generate templates of the Debian metadata files using two simple commands (dh_make comes from the dh-make package; debhelper is needed for the build itself):

# create a tarball with the current project sources
python setup.py sdist
# generate the debian package metadata files 
dh_make -p ${project_name}_${version} -f dist/${project_name}-${version}.tar.gz 

You have to edit at least the control-file, the changelog and the rules-file to build the python package. In the rules-file the make-target % is the crucial point and should include the flag to build a python project:

# main packaging script based on dh7 syntax
%:
	dh $@ --with python

After that you can build the package by running dpkg-buildpackage.

The caveats

The Debian packaging system is great at complaining about non-conformant aspects of your package. It demands digital signatures, correct file and directory names including version strings, etc. Unfortunately it is not very helpful when you make packaging mistakes that result in empty, incomplete or broken packages.

Issues with setup.py

The setup.py build script has to reside on the same level as the debian directory containing the package metadata. The packaging tools will not tell you if they cannot find the setup script. In addition, the build will always run setup.py using Python 2, even if you specified --with python3 in the rules file.
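
Because the tools stay silent about a misplaced setup.py, a small pre-flight check can catch the problem before the build. The following is a minimal sketch in Python 3; the helper name check_layout and its messages are made up for illustration and are not part of any Debian tool.

```python
import os


def check_layout(project_dir):
    """Return a list of problems in the packaging layout (hypothetical helper)."""
    problems = []
    # setup.py must sit on the same level as the debian directory
    if not os.path.isfile(os.path.join(project_dir, "setup.py")):
        problems.append("setup.py not found next to debian/")
    # the metadata files that have to be edited at least
    for name in ("control", "changelog", "rules"):
        if not os.path.isfile(os.path.join(project_dir, "debian", name)):
            problems.append("debian/%s is missing" % name)
    return problems
```

Calling check_layout(".") in the project root before dpkg-buildpackage and printing the returned list would be one way to use it.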

Packaging for specific python versions

If you want better control over the target python versions for the package you should use Pybuild. You can do this by a little change to the rules-file, e.g. a python3-only build using Pybuild:

# main packaging script based on dh7 syntax
%:
	dh $@ --with python3 --buildsystem=pybuild

For pybuild to work it is crucial to add the needed Python interpreter(s), in addition to the mandatory build dependency dh-python, to the Build-Depends of the control file. For a Python 3 only package it could look like this:

Build-Depends: debhelper (>=9), dh-python, python3-all
...
Depends: ${python3:Depends}

Without the dh-python build dependency pybuild will silently do nothing. Getting the build dependencies wrong will create incomplete or broken packages. Take extra care of getting this right!
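
Since a missing build dependency fails silently, it can be worth checking the control file mechanically. This is a rough sketch in Python 3, assuming the simple field layout shown above; build_depends and missing_build_deps are hypothetical helpers, and the parsing is deliberately naive compared to the real control file grammar.

```python
import re


def build_depends(control_text):
    """Return the package names listed in the Build-Depends field."""
    names = []
    in_field = False
    for line in control_text.splitlines():
        if line.lower().startswith("build-depends:"):
            in_field = True
            line = line.split(":", 1)[1]
        elif in_field and line[:1] in (" ", "\t"):
            pass  # continuation line of the same field
        else:
            in_field = False
        if in_field:
            for entry in line.split(","):
                # drop version constraints such as "(>=9)" and alternatives
                name = re.split(r"[\s(\[<|]", entry.strip(), maxsplit=1)[0]
                if name:
                    names.append(name)
    return names


def missing_build_deps(control_text, required=("dh-python", "python3-all")):
    """Return the required packages that Build-Depends does not mention."""
    present = set(build_depends(control_text))
    return [name for name in required if name not in present]
```

An empty result from missing_build_deps means both packages from the example are listed; anything else names what is missing.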

Conclusion

Debian packaging looks quite intimidating at first because there are so many ways to build a package. Many different tools can ease package creation but also add confusion. Packaging python software is done easily if you know the quirks. The python examples from the Guide for Debian Maintainers are certainly worth a look!

Self-contained projects in python

An important concept for us is the notion of self-containment. For a project in development this means you find everything you need to develop and run the software directly in the one repository you check out/clone. For practical reasons we most of the time omit the IDE and the basic runtime like Java JDK or the Python interpreter. If you have these installed you are good to go in seconds.

What does this mean in general?

Usually this means putting all your dependencies either in source or object form (dll, jar etc.) directly in a directory of your project repository. This mostly rules out dependency managers like maven. Another not as obvious point is to have hardware dependencies mocked out in some way so your software runs without potentially unavailable hardware attached. The same is true for software services somewhere on the net that may be unavailable, like a payment service for example.

How to do it for Python

For Python projects this means not simply installing your dependencies using the Linux package manager, system-wide pip or other dependency management tools, but using a virtual environment. Virtual environments are isolated Python environments based on a specific Python interpreter available on the system. They can be created with the tool virtualenv or, since Python 3.3, the included tool venv. You can install your dependencies into this environment, e.g. using pip, which itself is part of the virtualenv. Preparing a virtual env for your project can be done using a simple shell script like this:

python2.7 ~/my_project/vendor/virtualenv-15.1.0/virtualenv.py ~/my_project_env
source ~/my_project_env/bin/activate
pip install ~/my_project/vendor/setuptools_scm-1.15.0.tar.gz
pip install ~/my_project/vendor/six-1.10.0.tar.gz
...

Your dependencies, including virtualenv (for Python installations < 3.3), are stored in the project's source code repository. We usually call the directory vendor or similar.
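
On Python 3.4 or newer the same preparation can be sketched with the standard library's venv module instead of a vendored virtualenv. The function names and the vendor directory layout below are assumptions for illustration; the pip options --no-index and --find-links keep pip from contacting a package index.

```python
import os
import subprocess
import venv


def env_python(env_dir):
    """Path of the interpreter inside the virtual environment."""
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def pip_install_command(env_dir, vendor_dir, requirements):
    """Build a pip call that installs only from the vendored archives."""
    return [
        env_python(env_dir), "-m", "pip", "install",
        "--no-index",                # never contact a package index
        "--find-links", vendor_dir,  # look for archives in the vendor directory
    ] + list(requirements)


def bootstrap(env_dir, vendor_dir, requirements):
    """Create the environment and install the vendored dependencies."""
    venv.EnvBuilder(with_pip=True).create(env_dir)
    subprocess.check_call(pip_install_command(env_dir, vendor_dir, requirements))
```

A call like bootstrap("my_project_env", "vendor", ["six==1.10.0"]) would then replace the shell script, provided the archives in vendor cover everything pip needs.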

As a side note, working with such a virtual env, even remotely, works like a charm in the PyCharm IDE: select the Python interpreter of the virtual env, and the IDE correctly shows all installed dependencies, with code completion and imports working as expected.

What you get

With such a setup you gain some advantages missing in many other approaches:

  • No problems if the target machine has no internet access. This would be problematic to classical pip/maven/etc. approaches.
  • Mostly hassle free development and deployment. No more “downloading the internet” feeling or driver/hardware installation issues for the developer. A deployment is in the most simple cases as easy as a copy/rsync.
  • Only minimal requirements to the base installation of developer, build, deployment or other target machines.
  • Perfectly reproducible builds and tests in isolation. Your continuous integration (CI) machine is just another target machine.

What it costs

There are costs of this approach of course but in our experience the benefits outweigh them by a great extent. Nevertheless I want to mention some downsides:

  • Less tool support for managing the dependencies, especially if you are used to maven and friends and happen to like them. Pip can work with local archives just fine but updating is a bit of manual work.
  • Storing (binary) dependencies in your repository increases the checkout size. Nowadays disk space and local network speeds make this mostly irrelevant, especially in combination with git. Shallow clones can further mitigate the problem.
  • You may need to put in some effort for implementing mocks for your hardware or third-party software services and a mechanism for switching between simulation and the real stuff.
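
The switching mechanism from the last point can be quite small. The sketch below is only one possible shape: the thermometer classes, the environment variable names MY_PROJECT_SIMULATE and MY_PROJECT_PORT, and the default port are all invented for illustration.

```python
import os


class SimulatedThermometer(object):
    """Stands in for the real device and returns a fixed reading."""

    def read_celsius(self):
        return 21.5


class SerialThermometer(object):
    """Placeholder for the real driver; needs the hardware attached."""

    def __init__(self, port):
        self.port = port

    def read_celsius(self):
        raise NotImplementedError("talk to the device on %s here" % self.port)


def create_thermometer(environ=os.environ):
    """Pick the implementation based on the (hypothetical) MY_PROJECT_SIMULATE variable."""
    # simulation is the default, so a fresh checkout runs without hardware
    if environ.get("MY_PROJECT_SIMULATE", "1") == "1":
        return SimulatedThermometer()
    return SerialThermometer(environ.get("MY_PROJECT_PORT", "/dev/ttyUSB0"))
```

The rest of the application only calls create_thermometer() and read_celsius(), so it does not need to know which variant it got.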

Conclusion

We have been using self-containment to great success in varying environments. Usually, both developers and clients are impressed by the ease of development and/or installation using this approach, regardless of whether the project is in Java, C++, Python or something else.

Integration Tests with CherryPy and requests

CherryPy is a great way to write simple http backends, but there is a part of it that I do not like very much. While there is a documented way of setting up integration tests, it did not work well for me for a couple of reasons. Mostly, I found it hard to integrate with the rest of the test suite, which was using unittest and not py.test. Failing tests would apparently “hang” when launched from the PyCharm test explorer. It turned out the tests were getting stuck in interactive mode for failing assertions, a setting which can be turned off by an environment variable. Also, the “requests” looked kind of cumbersome. So I figured out how to do the tests with the fantastic requests library instead, which also allowed me to keep using unittest and have them run beautifully from within my test explorer.

The key is to start the CherryPy server for the tests in the background and gracefully shut it down once a test is finished. This can be done quite beautifully with the contextmanager decorator:

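from contextlib import contextmanager

import cherrypy

@contextmanager
def run_server():
    # start the engine and wait until the server accepts requests
    cherrypy.engine.start()
    cherrypy.engine.wait(cherrypy.engine.states.STARTED)
    try:
        yield
    finally:
        # shut down even if the test body raised, then wait for completion
        cherrypy.engine.exit()
        cherrypy.engine.block()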

This allows us to conveniently wrap the code that does requests to the server. The first part initiates the CherryPy start-up and then waits until that has completed. The yield is where the requests happen later. After that, we initiate a shut-down and block until that has completed.

Similar to the “official way”, let’s suppose we want to test a simple “echo” Application that simply feeds a request back at the user:

class Echo(object):
    @cherrypy.expose
    def echo(self, message):
        return message

Now we can write a test with whatever framework we want to use:

import unittest

import cherrypy
import requests

class TestEcho(unittest.TestCase):
    def test_echo(self):
        cherrypy.tree.mount(Echo())
        with run_server():
            url = "http://127.0.0.1:8080/echo"
            params = {'message': 'secret'}
            r = requests.get(url, params=params)
            self.assertEqual(r.status_code, 200)
            # r.text is the decoded body; r.content would be bytes on Python 3
            self.assertEqual(r.text, "secret")

Now that feels a lot nicer than the official test API!
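
To look at the start/yield/shut-down pattern in isolation, here is a sketch that uses only the Python 3 standard library instead of CherryPy and requests. EchoHandler and fetch_echo are hypothetical stand-ins for the Echo application and the test body; the point is the try/finally around the yield, which guarantees the shut-down even when an assertion fails.

```python
import threading
from contextlib import contextmanager
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


class EchoHandler(BaseHTTPRequestHandler):
    """Stand-in for the Echo application: returns the 'message' parameter."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        body = query.get("message", [""])[0].encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


@contextmanager
def run_server():
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0 picks a free port
    thread = threading.Thread(target=server.serve_forever)
    thread.start()
    try:
        yield server.server_address[1]  # hand the chosen port to the caller
    finally:
        server.shutdown()      # stop serve_forever()
        server.server_close()  # release the socket
        thread.join()


def fetch_echo(message):
    """Request /echo from a freshly started server (message must be URL-safe)."""
    with run_server() as port:
        connection = HTTPConnection("127.0.0.1", port, timeout=5)
        try:
            connection.request("GET", "/echo?message=" + message)
            response = connection.getresponse()
            return response.status, response.read().decode("utf-8")
        finally:
            connection.close()
```

Letting the operating system choose the port avoids clashes when several test runs happen on the same machine.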

Evolvability of Code: Uniform Access Principle

Most programmers like freedom. So there are many means of hiding implementations in modern programming languages, e.g. interfaces in Java, header files in C/C++ and visibility modifiers like private and protected in most object-oriented languages. Even your ordinary functions or public class interface gives you the freedom to change the implementation without needing to touch the clients. Evolvability in this sense means you can change and refine your implementations without requiring others, namely clients of your code, to change.

Changing the class interface or function signatures within a project is often possible and feasible, at least if you have access to all client code and use powerful refactoring tools. If you published your code as a library, or do not want to break all client code or force clients to adapt to your changes, you have to consider your interface code to be fixed. This takes away some of your precious freedom. So you have to design your interfaces carefully with evolvability in mind.

Some programming languages implement the uniform access principle (UAP) that eases evolvability in that it allows you to migrate from public attributes to properties/method calls without changing the clients: Read and write access to the attribute uses the same syntax as invoking corresponding methods. For clarification an example in Python where you may start with a class like:

class Person(object):
  def __init__(self, name, age):
    self.name = name
    self.age = age

Using the above class is trivial:

>>> pete = Person("pete", 32)
>>> print(pete.age)
32
>>> # a year has passed
>>> pete.age = 33
>>> print(pete.age)
33

Now if the age is not a plain value anymore but needs checking, like never being negative, or is calculated based on some calendar, you can turn it into a property like so:

class Person(object):
  def __init__(self, name, age):
    self.name = name
    self._age = age

  @property
  def age(self):
    return self._age

  @age.setter
  def age(self, new_age):
    if new_age < 0:
      raise ValueError("Age under 0 is not possible")
    self._age = new_age

Now the nice thing is: The above client code still works without changes!

Scala uses a similar and quite concise mechanism for implementing the UAP, whereas .NET provides some special syntax for properties but still makes migration from public fields easily possible.

So in languages supporting the UAP you can start really simple with public attributes holding the plain value without worrying about some potential future. If you later need more sophisticated stuff like caching, computation of the value, validation or even remote retrieval you can add it using language features without touching or bothering clients.
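
The "computation of the value" case could look like the following sketch, where age is derived from a birth date. Note the assumptions: the constructor is changed to take a birth_date, and the today parameter is an injectable clock added only to make the example deterministic. Only the read access pete.age stays as it was; assignment would need a setter again.

```python
from datetime import date


class Person(object):
    def __init__(self, name, birth_date, today=date.today):
        self.name = name
        self.birth_date = birth_date
        self._today = today  # injectable clock, an assumption to ease testing

    @property
    def age(self):
        today = self._today()
        # subtract one year if this year's birthday has not happened yet
        had_birthday = (today.month, today.day) >= (
            self.birth_date.month, self.birth_date.day)
        return today.year - self.birth_date.year - (0 if had_birthday else 1)


pete = Person("pete", date(1990, 6, 15), today=lambda: date(2023, 6, 14))
print(pete.age)  # same attribute syntax as with the plain field
```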

Unfortunately some powerful and widespread languages like Java and C++ lack support for UAP. Changing a public field to a more complex property means the introduction of getter and setter methods and changing all clients. Therefore you see, especially in Java, many data classes littered with trivial getter and setter pairs doing nothing interesting and introducing unnecessary bloat to maintain the evolvability of the code.

Remote development with PyCharm

PyCharm is a fantastic tool for Python development. One cool feature that I quite like is its support for remote development. We have quite a few projects that need to interact with special hardware, and that hardware is often not attached to the computer we’re developing on.
In order to test your programs, you still need to run them on that computer though, and doing this without tool support can be especially painful. You need to use a tool like scp or rsync to transmit your code to the target machine and then execute it using ssh. This all results in painfully long and error-prone iterations.
Fortunately, PyCharm has tool support in its professional edition. After some setup, it allows you to develop just as you would on a local machine. Here’s a small guide on how to set it up with an Ubuntu Vagrant virtual machine, connecting over ssh. It works just as nicely on remote computers.

1. Create a new deployment configuration

In the Tools->Deployment->Configurations dialog, click the small + in the top left corner. Pick a name and choose the SFTP type.

In the “Connection” tab of the newly created configuration, make sure to uncheck “Visible only for this project”. Then, set up your host and login information. The root path is usually a central location you have access to, like your home folder. You can use the “Autodetect” button to set this up.

[Screenshot: the connection settings for my VM]

On the “Mappings” Tab, set the deployment path for your project. This would be the specific folder of your project within the root you set on the previous page. Clicking the associated “…” button here helps, and even lets you create the target folder on the remote machine if it does not exist yet.

2. Activate the upload

Now check “Tools->Deployment->Automatic Upload”. This only uploads a file when you change it, so you still need to do the initial upload manually via “Tools->Deployment->Upload to …”.

3. Create a project interpreter

Now the files are synced up, but the runtime environment is not on the remote machine. Go to the “Project Interpreter” page in File->Settings and click the little gear in the top-right corner. Select “Add Remote”.

It should have the Deployment configuration you just created already selected. Once you click ok, you’re good to go! You can run and debug your code just like on a local machine.

Have fun developing python applications remotely.

Making CherryPy Application WSGI compatible

When choosing a micro web framework, being able to evolve it to fit your needs is key. As CherryPy is one of our choices, I want to show you how to evolve it in terms of the web server. Of course you can use the embedded CherryPy web server in development and for small sites. It is fast enough for many use cases and supports important features like SSL, so you may come a long way just using it. Still, there are several reasons to put your CherryPy application behind a tried and trusted native web server like Apache or nginx:

  • Consistent production environment using different application servers (e.g. for Java and Python) using a powerful and uniform frontend
  • Many options and possibilities using Apache modules
  • Well known and understood environment for administrators
  • Separation of web-facing http server concerns and your web application
  • Improved performance and security

Making CherryPy WSGI-compatible

The good news is that CherryPy application objects are already WSGI-compliant applications. So creating a wsgi.py like the following will enable integration with mod_wsgi of Apache:

import cherrypy

# MyApp is the root class of your application. With script_name=None the
# mount point is read from SCRIPT_NAME in the WSGI environ on each request.
application = cherrypy.Application(MyApp(), script_name=None, config=None)

Integrating with Apache’s mod_wsgi

It is quite easy to integrate a Python WSGI application with Apache using mod_wsgi. If the module is present you just need to add some directives telling Apache where to mount the WSGI application defined by your wsgi.py script:

WSGIScriptAlias /my_app /path/to/wsgi.py
# May be required to allow your web app using libraries installed on the system
<Directory /usr/lib/python2.7/site-packages/ >
    Order deny,allow
    Allow from all
    Require all granted
</Directory>

After you have such a setup working properly you can consult the mod_wsgi documentation on how to improve in regards to threading, script reloading etc.

Configuring the WSGI-app

Many web applications need some form of configuration. Your application should not make assumptions on its install location or some directory structure. Generally speaking, an application should never assume that it can use relative path names for accessing the filesystem. Also access to operating system environment variables is dangerous because the application may run in different contexts. But we can specify WSGI-environment variables in the web servers’ configuration. An easy and safe way is to provide the configuration directory and other values using WSGI-environment variables that we can specify in the mod_wsgi configuration:

WSGIScriptAlias /my_app /path/to/wsgi.py
SetEnv configuration_dir /etc/my_shiny_web_app
...

We can access the wsgi-environment in python like so:

import os

import cherrypy

def application(environ, start_response):
    # values set with SetEnv arrive in the WSGI environ of each request
    configdir = environ['configuration_dir']
    cherrypy.config.update(os.path.join(configdir, 'global.conf'))

    cherrypy.tree.mount(MyApp(), config=os.path.join(configdir, 'my_app.conf'))
    return cherrypy.tree(environ, start_response)
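
If the SetEnv directive is forgotten, the lookup above fails with a bare KeyError. A slightly more defensive variant could look like this sketch; config_paths and its default_dir parameter are hypothetical, while the file names are the ones used above.

```python
import os


def config_paths(environ, default_dir=None):
    """Resolve the two config files from the WSGI environ."""
    configdir = environ.get("configuration_dir", default_dir)
    if configdir is None:
        # fail early with a message that points at the web server config
        raise RuntimeError("configuration_dir is not set in the WSGI environment")
    return (os.path.join(configdir, "global.conf"),
            os.path.join(configdir, "my_app.conf"))
```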

Note: Because your web app can be mounted to other locations than “/” on the web server, your application should not hard-code absolute links and the like. They will all be dead if your app is mounted at a different location.
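
One way to avoid hard-coded links is to prefix every internal path with the mount point, which WSGI passes as SCRIPT_NAME. The helper app_url below is a hypothetical sketch of that idea in plain Python; CherryPy's own cherrypy.url() function serves a similar purpose inside a request.

```python
def app_url(environ, path):
    """Prefix an application-internal path with the current mount point."""
    script_name = environ.get("SCRIPT_NAME", "").rstrip("/")
    return script_name + "/" + path.lstrip("/")
```

With WSGIScriptAlias /my_app, a link to static/style.css would then come out as /my_app/static/style.css, and it keeps working if the alias changes.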

Python Pitfall: Alleged Decrement Operator

The best way to make oneself more familiar with the possibilities and pitfalls of a newly learned programming language is to start pet projects using that language. That’s just what I did to dive deeper into Python. While working on my Python pet project I made a tiny mistake which took me quite a while to figure out. The code was something like (highly simplified):

for i in range(someRange):
  # lots of code here
  doSomething(--someNumber)
  # even more code here

For me, with a strong background in Java and C, this looked perfectly right. Yet, it was not. Since it compiled properly, I immediately excluded syntax errors from my mental list of possible reasons and began to search for a semantic or logical error.

After a while, I remembered that there is no such thing as a post-increment or post-decrement operator in Python, so why should there be a pre-decrement? Well, there isn’t. But if there is no pre-decrement operator, why does --someNumber compile? Basically, the answer is pretty simple: To Python, --someNumber is the same as -(-(someNumber)).

A working version of the above example could be:

for i in range(someRange):
  # lots of code here
  someNumber -= 1
  doSomething(someNumber)
  # even more code here
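
The double negation is easy to verify with the ast module from the standard library. The short sketch below (variable names are arbitrary) shows that the expression leaves the variable untouched and that the parser sees two nested unary minus operations.

```python
import ast

some_number = 5
result = --some_number  # two unary minus operators, not a decrement

# the parse tree is a unary minus wrapped around another unary minus
tree = ast.parse("--some_number", mode="eval").body
outer_is_minus = isinstance(tree, ast.UnaryOp) and isinstance(tree.op, ast.USub)
inner_is_minus = (isinstance(tree.operand, ast.UnaryOp)
                  and isinstance(tree.operand.op, ast.USub))
```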