Self-contained projects in Python

An important concept for us is self-containment. For a project in development this means that everything you need to develop and run the software is in the one repository you check out or clone. For practical reasons we usually leave out the IDE and the basic runtime, such as the Java JDK or the Python interpreter. If you have these installed, you are good to go in seconds.

What does this mean in general?

Usually this means putting all your dependencies, in either source or object form (DLL, JAR, etc.), directly in a directory of your project repository. This mostly rules out dependency managers like Maven. Another, less obvious point is to mock out hardware dependencies in some way so that your software runs without potentially unavailable hardware attached. The same is true for software services on the network that may be unavailable, such as a payment service.
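
To illustrate the point about unavailable services, here is a minimal sketch of a payment service hidden behind a small interface, with an in-memory stand-in for development and tests. All names (PaymentService, FakePaymentService, checkout) are hypothetical and not taken from a real project, and the sketch assumes Python 3.8 or newer for typing.Protocol.

```python
from dataclasses import dataclass, field
from typing import Protocol


class PaymentService(Protocol):
    """What the application needs from any payment backend (hypothetical)."""

    def charge(self, customer_id: str, amount_cents: int) -> bool: ...


@dataclass
class FakePaymentService:
    """In-memory stand-in that never touches the network."""

    charges: list = field(default_factory=list)

    def charge(self, customer_id: str, amount_cents: int) -> bool:
        # Reject obviously invalid amounts, record everything else.
        if amount_cents <= 0:
            return False
        self.charges.append((customer_id, amount_cents))
        return True


def checkout(service: PaymentService, customer_id: str, amount_cents: int) -> str:
    """Application code depends only on the interface, not on a real backend."""
    return "paid" if service.charge(customer_id, amount_cents) else "declined"
```

Because checkout only sees the interface, the same code can run against the fake on a developer machine and against a real client in production.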

How to do it for Python

For Python projects this means not simply installing your dependencies using the Linux package manager, a system-wide pip or other dependency management tools, but using a virtual environment. Virtual environments are isolated Python environments that use a specific Python interpreter already available on the system. They can be created with the tool virtualenv or, since Python 3.3, the bundled venv module. You can install your dependencies into this environment, e.g. using pip, which is itself part of the virtual environment. Preparing a virtual environment for your project can be done with a simple shell script like this:

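# Create the environment with the vendored virtualenv and Python 2.7
python2.7 ~/my_project/vendor/virtualenv-15.1.0/virtualenv.py ~/my_project_env
# Activate it so that pip and python refer to the environment
source ~/my_project_env/bin/activate
# Install the vendored dependencies from local archives
pip install ~/my_project/vendor/setuptools_scm-1.15.0.tar.gz
pip install ~/my_project/vendor/six-1.10.0.tar.gz
...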

Your dependencies, including virtualenv (for Python installations < 3.3), are stored in the project's source code repository. We usually call the directory vendor or similar.
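
On Python 3.3 and later, the same preparation can be sketched with the standard library's venv module instead of a vendored virtualenv. The following is only an illustration under assumptions: the helper names are made up, and pip's --no-index and --find-links options are used so that packages are resolved from the vendor directory only. Source archives may additionally need their build requirements present in that directory.

```python
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir: Path, vendor_dir: Path, packages: list) -> list:
    """Build a pip call that resolves packages only from the vendor directory."""
    return [
        str(env_python(env_dir)), "-m", "pip", "install",
        "--no-index",                     # never contact a package index
        "--find-links", str(vendor_dir),  # look for archives here instead
        *packages,
    ]


def bootstrap(env_dir: Path, vendor_dir: Path, packages: list) -> None:
    """Create the environment and install the vendored packages into it."""
    venv.EnvBuilder(with_pip=True).create(env_dir)
    subprocess.run(pip_install_command(env_dir, vendor_dir, packages), check=True)
```

A call such as bootstrap(Path.home() / "my_project_env", Path.home() / "my_project" / "vendor", ["setuptools_scm", "six"]) would then mirror the shell script, assuming matching archives are present in the vendor directory.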

As a side note, working with such a virtual environment, even a remote one, works like a charm in the PyCharm IDE: select the Python interpreter of the virtual environment. PyCharm then correctly shows all installed dependencies, and the IDE support for code completion and imports works as expected:

[Screenshot: Python interpreter settings in PyCharm]

What you get

With such a setup you gain some advantages missing in many other approaches:

  • No problems if the target machine has no internet access. This would be problematic for classical pip/Maven/etc. approaches.
  • Mostly hassle-free development and deployment. No more “downloading the internet” feeling or driver/hardware installation issues for the developer. In the simplest cases a deployment is as easy as a copy/rsync.
  • Only minimal requirements on the base installation of developer, build, deployment or other target machines.
  • Reproducible builds and tests in isolation. Your continuous integration (CI) machine is just another target machine.

What it costs

This approach has costs, of course, but in our experience the benefits outweigh them by far. Nevertheless, I want to mention some downsides:

  • Less tool support for managing the dependencies, especially if you are used to Maven and friends and happen to like them. Pip can work with local archives just fine, but updating is a bit of manual work.
  • Storing (binary) dependencies in your repository increases the checkout size. Nowadays disk space and local network speeds make this mostly irrelevant, especially in combination with git. Shallow clones can further mitigate the problem.
  • You may need to put some effort into implementing mocks for your hardware or third-party software services and a mechanism for switching between simulation and the real thing.
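
One possible switching mechanism for the last point is an environment variable read by a small factory function. The sketch below is hypothetical throughout: the thermometer classes, the MY_PROJECT_HARDWARE and MY_PROJECT_PORT variables and the default port are invented for illustration, and the real driver is left as a placeholder.

```python
import os
import random


class SimulatedThermometer:
    """Stands in for the device when no hardware is attached."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def read_celsius(self) -> float:
        # Plausible room temperatures, reproducible thanks to the fixed seed.
        return round(self._rng.uniform(19.0, 23.0), 1)


class SerialThermometer:
    """Placeholder for the real driver; the device protocol is not shown here."""

    def __init__(self, port: str) -> None:
        self.port = port

    def read_celsius(self) -> float:
        raise NotImplementedError("talk to the real device here")


def make_thermometer(environ=os.environ):
    """Pick simulation unless MY_PROJECT_HARDWARE=real is set (hypothetical variable)."""
    if environ.get("MY_PROJECT_HARDWARE", "simulated") == "real":
        return SerialThermometer(environ.get("MY_PROJECT_PORT", "/dev/ttyUSB0"))
    return SimulatedThermometer()
```

Defaulting to the simulation keeps a fresh checkout runnable without any configuration, while the real hardware is an explicit opt-in on the machines that have it.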

Conclusion

We have been using self-containment with great success in varying environments. Usually, both developers and clients are impressed by the ease of development and/or installation using this approach, regardless of whether the project is in Java, C++, Python or something else.
