I started my first open source project towards the end of summer 2014, when I began working at the Social Decision Analytics Laboratory at the Virginia Informatics Institute. The project involves creating a simulation environment where we can observe how an idea or belief spreads within a social network. The initial thought was to look for and extend an agent-based model package in Python. I did manage to find the pyabm package by Alex Zvoleff, but for simplicity’s sake I wrote my own package instead of extending his existing code base.
I took it as an opportunity to learn and ‘do things right’:
- Write a module that contains all program logic
- Write functions that do only one thing, but do it well
- Unit test everything
- Test builds on different versions of Python
- Document your code
- Get it on PyPI
This series of posts is essentially notes to myself, and to other programmers who are starting out from where I was. In this post I discuss the background, rationale, and initial steps in my open source struggles. In part II I discuss automatic project and code documentation.
I had already been involved with Software Carpentry (SWC) for about 8 months when I started the project, so many of the concepts I just listed were not entirely foreign; they were just a matter of implementation. Thanks to a workshop by Gabriel Perez-Giz at NYU earlier that summer, I took it upon myself to practice my Emacs and set up elpy as my IDE. A development environment that can be used within a terminal was especially important, because the simulations I would be running would all be on a remote server.
For git, it was getting into the habit of not committing directly to master. For Python, I’d write a suite of unit tests for the first time, and I figured that if I wrote some comments and docstrings I could get a nice document out of them (I was wrong about that). Finally, I wanted to get those cool little badges people have on their GitHub repos about build status, coverage, etc. That’s when I found an awesome blog post by Jeff Knupp titled ‘Open Sourcing a Python Project the Right Way’.
It’s an amazing read if you are ready to take your programming practices to the next step. Jeff references a cookie-cutter that will set up the Python project boilerplate, but I opted to just follow the blog post and do everything manually so I could have a better understanding of what is going on in the background. Plus, this lets me slowly add features, rather than have an entire repo loaded with unknown files. More importantly, I added a few other things to make my project ‘better’:
- Use the git-flow paradigm to add new features
- Continuous integration (with TravisCI)
- Test your package with other versions of Python (using Tox)
- Code documentation with Sphinx and Read the Docs
I opted not to use Tox locally (at least not yet). TravisCI is handling my Python compatibility testing: since I was working with Python 3.4 and was not going to have Python 2 support, I added builds for Python 3.4 and 3.3 and called it a day. I also opted to use nosetests instead of pytest, since that’s what I was shown when I helped out at the MIT SWC workshop. That, plus the fact that SWC and big open source projects use it, was my rationale for sticking with nosetests.
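To give a feel for what ‘unit test everything’ looks like in practice, here is a minimal sketch of a test file that nosetests would pick up. The update_belief function is a made-up stand-in for this post, not code from MANN; the point is only the shape: one small function, and one test per behavior.

```python
import unittest


def update_belief(current, neighbor_beliefs, weight=0.5):
    """Hypothetical helper: nudge a belief toward the mean of its neighbors."""
    if not neighbor_beliefs:
        # No neighbors to listen to, so the belief stays where it is.
        return current
    neighbor_mean = sum(neighbor_beliefs) / len(neighbor_beliefs)
    return (1 - weight) * current + weight * neighbor_mean


class TestUpdateBelief(unittest.TestCase):
    """nose collects unittest.TestCase subclasses from test_*.py files."""

    def test_no_neighbors_leaves_belief_unchanged(self):
        self.assertEqual(update_belief(0.2, []), 0.2)

    def test_moves_halfway_toward_neighbor_mean(self):
        self.assertAlmostEqual(update_belief(0.0, [1.0, 1.0]), 0.5)
```

Saved as something like test_belief.py, running nosetests in that directory should discover and run both tests.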
My Multi-Agent Neural Network (MANN) project had a main.py script that loaded in my modules for the individual agents and the network structure. Everything was placed in the mann folder in the repo, with no subfolders. When I eventually realized that I wanted the project to be PyPI ready, I wanted to separate the main program logic (the MANN code) from the actual script that sets up the simulation. I eventually moved the main.py script (and all required files) into the Multidisciplinary Diffusion Model Experiments (MDME) project.
The more I program in Python, the more I realize how invaluable virtual environments are when developing packages. A virtual environment is a tool that keeps the dependencies required by different projects in separate places, while keeping your base Python distribution clean and working should something go awry. Virtual environments also allow you to flip between Python 2 and Python 3, depending on which version a piece of code you are trying to run was written in. Pretty cool stuff.
The Python distribution I use is called Anaconda.
Setting up virtual environments using conda:
conda create -n VIRTUAL_ENV_NAME python=3.4
You can specify different versions of Python and/or pre-create environments with a set of modules if needed.
Switching between environments:
source activate VIRTUAL_ENV_NAME
To exit out of an environment:
source deactivate
You can read more about creating environments in the conda documentation.
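After switching environments, it can be reassuring to confirm which interpreter you are actually running. The snippet below is a small sketch of such a check; describe_environment is a name I made up for it, and reading the CONDA_DEFAULT_ENV variable assumes the environment was activated through conda.

```python
import os
import sys


def describe_environment():
    """Return a few facts about the interpreter that is currently running."""
    return {
        "executable": sys.executable,
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        # conda sets CONDA_DEFAULT_ENV on activation; treating a missing
        # value as "no conda environment active" is an assumption here.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

If the reported executable path does not point inside the environment you just activated, the activation probably did not take effect in that shell.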
Turning an existing Python project into a ‘module’ can break a few things. It is as simple as putting an __init__.py into a directory to signify that the contents of the folder are now a Python package, but there can be some weird side effects.
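To make the role of __init__.py concrete, here is a self-contained sketch that builds a throwaway package in a temporary directory and imports from it. The names demo_pkg and agent are invented for the example and have nothing to do with the real MANN layout.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway project directory; "demo_pkg" and "agent" are made-up names.
project_dir = tempfile.mkdtemp()
package_dir = os.path.join(project_dir, "demo_pkg")
os.mkdir(package_dir)

# The (here empty) __init__.py is what marks the folder as a regular package.
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("")

# A single module living inside the package.
with open(os.path.join(package_dir, "agent.py"), "w") as handle:
    handle.write("def greet():\n    return 'hello from agent'\n")

# Make the project directory importable, then import through the package name.
sys.path.insert(0, project_dir)
importlib.invalidate_caches()
agent = importlib.import_module("demo_pkg.agent")
print(agent.greet())
```

The module is now reached through the package name (demo_pkg.agent) rather than as a loose script, which is the shape a PyPI-ready project needs.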
When you turn your project into a package, you will find that if you run nosetests it will start running the unit tests for all the modules you load (if they have any).
For example, I was initially using
nosetests --cover-branches --with-coverage
to test my code, but I had to add the --cover-erase and --cover-package flags to get it to only test the code in my module:
nosetests --cover-branches --with-coverage --cover-erase --cover-package=mann
Since I moved my simulation code out of the code that defines the MANN module, I had to install the MANN module. One way to do it is to upload the code to PyPI and pip install the package. The problem is that when you want to upload a module to PyPI, you essentially need to have a git tag associated with the version you want. This is problematic when you are rapidly prototyping, since you’ll need to either delete tags or constantly increment the tag. You’ll end up with a v.0.314.0 very quickly. You can do a local install of your module by going to where you have the setup.py file and doing:
python setup.py install
This is why virtual environments are really helpful.
- You won’t have to worry about cluttering the base Python distribution and modules with your ‘test’ code.
- You can test your code before uploading it to PyPI (or anywhere else). This is really helpful because a PyPI release essentially requires you to have a tag. Doing a local install allows you to work out any potential bugs before submitting a release, so you won’t have to have 15 release numbers for your first release.
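Once a local install has finished, a quick way to confirm that the active environment can actually find the package is to ask Python’s import machinery directly. This is only a sketch: is_importable is a helper name I made up, and it is meant for top-level package names such as mann.

```python
import importlib.util


def is_importable(package_name):
    """Return True if the active environment can locate the named package."""
    # find_spec returns None when a top-level package cannot be found.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "mann" is the package name used in this post; swap in your own.
    print("mann importable:", is_importable("mann"))
```

Running this inside and outside the virtual environment is a simple way to see that the install landed where you intended, and not in your base Python distribution.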