How to share computer code?

Answer to 1: Where should I host my code?

Depending on what your university offers, you could host it with the university, or with a public code-hosting service such as GitHub, Bitbucket, SourceForge, or similar.

Many of these services have a "paid" subscription option for private repositories if those are required.

Answer to 2: What open-source license should I choose?

This question is relevant because we're having this discussion right now within one of our own research projects. I happen to know a little about open source software, having researched it in the past and having taught a few courses on it.

Though there are a lot of open-source licenses out there, they really come in two main families: permissive open licenses (e.g., MIT, BSD, Apache) and Free (copyleft) licenses (the GNU General Public License, GPLv2 or GPLv3). Here's a brief lowdown by the Open Source Initiative.

Permissive open licenses

These licenses generally allow anyone to do anything they want with the code you release, as long as they retain certain copyright information with it. In practice, this has a number of implications.

  1. Someone could take your entire code base, create a product with it, and sell it.

  2. Someone could take parts of your code and put them in their own project (commercial or not).

  3. Because the license is more permissive, you yourself could take the code, close it, and keep any future releases under wraps so you can make money off of the code or hide it from the public.

  4. Because the license is more permissive, you might generate more interest as a result. People may take code from other projects and use it to improve yours. On the flip side, they could also make improvements to your source code and never share them back with you.

The GNU GPL, by contrast, is a Free Software license that disallows certain things. In that sense it's more restrictive, but it is restrictive for a number of ideological reasons.

  1. If you release software under the GPL, you can't close-source that release. Ever. The code you have published is going to remain in the open, and anyone who distributes the software is obligated by the terms of the license to make the corresponding source code available (if you host it on GitHub or another public repository, then you have already satisfied this requirement).

  2. A company could take the code, make products with it, and sell them (it's their right to do so), but only on the condition that any changes and additions they distribute are also released under the GPL. A lot of companies that make a lot of money writing software don't like this, because they have to keep releasing code to the public. On the flip side, any cool stuff that they do gets put into the public under the GPL, so you could fold it back into your project and improve it. They can't take your code, improve it, distribute the result, and then never share the source.

  3. If you happen to have used any GPL code in your project (let's say you took a few lines out of the Linux kernel or Git version control or whatever) then you'll have to release your code as GPL as well.

In the end, the choice of license is mostly about how you want the software to be used (and the eventual community it might bring in). If you plan to commercialize the software (and implicitly allow others to do the same), then you might want to lean BSD. If you don't want people to take your hard work and profit off of it without showing you the results, then you want to go GPL. If you don't care either way, then you could probably just choose one. I think BSD is popular in academia precisely because of the commercialization aspect (for example, LLVM is gaining a lot of traction because of its permissive license).

Answer to 3: How do I make it easy for others to run the code?

You make it easy to run code by engineering it to be easy to run and by being extremely detailed with your documentation.

Packaging/distribution can actually be pretty hard and usually takes more effort than most people would think. A good way to make the software easy to run is to test it on multiple machines. Make sure that you're not forgetting any of the libraries that you're using in your software project, for example, and when possible, try to use software libraries that are common and well-maintained. Use mainstream languages with easy-to-manage package repositories.

When appropriate, use installers, installer scripts, Makefiles (a build system such as GNU Autotools, i.e. automake/autoconf, is better), etc. Even shell scripts are better than nothing. If you can provide binaries and/or an installer, that will make things even easier. The problem is that this is a LOT of work!

Accompany it with documentation. Ideally, the documentation will contain a description of how to set it up and run it, with descriptions of necessary packages/libraries, data that you might have to get, and what to type or click on. Usually, something called README or INSTALL will attract attention. Put the instructions on the web page as well; most of the hosting solutions also allow you to have web pages.
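A small check script shipped next to the README can complement the written list of prerequisites. The following is only a sketch, again assuming a Python project; `missing_modules` and the `REQUIRED` list are hypothetical names, and the list should be replaced with whatever your code actually needs.

```python
import importlib.util


def missing_modules(required):
    """Return the top-level module names in `required` that cannot be found here."""
    return [name for name in required if importlib.util.find_spec(name) is None]


# Hypothetical list of top-level modules; replace it with your project's needs.
REQUIRED = ["json", "csv"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

A user can run this before anything else and immediately see what still has to be installed, rather than hitting an import error halfway through a long computation.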

Hope this all helps. The hardest part of the process is by far Step #3, and most people don't get as far as using good techniques like installers, automake/autoconf, and so forth, because it's a LOT of work and development often moves faster than you can write documents. However, no one is grading you on your style, so it's often easier to get it out than it is to clean it up and prettify it first.


To some extent, the answer will depend on what you wish to accomplish with this release. There was a fantastic blog post recently on that precise topic.

If the code is in great shape, and you hope others will build on it, then the choice of license is going to reflect your philosophy: a BSD-style license if you just want the algorithm and code out there, or perhaps a copyleft (GPL-style) license if you want to make sure improvements return to the commons.

If the code isn't in such great shape, but for transparency's sake needs to be out there, consider something along the lines of the CRAPL, which acknowledges the messy nature of modern computational sciences. I think the preamble is worth quoting:

I. Preamble

Science thrives on openness.

In modern science, it is often infeasible to replicate claims without
access to the software underlying those claims.

Let's all be honest: when scientists write code, aesthetics and
software engineering principles take a back seat to having running,
working code before a deadline.

So, let's release the ugly.  And, let's be proud of that.

As far as the actual mechanics of putting the code up, use GitHub or Bitbucket. These services give you code hosting, a home for the project, the ability to manage contributions, and the ability to track bugs and issues.


Matthew G. and Irwin have given great answers, but I'd like to provide some additional resources and references for those interested.

First, take a look at answers to this similar question on scicomp.SE:

What material should I include with a journal article (or post online) in order to make my computational research reproducible?

Reproducibility was the subject of a 2012 workshop at ICERM; you'll find a lot of useful material on the wiki and in the final report (see especially appendices D, E, and F).

Archival/hosting

Update: You can get a DOI and permanent hosting for a snapshot of your code via Figshare or Zenodo.
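Preparing such a snapshot could look roughly like the sketch below. It only builds a zip archive and computes a SHA-256 checksum using the Python standard library; it does not talk to Figshare or Zenodo, and the upload itself is assumed to happen through their web interfaces. The function name `make_snapshot` and the file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def make_snapshot(source_dir, archive_base):
    """Zip `source_dir` and return (archive_path, sha256_hex_digest)."""
    archive_path = shutil.make_archive(str(archive_base), "zip", root_dir=str(source_dir))
    digest = hashlib.sha256(Path(archive_path).read_bytes()).hexdigest()
    return archive_path, digest


# Demonstration on a throwaway directory standing in for a real project.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    project.mkdir()
    (project / "README").write_text("How to run the code\n")
    archive, checksum = make_snapshot(project, Path(tmp) / "project-v1.0")
    print(Path(archive).name, checksum)
```

Quoting the checksum alongside the DOI lets readers confirm that the archive they downloaded is the one the paper refers to.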

Licensing

See this section of the wiki for an extensive list of resources.

Making it easy to run the code

There are some sites and tools out there aimed specifically at this. These also solve the hosting issue:

  • ActivePapers: An ActivePaper is a single file containing all the software and datasets related to a research project.
  • RunMyCode: This service is based on the innovative concept of a companion website associated with a scientific publication.

A major hurdle is often re-creating the correct environment (including libraries and such) necessary to run the code. To overcome this, you could

  • distribute a virtual machine or use Vagrant or CDE
  • ensure that your code runs on some cloud platform, like
    • Wakari
    • SageMathCloud
    • Amazon web services
    • Windows Azure
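Whichever of these routes you take, it helps to record the environment your results actually came from. Here is a minimal sketch, assuming a Python project on Python 3.8 or later; `describe_environment` is a name made up for this example. It collects the interpreter version, the platform, and the versions of all installed packages so they can be published alongside the code.

```python
import json
import platform
from importlib import metadata


def describe_environment():
    """Collect interpreter, platform, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip installations with broken metadata
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


print(json.dumps(describe_environment(), indent=2))
```

Saving this output to a file in the repository gives others a concrete target when they try to rebuild the environment, whether in a virtual machine or on a cloud platform.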

It can be useful to put your code in a worksheet format, where you can intersperse comments and even mathematical formulas (for instance, using the IPython notebook or a Sage worksheet). Here is an example.

Examples

Finally, here are some examples of my own efforts. They're far from perfect, but may still be helpful.