Strangely enough, I am a co-author on a paper to be submitted to PLoS Computational Biology. Aren’t you an astronomer, you ask? Why yes. But this journal has a nifty collection of articles called “10 Simple Rules,” and many are relevant far outside computational biology. Several astronomers have published there before.

The paper is about how scientists should store digital data, and I think it has some interesting ideas. At least as interesting (to me, anyway) is the story of how this paper came to be. The discussion started out as a GitHub issue on Software Carpentry’s site repo, prompted by Greg Wilson’s query to the Software Carpentry discuss email list. The issue drew quite a bit of discussion from 22 participants.

I don’t actually remember how Ted Hart decided that it would be a good idea to write a “10 simple rules” paper, but the project got its own repo a little over a week after the discussion started. Within a few days, a number of us signed on to work on the project and Ted had proposed a timeline. A bunch of rules were proposed, we discussed them via issues, and Ted attempted to summarize. Each rule got its own issue, and Ted proposed a writing workflow where contributors assigned themselves to particular rules (here’s mine). All of this was done within about two months of the discussion starting.

It took a while for everyone to write their stuff, and we did a number of rounds of comments and edits. (Warning: GitHub jargon ahead.) The actual writing was done in Markdown, in forks of the repository, with changes to the master manuscript submitted as pull requests. (I forget why we decided to use fork-and-pull rather than direct commits to the repository: easier to guard against mistakes, I think.) Jeff Hollister, Sarah Mount, and Naupaka Zimmerman wrote some clever code to turn the Markdown file into LaTeX and then a PDF automatically.

Things slowed down over the summer, but by mid-September the manuscript was mostly done and it was time for a final push. We pulled everything together, congratulated ourselves, figured out the author order, and put the paper on the PeerJ Computer Science preprint server, where it awaits comments before being submitted to the journal. A few comments are already up and have been added to the GitHub issues.

This was a really interesting way to write a paper with a bunch of people I have never met except via email and Twitter. At least a few of the authors have met in person, though. We were all tolerably proficient at using Git and GitHub, so the pull-request model of making changes worked pretty well. It’s neat to be able to trace the project history through the pull requests and issues, something that would be a lot trickier if one were trying to do it by following an email chain. I am looking forward to the next paper I write this way; it was fun to be involved in such a truly collaborative effort.

[v2: Edited to fix link to preprint.]