https://gdstechnology.blog.gov.uk/2016/01/19/opening-gov-uks-puppet-repository/

Opening GOV.UK's Puppet repository

Point 8 of the Digital by Default Service Standard that we publish on GOV.UK says that source code for government services should be open and reusable, and our 10th design principle is "Make things open: it makes things better". We hadn't been following those bits of advice for one of the most important pieces of GOV.UK's code - until this week.

Puppet is one of a number of tools for configuration management which we use to configure servers. It can do things like set up databases and web server software and just generally get everything up and running so that applications can be deployed.

When GOV.UK was first set up we were unable to publish our Puppet repository because our code and secrets were tied together. This goes against patterns like the 12-factor app which "requires strict separation of config from code":

"A litmus test for whether an app has all config correctly factored out of the code is whether the codebase could be made open source at any moment, without compromising any credentials."

This wasn't true for our Puppet repository, but we gradually moved our credentials into a separate repository (rotating them as we did so).

Pushing the commit history

Our commit history is very important to us - the team frequently makes use of the git pickaxe to find code, and we try to make sure the commit message always explains why the thing has changed.
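For anyone who hasn't used the pickaxe, here is a minimal sketch of it. The pickaxe is the -S option to git log, which lists commits whose diffs change the number of occurrences of a string. The helper name find_commits_touching is made up for illustration and isn't something from our repository.

```shell
#!/bin/sh
# Hypothetical helper: list commits that added or removed a given string.
# Usage: find_commits_touching STRING [PATH...]
find_commits_touching() {
    needle=$1
    shift
    # -S finds commits where the number of occurrences of the string changed.
    # The attached form -S"$needle" stops a leading "-" in the string from
    # being read as another option.
    git --no-pager log --oneline -S"$needle" -- "$@"
}
```

Run from inside a repository, something like find_commits_touching 'some_setting' modules/ would show the commits to read, and the commit messages should then explain why the change was made.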

We could have reset the commit history to an "Initial commit" before we pushed the code to GitHub and made it available, but that would mean losing all of the work that 134 contributors have put into 13,348 commits over the last 4 years.

In search of passwords

Given that we'd decided we were going to push the commit history, we had to make sure there were no credentials anywhere in the history of the repo which were still in use.

Our credentials are stored in a separate repository on GitHub Enterprise so it was easy to generate a text file containing just our current credentials, one per line. I ran each credential through git log -p -S to search the content of every diff, and through git log --grep to search the content of every commit message:

while IFS= read -r line; do echo "$line"; git --no-pager log -p -S "$line"; git --no-pager log -F --grep="$line"; done < puppet_search
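A slightly more defensive version of that search might look like the sketch below. It is not the script we ran. The function name, the --all flag (to cover every branch and tag) and the choice to print line numbers rather than the credentials themselves are all assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report which lines of a credentials file appear
# anywhere in the history of the current repository.
# Usage: search_history_for_credentials FILE
# Returns 1 if any credential was found, 0 otherwise.
search_history_for_credentials() {
    file=$1
    n=0
    found=0
    while IFS= read -r cred || [ -n "$cred" ]; do
        n=$((n + 1))
        [ -n "$cred" ] || continue   # skip blank lines
        # -S searches diffs for the literal string (the pickaxe).
        in_diff=$(git --no-pager log --all --format=%h -S"$cred")
        # -F makes --grep treat the credential as a fixed string, not a regex.
        in_msg=$(git --no-pager log --all --format=%h -F --grep="$cred")
        if [ -n "$in_diff$in_msg" ]; then
            found=1
            # Print the line number of the credential, never the secret itself.
            printf 'line %d: diffs=[%s] messages=[%s]\n' \
                "$n" "$(echo $in_diff)" "$(echo $in_msg)"
        fi
    done < "$file"
    return "$found"
}
```

Quoting the variable matters here: credentials often contain spaces or shell metacharacters, and an unquoted value would be split into several arguments and could silently search for the wrong thing.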

We know that there are no credentials which we're currently using in the history of the Puppet repository. The history does contain credentials that we've used in the past, but we changed all of those before publishing. Many of those old credentials also could not be used without further access (for example, being on our office network or having SSH access to our servers), so we think the risk of making the repository public is small.

Once we'd made sure there were no current credentials which would be compromised by publishing the repository history, we went through a few more manual steps to make sure there was nothing problematic in the latest version of the code.

Bob Walker created a GitHub issue with a checkbox for each of the directories in modules/ and we eyeballed all of our code. This was helpful in itself because the repository is huge and some of it hasn't been touched in quite a while. Looking through the code gave us a good refresher of what's around.

Matt Bostock used some fun commands to let him easily view all of the unique words in the codebase, which made it much easier to see lines which might have resembled credentials (you need to be using zsh for the glob `**` in the strings command to work):

strings modules/**/*.pp | tr ' ' '\n' | sort -n | uniq | view -
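If you aren't using zsh, a similar list can be produced with find. This is a rough sketch rather than the command Matt ran. The function name unique_words and the optional minimum-length argument are assumptions added for illustration, and it uses cat rather than strings on the assumption that the manifests are plain text.

```shell
#!/bin/sh
# Hypothetical helper: print every unique whitespace-separated word found in
# the Puppet manifests (*.pp) under a directory, one word per line.
# Usage: unique_words DIR [MIN_LENGTH]
unique_words() {
    dir=$1
    min=${2:-1}
    # find replaces the ** glob, so this does not depend on the shell in use.
    find "$dir" -type f -name '*.pp' -exec cat {} + |
        tr -s '[:space:]' '\n' |                  # one word per line
        awk -v min="$min" 'length($0) >= min' |   # drop empty and short words
        sort -u                                   # sort and de-duplicate
}
```

For example, unique_words modules 12 | view - would show only words of 12 characters or more, which cuts out most ordinary Puppet keywords and leaves the long strings that are more likely to resemble credentials.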

GitHub is the new home

We weren't able to audit the descriptions and comments of all 3,800 of the pull requests that we'd made, so for each one we decided to create and close an issue with the title of the pull request and a link through to our GitHub Enterprise install. This means that the merge commit messages that GitHub generates ("Merge pull request #1234 from gds/62831904-vagrant-puppet-bash") still link through to the right place.

From now on we're using the public GitHub repository to do all our work against. Over time we're moving small reusable pieces out of this codebase and publishing them elsewhere on GitHub and on our Puppet Forge account.

We can't promise to provide support for our main Puppet codebase (this is coding in the open rather than open source) but feel free to open issues and pull requests for us to look at.

You can follow Alex on Twitter, sign up now for email updates from this blog or subscribe to the feed.

If this sounds like a good place to work, take a look at Working for GDS - we're usually in search of talented people to come and join the team.
