17 Apr

Automated Bitbucket repository backups without a dedicated user account

Recently I’ve been using Bitbucket as part of a new team I’ve been collaborating with. It’s a relatively small team containing 5 members. Bitbucket hosts over 100 of the teams repositories and so they are backed up nightly by a chron job on a NAS server.

Bitbucket only allows up to 5 users per team before you have to start paying for it’s services. I replaced one of the existing team members so when I joined, the old team member’s account was disassociated with the team on Bitbucket and my account was associated with the team. This allowed the team to stay within it’s 5 user limit.

As it turned out the backup script was using the old team members credentials to make the backups and so the backups began to fail. It could easily be fixed by changing the hard coded credentials to another team members account. This approach however would just push the problem down the line and we would be hit again when other team members rolled on and off the collaboration.

Some of you may be thinking why not just add a team SSH key and have the script use that. It’s correct that we can use a team SSH key to perform a git clone on our repositories however we must know of all our repositories ahead of time. This would mean every time we created a repository we would have to add it to our backup script. If we want to use the Bitbucket API to automatically find all the team repositories then a team SSH key is not enough.

Bitbucket also offers a team API key, which is basically just an account that can be used to act as the team. The team name (username) and API key (password) would have been enough to get the backups working and to keep it working. There are a few problems I see with this.

  • If the API key is ever exposed, every application which uses it will need to be updated.
  • It grants far too many permissions to things that don’t need it. (A backup script should only have read-only access).
  • There is no accountability. If all the clients are using the same credentials then how do you know which one performed an action?

To get around these limitations I decided to use the OAuth option offered by Bitbucket. I wrote a script which can be installed by running:

npm install -g bitbucket-backup-oauth

Once installed you can run the backup script by including the following in your scripts or from the command line:

bitbucket-backup-oauth --owner principal [--backupFolder ./backup/] [--consumerKey client_id] [--consumerSecret client_secret]

The only mandatory parameter is the owner parameter. If the script cannot find locally stored client credntials (consumer key and secret) then you will be prompted for them. The consumer key and secret and associated setup is detailed below.

Bitbucket Setup

You will need to setup an OAuth consumer on Bitbucket first. Go to manage team and then on the left hand side menu there will be OAuth option under Access Management.

836ee22f-01

Under the OAuth Consumers section click Add consumer. Fill in a name, select read access for repositories and set the callback URL to http://localhost/cb (it can be anything you want as it won’t be used with the OAuth flow we use other than the initial authorisation) and then finally click save.

836ee22f-02

Go back to the OAuth Consumers section and you will now have a consumer key (client id) and consumer secret (client password).

836ee22f-03

You will need to authorize the OAuth consumer to have access to your repositories. To do this you will need to use your browser to go to the following URL:

https://bitbucket.org/site/oauth2/authorize?client_id={client_id}&response_type=code

Replace {client_id} with your consumer key setup in the previous step.

If you are not already logged in you will be asked to login. You will be presented with a screen asking for you to authorize the consumer:

836ee22f-04

Click grant access. You will be redirected to the call back URL localhost/cb. You will get a 404 but this does not matter. Authorisation has been granted and the consumer key and secret can be used with the back up script.

Benefits

Using the OAuth method addresses my concerns with the API key method above. In audit trails we will know it is the backup consumer by the logs. We can revoke access at any time if we know the consumer key or secret has been compromised. The credentials are only given the permissions it needs to do it’s job (read access to repositories).

Leave a Reply

Your email address will not be published. Required fields are marked *