A Git Horror Story: Repository Integrity With Signed Commits
(Note: This article was written at the end of 2012 and is out of date. I will update it at some point, but until then, please keep that in perspective.)
It's 2:00 AM. The house is quiet, the kid is in bed and your significant other
has long since fallen asleep on the couch waiting for you, the light of the TV
flashing out of the corner of your eye. Your mind and body are exhausted.
Satisfied with your progress for the night, you commit the code you've been
hacking for hours: "[master 2e4fd96] Fixed security vulnerability CVE-123".
You push your changes to your host so that others can view and comment on your
progress before tomorrow's critical release, suspend your PC and struggle to
wake your significant other to get him/her in bed. You turn off the lights, trip
over a toy on your way to the bedroom and sigh as you realize you're going to
have to make a bottle for the child who just heard his/her favorite toy jingle.
Fast forward four sleep-deprived hours. You are woken to the sound of your phone vibrating incessantly. You smack it a few times, thinking it's your alarm clock, then fumble half-blind as you try to dig it out from under the bed after you knock it off the nightstand. (Oops, you just woke the kid up again.) You pick up the phone and are greeted by a frantic colleague. "I merged in our changes. We need to tag and get this fix out there." Ah, damnit. You wake up your significant other, asking him/her to deal with the crying child (yeah, that went well) and stumble off to your PC, failing your first attempt to enter your password. You rub your eyes and pull the changes.
Still squinting, you glance at the flood of changes presented to you. Your
child is screaming in the background, not amused by your partner's feeble
attempts to console him/her. git log --pretty=short... everything looks
good---just a bunch of commits from you and your colleague that were merged in.
You run the test suite---everything passes. Looks like you're ready to go. git tag -s 1.2.3 -m 'Various bugfixes, including critical CVE-123' && git push --tags.
After struggling to enter the password to your private key, slowly
standing up from your chair as you type, you run off to help with the baby
(damnit, where do they keep the source code for these things). Your CI system
will handle the rest.
Fast forward two months.
CVE-123 has long been fixed and successfully deployed. However, you receive an
angry call from your colleague. It seems that one of your most prominent users
has had a massive security breach. After researching the problem, your colleague
found that, according to the history, the breach exploited a back door that you
created! What? You would never do such a thing. To make matters worse, 1.2.3
was signed off by you, using your GPG key---you affirmed that this tag was
good and ready to go. "3-b-c-4-2-b, asshole", scorns your colleague. "Thanks
a lot."
No---that doesn't make sense. You quickly check the history. git log --patch 3bc42b.
"Added missing docblocks for X, Y and Z." You form a puzzled
expression, raising your hands from the keyboard slightly before tapping the
space bar a few times with few expectations. Sure enough, in with a few minor
docblock changes, there was one very inconspicuous line change that added the
back door to the authentication system. The commit message is fairly clear and
does not raise any red flags---why would you check it? Furthermore, the
author of the commit was indeed you!
Thoughts race through your mind. How could this have happened? That commit has your name, but you do not recall ever having made those changes. Furthermore, you would have never made that line change; it simply does not make sense. Did your colleague frame you by committing as you? Was your colleague's system compromised? Was your host compromised? It couldn't have been your local repository; that commit was clearly part of the merge and did not exist in your local repository until your pull on that morning two months ago.
Regardless of what happened, one thing is horrifically clear: right now, you are the one being blamed.
Who Do You Trust?
Theorize all you want---it's possible that you may never fully understand what resulted in the compromise of your repository. The above story is purely hypothetical, but entirely within the realm of possibility. How can you rest assured that your repository is safe for not only those who would reference or clone it, but also those who may download, for example, tarballs that are created from it?
Git is a distributed revision control system. In short, this means that anyone can have a copy of your repository to work on offline, in private. They may commit to their own repository and users may push/pull from each other. A central repository is unnecessary for distributed revision control systems, but may be used to provide an "official" hub that others can work on and clone from. Consequently, this also means that a repository floating around for project X may contain malicious code; just because someone else hands you a repository for your project doesn't mean that you should actually use it.
The question is not "Who can you trust?"; the question is "Who do you trust?", or rather---who are you trusting with your repository, right now, even if you do not realize it? For most projects, including the story above, there are a number of individuals or organizations that you may have inadvertently placed your trust in without fully considering the ramifications of such a decision:
- Git Host
  Git hosting providers are probably the most easily overlooked
trustees---providers like Gitorious, GitHub, Bitbucket, SourceForge, Google
Code, etc. Each provides hosting for your repository and "secures" it by
allowing only you, or other authorized users, to push to it, often with the
use of SSH keys tied to an account. By using a host as the primary holder of
your repository---the repository from which most clone and push to---you are
entrusting them with the entirety of your project; you are stating, "Yes, I
trust that my source code is safe with you and will not be tampered with".
This is a dangerous assumption. Do you trust that your host properly secures
your account information? Furthermore, bugs exist in all but the most
trivial pieces of software, so what is to say that there is not a
vulnerability just waiting to be exploited in your host's system, completely
compromising your repository?
It was not too long ago (March 4th, 2012) that a public key security vulnerability at GitHub was exploited by a Russian man named Egor Homakov, allowing him to successfully commit to the master branch of the Ruby on Rails framework repository hosted on GitHub. Oops.
- Friends and Coworkers/Colleagues
  There may be certain groups or individuals that you trust enough to (a) pull
or accept patches from or (b) allow them to push to you or a
central/"official" repository. Operating under the assumption that each
individual is truly trustworthy (and let us hope that is the case), that
does not immediately imply that their repository can be trusted. What are
their security policies? Do they leave their PC unlocked and unattended? Do
they make a habit of downloading virus-laden pornography on an unsecured,
non-free operating system? Or perhaps, through no fault of their own, they
are running a piece of software that is vulnerable to a 0-day exploit. Given
that, how can you be sure that their commits are actually their own?
Furthermore, how can you be sure that any commits they approve (or sign off
on using git commit -s) were actually approved by them?
That is, of course, assuming that they have no ill intent. For example, what of the pissed off employee looking to get the arrogant, obnoxious coworker fired by committing under the coworker's name/email? What if you were the manager or project lead? Whose word would you take? How would you even know whom to suspect?
- Your Own Repository
  Linus Torvalds (original author of Git and the kernel Linux) keeps a
secured repository on his personal computer, inaccessible by any
external means, to ensure that he has a repository he can fully trust.
Most developers simply keep a local copy on whatever PC they happen to be
hacking on and pay no mind to security---their repository is likely hosted
elsewhere as well, after all; Git is distributed. This is, however, a very
serious matter.
You likely use your PC for more than just hacking. Most notably, you likely use your PC to browse the Internet and download software. Software is buggy. Buggy software has exploits and exploits tend to get, well, exploited. Not every developer has a strong understanding of the best security practices for their operating system (if you do, great!). And no---simply using GNU/Linux or any other *NIX variant does not make you immune from every potential threat.
To dive into each of these a bit more deeply, let us consider one of the world's largest free software projects---the kernel Linux---and how its original creator Linus Torvalds handles issues of trust. During a talk he presented at Google in 2007, he describes a network of trust he created between himself and a number of others (which he refers to as his "lieutenants"). Linus himself cannot possibly manage the mass amount of code that is sent to him, so he has others handle portions of the kernel. Those "lieutenants" handle most of the requests, then submit them to Linus, who handles merging into his own branch. In doing so, he has trusted that these lieutenants know what they are doing, are carefully looking over each patch and that the patches Linus receives from them are actually from them.
I am not aware of how patches are communicated from the lieutenants to Linus. Certainly, one way to state with a fairly high level of certainty that the patch is coming from one of his "lieutenants" is to e-mail the patches, signed with their respective GPG/PGP keys. At that point, the web of trust is enforced by the signature. Linus is then sure that his private repository (which he does his best to secure, as aforementioned) contains only data that he personally trusts. His repository is safe, so far as he knows, and he can use it confidently.
At this point, assuming Linus' web of trust is properly verified, how can he
confidently convey these trusted changes to others? He certainly knows his own
commits, but how should others know that this "Linus Torvalds" guy who has
been committing and signing off on commits is actually Linus Torvalds? As
demonstrated in the hypothetical scenario at the beginning of this article,
anyone could claim to be Linus. If an attacker were to gain access to any clone
of the repository and commit as Linus, nobody would know the difference.
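To make that concrete, here is a minimal sketch of how little it takes to commit under someone else's name. It works in a throwaway repository, and the name and e-mail address are placeholders chosen for illustration; nothing in Git checks them.

```shell
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Identity is whatever the person committing types; nothing verifies it.
# (The name and address below are placeholders for illustration.)
echo 'innocent change' > foo
git add foo
git -c user.name='Linus Torvalds' -c user.email='linus@example.org' \
    commit -q -m 'Looks legitimate'

# The log reports the claimed author as if it were fact.
forged_author=$(git log -1 --format='%an <%ae>')
echo "$forged_author"
```

Anyone with write access to any clone can do the same, which is exactly why the author field alone proves nothing.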
Fortunately, one can get around this by signing a tag with his/her private key
using GPG (git tag -s). A tag points to a particular commit and that commit
depends on the entire history leading up to that commit.
This means that signing the SHA1 hash of that commit, assuming no security
vulnerabilities within SHA1, will forever state that the entire history of the
given commit, as pointed to by the given tag, is trusted.
Well, that is helpful, but that doesn't help to verify any commits made after the tag (until the next tag comes around that includes that commit as an ancestor of the new tag). Nor does it necessarily guarantee the integrity of all past commits---it only states that, to the best of Linus' knowledge, this tree is trusted. Notice how the hypothetical you in our hypothetical story also signed the tag with his/her private key. Unfortunately, he/she fell prey to something that is all too common---human error. He/she trusted that his/her "trusted" colleague could actually be fully trusted. Wouldn't it be nice if we could remove some of that human error from the equation?
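The chain from tag to commit to history can be inspected directly. The sketch below uses an unsigned annotated tag (git tag -a) so that it runs without a GPG key; as far as I understand it, git tag -s stores the same fields and appends a signature over them. The repository, identity and tag name are all made up for the demonstration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Placeholder identity for the demonstration.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.org

echo one > foo && git add foo && git commit -q -m 'first'
echo two >> foo && git commit -q -am 'second'

# Annotated, unsigned tag; -s instead of -a would also sign it.
git tag -a v0.1 -m 'demo tag'

# The tag object names exactly one commit...
tagged=$(git cat-file -p v0.1 | sed -n 's/^object //p')
# ...and that commit names its parent, and so on back to the root.
parent=$(git cat-file -p "$tagged" | sed -n 's/^parent //p')

echo "tag v0.1 -> commit $tagged -> parent $parent"
```

A signature over the tag therefore covers one commit hash, and everything else follows from that hash.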
Ensuring Trust
What if we had a way to ensure that a commit by someone named "Mike Gerwitz" with my e-mail address is actually a commit from myself, much like we can assert that a tag signed with my private key was actually tagged by myself? Well, who are we trying to prove this to? If you are only proving your identity to a project author/maintainer, then you can identify yourself in any reasonable manner. For example, if you work within the same internal network, perhaps you can trust that pushes from the internal IP are secure. If sending via e-mail, you can sign the patch using your GPG key. Unfortunately, these only extend this level of trust to the author/maintainer, not other users! If I were to clone your repository and look at the history, how do I know that a commit from "Foo Bar" is truly a commit from Foo Bar, especially if the repository frequently accepts patches and merge requests from many users?
Previously, only tags could be signed using GPG. Fortunately, Git v1.7.9 introduced the ability to GPG-sign individual commits---a feature I have been long awaiting. Consider what may have happened to the story at the beginning of this article if you signed each of your commits like so:
$ git commit -S -m 'Fixed security vulnerability CVE-123'
#             ^ GPG-sign commit
Notice the -S flag above, instructing Git to sign the commit using your
GPG key (please note the difference between -s and -S). If you followed this
practice for each of your commits---with no exceptions---then you (or anyone
else, for that matter) could say with relative certainty that the commit was
indeed authored by yourself. In the case of our story, you could then defend
yourself, stating that if the backdoor commit truly were yours, it would have
been signed. (Of course, one could argue that you simply did not sign that
commit in order to use that excuse. We'll get into addressing such an issue in a
bit.)
In order to set up your signing key, you first need to get your key id using
gpg --list-secret-keys:
$ gpg --list-secret-keys | grep ^sec
sec 4096R/8EE30EAB 2011-06-16 [expires: 2014-04-18]
#         ^^^^^^^^
You are interested in the hexadecimal value immediately following the forward
slash in the above output (your output may vary drastically; do not worry if
your key does not contain 4096R as above). If you have multiple secret
keys, select the one you wish to use for signing your commits. This value will
be assigned to the Git configuration value user.signingkey:
# remove --global to use this key only on the current repository
$ git config --global user.signingkey 8EE30EAB
#                                     ^ replace with your key id
Given the above, let's give commit signing a shot. To do so, we will create a test repository and work through that for the remainder of this article.
$ mkdir tmp && cd tmp
$ git init .
$ echo foo > foo
$ git add foo
$ git commit -S -m 'Test commit of foo'
You need a passphrase to unlock the secret key for
user: "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
4096-bit RSA key, ID 8EE30EAB, created 2011-06-16
[master (root-commit) cf43808] Test commit of foo
1 file changed, 1 insertion(+)
create mode 100644 foo
The only thing that has been done differently between this commit and an
unsigned commit is the addition of the -S flag, indicating that we want
to GPG-sign the commit. If everything has been set up properly, you should be
prompted for the password to your secret key (unless you have gpg-agent
running), after which the commit will continue as you would expect, resulting in
something similar to the above output (your GPG details and SHA-1 hash will
differ).
By default (at least in Git v1.7.9), git log will not list or validate
signatures. In order to display the signature for our commit, we may use the
--show-signature option, as shown below:
$ git log --show-signature
commit cf43808e85399467885c444d2a37e609b7d9e99d
gpg: Signature made Fri 20 Apr 2012 11:59:01 PM EDT using RSA key ID 8EE30EAB
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
Author: Mike Gerwitz <mike@mikegerwitz.com>
Date: Fri Apr 20 23:59:01 2012 -0400
Test commit of foo
There is an important distinction to be made here---the commit author and the
signature attached to the commit may represent two different people. In other
words: the commit signature is similar in concept to the -s option, which adds
a Signed-off-by line to the commit---it verifies that you have signed off on
the commit, but does not necessarily imply that you authored it. To demonstrate
this, consider that we have received a patch from "John Doe" that we wish to
apply. The policy for our repository is that every commit must be signed by a
trusted individual; all other commits will be rejected by the project
maintainers. To demonstrate without going through the hassle of applying an
actual patch, we will simply do the following:
$ echo patch from John Doe >> foo
$ git commit -S --author="John Doe <john@doe.name>" -am 'Added feature X'
You need a passphrase to unlock the secret key for
user: "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
4096-bit RSA key, ID 8EE30EAB, created 2011-06-16
[master 16ddd46] Added feature X
Author: John Doe <john@doe.name>
1 file changed, 1 insertion(+)
$ git log --show-signature
commit 16ddd46b0c191b0e130d0d7d34c7fc7af03f2d3e
gpg: Signature made Sat 21 Apr 2012 12:14:38 AM EDT using RSA key ID 8EE30EAB
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
Author: John Doe <john@doe.name>
Date: Sat Apr 21 00:14:38 2012 -0400
Added feature X
# [...]
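Git already tracks this split between who wrote a change and who recorded it, independently of signatures. The following sketch repeats the example above without -S (so no key is needed) and prints both fields; the maintainer identity is a placeholder, and the assumption here is that the signing key normally belongs to whoever creates the commit object, that is, the committer.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Placeholder maintainer identity (the committer).
export GIT_COMMITTER_NAME='Maintainer' GIT_COMMITTER_EMAIL='maintainer@example.org'
export GIT_AUTHOR_NAME="$GIT_COMMITTER_NAME" GIT_AUTHOR_EMAIL="$GIT_COMMITTER_EMAIL"

echo 'patch from John Doe' > foo
git add foo
# Same as the example above, minus -S so that no key is required.
git commit -q --author='John Doe <john@doe.name>' -m 'Added feature X'

# %an is the author name, %cn is the committer name.
who=$(git log -1 --format='author=%an committer=%cn')
echo "$who"
```

When reading a signed history, compare the signer against the committer rather than the author.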
This then raises the question---what is to be done about those who decide to
sign their commit with their own GPG key? There are a couple options here.
First, consider the issue from a maintainer's perspective---do we necessarily
care about the identity of a 3rd party contributor, so long as the provided code
is acceptable? That depends. From a legal standpoint, we may, but not every user
has a GPG key. Given that, someone creating a key for the sole purpose of
signing a few commits without some means of identity verification, only to
discard the key later (or forget that it exists) does little to verify one's
identity. (Indeed, the whole concept behind PGP is to create a web of trust by
being able to verify that the person who signed using their key is actually who
they say they are, so such a scenario defeats the purpose.) Therefore, adopting
a strict signing policy for everyone who contributes a patch is likely to be
unsuccessful. Linux and Git satisfy this legal requirement with a
"Signed-off-by" line in the commit, signifying that the author agrees to the
Developer's Certificate of Origin;
this essentially states that the author has the legal rights to the code
contained within the commit. When accepting patches from 3rd parties who are
outside of your web of trust to begin with, this is the next best thing.
To adopt this policy for patches, require that authors do the following and request that they do not GPG-sign their commits:
$ git commit -asm 'Signed off'
#              ^ -s flag adds Signed-off-by line
$ git log
commit ca05f0c2e79c5cd712050df6a343a5b707e764a9
Author: Mike Gerwitz <mike@mikegerwitz.com>
Date: Sat Apr 21 15:46:05 2012 -0400
Signed off
Signed-off-by: Mike Gerwitz <mike@mikegerwitz.com>
# [...]
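If you adopt such a policy, checking it by eye gets old quickly. Below is a rough sketch of a helper that lists the commits in a range whose message lacks a Signed-off-by line. The function name missing_signoff is hypothetical (it is not a Git command), and the throwaway repository exists only to exercise it.

```shell
# List commits in BASE..TIP whose message lacks a Signed-off-by line.
# (missing_signoff is a hypothetical helper name, not a Git command.)
missing_signoff() {
    git rev-list "$1..$2" | while read -r c; do
        git log -1 --format=%B "$c" | grep -q '^Signed-off-by: ' \
            || git log -1 --format='%h %s' "$c"
    done
}

# Demonstration in a throwaway repository with a placeholder identity.
repo=$(mktemp -d) && cd "$repo" && git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.org
echo a > foo && git add foo && git commit -q -m 'base'
base=$(git rev-parse HEAD)
echo b >> foo && git commit -q -asm 'Signed off'
echo c >> foo && git commit -q -am 'Forgot to sign off'

# Only the commit without the sign-off line should be reported.
report=$(missing_signoff "$base" HEAD)
echo "$report"
```

Keep in mind that a Signed-off-by line is only text; like the author field, it asserts something without proving it.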
Then, when you receive the patch, you can apply it with the -S flag (capital, not
lowercase) to GPG-sign the commit; this will preserve the Signed-off-by line as
well. In the case of a pull request, you can sign the commit by amending it
(git commit -S --amend). Note, however, that the SHA-1 hash of the commit will
change when you do so.
What if you want to preserve the signature of whomever sent the pull request? You cannot amend the commit, as that would alter the commit and invalidate their signature, so dual-signing it is not an option (if Git were to even support that option). Instead, you may consider signing the merge commit, which will be discussed in the following section.
Managing Large Merges
Up to this point, our discussion consisted of applying patches or merging single commits. What shall we do, then, if we receive a pull request for a certain feature or bugfix with, say, 300 commits (which I assure you is not unusual)? In such a case, we have a few options:
1. Request that the user squash all the commits into a single commit, thereby avoiding the problem entirely by applying the previously discussed methods. I personally dislike this option for a few reasons:
   - We can no longer follow the history of that feature/bugfix in order to learn how it was developed or see alternative solutions that were attempted but later replaced.
   - It renders git bisect useless. If we find a bug in the software that was introduced by a single patch consisting of 300 squashed commits, we are left to dig through the code and debug ourselves, rather than having Git possibly figure out the problem for us.
2. Adopt a security policy that requires signing only the merge commit (forcing a merge commit to be created with --no-ff if needed).
   - This is certainly the quickest solution, allowing a reviewer to sign the merge after having reviewed the diff in its entirety.
   - However, it leaves individual commits open to exploitation. For example, one commit may introduce a payload that a future commit removes, thereby hiding it from the overall diff, but with terrible effect should the commit be checked out individually (e.g. by git bisect). Squashing all commits (option #1), signing each commit individually (option #3), or simply reviewing each commit individually before performing the merge (without signing each individual commit) would prevent this problem.
   - This also does not fully prevent the situation mentioned in the hypothetical story at the beginning of this article---others can still commit with you as the author, but the commit would not have been signed.
   - Preserves the SHA-1 hashes of each individual commit.
3. Sign each commit to be introduced by the merge.
   - The tedium of this chore can be greatly reduced by using gpg-agent (http://www.gnupg.org/documentation/manuals/gnupg/Invoking-GPG_002dAGENT.html).
   - Be sure to carefully review each commit rather than the entire diff to ensure that no malicious commits sneak into the history (see bullets for option #2). If you instead decide to script the signing of each commit without reviewing each individual diff, you may as well go with option #2.
   - Also useful if one needs to cherry-pick individual commits, since that would result in all commits having been signed.
   - One may argue that this option is unnecessarily redundant, considering that one can simply review the individual commits without signing them, then simply sign the merge commit to signify that all commits have been reviewed (option #2). The important point to note here is that this option offers proof that each commit was reviewed (unless it is automated).
   - This will create a new commit for each (the SHA-1 hash is not preserved).
Which of the three options you choose depends on what factors are important and feasible for your particular project. Specifically:
- If history is not important to you, then you can avoid a lot of trouble by simply requiring that the commits be squashed (option #1).
- If history is important to you, but you do not have the time to review individual commits:
(Option #1)
Option #1 in the list above can easily be applied to the discussion in the previous section.
(Option #2)
Option #2 is as simple as passing the -S argument to git merge. If the merge
is a fast-forward (that is, all commits can simply be applied atop of HEAD
without any need for merging), then you would need to use the --no-ff option
to force a merge commit.
# set up another branch to merge
$ git checkout -b bar
$ echo bar > bar
$ git add bar
$ git commit -m 'Added bar'
$ echo bar2 >> bar
$ git commit -am 'Modified bar'
$ git checkout master
# perform the actual merge (will be a fast-forward, so --no-ff is needed)
$ git merge -S --no-ff bar
#            ^ GPG-sign merge commit
You need a passphrase to unlock the secret key for
user: "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
4096-bit RSA key, ID 8EE30EAB, created 2011-06-16
Merge made by the 'recursive' strategy.
bar | 2 ++
1 file changed, 2 insertions(+)
create mode 100644 bar
Inspecting the log, we will see the following:
$ git log --show-signature
commit ebadba134bde7ae3d39b173bf8947a69be089cf6
gpg: Signature made Sun 22 Apr 2012 11:36:17 AM EDT using RSA key ID 8EE30EAB
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
Merge: 652f9ae 031f6ee
Author: Mike Gerwitz <mike@mikegerwitz.com>
Date: Sun Apr 22 11:36:15 2012 -0400
Merge branch 'bar'
commit 031f6ee20c1fe601d2e808bfb265787d56732974
Author: Mike Gerwitz <mike@mikegerwitz.com>
Date: Sat Apr 21 17:35:27 2012 -0400
Modified bar
commit ce77088d85dee3d687f1b87d21c7dce29ec2cff1
Author: Mike Gerwitz <mike@mikegerwitz.com>
Date: Sat Apr 21 17:35:20 2012 -0400
Added bar
# [...]
Notice how the merge commit contains the signature, but the two commits involved
in the merge (031f6ee and ce77088) do not. Herein lies the problem---what
if commit 031f6ee contained the backdoor mentioned in the story at the
beginning of the article? This commit is supposedly authored by you, but because
it lacks a signature, it could actually be authored by anyone. Furthermore, if
ce77088 contained malicious code that was removed in 031f6ee, then it would
not show up in the diff between the two branches. That, however, is an issue
that needs to be addressed by your security policy. Should you be reviewing
individual commits? If so, a review would catch any potential problems with the
commits and wouldn't require signing each commit individually. The merge itself
could be representative of "Yes, I have reviewed each commit individually and I
see no problems with these changes."
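The "added in one commit, removed in the next" trick is easy to reproduce. This is a small sketch in a throwaway repository (file names, commit messages and the BACKDOOR marker are all invented for illustration) showing that the payload is invisible in the branch-level diff yet present in the per-commit history.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q .
git symbolic-ref HEAD refs/heads/master   # fixed branch name for the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.org

echo 'check_password()' > auth
git add auth && git commit -q -m 'Initial auth'

git checkout -q -b bar
echo 'BACKDOOR' >> auth && git commit -q -am 'Added docblocks'
echo 'check_password()' > auth && git commit -q -am 'Tidied docblocks'
echo 'more docs' > doc && git add doc && git commit -q -m 'Added docs'

# What a reviewer of the merge as a whole sees.
branch_diff=$(git diff master bar)
# What is actually in the history, one commit at a time.
per_commit=$(git log -p master..bar)

hits_in_diff=$(echo "$branch_diff" | grep -c BACKDOOR || true)
hits_in_log=$(echo "$per_commit" | grep -c BACKDOOR || true)
echo "combined diff: $hits_in_diff, per-commit log: $hits_in_log"
```

The combined diff comes up clean, while anyone who checks out the middle commit (git bisect will happily do so) runs the payload.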
If the commitment to reviewing each individual commit is too large, consider Option #1.
(Option #3)
Option #3 in the above list makes the review of each commit explicit and obvious; with option #2, one could simply lazily glance through the commits or not glance through them at all. That said, one could do the same with option #3 by automating the signing of each commit, so it could be argued that this option is completely unnecessary. Use your best judgment.
The only way to make this option remotely feasible, especially for a large
number of commits, is to perform the audit in such a way that we do not have
to re-enter our secret key passphrases for each and every commit. For this,
we can use gpg-agent, which will safely store the passphrase in memory for the
next time that it is requested. Using gpg-agent, we will only be prompted for
the password a single time. Depending on how you start gpg-agent, be sure to
kill it after you are done!
The process of signing each commit can be done in a variety of ways. Ultimately,
since signing the commit will result in an entirely new commit, the method you
choose is of little importance. For example, if you so desired, you could
cherry-pick individual commits and then -S --amend them, but that would
not be recognized as a merge and would be terribly confusing when looking
through the history for a given branch (unless the merge would have been a
fast-forward). Therefore, we will settle on a method that will still produce a
merge commit (again, unless it is a fast-forward). One such way to do this is to
interactively rebase each commit, allowing you to easily view the diff, sign it,
and continue onto the next commit.
# create a new audit branch off of bar
$ git checkout -b bar-audit bar
$ git rebase -i master
#             | ^ the branch that we will be merging into
#             ^ interactive rebase (alternatively: long option --interactive)
First, we create a new branch off of bar---bar-audit---to perform the
rebase on (see bar branch created in demonstration of option #2). Then, in
order to step through each commit that would be merged into master, we perform
a rebase using master as the upstream branch. This will present every commit
that is in bar-audit (and consequently bar) that is not in master, opening
them in your preferred editor:
e ce77088 Added bar
e 031f6ee Modified bar
# Rebase 652f9ae..031f6ee onto 652f9ae
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#
To modify the commits, replace each pick with e (or edit), as shown above.
(In vim you can also do the following ex command: :%s/^pick/e/;
adjust regex flavor for other editors). Save and close. You will then be
presented with the first (oldest) commit:
Stopped at ce77088... Added bar
You can amend the commit now, with
git commit --amend
Once you are satisfied with your changes, run
git rebase --continue
# first, review the diff (alternatively, use tig/gitk)
$ git diff HEAD^
# if everything looks good, sign it
$ git commit -S --amend
#    GPG-sign ^ ^ amend commit, preserving author, etc
You need a passphrase to unlock the secret key for
user: "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
4096-bit RSA key, ID 8EE30EAB, created 2011-06-16
[detached HEAD 5cd2d91] Added bar
1 file changed, 1 insertion(+)
create mode 100644 bar
# continue with next commit
$ git rebase --continue
# repeat.
$ ...
Successfully rebased and updated refs/heads/bar-audit.
Looking through the log, we can see that the commits have been rewritten to include the signatures (consequently, the SHA-1 hashes do not match):
$ git log --show-signature HEAD~2..
commit afb1e7373ae5e7dae3caab2c64cbb18db3d96fba
gpg: Signature made Sun 22 Apr 2012 01:37:26 PM EDT using RSA key ID 8EE30EAB
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
Author: Mike Gerwitz <mike@mikegerwitz.com>
Date: Sat Apr 21 17:35:27 2012 -0400
Modified bar
commit f227c90b116cc1d6770988a6ca359a8c92a83ce2
gpg: Signature made Sun 22 Apr 2012 01:36:44 PM EDT using RSA key ID 8EE30EAB
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
Author: Mike Gerwitz <mike@mikegerwitz.com>
Date: Sat Apr 21 17:35:20 2012 -0400
Added bar
We can then continue to merge into master as we normally would. The next
consideration is whether or not to sign the merge commit as we would with
option #2. In the case of our example, the merge is a
fast-forward, so the merge commit is unnecessary (since the commits being merged
are already signed, we have no need to create a merge commit using --no-ff
purely for the purpose of signing it). However, consider that you may perform
the audit yourself and leave the actual merge process to someone else; perhaps
the project has a system in place where project maintainers must review the code
and sign off on it, and then other developers are responsible for merging and
managing conflicts. In that case, you may want a clear record of who merged the
changes in.
Enforcing Trust
Now that you have determined a security policy appropriate for your particular project/repository (well, hypothetically at least), some way is needed to enforce your signing policies. While manual enforcement is possible, it is subject to human error, peer scrutiny ("just let it through!") and is unnecessarily time-consuming. Fortunately, this is one of those things that you can script, sit back and enjoy.
Let us first focus on the simpler of automation tasks---checking to ensure that every commit is both signed and trusted (within our web of trust). Such an implementation would also satisfy option #3 in regards to merging. Well, perhaps not every commit will be considered. Chances are, you have an existing repository with a decent number of commits. If you were to go back and sign all those commits, you would completely alter the history of the entire repository, potentially creating headaches for other users. Instead, you may consider beginning your checks after a certain commit.
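As a first approximation of such a check, here is a minimal sketch that reports every commit after a chosen starting point that does not carry a good, trusted signature. It assumes your Git supports the %G? format placeholder (G for a good signature, B for bad, N for none, with other letters for untrusted or expired keys); the helper name unsigned_since is hypothetical, and the demo repository contains only unsigned commits so that it runs without a key.

```shell
# Print every commit in BASE..TIP that lacks a good signature.
# %G? is a one-letter signature status; anything other than G is reported.
# (unsigned_since is a hypothetical helper name; BASE is the commit after
# which the signing policy is assumed to apply.)
unsigned_since() {
    git log --format='%G? %h %s' "$1..$2" | grep -v '^G ' || true
}

# Demonstration in a throwaway repository with a placeholder identity.
repo=$(mktemp -d) && cd "$repo" && git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.org
echo a > foo && git add foo && git commit -q -m 'Legacy commit (before policy)'
base=$(git rev-parse HEAD)
echo b >> foo && git commit -q -am 'Unsigned commit after policy'

violations=$(unsigned_since "$base" HEAD)
echo "$violations"
```

Treating only G as acceptable is a deliberately strict choice: a signature that verifies but comes from a key outside your web of trust is reported along with the unsigned commits.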
Commit History In a Nutshell
The SHA-1 hash of each commit in Git is computed from the commit's header information together with a reference to a snapshot of its content (its tree). This header information includes the commit's parent, whose header contains its parent---so on and so forth. In addition, Git depends on the entire history of the repository leading up to a given commit to construct the requested revision. Consequently, the history cannot be altered without someone noticing (well, this is not entirely true; we'll discuss that in a moment). For example, consider the following branch:
Pre-attack:
---o---o---A---B---o---o---H
    a1b2c3d^
Above, H represents the current HEAD and the commit identified by A is the parent of commit B. For the sake of discussion, let's say that commit A is identified by the SHA-1 fragment a1b2c3d. Let us say that an attacker decides to replace commit A with another commit. In doing so, the SHA-1 hash of the commit must change to match the new content and header. This new commit is identified as X:
Post-attack:
---o---o---X---B---o---o---H
    d4e5f6a^   ^!expects parent a1b2c3d
We now have a problem; when Git encounters commit B (remember, Git must build H using the entire history leading up to it), it will check its SHA-1 hash and notice that it no longer matches the hash of its parent. The attacker is unable to change the expected hash in commit B, because the header is used to generate the SHA-1 hash for the commit, meaning B would then have a different SHA-1 hash (technically speaking, it would no longer be B---it would be an entirely different commit; we retain the identifier here only for demonstration purposes). That would then invalidate any children of B, so on and so forth. Therefore, in order to rewrite the history for a single commit, the entire history after that commit must also be rewritten (as is done by git rebase). Should that be done, the SHA-1 hash of H would also need to change. Otherwise, H's history would be invalid and Git would immediately throw an error upon attempting a checkout.
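The chaining argument can be illustrated with a toy model. The sketch below is not Git's actual object format; it assumes only that each commit ID is a SHA-1 over the parent's ID plus the commit's own content. The helper names commit_id and chain_head are hypothetical, and the sha1sum utility is assumed to be installed.

```shell
#!/bin/sh
# Toy model of a hash chain (NOT Git's real object format): each "commit" ID
# is the SHA-1 of its parent's ID followed by its own content.

# commit_id PARENT_ID CONTENT: print the 40-character hex ID
commit_id() {
    printf '%s\n%s\n' "$1" "$2" | sha1sum | cut -d' ' -f1
}

# chain_head FIRST_CONTENT: print the ID of the last commit in a chain of
# three commits whose first commit has the given content
chain_head() {
    a=$( commit_id ''   "$1" )
    b=$( commit_id "$a" 'content of B' )
    commit_id "$b" 'content of H'
}

original=$( chain_head 'content of A' )
tampered=$( chain_head 'content of X' )

echo "HEAD before tampering: $original"
echo "HEAD after tampering:  $tampered"
```

Replacing A with X changes the ID of every descendant, so the two HEAD values differ even though the content of B and H is untouched.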
This has a very important consequence---given any commit, we can rest assured that, if it exists in the repository, Git will always reconstruct that commit exactly as it was created (including all the history leading up to that commit when it was created), or it will not do so at all. Indeed, as Linus mentions in a presentation at Google, he need only remember the SHA-1 hash of a single commit to rest assured that, given any other repository, in the event of a loss of his own, that commit will represent exactly the same commit that it did in his own repository. What does that mean for us? Importantly, it means that we do not have to rewrite history to sign each commit, because the history of our next signed commit is guaranteed. The only downside is, of course, that the history itself could have already been exploited in a manner similar to our initial story, but an automated mass-signing of all past commits for a given author wouldn't catch such a thing anyway.
That said, it is important to understand that the integrity of your repository is guaranteed only if a hash collision cannot be created---that is, if an attacker were able to create the same SHA-1 hash with different data, then the child commit(s) would still be valid and the repository would have been successfully compromised. Vulnerabilities have been known in SHA-1 since 2005 that allow hashes to be computed faster than brute force, although they are not cheap to exploit. Given that, while your repository may be safe for now, there will come some point in the future where SHA-1 will be considered as crippled as MD5 is today. At that point in time, however, maybe Git will offer a secure migration solution to an algorithm like SHA-256 or better. Indeed, SHA-1 hashes were never intended to make Git cryptographically secure.
Given that, the average person is likely to be fine with leaving his/her history the way it is. We will operate under that assumption for our implementation, offering the ability to ignore all commits prior to a certain commit. If one wishes to validate all commits, the reference commit can simply be omitted.
Automating Signature Checks
The idea behind verifying that certain commits are trusted is fairly simple:
Given reference commit r (optionally empty), let C be the set of all commits such that C = r..HEAD (range spec) and let K be the set of all public keys in a given GPG keyring. We must assert that, for each commit c in C, there must exist a key k in keyring K such that k is trusted and can be used to verify the signature of c. This assertion is denoted by the function g (GPG) in the following expression: ∀c∈C g(c).
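Read literally, the assertion is a loop over C that applies g to each commit. The following is a minimal sketch of that reading, not the script developed below. It assumes a Git release newer than the one discussed in this article, one that provides git verify-commit, and how that command treats untrusted keys may depend on your Git version and configuration. The function names g and forall_commits are hypothetical.

```shell
#!/bin/sh
# g COMMIT: succeed if Git reports that the signature on COMMIT verifies.
# Assumption: `git verify-commit` is available (newer Git releases only).
g() {
    git verify-commit "$1" >/dev/null 2>&1
}

# forall_commits [REF]: print every commit in REF..HEAD (or in all of HEAD's
# history when REF is omitted) for which g fails; succeed only if none do.
forall_commits() {
    range="${1+$1..}"
    failed=0

    for c in $( git rev-list "${range:-HEAD}" ); do
        g "$c" || { echo "$c"; failed=1; }
    done

    return "$failed"
}
```

Running forall_commits a1b2c3d inside a repository would list each commit after a1b2c3d that fails the check and return a non-zero status if there are any.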
Fortunately, as we have already seen in previous sections with the
--show-signature
option to git log
, Git handles the signature verification
for us; this reduces our implementation to a simple shell script. However, the
output we've been dealing with is not the most convenient to parse. It would be
nice if we could get commit and signature information on a single line per
commit. This can be accomplished with --pretty
, but we have an additional
problem---at the time of writing (in Git v1.7.10), the GPG --pretty
options
are undocumented.
A quick look at format_commit_one()
in
pretty.c
yields a 'G'
placeholder that has three different formats:
- %GG---GPG output (what we see in git log --show-signature)
- %G?---Outputs "G" for a good signature and "B" for a bad signature; otherwise, an empty string (see mapping in signature_check struct)
- %GS---The name of the signer
We are interested in using the most concise and minimal representation ---
%G?
. Because this placeholder simply matches text on the GPG output, and the
string "gpg: Can't check signature: public key not found"
is not mapped in
signature_check
, unknown signatures will output an empty string, not "B".
This is not explicit behavior, so I'm unsure if this will change in future
releases. Fortunately, we are only interested in "G", so this detail will not
matter for our implementation.
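To make the three outcomes concrete, here is a small sketch that turns a %G? value into a description. It covers only the values described above ("G", "B" and the empty string); the function name describe_sig is hypothetical, and any other value is reported as-is rather than interpreted.

```shell
#!/bin/sh
# describe_sig STATUS: describe a %G? value as documented in this article
describe_sig() {
    case "$1" in
        G)  echo 'good signature' ;;
        B)  echo 'bad signature' ;;
        '') echo 'unsigned or unrecognized signature' ;;
        *)  echo "other status: $1" ;;
    esac
}
```

For example, describe_sig "$( git log -1 --pretty=format:%G? )" would describe the signature state of the most recent commit.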
With this in mind, we can come up with some useful one-line output per commit. The below is based on the output resulting from the demonstration of merge option #3 above:
$ git log --pretty="format:%H %aN %s %G?"
afb1e7373ae5e7dae3caab2c64cbb18db3d96fba Mike Gerwitz Modified bar G
f227c90b116cc1d6770988a6ca359a8c92a83ce2 Mike Gerwitz Added bar G
652f9aed906a646650c1e24914c94043ae99a407 John Doe Signed off G
16ddd46b0c191b0e130d0d7d34c7fc7af03f2d3e John Doe Added feature X G
cf43808e85399467885c444d2a37e609b7d9e99d Mike Gerwitz Test commit of foo G
Notice the "G" suffix for each of these lines, indicating that the signature is valid (which makes sense, since the signature is our own). Adding an additional commit, we can see what happens when a commit is unsigned:
$ echo foo >> foo
$ git commit -am 'Yet another foo'
$ git log --pretty="format:%H %aN %s %G?" HEAD^..
f72924356896ab95a542c495b796555d016cbddd Mike Gerwitz Yet another foo
Note that, as mentioned above, %G? expands to an empty string when the commit is unsigned. However, what about commits that are signed but untrusted (not within our web of trust)?
$ gpg --edit-key 8EE30EAB
[...]
gpg> trust
[...]
Please decide how far you trust this user to correctly verify other users' keys
(by looking at passports, checking fingerprints from different sources, etc.)
1 = I don't know or won't say
2 = I do NOT trust
3 = I trust marginally
4 = I trust fully
5 = I trust ultimately
m = back to the main menu
Your decision? 2
[...]
gpg> save
Key not changed so no update needed.
$ git log --pretty="format:%H %aN %s %G?" HEAD~2..
f72924356896ab95a542c495b796555d016cbddd Mike Gerwitz Yet another foo
afb1e7373ae5e7dae3caab2c64cbb18db3d96fba Mike Gerwitz Modified bar G
Uh oh. It seems that Git does not check whether or not a signature is trusted. Let's take a look at the full GPG output:
$ git log --show-signature HEAD~2..HEAD^
commit afb1e7373ae5e7dae3caab2c64cbb18db3d96fba
gpg: Signature made Sun 22 Apr 2012 01:37:26 PM EDT using RSA key ID 8EE30EAB
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2217 5B02 E626 BC98 D7C0 C2E5 F22B B815 8EE3 0EAB
Author: Mike Gerwitz <mike@mikegerwitz.com>
Date: Sat Apr 21 17:35:27 2012 -0400
Modified bar
As you can see, GPG provides a clear warning. Unfortunately,
parse_signature_lines()
in
pretty.c
,
which references a simple mapping in struct signature_check
, will
blissfully ignore the warning and match only "Good signature from"
,
yielding "G". A patch to provide a separate token for untrusted keys is
simple, but for the time being, we will explore two separate
implementations---one that will parse the simple one-line output that is
ignorant of trust and a mention of a less elegant implementation that parses
the GPG output. ^[Should the patch be accepted, this article will be
updated to use the new token.]
Signature Check Script, Disregarding Trust
As mentioned above, due to limitations of the current %G?
implementation, we
cannot determine from the single-line output whether or not the given signature
is actually trusted. This isn't necessarily a problem. Consider what will
likely be a common use case for this script---to be run by a continuous
integration (CI) system. In order to let the CI system know what signatures
should be trusted, you will likely provide it with a set of keys for known
committers, which eliminates the need for a web of trust (the act of placing the
public key on the server indicates that you trust the key). Therefore, if the
signature is recognized and is good, the commit can be trusted.
One additional consideration is the need to ignore all ancestors of a given commit, which is necessary on older repositories where older commits will not be signed (see Commit History In a Nutshell for information on why it is unnecessary, and probably a bad idea, to sign old commits). As such, our script will accept a ref and will only consider its children in the check.
This script assumes that each commit will be signed and will output the SHA-1 hash of each unsigned/bad commit, along with some other useful information, delimited by tabs.
#!/bin/sh
#
# Licensed under the CC0 1.0 Universal license (public domain).
#
# Validate signatures on each and every commit within the given range
##
# if a ref is provided, append range spec to include all children
chkafter="${1+$1..}"
# note: bash users may instead use $'\t'; printf is the more portable option,
# since the handling of '\t' by echo differs between shells
t=$( printf '\t' )
# Check every commit after chkafter (or all commits if chkafter was not
# provided) for a good signature, listing invalid commits. %G? will output
# "G" if the signature is good; it does not take trust into account.
git log --pretty="format:%H$t%aN$t%s$t%G?" "${chkafter:-HEAD}" \
| grep -v "${t}G$"
# grep will exit with a non-zero status if no matches are found, which we
# consider a success, so invert it
[ $? -gt 0 ]
That's it; Git does most of the work for us! If a ref is provided, it will be
converted into a range spec by
appending ".."
(e.g. a1b2c
becomes a1b2c..
), which will cause git log
to return all of its children (not including the ref itself). If no ref is
provided, we end up using HEAD
without a range spec, which will simply list
every commit (using an empty string will cause Git to throw an error, and we
must quote the string in case the user decides to do something like "master@{5 days ago}"
). Using the --pretty
option to git log
, we output the GPG
signature result with %G?
, in addition to some useful information we will want
to see about any commits that do not pass the test. We can then filter out all
commits that have been signed with a known key by removing all lines that end in
"G"---the output from %G?
indicating a good signature.
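The two parameter expansions do most of the work here, so it may help to see them in isolation. This sketch wraps them in a hypothetical range_spec function that prints the argument the script would hand to git log.

```shell
#!/bin/sh
# range_spec [REF]: print the revision argument that signchk passes to git log
range_spec() {
    # ${1+...} expands to "REF.." only when an argument was given
    chkafter="${1+$1..}"

    # ${chkafter:-HEAD} falls back to HEAD when chkafter is empty
    echo "${chkafter:-HEAD}"
}

range_spec                        # prints: HEAD
range_spec a1b2c                  # prints: a1b2c..
range_spec 'master@{5 days ago}'  # prints: master@{5 days ago}..
```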
Let's see it in action (assuming the script has been saved as signchk
):
$ chmod +x signchk
$ ./signchk
f72924356896ab95a542c495b796555d016cbddd Mike Gerwitz Yet another foo
$ echo $?
1
With no arguments, the script checks every commit in our repository, finding a single commit that has not been signed. At this point, we can either check the output itself or check the exit status of the script, which indicates a failure. If this script were run by a CI system, the best option would be to abort the build and immediately notify the maintainers of a potential security breach (or, more likely, someone simply forgot to sign their commit).
If we check commits after that failure, assuming that each of the children have been signed, we will see the following:
$ ./signchk f7292
$ echo $?
0
Be careful when running this script directly from the repository, especially with CI systems---you must either place a copy of the script outside of the repository or run the script from a trusted point in history. For example, if your CI system were to simply pull from the repository and then run the script, an attacker need only modify the script to circumvent this check entirely.
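One way to run the script from a trusted point in history is to extract it from a commit you have already audited rather than from the working tree. The sketch below assumes that the script is stored as signchk in the root of the repository and that the trusted commit is also the reference commit for the check; both are assumptions for illustration, as is the function name run_trusted_signchk.

```shell
#!/bin/sh
# run_trusted_signchk TRUSTED_REF: run the copy of signchk stored in
# TRUSTED_REF (not the working tree copy) against every later commit.
# Assumption: the script lives at the repository root under the name signchk.
run_trusted_signchk() {
    trusted=$1
    tmp=$( mktemp ) || return 2

    # "REV:path" names the file as it existed in that revision
    if git show "$trusted:signchk" > "$tmp"; then
        sh "$tmp" "$trusted"
        status=$?
    else
        status=2
    fi

    rm -f "$tmp"
    return "$status"
}
```

An attacker who modifies signchk in a later commit has no effect on this check, because the copy being executed comes from the trusted commit.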
Signature Check Script With Web Of Trust
The web of trust would come in handy for large groups of contributors; in such a case, your CI system could attempt to download the public key from a preconfigured keyserver when the key is encountered (updating the key if necessary to get trust signatures). Based on the web of trust established from the public keys directly trusted by the CI system, you could then automatically determine whether or not a commit can be trusted even if the key was not explicitly placed on the server.
To accomplish this task, we will split the script up into two distinct portions---retrieving/updating all keys within the given range, followed by the actual signature verification. Let's start with the key gathering portion, which is actually a trivial task:
$ git log --show-signature \
| grep 'key ID' \
| grep -o '[A-Z0-9]\+$' \
| sort \
| uniq \
| xargs gpg --keyserver key.server.org --recv-keys
The above string of commands simply uses grep
to pull the key ids out of git log
output (using --show-signature
to produce GPG output), and then requests
only the unique keys from the given keyserver. In the case of the repository
we've been using throughout this article, there is only a single signature---my
own. In a larger repository, all unique keys will be listed. Note that the
above example does not specify any range of commits; you are free to integrate
it into the signchk
script to use the same range, but it isn't strictly
necessary (it may provide a slight performance benefit, depending on the number
of commits that would have been ignored).
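If you do want the key gathering to honor the same range as signchk, a sketch might look like the following. It assumes that gpg reports the key at the end of the "Signature made" line as "key ID XXXXXXXX", as in the output shown throughout this article; other GnuPG versions may format this differently. The function names key_ids and fetch_keys are hypothetical.

```shell
#!/bin/sh
# key_ids: read `git log --show-signature` output on stdin and print each
# distinct key ID once. Assumes lines ending in "key ID XXXXXXXX".
key_ids() {
    sed -n 's/.*key ID \([A-Z0-9]\{1,\}\)$/\1/p' | sort -u
}

# fetch_keys KEYSERVER [REF]: request the keys used to sign commits in
# REF..HEAD (or in all of HEAD's history when REF is omitted).
# Note: some xargs implementations still run gpg once when no keys are found.
fetch_keys() {
    keyserver=$1
    range="${2+$2..}"

    git log --show-signature "${range:-HEAD}" \
        | key_ids \
        | xargs gpg --keyserver "$keyserver" --recv-keys
}
```

For example, fetch_keys key.server.org f7292 would request only the keys that signed commits made after f7292.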
Armed with our updated keys, we can now verify the commits based on our web of trust. Whether or not a specific key will be trusted is dependent on your personal settings. The idea here is that you can trust a set of users (e.g. Linus' "lieutenants") that in turn will trust other users which, depending on your configuration, may automatically be within your web of trust even if you do not personally trust them. This same concept can be applied to your CI server by placing its keyring in place of your own (or perhaps you will omit the CI server and run the script yourself).
Unfortunately, with Git's current %G?
implementation, we are unable to
check basic one-line output. Instead, we must parse the output
of --show-signature
(as shown above) for each
relevant commit. Combining our output with the original script that
disregards trust, we can arrive at the following, which is
the output that we must parse:
$ git log --pretty="format:%H$t%aN$t%s$t%G?" --show-signature
f72924356896ab95a542c495b796555d016cbddd Mike Gerwitz Yet another foo
gpg: Signature made Sun 22 Apr 2012 01:37:26 PM EDT using RSA key ID 8EE30EAB
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2217 5B02 E626 BC98 D7C0 C2E5 F22B B815 8EE3 0EAB
afb1e7373ae5e7dae3caab2c64cbb18db3d96fba Mike Gerwitz Modified bar G
[...]
In the above snippet, it should be noted that the first commit (f7292) is not signed, whereas the second (afb1e) is. Therefore, the GPG output precedes the commit line itself. Let's consider our objective:
1. List all unsigned commits, or commits with unknown or invalid signatures.
2. List all signed commits that are signed with known signatures, but are otherwise untrusted.
Our previous script performs #1 just fine, so we need only augment it to support #2. In essence---we wish to convert lines ending in "G" to something else if the GPG output preceding that line indicates that the signature is untrusted.
There are many ways to go about doing this, but we will settle for a fairly clear set of commands that can be used to augment the previous script. To prevent the lines ending with "G" from being filtered from the output (should they be untrusted), we will suffix untrusted lines with "U". Consider the output of the following:
$ git log --pretty="format:^%H$t%aN$t%s$t%G?" --show-signature \
> | grep '^\^\|gpg: .*not certified' \
> | awk '
> /^gpg:/ {
> getline;
> printf "%s U\n", $0;
> next;
> }
> { print; }
> ' \
> | sed 's/^\^//'
f72924356896ab95a542c495b796555d016cbddd Mike Gerwitz Yet another foo
afb1e7373ae5e7dae3caab2c64cbb18db3d96fba Mike Gerwitz Modified bar G U
f227c90b116cc1d6770988a6ca359a8c92a83ce2 Mike Gerwitz Added bar G U
652f9aed906a646650c1e24914c94043ae99a407 John Doe Signed off G U
16ddd46b0c191b0e130d0d7d34c7fc7af03f2d3e John Doe Added feature X G U
cf43808e85399467885c444d2a37e609b7d9e99d Mike Gerwitz Test commit of foo G U
Here, we find that if we filter out those lines ending in "G" as we did
before, we would be left with the untrusted commits in addition to the commits
that are bad ("B") or unsigned (blank), as indicated by %G?
. To accomplish
this, we first add the GPG output to the log with the --show-signature
option
and, to make filtering easier, prefix all commit lines with a caret (^) which
we will later strip. We then filter all lines but those beginning with a caret,
or lines that contain the string "not certified", which is part of the GPG
output. This results in lines of commits with a single "gpg:"
line before
them if they are untrusted. We can then pipe this to awk, which will remove all
"gpg:"
-prefixed lines and append "U"
to the next line (the commit line).
Finally, we strip off the leading caret that was added during the beginning of
this process to produce the final output.
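Putting the pieces together, the filtering stages can be packaged as two small functions that sit between git log and the final exit status check. This is only a sketch of how the augmentation might be wired up; the function names mark_untrusted and unverified are hypothetical, and it relies on the GPG output appearing before each commit line, as in the log output shown earlier.

```shell
#!/bin/sh
t=$( printf '\t' )

# mark_untrusted: stdin is the output of
#   git log --pretty="format:^%H$t%aN$t%s$t%G?" --show-signature
# Commit lines whose GPG output contained "not certified" gain a " U" suffix.
mark_untrusted() {
    grep -e '^\^' -e 'gpg: .*not certified' \
        | awk '
            /^gpg:/ { getline; printf "%s U\n", $0; next }
            { print }
        ' \
        | sed 's/^\^//'
}

# unverified: drop lines ending in a bare "G", leaving unsigned, bad and
# untrusted commits
unverified() {
    grep -v "${t}G\$"
}
```

The complete check would then be git log with the format above piped through mark_untrusted and unverified, followed by the same inverted exit status test used in signchk.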
Please keep in mind that there is a huge difference between the conventional use of trust with PGP/GPG ("I assert that I know this person is who they claim they are") vs trusting someone to commit to your repository. As such, it may be in your best interest to maintain an entirely separate web of trust for your CI server or whatever user is being used to perform the signature checks.
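One possible way to keep such a web of trust separate is to give the checking user its own GnuPG home directory, selected through the GNUPGHOME environment variable. The following is a sketch only; the function name with_ci_keyring and the directory paths in the usage note are hypothetical.

```shell
#!/bin/sh
# with_ci_keyring DIR COMMAND [ARGS...]: run COMMAND with GnuPG (and therefore
# Git's signature checks) using DIR as its home directory instead of ~/.gnupg.
with_ci_keyring() {
    dir=$1
    shift

    # GnuPG expects its home directory to be private to the user
    mkdir -p "$dir" && chmod 700 "$dir" || return 2

    # the subshell keeps GNUPGHOME from leaking into the caller's environment
    ( GNUPGHOME=$dir; export GNUPGHOME; "$@" )
}
```

For instance, with_ci_keyring /var/lib/ci/gnupg gpg --import committers/*.asc would populate the separate keyring, and with_ci_keyring /var/lib/ci/gnupg ./signchk would run the check against it.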
Automating Merge Signature Checks
The aforementioned scripts are excellent if you wish to check the validity of each individual commit, but not everyone will wish to put forth that amount of effort. Instead, maintainers may opt for a workflow that requires the signing of only the merge commit (option #2 above), rather than each commit that is introduced by the merge. Let us consider the approach we would have to take for such an implementation:
Given reference commit r (optionally empty), let C' be the set of all first-parent commits such that C' = r..HEAD (range spec) and let K be the set of all public keys in a given GPG keyring. We must assert that, for each commit c in C', there must exist a key k in keyring K such that k is trusted and can be used to verify the signature of c. This assertion is denoted by the function g (GPG) in the following expression: ∀c∈C' g(c).
The only difference between this script and the script that checks for a
signature on each individual commit is that this script will only check for
commits on a particular branch (e.g. master
). This is important---if we
commit directly onto master, we want to ensure that the commit is signed (since
there will be no merge). If we merge into master, a merge commit will be
created, which we may sign and ignore all commits introduced by the merge. If
the merge is a fast-forward, a merge commit can be forcefully created with the
--no-ff
option to avoid the need to amend each commit with a signature.
To demonstrate a script that can validate commits for this type of workflow, let's first create some changes that would result in a merge:
$ git checkout -b diverge
$ echo foo > diverged
$ git add diverged
$ git commit -m 'Added content to diverged'
[diverge cfe7389] Added content to diverged
1 file changed, 1 insertion(+)
create mode 100644 diverged
$ echo foo2 >> diverged
$ git commit -am 'Added additional content to diverged'
[diverge 996cf32] Added additional content to diverged
1 file changed, 1 insertion(+)
$ git checkout master
Switched to branch 'master'
$ echo foo >> foo
$ git commit -S -am 'Added data to master'
You need a passphrase to unlock the secret key for
user: "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
4096-bit RSA key, ID 8EE30EAB, created 2011-06-16
[master 3cbc6d2] Added data to master
1 file changed, 1 insertion(+)
$ git merge -S diverge
You need a passphrase to unlock the secret key for
user: "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
4096-bit RSA key, ID 8EE30EAB, created 2011-06-16
Merge made by the 'recursive' strategy.
diverged | 2 ++
1 file changed, 2 insertions(+)
create mode 100644 diverged
Above, we committed in both master and a new diverge branch in order to ensure that the merge would not be a fast-forward (alternatively, we could have used the --no-ff option of git merge). This results in the following (your hashes will vary):
$ git log --oneline --graph
* 9307dc5 Merge branch 'diverge'
|\
| * 996cf32 Added additional content to diverged
| * cfe7389 Added content to diverged
* | 3cbc6d2 Added data to master
|/
* f729243 Yet another foo
* afb1e73 Modified bar
* f227c90 Added bar
* 652f9ae Signed off
* 16ddd46 Added feature X
* cf43808 Test commit of foo
From the above graph, we can see that we are interested in signatures on only
two of the commits: 3cbc6d2
, which was created directly on master
, and
9307dc5
---the merge commit. The other two commits (996cf32
and cfe7389
)
need not be signed because the signing of the merge commit asserts their
validity (assuming that the author of the merge was vigilant). But how do we
ignore those commits?
$ git log --oneline --graph --first-parent
* 9307dc5 Merge branch 'diverge'
* 3cbc6d2 Added data to master
* f729243 Yet another foo
* afb1e73 Modified bar
* f227c90 Added bar
* 652f9ae Signed off
* 16ddd46 Added feature X
* cf43808 Test commit of foo
The above example simply added the --first-parent
option to git log
, which
will display only the first parent commit when encountering a merge commit.
Importantly, this means that we are left with only the commits on master
(or
whatever branch you decide to reference). These are the commits we wish to
validate.
Performing the validation is therefore only a slight modification to the original script:
#!/bin/sh
#
# Validate signatures on only direct commits and merge commits for a particular
# branch (current branch)
##
# if a ref is provided, append range spec to include all children
chkafter="${1+$1..}"
# note: bash users may instead use $'\t'; printf is the more portable option,
# since the handling of '\t' by echo differs between shells (and -e is
# unsupported with /bin/sh)
t=$( printf '\t' )
# Check every commit after chkafter (or all commits if chkafter was not
# provided) for a good signature, listing invalid commits. %G? will output
# "G" if the signature is good; it does not take trust into account.
git log --pretty="format:%H$t%aN$t%s$t%G?" "${chkafter:-HEAD}" --first-parent \
| grep -v "${t}G$"
# grep will exit with a non-zero status if no matches are found, which we
# consider a success, so invert it
[ $? -gt 0 ]
If you run the above script using the branch setup provided above, then you will
find that neither of the commits made in the diverge
branch are listed in the
output. Since the merge commit itself is signed, it is also omitted from the
output (leaving us with only the unsigned commit mentioned in the previous
sections). To demonstrate what will happen if the merge commit is not signed,
we can amend it as follows (omitting the -S
option):
$ git commit --amend
[master 9ee66e9] Merge branch 'diverge'
$ ./signchk
9ee66e900265d82f5389e403a894e8d06830e463 Mike Gerwitz Merge branch 'diverge'
f72924356896ab95a542c495b796555d016cbddd Mike Gerwitz Yet another foo
$ echo $?
1
The merge commit is then listed, requiring a valid signature. ^[If you wish to ensure that this signature is trusted as well, see the section on verifying commits within a web of trust.]
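Since the two scripts differ only in the --first-parent option, they could also be folded into a single function that takes the policy as its first argument. This is a sketch of one such arrangement, not a replacement for the scripts above; the mode names "all" and "merge" are hypothetical.

```shell
#!/bin/sh
t=$( printf '\t' )

# signchk MODE [REF]
#   MODE "all"   checks every commit in the range
#   MODE "merge" checks only first-parent commits (direct and merge commits)
signchk() {
    case "$1" in
        all)   first_parent= ;;
        merge) first_parent=--first-parent ;;
        *)     echo 'usage: signchk all|merge [ref]' >&2; return 2 ;;
    esac
    shift

    # if a ref is provided, append range spec to include all children
    chkafter="${1+$1..}"

    # $first_parent is intentionally unquoted so that it vanishes when empty
    git log $first_parent --pretty="format:%H$t%aN$t%s$t%G?" \
        "${chkafter:-HEAD}" \
        | grep -v "${t}G\$"

    # grep exits non-zero when nothing is left to report, which is a success
    [ $? -gt 0 ]
}
```

With this in place, signchk merge f7292 would apply the merge-only policy to commits after f7292, while signchk all would check every commit in the history.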
Summary
- Be careful of who you trust. Is your repository safe from harm/exploitation on your PC? What about the PCs of those whom you trust?
  - Your host is not necessarily secure. Be wary of using remotely hosted repositories as your primary hub.
- Using GPG to sign your commits can help to assert your identity, helping to protect your reputation from impostors.
- For large merges, you must develop a security practice that works best for your particular project. Specifically, you may choose to sign each individual commit introduced by the merge, sign only the merge commit, or squash all commits and sign the resulting commit.
- If you have an existing repository, there is little need to go rewriting history to mass-sign commits.
- Once you have determined the security policy best for your project, you may automate signature verification to ensure that no unauthorized commits sneak into your repository.