# A Git Horror Story: Repository Integrity With Signed Commits

_(Note: This article was written at the end of 2012 and is out of date. I
will update it at some point, but until then, please keep that in
perspective.)_
|
||
|
||
It's 2:00 AM. The house is quiet, the kid is in bed and your significant other
|
||
has long since fallen asleep on the couch waiting for you, the light of the TV
|
||
flashing out of the corner of your eye. Your mind and body are exhausted.
|
||
Satisfied with your progress for the night, you commit the code you've been
|
||
hacking for hours: `"[master 2e4fd96] Fixed security vulnerability CVE-123"`.
|
||
You push your changes to your host so that others can view and comment on your
|
||
progress before tomorrow's critical release, suspend your PC and struggle to
|
||
wake your significant other to get him/her in bed. You turn off the lights, trip
|
||
over a toy on your way to the bedroom and sigh as you realize you're going to
|
||
have to make a bottle for the child who just heard his/her favorite toy jingle.
|
||
|
||
Fast forward four sleep-deprived hours. You are woken to the sound of your phone
|
||
vibrating incessantly. You smack it a few times, thinking it's your alarm clock,
|
||
then fumble half-blind as you try to dig it out from under the bed after you
|
||
knock it off the nightstand. (Oops, you just woke the kid up again.) You pick up
|
||
the phone and are greeted by a frantic colleague. "I merged in our changes. We
|
||
need to tag and get this fix out there." Ah, damnit. You wake up your
|
||
significant other, asking him/her to deal with the crying child (yeah, that went
|
||
well) and stumble off to your PC, failing your first attempt to enter your
|
||
password. You rub your eyes and pull the changes.
|
||
|
||
Still squinting, you glance at the flood of changes presented to you. Your
|
||
child is screaming in the background, not amused by your partner's feeble
|
||
attempts to console him/her. `git log --pretty=short`...everything looks
|
||
good---just a bunch of commits from you and your colleague that were merged in.
|
||
You run the test suite---everything passes. Looks like you're ready to go. `git
|
||
tag -s 1.2.3 -m 'Various bugfixes, including critical CVE-123' && git push
|
||
--tags`. After struggling to enter the password to your private key, slowly
|
||
standing up from your chair as you type, you run off to help with the baby
|
||
(damnit, where do they keep the source code for these things). Your CI system
|
||
will handle the rest.
|
||
|
||
Fast forward two months.
|
||
|
||
CVE-123 has long been fixed and successfully deployed. However, you receive an
|
||
angry call from your colleague. It seems that one of your most prominent users
|
||
has had a massive security breach. After researching the problem, your colleague
|
||
found that, according to the history, _the breach exploited a back door that you
|
||
created!_ What? You would never do such a thing. To make matters worse, `1.2.3`
|
||
was signed off by you, using your GPG key---you affirmed that this tag was
|
||
good and ready to go. "3-b-c-4-2-b, asshole", scorns your colleague. "Thanks
|
||
a lot."
|
||
|
||
No---that doesn't make sense. You quickly check the history. `git log --patch
|
||
3bc42b`. "Added missing docblocks for X, Y and Z." You form a puzzled
|
||
expression, raising your hands from the keyboard slightly before tapping the
|
||
space bar a few times with few expectations. Sure enough, in with a few minor
|
||
docblock changes, there was one very inconspicuous line change that added the
|
||
back door to the authentication system. The commit message is fairly clear and
|
||
does not raise any red flags---why would you check it? Furthermore, the
|
||
author of the commit _was indeed you!_
|
||
|
||
Thoughts race through your mind. How could this have happened? That commit has
|
||
your name, but you do not recall ever having made those changes. Furthermore,
|
||
you would have never made that line change; it simply does not make sense. Did
|
||
your colleague frame you by committing as you? Was your colleague's system
|
||
compromised? Was your _host_ compromised? It couldn't have been your local
|
||
repository; that commit was clearly part of the merge and did not exist in your
|
||
local repository until your pull on that morning two months ago.
|
||
|
||
Regardless of what happened, one thing is horrifically clear: right now, you are
|
||
the one being blamed.
|
||
|
||
<!-- more -->
|
||
|
||
## Who Do You Trust? {#trust}
|
||
|
||
Theorize all you want---it's possible that you may never fully understand what
|
||
resulted in the compromise of your repository. The above story is purely
|
||
hypothetical, but entirely within the realm of possibility. How can you rest
|
||
assured that your repository is safe for not only those who would reference or
|
||
clone it, but also those who may download, for example, tarballs that are
|
||
created from it?
|
||
|
||
Git is a [distributed revision control
|
||
system](https://en.wikipedia.org/wiki/Distributed_revision_control). In
|
||
short, this means that anyone can have a copy of your repository to work on
|
||
offline, in private. They may commit to their own repository and users may
|
||
push/pull from each other. A central repository is unnecessary for
|
||
distributed revision control systems, but [may be used to provide an
|
||
"official" hub that others can work on and clone
|
||
from](http://lwn.net/Articles/246381/). Consequently, this also means that a
|
||
repository floating around for project X may contain malicious code; just
|
||
because someone else hands you a repository for your project doesn't mean
|
||
that you should actually use it.
|
||
|
||
The question is not "Who _can_ you trust?"; the question is "Who _do_ you
|
||
trust?", or rather---who _are_ you trusting with your repository, right now,
|
||
even if you do not realize it? For most projects, including the story above,
|
||
there are a number of individuals or organizations that you may have
|
||
inadvertently placed your trust in without fully considering the ramifications
|
||
of such a decision:
|
||
|
||
<a id="trust-host"></a>Git Host
|
||
: Git hosting providers are probably the most easily overlooked
|
||
trustees---providers like Gitorious, GitHub, Bitbucket, SourceForge, Google
|
||
Code, etc. Each provides hosting for your repository and "secures" it by
|
||
allowing only you, or other authorized users, to push to it, often with the
|
||
use of SSH keys tied to an account. By using a host as the primary holder of
|
||
your repository---the repository from which most clone and push to---you are
|
||
entrusting them with the entirety of your project; you are stating, "Yes, I
|
||
trust that my source code is safe with you and will not be tampered with".
|
||
This is a dangerous assumption. Do you trust that your host properly secures
|
||
your account information? Furthermore, bugs exist in all but the most
|
||
trivial pieces of software, so what is to say that there is not a
|
||
vulnerability just waiting to be exploited in your host's system, completely
|
||
compromising your repository?
|
||
|
||
It was not too long ago (March 4th, 2012) that [a public key security
|
||
vulnerability at
|
||
GitHub](https://github.com/blog/1068-public-key-security-vulnerability-and-mitigation)
|
||
was [exploited](https://gist.github.com/1978249) by a Russian man named
|
||
[Egor
|
||
Homakov](http://homakov.blogspot.com/2012/03/im-disappoint-github.html),
|
||
allowing him to successfully [commit to the master branch of the Ruby on
|
||
Rails
|
||
framework](https://github.com/rails/rails/commit/b83965785db1eec019edf1fc272b1aa393e6dc57)
|
||
repository hosted on GitHub. Oops.
|
||
|
||
Friends and Coworkers/Colleagues
|
||
: There may be certain groups or individuals that you trust enough to (a) pull
|
||
or accept patches from or (b) allow them to push to you or a
|
||
central/"official" repository. Operating under the assumption that each
|
||
individual is truly trustworthy (and let us hope that is the case), that
|
||
does not immediately imply that their _repository_ can be trusted. What are
|
||
their security policies? Do they leave their PC unlocked and unattended? Do
|
||
they make a habit of downloading virus-laden pornography on an unsecured,
|
||
non-free operating system? Or perhaps, through no fault of their own, they
|
||
are running a piece of software that is vulnerable to a 0-day exploit. Given
|
||
that, _how can you be sure that their commits are actually their own_?
|
||
Furthermore, how can you be sure that any commits they approve (or sign off
|
||
on using `git commit -s`) were actually approved by them?
|
||
|
||
That is, of course, assuming that they have no ill intent. For example, what
|
||
of the pissed off employee looking to get the arrogant, obnoxious co-worker
|
||
fired by committing under the coworker's name/email? What if you were the
|
||
manager or project lead? Whose word would you take? How would you even know
|
||
whom to suspect?
|
||
|
||
Your Own Repository
|
||
: Linus Torvalds (original author of Git and the kernel Linux) [keeps a
|
||
secured repository on his personal computer, inaccessible by any
|
||
external means](http://www.youtube.com/watch?v=4XpnKHJAok8) to ensure
|
||
that he has a repository he can fully trust. Most developers simply keep
|
||
a local copy on whatever PC they happen to be hacking on and pay no mind
|
||
to security---their repository is likely hosted elsewhere as well, after
|
||
all; Git is distributed. This is, however, a very serious matter.
|
||
|
||
You likely use your PC for more than just hacking. Most notably, you likely
|
||
use your PC to browse the Internet and download software. Software is buggy.
|
||
Buggy software has exploits and exploits tend to get, well, exploited. Not
|
||
every developer has a strong understanding of the best security practices
|
||
for their operating system (if you do, great!). And no---simply using
|
||
GNU/Linux or any other *NIX variant does not make you immune from every
|
||
potential threat.
|
||
|
||
To dive into each of these a bit more deeply, let us consider one of the
|
||
world's largest free software projects---the kernel Linux---and how its
|
||
original creator Linus Torvalds handles issues of trust. During [a talk he
|
||
presented at Google in 2007](http://www.youtube.com/watch?v=4XpnKHJAok8), he
|
||
describes a network of trust he created between himself and a number of
|
||
others (which he refers to as his "lieutenants"). Linus himself cannot
|
||
possibly manage the mass amount of code that is sent to him, so he has
|
||
others handle portions of the kernel. Those "lieutenants" handle most of the
|
||
requests, then submit them to Linus, who handles merging into his own
|
||
branch. In doing so, he has trusted that these lieutenants know what they
|
||
are doing, are carefully looking over each patch and that the patches Linus
|
||
receives from them are actually from them.
|
||
|
||
I am not aware of how patches are communicated from the lieutenants to Linus.
|
||
Certainly, one way to state with a fairly high level of certainty that the patch
|
||
is coming from one of his "lieutenants" is to e-mail the patches, signed with
|
||
their respective GPG/PGP keys. At that point, the web of trust is enforced by
|
||
the signature. Linus is then sure that his private repository (which he does his
|
||
best to secure, as aforementioned) contains only data that _he personally
|
||
trusts_. His repository is safe, so far as he knows, and he can use it
|
||
confidently.
|
||
|
||
At this point, assuming Linus' web of trust is properly verified, how can he
|
||
confidently convey these trusted changes to others? He certainly knows his own
|
||
commits, but how should others know that this "Linus Torvalds" guy who has
|
||
been committing and signing off on commits is _actually_ Linus Torvalds? As
|
||
demonstrated in the hypothetical scenario at the beginning of this article,
|
||
anyone could claim to be Linus. If an attacker were to gain access to any clone
|
||
of the repository and commit as Linus, nobody would know the difference.
|
||
Fortunately, one can get around this by signing a tag with his/her private key
|
||
using GPG (`git tag -s`). A tag points to a particular commit and that commit
|
||
[depends on the entire history leading up to that commit](#commit-history).
|
||
This means that signing the SHA1 hash of that commit, assuming no security
|
||
vulnerabilities within SHA1, will forever state that the entire history of the
|
||
given commit, as pointed to by the given tag, is trusted.
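
As a rough sketch of how someone on the receiving end might check such a tag before trusting a release, the helper below wraps `git tag -v`, which asks GPG to verify the tag's signature. The function name `verify_release_tag` is made up for illustration, and it assumes that the signer's public key is already in your keyring.

```sh
# Hypothetical helper: succeed only if the given tag carries a signature
# that GPG can verify against a key in your keyring.
verify_release_tag() {
    tag=$1
    if git tag -v "$tag" >/dev/null 2>&1; then
        echo "tag $tag: signature verified"
    else
        echo "tag $tag: NOT verified" >&2
        return 1
    fi
}
```

It could be used as `verify_release_tag 1.2.3 && git checkout 1.2.3`. Keep in mind that a verified signature only tells you that the holder of the key signed the tag; whether you trust that key is a separate question.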
|
||
|
||
Well, that is helpful, but that doesn't help to verify any commits made _after_
|
||
the tag (until the next tag comes around that includes that commit as an
|
||
ancestor of the new tag). Nor does it necessarily guarantee the integrity of all
|
||
past commits---it only states that, _to the best of Linus' knowledge_, this
|
||
tree is trusted. Notice how the hypothetical you in our hypothetical story also
|
||
signed the tag with his/her private key. Unfortunately, he/she fell prey to
|
||
something that is all too common---human error. He/she trusted that his/her
|
||
"trusted" colleague could actually be fully trusted. Wouldn't it be nice if we
|
||
could remove some of that human error from the equation?
|
||
|
||
|
||
## Ensuring Trust {#trust-ensure}
|
||
|
||
What if we had a way to ensure that a commit by someone named "Mike Gerwitz"
|
||
with my e-mail address is _actually_ a commit from myself, much like we
|
||
can assert that a tag signed with my private key was actually tagged by myself?
|
||
Well, who are we trying to prove this to? If you are only proving your identity
|
||
to a project author/maintainer, then you can identify yourself in any reasonable
|
||
manner. For example, if you work within the same internal network, perhaps you
|
||
can trust that pushes from the internal IP are secure. If sending via e-mail,
|
||
you can sign the patch using your GPG key. Unfortunately, _these only extend
|
||
this level of trust to the author/maintainer, not other users!_ If I were to
|
||
clone your repository and look at the history, how do I know that a commit from
|
||
"Foo Bar" is truly a commit from Foo Bar, especially if the repository
|
||
frequently accepts patches and merge requests from many users?
|
||
|
||
Previously, only tags could be signed using GPG. Fortunately, [Git v1.7.9
|
||
introduced the ability to GPG-sign individual
|
||
commits](http://git.kernel.org/?p=git/git.git;a=blob_plain;f=Documentation/RelNotes/1.7.9.txt;hb=HEAD)---a
|
||
feature I have been long awaiting. Consider what may have happened to the
|
||
story at the beginning of this article if you signed each of your commits
|
||
like so:
|
||
|
||
```sh
$ git commit -S -m 'Fixed security vulnerability CVE-123'
#             ^ GPG-sign commit
```
|
||
|
||
Notice the `-S` flag above, instructing Git to sign the commit using your
|
||
GPG key (please note the difference between `-s` and `-S`). If you followed this
|
||
practice for each of your commits---with no exceptions---then you (or anyone
|
||
else, for that matter) could say with relative certainty that the commit was
|
||
indeed authored by yourself. In the case of our story, you could then defend
|
||
yourself, stating that if the backdoor commit truly were yours, it would have
|
||
been signed. (Of course, one could argue that you simply did not sign that
|
||
commit in order to use that excuse. We'll get into addressing such an issue in a
|
||
bit.)
|
||
|
||
In order to set up your signing key, you first need to get your key id using
|
||
`gpg --list-secret-keys`:
|
||
|
||
```sh
$ gpg --list-secret-keys | grep ^sec
sec   4096R/8EE30EAB 2011-06-16 [expires: 2014-04-18]
#           ^^^^^^^^
```
|
||
|
||
You are interested in the hexadecimal value immediately following the forward
|
||
slash in the above output (your output may vary drastically; do not worry if
|
||
your key does not contain `4096R` as above). If you have multiple secret
|
||
keys, select the one you wish to use for signing your commits. This value will
|
||
be assigned to the Git configuration value `user.signingkey`:
|
||
|
||
```sh
# remove --global to use this key only on the current repository
$ git config --global user.signingkey 8EE30EAB
#                                     ^ replace with your key id
```
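
The lookup and configuration steps above could be combined into something like the following sketch. It assumes the `sec <algorithm>/<key id> <date>` layout shown earlier (newer GnuPG releases print the listing differently), and both function names are hypothetical.

```sh
# Print the key id from the first "sec" line of a key listing read on stdin.
# Assumes the "sec <algo>/<id> <date>" layout shown above.
first_secret_key_id() {
    awk '/^sec/ { split($2, part, "/"); print part[2]; exit }'
}

# Hypothetical wrapper: look up the key id and store it for the current
# repository only (no --global).
configure_signing_key() {
    key_id=$(gpg --list-secret-keys | first_secret_key_id)
    [ -n "$key_id" ] || { echo "no secret key found" >&2; return 1; }
    git config user.signingkey "$key_id"
}
```

If you have more than one secret key, this picks the first one listed, which may not be the one you want; in that case, set `user.signingkey` by hand as shown above. On Git 2.0 or later, `git config commit.gpgsign true` additionally makes Git sign every commit in the repository without needing `-S` each time.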
|
||
|
||
Given the above, let's give commit signing a shot. To do so, we will create a
|
||
test repository and work through that for the remainder of this article.
|
||
|
||
```sh
$ mkdir tmp && cd tmp
$ git init .
$ echo foo > foo
$ git add foo
$ git commit -S -m 'Test commit of foo'

You need a passphrase to unlock the secret key for
user: "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
4096-bit RSA key, ID 8EE30EAB, created 2011-06-16

[master (root-commit) cf43808] Test commit of foo
 1 file changed, 1 insertion(+)
 create mode 100644 foo
```
|
||
|
||
The only thing that has been done differently between this commit and an
|
||
unsigned commit is the addition of the `-S` flag, indicating that we want
|
||
to GPG-sign the commit. If everything has been set up properly, you should be
|
||
prompted for the password to your secret key (unless you have `gpg-agent`
|
||
running), after which the commit will continue as you would expect, resulting in
|
||
something similar to the above output (your GPG details and SHA-1 hash will
|
||
differ).
|
||
|
||
By default (at least in Git v1.7.9), `git log` will not list or validate
|
||
signatures. In order to display the signature for our commit, we may use the
|
||
`--show-signature` option, as shown below:
|
||
|
||
```sh
$ git log --show-signature
commit cf43808e85399467885c444d2a37e609b7d9e99d
gpg: Signature made Fri 20 Apr 2012 11:59:01 PM EDT using RSA key ID 8EE30EAB
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
Author: Mike Gerwitz <mike@mikegerwitz.com>
Date:   Fri Apr 20 23:59:01 2012 -0400

    Test commit of foo
```
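
If you would rather not read through the full `--show-signature` output, a format string can condense it. The following is only a sketch: it assumes a Git release that understands the `%G?` placeholder (which prints `G` for a good signature and `N` for no signature, among other codes), and the function names are made up for illustration.

```sh
# Filter "<hash> <status> <subject>" lines on stdin, printing every commit
# whose %G? status is anything other than G (good signature).
unverified_commits() {
    awk '$2 != "G" { print }'
}

# Hypothetical wrapper: audit a revision or range (defaults to HEAD).
audit_signatures() {
    git log --format='%h %G? %s' "${1:-HEAD}" | unverified_commits
}
```

Running `audit_signatures` in the test repository should print nothing as long as every commit carries a good signature; any line that does appear names a commit worth a closer look.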
|
||
|
||
There is an important distinction to be made here---the commit author and the
|
||
signature attached to the commit _may represent two different people_. In other
|
||
words: the commit signature is similar in concept to the `-s` option, which adds
|
||
a `Signed-off-by` line to the commit---it verifies that you have signed off on
|
||
the commit, but does not necessarily imply that you authored it. To demonstrate
|
||
this, consider that we have received a patch from "John Doe" that we wish to
|
||
apply. The policy for our repository is that every commit must be signed by a
|
||
trusted individual; all other commits will be rejected by the project
|
||
maintainers. To demonstrate without going through the hassle of applying an
|
||
actual patch, we will simply do the following:
|
||
|
||
```sh
$ echo patch from John Doe >> foo
$ git commit -S --author="John Doe <john@doe.name>" -am 'Added feature X'

You need a passphrase to unlock the secret key for
user: "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
4096-bit RSA key, ID 8EE30EAB, created 2011-06-16

[master 16ddd46] Added feature X
 Author: John Doe <john@doe.name>
 1 file changed, 1 insertion(+)
$ git log --show-signature
commit 16ddd46b0c191b0e130d0d7d34c7fc7af03f2d3e
gpg: Signature made Sat 21 Apr 2012 12:14:38 AM EDT using RSA key ID 8EE30EAB
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
Author: John Doe <john@doe.name>
Date:   Sat Apr 21 00:14:38 2012 -0400

    Added feature X
# [...]
```
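
To spot commits like the one above, where the signer vouches for somebody else's work, one could compare the author name against the signer name. This sketch assumes a Git release that supports the `%GS` placeholder (the signer's name) and uses `%x09` to emit a tab between fields; the function names are hypothetical, and the comparison is a simple prefix match on names, not a proof of identity.

```sh
# Read tab-separated "<hash>\t<author>\t<signer>" lines on stdin and print
# those that are signed by someone whose name does not begin with the
# author's name. Unsigned commits (empty signer field) are skipped.
signed_for_others() {
    awk -F '\t' '$3 != "" && index($3, $2) != 1 {
        print $1 ": author=" $2 ", signer=" $3
    }'
}

# Hypothetical wrapper around git log for a revision or range.
report_signed_for_others() {
    git log --format='%h%x09%an%x09%GS' "${1:-HEAD}" | signed_for_others
}
```

In the test repository, this would be expected to report the "Added feature X" commit and stay silent about the first one.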
|
||
|
||
This then raises the question---what is to be done about those who decide to
|
||
sign their commit with their own GPG key? There are a couple options here.
|
||
First, consider the issue from a maintainer's perspective---do we necessarily
|
||
care about the identity of a 3rd party contributor, so long as the provided code
|
||
is acceptable? That depends. From a legal standpoint, we may, but not every user
|
||
has a GPG key. Given that, someone creating a key for the sole purpose of
|
||
signing a few commits without some means of identity verification, only to
|
||
discard the key later (or forget that it exists) does little to verify one's
|
||
identity. (Indeed, the whole concept behind PGP is to create a web of trust by
|
||
being able to verify that the person who signed using their key is actually who
|
||
they say they are, so such a scenario defeats the purpose.) Therefore, adopting
|
||
a strict signing policy for everyone who contributes a patch is likely to be
|
||
unsuccessful. Linux and Git satisfy this legal requirement with a
|
||
`"Signed-off-by"` line in the commit, signifying that the author agrees to the
|
||
[Developer's Certificate of
|
||
Origin](http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/SubmittingPatches;h=0dbf2c9843dd3eed014d788892c8719036287308;hb=HEAD);
|
||
this essentially states that the author has the legal rights to the code
|
||
contained within the commit. When accepting patches from 3rd parties who are
|
||
outside of your web of trust to begin with, this is the next best thing.
|
||
|
||
To adopt this policy for patches, require that authors do the following and
|
||
request that they do not GPG-sign their commits:
|
||
|
||
```sh
$ git commit -asm 'Signed off'
#              ^ -s flag adds Signed-off-by line
$ git log
commit ca05f0c2e79c5cd712050df6a343a5b707e764a9
Author: Mike Gerwitz <mike@mikegerwitz.com>
Date:   Sat Apr 21 15:46:05 2012 -0400

    Signed off

    Signed-off-by: Mike Gerwitz <mike@mikegerwitz.com>
# [...]
```
|
||
|
||
Then, when you receive the patch, you can apply it with the `-S` (capital, not
|
||
lowercase) to GPG-sign the commit; this will preserve the Signed-off-by line as
|
||
well. In the case of a pull request, you can sign the commit by amending it
|
||
(`git commit -S --amend`). Note, however, that the SHA-1 hash of the commit will
|
||
change when you do so.
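
One way this might look for a patch received by e-mail is sketched below. It assumes a single-commit patch in the mbox format that `git format-patch` produces, and the helper name `apply_and_sign` is made up for illustration.

```sh
# Hypothetical helper: apply a single-commit mbox patch, then re-create the
# resulting commit with your GPG signature. The author and the Signed-off-by
# line are kept; the commit hash changes.
apply_and_sign() {
    patch_file=$1
    git am "$patch_file" || return 1
    git commit --amend -S --no-edit
}
```

Because `--amend` only touches the most recent commit, a patch series containing several commits would need a different approach, such as the rebase discussed later.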
|
||
|
||
What if you want to preserve the signature of whomever sent the pull request?
|
||
You cannot amend the commit, as that would alter the commit and invalidate their
|
||
signature, so dual-signing it is not an option (if Git were to even support that
|
||
option). Instead, you may consider signing the merge commit, which will be
|
||
discussed in the following section.
|
||
|
||
|
||
## Managing Large Merges
|
||
|
||
Up to this point, our discussion consisted of applying patches or merging single
|
||
commits. What shall we do, then, if we receive a pull request for a certain
|
||
feature or bugfix with, say, 300 commits (which I assure you is not unusual)? In
|
||
such a case, we have a few options:
|
||
|
||
1. <a id="merge-1"></a> **Request that the user squash all the commits into
    a single commit**, thereby avoiding the problem entirely by applying the
    previously discussed methods. I personally dislike this option for a few
    reasons:

    * We can no longer follow the history of that feature/bugfix in order to
      learn how it was developed or see alternative solutions that were
      attempted but later replaced.

    * It renders `git bisect` useless. If we find a bug in the software that
      was introduced by a single patch consisting of 300 squashed commits,
      we are left to dig through the code and debug ourselves, rather than
      having Git possibly figure out the problem for us.

2. <a id="merge-2"></a> **Adopt a security policy that requires signing only
    the merge commit** (forcing a merge commit to be created with `--no-ff`
    if needed).

    * This is certainly the quickest solution, allowing a reviewer to sign
      the merge after having reviewed the diff in its entirety.

    * However, it leaves individual commits open to exploitation. For
      example, one commit may introduce a payload that a future commit
      removes, thereby hiding it from the overall diff, but with terrible
      effect should the commit be checked out individually (e.g. by
      `git bisect`). Squashing all commits ([option #1](#merge-1)), signing
      each commit individually ([option #3](#merge-3)), or simply reviewing
      each commit individually before performing the merge (without signing
      each individual commit) would prevent this problem.

    * This also does not fully prevent the situation mentioned in the
      hypothetical story at the beginning of this article---others can still
      commit with you as the author, but the commit would not have been
      signed.

    * Preserves the SHA-1 hashes of each individual commit.

3. <a id="merge-3"></a> **Sign each commit to be introduced by the merge.**

    * The tedium of this chore can be greatly reduced by using
      [`gpg-agent`](http://www.gnupg.org/documentation/manuals/gnupg/Invoking-GPG_002dAGENT.html).

    * Be sure to carefully review _each commit_ rather than the entire diff to
      ensure that no malicious commits sneak into the history (see bullets
      for [option #2](#merge-2)). If you instead decide to script the signing
      of each commit without reviewing each individual diff, you may as well
      go with [option #2](#merge-2).

    * Also useful if one needs to cherry-pick individual commits, since that would
      result in all commits having been signed.

    * One may argue that this option is unnecessarily redundant, considering that
      one can simply review the individual commits without signing them, then
      simply sign the merge commit to signify that all commits have been
      reviewed ([option #2](#merge-2)). The important point to note here is
      that this option offers _proof_ that each commit was reviewed (unless
      it is automated).

    * This will create a new commit for each existing commit (the SHA-1 hash
      is not preserved).
|
||
|
||
Which of the three options you choose depends on what factors are important and
|
||
feasible for your particular project. Specifically:
|
||
|
||
* If history is not important to you, then you can avoid a lot of trouble by
  simply requiring that the commits be squashed ([option #1](#merge-1)).

* If history _is_ important to you, but you do not have the time to review
  individual commits:

    * Use [option #2](#merge-2) if you understand its risks.

* Otherwise, use [option #3](#merge-3), but _do not_ automate the signing
  process to avoid having to look at individual commits. If you wish to keep
  the history, do so responsibly.
|
||
|
||
Option #1 in the list above can easily be applied to the discussion in the
|
||
previous section.
|
||
|
||
|
||
### (Option #2)
|
||
|
||
[Option #2](#merge-2) is as simple as passing the `-S` argument to `git
|
||
merge`. If the merge is a fast-forward (that is, all commits can simply be
|
||
applied atop `HEAD` without any need for merging), then you would need to use
|
||
the `--no-ff` option to force a merge commit.
|
||
|
||
```sh
# set up another branch to merge
$ git checkout -b bar
$ echo bar > bar
$ git add bar
$ git commit -m 'Added bar'
$ echo bar2 >> bar
$ git commit -am 'Modified bar'
$ git checkout master

# perform the actual merge (will be a fast-forward, so --no-ff is needed)
$ git merge -S --no-ff bar
#            ^ GPG-sign merge commit

You need a passphrase to unlock the secret key for
user: "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
4096-bit RSA key, ID 8EE30EAB, created 2011-06-16

Merge made by the 'recursive' strategy.
 bar | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 bar
```
|
||
|
||
Inspecting the log, we will see the following:
|
||
|
||
```sh
$ git log --show-signature
commit ebadba134bde7ae3d39b173bf8947a69be089cf6
gpg: Signature made Sun 22 Apr 2012 11:36:17 AM EDT using RSA key ID 8EE30EAB
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
Merge: 652f9ae 031f6ee
Author: Mike Gerwitz <mike@mikegerwitz.com>
Date:   Sun Apr 22 11:36:15 2012 -0400

    Merge branch 'bar'

commit 031f6ee20c1fe601d2e808bfb265787d56732974
Author: Mike Gerwitz <mike@mikegerwitz.com>
Date:   Sat Apr 21 17:35:27 2012 -0400

    Modified bar

commit ce77088d85dee3d687f1b87d21c7dce29ec2cff1
Author: Mike Gerwitz <mike@mikegerwitz.com>
Date:   Sat Apr 21 17:35:20 2012 -0400

    Added bar
# [...]
```
|
||
|
||
Notice how the merge commit contains the signature, but the two commits involved
|
||
in the merge (`031f6ee` and `ce77088`) do not. Herein lies the problem---what
|
||
if commit `031f6ee` contained the backdoor mentioned in the story at the
|
||
beginning of the article? This commit is supposedly authored by you, but because
|
||
it lacks a signature, it could actually be authored by anyone. Furthermore, if
|
||
`ce77088` contained malicious code that was removed in `031f6ee`, then it would
|
||
not show up in the diff between the two branches. That, however, is an issue
|
||
that needs to be addressed by your security policy. Should you be reviewing
|
||
individual commits? If so, a review would catch any potential problems with the
|
||
commits and wouldn't require signing each commit individually. The merge itself
|
||
could be representative of "Yes, I have reviewed each commit individually and I
|
||
see no problems with these changes."
|
||
|
||
If the commitment to reviewing each individual commit is too large, consider
|
||
[Option #1](#merge-1).
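
If you do commit to reviewing each commit before signing the merge, something like the following sketch can walk you through them. It relies only on `git log` with a `base..branch` range; the helper name `review_incoming` and its argument order are made up for illustration.

```sh
# Hypothetical helper: show every commit that merging <branch> into <base>
# would introduce, oldest first, each with its own patch.
review_incoming() {
    base=$1
    branch=$2
    git log --reverse --patch "$base..$branch"
}
```

For the example branch above, this would be run as `review_incoming master bar` before the merge, followed by `git merge -S --no-ff bar` once you are satisfied. Unlike the diff between the two branches, this shows a payload that one commit adds and a later commit removes.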
|
||
|
||
### (Option #3)
|
||
|
||
[Option #3](#merge-3) in the above list makes the review of each commit
|
||
explicit and obvious; with [option #2](#merge-2), one could simply lazily
|
||
glance through the commits or not glance through them at all. That said, one
|
||
could do the same with [option #3](#merge-3) by automating the signing of each
|
||
commit, so it could be argued that this option is completely unnecessary. Use
|
||
your best judgment.
|
||
|
||
The only way to make this option remotely feasible, especially for a large
|
||
number of commits, is to perform the audit in such a way that we do not have
|
||
to re-enter our secret key passphrase for each and every commit. For this,
|
||
we can use
|
||
[`gpg-agent`](http://www.gnupg.org/documentation/manuals/gnupg/Invoking-GPG_002dAGENT.html),
|
||
which will safely store the passphrase in memory for the next time that it
|
||
is requested. Using `gpg-agent`, [we will only be prompted for the password
|
||
a single
|
||
time](http://stackoverflow.com/questions/9713781/how-to-use-gpg-agent-to-bulk-sign-git-tags/10263139). Depending
|
||
on how you start `gpg-agent`, _be sure to kill it after you are done!_
|
||
|
||
The process of signing each commit can be done in a variety of ways. Ultimately,
|
||
since signing the commit will result in an entirely new commit, the method you
|
||
choose is of little importance. For example, if you so desired, you could
|
||
cherry-pick individual commits and then `-S --amend` them, but that would
|
||
not be recognized as a merge and would be terribly confusing when looking
|
||
through the history for a given branch (unless the merge would have been a
|
||
fast-forward). Therefore, we will settle on a method that will still produce a
|
||
merge commit (again, unless it is a fast-forward). One such way to do this is to
|
||
interactively rebase each commit, allowing you to easily view the diff, sign it,
|
||
and continue onto the next commit.
|
||
|
||
```sh
|
||
# create a new audit branch off of bar
|
||
$ git checkout -b bar-audit bar
|
||
$ git rebase -i master
|
||
# | ^ the branch that we will be merging into
|
||
# ^ interactive rebase (alternatively: long option --interactive)
|
||
```
|
||
|
||
First, we create a new branch off of `bar`---`bar-audit`---to perform the
|
||
rebase on (see `bar` branch created in demonstration of [option
|
||
#2](#merge-2)). Then, in order to step through each commit that would be
|
||
merged into `master`, we perform a rebase using `master` as the upstream
|
||
branch. This will present every commit that is in `bar-audit` (and
|
||
consequently `bar`) that is not in `master`, opening them in your preferred
|
||
editor:
|
||
|
||
```
|
||
e ce77088 Added bar
|
||
e 031f6ee Modified bar
|
||
|
||
# Rebase 652f9ae..031f6ee onto 652f9ae
|
||
#
|
||
# Commands:
|
||
# p, pick = use commit
|
||
# r, reword = use commit, but edit the commit message
|
||
# e, edit = use commit, but stop for amending
|
||
# s, squash = use commit, but meld into previous commit
|
||
# f, fixup = like "squash", but discard this commit's log message
|
||
# x, exec = run command (the rest of the line) using shell
|
||
#
|
||
# If you remove a line here THAT COMMIT WILL BE LOST.
|
||
# However, if you remove everything, the rebase will be aborted.
|
||
#
|
||
```
|
||
|
||
To modify the commits, replace each `pick` with `e` (or `edit`), as shown above.
|
||
(In vim you can also do the following `ex` command: `:%s/^pick/e/`;
|
||
adjust regex flavor for other editors). Save and close. You will then be
|
||
presented with the first (oldest) commit:
|
||
|
||
```sh
|
||
Stopped at ce77088... Added bar
|
||
You can amend the commit now, with
|
||
|
||
git commit --amend
|
||
|
||
Once you are satisfied with your changes, run
|
||
|
||
git rebase --continue
|
||
|
||
# first, review the diff (alternatively, use tig/gitk)
|
||
$ git diff HEAD^
|
||
# if everything looks good, sign it
|
||
$ git commit -S --amend
|
||
# GPG-sign ^ ^ amend commit, preserving author, etc
|
||
|
||
You need a passphrase to unlock the secret key for
|
||
user: "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
|
||
4096-bit RSA key, ID 8EE30EAB, created 2011-06-16
|
||
|
||
[detached HEAD 5cd2d91] Added bar
|
||
1 file changed, 1 insertion(+)
|
||
create mode 100644 bar
|
||
|
||
# continue with next commit
|
||
$ git rebase --continue
|
||
|
||
# repeat.
|
||
$ ...
|
||
Successfully rebased and updated refs/heads/bar-audit.
|
||
```
|
||
|
||
Looking through the log, we can see that the commits have been rewritten to
|
||
include the signatures (consequently, the SHA-1 hashes do not match):
|
||
|
||
```sh
|
||
$ git log --show-signature HEAD~2..
|
||
commit afb1e7373ae5e7dae3caab2c64cbb18db3d96fba
|
||
gpg: Signature made Sun 22 Apr 2012 01:37:26 PM EDT using RSA key ID 8EE30EAB
|
||
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
|
||
Author: Mike Gerwitz <mike@mikegerwitz.com>
|
||
Date: Sat Apr 21 17:35:27 2012 -0400
|
||
|
||
Modified bar
|
||
|
||
commit f227c90b116cc1d6770988a6ca359a8c92a83ce2
|
||
gpg: Signature made Sun 22 Apr 2012 01:36:44 PM EDT using RSA key ID 8EE30EAB
|
||
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
|
||
Author: Mike Gerwitz <mike@mikegerwitz.com>
|
||
Date: Sat Apr 21 17:35:20 2012 -0400
|
||
|
||
Added bar
|
||
```
|
||
|
||
We can then continue to merge into `master` as we normally would. The next
|
||
consideration is whether or not to sign the merge commit as we would with
|
||
[option #2](#merge-2). In the case of our example, the merge is a
|
||
fast-forward, so the merge commit is unnecessary (since the commits being merged
|
||
are already signed, we have no need to create a merge commit using `--no-ff`
|
||
purely for the purpose of signing it). However, consider that you may perform
|
||
the audit yourself and leave the actual merge process to someone else; perhaps
|
||
the project has a system in place where project maintainers must review the code
|
||
and sign off on it, and then other developers are responsible for merging and
|
||
managing conflicts. In that case, you may want a clear record of who merged the
|
||
changes in.
|
||
|
||
|
||
## Enforcing Trust
|
||
|
||
Now that you have determined a security policy appropriate for your particular
|
||
project/repository (well, hypothetically at least), some way is needed to
|
||
enforce your signing policies. While manual enforcement is possible, it is
|
||
subject to human error, peer scrutiny ("just let it through!") and is
|
||
unnecessarily time-consuming. Fortunately, this is one of those things that you
|
||
can script, sit back and enjoy.
|
||
|
||
Let us first focus on the simpler of automation tasks---checking to ensure
|
||
that _every_ commit is both signed and trusted (within our web of trust). Such
|
||
an implementation would also satisfy [option #3](#merge-3) in regards to
|
||
merging. Well, perhaps not every commit will be considered. Chances are, you
|
||
have an existing repository with a decent number of commits. If you were to go
|
||
back and sign all those commits, you would completely alter the history of the
|
||
entire repository, potentially creating headaches for other users. Instead, you
|
||
may consider beginning your checks _after_ a certain commit.
|
||
|
||
### Commit History In a Nutshell {#commit-history}
|
||
|
||
The SHA-1 hash of each commit in Git is computed from the commit's content
_and_ header information: the commit object names a tree (a snapshot of the
project) along with the author, committer and message. This header information
also includes the commit's _parent_, whose header contains its parent---so on
and so forth. A commit's hash therefore depends on the entire history of the
repository leading up to it. Consequently, this means that the history cannot
be altered without someone noticing (well, this is not entirely true; we'll
discuss that in a moment). For example, consider the following branch:
|
||
```
|
||
Pre-attack:
|
||
|
||
---o---o---A---B---o---o---H
|
||
a1b2c3d^
|
||
```
|
||
|
||
Above, `H` represents the current `HEAD` and the commit identified by `A` is
the parent of commit `B`. For the sake of discussion, let's say that commit `A`
is identified by the SHA-1 fragment `a1b2c3d`. Let us say that an attacker
decides to replace commit `A` with another commit. In doing so, the SHA-1 hash
of the commit must change to match the new content and header. This new commit
is identified as `X`:
|
||
```
|
||
Post-attack:
|
||
|
||
---o---o---X---B---o---o---H
|
||
d4e5f6a^ ^!expects parent a1b2c3d
|
||
```
|
||
|
||
We now have a problem; when Git encounters commit `B` (remember, the hash of
`H` depends on the entire history leading up to it), it will find that the
parent hash recorded in `B`'s header (`a1b2c3d`) does not match the hash of the
commit that now precedes it (`d4e5f6a`). The attacker is unable to change the
expected hash in commit `B`, because the header is used to generate the SHA-1
hash for the commit, meaning `B` would then have a different SHA-1 hash
(technically speaking, it would no longer be `B`---it would be an entirely
different commit; we retain the identifier here only for demonstration
purposes). That would then invalidate any children of `B`, so on and so forth.
Therefore, in order to rewrite the history for a single commit, _the entire
history after that commit must also be rewritten_ (as is done by `git rebase`).
Should that be done, the SHA-1 hash of `H` would also need to change. Otherwise,
`H`'s history would be invalid and Git would report an error upon encountering
the broken link (for example, while walking the history or running `git fsck`).
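
To make the chaining concrete, here is a minimal sketch that mimics it outside
of Git. It is a toy model and _not_ Git's actual object format: `sha1` and
`toy_commit` are hypothetical helpers that hash a made-up header naming the
parent together with some content. The point is only that a child's hash cannot
survive a change to its parent.

```sh
#!/bin/sh
# Toy hash chain; NOT Git's real object format (illustrative assumption).

# sha1: print the SHA-1 of stdin (sha1sum where available, otherwise shasum)
sha1() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum
    else
        shasum -a 1
    fi | cut -d' ' -f1
}

# toy_commit PARENT CONTENT: hash a made-up header plus the content
toy_commit() {
    printf 'parent %s\n\n%s\n' "$1" "$2" | sha1
}

a=$( toy_commit none 'original change' )     # commit A
b=$( toy_commit "$a" 'later change' )        # commit B, child of A

x=$( toy_commit none 'malicious change' )    # attacker's replacement for A
b2=$( toy_commit "$x" 'later change' )       # B rebuilt on top of X

[ "$a" != "$x" ]  && echo 'replacing A yields a different hash (X)'
[ "$b" != "$b2" ] && echo 'B cannot keep its hash with X as its parent'
```

Because the parent's hash is part of what gets hashed, the same reasoning
repeats for every descendant up to `H`.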
|
||
This has a very important consequence---given any commit, we can rest
|
||
assured that, if it exists in the repository, Git will _always_ reconstruct
|
||
that commit exactly as it was created (including all the history leading up
|
||
to that commit _when_ it was created), or it will not do so at all. Indeed,
|
||
as Linus mentions in a presentation at Google, [he need only remember the
|
||
SHA-1 hash of a single commit](http://www.youtube.com/watch?v=4XpnKHJAok8)
|
||
to rest assured that, given any other repository, in the event of a loss of
|
||
his own, that commit will represent exactly the same commit that it did in
|
||
his own repository. What does that mean for us? Importantly, it means that
|
||
*we do not have to rewrite history to sign each commit*, because the history
|
||
of our _next_ signed commit is guaranteed. The only downside is, of course,
|
||
that the history itself could have already been exploited in a manner
|
||
similar to our initial story, but an automated mass-signing of all past
|
||
commits for a given author wouldn't catch such a thing anyway.
|
||
|
||
That said, it is important to understand that the integrity of your
repository is guaranteed only if a [hash
collision](https://en.wikipedia.org/wiki/Hash_collision) cannot be
created---that is, if an attacker were able to create the same SHA-1 hash
with _different_ data, then the child commit(s) would still be valid and the
repository would have been successfully compromised. [Vulnerabilities have
been known in
SHA-1](http://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html)
since 2005 that allow collisions to be found [faster than brute
force](http://www.schneier.com/blog/archives/2005/02/sha1_broken.html),
although they are not cheap to exploit. Given that, while your repository
may be safe for now, there will come some point in the future where SHA-1
will be considered as crippled as MD5 is today. At that point in time,
however, maybe Git will offer a secure migration solution to [an algorithm
like SHA-256](http://kerneltrap.org/mailarchive/git/2006/8/27/211001) or
better. Indeed, [SHA-1 hashes were never intended to make Git
cryptographically
secure](http://kerneltrap.org/mailarchive/git/2006/8/27/211020).
|
||
Given that, the average person is likely to be fine with leaving his/her history
|
||
the way it is. We will operate under that assumption for our implementation,
|
||
offering the ability to ignore all commits prior to a certain commit. If one
|
||
wishes to validate all commits, the reference commit can simply be omitted.
|
||
|
||
### Automating Signature Checks {#automate}
|
||
|
||
The idea behind verifying that certain commits are trusted is fairly simple:
|
||
|
||
> Given reference commit $r$ (optionally empty), let
> $C$ be the set of all commits such that $C$ = `r..HEAD`
> ([range spec](http://book.git-scm.com/4_git_treeishes.html)) and let
> $K$ be the set of all public keys in a given GPG keyring. We must assert
> that, for each commit $c$ in $C$, there must exist a key $k$ in
> keyring $K$ such that $k$ is
> [trusted](https://en.wikipedia.org/wiki/Web_of_trust) and can be used to
> verify the signature of $c$. This assertion is denoted by the function
> $g$ (GPG) in the following expression: $\forall c \in C,\ g(c)$.
|
||
Fortunately, as we have already seen in previous sections with the
|
||
`--show-signature` option to `git log`, Git handles the signature verification
|
||
for us; this reduces our implementation to a simple shell script. However, the
|
||
output we've been dealing with is not the most convenient to parse. It would be
|
||
nice if we could get commit and signature information on a single line per
|
||
commit. This can be accomplished with `--pretty`, but we have an additional
|
||
problem---at the time of writing (in Git v1.7.10), the GPG `--pretty` options
|
||
are undocumented.
|
||
|
||
A quick look at [`format_commit_one()` in
|
||
`pretty.c`](https://github.com/gitster/git/blob/f9d995d5dd39c942c06829e45f195eeaa99936e1/pretty.c#L1038)
|
||
yields a `'G'` placeholder that has three different formats:
|
||
|
||
- *`%GG`*---GPG output (what we see in `git log --show-signature`)
|
||
- *`%G?`*---Outputs "G" for a good
|
||
signature and "B" for a bad signature; otherwise, an empty string ([see
|
||
mapping in `signature_check`
|
||
struct](https://github.com/gitster/git/blob/f9d995d5dd39c942c06829e45f195eeaa99936e1/pretty.c#L808))
|
||
- *`%GS`*---The name of the signer
|
||
|
||
We are interested in using the most concise and minimal representation ---
|
||
`%G?`. Because this placeholder simply matches text on the GPG output, and the
|
||
string `"gpg: Can't check signature: public key not found"` is not mapped in
|
||
`signature_check`, unknown signatures will output an empty string, not "B".
|
||
This is not explicit behavior, so I'm unsure if this will change in future
|
||
releases. Fortunately, we are only interested in "G", so this detail will not
|
||
matter for our implementation.
|
||
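
As a small illustration of how a script might branch on this placeholder, the
following sketch maps a `%G?` value to a description. `sig_status` is a
hypothetical helper, and it covers only the values described above for Git
v1.7.10; the catch-all branch is there on the assumption that other releases
may emit letters not covered here.

```sh
#!/bin/sh
# sig_status CODE: describe a %G? value as documented in this article
# (Git v1.7.10); anything else falls through to the catch-all branch
sig_status() {
    case "$1" in
        G)  echo 'good signature' ;;
        B)  echo 'bad signature' ;;
        '') echo 'unsigned, or signed with an unknown key' ;;
        *)  echo "unrecognized status: $1" ;;
    esac
}

sig_status G
sig_status ''
```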
|
||
With this in mind, we can come up with some useful one-line output per commit.
|
||
The below is based on the output resulting from the demonstration of
|
||
[merge option #3](#merge-3) above:
|
||
|
||
```sh
|
||
$ git log --pretty="format:%H %aN %s %G?"
|
||
afb1e7373ae5e7dae3caab2c64cbb18db3d96fba Mike Gerwitz Modified bar G
|
||
f227c90b116cc1d6770988a6ca359a8c92a83ce2 Mike Gerwitz Added bar G
|
||
652f9aed906a646650c1e24914c94043ae99a407 John Doe Signed off G
|
||
16ddd46b0c191b0e130d0d7d34c7fc7af03f2d3e John Doe Added feature X G
|
||
cf43808e85399467885c444d2a37e609b7d9e99d Mike Gerwitz Test commit of foo G
|
||
```
|
||
|
||
Notice the "G" suffix for each of these lines, indicating that the signature
|
||
is valid (which makes sense, since the signature is our own). Adding an
|
||
additional commit, we can see what happens when a commit is unsigned:
|
||
|
||
```sh
|
||
$ echo foo >> foo
|
||
$ git commit -am 'Yet another foo'
|
||
$ git log --pretty="format:%H %aN %s %G?" HEAD^..
|
||
f72924356896ab95a542c495b796555d016cbddd Mike Gerwitz Yet another foo
|
||
```
|
||
|
||
Note that, as aforementioned, the string replacement of `%G?` is empty when the
|
||
commit is unsigned. However, what about commits that are signed but untrusted
|
||
(not within our web of trust)?
|
||
|
||
```
|
||
$ gpg --edit-key 8EE30EAB
|
||
[...]
|
||
gpg> trust
|
||
[...]
|
||
Please decide how far you trust this user to correctly verify other users' keys
|
||
(by looking at passports, checking fingerprints from different sources, etc.)
|
||
|
||
1 = I don't know or won't say
|
||
2 = I do NOT trust
|
||
3 = I trust marginally
|
||
4 = I trust fully
|
||
5 = I trust ultimately
|
||
m = back to the main menu
|
||
|
||
Your decision? 2
|
||
[...]
|
||
|
||
gpg> save
|
||
Key not changed so no update needed.
|
||
$ git log --pretty="format:%H %aN %s %G?" HEAD~2..
|
||
f72924356896ab95a542c495b796555d016cbddd Mike Gerwitz Yet another foo
|
||
afb1e7373ae5e7dae3caab2c64cbb18db3d96fba Mike Gerwitz Modified bar G
|
||
```
|
||
|
||
Uh oh. It seems that Git does not check whether or not a signature is
trusted. Let's take a look at the full GPG output:
|
||
<a id="gpg-sig-untrusted"></a>
|
||
```sh
|
||
$ git log --show-signature HEAD~2..HEAD^
|
||
commit afb1e7373ae5e7dae3caab2c64cbb18db3d96fba
|
||
gpg: Signature made Sun 22 Apr 2012 01:37:26 PM EDT using RSA key ID 8EE30EAB
|
||
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
|
||
gpg: WARNING: This key is not certified with a trusted signature!
|
||
gpg: There is no indication that the signature belongs to the owner.
|
||
Primary key fingerprint: 2217 5B02 E626 BC98 D7C0 C2E5 F22B B815 8EE3 0EAB
|
||
Author: Mike Gerwitz <mike@mikegerwitz.com>
|
||
Date: Sat Apr 21 17:35:27 2012 -0400
|
||
|
||
Modified bar
|
||
```
|
||
|
||
As you can see, GPG provides a clear warning. Unfortunately,
|
||
[`parse_signature_lines()` in
|
||
`pretty.c`](https://github.com/gitster/git/blob/f9d995d5dd39c942c06829e45f195eeaa99936e1/pretty.c#L808),
|
||
which references a simple mapping in `struct signature_check`, will
|
||
blissfully ignore the warning and match only `"Good signature from"`,
|
||
yielding "G". A patch to provide a separate token for untrusted keys is
|
||
simple, but for the time being, we will explore two separate
|
||
implementations---one that will parse the simple one-line output that is
|
||
ignorant of trust and a mention of a less elegant implementation that parses
|
||
the GPG output. ^[Should the patch be accepted, this article will be
|
||
updated to use the new token.]
|
||
|
||
|
||
#### Signature Check Script, Disregarding Trust {#script-notrust}
|
||
|
||
As mentioned above, due to limitations of the current `%G?` implementation, we
|
||
cannot determine from the single-line output whether or not the given signature
|
||
is actually trusted. This isn't necessarily a problem. Consider what will
|
||
likely be a common use case for this script---to be run by a continuous
|
||
integration (CI) system. In order to let the CI system know what signatures
|
||
should be trusted, you will likely provide it with a set of keys for known
|
||
committers, which eliminates the need for a web of trust (the act of placing the
|
||
public key on the server indicates that you trust the key). Therefore, if the
|
||
signature is recognized and is good, the commit can be trusted.
|
||
|
||
One additional consideration is the need to ignore all ancestors of a given
|
||
commit, which is necessary on older repositories where older commits will not be
|
||
signed (see [Commit History In a Nutshell](#commit-history) for information on
|
||
why it is unnecessary, and probably a bad idea, to sign old commits). As such,
|
||
our script will accept a ref and will only consider its children in the check.
|
||
|
||
This script *assumes that each commit will be signed* and will output the SHA-1
hash of each unsigned/bad commit, along with some useful information, delimited
by tabs.
|
||
```sh
#!/bin/sh
#
# Licensed under the CC0 1.0 Universal license (public domain).
#
# Validate signatures on each and every commit within the given range
##

# if a ref is provided, append range spec to include all children
chkafter="${1+$1..}"

# note: bash users may instead use $'\t'; printf is the more portable option
# (echo '\t' yields a literal backslash-t in shells whose echo does not
# interpret escape sequences)
t=$( printf '\t' )

# Check every commit after chkafter (or all commits if chkafter was not
# provided) for a good signature, listing invalid commits. %G? will output
# "G" if the signature is good; it does not take trust into account.
git log --pretty="format:%H$t%aN$t%s$t%G?" "${chkafter:-HEAD}" \
  | grep -v "${t}G$"

# grep will exit with a non-zero status if no matches are found, which we
# consider a success, so invert it
[ $? -gt 0 ]
```
|
||
That's it; Git does most of the work for us! If a ref is provided, it will be
|
||
converted into a [range spec](http://book.git-scm.com/4_git_treeishes.html) by
|
||
appending `".."` (e.g. `a1b2c` becomes `a1b2c..`), which will cause `git log`
|
||
to return all of its children (_not_ including the ref itself). If no ref is
|
||
provided, we end up using `HEAD` without a range spec, which will simply list
|
||
every commit (using an empty string will cause Git to throw an error, and we
|
||
must quote the string in case the user decides to do something like `"master@{5
|
||
days ago}"`). Using the `--pretty` option to `git log`, we output the GPG
|
||
signature result with `%G?`, in addition to some useful information we will want
|
||
to see about any commits that do not pass the test. We can then filter out all
|
||
commits that have been signed with a known key by removing all lines that end in
|
||
"G"---the output from `%G?` indicating a good signature.
|
||
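
The two building blocks described above can be tried in isolation, without a
repository. The following is only a sketch: `range_for` and `drop_good` are
hypothetical names that wrap the same parameter expansion and `grep` filter the
script uses, so that each can be fed sample input.

```sh
#!/bin/sh
# Sketch: the range construction and the filter from signchk as functions.

tab=$( printf '\t' )

# range_for [REF]: print "REF.." when a ref is given, otherwise "HEAD"
range_for() {
    chkafter="${1+$1..}"
    printf '%s\n' "${chkafter:-HEAD}"
}

# drop_good: read tab-delimited log lines on stdin and print only those whose
# final field is not "G"; like grep, exits non-zero when nothing is printed
drop_good() {
    grep -v "${tab}G\$"
}

range_for a1b2c    # prints: a1b2c..
range_for          # prints: HEAD
```

Keeping the filter separate also makes it easy to test against captured `git
log` output before trusting it in an automated setting.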
|
||
Let's see it in action (assuming the script has been saved as `signchk`):
|
||
|
||
```sh
|
||
$ chmod +x signchk
|
||
$ ./signchk
|
||
f72924356896ab95a542c495b796555d016cbddd Mike Gerwitz Yet another foo
|
||
$ echo $?
|
||
1
|
||
```
|
||
|
||
With no arguments, the script checks every commit in our repository, finding a
|
||
single commit that has not been signed. At this point, we can either check the
|
||
output itself or check the exit status of the script, which indicates a failure.
|
||
If this script were run by a CI system, the best option would be to abort the
|
||
build and immediately notify the maintainers of a potential security breach (or,
|
||
more likely, someone simply forgot to sign their commit).
|
||
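
How a CI system reacts is specific to that system, but the general shape is to
gate the build on the exit status. A minimal sketch, assuming a hypothetical
`ci_gate` wrapper that is handed the check command to run:

```sh
#!/bin/sh
# ci_gate CMD [ARGS...]: run a signature check command; on failure, print the
# offending commits to stderr and return 1 so that the caller can abort
ci_gate() {
    if report=$( "$@" ); then
        echo 'signature check passed'
    else
        echo 'signature check FAILED; aborting build' >&2
        printf '%s\n' "$report" >&2
        return 1
    fi
}

# hypothetical usage in a build script:
#   ci_gate ./signchk f7292 || exit 1
```

The notification itself (mail, chat, and so on) is left out, since it depends
entirely on the tooling available to you.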
|
||
If we check commits after that failure, assuming that each of the children have
|
||
been signed, we will see the following:
|
||
|
||
```sh
|
||
$ ./signchk f7292
|
||
$ echo $?
|
||
0
|
||
```
|
||
|
||
Be careful when running this script directly from the repository, especially
|
||
with CI systems---you must either place a copy of the script outside of the
|
||
repository or run the script from a trusted point in history. For example, if
|
||
your CI system were to simply pull from the repository and then run the script,
|
||
an attacker need only modify the script to circumvent this check entirely.
|
||
|
||
|
||
#### Signature Check Script With Web Of Trust {#script-trust}
|
||
|
||
The web of trust would come in handy for large groups of contributors; in such a
|
||
case, your CI system could attempt to download the public key from a
|
||
preconfigured keyserver when the key is encountered (updating the key if
|
||
necessary to get trust signatures). Based on the web of trust established from
|
||
the public keys directly trusted by the CI system, you could then automatically
|
||
determine whether or not a commit can be trusted even if the key was not
|
||
explicitly placed on the server.
|
||
|
||
To accomplish this task, we will split the script up into two distinct
|
||
portions---retrieving/updating all keys within the given range, followed by the
|
||
actual signature verification. Let's start with the key gathering portion,
|
||
which is actually a trivial task:
|
||
|
||
```sh
$ git log --show-signature \
  | grep 'key ID' \
  | grep -o '[A-Z0-9]\+$' \
  | sort \
  | uniq \
  | xargs gpg --keyserver key.server.org --recv-keys
```
|
||
The above string of commands simply uses `grep` to pull the key ids out of `git
|
||
log` output (using `--show-signature` to produce GPG output), and then requests
|
||
only the unique keys from the given keyserver. In the case of the repository
|
||
we've been using throughout this article, there is only a single signature---my
|
||
own. In a larger repository, all unique keys will be listed. Note that the
|
||
above example does not specify any range of commits; you are free to integrate
|
||
it into the `signchk` script to use the same range, but it isn't strictly
|
||
necessary (it may provide a slight performance benefit, depending on the number
|
||
of commits that would have been ignored).
|
||
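
If you do integrate this into a script, the extraction step can be isolated
into a function so that it can be tested against captured output. This is a
sketch under one assumption in particular: that GPG prints the `key ID` line in
the format shown throughout this article. `key_ids` is a hypothetical name.

```sh
#!/bin/sh
# key_ids: read `git log --show-signature` output on stdin and print each
# signing key ID once (assumes the "... using RSA key ID 8EE30EAB" format)
key_ids() {
    sed -n 's/^gpg: Signature made .* key ID \([0-9A-F][0-9A-F]*\)$/\1/p' \
        | sort -u
}

# hypothetical usage, limited to the same range as signchk:
#   git log --show-signature "${chkafter:-HEAD}" \
#     | key_ids \
#     | xargs gpg --keyserver key.server.org --recv-keys
```

Anchoring the pattern on the full `Signature made` line avoids picking up
text from commit messages that happen to contain the words "key ID".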
|
||
Armed with our updated keys, we can now verify the commits based on our web
of trust. Whether or not a specific key will be trusted is [dependent on
your personal
settings](http://www.gnupg.org/gph/en/manual.html#AEN533). The idea here is
that you can trust a set of users (e.g. Linus' "lieutenants") that in turn
will trust other users which, depending on your configuration, may
automatically be within your web of trust even if you do not personally
trust them. This same concept can be applied to your CI server by placing
its keyring in place of your own (or perhaps you will omit the CI server and
run the script yourself).
|
||
Unfortunately, with Git's current `%G?` implementation, [we are unable to
|
||
check basic one-line output](#automate). Instead, we must parse the output
|
||
of `--show-signature` ([as shown above](#gpg-sig-untrusted)) for each
|
||
relevant commit. Combining our output with [the original script that
|
||
disregards trust](#script-notrust), we can arrive at the following, which is
|
||
the output that we must parse:
|
||
|
||
```sh
|
||
$ git log --pretty="format:%H$t%aN$t%s$t%G?" --show-signature
|
||
f72924356896ab95a542c495b796555d016cbddd Mike Gerwitz Yet another foo
|
||
gpg: Signature made Sun 22 Apr 2012 01:37:26 PM EDT using RSA key ID 8EE30EAB
|
||
gpg: Good signature from "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
|
||
gpg: WARNING: This key is not certified with a trusted signature!
|
||
gpg: There is no indication that the signature belongs to the owner.
|
||
Primary key fingerprint: 2217 5B02 E626 BC98 D7C0 C2E5 F22B B815 8EE3 0EAB
|
||
afb1e7373ae5e7dae3caab2c64cbb18db3d96fba Mike Gerwitz Modified bar G
|
||
[...]
|
||
```
|
||
|
||
In the above snippet, it should be noted that the first commit (`f7292`) is
_not_ signed, whereas the second (`afb1e`) is. Therefore, the GPG output
_precedes_ the commit line itself. Let's consider our objective:

1. List all unsigned commits, or commits with unknown or invalid signatures.
2. List all signed commits that are signed with known signatures, but are
   otherwise untrusted.

Our [previous script](#script-notrust) performs #1 just fine, so we need only
augment it to support #2. In essence---we wish to convert lines ending in
"G" to something else if the GPG output _preceding_ that line indicates that
the signature is untrusted.
|
||
There are many ways to go about doing this, but we will settle for a fairly
|
||
clear set of commands that can be used to augment the previous script. To
|
||
prevent the lines ending with "G" from being filtered from the output (should
|
||
they be untrusted), we will suffix untrusted lines with "U". Consider the
|
||
output of the following:
|
||
|
||
```sh
|
||
$ git log --pretty="format:^%H$t%aN$t%s$t%G?" --show-signature \
|
||
> | grep '^\^\|gpg: .*not certified' \
|
||
> | awk '
|
||
> /^gpg:/ {
|
||
> getline;
|
||
> printf "%s U\n", $0;
|
||
> next;
|
||
> }
|
||
> { print; }
|
||
> ' \
|
||
> | sed 's/^\^//'
|
||
f72924356896ab95a542c495b796555d016cbddd Mike Gerwitz Yet another foo
|
||
afb1e7373ae5e7dae3caab2c64cbb18db3d96fba Mike Gerwitz Modified bar G U
|
||
f227c90b116cc1d6770988a6ca359a8c92a83ce2 Mike Gerwitz Added bar G U
|
||
652f9aed906a646650c1e24914c94043ae99a407 John Doe Signed off G U
|
||
16ddd46b0c191b0e130d0d7d34c7fc7af03f2d3e John Doe Added feature X G U
|
||
cf43808e85399467885c444d2a37e609b7d9e99d Mike Gerwitz Test commit of foo G U
|
||
```
|
||
|
||
Here, we find that if we filter out those lines ending in "G" as we did
|
||
before, we would be left with the untrusted commits in addition to the commits
|
||
that are bad ("B") or unsigned (blank), as indicated by `%G?`. To accomplish
|
||
this, we first add the GPG output to the log with the `--show-signature` option
|
||
and, to make filtering easier, prefix all commit lines with a caret (^) which
|
||
we will later strip. We then filter all lines but those beginning with a caret,
|
||
or lines that contain the string "not certified", which is part of the GPG
|
||
output. This results in lines of commits with a single `"gpg:"` line before
|
||
them if they are untrusted. We can then pipe this to awk, which will remove all
|
||
`"gpg:"`-prefixed lines and append `"U"` to the next line (the commit line).
|
||
Finally, we strip off the leading caret that was added during the beginning of
|
||
this process to produce the final output.
|
||
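
The same transformation can also be folded into a single `awk` program, which
some may find easier to test. This is a sketch rather than a drop-in
replacement for the commands above: `mark_untrusted` is a hypothetical name,
and it assumes that the `--pretty` format prefixes each commit line with a
caret, as before.

```sh
#!/bin/sh
# mark_untrusted: read `git log --show-signature` output on stdin, where each
# commit line begins with "^"; print only the commit lines (caret removed),
# appending " U" when the GPG output preceding a commit contained the
# "not certified" warning
mark_untrusted() {
    awk '
        /^gpg: .*not certified/ { untrusted = 1; next }
        substr($0, 1, 1) == "^" {
            line = substr($0, 2)
            if (untrusted) line = line " U"
            print line
            untrusted = 0
        }
    '
}

# hypothetical usage, with $t holding a tab as in signchk:
#   git log --pretty="format:^%H$t%aN$t%s$t%G?" --show-signature \
#     | mark_untrusted \
#     | grep -v "${t}G$"
```

All other GPG lines are simply dropped, so the separate `grep` and `sed` stages
are no longer needed in this variant.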
|
||
Please keep in mind that there is a huge difference between the conventional use
|
||
of trust with PGP/GPG ("I assert that I know this person is who they claim they
|
||
are") vs trusting someone to commit to your repository. As such, it may be in
|
||
your best interest to maintain an entirely separate web of trust for your CI
|
||
server or whatever user is being used to perform the signature checks.
|
||
|
||
|
||
### Automating Merge Signature Checks {#script-merge}
|
||
|
||
The aforementioned scripts are excellent if you wish to check the validity of
each individual commit, but not everyone will wish to put forth that amount of
effort. Instead, maintainers may opt for a workflow that requires the signing
of only the merge commit ([option #2 above](#merge-2)), rather than each
commit that is introduced by the merge. Let us consider the approach we would
have to take for such an implementation:
|
||
> Given reference commit $r$ (optionally empty), let
> $C'$ be the set of all _first-parent_ commits such that $C'$ = `r..HEAD`
> ([range spec](http://book.git-scm.com/4_git_treeishes.html)) and let
> $K$ be the set of all public keys in a given GPG keyring. We must assert
> that, for each commit $c$ in $C'$, there must exist a key $k$ in
> keyring $K$ such that $k$ is
> [trusted](https://en.wikipedia.org/wiki/Web_of_trust) and can be used to
> verify the signature of $c$. This assertion is denoted by the function
> $g$ (GPG) in the following expression: $\forall c \in C',\ g(c)$.
|
||
The only difference between this script and the script that checks for a
|
||
signature on each individual commit is that *this script will only check for
|
||
commits on a particular branch* (e.g. `master`). This is important---if we
|
||
commit directly onto master, we want to ensure that the commit is signed (since
|
||
there will be no merge). If we merge _into_ master, a merge commit will be
|
||
created, which we may sign and ignore all commits introduced by the merge. If
|
||
the merge is a fast-forward, a merge commit can be forcefully created with the
|
||
`--no-ff` option to avoid the need to amend each commit with a signature.
|
||
|
||
To demonstrate a script that can validate commits for this type of workflow,
let's first create some changes that would result in a merge:
|
||
```sh
|
||
$ git checkout -b diverge
|
||
$ echo foo > diverged
|
||
$ git add diverged
|
||
$ git commit -m 'Added content to diverged'
|
||
[diverge cfe7389] Added content to diverged
|
||
1 file changed, 1 insertion(+)
|
||
create mode 100644 diverged
|
||
$ echo foo2 >> diverged
|
||
$ git commit -am 'Added additional content to diverged'
|
||
[diverge 996cf32] Added additional content to diverged
|
||
1 file changed, 1 insertion(+)
|
||
$ git checkout master
|
||
Switched to branch 'master'
|
||
$ echo foo >> foo
|
||
$ git commit -S -am 'Added data to master'
|
||
|
||
You need a passphrase to unlock the secret key for
|
||
user: "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
|
||
4096-bit RSA key, ID 8EE30EAB, created 2011-06-16
|
||
|
||
[master 3cbc6d2] Added data to master
|
||
1 file changed, 1 insertion(+)
|
||
$ git merge -S diverge
|
||
|
||
You need a passphrase to unlock the secret key for
|
||
user: "Mike Gerwitz (Free Software Developer) <mike@mikegerwitz.com>"
|
||
4096-bit RSA key, ID 8EE30EAB, created 2011-06-16
|
||
|
||
Merge made by the 'recursive' strategy.
|
||
diverged | 2 ++
|
||
1 file changed, 2 insertions(+)
|
||
create mode 100644 diverged
|
||
```
|
||
|
||
Above, we committed to both `master` and a new `diverge` branch in order to
ensure that the merge would not be a fast-forward (alternatively, we could have
used the `--no-ff` option of `git merge`). This results in the following (your
hashes will vary):
|
||
```
|
||
$ git log --oneline --graph
|
||
* 9307dc5 Merge branch 'diverge'
|
||
|\
|
||
| * 996cf32 Added additional content to diverged
|
||
| * cfe7389 Added content to diverged
|
||
* | 3cbc6d2 Added data to master
|
||
|/
|
||
* f729243 Yet another foo
|
||
* afb1e73 Modified bar
|
||
* f227c90 Added bar
|
||
* 652f9ae Signed off
|
||
* 16ddd46 Added feature X
|
||
* cf43808 Test commit of foo
|
||
```
|
||
|
||
From the above graph, we can see that we are interested in signatures on only
two of the commits: `3cbc6d2`, which was created directly on `master`, and
`9307dc5`---the merge commit. The other two commits (`996cf32` and `cfe7389`)
need not be signed because the signing of the merge commit asserts their
validity (assuming that the author of the merge was vigilant). But how do we
ignore those commits?

```
$ git log --oneline --graph --first-parent
* 9307dc5 Merge branch 'diverge'
* 3cbc6d2 Added data to master
* f729243 Yet another foo
* afb1e73 Modified bar
* f227c90 Added bar
* 652f9ae Signed off
* 16ddd46 Added feature X
* cf43808 Test commit of foo
```

The above example simply added the `--first-parent` option to `git log`, which
will display only the first parent commit when encountering a merge commit.
Importantly, this means that we are left with _only the commits on_ `master` (or
whatever branch you decide to reference). These are the commits we wish to
validate.

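To get a feel for how many commits `--first-parent` leaves out, the two walks
can be compared directly. The following is only a sketch:
`first_parent_summary` is a hypothetical helper name, and it relies on nothing
more than the `--count` and `--first-parent` options of `git rev-list`. For the
example history shown earlier, it should report 8 of 10 commits, matching the
two `git log` listings.

```sh
#!/bin/sh
# Sketch only: first_parent_summary is a hypothetical helper name.

# Report how many of the commits reachable from a ref (default: HEAD) lie on
# its first-parent path; the remainder were brought in by merges.
first_parent_summary() {
  ref="${1:-HEAD}"
  total=$(git rev-list --count "$ref") || return 1
  mainline=$(git rev-list --count --first-parent "$ref") || return 1
  echo "$mainline of $total commits are on the first-parent path of $ref"
}
```
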
Performing the validation is therefore only a slight modification to the
original script:

```sh
#!/bin/sh
#
# Validate signatures on only direct commits and merge commits for a particular
# branch (current branch)
##

# if a ref is provided, append range spec to include all children
chkafter="${1+$1..}"

# note: bash users may instead use $'\t'; printf is the more portable option
# (echo's handling of backslash escapes varies between /bin/sh implementations)
t=$( printf '\t' )

# Check every commit after chkafter (or all commits if chkafter was not
# provided) for a trusted signature, listing invalid commits. %G? will output
# "G" if the signature is trusted.
git log --pretty="format:%H$t%aN$t%s$t%G?" "${chkafter:-HEAD}" --first-parent \
  | grep -v "${t}G$"

# grep will exit with a non-zero status if no matches are found, which we
# consider a success, so invert it
[ $? -gt 0 ]
```

If you run the above script using the branch setup provided above, then you will
find that neither of the commits made in the `diverge` branch is listed in the
output. Since the merge commit itself is signed, it is also omitted from the
output (leaving us with only the unsigned commit mentioned in the previous
sections). To demonstrate what will happen if the merge commit is _not_ signed,
we can amend it as follows (omitting the `-S` option):

```sh
$ git commit --amend
[master 9ee66e9] Merge branch 'diverge'
$ ./signchk
9ee66e900265d82f5389e403a894e8d06830e463 Mike Gerwitz Merge branch 'diverge'
f72924356896ab95a542c495b796555d016cbddd Mike Gerwitz Yet another foo
$ echo $?
1
```

The merge commit is then listed, requiring a valid signature. ^[If you wish to
ensure that this signature is trusted as well, see [the section on verifying
commits within a web of trust](#script-trust).]

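If you would rather call this check from a hook than run it by hand, the same
logic can be wrapped in functions. What follows is a minimal sketch and not part
of the script above: `list_unsigned` and `check_range` are hypothetical names,
and it assumes an `awk` that accepts `\t` as a field separator. The filter looks
only at the last tab-separated field, so a commit subject that happens to
contain a tab does not confuse it.

```sh
#!/bin/sh
# Sketch only: list_unsigned and check_range are hypothetical helper names.

t=$(printf '\t')

# Read tab-separated log lines whose last field is the %G? value. Print every
# line whose last field is not "G" and return non-zero if any were printed.
list_unsigned() {
  awk -F '\t' '$NF != "G" { print; bad = 1 } END { exit bad + 0 }'
}

# Check only the first-parent commits in a revision range (e.g. "v1.0..HEAD").
check_range() {
  git log --pretty="format:%H$t%aN$t%s$t%G?" --first-parent "$1" | list_unsigned
}

# Possible use in a server-side pre-receive hook, which receives
# "<old> <new> <ref>" lines on stdin. It is left commented out so that
# sourcing this file has no side effects; newly created refs, whose <old>
# value is all zeros, would need separate handling.
#
#   while read -r old new ref; do
#     check_range "$old..$new" || exit 1
#   done
```

Unlike the `grep -v` form, the exit status here is already zero on success, so
no inversion is needed.
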
## Summary

* [Be careful of who you trust.](#trust) Is your repository safe from
  harm/exploitation on your PC? What about the PCs of those whom you trust?
  * [Your host is not necessarily secure.](#trust-host) Be wary of using
    remotely hosted repositories as your primary hub.
* [Using GPG to sign your commits](#trust-ensure) can help to assert your
  identity, helping to protect your reputation from impostors.
* For large merges, you must develop a security practice that works best for
  your particular project. Specifically, you may choose to [sign each
  individual commit](#merge-3) introduced by the merge, [sign only the merge
  commit](#merge-2), or [squash all commits](#merge-1) and sign the
  resulting commit.
* If you have an existing repository, there is [little need to go rewriting
  history to mass-sign commits](#commit-history).
* Once you have determined the security policy best for your project, you may
  [automate signature verification](#automate) to ensure that no unauthorized
  commits sneak into your repository.