thoughts/post/2019-02-18-ghcq-exceptional...


# GCHQ's "Exceptional Access", End-To-End Encryption, Decentralization, and Reproducible Builds
Late last November,
Ian Levy and Crispin Robinson of GCHQ (the British intelligence
agency) published a proposal for intercepting end-to-end encrypted
communications,
entitled ["Principles for a More Informed Exceptional
Access Debate"][proposal].
Since then,
there have been a series of notable rebuttals to this proposal
arguing why this system would fail in practice and why it should be
rejected.
Completely absent from these responses, however,
is any mention of existing practices that would prohibit this attack
outright---the
combination of free/libre software, reproducible builds, and
decentralized or distributed services.
[proposal]: https://www.lawfareblog.com/principles-more-informed-exceptional-access-debate
<!-- more -->
This proposal is just the latest episode in the [crypto
wars][crypto-wars]:
Users need secure communications to protect their privacy and defend
against attackers,
but law enforcement and governments argue that this leaves them in
the dark.
But this one's a bit different.
The proposal states:
[crypto-wars]: https://en.wikipedia.org/wiki/Crypto_wars
> The U.K. government strongly supports commodity encryption. The Director
> of GCHQ has publicly stated that we have no intention of undermining the
> security of the commodity services that billions of people depend upon
> and, in August, the U.K. signed up to the Five Country statement on access
> to evidence and encryption, committing us to support strong encryption
> while seeking access to data. [...] We believe these U.K. principles will
> enable solutions that provide for responsible law enforcement access with
> service provider assistance without undermining user privacy or security.
The suggestions in the article are a pleasant deviation from past proposals,
such as [key escrow schemes][key-escrow-eff];
in fact,
it categorically denounces such schemes:
[key-escrow-eff]: https://www.eff.org/deeplinks/2015/04/clipper-chips-birthday-looking-back-22-years-key-escrow-failures
> There is no single solution to enable all lawful access, but we definitely
> don't want governments to have access to a global key that can unlock any
> user's data. Government controlled global key escrow systems would be a
> catastrophically dumb solution in these cases.
So how do the authors propose intercepting communications?
They suggest inserting a third party---a
"ghost", as others have been calling it---into
the conversation.
To understand the implications of adding a third party to an
end-to-end (E2E) encrypted protocol,
you have to understand how end-to-end encryption usually works in
practice.^[
For another perspective,
see [Matthew Green's overview][green-ghost] in his response to
the GCHQ proposal.]
[green-ghost]: https://blog.cryptographyengineering.com/2018/12/17/on-ghost-users-and-messaging-backdoors/
## Undermining End-to-End Encrypted Communication Systems
Let's say that three users named Alice, Bob, and Carol wish to communicate
with one-another privately.
There are many ways to accomplish this,
but for the sake of this discussion,
we need to choose a protocol that attempts to fit into the model that
Levy and Robinson had in mind.
Alice and the others will make use of a centralized messaging service that
relays messages on behalf of users.[^centralized]
Centralized services are commonplace and include popular services like
Signal, WhatsApp, Facebook Messenger, iMessage, and many others.
They all work in slightly different ways,
so to simplify this analysis,
I'm going to talk about an imaginary messaging service called FooRelay.
[^centralized]: See section [The Problem With Centralized
Services](#centralized-services).
FooRelay offers a directory service that allows participants to find
one-another by name or pseudonym.
The directory will let Alice know if Bob and Carol are online.
FooRelay also offers private chat rooms supporting two or more participants.
Alice, Bob, and Carol don't want anyone else to know what they are
saying---that
includes FooRelay's servers,
their Internet Service Providers (ISPs),
their employers,
their governments,
or whomever else may be monitoring the network that any of them are
communicating over.[^threat-model]
Fortunately for them,
FooRelay makes use of _end-to-end encryption_.[^primitive-e2e]
[^threat-model]: The process of determining potential threats and
adversaries is called [threat modeling][].
Since this article is about a proposal from a government spy agency,
it's also worth noting that global passive adversaries like GCHQ and the
NSA have the ability to monitor and store global traffic with the hopes
of later decrypting it.
[I have written about pre-Snowden revelations][national-uproar],
and [the EFF has compiled a bunch of information on NSA spying][eff-nsa].
[threat modeling]: https://en.wikipedia.org/wiki/Threat_model
[national-uproar]: /2013/06/national-uproar-a-comprehensive-overview-of-the-nsa-leaks-and-revelations
[eff-nsa]: http://eff.org/nsa-spying
[^primitive-e2e]: Here I will describe a fairly elementary public-key
end-to-end encrypted protocol that omits many important features
(most notably, forward secrecy).
For detailed information on a modern and well-regarded key exchange
protocol,
see [X3DH][] (Extended Triple Diffie-Hellman),
which is employed by Signal.
Following a key agreement,
the [Double Ratchet][] algorithm is widely employed for forward
secrecy even in the event of a compromised session key.
[X3DH]: https://signal.org/docs/specifications/x3dh/
[Double Ratchet]: https://www.signal.org/docs/specifications/doubleratchet/
Alice, Bob, and Carol each hold secret encryption keys known only to
them---their
_private keys_,
which are generated for them automatically by the FooRelay client
software running on their systems.
These keys can be used to _decrypt_ messages sent to them,
and can be used to _sign_ messages to assert their authenticity.
But these private keys must never be divulged to others,
including FooRelay's servers.
Instead,
each private key has a _public key_ paired with it.
The public key can be used to _encrypt_ messages that can only be decrypted
using the associated private key.[^pke]
Alice, Bob, and Carol each publish their public keys into FooRelay's
directory so that others may discover and use them.
When Alice wants to start a chat with Bob and Carol,
she can ask FooRelay to provide their public keys from the directory.
[^pke]: This is called [_public-key cryptography_][public-key-crypto]
(or _asymmetric encryption_).
[public-key-crypto]: https://en.wikipedia.org/wiki/Public-key_cryptography
But making the public keys available in a directory is only part of the
problem---how
do Alice, Bob, and Carol know that the keys published to the directory
are actually associated with the _real_ Alice, Bob, and Carol?^[
This topic is known as [_key distribution_][key-distribution].]
**This is the first opportunity to spy**,
if FooRelay is poorly designed.
[key-distribution]: https://en.wikipedia.org/wiki/Key_distribution
As stated by the proposal:
> It's relatively easy for a service provider to silently add a law
> enforcement participant to a group chat or call. The service provider
> usually controls the identity system and so really decides who's who and
> which devices are involved - they're usually involved in introducing the
> parties to a chat or call. You end up with everything still being
> end-to-end encrypted, but there's an extra end on this particular
> communication.
### Man-in-the-Middle
Let's start by assuming a pretty grim scenario.
This is not quite the plan of attack that Levy and Robinson had in mind,
but it's important to understand why it would not work in practice.
The FooRelay client software running on Alice's computer retrieves Bob's
public key from the identity service and initiates a chat.
FooRelay's server creates a new private chat room to accommodate the
request and adds two initial participants---Alice and Bob.
The FooRelay client then generates an invitation message containing
the identifier of the new room,
signs it using Alice's private key to prove that it was from Alice,
and sends it off to FooRelay's servers.
FooRelay's server verifies Alice's signature to make sure that she is
authorized to invite someone to the room,
and then sends the invitation off to Bob.[^whatsapp-group-chat]
[^whatsapp-group-chat]: As it turns out,
getting invitations right can be difficult too.
[WhatsApp had a vulnerability that allowed users to insert themselves
into group conversations][whatsapp-vuln] because it didn't implement a
similar protocol.
A better defense would be for Bob to publish the invitation from Alice
when he joins the room,
allowing anyone else in the room (like Carol) to verify that he was
invited by someone authorized to do so.
Only after verifying the invitation's signature would Carol decide to
encrypt messages to him.
[whatsapp-vuln]: https://techcrunch.com/2018/01/10/security-researchers-flag-invite-bug-in-whatsapp-group-chats/
Bob is also running the FooRelay client on his computer.
It receives the invitation from Alice,
looks up her public key from the identity service,
and uses it to verify the signature on the invitation to make sure it
originated from Alice.
If the signature checks out,
FooRelay asks Bob if he'd like to join the chat.
Bob accepts.
Alice enters a message into the FooRelay client to send to the chat room.
But remember,
Alice does not want the FooRelay server to know what message is being
sent.
So the FooRelay client on Alice's computer encrypts the message using Bob's
public key,
signs it using Alice's private key to assert that it was from her,
and sends it.
The FooRelay server---and
anyone else watching---see
junk data.
But Bob,
upon receiving the message and verifying its signature,
is able to decrypt and read it using his private key.[^sending]
[^sending]: This is omitting many very important details that are necessary
for a proper implementation.
While this portrayal is accurate at a high level,
there is a lot more that goes into sending a message.
See the [Double Ratchet][] algorithm for one robust way to handle
this exchange.
**Now let's explore how to intercept communications.**
Enter Mallory.
Mallory works for GCHQ.
FooRelay has been served with a wiretap order against Carol.
Alice wants to bring Carol into the conversation with her and Bob,
so she requests Carol's key from the identity service.
FooRelay's identity service,
subject to the wiretap order,
doesn't return Carol's public key;
instead, it returns Mallory's,
_who is pretending to be Carol_.
Alice sends the invitation to Mallory
(again, thinking he's Carol),
and the fake Carol (Mallory) joins the room.
Now when sending a message,
Alice encrypts using both Bob and Mallory's public keys,
so both of them can read it.
But when Alice and Carol meet up tomorrow for lunch,
it will be pretty clear that Carol was not part of the conversation.
So Mallory is clever---he
has FooRelay provide him with Carol's _real_ public key.
When Alice sends Mallory an invitation to the room,
Mallory instructs FooRelay to create a covert _fake_ chat room with the
same identifier.
Mallory then sends an invitation to Carol to that new chat room,
_pretending to be Alice_.
But Mallory doesn't have access to Alice's private key,
and so cannot sign it as her;
he instead signs it using his own private key.
FooRelay on Carol's computer receives the invitation,
which claims to be from Alice
(but is really from Mallory).
When it attempts to retrieve the key from the identity service,
rather than receiving Alice's key,
_the identity service sends back Mallory's_.
Now Mallory is impersonating _both_ Alice and Carol.
The signature checks out,
and Carol joins the covert chat.
FooRelay---still
under the wiretap order---announces
that Alice and Bob are both in the room,
even though they aren't.
Now,
when Mallory receives a message from Alice that is intended for Carol,
he encrypts it using Carol's public key,
signs it using his own,
and sends it off to Carol.
Since Carol's FooRelay client thinks that Mallory's key is Alice's
(remember the invitation?),
the signature checks out and she happily decrypts the message and
reads it.
If Bob sends a message,
we repeat the same public key lookup procedure---FooRelay's identity
service lies and provides Mallory's key instead,
and Mallory proxies the message all the same.^[
Of course,
it may be suspicious if Alice and Bob both have the same key,
so maybe Mallory has multiple keys.
Or maybe the FooRelay software just doesn't care.]
This is a [man-in-the-middle (MITM)][mitm] attack.
But notice how **the conversation is still fully end-to-end encrypted**,
between each of Alice, Bob, Carol, and Mallory.
[mitm]: https://en.wikipedia.org/wiki/Man-in-the-middle_attack
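To make the moving parts of this attack concrete, here is a toy Python model of the directory substitution described above. It is not real cryptography: a "public key" is just an identifier, and only the holder of the matching key pair can open an envelope addressed to it. All of the names (`Directory`, `KeyPair`, `Envelope`) are illustrative, not drawn from any real protocol.

```python
# Toy model of the directory-substitution MITM. No real cryptography:
# a "public key" is just an identifier, and only the matching KeyPair
# can open an envelope addressed to it.
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KeyPair:
    owner: str
    key_id: str = field(default_factory=lambda: secrets.token_hex(4))


@dataclass
class Envelope:
    recipient_key_id: str
    body: str  # stands in for ciphertext


class Directory:
    """FooRelay's identity service. A wiretap order lets it lie."""

    def __init__(self):
        self.keys = {}       # name -> registered public key id
        self.overrides = {}  # name -> key id served instead

    def register(self, name, pair):
        self.keys[name] = pair.key_id

    def lookup(self, name):
        return self.overrides.get(name, self.keys[name])


def encrypt_to(directory, recipient, text):
    return Envelope(directory.lookup(recipient), text)


def decrypt(pair, envelope):
    if envelope.recipient_key_id != pair.key_id:
        raise ValueError("wrong key: cannot decrypt")
    return envelope.body


directory = Directory()
alice, carol, mallory = KeyPair("alice"), KeyPair("carol"), KeyPair("mallory")
directory.register("alice", alice)
directory.register("carol", carol)

# Under the wiretap order, lookups for "carol" return Mallory's key.
directory.overrides["carol"] = mallory.key_id

envelope = encrypt_to(directory, "carol", "lunch tomorrow?")
intercepted = decrypt(mallory, envelope)  # Mallory reads the message...
relayed = decrypt(carol, Envelope(carol.key_id, intercepted))
print(intercepted, "/", relayed)          # ...and proxies it on to Carol
```

Note that every envelope is still only readable by exactly one key holder; the betrayal happens entirely in the identity lookup, not in the encryption.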
Why is this attack possible?
Because FooRelay has not offered any insight into the identity
process---there
is no _authentication_ procedure.
Blind trust is placed in the directory,
which in this case has been compromised.
#### Mutual Authentication
If the FooRelay client allowed Alice, Bob, and Carol to inspect each others'
public keys by displaying a [public key "fingerprint"][fingerprint],
then that would have immediately opened up the possibility for them to
discover that something odd was going on.
For example,
if Alice and Carol had previously communicated before Mallory was
involved,
then maybe they would notice that the fingerprint changed.
If they met _after_ the fact,
they would notice that the fingerprint Alice had for Carol was not the
fingerprint that Carol had for _herself_.
Maybe they would notice---perhaps
by communicating in person---that
the fingerprint that Alice associated with Carol and the fingerprint that
Carol associated with Alice were in fact the same (that is, Mallory's).
[fingerprint]: https://en.wikipedia.org/wiki/Key_fingerprint
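A fingerprint is typically just a cryptographic hash of the public key, rendered in a form humans can compare. A minimal sketch (the hash choice and four-digit grouping here are illustrative; real clients vary in both):

```python
# Derive a human-comparable fingerprint from public key bytes.
import hashlib


def fingerprint(public_key_bytes: bytes) -> str:
    digest = hashlib.sha256(public_key_bytes).hexdigest()
    # Group the first 32 hex digits into chunks of four for readability.
    return " ".join(digest[i:i + 4] for i in range(0, 32, 4))


carol_real = fingerprint(b"carol-public-key")
carol_served = fingerprint(b"mallory-public-key")
print(carol_real)
print(carol_real == carol_served)  # False: the substitution is visible
```

Because the hash is deterministic, Alice and Carol will compute identical strings for the same key, and different strings the moment Mallory's key is swapped in.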
To mitigate the first issue,
Mallory would have to MITM communications from the moment that Carol first
signed up for FooRelay,
and permanently thereafter.
The second could not be mitigated unless Mallory compromised Carol's device,
or FooRelay cooperated with Mallory to plant a defective FooRelay client
on Carol's device.
To mitigate the third,
maybe Mallory would use separate keys.
But if Alice, Bob, or Carol ever compared public keys in person with someone
else that was outside of their group of three,
then they would notice that the fingerprints did not match.
So FooRelay would have to always provide the wrong key to _everyone_ trying
to communicate with Carol,
and for _everyone_ Carol tried to communicate with,
in perpetuity---an
everlasting wiretap.
This issue of mutual authentication is another complex topic that is very
difficult to solve in a manner that is convenient for users.[^wot]
For example,
Alice, Bob, and Carol could all meet in person and verify that
one-anothers' fingerprints look correct.
Or they could post their fingerprints to something outside of FooRelay's
control,
like social media.
This is the ["safety number"][safety-number] concept that Signal employs.
[^wot]: One distributed model of associating a key with an owner is PGP's
[Web of Trust][wot],
which has been in use since the 1990s.
While it does enjoy use in certain communities,
it has failed to take off with average users due to the [complexities of
implementing the model properly][debian-keysign].
PGP's author also devised a short authentication string (SAS)
protocol for VoIP systems called [ZRTP][],
but it relies on users being able to identify the authenticity of
one-anothers' voices,
a luxury that may be undermined in the near future by speech
synthesis systems [trained to reproduce real voices][ss-deep].
[safety-number]: https://signal.org/blog/safety-number-updates/
[wot]: https://en.wikipedia.org/wiki/Web_of_trust
[debian-keysign]: https://wiki.debian.org/Keysigning/
[zrtp]: https://en.wikipedia.org/wiki/ZRTP
[ss-deep]: https://en.wikipedia.org/wiki/Speech_synthesis#Deep_learning
FooRelay could also implement a [trust-on-first-use (TOFU)][tofu]
policy---the
client software would remember the last public key that it saw for a
user,
and if that key ever changed,
then a prominent warning would be displayed.[^ssh-tofu]
For example,
if Alice communicates once with the real Carol,
the TOFU policy in the FooRelay client would record that real public key.
Then,
when Mallory tries to MITM the conversation,
Alice's FooRelay client would say:
"Hold up; the key changed! Something is wrong!"
[tofu]: https://en.wikipedia.org/wiki/Trust_on_first_use
[^ssh-tofu]: SSH users, for example, may be familiar with the almost-violent
warning when the server fingerprint changes.
A server's fingerprint is stored in `~/.ssh/known_hosts` the first time
the server is contacted,
and that stored fingerprint is used for verification on all subsequent
connection attempts.
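Sketched in Python, a TOFU policy is little more than a persistent map from contact to first-seen fingerprint. The class name and in-memory storage here are illustrative; a real client would persist the map to disk, much like `known_hosts`:

```python
# Minimal trust-on-first-use store: remember the first key seen for
# each contact, and raise a loud warning if it ever changes.
class TofuStore:
    def __init__(self):
        self.known = {}  # contact name -> fingerprint first seen

    def check(self, name: str, fingerprint: str) -> bool:
        first_seen = self.known.setdefault(name, fingerprint)
        if first_seen != fingerprint:
            raise Warning(f"key for {name} changed! Something is wrong!")
        return True


store = TofuStore()
store.check("carol", "ab12 cd34")       # first contact: recorded silently
store.check("carol", "ab12 cd34")       # same key again: fine
try:
    store.check("carol", "ff99 0000")   # Mallory's key trips the alarm
except Warning as warning:
    print(warning)
```

The weakness, as noted above, is the first contact: if Mallory is already in the middle when Carol signs up, his key is the one that gets recorded.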
In any case,
let's assume that FooRelay's cooperation in serving up the wrong public
key is no longer sufficient because of these mitigations.
What does Mallory do without the ability to MITM?
No respectable communication software should be vulnerable to this sort of
attack.
Knowing this,
Levy and Robinson had a different type of attack in mind.
### A Ghost in the Room
Back when most people used land lines for communication via telephone,
wiretapping was pretty easy.
Conversations were transmitted in an unencrypted,
analog form;
anyone could listen in on someone else's conversation if they had some
elementary technical know-how and knew where to apply it.
By severing or exposing the line at any point,
an eavesdropper could attach [alligator clips][]---or
"crocodile clips", if you're east of the Atlantic---to
route the analog signal to another phone or listening device.
[alligator clips]: https://en.wikipedia.org/wiki/Crocodile_clip
Levy and Robinson try to apply this same concept as a metaphor for Internet
communications,
presumably in an effort to downplay its significance.
But the concepts are very different.
Continuing from the previous quote of Levy and Robinson's proposal:
> This sort of solution seems to be no more intrusive than the virtual
> crocodile clips that our democratically elected representatives and
> judiciary authorise today in traditional voice intercept solutions and
> certainly doesn't give any government power they shouldn't have.
>
> We're not talking about weakening encryption or defeating the end-to-end
> nature of the service. In a solution like this, we're normally talking
> about suppressing a notification on a target's device, and only on the
> device of the target and possibly those they communicate with. That's a
> very different proposition to discuss and you don't even have to touch the
> encryption.
This statement is disingenuous.
We can implement the quoted suggestion in two different ways:
The first is precisely the situation that was just previously
described---allow
MITM and remain ignorant about it.
The second way is to have the FooRelay server _actually invite Mallory_ to
the chat room,
but _have the FooRelay client hide him from other participants_.
**He would be a ghost in the room;**
nobody would see him,
but Alice, Bob, and Carol's FooRelay software would each surreptitiously
encrypt to him using his public key,
as a third recipient.
Sure,
the actual ciphers used to encrypt the communications are not weakened.
Sure,
it is still end-to-end encrypted.
But this is _nothing_ like alligator clips on a phone line---instead,
_an anti-feature has been built into the software_.
As the EFF notes,
[this is just a backdoor by another name][eff-ghost].
[eff-ghost]: https://www.eff.org/deeplinks/2019/01/give-ghost-backdoor-another-name
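Reduced to code, the ghost is a deliberate discrepancy between the recipient list the client encrypts to and the roster it displays. A toy sketch (the names and list-based model are purely illustrative):

```python
# The backdoor in miniature: encrypt to everyone the server lists,
# but hide the ghost from the roster shown to the user.
GHOST = "mallory"


def encryption_recipients(server_roster):
    # Every message is encrypted to every key in the server's roster...
    return list(server_roster)


def displayed_roster(server_roster):
    # ...but the tampered client filters the ghost out of the UI.
    return [member for member in server_roster if member != GHOST]


roster = ["alice", "bob", "carol", GHOST]
print(encryption_recipients(roster))  # includes "mallory"
print(displayed_roster(roster))       # does not
```

Detecting a ghost amounts to finding ways in which these two lists can be made to disagree, which is roughly what the detection methods cited above probe for.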
If software has to be modified to implement this backdoor,
then it has to either be done for _every_ user of FooRelay,
or individual users have to be targeted to install a malicious version
of the program.
If either of these things are possible,
then _everyone_ is made less secure.
What if a malicious actor figures out how to exploit either of those
mechanisms for their own purposes?
Or what if someone tricks FooRelay into thinking they're from GCHQ?
And since this is a backdoor in the software running on the user's computer,
it is very difficult to be covert.
Nate Cardozo and Seth Schoen of the Electronic Frontier Foundation
[analyze various ways to detect ghosts][detect-ghosts],
which would tip Alice, Bob, and Carol off that Mallory is watching them.
[detect-ghosts]: https://www.lawfareblog.com/detecting-ghosts-reverse-engineering-who-ya-gonna-call
This is bad,
and everyone knows it.
The proposal is a non-starter.
But this shouldn't be the end of the conversation---there
is a much more fundamental issue at play,
one that has received no attention in the mainstream responses.
## Betrayed By Software {#betrayed}
All of these mainstream discussions make an implicit assumption:
_that users are not in control of the software running on their systems_.
The [detection methods][detect-ghosts] are discussed in terms of binary
profiling and side-channels.
[GCHQ's proposal itself][proposal] fundamentally relies on the software
being modified in ways that are a disservice to the user---adding
a backdoor that surreptitiously exfiltrates messages to a third
party (Mallory) without the consent of other participants (Alice, Bob,
or Carol).
When a user has full control over their software---when
they have the freedom to use, study, modify, and share it as they
please---we
call it [_free software_][free-sw].
If FooRelay's client were free software,
then Alice, Bob, and Carol would all have the right to inspect it to make
sure no nasty backdoors were added,[^proprietary-malware]
or ask someone else to inspect it for them.
Or maybe they could depend on the fact that many other people are
watching---essentially
anyone in the world could at any moment look at FooRelay's client source
code.
This helps to keep FooRelay honest---if
they _did_ implement a feature that suppresses notifications as Levy and
Robinson suggest,
then they would have done so in plain sight of everyone,
and they would immediately lose the trust of their users.
[free-sw]: https://www.gnu.org/philosophy/free-sw.en.html
[^proprietary-malware]: Unfortunately,
[proprietary (non-free) software is often malware][proprietary-malware],
hiding things that work in the interests of its developers but
_against_ the interests of its users.
[proprietary-malware]: https://www.gnu.org/philosophy/proprietary.html
FooRelay could try to make the change in a plausibly deniable way---to
make the change look like a bug---but
then _anyone with sufficient skill in the community could immediately fix
it_ and issue a patch.
That patch could be immediately circulated and adopted by other users
without the blessing of FooRelay itself.
If FooRelay didn't implement that patch,
then users would [_fork_][software-fork] it,
making their own version and ditching FooRelay entirely.
Forking is a commonly exercised and essential right in the free
software community.
[software-fork]: https://en.wikipedia.org/wiki/Software_fork
The popular program Signal is free software.[^moxie-signal]
The [OMEMO specification][omemo]---which
implements many of the encryption standards that were developed by
Signal---is
also [implemented by multiple free software projects][omemo-yet],
some of which include [Pidgin][] (GNU/Linux, Windows, Mac OS X),
[Conversations][] (Android),
[ChatSecure][] (iOS),
and [Gajim][] (GNU/Linux, Windows).
[omemo]: https://conversations.im/omemo/
[omemo-yet]: https://omemo.top/
[pidgin]: https://pidgin.im/
[conversations]: https://conversations.im/
[chatsecure]: https://chatsecure.org/
[gajim]: https://gajim.org/
[^moxie-signal]: Unfortunately,
its author has caused some friction in the free software community by
[strongly discouraging forks and saying they are unwelcome to connect to
Signal's servers][moxie-fdroid].
This also relates to the issue of centralization,
which is the topic of the next section;
Moxie [explains in a blog post why he disagrees with a federated
Signal][moxie-federation].
[moxie-fdroid]: https://github.com/LibreSignal/LibreSignal/issues/37
[moxie-federation]: https://signal.org/blog/the-ecosystem-is-moving/
If a program does not respect users' freedoms,
we call it _non-free_, or _proprietary_.
**Most of the popular chat programs today are non-free**:
Apple iMessage, Facebook Messenger, and WhatsApp are all examples of
programs that keep secrets from their users.
Those communities are unable to inspect the program,
or modify it to remove anti-features;
they are at the mercy of the companies that write the software.
For example,
a recent [bug in Apple's FaceTime][facetime-vuln] left users
vulnerable to surveillance by other FaceTime users.
FaceTime likely has hundreds of thousands of users.
If it were free software and only a tiny fraction of those users actually
inspected the source code,
it's possible that somebody would have noticed and maybe even fixed the
bug before it was exploited.[^bugs-shallow]
Further,
after it _was_ discovered,
users had no choice but to wait for Apple themselves to issue a fix,
which didn't come until a week later.
The person who did discover it [tried to contact Apple with no
success][bad-apple],
and the world only found out about the issue when a video demoing the
exploit went viral eight days after its initial discovery.
This differs from free software communities,
where bugs are typically posted to a public mailing list or bug tracker,
where anybody in the community can both view and immediately act upon
it.[^embargo]
[facetime-vuln]: https://9to5mac.com/2019/01/28/facetime-bug-hear-audio/
[bad-apple]: https://www.wsj.com/articles/teenager-and-his-mom-tried-to-warn-apple-of-facetime-bug-11548783393
[^bugs-shallow]: This is often cited as [Linus's Law][linus-law],
which states that "given enough eyeballs, all bugs are shallow".
While this is often true,
it is certainly not always the case.
It is a common argument in support of open source,
[which covers the same class of software][floss-class] as free software.
However,
it's important not to fixate too much on this argument---it
[misses the point of free software][oss-misses-point],
and is a shallow promise,
since open source software is not always superior in technical
quality to proprietary software.
[linus-law]: https://en.wikipedia.org/wiki/Linus's_Law
[floss-class]: https://www.gnu.org/philosophy/free-open-overlap.html
[oss-misses-point]: https://www.gnu.org/philosophy/open-source-misses-the-point.html
[^embargo]: Sometimes an exception is made for severe security
vulnerabilities.
For example,
the [`linux-distros` mailing list][linux-distros] is used to coordinate
security releases amongst GNU/Linux distributions,
imposing an embargo period.
This practice ensures that exploits are not made publicly available to
malicious actors before users are protected.
[linux-distros]: https://oss-security.openwall.org/wiki/mailing-lists/distros
But free software alone isn't enough.
How does Alice know that she _actually_ has the source code to the
program that she is running?
### Reproducibility and Corresponding Source Code {#reproducibility}
The source code to FooRelay can't provide Alice with any security
assurances unless she can be confident that it is _actually_
the source code to the binary running on her machine.
For example,
let's say that FooRelay has agreed to cooperate with GCHQ to implement
ghosts by introducing a backdoor into the FooRelay client.
But since FooRelay is a free software project,
anyone can inspect it.
Rather than tipping off the community by publishing the _actual_ source
code,
_they publish the source code for a version that does not have the
backdoor_.
But when Alice downloads the compiled (binary) program from FooRelay,
she receives a backdoored version.
To mitigate this,
**Alice wants to be sure that she has the _corresponding source code_**.
One way for Alice to be confident is for her to compile the FooRelay client
herself from the source code.
But not everybody has the technical ability or desire to do
this.[^bootstrap]
Most users are instead going to download binaries from their operating
system's software repositories,
or from FooRelay's website,
or maybe even from other convenient third parties.
How can _all_ users be confident that the FooRelay client they download
actually corresponds to the source code that has been published and vetted
by the community?
[^bootstrap]: And then you have the issue of ensuring that you have the
corresponding source to the rest of your system so that it does not
[alter the behavior of the produced binary][trusting-trust].
System-wide reproducibility is the topic of [_bootstrappable
builds_][bootstrappable-builds].
[trusting-trust]: https://www.archive.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf
[bootstrappable-builds]: http://bootstrappable.org/
[_Reproducible builds_][reproducible-builds] are required to solve this
problem.
When FooRelay is built,
it is done so in a manner that can be completely reproduced by others.
Bit-for-bit reproducibility means that,
if two people on different systems follow the same instructions for
building a program in similar enough environments,
every single bit of the resulting binary will match---
they will be exact copies of one-another.[^unreproducible]
[reproducible-builds]: http://reproducible-builds.org/
[^unreproducible]: Additional effort often has to be put into building
reproducibly because a build may produce timestamps corresponding to the
time of the build,
information specific to the environment in which the program is being
built,
and various other sources of nondeterminism.
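The timestamp problem described in the footnote above can be demonstrated in a few lines, using a zip archive as a stand-in for a build artifact:

```python
# Why timestamps break bit-for-bit reproducibility: two "builds" of
# the same source differ unless the embedded timestamp is pinned.
import io
import zipfile


def build(source: bytes, date_time: tuple) -> bytes:
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The archive embeds a timestamp for each member file.
        info = zipfile.ZipInfo("foorelay", date_time=date_time)
        archive.writestr(info, source)
    return buffer.getvalue()


source = b"print('hello')"
first = build(source, (2019, 2, 18, 10, 0, 0))   # built at 10:00
second = build(source, (2019, 2, 18, 10, 5, 0))  # rebuilt at 10:05
pinned_a = build(source, (1980, 1, 1, 0, 0, 0))  # timestamp pinned
pinned_b = build(source, (1980, 1, 1, 0, 0, 0))

print(first == second)       # False: the wall clock leaks into the bits
print(pinned_a == pinned_b)  # True: identical, bit for bit
```

Pinning every such source of nondeterminism is exactly the work that projects like Debian's reproducible-builds effort do across thousands of packages.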
This has powerful consequences.
Alice no longer has to build the program herself---she
can trust that others have checked FooRelay's work.
FooRelay wouldn't dare try to distribute a tainted binary now,
since the community could trivially detect it.
Further,
Alice, Bob, and Carol could all verify that they have the _exact same
version_ of the FooRelay client,
and _all_ be confident that it was compiled from the same source code
that was published.[^verify-checksum]
They could even accept FooRelay from complete strangers and _still_ be
confident that it was compiled from the published source code!
[^verify-checksum]: This verification can be done trivially by verifying
the _checksum_ of a program or distribution archive.
For example,
running `sha512sum foorelay` on a GNU/Linux system would output a _hash_
of the contents of the file `foorelay`.
Alice, Bob, and Carol could then compare this value if they are all
running the same operating system and CPU architecture.
Otherwise they can compare it against published checksums,
or against those of others they trust.
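What `sha512sum` does is simple enough to sketch: stream the file's bytes through SHA-512 and print the hex digest. The file contents below are a hypothetical stand-in for a downloaded binary:

```python
# The core of `sha512sum`: hash a file's bytes and report the digest.
import hashlib
import tempfile


def sha512sum(path: str) -> str:
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read in chunks so arbitrarily large files fit in memory.
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Alice and Bob each hash their downloaded binary and compare results.
with tempfile.NamedTemporaryFile(delete=False) as binary:
    binary.write(b"\x7fELF...foorelay binary contents...")

print(sha512sum(binary.name))
```

Two users get the same 128-character digest exactly when they hold byte-identical files, which is what makes reproducible builds verifiable by anyone.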
Reproducible builds have made a lot of progress in recent years.
As of February 2019,
for example,
[over 93% of all packages on Debian GNU/Linux are reproducible on the
`amd64` architecture][debian-reproducible],
which includes the aforementioned Pidgin and Gajim projects that
implement OMEMO.
[Signal also offers a reproducible Android build][signal-reproducible].
[debian-reproducible]: https://tests.reproducible-builds.org/debian/reproducible.html
[signal-reproducible]: https://signal.org/blog/reproducible-android/
So let's go back to Levy and Robinson's proposal.
How do you implement a ghost in FooRelay when its client source code is
publicly available and its builds are reproducible?
You don't,
unless you can hide the implementation in a plausibly-deniable way and
write it off as a bug.
But anyone that finds that "bug" will fix it and send FooRelay a patch,
which FooRelay would have no choice but to accept unless it wishes to lose
community trust (and provoke a fork).
Mallory could instead target specific users and compromise them
individually,
but this goes beyond the original proposal;
if Mallory can cause Alice, Bob, or Carol to run whatever program he
pleases,
then he doesn't need to be a ghost---he
can just intercept communications _before they are encrypted_.
Therefore,
reproducible builds---if
done correctly---make
Levy and Robinson's attack risky and impractical long-term.
But there is still one weak link---the
fact that Alice, Bob, and Carol are communicating with FooRelay's servers
at all means that Mallory still has the ability to target them by coercing
FooRelay to cooperate with him.
## The Problem With Centralized Services {#centralized-services}
The final issue I want to discuss is that of centralized services.
A centralized service is one where all users communicate through one central
authority---all
messages go through the same servers.
The hypothetical FooRelay is centralized.
Signal, iMessage, Facebook Messenger, WhatsApp, and many other popular chat
services are centralized.
And while this offers certain conveniences for users,
it also makes certain types of surveillance trivial to perform,
since such services are high-value targets for attackers, governments, and
law enforcement.
But services don't have to be centralized.
_Decentralized_ services contain many separate servers to
which users connect,
and those servers can communicate with one-another.
The term _"federated"_ is also used,
most often when describing social networks.[^decentralize-term]
Consider email.
Let's say that Alice has an email address `alice@foo.mail` and Bob has an
email address `bob@quux.mail`.
Alice uses `foo.mail` as her provider,
but Bob uses `quux.mail`.
Despite this,
Alice and Bob can still communicate with one-another.
This works because the `foo.mail` and `quux.mail` mailservers send and
receive mail to and from one-another.
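The routing decision at the heart of federation can be sketched in a few
lines; the domain names come from the example above, but the function and
its return values are purely illustrative, not a real mailserver:

```python
# Minimal sketch of federated routing: a server delivers locally when the
# recipient's domain is its own, and hands the message off to the peer
# server otherwise.
LOCAL_DOMAIN = "foo.mail"

def route(recipient):
    """Decide whether an address is handled locally or relayed."""
    user, _, domain = recipient.partition("@")
    if domain == LOCAL_DOMAIN:
        return ("deliver-locally", user)
    # A real mailserver would look up the domain's MX record in DNS and
    # speak SMTP to whichever server it names.
    return ("relay-to", domain)

print(route("alice@foo.mail"))   # handled by foo.mail itself
print(route("bob@quux.mail"))    # handed off to quux.mail
```

No central authority appears anywhere in this decision: each server only
needs to know its own domain and how to find its peers.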
[^decentralize-term]: While the term "decentralized" has been around for
some time,
there's not really a solid agreed-upon definition for "federated".
[Some people use the terms interchangeably][uu-federated].
The term "federation" is frequently used when talking about social
networking.
[uu-federated]: http://networkcultures.org/unlikeus/resources/articles/what-is-a-federated-network/
[XMPP][]---the protocol on which OMEMO is based---is
a federated protocol.
Users can choose to sign up with existing XMPP servers,
or they can even run their own personal servers.[^me-prosody]
Federation is also the subject of the [ActivityPub][] social networking
protocol,
which is implemented by projects like [Mastodon][], [NextCloud][], and
[PeerTube][].
[Riot][] is an implementation of the [Matrix][] protocol for real-time,
decentralized, end-to-end encrypted communication including chat, voice,
video, file sharing, and more.
All of these things make Mallory's job much more difficult---
instead of being able to go to a handful of popular services like
FooRelay, Signal, WhatsApp, iMessage, Facebook Messenger, and others,
Mallory has to go to potentially _thousands_ of server operators and ask
them to cooperate.[^risk-popular]
[^me-prosody]: I run my own [Prosody][] server,
for example,
which supports OMEMO.
[^risk-popular]: Of course,
there's always the risk of a few small instances becoming very
popular,
which once again makes Mallory's job easier.
[xmpp]: https://en.wikipedia.org/wiki/XMPP
[prosody]: https://prosody.im/
[activitypub]: https://www.w3.org/TR/activitypub/
[mastodon]: https://joinmastodon.org/
[nextcloud]: http://nextcloud.org/
[peertube]: https://joinpeertube.org/
[riot]: https://about.riot.im/
[matrix]: https://matrix.org/docs/guides/faq
[_Peer-to-peer (P2P)_][p2p] (or _distributed_) services forego any sort of
central server and users instead communicate directly with
one-another.[^dht]
In this case,
Mallory has no server operator to go to;
Levy and Robinson's proposal is ineffective in this
environment.[^excuse-me]
[Tox][] is an end-to-end encrypted P2P instant messaging program.
[GNU Jami][jami] is an end-to-end encrypted P2P system with text, audio, and
video support.
BitTorrent, a very popular filesharing protocol, is another example of
P2P software.
[IPFS][] is a peer-to-peer alternative to the Web.
[^excuse-me]: "Excuse me, kind sir/madam,
may I please have your cooperation in spying on your
conversations?"
Another benefit of distributed systems is that they help to
evade censorship,
since no single server can be shut down to prohibit speech.
[^dht]: Though some P2P services offer discovery services.
For example,
[GNU Jami][jami] offers a distributed identity service using
[_distributed hash tables_][dht] (DHTs).
BitTorrent uses a DHT for trackerless peer discovery.
[p2p]: https://en.wikipedia.org/wiki/Peer-to-peer
[tox]: https://tox.chat/
[jami]: https://jami.net/
[ipfs]: https://ipfs.io/
[dht]: https://en.wikipedia.org/wiki/Distributed_hash_table
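The core idea behind the DHTs mentioned above can be illustrated with a
minimal consistent-hashing sketch; the peer names are made up, and real
systems like the BitTorrent DHT use a richer scheme (Kademlia), but the
principle is the same:

```python
import hashlib
from bisect import bisect_left

def ring_id(name):
    """Hash a node name or key into a shared numeric identifier space."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")

# Hypothetical peers, placed on a ring by hashing their names.
nodes = sorted(ring_id(n) for n in ["peer-a", "peer-b", "peer-c", "peer-d"])

def responsible_node(key):
    """The first node at or after the key's position on the ring."""
    i = bisect_left(nodes, ring_id(key))
    return nodes[i % len(nodes)]  # wrap around the end of the ring

# Every peer running the same hash function independently agrees on
# which node is responsible for a key, with no central server to ask.
print(responsible_node("alice's identity record") in nodes)
```

Because responsibility is derived from the hash alone, there is no single
operator whom Mallory can compel to reroute or surrender the lookup.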
_Decentralization puts users in control._
Users have a _choice_ of who to entrust their data and communications with,
or can choose to trust no one and self-host.[^metadata-leak]
Alice, Bob, and Carol may have different threat models---maybe
Carol doesn't want to trust FooRelay.
Maybe Alice, Bob, and Carol can't agree at _all_ on a host.
Nor should they have to.
[^metadata-leak]: Though it is important to understand what sort of data are
leaked (including metadata) in decentralized and distributed systems.
When you send a message in a decentralized system,
that message may be relayed to many individual servers,
increasing the surface area for Mallory to inspect those data.
If there are a couple popular servers that host the majority of users,
Mallory can also just target those servers.
For example,
even if you self-host your email,
if any of your recipients use GMail,
then Google still has a copy of your message.
Self-hosting has another benefit: it helps to [put users in control of their
own computing][saass].[^online-freedom]
Not only do they have control over their own data,
but they also have full control over what the service does on their
behalf.
In the previous section,
I mentioned how free software helps to keep FooRelay honest.
What if FooRelay's _server software_ were _also_ free software?
If Alice can self-host FooRelay's server software and [doesn't like how
FooRelay implements their group chat][whatsapp-vuln],
for example,
she is free to change it.
If Mallory forces FooRelay to implement a feature on their server to allow
him to be added to group chats,
the community may find that as well and Alice can remove that
anti-feature from her self-hosted version.
[^online-freedom]: I go into more information on the problems with modern
software on the web in [my LibrePlanet 2016 talk "Restore Online
Freedom!"][rof].
[saass]: http://www.gnu.org/philosophy/who-does-that-server-really-serve.html
[rof]: https://mikegerwitz.com/talks#online-freedom
## Please Continue Debating
This article ended up being significantly longer and more substantive than I
had originally set out to write.
I hope that it has provided useful information and perspective that
was missing from many of the existing discussions,
and I hope that I have provided enough resources for further research.
The prominent responses to which I referred (some of which were already
referenced above) are analyses by
[Susan Landau][landau],
[Matthew Green][green-ghost],
[Bruce Schneier][schneier],
[Nate Cardozo and Seth Schoen of the EFF][detect-ghosts],
and [another by Nate Cardozo][eff-ghost].
There are surely others,
but these were the ones that motivated this article.
It is important to keep these encryption debates alive.
The crypto wars are far from over.
We must ensure that we provide users with the tools and information
necessary to defend themselves and one-another---tools
and practices that are immune from government interference unless they
themselves become illegal.
What a grim and dangerous world that would be.
I'm most concerned by the lack of debate from community leaders about the
issues of [software freedom](#betrayed),
[reproducibility](#reproducibility), and
[decentralization](#centralized-services).
These are essential topics that I feel must be encouraged if we are to
ensure the [safety and security][sapsf] of people everywhere.[^disagree]
We need more people talking about them!
If you found these arguments convincing,
I would appreciate your help in spreading the word.
If you didn't,
please reach out to me and tell me why;
I would very much like to hear and understand your perspective.
[landau]: https://www.lawfareblog.com/exceptional-access-devil-details-0
[schneier]: https://www.schneier.com/essays/archives/2019/01/evaluating_the_gchq_.html
[sapsf]: /talks/#sapsf
[^disagree]: But I also know that there are many people that disagree with
me on each of these points!
If that weren't the case,
I wouldn't need to be an activist.