Fully Countering Trusting Trust through Diverse Double-Compiling (DDC) – Countering Trojan Horse attacks on Compilers





Here’s information about my work to counter the “Trusting Trust” attack.
The “Trusting Trust” attack is an incredibly nasty attack in computer security;
until now it has been presumed to be essentially uncounterable.
I’ve worried about it for a long time,
essentially since Ken Thompson publicly described it.
After all, if there’s a known attack that cannot
be effectively countered, should we be using computers at all?
Thankfully, I think there is an effective countermeasure, which I have named
“Diverse Double-Compiling” (DDC).

This page notes my
2009 PhD dissertation and its preceding
2005 ACSAC paper, a little about
citing my work, and
detailed data (to duplicate the experiments).
It then has sections on
countering misconceptions,
applying this to hardware,
software patents and application programmer interface (API) copyrights,
credit where credit is due, and
who’s talking about it?
We then have a section on
real-world application of DDC, specifically
discussing GNU Mes.
The page concludes with a large section on
some related material.

Fully Countering Trusting Trust through Diverse Double-Compiling
(PDF version,
HTML version,
OpenDocument text version,
Perma.cc link to PDF,
arXiv:1004.5534 of PDF,
GMU Mason Archival
Repository Service (MARS))
is my 2009 PhD dissertation explaining how to counter the
“trusting trust” attack by using the “Diverse Double-Compiling” (DDC)
technique.
This dissertation was accepted by my PhD committee on October 26, 2009.

The video
of my official public defense is also available;
this presentation was given on November 23, 2009, 1-3pm
(podcast/RSS available).
The presentation materials are also available in

PDF and

OpenDocument (ODP)
formats.
The public defense was held at
George Mason University,
Fairfax, Virginia, Innovation Hall, room 105
[location on campus] [Google map].

Here’s the abstract of the dissertation:


An Air Force evaluation of Multics, and Ken Thompson’s Turing award
lecture (“Reflections
on Trusting Trust”), showed that compilers can
be subverted to insert malicious Trojan horses into critical software,
including themselves. If this “trusting trust” attack goes undetected,
even complete analysis of a system’s source code will not find the
malicious code that is running. Previously-known countermeasures have
been grossly inadequate. If this attack cannot be countered, attackers
can quietly subvert entire classes of computer systems, gaining complete
control over financial, infrastructure, military, and/or business
system infrastructures worldwide. This dissertation’s thesis is that
the trusting trust attack can be detected and effectively countered
using the “Diverse Double-Compiling” (DDC) technique, as demonstrated by
(1) a formal proof that DDC can determine if source code and generated
executable code correspond, (2) a demonstration of DDC with four compilers
(a small C compiler, a small Lisp compiler, a small maliciously corrupted
Lisp compiler, and a large industrial-strength C compiler, GCC), and
(3) a description of approaches for applying DDC in various real-world
scenarios. In the DDC technique, source code is compiled twice: once
with a second (trusted) compiler (using the source code of the compiler’s
parent), and then the compiler source code is compiled using the result
of the first compilation. If the result is bit-for-bit identical with
the untrusted executable, then the source code accurately represents
the executable.
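
To make the steps concrete, here is a minimal sketch of DDC as a shell session. The command and file names are hypothetical: `trusted-cc` stands for the trusted compiler cT, `sP.c` for the source of the compiler-under-test’s parent, `sA.c` for the source of the compiler under test, and `cA` for its untrusted executable.

  $ trusted-cc -o stage1 sP.c   # Step 1: compile the parent's source
                                # with the trusted compiler cT.
  $ ./stage1 -o stage2 sA.c     # Step 2: compile the compiler-under-test's
                                # source with the stage-1 result.
  $ cmp stage2 cA               # Bit-for-bit identical? Then the source
                                # accurately represents the executable.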

The dissertation includes a section explaining how it extends
my previous 2005 ACSAC paper.
The dissertation generalizes the ACSAC paper
(now compilers don’t need to self-parent), includes formal proofs, and
includes demonstrations with GCC (to demonstrate scalability)
and with a malicious compiler.

If you read the dissertation you should also look at the
dissertation errata
(the errata are trivial and
do not impact the fundamentals of anything in the dissertation).

My thanks go to the committee members, who were very helpful.
Special thanks go to Dr. Ravi Sandhu; I wanted to do a PhD dissertation
that was completely off the beaten path, and he was flexible enough to
let me do it.
He also had some great advice for getting through the process.
Dr. Daniel A. Menascé asked me to demonstrate the approach with a
malicious compiler (which I did).
Dr. Jeff Offutt asked me about its relationship to
N-version programming (so I added material about how this differs
from N-version programming).
Dr. Paul Ammann had some interesting comments about the N-version programming
material; it turns out that he was personally involved in that landmark study!
Dr. Yutao Zhong asked me about T-diagrams (so I added material about why
I did not use them).
Everyone on the committee asked good questions, especially in the
private presentations before the public defense; thank you!

Here’s my 2005 paper, which was formally reviewed and published by ACSAC:

Countering Trusting Trust through Diverse Double-Compiling (DDC),
David A. Wheeler,
Proceedings of the Twenty-First
Annual Computer
Security Applications Conference
(ACSAC)
, December 5-9, 2005, Tucson, Arizona, pp. 28-40,
Los Alamitos: IEEE Computer Society.
ISBN 0-7695-2461-3, ISSN 1063-9527, IEEE Computer Society Order Number P2461.
If you cannot get that paper from ACSAC, here’s a local copy of
Countering Trusting Trust through Diverse Double-Compiling (DDC) as posted by ACSAC.
You can also get

this alternative PDF of
“Countering Trusting Trust through Diverse Double-Compiling (DDC)”
and

OpenDocument form of
“Countering Trusting Trust through Diverse Double-Compiling (DDC)”.
(I have the rights to publish it here
as well.)

I’m honored to have been accepted by the ACSAC 2005 conference.
They get lots of good submissions, yet in 2005 they rejected 77% of
their submitted papers.
One reason that I submitted to ACSAC is that I believe publication on the
web is absolutely critical for widespread use of a result;
ACSAC has been publishing on the web for a long time now, and is an
open access
conference.

There’s a minor change in notation between the ACSAC paper and the
later dissertation:

Item                  ACSAC (2005)            Dissertation (2009)
Trusted compiler      T                       cT
Compiler under test   A                       cA
Parent compiler       (none; self-parented)   cP

I have a presentation based on the ACSAC paper.
I gave the original presentation at ACSAC; I’ve since updated it a little
based on various feedback I’ve received.

You can get the presentation in PDF and OpenDocument formats.

Note: The ACSAC 2005 paper
“Countering Trusting Trust through Diverse Double-Compiling”
has a typo.
In the last paragraph of section 4, just ahead of the figure, it says:
“if c(sA,c(sA,T)), A, and c(sA,T) are identical, …”.
The “c(sA,T)” should be “c(sA,A)”; you can confirm this because the
figure clearly shows “c(sA,A)” not “c(sA,T)”.
My thanks to Ulf Dittmer for pointing this out to me!

If you cite my work,
at least include my middle initial “A.”, and if at all possible
please use “David A. Wheeler”. Please do not cite me as
“David Wheeler” or “D. Wheeler” in any written work
(including electronic media like the Internet).
There are too many David Wheelers,
so it’s like not giving me credit at all.
If you are required by forces outside your control to use initials,
at least use “D. A. Wheeler”.
However, I would really appreciate it if you showed me the
courtesy of using my name as I use it, instead of changing it.
In general, please cite the names that people actually use; please don’t
change them into someone else’s name.
Thanks.
This doesn’t apply to talking in person, of course; usually there
aren’t so many David Wheelers in the room that it gets
confusing :-).

I strongly believe that scientific work must be repeatable.
Sadly, much of so-called computational science is no longer science,
because it is increasingly impossible to reproduce its results.
This problem is no secret; it is discussed in papers such as
“Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science” by Victoria C. Stodden (Computing in Science & Engineering, 2010).
It’s not just computer science either; there is a widespread
replication crisis in science.
See also the blog post on the paper
Why most of psychology is statistically unfalsifiable.
“False-Positive Psychology: Undisclosed
Flexibility in Data Collection and Analysis
Allows Presenting Anything as Significant” by
Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn
discusses some of the dubious practices
that allow practically anything to be “experimentally proven”.
Science is not the only source of truth, but if you’re going to call it
science, it needs to actually be science.

In contrast, I do provide the information necessary to
reproduce this work.
For the ACSAC paper, see my Tiny C Compiler (tcc) page for
how to duplicate the ACSAC experiment, as well as
other tcc-related work too.
For the PhD dissertation,
see the
separate page on detailed data for the PhD dissertation.
These provide enough information to repeat or extend the experiments.

Some misconceptions seem to be especially hard to shake, so let me
counter them here (as well).

The DDC approach does not assume that two completely
different compilers will produce the same binary output,
given the same input.

I say it in the ACSAC paper, and again in the dissertation,
but somehow it does not sink in, so let me try again.

Neither the ACSAC paper nor the dissertation assumes that
different compilers produce equal results.
In fact, both specifically state
that different compilers normally produce different results.
In fact, as noted in the paper, it’s an improvement
if the trusted compiler generates code for a different CPU architecture
than the compiler under test (say, M68000 and 80x86).
Clearly, if they’re generating code for
different CPUs, the binary output of the two compilers
cannot always be identical in the general case!

This approach does require that the trusted compiler be able to compile
the source code of the parent of the compiler under test.
You can’t use a Java compiler to directly compile C code.

For the pedants: Yes, sometimes it’s possible to write machine code that
runs on multiple yet radically different CPU architectures,
depending on the architectures.
You may even be able to devise code that
determines which architecture it’s running on,
and then jumps to the “right” code for that architecture.
These would exploit the exact values of various machine codes, and are
certainly clever hacks.
But if you want to do that,
fat binaries
with multiple segments (each for a different architecture)
are a better approach — they’re designed to do that cleanly.
In any case, that’s not the point; the point is that
the compiler-under-test and the trusted compiler are not
required to generate identical code as output.

Non-deterministic hardware is okay in DDC

DDC does require that the parent compiler be deterministic
when it compiles the compiler under test.
That’s not the same as
assuming that two different compilers always
produce identical results.
A compiler is deterministic if, when
run twice on identical input (with all the same option flags, etc.),
it produces the same output.
You can use a random number generator, as long as you give the user
control over the random number generator seed
(gcc, for example, has a command line option for setting the seed).
For example, on a Unix/Linux system, you should be able to do this:

  $ mycompiler input.c     # Compile, store result in "a.out".
  $ mv a.out a.out.saved   # Save old result.
  $ mycompiler input.c     # Do it again.
  $ cmp a.out a.out.saved  # If always identical, it's deterministic.

This is a relatively easy constraint, and one that most compiler authors
want to be true anyway (since non-deterministic compilers are hard to debug).
Compilers generally are deterministic, with the possible
exception of embedded timestamps — and I discuss how to handle
embedded timestamps in the paper.
Sometimes you may need to use a flag (e.g., to set a random number
generator seed as in the GCC C++ compiler).
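
For instance, here is how such a check might look with GCC (a sketch; `-frandom-seed` is GCC’s existing flag for fixing its internal random-number seed):

  $ gcc -frandom-seed=0 -c -o run1.o input.c   # Build once with a fixed seed...
  $ gcc -frandom-seed=0 -c -o run2.o input.c   # ...and again with the same seed.
  $ cmp run1.o run2.o                          # The two should be identical.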

The parent compiler may internally use constructs that are
individually non-deterministic (such as threads with
non-deterministic scheduling), but if it does it must use
those mechanisms in a way that ensures that the output will be the same on
each execution given the same input.
Today’s underlying CPUs have all sorts of non-deterministic properties
(e.g., from threading, multiple cores, or timing variances);
“modern CPUs are inherently random and a complex
general purpose OS on top amplifies
this inherent randomness substantially”
[“Analysis of inherent randomness of the Linux kernel”
by Nicholas Mc Guire, Peter Okech, and Georg Schiesser].
But if the CPU were so non-deterministic that
you could not reliably write data in a particular order,
you couldn’t get a compiler or any other program to run.
So the parent compiler simply needs to be written in a way that ensures
that these effects will not impact its results.
For example, the parent compiler could use
locks to ensure that thread scheduling variation does not cause
variation in the results.
In practice, developers tend to do this anyway.

The trusted compiler (“compiler T” in the ACSAC paper, and
“compiler cT” in the dissertation)
doesn’t need to be deterministic.

See assumption sP_portable_and_deterministic in the dissertation
if you want more details.

DDC’s use of trusted compiler(s) fundamentally increases trustworthiness

Some past approaches used a second compiler, but they basically just
switched which compiler you had to trust completely.
Indeed, you might make things worse, if you switch from an unsubverted
compiler to a subverted compiler.

DDC, in contrast, uses additional compilers as a check on the first.
This fundamentally changes things, because now an attacker must
simultaneously subvert both the original compiler,
and all of the compilers used in DDC.
Subverting multiple compilers is much harder than subverting one, especially
since the defender can choose which compilers to use in DDC
and can choose the compilers used in DDC after the attack has been performed.

Why not use the trusted compiler for everything?

Using a different trusted compiler greatly increases the
confidence that the compiler executable corresponds with its source code.
When a second compiler is used as part of DDC, an attacker must subvert
multiple executables and executable-generation processes to perform the
“trusting trust” attack without detection.
If you used only the trusted compiler,
you would be back to the original problem:
total trust in a single compiler executable without a
viable verification process.

Also, as explained in

section 4.6,
there are many reasons the trusted compiler might not be suitable for
general use.
It may be slow, produce slow code, generate code for a
different CPU architecture than desired, be costly, or have undesirable
software license restrictions.
It may lack many useful functions necessary for general-purpose use.
In DDC, the trusted compiler only needs to be able to compile the parent;
there is no need for it to provide other functions.

Finally, note that the “trusted” compiler(s) could be malicious
and still work well for DDC.
We just need justified confidence that any triggers or payloads
in a trusted compiler do not affect the DDC process when applied
to the compiler-under-test.
That is much, much easier to justify.

Does applying DDC by itself guarantee the compiler isn’t malicious?

No, applying DDC by itself does not guarantee that the
compiler isn’t malicious, or that the compiler is not doing something
surprising to you, or that the compiler has no defects.
For example, in 2016 it was discovered that

Microsoft Visual Studio 2015 Update 2 was quietly inserting telemetry calls into compiled programs by default,
even though this was not well documented and could harm privacy.
That’s not the sort of thing that DDC could typically detect.

Passing the DDC test
simply means that you can read compiler source code to see
what the compiler does,
instead of having to review executable (binary) code.
But that’s a difference that matters: Developers are used to looking
at source code, since that’s what they normally do.
DDC turns an intractable challenge into a normal review process.

What does “fully” mean?

By “fully” I mean that
“the trusting trust attack can be
detected and effectively countered”
(as I say in the thesis).
A little background may help illustrate why I use the word
“fully”.

First, complaining that people trust others is a waste of time.
You must trust others in a modern world.
No one grows all their own food, builds their own shelters from their own
materials, and provides all their other needs by themselves;
we all trust others.
However, there is a serious systemic problem
if you cannot independently verify what you trust.
You should strive to “trust, but verify”.

I believe the fundamental problem caused by the trusting trust attack was that
it was impractical to independently verify that what you depended on
(the executable) corresponds to its human-readable
representation (the source code).
This is because program-handling programs can subvert the
relationship between what humans review and what is actually used.
Ken Thompson’s paper is not titled “Reflections on trust”; it is
“Reflections on trusting trust”.
Again, I believe the problem was not trust,
but the lack of a meaningful process for independent verification.

With DDC, we now have a practical process
to independently verify
that source code and executable correspond.
DDC fully counters the problem
that we lacked a practical independent verification process
for program-handling programs (like compilers).

I believe it’s important that
we understand the limitations of any result.
Section 8.14
explains, in detail, how an attacker can subvert DDC.
Because DDC has been proven using a formal mathematical proof,
the only way to defeat DDC is
to falsify one of the proof assumptions.
A defender can make such falsification very difficult.
For example,
the defender, not the attacker, gets to choose the compiler(s)
used as the trusted compiler(s); the defender can even write one himself.
It’s true that an unwise defender can depend on components that are
not really diverse, but section 6 describes how to get that diversity.
Once the defender knows that diversity is a goal,
the defender can come up with all sorts of ways to provide it.

My goal was to create a process for independent verification.
DDC provides an independent verification process, and one that
can be practically applied.
I applied the DDC process to four different compiler executables,
and one of them was the widely-used gcc.
Therefore, DDC fully meets the need for an independent verification process
that can be practically applied.

So why did I put the word fully in the dissertation title at all?
Well, I needed to find some way to
differentiate the titles of the ACSAC paper and the PhD dissertation.
I realized that my older ACSAC paper had an important limitation:
it only applied to self-parented compilers.
Many compilers are not self-parented, and thus, the older ACSAC paper
process could not apply to many compiler executables in use.
In contrast, the 2009 dissertation can address all compilers,
self-parenting or not.
Thus, the dissertation “fully” provides a process
for verifying compiler executables, whether they are self-parented or not.
I should note that even if I wanted to, I cannot change the title now :-).


I mentioned applying this DDC approach to hardware
in the dissertation and at the ACSAC conference.
Obviously, if your software is okay, but the hardware is subverted,
you’re still subverted.
The ACSAC presentation and dissertation talk about this in more detail.
DDC can be applied to hardware as well as software.
As I also mentioned, there are two problem areas: legal and technical.

The legal problem is that increasingly chip designers
and chip manufacturers cannot legally know what is supposed
to be on the chip.
For example, developers of the various “IP cores” used on chips
typically forbid chip designers and manufacturers from obtaining or
using this information.

The key technical problem is creating a meaningful
“equality” test in hardware.
I speculate that various techniques, such as scanning electron microscopes,
could be used to help implement an equality test.
Other hardware validation mechanisms
(e.g., see
Semiconductor IP Validation Gets Faster) might
also play a role.
But it is fundamentally harder to implement equality tests for
hardware (compared to software).
I cited several papers in my dissertation about this.
You can learn more about the challenge from papers published since then,
such as
“Stealthy Dopant-Level Hardware Trojans”
by Georg T. Becker, Francesco Regazzoni, Christof Paar,
and Wayne P. Burleson
(Bruce Schneier briefly discusses this), as well as
“Integrated Circuit Security Threats and Hardware Assurance Countermeasures” by Karen Mercedes Goertzel (CrossTalk, November/December 2013)
[alternate URL].
“A2: Analog Malicious Hardware” by Kaiyuan Yang, Matthew Hicks, Qing Dong, Todd Austin, and Dennis Sylvester
show “how a fabrication-time attacker can
leverage analog circuits to create a hardware attack that is
small (i.e., requires as little as one gate) and stealthy (i.e.,
requires an unlikely trigger sequence before effecting a chip’s
functionality).”
Researchers at the University of Michigan demonstrated A2 in 2016; this sabotaged processor could be planted by a single employee at a chip factory.

However, while there are challenges, there is also hope.
As noted in
“X-Ray Tech Lays Chip Secrets Bare” by Samuel K. Moore (IEEE Spectrum, 2019-10-07),
researchers in Switzerland and the U.S. have a non-destructive technique that
has the potential to reverse engineer an entire chip without damaging it.
This approach, called
ptychographic
X-ray laminography, could possibly “be used by
integrated circuit designers to verify that manufactured chips
match their designs.”

Countering subverted hardware is
definitely an area for potential future research.
More generally, I think there’s a growing problem in
logic hardware verification.
The Growing Semiconductor Design Problem is
an especially easy-to-understand summary about the problem of
hardware verification, aka the verification gap.

The approach described here only works when you can create
alternative implementations of computer languages (compilers).
There is no technical problem in doing so, but some organizations are
trying to make it difficult to legally create alternative
implementations.

Any limitation on creating or distributing alternative implementations
of a computer language creates a dangerous threat to any user of that
computer language.
It also creates a threat to any user of programs developed (directly or
indirectly) with that language.

Computer application programmer interfaces (APIs) and languages are generally
held to be outside the scope of copyright.
Specific implementations and their documentation
can be copyrighted, but APIs and languages are fundamentally ideas
and not just fixed expressions.
This was long understood, but many rulings in 2012 (in the US and Europe)
make this even clearer… though there are some storm clouds
that threaten this.
The
Oracle v. Google “Order RE Copyrightability
of Certain Replicated Elements of the Java Application programming Interface”
of 2012
found that
“So long as the specific code used to implement a method is different,
anyone is free under the Copyright Act to write his or her own code
to carry out exactly the same function or specification of any methods
used in the Java API. It does not matter that the declaration or
method header lines are identical.
Under the rules of Java, they must be identical to declare a
method specifying the same functionality” even when the
implementation is different.
When there is only one way to express an idea or function,
then everyone is free to do so and
no one can monopolize that expression.
And, while the Android method and class names could
have been different from the names of their counterparts in Java
and still have worked, copyright
protection never extends to names or short phrases as a matter of law. …
This command structure is a system or method of operation under
Section 102(b) of the Copyright Act and, therefore, cannot be copyrighted.”
(Groklaw
has this as text.)
Similarly, the Court of Justice of the European Union found
in SAS Institute v. World Programming Ltd., Judgment in Case C-406/10, that “The functionality of a computer program and the programming language cannot be protected by copyright.”
(Here are the actual judgements of C-406/10.)
Copyright, under U.S. law, specifically does not cover any
“idea, procedure, process,
system, method of operation, concept, principle, or discovery”;
the history and justification of this (note that the list is much more
than just ideas)
is given in

“Why Copyright Law Excludes Systems and
Processes from the Scope of Its Protection” by Pamela Samuelson.
However,
on May 9, 2014, the Federal Circuit partially
reversed the district court ruling, ruling in Oracle’s favor
on the copyrightability issue,
and remanding the issue of fair use to the district court.
I hope this will be construed narrowly; if broadly interpreted, then
copyright might effectively prevent all future competition.
As a practical matter, software components must work with each other; if
a company can prevent compatible implementations, then that company
can effectively prevent meaningful competition and
verification measures like DDC.
See
Computer Scientists Ask Supreme Court to Rule APIs Can’t Be Copyrighted for more information about
APIs and copyright.

Sadly, the risk from patents is still significant, as discussed in the
dissertation.
See my
page on software patents for more.

I used the word “trusted” when referring to the “trusted compiler”.
I should note that there is a big difference between the words
“trusted” and “trustworthy”.
Something is trustworthy if there is evidence that it is worthy of trust;
something is trusted if someone trusts it (hopefully because they have
determined that it is trustworthy).
If you use DDC, you need to use a trusted compiler — since you are
trusting its results, by definition it is trusted.
You should choose a trustworthy compiler as the trusted compiler,
however.

The good news is that you do not need to use a totally perfect,
never-makes-a-mistake compiler; such compilers are rare.
Instead, you just have to use a compiler that meets the conditions
described in the paper, which are much easier conditions to meet.

I tried to summarize some lessons learned on how to use
tools to prove things in my short paper

“How to prove stuff automatically”.

After my paper was published I learned of another subverted compiler
example (in the trusting trust sense) in
Mike Stute’s answer to
“What is a coder’s worst nightmare?”.
He tried to modify a program but found he couldn’t do it successfully.
After 15 days of work, “I suddenly realize it’s in the compiler…
every time you compile the original code and run it
puts in the subliminal message code into the source code…
Several days later.. we recompile the compiler from the source.
That solves it… Except it didn’t…
The ex-grad student had poisoned the compiler to poison itself when it
was recompiled…
We also found that if /sbin/login is compiled it puts
in a backdoor allowing anyone who uses a specific password
to login in as the root user. This computer is accessible by modem and Tymnet.
Finally this gets the computing center’s attention.
Genius! But put to a horrible cause.”

As I clearly note in the paper, I didn’t come up with the original idea
for the DDC countermeasure.
The
original idea
was dreamed up by the amazingly bright Henry Spencer.
However, he never pursued it; in fact over time he’d forgotten about it.
I took his few sentences describing his idea and greatly expanded on it,
including a much more detailed and analyzed description of it, as well as
justifying and demonstrating it.
For example, his original approach presumed self-parenting,
a limitation my PhD dissertation removes.
My thanks to him for his original idea, and for his helpful comments since.

I also want to credit those who made the world aware of the problem
in the first place: Paul Karger, Roger Schell, and Ken Thompson.
Paul Karger and Roger Schell’s groundbreaking analysis of Multics
was the first time that this issue was identified.
A key step in fixing a problem is knowing there’s a problem in the first place!
I had several great conversations with Paul Karger, who was very
enthusiastic about this work and provided several helpful comments.
Sadly, Paul Karger died in 2010, and that is a loss for the world;
the good news is that when he died, he knew about my solution
and was quite happy about it.
I also talked with Roger Schell about it.
I also want to thank Ken Thompson, who (among his legion of
accomplishments) demonstrated this attack and made far more people
aware of the problem.

The first syllabus that included my ACSAC 2005 paper as required reading is

CSC 593: Secure Software Engineering Seminar,
a Spring 2006 class taught by Dr. James Walden at Northern Kentucky University.
He paired my paper with Ken Thompson’s classic 1984 paper
Reflections
on Trusting Trust
.
It was also a subject of a class session
at George Mason University (GMU)’s
“Advanced Topics in Computer Security: Cyber-Identity, Authority and Trust”
(IT962) taught by Ravi Sandhu. I had the honor of visiting for the day
and giving the presentation myself for their Spring 2006 session.

Technische Universität Dortmund’s
Lehrstuhl Informatik VI (Dr. Ulrich Flegel and Dr. Michael Meier)
(WS 2007/2008)
include it, too.
It’s specifically noted in
Linux Luddites
podcast #21 (August 2, 2014) starting at 1:41 as well.

The ACSAC paper is cited in various places, including

“Increasing Open Source Software Integration
on the Department of Defense Unclassified Desktop”
by Steven Anthony Schearer (June 2008), a
Naval Postgraduate School (NPS) thesis,

“How Practical Are Intrusion-Tolerant Distributed Systems?”
by Obelheiro et al.
(Sep 2006), Department of Informatics, University of Lisbon, and

the PhD thesis
“Tamper-resistant Peer-to-Peer Storage for File Integrity Checking”
by Alexander Zangerl, Bond University, School of Information Technology
(August 2006).

The ACSAC paper has been noted or discussed at many locations, including

Bugtraq,

comp.risks (the Risks digest),

Bruce Schneier’s weblog (the source for Crypto-Gram),

Lambda the Ultimate,

SC-L (the Secure Coding mailing list),

LinuxSecurity.com,

Chi Publishing’s Information Security Bulletin,
Wikipedia’s “Backdoor”
article,

Open Web Application Security Project (OWASP) (mailing list),
and others.

Bruce Schneier’s page in particular
includes a lengthy commentary about it, and both
his site and Lambda the Ultimate have various blog entries.
The article

Open Source is Securable
discusses the paper and its ramifications — in particular, it’s finally
possible to make very strong claims through source code analysis.

BartK’s “Defeating the Trust Attack”
summarized the PhD dissertation; this triggered
a spirited reddit discussion
in September 2013.

There was a lively
discussion of the dissertation on Y Combinator’s “Hacker News”
in October 2016.

Diverse Double-Compiling to Harden Cryptocurrency Software
by Niklas Rosencrantz, 2023, KTH, School of Electrical Engineering and Computer Science (EECS),
is follow-on work. Its abstract is:
“A trusting trust attack is a special case of a software supply-chain
attack. The project in this report, named diverse double-compiling
for cryptocurrency (DDC4CC), demonstrates and explains a defense
for cryptocurrency software against trusting trust attacks. DDC4CC
includes a case study that implements a trusting trust attack and
the defense applied to a hypothetical theft of cryptocurrency on
the Bitcoin blockchain. The motivation for such an attack is easy
to understand: An adversary can acquire significant monetary funds
by manipulating economic or decentralized financial systems. For a
supply-chain attack in general, the outcome is potentially even
more severe. An adversary can control entire organizations and even
the systems belonging to the organization’s customers if the supply
chain is compromised. Attacks are possible when targets are inherently
vulnerable due to trust in their suppliers and trust in the supply
chain, i.e., the hardware constructors and the software authors,
the upstream development team, and the dependencies in the supply
chain.”
This is a Master’s thesis.

The video
The Original Sin of Computing…that no one can fix
by LaurieWired cites this work.
I don’t agree with the title (this is fixable), but I’m always
happy to get a citation!

The dissertation itself also brings together
technical areas that aren’t often combined.
The practical demonstrations
involved analyzing machine code (not just assembly code!)
produced by C compilers, as well as S-expressions generated by Lisp.
To prove that this really worked, I ended up using
first-order predicate logic (a mathematical logic notation) and
various tools to help automate its use.
My mathematical models ended up having to account for stuff like
different text encoding systems, because I wanted the models to
accurately model the real world enough to really counter the attack.
Some dissertations go deeply into the technical details of machine code,
while others fly into the abstractions of mathematical proof;
far fewer do both.
Frankly, I think that unusual combination makes the result
more interesting; I hope you do too.

A lot of people were sure that what I was doing could not be done,
so I did everything I could to prove it correct.
I don’t just provide a mathematical proof; I provide a formal proof,
where absolutely every step is spelled out
(most proofs in math books “skip the details” but I do not).
I presented the proof in Hilbert (3-column) style, giving justifications
for absolutely every step.
I directly used the output of a prover tool; I could have massaged it for
clarity, but by using the output directly, I avoid the charge that I made
a mistake in its transformation, and even more importantly I could
use a separate tool (ivy) to double-check the proof.
A lot of people do not have a background in this
area of mathematics, so I
give references to where the various steps come from, and I explain
in detail each of the starting mathematical statements.

This section discusses the application of DDC in the real world.
I know of at least one real-world application of DDC,
specifically for the GNU Mes C compiler.
So let’s discuss that!

DDC Application: GNU Mes C Compiler bootstrap

GNU Mes is
“a Scheme interpreter and C compiler for bootstrapping the GNU
System. Since version 0.22 it has again helped to halve the size
of opaque, uninspectable binary seeds that are currently being used
in the Reduced Binary Seed bootstrap of GNU Guix. The final goal
is to help create a full source bootstrap as part of the bootstrappable
builds effort for UNIX-like operating systems.”

Put differently, the idea is that GNU Mes, if trustworthy, can be
used as a starting point to rebuild everything else from source code, giving
you much greater confidence that the source code is an accurate representation.
GNU Mes can in turn be used to bootstrap an entire distribution, and in
particular is used to bootstrap GNU Guix (a complete operating system).
But how can you be sure that the executable of GNU Mes is trustworthy?
That seems like a chicken-and-egg kind of problem.
One way to gain confidence is DDC.

The post
Reproducible bootstrap of Mes C compiler
describes their application of DDC to gain confidence in the Mes C compiler’s
binary.
They used three different distributions
(GNU Guix, Nix, and Debian) with three different major versions of GCC to
recompile GNU Mes.
They later used the tcc compiler as well (though details about that
are sketchy).
In all cases they recreated a bit-for-bit identical result of the GNU Mes
C compiler!
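
In outline, the final check is simple: hash the independently rebuilt binaries and confirm they match. Here is a sketch with hypothetical file names; the real builds were driven by each distribution’s own packaging tools.

  $ sha256sum mes-built-on-guix mes-built-on-nix mes-built-on-debian
  # A passing result shows the same hash on all three lines:
  # bit-for-bit identical executables from independent toolchains.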

As with all real-world uses there are limitations.
DDC must be executed using only trusted processes and programs,
and “trusted” in this case means that
there is “justified confidence that it does not have triggers and
payloads that would affect the results of DDC” (DDC dissertation
section 6).

The application described here shows that several different distributions
with different executables produce the same underlying result.
However,
three of these rebuilds used the same compiler, specifically GCC
(albeit different versions).
These tests use similar and highly related distributions; they even
use many of the same underlying components like glibc, the Linux kernel, and so
on (though again, with different versions).

So while this does use DDC, and it does increase confidence,
it increases confidence only to a limited extent, because the
checking systems are relatively similar.
They hope to use an even
more diverse set of compilers in the future, which would give even greater
confidence.

This is an awesome application of DDC, and I believe it’s the first
publicly acknowledged use of DDC on a binary (beyond what I did).
Basically, GNU Mes is designed to provide a safe starting point, and
DDC can be used to verify that GNU Mes is providing a safe starting point.
This application
of DDC can be improved over time, and that is wonderful.

Many people have worked in related areas,
in particular, to implement
reproducible (deterministic) builds
(which enable exact recreation of executables given source code)
or proving that programs do what they say they do.
I mention some of the issues, and counter-measures, in
“Countering Development Environment Attacks”
at the 2015 RSA Conference in San Francisco
(this was a presentation by Dan Reddy and me).

Here are some pointers.

Toolchain attacks: Real and claimed

It’s important to protect the development environment, including
its development toolchain.
Ken Thompson demonstrated an attack on the toolchain in the 1980s,
and it was a full-blown “trusting trust” attack.
My dissertation also discussed an attack on the Delphi compiler.

Ken Thompson Really Did Launch His “Trusting Trust” Trojan Attack in Real Life
provides a copy of the full Usenet message discussing the
subversion demonstrated by Ken Thompson.
More sources available via
mail-archive.com
and
Google Groups.
It’s important to note that
“the compiler was never released outside”, but that was because
Ken Thompson was ethical, not because it could not be done.

Perhaps more importantly, today the source code for Ken Thompson’s
attack is now available and annotated.
“Running the ‘Reflections on Trusting Trust’ Compiler” by
Russ Cox (2023-10-25)
shares that source code, along with a detailed discussion of it.
This version is the corrected version that didn’t keep generating
slightly longer versions of the generated compiler.
You can experiment with
this attack on a web-based simulator.

I also want to correct the record on a minor point about Thompson’s work.
At one time I thought the subverted compiler wasn’t detected.
I know of no evidence that it was detected as
being an attack, but due to a bug in how
Ken Thompson implemented it,
it did eventually fail.
The
post by Ken Thompson on 2021-09-20 titled
“[TUHS] Thompson trojan put into practice”
noted that every time it was recompiled it grew by another byte.
They noted this oddity, and after that
“they played with it until they broke the quine part”
(how it worked).
Ken noted that “the extra byte was my bug” and said
“I’m not sure … if they ever realized what was going on.”
This was
discussed in Hacker News.
It would be unwise to assume that future attacks would be buggy,
especially since it would be easy to detect this kind of bug
(make sure the subverted compiler generates the same bytes each time).
The now-posted version of the attack doesn’t include this bug.
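
For example, a simple self-regeneration check like the following (a sketch, with a hypothetical compiler name) would have caught the growing-by-one-byte bug:

  $ cc -o gen1 cc.c        # Build the compiler from its source...
  $ ./gen1 -o gen2 cc.c    # ...rebuild it with itself...
  $ ./gen2 -o gen3 cc.c    # ...and once more.
  $ cmp gen2 gen3          # Later generations should be byte-identical;
                           # a binary that keeps growing is a red flag.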

“The Octopus Scanner Malware: Attacking the open source supply chain”
by Alvaro Muñoz (2020-05-28)
discusses the Octopus Scanner malware,
“designed to enumerate and backdoor NetBeans projects,
and which uses the build process and its resulting artifacts to spread itself.”

Operation ShadowHammer was disclosed in March 2019 by Kaspersky Lab,
and one thing it did was sabotage developer tools.
As noted in
“Operation ShadowHammer Exploited Weaknesses in the Software Pipeline”
by Fahmida Rashid, IEEE Spectrum, 1 May 2019,
“Attackers also targeted at least three gaming companies…
the attackers made a one-line change to their targets’ integrated
development environment (IDE), a software program that developers use
to write code. The effect was that whenever Microsoft Visual Studio
compiled code with a specific Microsoft-owned library, the IDE used a
similarly-named library file instead.
Compilers and development platforms are at the core of the
software supply chain…
One infected compiler on a few developers’ machines can result in
thousands of Trojanized software applications installed on millions of
end-user computers.
“It’s a poisonous seed. Plant your poisonous seed in a safe place, and it will turn into the poisonous tree with fruit.” …
Since the compiler pulls in relevant pieces of code from linked libraries and other components, using the tampered library meant code the developer did not intend to include was added to the application. A source code review won’t find the issue because the problem isn’t anywhere in the original code and the developer doesn’t know about the alternate library.
“When your compiler lies to you, your product always contains a backdoor, no matter what the source code is,” Kamluk said.”

In 2015 it was revealed that
over 4,000 Apple iOS applications were subverted and got
into the Apple app store through an attack called XcodeGhost.
The attackers convinced developers to use a subverted version
of Apple’s Xcode development environment.
Many
popular applications were infected via XcodeGhost,
including Angry Birds 2 and WeChat.
FireEye estimated that XcodeGhost added malicious code to over 4000 apps.
CNBC reported that
Apple was “cleaning up its iOS App Store to remove malicious iPhone and iPad programs identified in the first large-scale attack on the popular mobile software outlet. The company disclosed the effort after several cyber security firms reported finding a malicious program dubbed XcodeGhost that was embedded in hundreds of legitimate apps…
The hackers embedded the malicious code in these apps by convincing developers of legitimate software to use a tainted, counterfeit version of Apple’s software for creating iOS and Mac apps, which is known as Xcode, Apple said.”
Reuters carried a similar report.
Wikipedia has an article on
XcodeGhost.

Manish Goregaokar’s “Reflections on Rusting Trust”
demonstrates an implementation of the “trusting trust” attack in the
Rust programming language.
This isn’t an attack, it’s a demo of the attack, but it’s a nice demo.
There’s a
Hacker News
and
Reddit
discussion of it.

“How a VC-funded company is undermining the open-source community” (The Outline, 2017)
makes a number of claims about
actions of Kite, a venture capital-funded startup.
The article states that Kite has been
modifying developer tools for Kite’s benefit.
It’s not the same as the trusting trust attack at all, but
it does suggest that tools for developers are a potential target.

Safe builds

It’s important to build software in a safe way.

Security challenges for the Qubes build process
has an interesting discussion.
They state their goals as:

  1. “We want to build (and distribute) non-backdoored software.
  2. We don’t want the build process itself to be able to compromise the developer’s machine.”

To do this, they focus on these tasks:

  1. “To perform verification of all the input sources, git repo commits, and other components (such as the stock RPMs and DEBs we also use), i.e. that they have proper digital signatures created by the select keys that we chose to trust.”
  2. “Provide strong sandboxes for building the less trusted parts of the Qubes OS, such as the various templates, so that even if the (properly signed) sources or other components turn out to be malicious*, the rest of the generated system, such as the Xen hypervisor and dom0, are not affected (nor is the developer’s machine).”

They include this important footnote: “* Of course, one should understand that the mere fact that packages or sources are properly signed, even with key(s) we have decided to trust, doesn’t guarantee that the code has not been backdoored. This could happen if one of the developers turned out to be malicious or was somehow coerced to introduce a backdoor, e.g. via some kind of a warrant or blackmail, or if their laptop were somehow compromised. We would like to defend against such potential situations.”

Reproducible (deterministic) builds

Creating reproducible builds
(aka deterministic builds)
is an excellent way to detect many development-time attacks, and
is a precondition for applying DDC.
The reproducible-builds.org
web site has some great information on the topic.
The video
Reproducible builds: Two years in the trenches (2017)
summarizes their work.
They developed a tool called
diffoscope that I wish I’d had!
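
For instance, diffoscope recursively unpacks two artifacts and reports exactly where they differ (a sketch; see the diffoscope documentation for its many input types and output options):

  $ diffoscope build1/app.deb build2/app.deb      # Report differences as text.
  $ diffoscope --html report.html run1.o run2.o   # Or generate an HTML report.
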
See below for more about the related topic
semantically reproducible builds.

Here are a few pointers you may find useful.

The Tails Operating System image is reproducible.
Verifying a Tails image for reproducibility
describes the process to independently reproduce and verify an image.

“Preventing Supply Chain Attacks like SolarWinds” (Linux Foundation
blog post) by David A. Wheeler (me!)
discusses how to counter subverted build processes, like that in SolarWinds,
by using verified reproducible builds.
Basically, use reproducible builds to independently verify that
your build result is correct.
For a detailed technical discussion on how SolarWinds Orion was
subverted, including a discussion of
SUNSPOT (the malware that inserted the backdoor) and
SUNBURST (the backdoor itself), see
“SUNSPOT: An Implant in the Build Process” by the
CrowdStrike Intelligence Team (2020-01-11).

Telegram supports reproducible builds so that others can verify that
its open source code is the same as the code available in the
Apple App Store and Google Play.
As of early 2021 it’s considered “somewhat experimental”.
Telegram notes that Reproducible Builds are especially hard for iOS
due to Apple’s current policies and MacOS limitations.
As they say:
(1) “Apple insists on using FairPlay encryption to “protect” even
free apps from “app pirates” which makes obtaining the executable
code of apps impossible without a jailbroken device.” and (2)
“macOS doesn’t support containers like Docker.”
It’s still possible, but challenging.

“Source and Executable” by Mike Lai (Microsoft)
discussed reproducible builds and
was given at the NIST/DHS Software and Supply Chain Assurance (SSCA)
Forum in late 2019.

I’ve been told that
the Russian “Non-Documented Functionality” (NDF) certification regime
may have specifically required demonstrating a reproducible build,
and that at least at one time Microsoft did meet these NDF requirements.
However, I have not been able to confirm this.
One problem is that I don’t speak Russian.
Maybe someone can look at sites such as this
Echelon site to confirm this?

“Improving Trust in Software through Diverse Double-Compiling and
Reproducible Builds” by Yrjan Skrimstad (University of Oslo, 2018)
is perhaps the closest paper to my work (since it builds on it).
In my work I noted that it was possible to use more than one “trusted”
compiler (that is, more than 2 grandparent compilers)
to increase the difficulty for the attacker.
This work by Yrjan Skrimstad expands that further, discussing
implications when there are more than two such compilers.
It demonstrates the trusting trust attack (using a quine)
in the Go programming language, and then
demonstrates how to use DDC (with 3 grandparents) to detect the attack.
It also discusses the relationship between DDC and reproducible builds.
This work by Yrjan Skrimstad is demonstrated in the
GitHub repo yrjan/untrustworthy_go.

The Tor project is very concerned about reproducible (deterministic) builds:

  • Mike Perry of the Tor project explained on 2013-06-18
    that,
    “I didn’t spend six agonizing weeks (and counting)
    getting deterministic builds to work for
    Tor Browser to prove that I was honest or trustworthy.
    I did it because I don’t believe that software development
    models based on single party trust can actually be secure against
    serious adversaries anymore, given the current trends in computer
    security and ‘cyberwar’…
    I don’t believe it is possible to
    keep a software-based GPG key secure anymore,
    nor do I believe it is possible to keep even an
    offline build machine secure from malware injection anymore…
    This is where deterministic builds come in:
    any individual can use our anonymity network
    to download our source code, verify it against public
    signed, audited, and mirrored git repositories,
    and reproduce our builds exactly…
    Otherwise, I really don’t think we’ll have working computers left in
    5-10 years from now :/.”
    Deterministic builds aren’t enough if the compiler executable is
    subverted, but thankfully, DDC enables multi-party verification
    of compiler executables (you still have to check the source, but
    that is a much easier problem).
  • Deterministic Builds Part One: Cyberwar and Global Compromise and
    Deterministic Builds Part Two: Technical Details
    has a lot of material about Tor and deterministic builds.

The
Yocto project not only has reproducible builds, but its CI/CD
pipeline verifies that every build is reproducible.

“Is that really the source code for this software?” by Jos van den Oever
(2013-06-19)
describes the problems of trying to
recreate executables from source code.
Sometimes it’s not so bad, e.g., for Debian,
“The binary package that was built from a Debian source package was
not identical to the published binary package, but the differences are
limited to timestamps and the build id in the executables.”
But sometimes it’s very difficult, just as I had found years earlier,
because you often need a lot more build information than you can get.
You need much more than the source code and build script; you need
to know the exact versions of all relevant build software, its
configuration, and so on.
But it is possible to record all that information, so that the
process can be repeated… and you can repeat the process to make sure
that you got it all.
If you record that information, then you have the problem of
“how do I know that my build tools are not malicious?”
At that point, DDC comes to the rescue… because DDC can help you
verify that.
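
Recording that information can be as simple as capturing tool versions and input hashes alongside the build (a minimal sketch; Debian’s .buildinfo files do this systematically):

  $ gcc --version | head -n1 > buildinfo.txt     # Record the exact toolchain...
  $ sha256sum *.c *.h Makefile >> buildinfo.txt  # ...hash every build input...
  $ sha256sum a.out >> buildinfo.txt             # ...and hash the output itself.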

“On business adoption and use of reproducible builds for open and closed source software” by Simon Butler, Jonas Gamalielsson, Björn Lundell, Christoffer Brax, Anders Mattsson, Tomas Gustavsson, Jonas Feist, Bengt Kvarnström, and Erik Lönroth, 2022-11-29, Software Quality Journal
discusses reproducible builds from a business point of view.
“Through interviews with software practitioners and business managers,
this study explores the utility of applying R-Bs in businesses in
the primary and secondary software sectors and the business and
technical reasons supporting their adoption. We find businesses use
R-Bs in the safety-critical and security domains, and R-Bs are
valuable for traceability and support collaborative software
development. We also found that R-Bs are valued as engineering
processes and are seen as a badge of software quality, but without
a tangible value proposition. There are good engineering reasons
to use R-Bs in industrial software development, and the principle
of establishing correspondence between source code and binary offers
opportunities for the development of further applications.”
This is an open access paper, so anyone can read it.

The Debian ReproducibleBuilds project
has the goal of making it
possible to reproduce, byte for byte, every build of every package in Debian.
They have made a lot of progress, and I am
really delighted to see it.
Their
Overview of known issues related to reproducible builds
shows what commonly causes problems;
these include embedded generated timestamps from various causes
(this is a big one) and
random/unsorted ordering.
For example,
SOURCE_DATE_EPOCH specification
provides a simple mechanism to turn complicated timestamp issues into
something simple.
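
For example, a build script can pin SOURCE_DATE_EPOCH to the time of the last commit (this git idiom appears in the specification’s own examples):

  $ export SOURCE_DATE_EPOCH="$(git log -1 --pretty=%ct)"  # Last commit time.
  $ make   # Tools that honor SOURCE_DATE_EPOCH embed this fixed timestamp
           # instead of the current clock time.
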
Also, the
sources.debian.net site
provides convenient browsing access to the Debian source code.
How Debian Is Trying to Shut Down the CIA and Make Software Trustworthy Again also discusses this.

Reproducible Builds for Fedora is a similar project to
deterministically reproduce the packages of Fedora.

F-Droid and
The Guardian Project
are working on reproducible builds for Android.
For more information, see
LWN.net,
info on the first reproducible build by Guardian (a developers’ tool),
and their success with the utility app Checkey.

How I compiled TrueCrypt 7.1a for Win32 and matched the official binaries describes how a deterministic build (with explainable differences)
was achieved for TrueCrypt.
TrueCrypt is encryption software capable of on-the-fly
encryption of file-, partition-, or disk-based virtual disks, yet its
authors are anonymous, leading some to worry that the executables
were backdoored.
Note: Though its source code is visible, it does not use a standard OSS
license and it imposes restrictions that probably mean it is not OSS;
it is not considered FLOSS by many major Linux distributions
including Debian, Ubuntu, Fedora, openSUSE, and Gentoo.
More recently, the TrueCrypt developers have stopped development, and
its lack of a real OSS license may inhibit anyone else supporting it.

Gitian
is a “secure source-control oriented software distribution method
[so] you can download trusted binaries
that are verified by multiple builders.”

Vagrant
is designed to “create and configure
lightweight, reproducible, and portable development environments”.
Seth Vargo discusses it briefly.

The paper
“Reproducible Containers” by Omar S. Navarro Leija et al.
will be presented in March 2020 at the
25th International Conference on Architectural Support for
Programming Languages and Operating Systems (ASPLOS) 2020
(this is a conference of the Association for Computing Machinery (ACM)).
This paper “describes DetTrace, a reproducible container abstraction for
Linux implemented in user space.”
This looks really promising.
The
implementation is OSS (MIT license).

Buildroot is a simple mechanism for
creating embedded Linux systems through cross-compilation.

Byzantine Askemos Language Layer (BALL)
is an implementation of the
Askemos Distributed Virtual Machine.
It creates an “autonomous virtual execution environment for applications”
which unlike traditional cloud environments is
specifically designed to provide fault tolerance and
to be tamper-proof.
It executes the code on several different machines,
runtime libraries, compilers, operating systems and so on
in parallel and compares cryptographic signatures.
Thus, this tries to counter subversion of various lower-level components.

Christophe Rhodes has discussed
the problems of reproducing builds of Steel Bank Common Lisp (SBCL)
on different systems in
Still working on reproducible builds
and
Reproducible builds – a month ahead of schedule.
While his notes are specific to SBCL, they illustrate more general issues.
He notes that one of the reasons that SBCL separated from its parent
CMUCL was to
“make the result of its build
be independent of the compiler used to build it.”
His goal was not primarily to counter attacks, but to eliminate
hard-to-find bugs:
“… how do we know there aren’t odd differences that depend
on the host compiler lurking,
which will not obviously affect normal operation but will
cause hard-to-debug trouble later? (In fact there were plenty of those, popping up at inopportune moments).
I’ve been working intermittently on dealing with this, by attempting to
make the Common Lisp code that SBCL!Compiler is written in sufficiently
portable that executing it on different implementations generates
bitwise-identical output. Because then, and only then, can we be
confident that we are not depending in some unforeseen way on a particular
implementation-specific detail…”.
Here are some of the issues that he (and perhaps other
SBCL developers) found and fixed, as an example of what to look for:

  1. The Common Lisp specification permits implementations to compute
    (write-to-string '(quote foo) :pretty nil)
    as either "(QUOTE FOO)" or "'FOO".
    To create a reproducible build
    they had to use a workaround (involving function name counters)
    so that the difference would not matter.
  2. A related problem happens with backquote: the Common Lisp specification
    allows implementations to determine if values coalesce, but this
    can produce different results; the solution is to change the code
    so that its results are guaranteed.
  3. The various set-related functions
    (e.g., set-difference, union, intersection, etc.) do not
    return sets in any particular order, which can result in differing builds.
    The solution is to sort the result of set operations, to force them
    into a specific known order (see the C sketch after this list).
  4. A call to maphash was used to affect the Lisp image directly.
    In general, hash tables do not guarantee any particular order when you
    walk their contents, so you need to force an ordering if you iterate
    over their contents.
  5. Implementation-defined constants can differ, especially
    most-positive-fixnum and most-negative-fixnum,
    but also array-dimension-limit and internal-time-units-per-second.
  6. Some functions like random and sxhash are understandably different
    across implementations, which causes access patterns to differ.
  7. The sort routine in Lisp is not specified to be stable,
    so when sorted results must be deterministic,
    stable-sort should be used instead.
  8. Initial values of arrays are undefined, so don’t depend on their value!

The key thing to note is that creating compilers that can easily
have reproducible (deterministic) builds on other compilers
typically takes work in the real world… but it is very doable.
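
Most of these fixes boil down to the same move: when an operation’s
result order is unspecified, force a canonical order before emitting
anything derived from it. Here’s a minimal C sketch of that move
(my illustration for this page, not SBCL code):

  /* Sort unordered results into a canonical order before emitting
     anything derived from them, so the output is deterministic. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  static int by_name(const void *a, const void *b) {
      return strcmp(*(const char *const *)a, *(const char *const *)b);
  }

  int main(void) {
      /* Pretend these came from a hash-table walk or set operation,
         which may present them in any order. */
      const char *names[] = { "GAMMA", "ALPHA", "BETA" };
      size_t n = sizeof names / sizeof names[0];
      qsort(names, n, sizeof names[0], by_name); /* canonical order */
      for (size_t i = 0; i < n; i++)
          printf("%s\n", names[i]); /* now deterministic */
      return 0;
  }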

There are various tools that can help you create reproducible builds.
For example, if build paths are embedded, you can force fixed
directory values to make them reproducible.
There are tools that enable this without requiring root permission,
including my tools
user-union
and
auto-destdir,
as well as tools like
proot.
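
Beyond build paths, embedded timestamps are probably the most common
culprit. The reproducible-builds.org convention for pinning them is the
SOURCE_DATE_EPOCH environment variable; here’s a hedged C sketch of
honoring it (the helper function name is my own):

  /* Honor the reproducible-builds.org SOURCE_DATE_EPOCH convention so
     rebuilds embed the same timestamp instead of "now". */
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  static time_t build_timestamp(void) {
      const char *epoch = getenv("SOURCE_DATE_EPOCH");
      if (epoch != NULL)
          return (time_t)strtoll(epoch, NULL, 10); /* pinned: reproducible */
      return time(NULL); /* fallback: varies on every build */
  }

  int main(void) {
      time_t t = build_timestamp();
      printf("Build stamp: %s", asctime(gmtime(&t)));
      return 0;
  }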

LF-Edge EVE has worked hard on reproducibility.
See EVE image reproducibility and
EVE image sources for an interesting example of
how to implement higher standards.

Semantically equivalent builds

Reproducible builds are great for showing that a package really was built
from some given source, but sometimes they’re hard to do.
A useful fallback is something called a “semantically equivalent build”.

As explained in the documentation for the
oss-reproducible tool
(part of OSSGadget),
“A project build is semantically equivalent
if its build results can be either recreated exactly (a bit for bit
reproducible build), or if the differences between the release package
and a rebuilt package are not expected to produce functional differences
in normal cases.
For example, the rebuilt package might have different date/time stamps,
or one might include files like .gitignore that are not in the other
and would not change the execution of a program under normal circumstances.”

A semantically equivalent build has very low risk of being a subverted build
as long as it’s verified to be semantically equivalent.
Put another way, verifying that a package has a semantically equivalent build
counters the risk where the putative source code isn’t malicious, but
where someone has tampered with the build or distribution process,
resulting in a built package that is malicious.
It’s quite common for builds to produce different date/time stamps, or
to add or remove “extra” files that would have no impact if the original
source code was not malicious.

It’s much easier (and lower cost) for software
developers to create a semantically equivalent build instead of always
creating a fully reproducible build.
Fully reproducible builds are still a gold standard for verifying
that a build has not been tampered with.
However, creating fully reproducible builds often requires that package
creators change their build process, sometimes in substantive ways.
In many cases a semantically equivalent build requires no changes,
and even if changes are required, there are typically fewer changes required.

OSSGadget
includes a tool that can determine if a given package is
semantically equivalent.
It’s still helpful to work to make a package a fully reproducible build.
A fully reproducible build is a somewhat stronger claim, and
you don’t need a complex tool to determine if the package is fully
reproducible.
Even given that, it’s easier to first create a package that’s
semantically equivalent, and then work on the issues remaining
to make it a fully reproducible build.

The intended use case for semantically equivalent builds
(instead of fully reproducible builds)
is really to help people make risk decisions when they’re thinking
about bringing in external software.
I’m primarily trying to deal with the case where the developer has decided
to not provide a reproducible build,
and I have to estimate the likelihood
of it being maliciously built (presumably as a part of deciding whether or not
the package is safe to install). I’m primarily thinking of
applying this process to mostly-unmanaged repositories
like npm, PyPI, and RubyGems, *not* to managed repositories like
most Linux distributions’ repositories (which have other
mechanisms to counter malicious builds).
The problem is that I cannot make the developer
provide me a reproducible build (I can beg, but that’s not the same thing).
I’m trying to make good decisions with the information I have,
not the information I *want* to have.

The threat model is a little different, too. The assumption isn’t that
“it is impossible for these differences to cause damage”.
The assumption is that “the original source code was benign,
reasonably coded, and did not do damage”. The question is,
“is this non-reproducible
package likely to have been generated from it, even though it’s
not a reproducible build?”

Here’s an example that might clarify the threat model.
It’s possible that a
program could look for the file “.gitignore” and run it if present.
The source code repo might not have a .gitignore file,
but the malicious package might add a .gitignore file and fill it with
a malicious application. That would cause malicious code to
be executed. However, it would also be *highly* suspicious for
the source code to
run a “.gitignore” file (that’s *not* what they are for), so
it’s reasonable to assume that the source code didn’t do that.
If an attacker can insert a file that *would* cause malicious code
to execute in a reasonably-coded app, then that *would* be a problem.
“What’s reasonable” is hard to truly write down, but
ignoring date/time stamps and ignoring a
whitelist of specific filenames seems like a reasonable place
to start.
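
To make that starting point concrete, here’s a minimal C sketch of such
a check (the whitelist contents are illustrative assumptions, not a
vetted policy):

  /* Treat a rebuilt package as semantically equivalent only if every
     differing file is on a small whitelist of build metadata that a
     benign, reasonably-coded program would never execute or load.
     The whitelist contents here are illustrative. */
  #include <stdio.h>
  #include <string.h>

  static const char *whitelist[] = { ".gitignore", ".gitattributes", NULL };

  static int difference_is_acceptable(const char *filename) {
      for (int i = 0; whitelist[i] != NULL; i++)
          if (strcmp(filename, whitelist[i]) == 0)
              return 1; /* known-harmless metadata */
      return 0;         /* anything else needs review */
  }

  int main(void) {
      printf("%d\n", difference_is_acceptable(".gitignore")); /* 1 */
      printf("%d\n", difference_is_acceptable("evil.so"));    /* 0 */
      return 0;
  }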

Sure, ideally everything would have a reproducible build.
Since that day isn’t here, what can we do to take piecemeal
steps towards that?

In short, making packages at least semantically equivalent
(and verifying this) is a great countermeasure against subverted builds.

Bootstrappable builds

Bootstrappable builds
focuses on minimizing the number of bootstrap binaries.
They’re not just interested in the direct “bootstrap” code
to boot a computer, but also in what is necessary to generate that
direct bootstrap code.
The problem that bootstrappable builds is trying to address is a real one,
namely the worry about subverted bootstrap code.

In the ideal sense they want to build up
“compilers and interpreters and tools from nothing” – but of course
you can’t really build up from nothing – there has to be
a starting point.
DDC provides a great complement to bootstrappable builds –
bootstrappable builds tries to limit the binaries you depend on, and
DDC can be used to verify those binaries that you depend on.

One example of this approach is
stage0.

Another example is
builder-hex0,
a kernel “for bootstrapping compilers without having to
trust a prebuilt binary”.
They are now able to bootstrap a POSIX kernel from Hex0 and use it
to bootstrap a cross-platform C compiler.

Formal methods / proofs

Coq is being used by Xavier Leroy (main developer of OCaml) to write a
certified compiler,

CompCert, that guarantees that the semantics of a C
source program are preserved down to PowerPC assembly.

The *specification* (unfortunately not the Coq proofs) of the
compiler back-end is available as GPL software.

You might also be interested in the results of the MITRE Vlisp project.

The Vlisp README says:
“The Verified Programming Language Implementation project has developed
a formally verified implementation of the Scheme programming language,
called Vlisp… An overview of the project is
presented in the Vlisp Guide.”

More accessible PDFs about Vlisp are available too.

Another paper that you may find interesting is

Jonathan A. Rees, “A Security Kernel Based on the Lambda-Calculus”,
PhD thesis, February 1995.

Underhanded code / Maliciously-misleading code

In my PhD dissertation I note the problem of code that is intentionally written
to look (to a human) like it does one thing, but actually does another.
In my dissertation I called this “maliciously-misleading code”; more recently
the term “underhanded code” seems to have become more common.
Either way, it’s not a good thing.
There are contests such as the
Obfuscated C Contest and Underhanded C Contest
showing that it’s very possible, and a 2003 attempted attack on the Linux
kernel shows that it is not merely an academic issue.

If you are interested in the topic of underhanded code,
I suggest looking at my paper
“Initial Analysis of Underhanded Source Code”
by David A. Wheeler, IDA Document D-13166, April 2020
(here’s a Perma.cc link to the
PDF of D-13166).

Building (more) trusted hardware

A related issue is developing (more) trusted hardware.
Since it’s much harder to determine if hardware is “equivalent”, there are
many ways an attacker can subvert hardware that aren’t a trusting trust attack.
It’s especially easy to subvert an ASIC mask, and unfortunately such
subversion is hard to detect; those who control foundries can easily
insert hard-to-detect attacks.
I can’t cover this huge field, but I thought I should make a few notes about it.

First, it’s well-known that it’s possible to insert attacks on hardware
that are hard to detect later.
E.g.,
“A2: Analog Malicious Hardware”
showed “how a fabrication-time attacker can
leverage analog circuits to create a hardware attack that is
small (i.e., requires as little as one gate) and stealthy (i.e.,
requires an unlikely trigger sequence before effecting a chip’s
functionality).”
It won “best paper” at the 37th IEEE Symposium on Security and Privacy,
and
ComputerWorld had an article about it.

There are some countermeasures. E.g.,
“Detecting Hardware Trojans with Gate-Level Information-Flow Tracking”
(IEEE Computer August 2016)
uses information flow tracking to discover hardware Trojans.
As noted in a
UCSD article, it works by assigning a label to important data
like a cryptographic key.
That has some value, but it’s not a good general-purpose technique
for countering arbitrary Trojans – what if the Trojan isn’t trying to
violate that specific property?

Obviously, what we’d like to have is computers that we can
really trust for important missions, either directly or for use as
checks on “normal” computers.
One approach I’ve suggested for decades is using FPGAs to implement
some computers. FPGAs can be subverted too, especially to add “kill switches”,
but at least the FPGA manufacturer has less information about the chip’s use.

A really interesting set of work is
“A Trustworthy, Free (Libre), Linux Capable,
Self-Hosting 64bit RISC-V Computer” by
Gabriel L. Somlo
(be sure to see the
FOSDEM 2023 presentation “Self-Hosting (Almost) All The Way Down”).
Somlo
has built an entire hardware and software system using an FPGA.
He managed this by implementing RISC-V, an open instruction set that
is supported by today’s Linux systems.
I’m especially delighted that he points to my DDC work (smile).

As of 2020, Somlo’s FPGA-implemented system ran at about 65MHz.
That’s obviously
much slower than current systems, whose speeds are measured in GHz.
Still, this is a usable speed if you’re patient;
it’s far faster than computers of decades past.
I also believe that future versions could be much faster if
resources were poured into it.
This work used many off-the-shelf components;
I expect that those components could be optimized to be much faster.
Its speed is also seriously limited by the lack of capacity in current FPGAs
(the 2020 implementation lacks hardware floating point and has a
limited L1 cache).
Future FPGA systems could be developed with more capacity (and more RAM)
to improve performance.

Miscellaneous

Attackers can exploit compiler bugs by intentionally
writing code that triggers the bug in a way that subverts the program.
This is yet another way to write maliciously-misleading programs
(a general topic I discuss in my dissertation).
This attack is not the same as the “trusting trust” attack that DDC
counters, but it is certainly related.
The paper
“Deniable Backdoors Using Compiler Bugs”
by Scott Bauer, Pascal Cuoq, and John Regehr,
Pastor Manul Laphroaig’s Export–Controlled Church Newsletter,
June 20, 2015,
demonstrates how a compiler defect (publicly known or found
via techniques such as fuzzing) can be exploited.
In their case, they demonstrated how sudo could be exploited
via innocent-looking code.
John Regehr’s blog post
about “Defending Against Compiler-Based Backdoors” (2015-06-21)
points this out, noting that
“the advantages of this kind of backdoor include
subtlety, deniability, and target-specificity” – and he then makes
some great points.
In particular, he notes that
compiler developers need to fix known miscompilation bugs as
rapidly as possible and use fuzz tools; compilers are security-sensitive
in a way that is often not appreciated.
Maintainers of open source packages need to be suspicious of baroque
patch submissions and consider rewriting patches.
Such attacks are much more fragile than the traditional
“trusting trust” attack, but they can occur in any program,
are potentially very dangerous, and are currently difficult to detect.
In the short term, it might be best to focus on detecting and
eliminating defects in widely-used compilers.
No one will complain about getting rid of compiler defects;
we have lots of techniques today that we can use,
and if compiler defects become more difficult to trigger, these
backdoor attempts would typically become more obvious.
But that short-term strategy is not enough; I hope that people
will develop longer-term strategies too.

“Some Remarks about Random Testing” by B A Wichmann,
National Physical Laboratory,
Teddington, Middlesex, TW11 0LW, UK,
May 1998,
discusses creating random tests for compilers.

Kegel’s building and testing
gcc/glibc cross toolchains has lots of good information.

GCC explorer
interactively shows the assembly output from GCC (given various inputs).

Compiler explorer
interactively shows the assembly output from a variety of compilers
and languages.

A quine is a
“computer program which takes no input and produces a copy of its own
source code as its only output.”
Compilers written to attack themselves are highly related to quines
(depending on your definition,
you could consider them a special case of quines).
“Quines (self-replicating programs)” by David Madore discusses
quines in more detail.
“How to write your first Quine program” by
David Bertoldi (2019-07-26)
also explains quines.
Perhaps the most amazing quine is the
mame quine relay,
an
Ouroboros quine that starts as a Ruby program that
generates a Rust program, which generates a Scala program, which
generates a Scheme program, and so on through 128 languages.
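
For a concrete feel, here is a classic minimal C quine (one of many
well-known variants). The three lines below print themselves exactly;
10 and 34 are the ASCII codes for newline and double-quote. There are
no comments inside the program, since a comment would not be reproduced
in the output:

  #include <stdio.h>
  char*s="#include <stdio.h>%cchar*s=%c%s%c;%cint main(void){printf(s,10,34,s,34,10,10);return 0;}%c";
  int main(void){printf(s,10,34,s,34,10,10);return 0;}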

The RepRap Project is developing inexpensive 3D printer designs that
will hopefully (eventually) be able to create themselves.
Very interesting, and in the future, possibly quite relevant.

The
Open proofs web site
encourages the development of “open proofs”, where the
implementation, proofs, and required tools are all open source software.

Mark Mitchell’s “Using C++ in GCC is OK”
(Sun, 30 May 2010 17:26:16 -0700)
officially reported that
“the GCC Steering Committee and the FSF have
approved the use of C++ in GCC itself. Of course, there’s no reason for
us to use C++ features just because we can. The goal is a better
compiler for users, not a C++ code base for its own sake.”

Mark Mitchell later explains that he expects that GCC will use C++
cautiously.
For DDC,
this means that applying DDC to the GCC code base will require a
C++ compiler (at least one that supports the parts that GCC uses), not
just a C compiler.
I used Intel’s icc, which was a C++ compiler anyway, so that would not have
especially affected my example… and it certainly does not change
the validity of the approach.

This paper has a number of connections back to the halting problem.
Proof That Computers Can’t Do Everything (The Halting Problem) is a delightful video that shows the traditional
proof of the halting problem, but in a clever way.
You might also want to see
Beyond Computation: The P vs NP Problem – Michael Sipser.

Build tools like make are important for any large system.
Improving make
describes my efforts to improve the POSIX standard for make as well
as make implementations,
in particular to support the insights in
Peter Miller’s Recursive Make Considered Harmful.

The Juniper backdoor is interesting – it appears that
a crypto backdoor was itself backdoored.
There are interesting comments by
Matthew Green
and
rpw.

“Hacking DNA: The Story of CRISPR, Ken Thompson, and the Gene Drive” by Geoff Ralston (April 3, 2017)
discusses CRISPR, and in passing discusses
Ken Thompson’s work.

“Cory Doctorow: Demon-Haunted World” in Locus Magazine
(2017-09-02)
notes that
“none of the world’s problems are ours to solve … so long as the
computers we rely on are sneaking around behind our backs, treating us
as their enemies.”

Compiler fuzzing, part 1 discusses some results from fuzzing compilers.

Breaking the Solidity Compiler with a Fuzzer discusses how they
enhanced the American Fuzzy Lop (AFL) fuzzer to fuzz C-like languages and find
defects in the Solidity compiler.

The Science Fiction story
Coding Machines by Lawrence Kesteloot (January 2009) is
based on these kinds of attacks.

Specifications/Standards

The
Open Trusted Technology Provider Standard (O-TTPS), Version 1.1:
Mitigating Maliciously Tainted and Counterfeit Products from the Open Group
may be of interest to you.
Per its website description, “The O-TTPS is an open standard containing a set of organizational guidelines, requirements, and recommendations for integrators, providers, and component suppliers to enhance the security of the global supply chain and the integrity of commercial off-the-shelf (COTS) information and communication technology (ICT). This standard if properly adhered to will help assure against maliciously tainted and counterfeit products throughout the COTS ICT product life cycle encompassing the following phases: design, sourcing, build, fulfillment, distribution, sustainment, and disposal.
The Open Group Trusted Technology Forum (OTTF) is a global initiative that invites industry, government, and other interested participants to work together to evolve this document and other OTTF deliverables.”

TUF (The Update Framework) helps developers secure their new or existing software update systems.
Between the system-level package managers,
programming language specific package managers/repositories, and application-specific update systems,
there are a lot of software updaters around… and they all need to
be secure.

This paper cites key previous papers; here is how to get them
(I provide multiple links to increase the likelihood you can get them):

  • Karger, P.A., and Schell, R.R. Multics Security Evaluation: Vulnerability Analysis. ESD-TR-74-193, Vol II, June 1974, p 52.
    Copies are available at
    ACSAC
    NIST,
    and
    UC Davis security lab.
    This was the original paper that identified the attack.
    The multicians security page
    provides interesting context about this important paper.

  • Thompson, Ken, “Reflections on trusting trust”, Communications of the ACM, Volume 27, Issue 8, Aug 1984, Pages 761-763. This is the first published demonstration of the attack as previously identified in the Multics paper. Copies are available at
    ACM,
    Ken Thompson’s Bell Labs site (archived), and
    CMU.

I used
OpenOffice.org
to write the dissertation, and it worked out very nicely.
OpenOffice.org was a great program for writing larger documents.
LibreOffice has replaced OpenOffice.org as its successor;
LibreOffice started from the same code and has a far more active community.
So definitely check out
The Document Foundation’s LibreOffice Productivity Suite.

I developed an

OpenDocument template for George Mason University (GMU)
that did nearly all the formatting for me automatically.
That made it easy to concentrate on the text instead
of the formatting.

The most important rule for writing large documents
using LibreOffice or any other word processor is to
automate everything you can, and in particular, always use styles.
Never set the font size, font type, etc., for a range of characters
or a paragraph (one exception: using italics/bold to emphasize a word or
phrase is okay).
Instead, all formatting information should be attached to a paragraph style,
and then make sure that each paragraph has the right paragraph style.
Use “Text body” (not “Default”) for normal text, and the various
“Heading 1”, “Heading 2”, and
so on for headings.
Similarly, use Insert > Cross-Reference to refer to other parts of
the document; that way, the program can renumber things correctly.

OpenOffice.org gives you lots of control over how words break (or not)
on a line; for more, see

“Easy way to insert nonbreaking hyphen, etc. in OpenOffice.org Writer”
(by Solveig Haugland).
Basically, to get more control over hyphenation, go to
Tools > Options > Language Settings > Languages
and select the “Enabled for Complex Text Layout” option.
Now you can use “Insert>Formatting Mark” menu to insert more control
over formatting.
The “no width no break” character, aka the “glue” character, “glues”
the characters it’s between to prevent line breaks that would
otherwise occur there.
Similarly, the “no width optional break” character, when inserted, tells
OpenOffice.org that it’s okay to insert a line break there where
normally it would not do so.
You can also insert non-breaking spaces, non-breaking hyphens, and
optional hyphens.

In most cases, the paragraph styles should make paragraphs break
across pages in the right way
(e.g., the paragraph styles
should have reasonable default “widow” and “orphan” settings, and
header paragraph styles should have “keep with next paragraph” set).
But in some cases the paragraphs won’t break across
pages well because the program doesn’t “understand” text.
For example, if you have text that leads into the next paragraph,
you may need to right-click on that paragraph
and set “keep with next paragraph”.
In special cases you may want a different widow or orphan setting, too.

OpenOffice.org supports formulas, which I use quite a bit.
Its “stack” and “matrix” options are sometimes very useful for
multi-line formulas, for example.
For in-line formulas, I recommend making formula borders 0.
You can do this while editing formulas by
selecting Format > Spacing, category Borders, and then making
all the borders 0 (indeed, I suggest making this the default).
Otherwise, there’s embedded extra space in formulas that looks odd when you
try to combine formulas with punctuation.

For the final version, I used Tools > Update All (to update the
table of contents, cross-references, etc.),
moved to the beginning and saved, and then ran
File > Export as PDF.

After doing endless numbers of tedious compiles,
Xkcd’s cartoon about compiling
made me smile.
The big picture solution
to the halting problem is also relevant :-).

Dilbert has mentioned long compiling times too:
Dilbert 2013-06-22
Dilbert 2005-09-23
Dilbert 1998-06-21.
Dilbert once noticed that
“maybe there’s a bug in the compiler program itself!”.
Dilbert also makes
it clear why software single source strategies are a bad idea.

I gave a brief example of readable Lisp s-expressions;
the
readable Lisp s-expressions project
has specifications and implementations for curly-infix-expressions,
neoteric-expressions, and sweet-expressions, which can make
Lisp notation a lot easier to read.

Mortality.pvs is a short demo of how to
express the “All men are mortal” example using PVS.

Here’s how to install gcc on SGI IRIX.

ERESI (ERESI Reverse Engineering Software Interface)
is a “unified multi-architecture binary analysis framework targeting
operating systems based on the Executable & Linking Format (ELF)”.

developerworks has a nice article on ELF.

Elfkickers was written by Brian Raiter, who also wrote

A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux
and

Albert Einstein’s Theory of Relativity:
In Words of Four Letters or Less.
This old article
explains ELF’s advantages.

I have tried to make sure that this paper will stick around
into the future.
Here’s the GMU page for my dissertation “Fully Countering Trusting Trust through Diverse Double-Compiling”, as well as the
arXiv.org copy of
“Fully Countering Trusting Trust through Diverse Double-Compiling” and the
UMI ProQuest copy of my PhD dissertation
“Fully Countering Trusting Trust through Diverse Double-Compiling”
(via ProQuest search).
Archive.org also has a copy.
These are just additional copies, with the same information.
The PDF file, as I submitted it, has these properties:

Title: Fully Countering Trusting Trust through Diverse Double-Compiling
Author: David A. Wheeler
Date: Fall Semester 2009 (actually 2009-11-30)
Filename: wheeler-trusting-trust-ddc.pdf
Length: 1,971,698 bytes
Pages: 199
MD5 hash: 5320ff082ec060e7f58409b3877cb687
SHA-1 hash: 20c8b702dd4b7c6586f259eb98f577dbadd359dd
SHA-256 hash: 024bccc5254eaffe9466f12afe39f72b154f63a6919f4e1add5d0513092b2052
SHA-512 hash: 0004998431af5da486a87794969a531407cb607ffc411c966a23343a58636c2072ceb85835ffe6eef727696ffc41b1ddd6d9e0fd090cbc85a33041c25acd2e55

An aside: At ACSAC 2005, Aleks Kissinger (from the University of Tulsa) also
presented work that he and I had done on micro-tainting.
Since that seems to have disappeared from the web, I thought I should
briefly describe it here.

Aleks’ presentation was titled
“Fine-Grained Taint Analysis using Regular Expressions,”
which was part of the
ACSA Works in Progress session.
Basically, we noted that instead of assigning “taint” to a whole value,
such as a string, you could
assign taint on subcomponents, such as each character.
Then you could assign rules that identified the input paths and
what could come in —
typically zero or more tainted characters — and rules on output paths.
We concentrated on defining regular expressions for what is legal,
though any other expression for patterns such as BNFs would be fine too.
We noted that you could then check statically or dynamically.
For the static case, when you work backwards, if the check “fails” you can
even trivially derive the input patterns that cause security failures
(and from that information it should be easy to figure out how to fix it).
Aleks has recently made some good progress by transforming the regular
expressions into DFAs.
There was another ACSAC presentation on doing taint analysis with Java,
but this was the traditional “whole variable” approach that is used in
many languages, but through which many vulnerabilities slip by.
We hope this micro-tainting approach will lead to improved tools for detecting
security vulnerabilities in software, before that software is
delivered to end-users.
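
To make the idea concrete, here’s a tiny C sketch of per-character
tainting (my illustration, not the prototype Aleks presented): each
character carries its own taint bit, and an output rule rejects any
tainted character outside an allowed pattern.

  /* Each character carries a taint bit; the output rule here requires
     every tainted character to be alphanumeric (a simple stand-in for
     the regular-expression rules described above). */
  #include <ctype.h>
  #include <stdio.h>

  struct tchar { char c; int tainted; };

  static int output_ok(const struct tchar *s, size_t n) {
      for (size_t i = 0; i < n; i++)
          if (s[i].tainted && !isalnum((unsigned char)s[i].c))
              return 0; /* tainted metacharacter: reject */
      return 1;
  }

  int main(void) {
      /* 'r' and 'm' came from untrusted input; the quotes did not. */
      struct tchar cmd[] = { {'\'',0}, {'r',1}, {'m',1}, {'\'',0} };
      printf("%s\n", output_ok(cmd, 4) ? "pass" : "reject");
      return 0;
  }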

There is related work that we know about
that has been going on in the University of Virginia (UVA),
though we only found out about it halfway through our work (via Usenix).
More information about the UVA work is in

“Automatically Hardening Web Applications Using Precise Tainting”
by Anh Nguyen-Tuong, Salvatore Guarnieri, Doug Greene, Jeff Shirley, and
David Evans.
They focus on PHP, and only on the dynamic case; we were interested in
both, but especially interested in the static case (where
you can show that certain vulnerabilities never occur and thus
don’t need any run-time overhead to deal with them).

Other related work includes the
BRICS Java String Analyzer (GPL;
uses the BSD-licensed dk.brics.automaton).
Hampi might be
able to implement this statically, which would be fantastic.

There is a long history of work on data flow, static typing, and
security (such as work by

Dennis Volpano et al).
That’s good work, but not really focused on what we were looking at.
Those works tend to view variables as a whole, while instead we’re
tracking much smaller units of data.
We’re also tracking sequences (like arrays) which contain data
with different levels of security; most such works handled arrays
like a single unit (a simplification that is fundamentally at odds
with our approach).


You can also view
my formal education timeline,
my book on
writing secure programs,
FlawFinder, or
my home page.
