# **UC San Diego Electronic Theses and Dissertations**

**Title**

Foundations for Speculative Side Channels

**Permalink** <https://escholarship.org/uc/item/64n9f44x>

**Author** Cauligi, Sunjay R

**Publication Date** 2021

Peer reviewed|Thesis/dissertation

### UNIVERSITY OF CALIFORNIA SAN DIEGO

### Foundations for Speculative Side Channels

## A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy

in

Computer Science

by

Sunjay R Cauligi

Committee in charge:

Professor Deian Stefan, Chair

Professor Nadia Heninger

Professor Ranjit Jhala

Professor Farinaz Koushanfar

Professor Shachar Lovett

2021

Copyright Sunjay R Cauligi, 2021. All rights reserved.

<span id="page-3-0"></span>The dissertation of Sunjay R Cauligi is approved, and it is acceptable in quality and form for publication on microfilm and electronically.

University of California San Diego

2021

## <span id="page-4-0"></span>DEDICATION

For Louisa.

#### EPIGRAPH

<span id="page-5-0"></span>What? Is it not a simple task?

Why, to someone like you, it should be by no means be a difficult task.

Except...

The one thing is... I'm a very busy fellow...

And I must leave this place in three days.

How grateful I would be if you could bring it back to me before my time here is up...

But, yes... You'll be fine. I see you are young and have tremendous courage.

I'm sure you'll find it right away.

Well then, I am counting on you...

—The Happy Mask Salesman (*The Legend of Zelda: Majora's Mask*)

## TABLE OF CONTENTS

<span id="page-6-0"></span>





## LIST OF FIGURES

<span id="page-9-0"></span>

## LIST OF TABLES

<span id="page-10-0"></span>

#### ACKNOWLEDGEMENTS

<span id="page-11-0"></span>I cannot begin this section without first thanking my partner, Louisa Fan. She has supported me through the pits of my graduate career, both emotionally and mentally; without her aid I would never have made it through my PhD. I am also truly blessed to have two wonderful parents, Raghothama and Pankaja Cauligi, who have given me nearly 30 years of love and encouragement despite my remaining a student for nearly 30 years.

I am incredibly thankful to my advisor, Deian Stefan, who decided to take a chance on a rather immature second-year and fostered him into the academic I am now. His myriad connections are what landed me with my frequent collaborator and quasi-advisor Gilles Barthe, who introduced me to the joys of semantics and who, despite working with me first-hand, still offered to put me up as a postdoc. And of course I must thank Geoff Voelker and Stefan Savage (my initial advisors), who were no doubt confused why I still showed up to their meetings for so many years.

To my colleagues and labmates, whose friendship gave me so much joy while we toiled away: Rob McGuinness, my perpetual roommate and brother-in-arms; Ariana Mirian, my twin sister and fellow jokester; Craig Disselkoen, whom I all but conscripted into being my research assistant and also, somehow, my friend; and to the past and present members of the 3140 lunch crew, for their many, *many* interesting discussions over the years.

Finally, thank you to all my collaborators and classmates, and everyone I've interacted with during my research tenure at UCSD CSE and abroad—far too many to count and yet deeply impactful all the same.

The [Introduction,](#page-16-0) in part, uses material from all works listed below.

[Chapter 1,](#page-23-0) in part, is a reprint of the material as it appears in 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '19). Cauligi, Sunjay; Soeller, Gary; Johannesmeyer, Brian; Brown, Fraser; Wahby, Riad S.; Renner, John; Grégoire, Benjamin; Barthe, Gilles; Jhala, Ranjit; Stefan, Deian, ACM, 2019. The dissertation author was the primary investigator and author of this paper.

[Chapter 2,](#page-61-0) in part, is a reprint of the material as it appears in 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '20). Cauligi, Sunjay; Disselkoen, Craig; v. Gleissenthall, Klaus; Tullsen, Dean; Stefan, Deian; Rezk, Tamara; Barthe, Gilles, ACM, 2020. The dissertation author was the primary investigator and author of this paper.

[Chapter 3,](#page-99-0) in part, is currently being prepared for submission for publication of the material. Cauligi, Sunjay; Guarnieri, Marco; Mehta, Aastha; Moghimi, Daniel; Narayan, Shravan; Stefan, Deian; Vahldiek-Oberwagner, Anjo; Vassena, Marco. The dissertation author was the primary investigator and author of this paper.

[Chapter 4,](#page-114-0) in part, has been submitted for publication of the material as it may appear in 43rd IEEE Symposium on Security and Privacy (Oakland '22), Cauligi, Sunjay; Disselkoen, Craig; Moghimi, Daniel; Barthe, Gilles; Stefan, Deian. The dissertation author was the primary investigator and author of this material.

#### VITA

<span id="page-13-0"></span>

#### PUBLICATIONS

C. Watt, J. Renner, N. Popescu, S. Cauligi, D. Stefan. "CT-Wasm: Type-Driven Secure Cryptography for the Web Ecosystem." 46th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL), January 2019.

S. Cauligi, G. Soeller, B. Johannesmeyer, F. Brown, R. S. Wahby, J. Renner, B. Grégoire, G. Barthe, R. Jhala, and D. Stefan. "FaCT: A DSL for Timing-Sensitive Computation." 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2019.

S. Cauligi, C. Disselkoen, K. v. Gleissenthall, D. Tullsen, D. Stefan, T. Rezk, and G. Barthe. "Constant-Time Foundations for the New Spectre Era." 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2020.

M. Vassena, C. Disselkoen, K. v. Gleissenthall, S. Cauligi, R. Kıcı, R. Jhala, D. Tullsen, D. Stefan. "Automatically Eliminating Speculative Leaks from Cryptographic Code with Blade." 48th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL), January 2021.

G. Barthe, S. Cauligi, B. Grégoire, A. Koutsos, K. Liao, T. Oliveira, S. Priya, T. Rezk, P. Schwabe. "High-Assurance Cryptography in the Spectre Era." 42nd IEEE Symposium on Security and Privacy (Oakland), May 2021.

S. Narayan, C. Disselkoen, D. Moghimi, S. Cauligi, E. Johnson, Z. Gang, A. Vahldiek-Oberwagner, R. Sahita, H. Shacham, D. Tullsen, D. Stefan. "Swivel: Hardening WebAssembly against Spectre." 30th USENIX Security Symposium (USENIX), August 2021.

S. Cauligi, C. Disselkoen, D. Moghimi, G. Barthe, D. Stefan. "SoK: Practical Foundations for Spectre Defenses." In submission.

S. Cauligi, M. Guarnieri, A. Mehta, D. Moghimi, S. Narayan, D. Stefan, A. Vahldiek-Oberwagner, M. Vassena. "Formal Guarantees for Spectre-resistant SFI Sandboxing." Unpublished.

#### <span id="page-14-0"></span>ABSTRACT OF THE DISSERTATION

#### Foundations for Speculative Side Channels

by

Sunjay R Cauligi

Doctor of Philosophy in Computer Science

University of California San Diego, 2021

Professor Deian Stefan, Chair

Developers of high-security systems (e.g., cryptographic libraries, web browsers) must not allow sensitive information (e.g., encryption keys, browser cookies) to make its way to an attacker. However, clever attackers can make use of unintentional *side-channels*—such as timing information or other hardware resource metrics—to infer or *leak* the values of these secrets. Even worse, attackers can exploit hardware features such as *speculative execution* to create *new* vectors for side-channel leakage even where none existed before.

Side-channels are not typically captured in formal program semantics—information from a side-channel is leaked to an attacker purely as a side-effect of execution, rather than any explicit data flow. Furthermore, speculative execution fundamentally destroys security properties like memory or type safety, as they implicitly assume a standard sequential execution model. Without formal models to rely on, developers find themselves manually applying *ad-hoc* mitigations as a best-effort solution to prevent timing side-channels and speculative attacks. Unfortunately, these ad-hoc mitigations often lead to obfuscated code—and yet give no guarantee of a sound or complete defense.

This dissertation seeks to remedy this. We explore several formal frameworks that make side-channel effects explicit, both with and without the threat of speculative execution. Along the way, we introduce FaCT, a language and compiler for writing code free from side-channels; Pitchfork, a semantics and tool for detecting speculative side-channels in binaries; and ZFI= $\mathcal{D}$ , a framework for validating sandbox protections against speculative attacks. In addition to the systems presented in this dissertation, the research community writ large has developed several program analysis and defense tools backed by formal models, whether these models are explicit or implicit. We round out this dissertation by surveying these systems, examining various design choices and identifying areas of open research.

Ultimately, this dissertation demonstrates the power of practical, formal foundations when dealing with speculative side-channel security. By relying on sound, formal frameworks, high-security developers can write programs that verifiably do not leak sensitive information.

# <span id="page-16-0"></span>Introduction

Protecting secrets in software is hard. Security and cryptography engineers must write programs that protect secrets both at the source level and when executed on real hardware. Unfortunately, hardware too easily divulges information about a program's execution via *side-channels*—e.g., an attacker can learn program secrets by observing the side-effects of the program on the hardware [\[59\]](#page-180-0). More alarmingly, modern hardware features such as *speculative execution* give rise to attacks such as *Spectre*, in which an attacker can exploit architecturally invalid execution paths to create new side-channels. Indeed, these issues destabilize the ground upon which standard notions of security are built. And accordingly, developers of secure software require *sound structural support*: Tools with *sound, formal backing* that can ensure that programs are free from such vulnerabilities, even in the face of speculative execution.

## <span id="page-16-1"></span>1 Timing side-channels

We first give a background on *timing side-channels*, wherein code executes for observably different amounts of time depending on the value of secret information. For example, a textbook implementation of RSA decryption takes a different amount of time depending on the individual key bits [\[84\]](#page-182-0)—each '1' bit requires an additional multiplication and thus more time. The cumulative effect of these operations on the running time is large enough for the attacker to reconstruct the value of the secret key. Timing vulnerabilities like these are not merely of academic interest: They have been found in implementations of both RSA [\[32\]](#page-178-0) and AES [\[20,](#page-177-0) [115\]](#page-185-0), where they allowed even remote network attackers to divine the values of secret keys.
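To make the RSA example concrete, the following C sketch implements textbook left-to-right square-and-multiply exponentiation and counts modular multiplications as a stand-in for running time. It is a toy under stated assumptions: 32-bit exponents, a modulus below 2^32 so that products fit in 64 bits, and hypothetical names (`modexp_textbook`, `mult_count`); it is not taken from any production RSA implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Running count of modular multiplications (squarings included). */
static unsigned mult_count = 0;

/* Toy modular multiply; assumes m < 2^32 so a * b fits in 64 bits. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m) {
  mult_count++;
  return (a * b) % m;
}

/* Textbook left-to-right square-and-multiply: returns base^exp mod m.
 * Every exponent bit costs one squaring; every '1' bit costs one extra
 * multiplication, so the total work is 32 + popcount(exp). */
static uint64_t modexp_textbook(uint64_t base, uint32_t exp, uint64_t m) {
  uint64_t r = 1 % m;
  base %= m;
  for (int i = 31; i >= 0; i--) {
    r = mulmod(r, r, m);        /* always square */
    if ((exp >> i) & 1u)        /* branch on a secret key bit */
      r = mulmod(r, base, m);   /* extra work only for '1' bits */
  }
  return r;
}
```

With modulus 1000003, the exponent 0x0F (four '1' bits) costs 36 multiplications while 0x10 (one '1' bit) costs 33, so the amount of work directly reveals how many key bits are set.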

The most robust way to deal with timing side-channels is via *constant-time programming*, the paradigm used to implement almost all modern cryptography [\[13,](#page-177-1) [46,](#page-179-0) [50,](#page-180-1) [118,](#page-185-1) [119\]](#page-185-2). Constant-time programs can neither branch on secrets nor access memory based on secret data.<sup>[1](#page-17-1)</sup> The first class of vulnerability arises when the value of a secret influences control flow, as attackers can often observe the path of execution through a program: For example, if the two targets of a conditional branch take different amounts of time to execute [\[118\]](#page-185-1) or if different program paths use different amounts of hardware resources [\[29\]](#page-178-1). The second class of vulnerability arises when memory access patterns depend on secret data. An attacker co-located on the same machine as a victim process, for example, can easily infer secret memory access patterns by observing their own cache hits and misses [\[59,](#page-180-0) [115\]](#page-185-0); alarmingly, attackers might even learn such information across a datacenter—or even over the Internet [\[32,](#page-178-0) [126\]](#page-185-3).
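A minimal C sketch of the memory-access rule: a leaky table lookup whose address depends on the secret index, next to a version that reads every entry and keeps the wanted one with a branch-free mask. The function names are hypothetical, and C compilers give no guarantee that the masked version stays branch-free after optimization, which is part of the motivation for a dedicated language.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All-ones if a == b, zero otherwise, computed without branching. */
static uint32_t ct_eq_mask(uint32_t a, uint32_t b) {
  uint32_t x = a ^ b;                       /* zero iff a == b */
  uint32_t nonzero = (x | (0u - x)) >> 31;  /* 1 iff x != 0 */
  return nonzero - 1u;                      /* 0 -> all ones, 1 -> 0 */
}

/* Leaky: the address that is loaded depends directly on the secret. */
static uint32_t lookup_leaky(const uint32_t *table, size_t n, uint32_t secret_idx) {
  (void)n;
  return table[secret_idx];
}

/* Constant-time: touch every entry, keep only the one whose index matches. */
static uint32_t lookup_ct(const uint32_t *table, size_t n, uint32_t secret_idx) {
  uint32_t result = 0;
  for (size_t i = 0; i < n; i++)
    result |= table[i] & ct_eq_mask((uint32_t)i, secret_idx);
  return result;
}
```

The constant-time version trades an O(1) load for an O(n) scan; that cost is the price of making the sequence of addresses independent of the secret.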

The constant-time paradigm implicitly assumes that each instruction in a program is executed in order. However, modern processors do not execute sequentially—instead, they *speculatively* execute (potentially incorrect) program instructions ahead of time before prior instructions are fully resolved. Standard constant-time guarantees are therefore insufficient for most modern hardware.

## <span id="page-17-0"></span>2 Spectre vulnerabilities

We next give an overview of Spectre attacks [\[9,](#page-176-1) [12,](#page-177-2) [69,](#page-181-0) [82,](#page-182-1) [86,](#page-182-2) [87,](#page-183-0) [97,](#page-183-1) [169\]](#page-189-0), a recently discovered family of vulnerabilities caused by *speculative execution* on modern processors. Spectre allows attackers to learn sensitive information by causing the processor to mispredict the targets of control flow (e.g., conditional jumps or indirect calls) or data flow (e.g., aliasing or value forwarding). When the processor learns that its prediction was wrong, it *rolls back* execution, erasing the programmer-visible effects of the speculation. However, *microarchitectural* state—such as the state of the data cache—is still modified during speculative execution; these changes can be leaked during speculation and can persist even after rollback. As a result, the attacker can recover sensitive information from the microarchitectural state, even if the sensitive information was only speculatively accessed.

<span id="page-17-1"></span><sup>1</sup>Constant-time programs must also not use secret data as input to any variable-time operation, e.g., floating-point multiplication [\[10\]](#page-177-3).

The following code gives an example of a vulnerable function; an attacker can exploit branch misprediction to leak arbitrary memory via the data cache:

```
if (i < arrALen) {   // mispredicted
  int x = arrA[i];   // x is oob value
  int y = arrB[x];   // leaked via address!
  // ...
}
```

The attacker first primes the branch to predict that the condition `i < arrALen` is true by causing the code to repeatedly run with appropriate (small) values of `i`. Then, the attacker provides an out-of-bounds value for `i`. The processor (mis)predicts that the condition is still true and speculatively loads out-of-bounds (potentially secret) data into `x`; subsequently, it uses the value `x` as part of the address of a memory read operation. This encodes the value of `x` into the data cache state—depending on the value of `x`, different cache lines will be accessed and cached. Once the processor resolves the misprediction, it rolls back execution, but the data cache state persists. The attacker can later interpret the data cache state in order to infer the value of `x`.
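The following C program is a toy model of the cache channel just described; it executes nothing speculatively. The `cached` array stands in for the data cache, `transient_load` for the speculative read of `arrB`, and `probe_recover` for the attacker's timing probe. One assumption departs from the snippet above: the leaked byte is scaled by a 4096-byte stride, as is common in published proofs of concept, so that each of its 256 possible values lands on its own (assumed 64-byte) cache line. All names and sizes are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64   /* assumed cache-line size in bytes */
#define STRIDE 4096    /* assumed stride: one distinct line per byte value */
#define NUM_LINES ((256 * STRIDE) / LINE_SIZE)

/* Toy data cache: one "is cached" bit per cache line of arrB. */
static bool cached[NUM_LINES];

/* The attacker flushes the probe array before triggering the victim. */
static void flush_all(void) { memset(cached, 0, sizeof cached); }

/* Models the transient load arrB[x * STRIDE]: only its cache footprint
 * survives the rollback, so that is all we record. */
static void transient_load(uint8_t x) {
  size_t offset = (size_t)x * STRIDE;
  cached[offset / LINE_SIZE] = true;
}

/* The attacker probes each candidate line; on real hardware a fast
 * (cached) access time plays the role of cached[...] == true. */
static int probe_recover(void) {
  for (int v = 0; v < 256; v++)
    if (cached[((size_t)v * STRIDE) / LINE_SIZE])
      return v;
  return -1; /* nothing was cached */
}
```

After `flush_all()` and `transient_load(0x2a)`, `probe_recover()` returns `0x2a`: the byte is recovered purely from which line was cached, even though no architectural register ever exposed it.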

## <span id="page-18-0"></span>3 Principled and practical foundations

Many developers rely on community best-practices and recipes to manually write constant-time code [\[104,](#page-184-0) [118\]](#page-185-1). Developers apply these recipes in an *ad-hoc* manner, leaving overlooked vulnerabilities open to attack. Even then, it can be tricky for developers to *correctly* apply the recipes. For example, an attempt to use a recipe to fix a timing attack vulnerability in TLS [\[104\]](#page-184-0) led to the Lucky13 timing vulnerability in OpenSSL [\[3\]](#page-176-2)—and the purported fix for Lucky13 opened the door to yet another vulnerability [\[140\]](#page-187-0).

Spectre mitigations, even when inserted automatically via tooling, have fared no better: The MSVC compiler's /Qspectre flag—one of the first compiler defenses [\[102\]](#page-184-1)—inserts mitigations by searching for code patterns. Since these patterns are not based in any rigorous analysis, the compiler easily misses similarly vulnerable code patterns [\[113\]](#page-185-4). Chrome adopted process isolation as its core defense mechanism against Spectre attacks [\[123\]](#page-185-5), but this is also unsound: [\[35\]](#page-178-2) shows that Spectre attacks can be performed across the process boundary, and [\[128\]](#page-186-0) shows how to read cross-origin data in the browser. Like constant-time recipes, Spectre defense mechanisms are applied ad-hoc and incompletely.

For targeted, flexible, *sound* defenses, we must turn to formal methods. Formal security analysis is rooted in *program semantics*, which provides rigorous models of program behavior and serves as the basis for *formal security policies*. These policies help us carefully and explicitly spell out our assumptions about the attacker's strength and gain confidence that our tools are sound with respect to this class of attackers: that timing side-channel defenses indeed enforce a constant-time policy, or that Spectre detection tools find the vulnerabilities they claim to find.

Formal foundations not only ensure constant-time and Spectre defenses are secure, but also help improve the performance of practical tools. Without formalizations, manual defenses cannot be assured sound, and automatic defenses are usually either overly conservative (unnecessarily flagging code as vulnerable, which ultimately leads to unnecessary and slow mitigations) or overly aggressive (and thus vulnerable). Developing proper foundations allows us to craft defenses that are instead *targeted* while still being provably secure [\[7,](#page-176-3) [65,](#page-181-1) [151\]](#page-188-0).

## <span id="page-20-0"></span>4 Outline

This dissertation lays *principled* and *practical* foundations for rebuilding side-channel defenses in the speculative domain.

[Chapter 1](#page-23-0) presents FaCT, a compiler and domain-specific language for writing *sequentially* constant-time code. Although FaCT does not analyze speculative effects, it gives us a blueprint for what *security-aware* compilation can achieve. At the core of FaCT is a set of formal *compiler transformations* that describe how to soundly replace leaky program behavior: The results of transformation are programs with equivalent behavior, but that don't leak secrets. The FaCT language itself is a C-like language augmented with *secrecy annotations*, which allow the developer to explicitly specify which program variables are indeed secret. The FaCT compiler tracks these annotations through the compilation pipeline, allowing it to apply the transformation rules only when necessary to produce constant-time code.

[Chapter 2](#page-61-0) presents a structural foundation for speculative analysis: A formal instruction-level semantics that models the speculative behavior—such as branch predictions and value forwarding—of modern processors. On top of this execution model, we apply the secrecy annotations from FaCT and define the notion of *speculative constant-time* (SCT): A speculative side-channel leak (such as a Spectre attack) is a violation of SCT. This semantics is expressive enough to capture all known Spectre attacks, including a variant of Spectre that was unrealized at the time. This semantics has been used to show the soundness of several tools that detect Spectre vulnerabilities, including our own verification tool, Pitchfork.

[Chapter 3](#page-99-0) builds upon this foundation, constructing a framework to analyze *software isolation* (or *sandboxing*) in the speculative context. Current systems that prevent speculative sandbox attacks are implemented as collections of ad-hoc mitigations, without any formal backing. We rectify this, expanding our speculative properties and semantics to capture speculative sandbox security in addition to SCT. Our formal model shows that existing systems are not sound and make several implicit assumptions about the underlying hardware.

Finally, [Chapter 4](#page-114-0) surveys the current state of Spectre analysis and defense tools, both with and without associated formal models. We examine and categorize these systems by the different choices they make in their stated (or implied) semantics and security properties. Our analysis provides practical suggestions and considerations both for developers of analysis and mitigation tools and for researchers of speculative security.

## Acknowledgements

[Introduction,](#page-16-0) in part, uses the following material:

Material as it appears in 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '19). Cauligi, Sunjay; Soeller, Gary; Johannesmeyer, Brian; Brown, Fraser; Wahby, Riad S.; Renner, John; Grégoire, Benjamin; Barthe, Gilles; Jhala, Ranjit; Stefan, Deian, ACM, 2019. The dissertation author was the primary investigator and author of this paper.

Material as it appears in 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '20). Cauligi, Sunjay; Disselkoen, Craig; v. Gleissenthall, Klaus; Tullsen, Dean; Stefan, Deian; Rezk, Tamara; Barthe, Gilles, ACM, 2020. The dissertation author was the primary investigator and author of this paper.

Material currently being prepared for submission for publication. Cauligi, Sunjay; Guarnieri, Marco; Mehta, Aastha; Moghimi, Daniel; Narayan, Shravan; Stefan, Deian; Vahldiek-Oberwagner, Anjo; Vassena, Marco. The dissertation author was the primary investigator and author of this paper.

Material that has been submitted for publication as it may appear in 43rd IEEE Symposium on Security and Privacy (Oakland '22), Cauligi, Sunjay; Disselkoen, Craig; Moghimi, Daniel; Barthe, Gilles; Stefan, Deian. The dissertation author was the primary investigator and author of this material.

# <span id="page-23-0"></span>Chapter 1

# FaCT: A DSL for Timing-Sensitive Computation

*Or, a sketch of the tower.*

Despite many strides in language design over the past half-century, modern cryptographic routines are still typically written in C. This is good for speed but bad for keeping secrets. Like most general-purpose languages, C gives the programmer no way to denote which data is sensitive—and therefore gives the programmer no warnings about code that inadvertently divulges secrets.

The only recourse developers have to avoid timing vulnerabilities is to make their code ugly. Specifically, they use a selection of *recipes* to turn dangerous but readable code into safe but obfuscated code: they rewrite potentially secret-revealing constructs like branches into low-level sequences of assignments that operate in *constant-time* regardless of the values of secret data. For example, the readable

`if (secret) x = e;`

which branches on a secret bit is replaced by

`x = (-secret & e) | ((secret - 1) & x);`

which, unlike the branch, executes in the same amount of time no matter the value of secret.
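The recipe can be checked directly in C. This sketch, with hypothetical function names, compares the branching form against the mask form for 32-bit unsigned values, assuming (as the recipe does) that `secret` is exactly 0 or 1: `-secret` is then all ones exactly when `secret` is 1, and `secret - 1` is all ones exactly when it is 0.

```c
#include <assert.h>
#include <stdint.h>

/* Branching version: which path runs depends on the secret. */
static uint32_t assign_branchy(uint32_t secret, uint32_t e, uint32_t x) {
  if (secret) x = e;
  return x;
}

/* Recipe version, assuming secret is 0 or 1.
 * -secret      == 0xFFFFFFFF when secret == 1, else 0 (keeps e)
 * secret - 1   == 0xFFFFFFFF when secret == 0, else 0 (keeps x) */
static uint32_t assign_ct(uint32_t secret, uint32_t e, uint32_t x) {
  return (-secret & e) | ((secret - 1) & x);
}
```

Both functions agree for `secret` in {0, 1}, but the second computes both operands unconditionally and merges them with masks, so no branch depends on `secret`. For any other value of `secret` the masks are no longer all-ones or all-zeros and the result is garbage, which is one way these recipes go wrong when applied carelessly.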

This is a sorry state of affairs. First, developers apply the recipes in an *ad-hoc* way, and any untransformed computation is left vulnerable to attack. Second, the recipes *obfuscate* the code, making it harder to determine whether the routine is even computing the desired value. Third, it can be tricky for developers to *correctly* apply the recipes. For example, an attempt to use a recipe to fix a timing attack vulnerability in TLS [\[104\]](#page-184-0) led to the Lucky13 timing vulnerability in OpenSSL [\[3\]](#page-176-2), and the purported fix for Lucky13 opened the door to yet another vulnerability [\[140\]](#page-187-0)!

In this chapter, we introduce FaCT, a domain-specific language and compiler for writing *readable* and *timing-secure* cryptographic routines. FaCT lets developers write readable code using high-level control-flow constructs like branches and procedural abstractions, but then automatically compiles this code into efficient, constant-time executables. We develop FaCT via four contributions:

**1. Language design.** Our first contribution is the design of a language for writing cryptographic code. The language allows programmers to use standard control-flow constructs like `if` and `return` statements. The language is also equipped with an *information-flow* type system that programmers can use to specify which data are secret. The type system prevents leaks by ensuring that secrets do not explicitly or implicitly influence publicly visible outputs ([§1.2](#page-29-0)).

**2. Public safety.** Our second contribution is the observation that not all programs are amenable to constant-time compilation. Specifically, we show that naive application of constant-time recipes can mangle otherwise safe programs, causing memory errors or undefined behavior. We address this problem by introducing a notion called *public safety* that characterizes the source programs that can be compiled to constant-time without introducing errors ([§1.2.2.3\)](#page-37-1).

**3. Constant-time compilation.** Our third contribution is a compiler that automatically converts (public safe) source programs into constant-time executables. The FaCT compiler is based on the key insight that we can exploit the secrecy types to *automatically* apply the recipes that developers have hitherto applied by hand, and can do so *systematically*, i.e., exactly where needed to prevent the exposure of secrets via timing. We formalize the compiler with two transformations, *return deferral* and *branch removal*, and prove that compilation yields constant-time executables with source-equivalent semantics ([§1.3\)](#page-39-0).

**4. Implementation and evaluation.** Our final contribution is an implementation of FaCT that produces LLVM IR from high-level sources and uses LLVM's clang to generate the final object or assembly file. We evaluate FaCT's *usability* with a user study, surveying students in an upper-level undergraduate programming languages course at a U.S. university, where 57% of the participants found FaCT easier to write than C (compared to 15% who found C easier). We evaluate FaCT's *expressiveness* and *performance* by using our implementation to port 7 cryptographic routines from 3 widely used libraries (OpenSSL, libsodium, and curve25519-donna), totaling about 2400 lines of C source. The unoptimized FaCT code, which we *formally* guarantee to be constant-time, is between 16% and 346% slower than the C equivalent. The clang-optimized FaCT code, which we *empirically* check to be constant-time using dudect [\[125\]](#page-185-6), ranges from 5% slower to 21% *faster* than the C equivalent, showing that FaCT yields readable constant-time code whose performance is competitive with C ([§1.4\)](#page-49-0).

We make all source and data available under an open source license at: <https://fact.programming.systems>.

## <span id="page-25-0"></span>1.1 Background

Some common C constructs—branches, returns, and array updates—can reveal secrets via timing channels. In this section, for each potentially dangerous construct, we explain: (1) how that construct could introduce bugs in real-world projects; (2) how developers must use recipes to avoid the dangerous construct; and, (3) how FaCT allows programmers to forgo recipes and write readable yet safe code.

*Branching on secret values.* A first class of vulnerability arises from directly branching on the value of a secret—attackers can often reconstruct control flow through a program, and thus secret condition values (e.g., because the true branch takes orders of magnitude longer to execute than the false branch) [\[118\]](#page-185-1). To avoid this type of vulnerability, developers manually translate branching code to straight-line code by replacing if-statements with constant-time bitmasks. Consider the following example from OpenSSL (edited slightly for brevity), which formats a message before computing a message authentication code (MAC):

```
for (j = 0; j < md_block_size; j++) {
  b = data[k - header_length];
  b = constant_time_select_8(is_past_c, 0x80, b);
  b = b & ~is_past_cp1;
  b &= ~is_block_b | is_block_a;
  block[j] = b;
}
```
It's hard to tell, but this snippet (1) iterates over plaintext message data, (2) terminates the message with the standard-defined `0x80`, and (3) pads the terminated message to fill a hash block, all while keeping data secret. To this end, even the selection operator `constant_time_select_8(mask, a, b)` is a series of bitmasks: `(mask & a) | (~mask & b)`.
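As a sketch of how such masks are built and consumed, the C helpers below implement the selection formula above together with a branch-free unsigned less-than that produces the mask. They are written in the style of OpenSSL's internal constant-time helpers but are illustrative re-implementations with hypothetical names (`ct_msb`, `ct_lt`, `ct_select_8`), not OpenSSL's code.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones if the top bit of a is set, else zero. */
static uint32_t ct_msb(uint32_t a) { return 0u - (a >> 31); }

/* All-ones if a < b (unsigned), else zero, without a comparison branch:
 * the top bit of a ^ ((a ^ b) | ((a - b) ^ b)) is set exactly when a < b. */
static uint32_t ct_lt(uint32_t a, uint32_t b) {
  return ct_msb(a ^ ((a ^ b) | ((a - b) ^ b)));
}

/* (mask & a) | (~mask & b): yields a when mask is all ones, b when zero. */
static uint8_t ct_select_8(uint8_t mask, uint8_t a, uint8_t b) {
  return (uint8_t)((mask & a) | (~mask & b));
}
```

For instance, `ct_select_8((uint8_t)ct_lt(1, 2), 0x80, 7)` yields `0x80` and `ct_select_8((uint8_t)ct_lt(2, 1), 0x80, 7)` yields `7`; both calls execute the same instructions, so timing reveals nothing about which operand was chosen.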

Translating each line of this OpenSSL code to FaCT leads to drastically more readable code:

```
for (uint64 j from 0 to md_block_size) {
  k += 1;
  b = is_past_c ? 0x80 : data[k - (len header)];
  if (is_past_cp1 || (is_block_b && !is_block_a)) {
    b = 0;
  }
  block[j] = b;
}
```
With FaCT, the programmer declares the variables used in the conditions as secret. After doing so, they are free to use plain conditionals and ternary operators to compute the final value of `b`. The FaCT compiler automatically uses the type annotations to generate machine code equivalent to the C example.

*Early termination.* Both loops and procedures can terminate early depending on the value of a secret, thereby leaking the secret. A well-known padding oracle attack in older versions of OpenSSL exploits an early function return [\[152\]](#page-188-1): a packet processing function would decrypt a packet and then check that the padding was valid, and, in the case of invalid padding, would return immediately. An attacker could exploit this to recover the SSL session key by sending specially crafted packets and using timing measurements to determine whether or not the padding of the decrypted packet was valid. Similarly, if the number of loop iterations in a program depends on a secret, attackers can use timing to uncover the value of that secret (e.g., in the Lucky13 attack [\[3\]](#page-176-2)).

C programmers again use special recipes, turning idiomatic programs into hard-to-read constant-time code. Consider the following buffer comparison code from the libsodium cryptographic library:

```
for (i = 0; i < n; i++)
  d |= x[i] ^ y[i];
return (1 & ((d - 1) >> 8)) - 1;
```
This snippet compares the first n bytes of the arrays `x` and `y`, returning 0 if they are the same, and -1 otherwise. To avoid leaking information about the contents of the arrays, though, the loop *cannot* simply return early when it detects differing values; instead, the programmer must maintain a "flag" (`d`), and update it at each iteration to signal success or failure. While iterating inside the loop, if the values `x[i]` and `y[i]` are the same, then `x[i] ^ y[i]` will be 0, leaving `d` unchanged. However, if `x[i]` and `y[i]` are different, then their XOR will have at least one bit set, causing `d` to also have a non-zero value. After the loop, the code uses a complex shift-and-mask dance to collapse `d` into the value -1 if any bits are set, and 0 otherwise.
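A self-contained version of this comparison makes the shift-and-mask step easy to check. It is a sketch modeled on the snippet above rather than libsodium's actual source, and the name `ct_compare` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if the first n bytes of x and y are equal, -1 otherwise.
 * Always scans all n bytes; any difference is OR-ed into d. */
static int ct_compare(const uint8_t *x, const uint8_t *y, size_t n) {
  unsigned int d = 0;
  for (size_t i = 0; i < n; i++)
    d |= x[i] ^ y[i];            /* d stays 0 only if every byte matches */
  /* d is in [0, 255]. If d == 0, d - 1 wraps to all ones, so bit 8 is 1
   * and the result is 1 - 1 = 0. If d is in [1, 255], d - 1 < 256, so
   * bit 8 is 0 and the result is 0 - 1 = -1. */
  return (int)(1 & ((d - 1) >> 8)) - 1;
}
```

The cast to `int` before subtracting keeps the final arithmetic signed, so the function returns exactly 0 or -1.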

FaCT lets programmers avoid the "flag" contortions:

```
for (uint64 i from 0 to n)
  if (x[i] != y[i])
    return -1;
return 0;
```
With FaCT, the programmer can readily express returning -1 in the case of failure: the compiler automatically generates a special variable for the return value, and uses the secret type to translate returns under secret conditions into (constant-time) updates to this variable, producing machine code roughly equivalent to the C recipe above.
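To illustrate the idea behind this translation, here is a hand-written C rendering of the FaCT loop above in which each `return` under a secret condition becomes a masked update to a return variable, guarded by a mask recording whether the procedure has already "returned". This is a hypothetical sketch of the effect of return deferral, not actual FaCT compiler output; all names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All-ones if v != 0, else zero, without branching. */
static uint32_t nonzero_mask(uint32_t v) {
  return 0u - ((v | (0u - v)) >> 31);
}

static int32_t compare_deferred(const uint8_t *x, const uint8_t *y, size_t n) {
  uint32_t rval = 0;              /* deferred return value */
  uint32_t live = 0xFFFFFFFFu;    /* all ones until some return "fires" */
  for (size_t i = 0; i < n; i++) {
    /* fire is all ones iff x[i] != y[i] and we have not yet returned */
    uint32_t fire = live & nonzero_mask((uint32_t)(x[i] ^ y[i]));
    rval = (fire & (uint32_t)-1) | (~fire & rval);  /* return -1; */
    live &= ~fire;                                  /* later returns are dead */
  }
  rval = (live & 0u) | (~live & rval);              /* return 0; */
  /* 0xFFFFFFFF converts to -1 on two's-complement targets (GCC, Clang,
   * MSVC; guaranteed from C23). */
  return (int32_t)rval;
}
```

Every iteration executes the same instructions whatever the data, and the loop always runs all `n` iterations; the `live` mask plays the role of "has this procedure already returned?".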

*Memory access.* Memory access patterns that depend on secret data can also inadvertently leak that secret data. An attacker co-located on the same machine as a victim process, for example, can easily infer secret memory access patterns by observing their own cache hits and misses [\[59,](#page-180-0) [115\]](#page-185-0); alarmingly, attackers might even learn such information across a datacenter—or even over the Internet [\[32,](#page-178-0) [126\]](#page-185-3).

To avoid leaking information via memory access patterns, developers rely on recipes that avoid accessing memory based on secrets. The following C code (from the "donna" Curve25519 implementation), for example, swaps the values of array a with array b based on the value of a secret (swap):

```
for (i = 0; i < 5; ++i) {
  const limb x = swap & (a[i] ^ b[i]);
  a[i] ^= x;
  b[i] ^= x;
}
```
To avoid leaking the value of the secret swap, the code *always* accesses both a[i] and b[i] at each loop iteration, and uses bitmask operations that only change them if swap is a mask of all 1-bits.
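Below is a runnable sketch of the same recipe. It assumes 64-bit limbs (`typedef uint64_t limb`) and a swap bit of 0 or 1, which negation turns into the all-zeros or all-ones mask. The function name `swap_conditional` is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: five 64-bit limbs per field element. */
typedef uint64_t limb;

/* Swaps a and b iff iswap == 1 (iswap must be 0 or 1).
 * Both arrays are read and written on every call either way. */
void swap_conditional(limb a[5], limb b[5], limb iswap)
{
    const limb swap = (limb)0 - iswap; /* 0 -> all 0-bits, 1 -> all 1-bits */
    for (int i = 0; i < 5; ++i) {
        const limb x = swap & (a[i] ^ b[i]); /* a[i]^b[i] if swapping, else 0 */
        a[i] ^= x;
        b[i] ^= x;
    }
}
```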

FaCT, again, makes such subterfuge unnecessary:

```
if (swap != 0) {
  for (uint64 i from 0 to 5) {
    secret uint64 tmp = a[i];
    a[i] = b[i];
    b[i] = tmp;
  }
}
```
Once the programmer has marked swap as secret, the compiler will automatically synthesize masked array reads similar to those from the original Curve25519 code.

# <span id="page-29-0"></span>1.2 FaCT

FaCT is a high-level, strongly-typed C-like DSL, designed for writing constant-time crypto code. In this section, we describe the DSL and its type system, one that both disallows certain unsafe programs and specifies how the compiler should transform code to run in constant-time.<sup>[1](#page-30-2)</sup> We describe the type-directed transformations in [§1.3.](#page-39-0)

<span id="page-30-1"></span>

$$
\begin{array}{lrcll}
\text{Program} & \textit{program} & ::= & \textit{fdef} \mid \textit{sdef} \ \ldots & \\
\text{Structure definition} & \textit{sdef} & ::= & \texttt{struct}\ \textit{name}\ \{\ \beta\ x; \ldots\ \} & \\
\text{Procedure definitions} & \textit{fdef} & ::= & f(\vec{x} : \vec{\beta})\ \{\ S\ \} : \beta & \text{internal procedure} \\
 & & \mid & \texttt{export}\ f(\vec{x} : \vec{\beta})\ \{\ S\ \} : \beta & \text{exported procedure} \\
 & & \mid & \texttt{extern}\ f(\vec{x} : \vec{\beta}) : \beta & \text{external procedure}
\end{array}
$$

Figure 1.1: FaCT grammar, top-level constructs.

### <span id="page-30-0"></span>1.2.1 Core language

FaCT is designed to be embedded into existing crypto projects (e.g., OpenSSL), and not to be used as a standalone language. As such, FaCT "programs" are organized as collections of procedures. As shown in Figure [1.1,](#page-30-1) developers can export these procedures as C functions to the embedding environment. They can also *import* trusted procedures. This is especially useful when using FaCT to implement error-prone glue code around already known-safe C crypto primitives (e.g., building a block cipher mode that calls an AES primitive).

FaCT procedures are composed of a sequence of statements (e.g., if statements, for loops, etc.), which are themselves composed of expressions. Both statements and expressions are mostly standard. We only remark on the more notable language constructs we add to make writing cryptographic code more natural.

<span id="page-30-2"></span><sup>1</sup>The *surface* language as used by developers is slightly less verbose than the *core* language presented in this section. For example, our surface syntax allows procedures to be called in expressions; FaCT desugars such expressions into core language procedure-call statements. We refer to both the surface and core languages as FaCT.

<span id="page-31-0"></span>

<span id="page-32-0"></span>

Figure 1.3: FaCT types.

First, FaCT includes a number of *array primitives* to capture common idioms in cryptographic routines, and to replace unsafe pointer arithmetic. The operation len *e* returns the length of an array *e*; zeros( $\beta$ , *e*) creates an array of zeros of type  $\beta$  of length *e*; clone(*e*) copies the array *e*; and view( $e_1, e_2, e_{len}$ ) returns a *slice* of array  $e_1$  starting at position  $e_2$  and with length  $e_{len}$ . We introduce views to make up for the lack of pointers: views allow developers to efficiently compute on small pieces of large buffers.
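Views have no direct C counterpart. A rough analogue is a pointer-and-length pair that aliases its parent buffer. The sketch below is hypothetical: neither the `byte_array` type nor the `view` function is part of FaCT. Its two assertions correspond to the bounds obligations that FaCT's type system checks statically for view.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C stand-in for a FaCT array: data plus its length. */
typedef struct {
    uint8_t *data;
    size_t len;
} byte_array;

/* Returns a slice of a that shares its storage (no bytes are copied).
 * FaCT proves these bounds at compile time; this sketch checks them at
 * run time instead. */
byte_array view(byte_array a, size_t start, size_t vlen)
{
    assert(start < a.len);
    assert(vlen <= a.len - start);
    byte_array v = { a.data + start, vlen };
    return v;
}
```

Because the view aliases the parent, writes through it are visible in the original buffer, which is what makes views a cheap substitute for pointer arithmetic.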

Second, we provide *vector primitives*: parallel vector arithmetic and vector shuffle instructions. These instructions allow developers to implement crypto algorithms that leverage fast SIMD instructions (e.g., SSE4 in x86\_64) without resorting to architecture-specific inline assembly or compiler intrinsics.

Third, we expose ctselect, a constant-time selection primitive. The operation ctselect( $e_1, e_2, e_3$ ) evaluates to either  $e_2$  or  $e_3$ , depending on whether  $e_1$  is true or false, respectively. The compiler guarantees that ctselect compiles to constant-time code (e.g., as a series of bitmasks or the CMOV instruction on x86\_64). Developers need not use ctselect directly; instead, they can use our higher-level if-statements, which our compiler transforms to such ctselects ([§1.3\)](#page-39-0).
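To illustrate the bitmask strategy, here is a minimal C select over 32-bit values. It assumes the condition is already normalized to 0 or 1, and the name `ctselect_u32` is ours. C compilers are not obliged to keep such code branch-free, so this shows the recipe rather than a guaranteed constant-time primitive.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a if cond == 1 and b if cond == 0, using masks instead of a branch. */
uint32_t ctselect_u32(uint32_t cond, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - cond; /* 1 -> 0xFFFFFFFF, 0 -> 0 */
    return (a & mask) | (b & ~mask);
}
```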

Lastly, FaCT includes a declassify primitive that takes a secret expression as input and returns a public value. Developers can use this primitive to bypass FaCT's typing restrictions (described below) and explicitly make information public. This is useful, e.g., for implementing encryption: a buffer containing a secret message must be treated with care, but if the buffer is encrypted in-place, it is thereafter safe to declassify because it contains ciphertext.

### <span id="page-33-0"></span>1.2.2 Type system

The most important feature of the FaCT language is its static information-flow type system. We rely on this type system to: (1) provide a way for developers to demarcate the sensitivity of data—whether it is secret or public; (2) reject unsafe programs, i.e., programs that are not information-flow secure or cannot be safely transformed to constant-time code; and (3) direct the compiler when applying transformations. Below, we give an overview of our type system and explain how it fulfills the first two roles; we leave the third for [§1.3.](#page-39-0)

Like previous information-flow type systems [\[109,](#page-184-2) [110,](#page-184-3) [129,](#page-186-1) [156\]](#page-188-2), FaCT decorates each base type with a secret or public *secrecy label*[2](#page-33-1) . Figure [1.3](#page-32-0) summarizes our base types; they are largely standard. Reference types wrap another base type and inherit its secrecy label; they are also access controlled, i.e., they can be read-only or read-write. In the FaCT surface syntax, we disallow recursively-typed references—only references of integer and boolean types are expressible. Array types, like references, inherit the secrecy of their base type; arrays have a size which is either a statically known constant or dynamically determined (∗). Struct types *do not* carry a secrecy label; instead, each struct field is individually labeled.

Developers explicitly specify labels when they declare variables and procedures. FaCT's type system, in turn, uses these labels to reject unsafe programs and specify how the compiler should transform high-level code that uses seemingly unsafe constructs (e.g., secret if-statements) to constant-time code. Below, we walk through our typing rules for expressions, statements, and procedures.

<span id="page-33-1"></span><sup>2</sup>Labels are partially ordered according to  $\sqsubseteq$  as usual:  $\text{PUB} \sqsubseteq \ell$  and  $\ell \sqsubseteq \text{SEC}$  hold for any label  $\ell$ . The join of two labels is similarly standard:  $\ell_1 \sqcup \ell_2$  is SEC if either label is SEC, and PUB otherwise. For brevity, we also use these operators on types (operating on the underlying label), much like previous work (e.g., [\[109,](#page-184-2) [110\]](#page-184-3)).

#### 1.2.2.1 Expression typing

FaCT's expression typing judgment  $\Gamma \vdash e : \beta$  states that under the type context  $\Gamma$ , which maps variables to their declared types, the expression *e* has the type  $\beta$ . We write  $x : \beta \in \Gamma$  when variable *x* maps to type β in the context Γ.

Figure [1.4](#page-36-0) gives the typing rules for the most interesting expressions. The rule for ctselect, for example, ensures that (1) the result is at least as secret as all the arguments to ctselect and (2) all the arguments can be cast to integers—since, internally, ctselect may be implemented as a series of constant-time bitmasks. The typing rules for other constructs similarly preserve secrecy.
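The two-point label lattice, and the result label that T-CT-SEL assigns, can be modeled in a few lines of C. This toy model is for intuition only. The names are hypothetical, and it is not part of the FaCT implementation.

```c
#include <assert.h>

/* Two-point secrecy lattice: PUB is below SEC. */
typedef enum { PUB = 0, SEC = 1 } label;

/* l1 is below-or-equal l2 in the lattice. */
int label_leq(label l1, label l2) { return l1 <= l2; }

/* Join: SEC if either label is SEC, PUB otherwise. */
label label_join(label l1, label l2)
{
    return (l1 == SEC || l2 == SEC) ? SEC : PUB;
}

/* Result label of ctselect(e1, e2, e3) per T-CT-SEL: e2 and e3 share a
 * type whose label is `branches`; e1 has label `cond`. */
label ctselect_label(label cond, label branches)
{
    return label_join(branches, cond);
}
```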

The type system also disallows certain kinds of unsafe computations. For example, we reject programs that index memory based on secrets: the rules T-ARR-GET and T-ARR-VIEW ensure that array indices are public and in-bounds. The in-bounds checks (highlighted in the original figure) are detailed in [§1.2.2.3.](#page-37-1)

#### 1.2.2.2 Statement and procedure typing

FaCT allows developers to write code whose control flow depends on sensitive data. Unfortunately, not all such code can be safely or efficiently transformed. For example, to safely allow writes to arrays using a secret-dependent index we must (transform the code to) write to *all* indices [\[103,](#page-184-4) [117,](#page-185-7) [122\]](#page-185-8); such a transformation would be expensive, and FaCT instead disallows such computations. As such, typing rules for statements and procedures rely on a *secrecy context*, which comprises a pair of secrecy labels *pc*,*rc* called the *path* and *return* context, respectively.

The path context label *pc* for a statement is secret if the statement is contained within (i.e., is control-dependent upon) a secret branch. Since a procedure caller's path context must persist through to the callee's path context, the initial path context label of a procedure is secret if it is ever called from a secret context; otherwise the initial path context label is public. We use  $\omega$  to map procedures to their initial path context labels.

The return context label *rc* for a statement is secret if the statement *may* be preceded by a return statement that is itself control-dependent on a secret value. A procedure's return context label is always initially public. Thus, the *secrecy context* ( $pc \sqcup rc$ ) for a statement represents whether the flow of control (to get to the statement) can be influenced by secret values. For example, if the conditional expression of an if statement is secret, then the statements of each branch are judged with  $pc =$  SEC, and are thus typed under a secret context.

*Statement typing.* FaCT's statement typing judgment is of the form  $\omega, pc, \beta_r \vdash S : \Gamma, rc \rightarrow \Gamma', rc'$ , where  $\beta_r$  is the return type of the procedure containing the statement *S*. This judgment states that, given a type and security context defined by  $\omega$ ,  $pc$ ,  $\beta_r$  and initial  $\Gamma$ ,  $rc$ , the statement *S*: (1) can be safely compiled into constant-time code, and (2) yields a new updated type context  $\Gamma'$  and return context  $rc'$ . This typing judgment accounts for new variable declarations and ensures that the secrecy context influences subsequent statements. For example, if a return statement resides within a secret branch, then all statements executed after that branch must also be typed under a secret context, since their execution now depends on the return.

Figure [1.5](#page-37-0) shows the most interesting statement typing rules. For example, (T-ASGN) checks that when updating a reference, the current secrecy context does not exceed the secrecy label of the value  $e_2$  being assigned. This ensures that secret data cannot be leaked via control flow. Rules (T-IF) and (T-RET) account for such secret contexts; the latter additionally ensures that the procedure cannot return a value more sensitive than specified by the procedure return type.

Rule (T-FOR) is more restrictive: it ensures that secrets do not influence the running time of for loops by requiring that the loop bounds (and therefore the number of iterations) be public. The updated return context  $rc'$  must both be a fixpoint of the loop and be no lower than the original return context  $rc$ . In practice, our type checker only assigns  $rc'$  to be secret if it cannot assign it to be public.
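The search for  $rc'$  can be pictured as a small fixpoint iteration. In the sketch below, a loop body is abstracted as a hypothetical function that maps the return context at the top of the body to the one at its end. The search starts at  $rc$  and raises the label only when the body forces it. This mirrors the idea, but it is not FaCT's type checker.

```c
#include <assert.h>

typedef enum { PUB = 0, SEC = 1 } label;

static label join(label a, label b) { return (a == SEC || b == SEC) ? SEC : PUB; }

/* Hypothetical abstraction of "type check the loop body": given the return
 * context on entry to the body, report the return context on exit. */
typedef label (*body_checker)(label rc_in);

/* Least rc' with rc <= rc' such that checking the body from rc' ends at or
 * below rc'. Terminates because the lattice has only two points. */
label loop_return_context(label rc, body_checker body)
{
    label rc_prime = rc; /* prefer public whenever possible */
    for (;;) {
        label out = join(rc_prime, body(rc_prime));
        if (out == rc_prime)
            return rc_prime; /* fixpoint reached */
        rc_prime = out;      /* body forced a higher label: retry */
    }
}

/* Illustrative bodies: one with no secret return, one with a secret return. */
static label body_no_secret_return(label rc_in) { return rc_in; }
static label body_secret_return(label rc_in) { (void)rc_in; return SEC; }
```

A body containing a secret-dependent return forces  $rc' = \text{SEC}$ , even if the loop is entered with  $rc = \text{PUB}$ .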
<span id="page-36-0"></span>

T-CT-SEL

$$
\frac{\Gamma \vdash e_1 : \text{BOOL}_\ell \qquad \beta \text{ is numeric or BOOL} \qquad \Gamma \vdash e_2 : \beta \qquad \Gamma \vdash e_3 : \beta}{\Gamma \vdash \texttt{ctselect}(e_1, e_2, e_3) : \beta \sqcup \ell}
$$

T-ARR-GET

$$
\frac{\Gamma \vdash e_1 : \text{ARR}^{sz}[\beta] \qquad \Gamma \vdash e_2 : \text{UINT}^{s}_{\text{PUB}} \qquad \Gamma \Rightarrow e_2 < \texttt{len}\ e_1}{\Gamma \vdash e_1[e_2] : \beta}
$$

T-ARR-VIEW

$$
\frac{\begin{array}{c} \Gamma \vdash e_1 : \text{ARR}^{sz}[\beta] \qquad \Gamma \vdash e_2 : \text{UINT}^{s}_{\text{PUB}} \qquad \Gamma \vdash e_{len} : \text{UINT}^{s}_{\text{PUB}} \qquad sz' = \textit{szOfExp}(e_{len}) \\ \Gamma \Rightarrow e_2 < \texttt{len}\ e_1 \qquad \Gamma \Rightarrow e_{len} \leq \texttt{len}\ e_1 - e_2 \end{array}}{\Gamma \vdash \texttt{view}(e_1, e_2, e_{len}) : \text{ARR}^{sz'}[\beta]}
$$

Figure 1.4: FaCT expression typing rules (subset).

The typing for procedure calls given by (T-CALL) is slightly more complex. In particular, this rule ensures that procedures can only be called with suitable inputs and checks that the output type is *compatible* with the variable being assigned. To this end, we ensure that if the procedure *f* has visible effects, then its initial path context  $\omega(f)$  must be at least the label of the calling context. This, in effect, ensures that in a secret context we *cannot* call procedures that (1) modify public parameters, i.e., take mutable public references as input parameters; (2) are externally defined and so possibly have publicly visible side-effects; or (3) are exported (top-level) procedures.

*Procedure typing.* Figure [1.6](#page-38-0) shows rules for typing procedure definitions. FaCT's procedure typing judgment is of the form  $\omega \vdash f(\vec{x} : \vec{\beta}) \{S\} : \beta_r$ , which states that under  $\omega$ , the procedure  $f$  with named parameters  $\vec{x}$  of types  $\vec{\beta}$  has return type  $\beta_r$ . Procedures in FaCT may only return simple types (i.e., boolean values or integers), but there are no such restrictions on the types of parameters. When typing procedures, the initial type context  $\Gamma$  is formed from the procedure's parameters, and the initial path context  $pc$  is given by  $\omega(f)$ . The return context  $rc$  always starts as PUB, as the procedure body *S* (vacuously) has no preceding secret-dependent return statements. The return type  $\beta_r$  is taken from the procedure definition. If the body *S* is well-typed under these initial contexts, then the procedure itself is considered well-typed.

<span id="page-37-0"></span>
T-CALL

$$
\frac{\omega \vdash f(\vec{\beta}) : \beta \qquad \textit{hasEffects}(f) \Rightarrow pc \sqcup rc \sqsubseteq \omega(f) \qquad \Gamma \vdash e_i : \beta_i \qquad \Gamma' = \Gamma, x : \beta}{\omega, pc, \beta_r \vdash \beta\ x = f(\vec{e}) : \Gamma, rc \rightarrow \Gamma', rc}
$$



T-IF

$$
\frac{\Gamma \vdash e : \text{BOOL}_{\ell} \qquad \omega, pc \sqcup \ell, \beta_r \vdash S_1 : \Gamma \wedge e, rc \rightarrow \Gamma_1, rc_1 \qquad \omega, pc \sqcup \ell, \beta_r \vdash S_2 : \Gamma \wedge \neg e, rc \rightarrow \Gamma_2, rc_2}{\omega, pc, \beta_r \vdash \texttt{if}\ (e)\ \{\ S_1\ \}\ \texttt{else}\ \{\ S_2\ \} : \Gamma, rc \rightarrow \Gamma, rc_1 \sqcup rc_2}
$$

T-FOR

$$
\frac{\Gamma \vdash e_1 : \text{UINT}_{\text{PUB}} \qquad \Gamma \vdash e_2 : \text{UINT}_{\text{PUB}} \qquad rc \sqsubseteq rc' \qquad \omega, pc, \beta_r \vdash S : \Gamma', rc' \rightarrow \Gamma'', rc'}{\omega, pc, \beta_r \vdash \texttt{for}\ (x\ \texttt{from}\ e_1\ \texttt{to}\ e_2)\ \{\ S\ \} : \Gamma, rc \rightarrow \Gamma, rc'}
$$

T-RET

$$
\frac{\Gamma \vdash e : \beta_r \qquad pc \sqcup rc \sqsubseteq \beta_r}{\omega, pc, \beta_r \vdash \texttt{return}\ e : \Gamma, rc \rightarrow \Gamma, pc \sqcup rc}
$$

T-ASSUME

$$
\frac{\Gamma \vdash e : \text{BOOL}_{\ell} \qquad \Gamma' = \Gamma \wedge e}{\omega, pc, \beta_r \vdash \texttt{assume}(e) : \Gamma, rc \rightarrow \Gamma', rc}
$$

Figure 1.5: FaCT statement type rules (subset).

#### <span id="page-37-1"></span>1.2.2.3 Public safety

The FaCT type system ensures that procedures can be transformed using constant-time recipes without giving up safety. Naively applying recipes can inadvertently *introduce* safety and security vulnerabilities while making the code constant-time. Consider the following procedure:

```
void potential_oob( secret mut uint32[] buf
                  , public uint64 i
                  , secret uint64 secret_index ) {
  assume(secret_index <= len buf);
  if (i < secret_index)
    buf[i] = 0;
  ...
}
```

<span id="page-38-0"></span>T-FN

$$
\frac{pc = \omega(f) \qquad \Gamma = \{\vec{x} : \vec{\beta}\} \qquad \beta_r \text{ is numeric or BOOL} \qquad \omega, pc, \beta_r \vdash S : \Gamma, \text{PUB} \rightarrow \Gamma', rc'}{\omega \vdash f(\vec{x} : \vec{\beta})\ \{\ S\ \} : \beta_r}
$$

$$
\frac{\omega(f) = \text{PUB} \qquad \beta_r \text{ is numeric or BOOL}}{\omega \vdash \texttt{extern}\ f(\vec{x} : \vec{\beta}) : \beta_r}
$$

Figure 1.6: FaCT procedure typing rules (subset).

This code is memory safe: the branch condition, together with the assumption secret\_index <= len buf, ensures that we only update buf[i] when i is within bounds. However, the update is predicated upon a secret condition. To make the above code constant-time, we must ensure that the access to  $\text{buf}[i]$  happens regardless of that condition, or else the memory access pattern will reveal the secret. Consequently, the constant-time recipes that we discuss in [§1.3](#page-39-0) would compile the code into:

```
cond = (i < secret_index);
buf[i] = ctselect(cond, 0, buf[i]);
```
Such a naive transformation introduces a potential *out-of-bounds* access. In other cases it can introduce yet different kinds of undefined behavior.

*Public safety.* We avoid the above problem with the key observation that for a program to be amenable to constant-time compilation, the source must be *publicly safe*: it must be memory-safe and free from buffer overflows and undefined behavior using only publicly visible information, i.e., the code must be safe even after removal of secret-dependent control flow. We formalize the notion of public safety in FaCT's type system by extending the type environment  $\Gamma$  to track publicly visible path conditions, using these conditions to check safety. In Figures [1.4](#page-36-0) and [1.5](#page-37-0) these public safety extensions are highlighted.

*Public views.* We first define the judgment  $\Gamma \vdash_i e$  to mean that *e* is *immutable* in  $\Gamma$ ; that is, *e* is only composed of constants, immutable variables, array lengths, or operations thereon. Next, we define the operation Γ ∧ *e*, which *conjoins* Γ with a *public view* of the condition *e*: if *e* is a public bool ( $\Gamma \vdash e : \text{BOOL}_{\text{PUB}}$ ) and *e* is immutable ( $\Gamma \vdash_i e$ ), then  $\Gamma \land e$  represents the environment  $\Gamma$  with the additional assumption that *e* is true. Otherwise,  $\Gamma \wedge e = \Gamma$ , i.e., conjoining Γ with a secret condition does not add any new assumptions to Γ. Rules T-IF and T-FOR in Figure [1.5](#page-37-0) show how we propagate public views, tracking (public) conditions and loop ranges to use when type checking statements.

For cases where the public safety checker is incomplete, we allow developers to add assumptions directly to the environment  $\Gamma$  with the assume primitive (Figure [1.2\)](#page-31-0). This is useful for aiding the checker by, e.g., adding preconditions to a procedure.

*Checking public safety.* Finally, we define  $\Gamma \Rightarrow e$  to mean that the conditions in  $\Gamma$  *imply*  $e$ . This is checked via an SMT solver. We use this predicate in the expression typing rules T-ARR-GET and T-ARR-VIEW [(Figure 1.4)](#page-36-0) to check that memory accesses are never out of bounds. In the example program given earlier, the expression  $i <$  secret\_index has type  $\text{BOOL}_{\text{SEC}}$ , so it is not added to Γ. Consequently, the predicate  $\Gamma \Rightarrow i < \texttt{len}\ \texttt{buf}$  does not hold when typing the expression buf $[i]$ , and the program (correctly) does not type check.
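One way to make the example publicly safe is to guard the write with a public bounds check such as i < len buf. Because i is an immutable public parameter, that check adds a public fact to Γ, and the T-ARR-GET obligation then holds. The C sketch below shows roughly what such code amounts to once the secret comparison becomes a select. The names, the 32-bit index types, and the subtraction-based comparison are our assumptions. This is not FaCT compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-free x < y for 32-bit values: the 64-bit difference wraps
 * (setting bit 63) exactly when x < y. Returns 1 or 0. */
static uint32_t ct_lt_u32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x - (uint64_t)y) >> 63);
}

/* Returns a if cond == 1, b if cond == 0. */
static uint32_t ct_select_u32(uint32_t cond, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - cond;
    return (a & mask) | (b & ~mask);
}

/* Hypothetical publicly safe variant of potential_oob. The bounds check on
 * the public index i is an ordinary branch, so buf is never accessed out
 * of bounds. Only the secret comparison is folded into a select, so
 * whether buf[i] is accessed depends on public values alone. */
void zero_if_below(uint32_t *buf, size_t len, uint32_t i, uint32_t secret_index)
{
    if (i < len) { /* public condition: safe to branch on */
        uint32_t cond = ct_lt_u32(i, secret_index);
        buf[i] = ct_select_u32(cond, 0, buf[i]);
    }
}
```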

The FaCT type system also prevents undefined behavior from invalid operand values (not shown in [Figure 1.4\)](#page-36-0). For example, integer division has the additional requirement  $\Gamma \Rightarrow e_2 \neq 0$ , and the left- and right-shift operators have the requirement  $\Gamma \Rightarrow 0 \le e_2 < s$  where *s* is the bitwidth of *e*1.

# <span id="page-39-0"></span>1.3 Front-end compiler

The FaCT compiler consists of two passes. The first pass is a source-to-source transformation: it compiles well-typed code into semantically equivalent FaCT constant-time code whose observable timing is secret-independent. The second pass is straightforward: it takes the secret-independent code and generates LLVM bitcode. In the rest of the section, we thus only describe and formalize FaCT's transformation pass.

Since our type checker ([§1.2.2\)](#page-33-0) already ensures that memory accesses, loop iterations, and variable-time instructions are secret-independent, the transformations need only make procedure returns and branches secret-independent. FaCT does this in two steps, *return deferral* and *branch removal*.

The first step replaces secret-dependent return statements by (1) creating a boolean that represents whether the procedure has returned and (2) conditioning all later code on that boolean to prevent statements from executing after the original procedure would have terminated. That is, return deferral converts control flow in terms of secret returns into control flow in terms of secret ifs.

The second step turns all secret-dependent conditional branches into straight-line code. This includes both secret if statements in the original source as well as those generated by return deferral. Thus, by eliminating secret ifs—the last source of secret-dependent timing—this transformation yields constant-time code.

### 1.3.1 Return deferral

As previously mentioned, early returns that depend on secrets often leak information. Consider the following snippet:

```
if (sec) { return 1; }
// long-running computation ...
```
Here, an attacker can determine whether sec is true by observing a quick computation, or false by observing a slow computation.

FaCT prevents code from leaking such secrets by *deferring* returns to the end of each procedure. For example, the compiler transforms the above code to:

```
secret mut uint32 rval = 0;
secret mut bool notRet = true;
if (sec) { rval = 1; notRet = false; }
if (notRet) {
  // long-running computation ...
}
return rval;
```
The new notRet variable tracks whether or not the procedure *would have* returned, and any statement that could be executed after the return is guarded by the notRet variable. Finally, the actual return occurs at the very end of each procedure, returning the value stored in rval.

<span id="page-41-0"></span>TR-RET-DEC

$$
\frac{\Phi = (\omega, \{\vec{x} : \vec{\beta}\}, \beta_r) \qquad \Phi, \omega(f), \text{PUB} \vdash S \rightarrow S'}{\omega \vdash f(\vec{x} : \vec{\beta})\ \{\ S\ \} : \beta_r \rightarrow f(\vec{x} : \vec{\beta})\ \{\ \text{REF}^{\text{RW}}[\beta_r]\ \textit{rval} = \textit{init}(\beta_r);\ \text{REF}^{\text{RW}}[\text{BOOL}_{\text{SEC}}]\ \textit{notRet} = \texttt{true};\ S';\ \texttt{return}\ \textit{rval}\ \} : \beta_r}
$$

TR-RET-GUARD-PUB

$$
\frac{\Phi, pc, \text{PUB} \vdash S \rightarrow S'}{\Phi, pc, \text{PUB} \vdash S \leadsto S'}
$$

TR-RET-GUARD-SEC

$$
\frac{\Phi, pc, \text{SEC} \vdash S \rightarrow S'}{\Phi, pc, \text{SEC} \vdash S \leadsto \texttt{if}\ (\textit{notRet})\ \{\ S'\ \}}
$$

TR-RET

$$
\frac{pc \sqcup rc = \text{SEC}}{\Phi, pc, rc \vdash \texttt{return}\ e \rightarrow \textit{rval} := e;\ \textit{notRet} := \texttt{false}}
$$

TR-RET-SEQ

$$
\frac{\Phi = (\omega, \Gamma, \beta_r) \qquad \omega, pc, \beta_r \vdash S_1 : \Gamma, rc \rightarrow \Gamma', rc' \qquad \Phi, pc, rc \vdash S_1 \rightarrow S'_1 \qquad \Phi, pc, rc' \vdash S_2 \leadsto S'_2}{\Phi, pc, rc \vdash S_1; S_2 \rightarrow S'_1; S'_2}
$$

TR-RET-FOR

$$
\frac{\Phi = (\omega, \Gamma, \beta_r) \qquad \omega, pc, \beta_r \vdash S : \Gamma', rc' \rightarrow \Gamma'', rc' \qquad \Phi, pc, rc' \vdash S \leadsto S'}{\Phi, pc, rc \vdash \texttt{for}\ (x\ \texttt{from}\ e_1\ \texttt{to}\ e_2)\ \{\ S\ \} \rightarrow \texttt{for}\ (x\ \texttt{from}\ e_1\ \texttt{to}\ e_2)\ \{\ S'\ \}}
$$

Figure 1.7: Transformation rules for return deferral.

*Transformation rules.* We formalize return deferral using three kinds of rewrite rules, shown in Figure [1.7.](#page-41-0) The first *procedure-transformation* rule  $\omega \vdash f(\vec{x} : \vec{\beta})\ \{ S \} : \beta_r \rightarrow f(\vec{x} : \vec{\beta})\ \{ S' \} : \beta_r$  is used to rewrite the body *S* of a procedure *f* into a secret-independent body *S'*. (This is accomplished using the other two rewrite rules.) The second *guarded-execution* rule  $\Phi, pc, rc \vdash S \leadsto S'$  transforms a statement *S*, given a secrecy context  $pc, rc$ , into *S'* by making implicit control flow (due to secret returns) explicit. Finally, the *return-elimination* rule  $\Phi, pc, rc \vdash S \rightarrow S'$  transforms *S* into *S'* by replacing all secret returns with assignments. Below, we walk through some of these rules in detail.

1. Procedure transformation. The TR-RET-DEC rule declares two special (mutable) variables notRet and rval that respectively hold the secret-dependent return state and the value to be returned. The return state notRet is set to true, while the return value rval is initialized to a default value for its type. The rule then eliminates all secret returns from *S* and inserts a (deferred) return after, as the very last statement of the transformed body  $S'$ .

2. Guarded execution. Rules TR-RET-GUARD-PUB and TR-RET-GUARD-SEC are used to transform statements that appear *after* any secret returns. Both of these rules first eliminate secret returns from *S* to obtain *S'*. If the original statement *S* is typed with  $rc =$  SEC, i.e., *S* may be preceded by a secret return, then the rule TR-RET-GUARD-SEC additionally *guards* the execution of S' with the condition notRet.

3. Return elimination. The bulk of the transformation is done by the remaining rules in Figure [1.7.](#page-41-0) We omit rules where we either do not transform the statement, or simply recursively transform any sub-statements. Rule TR-RET replaces secret returns by updating rval with the (deferred) return value and setting notRet to false, to signal that subsequent code should *not* be executed.

Rule TR-RET-SEQ handles sequenced statements  $S_1; S_2$  by guarding the execution of instructions in  $S_2$  against possible secret returns in  $S_1$ . The rule first eliminates the secret returns from the *first* block to get  $S'_1$ . Next, it extracts the secrecy context  $rc'$  produced by type checking  $S_1$ . Finally, the rule uses  $rc'$  to derive a guarded version of the *second* statement,  $S'_2$ .

The TR-RET-FOR rule handles secret returns inside loops. As control flow can jump back to the beginning of a loop, a secret return inside a loop body *S* can affect the execution of the entire body, as in the following example:

```
for (uint32 i from 0 to 5) {
  b[i] = 1;
  if (i == sec) { return i; }
  a[i] = 2;
}
```
Here, if i == sec becomes true, the program must stop overwriting the elements in both a *and* b. The rule accounts for returns in the body  $S$  by using the secrecy context  $rc'$  from type checking the body, and in turn uses this to derive the guarded form of the body  $S'$ . In our example, the secret-dependent return makes the return context  $rc' = \text{SEC}$ , and so the entire body is guarded by notRet, yielding the transformed program:

```
for (uint32 i from 0 to 5) {
 // for-loop rule
 if (notRet) {
   b[i] = 1;
   if (i == sec) { rval = i; notRet = false; }
   // sequencing rule
   if (notRet) { a[i] = 2; }
 }
}
```
<span id="page-44-0"></span>
TR-BR-DEC

$$
\frac{\Phi = (\omega, \{\vec{x} : \vec{\beta}\}, \beta_r) \qquad \omega(f) = \text{PUB} \qquad \Phi, \texttt{true} \vdash S \rightarrow S'}{\omega \vdash f(\vec{x} : \vec{\beta})\ \{\ S\ \} : \beta_r \rightarrow f(\vec{x} : \vec{\beta})\ \{\ S'\ \} : \beta_r}
$$

TR-BR-DEC-SEC

$$
\frac{\Phi = (\omega, \{\vec{x} : \vec{\beta}\}, \beta_r) \qquad \omega(f) = \text{SEC} \qquad \Phi, \textit{callCtx} \vdash S \rightarrow S'}{\omega \vdash f(\vec{x} : \vec{\beta})\ \{\ S\ \} : \beta_r \rightarrow f(\vec{x} : \vec{\beta}, \textit{callCtx} : \text{BOOL}_{\text{SEC}})\ \{\ S'\ \} : \beta_r}
$$

TR-BR-IF

$$
\frac{\Phi = (\omega, \Gamma, \beta_r) \qquad \Gamma \vdash e : \text{BOOL}_{\text{SEC}} \qquad \text{FRESH } m_t, m_f \qquad \Phi, (p \,\&\, m_t) \vdash S_1 \rightarrow S'_1 \qquad \Phi, (p \,\&\, m_f) \vdash S_2 \rightarrow S'_2}{\Phi, p \vdash \texttt{if}\ (e)\ \{\ S_1\ \}\ \texttt{else}\ \{\ S_2\ \} \rightarrow \{\ \text{BOOL}_{\text{SEC}}\ m_t = e;\ \text{BOOL}_{\text{SEC}}\ m_f = \neg m_t;\ S'_1;\ S'_2\ \}}
$$

TR-BR-ASSIGN

$$
\frac{p \neq \texttt{true}}{\Phi, p \vdash e_1 := e_2 \rightarrow e_1 := \texttt{ctselect}(p, e_2, e_1)}
$$

TR-BR-CALL

$$
\frac{\omega(f) = \text{SEC}}{\Phi, p \vdash \beta\ x = f(\vec{e}) \rightarrow \beta\ x = f(\vec{e}, p)}
$$

Figure 1.8: Transformation rules for branch removal.

### 1.3.2 Branch removal

Return deferral eliminates secret returns by introducing secret-dependent branches. In this section we eliminate secret-dependent control flow as the final step towards producing constant-time code.

To this end, FaCT replaces secret branches with constant-time selections. Consider the following snippet:

```
if (sec1) { a[1] = 3; }
else if (sec2) { a[2] = 4; }
```
The updates to a[1] and a[2] are guarded by the secret values sec1 and sec2. Left untransformed, they therefore produce memory access patterns that can reveal the values of those secrets; this is the classic *implicit flows* problem [\[129\]](#page-186-0). We eliminate the implicit flow in two steps. First, we track the *control predicates* that correspond to (the conjunction of) the secret conditions. Then, we perform both memory writes, but use ctselect to preserve conditional semantics:

```
a[1] = ctselect( sec1,        3, a[1]);
a[2] = ctselect(~sec1 & sec2, 4, a[2]);
```
Our general strategy is to transform each conditional assignment into an unconditional re-assignment whose new value is chosen by ctselect based on the current control predicate.
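As a rough illustration of what this strategy computes, the C sketch below implements a mask-based ctselect and checks that the branch-free version of the snippet above agrees with the branchy original. The names `ctselect`, `update_branchy`, and `update_ct` are hypothetical, and this is not FaCT's actual lowering. It assumes booleans are encoded as 0 or 1, so the FaCT `~sec1` is written as `sec1 ^ 1u`.

```c
#include <stdint.h>

/* Hypothetical branch-free select: returns a when cond == 1 and b when
 * cond == 0. cond must be exactly 0 or 1. */
static uint32_t ctselect(uint32_t cond, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - cond; /* all ones if cond == 1, zero if cond == 0 */
    return (a & mask) | (b & ~mask);
}

/* Original code: which array element is written depends on secrets. */
static void update_branchy(uint32_t sec1, uint32_t sec2, uint32_t a[3]) {
    if (sec1) {
        a[1] = 3;
    } else if (sec2) {
        a[2] = 4;
    }
}

/* After branch removal: both writes always happen, and each write keeps the
 * old value unless its control predicate holds. */
static void update_ct(uint32_t sec1, uint32_t sec2, uint32_t a[3]) {
    uint32_t p1 = sec1;               /* predicate guarding a[1] = 3 */
    uint32_t p2 = (sec1 ^ 1u) & sec2; /* predicate guarding a[2] = 4: ~sec1 & sec2 */
    a[1] = ctselect(p1, 3, a[1]);
    a[2] = ctselect(p2, 4, a[2]);
}
```

Both writes touch the same addresses regardless of sec1 and sec2. Only the stored values differ.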

Transforming code that calls procedures is less straightforward: if a procedure takes a mutable parameter, the procedure may update that parameter's value in a way that is visible to the caller. For example:

```
void foo(secret mut uint32 x) { x = 5; }
...
if (sec) {
  foo(x);
  // x is now 5
}
```
The transformation of this code must ensure that updates to x only occur if sec is true. We do so using a *call-context* parameter passed to callee foo. This parameter is the caller's control predicate (in this case, sec), which we use to guard updates in foo. Our compiler converts the above into semantically equivalent constant-time code:

```
void foo(secret mut uint32 x,
         secret bool callCtx) {
  x = ctselect(callCtx, 5, x);
}
...
foo(x, sec);
// x is 5 only if sec is true
```
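The call-context idea can be sketched in C as follows. This sketch assumes the mutable parameter is modeled as a pointer and predicates are 0/1 integers; `foo`, `caller`, and `bar` are hypothetical names. The second caller, `bar`, shows a callee that is itself ω-SEC. Following TR-BR-CALL, it passes along the conjunction of its own callCtx and the branch predicate.

```c
#include <stdint.h>

/* Branch-free select on a 0/1 condition (hypothetical helper). */
static uint32_t ctselect(uint32_t cond, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - cond;
    return (a & mask) | (b & ~mask);
}

/* Transformed callee: the write to *x only takes effect when callCtx == 1. */
static void foo(uint32_t *x, uint32_t callCtx) {
    *x = ctselect(callCtx, 5, *x);
}

/* Transformed top-level caller of `if (sec) { foo(x); }`:
 * the call is made unconditionally, passing sec as the call context. */
static uint32_t caller(uint32_t sec, uint32_t x) {
    foo(&x, sec);
    return x; /* 5 if sec == 1, unchanged otherwise */
}

/* A caller that is itself omega-SEC: under `if (sec) { foo(x); }` it passes
 * the conjunction of its own callCtx and the fresh branch predicate m_t. */
static void bar(uint32_t *x, uint32_t sec, uint32_t callCtx) {
    uint32_t m_t = sec;
    foo(x, callCtx & m_t);
}
```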
*Transformation rules.* We formalize branch removal using two kinds of rules, shown in Figure [1.8](#page-44-0). The *procedure transformation* rule $\omega \vdash f(\vec{x} : \vec{\beta}) \{S\} : \beta_r \to f(\vec{x}' : \vec{\beta}') \{S'\} : \beta_r$ transforms the body $S$ of the procedure $f$ to $S'$, much like the rules for return deferral. For procedures that depend on a secret context, this rule additionally extends *f*'s set of parameters $\vec{x}$ to include the extra *call-context* parameter *callCtx*. The *statement transformation* rule $\Phi, p \vdash S \rightarrow S'$ transforms *S* to *S'* given context $\Phi$ and control predicate *p*. We walk through some of the rules below.

**1. Procedure transformation rule.** Both TR-BR-DEC and TR-BR-DEC-SEC remove branches from procedures. TR-BR-DEC transforms procedures that do not depend on secret contexts by transforming each procedure's body *S* into *S'* using the initial control predicate true. TR-BR-DEC-SEC, on the other hand, transforms a procedure *f* if $\omega(f) = \text{SEC}$, i.e., where *f* depends on the caller's secret context. The rule adds a new parameter secret bool callCtx that holds the control predicate at each call site, and then transforms the body *S* starting with the initial control predicate callCtx.

**2. Branch elimination.** The remaining rules in Figure [1.8](#page-44-0) remove branches from statements. Rule TR-BR-IF, for example, eliminates secret-dependent conditional branches by saving the condition (resp. its negation) in the variable $m_t$ (resp. $m_f$). The "then" statement $S_1$ (resp. "else" statement $S_2$) is then transformed after conjoining $m_t$ (resp. $m_f$) to the control predicate $p$. To prevent name collisions when transforming nested conditionals, the FRESH metafunction guarantees that all $m_t$ and $m_f$ variables have unique names. The declarations of $m_t$ and $m_f$ and the transformed branches $S'_1$ and $S'_2$ are sequenced to obtain the final result.
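The following C sketch hand-applies TR-BR-IF and TR-BR-ASSIGN to a nested conditional. It uses hypothetical names and encodes booleans as 0/1, and it is a manual illustration rather than FaCT compiler output. The outer `if` starts under predicate true. Its branches run under $m_{t1}$ and $m_{f1}$. The inner `if` conjoins its own fresh $m_{t2}$ and $m_{f2}$ with $m_{t1}$. Because no branch runs under the literal true, every assignment becomes a ctselect.

```c
#include <stdint.h>

static uint32_t ctselect(uint32_t cond, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - cond;
    return (a & mask) | (b & ~mask);
}

/* Source: if (e1) { if (e2) { x = 1; } else { x = 2; } } else { x = 3; } */
static uint32_t nested_branchy(uint32_t e1, uint32_t e2, uint32_t x) {
    if (e1) {
        if (e2) { x = 1; } else { x = 2; }
    } else {
        x = 3;
    }
    return x;
}

/* Hand-applied TR-BR-IF / TR-BR-ASSIGN; all branches run straight-line. */
static uint32_t nested_ct(uint32_t e1, uint32_t e2, uint32_t x) {
    uint32_t m_t1 = e1;        /* FRESH m_t for the outer if */
    uint32_t m_f1 = m_t1 ^ 1u; /* m_f = not m_t */
    /* S'_1: inner if under predicate (true & m_t1) */
    uint32_t m_t2 = e2;
    uint32_t m_f2 = m_t2 ^ 1u;
    x = ctselect(m_t1 & m_t2, 1, x);
    x = ctselect(m_t1 & m_f2, 2, x);
    /* S'_2: else-branch under predicate (true & m_f1) */
    x = ctselect(m_f1, 3, x);
    return x;
}
```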

Rule TR-BR-ASSIGN handles side-effecting assignment statements, using the control predicate to ctselect the old or new values. But, if the assignment occurs under the trivial control predicate (i.e., the literal true), the assignment is left unchanged.

Finally, rule TR-BR-CALL handles calls to ω-SEC procedures *f* by explicitly passing the control predicate *p* as the call-context parameter. This ensures that updates within *f* only occur according to the caller's control flow.

#### <span id="page-47-0"></span>1.3.3 Compiler correctness and security

In this section, we prove that our compiler preserves semantics and outputs constant-time procedures. To formalize these claims, we define an instrumented semantics that describes procedure behavior and *leakage*, i.e., the sequence of branches taken, the memory addresses accessed, and the operands to variable-time instructions. Intuitively, a procedure is constant-time if its leakage is not influenced by any secret values [\[15\]](#page-177-0).

In particular, we consider a big-step semantics of the form  $F : (\vec{v}, h) \stackrel{\psi}{\longrightarrow} (v, h')$  where F is shorthand for a procedure  $f(\vec{x} : \vec{\beta}) \{ S \} : \beta_r$ , the term  $\vec{v}$  represents the values of parameters, *h* and *h'* are *heaps* mapping pointers to values, *v* is the final value of the procedure, and  $\psi$  is the leakage. The semantics is parametrized by an allocation function, and the proofs of the claims below rely on several (minor) assumptions on this function. We give these assumptions, formal definition, and complete proofs in [Appendix A.](#page-151-0)

We first prove the *correctness* of our compiler, using the notation  $\omega \vdash F \rightarrow F'$  to denote the combined return deferral and branch removal transformations. Compiler correctness states that the compiler preserves the meaning of well-typed statements. To account for new references and variables that are introduced by the compiler pass itself, we show *equivalence* of the final heaps *h'* and *h''*, i.e., for any pointer *p'* in *h'*, there is an equivalent pointer *p''* in *h''* such that  $h'(p')$  and  $h''(p'')$  are either equal values, or are themselves equivalent pointers.

**Theorem 1.3.1** (Compiler correctness). If  $\omega \vdash F \rightarrow F'$  and  $F'$  is well-typed, then  $F: (\vec{v}, h) \stackrel{\psi}{\longrightarrow} (v, h')$  implies that  $F': (\vec{v}, h) \stackrel{\psi'}{\longrightarrow} (v, h'')$  and h' and h'' are equivalent.

*Proof sketch.* By induction on the derivation. $\Box$

Note that our compiler correctness theorem does not make any claim about leakage. We separately prove that the compiler produces constant-time procedures. To this end, we first define the notion of a constant-time procedure.

**Definition 1.3.2.** A procedure F where $\omega \vdash F$ is constant-time *iff for every pair of executions* $F : (\vec{v}_1, h_1) \stackrel{\psi_1}{\longrightarrow} (v_1, h'_1)$ *and* $F : (\vec{v}_2, h_2) \stackrel{\psi_2}{\longrightarrow} (v_2, h'_2)$, *we have that* $\vec{v}_1, h_1 \equiv \vec{v}_2, h_2$ *implies* $\psi_1 = \psi_2$, *where* $\equiv$ *is a suitably parametrized notion of equivalence (e.g., public or "low" equivalence* [\[7,](#page-176-0) *[15,](#page-177-0) [156\]](#page-188-0)).*
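Definition 1.3.2 can be made concrete with a small C sketch that models only the memory-address component of the leakage $\psi$. In this sketch, a trace records every array index read. All names are hypothetical, and branches and variable-time operands are deliberately not modeled. Running the secret-indexed lookup on two different secrets produces different traces. The scan-and-mask lookup produces identical traces, which is what the definition requires for public-equivalent inputs. The sketch also assumes the comparison `i == secret_idx` compiles to a flag-setting instruction rather than a branch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRACE_MAX 64

/* Hypothetical leakage trace: the sequence of array indices read. */
typedef struct {
    size_t addrs[TRACE_MAX];
    size_t len;
} trace_t;

/* Instrumented load: records the index, then reads the element. */
static uint32_t load(const uint32_t *t, size_t i, trace_t *tr) {
    if (tr->len < TRACE_MAX) tr->addrs[tr->len++] = i;
    return t[i];
}

/* Secret-indexed lookup: the recorded address is the secret itself. */
static uint32_t lookup_leaky(const uint32_t *t, size_t secret_idx, trace_t *tr) {
    return load(t, secret_idx, tr);
}

/* Scan every entry and keep the matching one with a mask:
 * the sequence of addresses depends only on the public length n. */
static uint32_t lookup_ct(const uint32_t *t, size_t n, size_t secret_idx, trace_t *tr) {
    uint32_t r = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t eq = (uint32_t)(i == secret_idx);
        r |= load(t, i, tr) & ((uint32_t)0 - eq);
    }
    return r;
}

/* psi_1 = psi_2 check, restricted to the address component. */
static int traces_equal(const trace_t *a, const trace_t *b) {
    return a->len == b->len &&
           memcmp(a->addrs, b->addrs, a->len * sizeof a->addrs[0]) == 0;
}
```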

Much like CT-Wasm [\[156\]](#page-188-0), we cannot prove that *all* FaCT procedures are constant-time: FaCT allows procedures to declassify secret data and to call external procedures over which it has no control. We can, however, provide guarantees for a safe subset of *declassify-free* procedures, i.e., procedures that contain no declassify statements and call no other procedures unless those too are declassify-free (and not extern).

**Theorem 1.3.3** (Compiler security). *If F is declassify-free and* $\omega \vdash F \rightarrow F'$*, then F' is constant-time.*

*Proof sketch.* We define two additional type systems that impose stricter constraints on programs, and prove type-preservation for return deferral and branch removal. We then conclude by proving that the final type system guarantees that programs are constant-time. It is important to note that these type systems are merely proof artifacts, i.e., type checking is not performed again after transformations.

Informally, the two type systems are incremental restrictions on the FaCT type system. The first type system, which we denote by  $\vdash_{rd}$ , rejects programs that contain secret returns; the second type system, denoted  $\vdash_{ct}$ , rejects programs that branch on secrets.

We then establish type-preservation for return deferral and branch removal:

- If  $\omega \vdash F$  and  $\omega \vdash_{rd} F \rightarrow F'$  then  $\omega \vdash_{rd} F'$ .
- If  $\omega \vdash_{rd} F$  and  $\omega \vdash_{ct} F \rightarrow F'$  then  $\omega \vdash_{ct} F'$ .

Both are proved by induction on derivations, using adequate ancillary statements for the induction to go through.

We conclude by proving that  $\vdash_{ct}$  guarantees that programs are constant-time. The proof follows from a "locally preserves" unwinding lemma, stating that equivalent states yield equivalent final configurations and equal leakage.  $\Box$ 

# 1.4 Implementation and evaluation

We implement a prototype compiler for FaCT in ∼6000 lines of OCaml. The compiler transforms FaCT source code into LLVM IR, which it passes to clang (version 6.0.1) to generate assembly or object code. The compiler uses the Z3 SMT solver [\[49\]](#page-180-0) to check public safety assertions ([§1.2.2.3\)](#page-37-1).

We evaluate FaCT by asking the following questions:

- Is FaCT expressive enough to implement real-world cryptographic algorithms?
- Does FaCT produce constant-time code?
- What is FaCT's performance overhead?
- Compared to C, does FaCT improve non-experts' experience reading and writing constant-time code?

We answer the first three questions with case studies in which we integrate FaCT into real-world projects ([§1.4.1\)](#page-50-0). We find that FaCT is expressive enough to implement a range of cryptographic primitives. We use dudect [\[125\]](#page-185-0) to empirically check that our implementations, including compiler optimizations, are constant-time. We find that, compared to optimized C code, unoptimized FaCT code runs 16–346% more slowly, while optimized FaCT code ranges from 5% slower to 21% faster.

We answer the fourth question with a study comparing user experiences reading and writing FaCT and C ([§1.4.2\)](#page-53-0). In sum, a plurality of participants found FaCT easier to read than C, and a majority found FaCT easier to write.

#### <span id="page-50-0"></span>1.4.1 Case studies

We integrate FaCT into three popular open source libraries by porting pieces of these libraries from C to FaCT:

- The NaCl secretbox API for symmetric-key authenticated encryption and decryption. We port the entire libsodium (version 1.0.16) [\[50\]](#page-180-1) secretbox API, including the two underlying primitives, the Poly1305 message authentication code (MAC) and the XSalsa20 stream cipher.
- The Curve25519 Elliptic-Curve Diffie-Hellman (ECDH) primitive for asymmetric key exchange. We port Adam Langley's curve25519-donna library [\[90\]](#page-183-0) in whole.
- The OpenSSL [\[114\]](#page-185-1) ssl3\_cbc\_digest\_record function used to verify decrypted SSLv3 messages. At its core, this function computes the MAC of a padded message without revealing the padding length. Our implementation invokes OpenSSL's SHA-1 hash primitive as an extern ([§1.2.1\)](#page-30-0).
- The OpenSSL aesni\_cbc\_hmac\_sha1\_cipher function used in the MAC-then-Encode-then-CBC-Encrypt (MEE-CBC) construction. This function performs AES-CBC decryption and then verifies the MAC and padding of the decrypted message. Our implementation invokes OpenSSL's AES and SHA-1 primitives as externs.

We choose these functions because they (1) are complex enough to exercise all of the FaCT language features; (2) implement a range of algorithms; and (3) demonstrate that FaCT can be used in different settings, from implementing individual procedures to large portions of libraries.

*Method.* We port in three steps. First, we port the C code to FaCT by translating C constructs to their corresponding FaCT counterparts. During this translation process, we label sensitive messages, keys, etc. as secret, and add assume and declassify statements as appropriate to ensure the code typechecks ([§1.4.1.1](#page-51-0)). We also replace "bit hacks" (§1.1) with high-level FaCT constructs (e.g., if). Second, we check the correctness of our ports using each library's test harness, and we empirically check that the ports are constant-time using dudect ([§1.4.1.2\)](#page-52-0). Finally, we use each library's benchmarking suite to compare our ports to the C implementations ([§1.4.1.3\)](#page-53-1).

| <b>Case study</b>       | Lines of code (C) | Lines of code (FaCT) | #A | #D | #E |
|-------------------------|-------------------|----------------------|----|----|----|
| libsodium secretbox     | 984               | 1068                 | 16 |    |    |
| curve25519-donna        | 310               | 342                  |    |    |    |
| OpenSSL record validate | 92                | 91                   |    |    |    |
| OpenSSL MEE-CBC         | 201               | 219                  |    |    |    |

<span id="page-51-1"></span>Table 1.1: FaCT case study summary: lines of code (per cloc) and uses of assume (#A), declassify (#D), and extern (#E).

#### <span id="page-51-0"></span>1.4.1.1 Expressiveness

Table [1.1](#page-51-1) summarizes our ports. FaCT implementations are at worst ∼10% longer than the corresponding C code. Much of the extra length is because FaCT does not have a macro system; instead, we translated macro definitions and then manually expanded them. (We note that it would be straightforward to instead use the C preprocessor with FaCT.) FaCT code is also more verbose than C when processing buffers: since FaCT has no pointer arithmetic, FaCT code must use extra variables to track offsets into arrays.

Our ports make sparing use of extern, declassify, and assume. For example, our ports use assume to help the public safety verifier track values through memory and reason across procedure and language boundaries. We declassify in two cases: in libsodium secretbox decryption and in OpenSSL MEE-CBC verification; these declassifications are permitted by the libraries' respective attacker models [\[26,](#page-178-0) [45,](#page-179-0) [91\]](#page-183-1). Finally, we use extern to invoke existing primitives (e.g., OpenSSL's SHA-1 implementation).

| <b>Benchmark</b>        | Unoptimized FaCT overhead | Optimized FaCT overhead |
|-------------------------|---------------------------|-------------------------|
| secretbox (reference)   | 345.57/373.49%            | −20.92/−14.56%          |
| secretbox (vectorized)  | 427.21/427.09%            | −6.54/−4.99%            |
| curve25519-donna        | 144.42%                   | 2.21%                   |
| OpenSSL record validate | 30.13 to 35.16%           | 0.64 to 4.62%           |
| OpenSSL MEE-CBC         | 16.15 to 31.97%           | −2.56 to 4.16%          |

<span id="page-52-1"></span>Table 1.2: Overhead of FaCT ports compared to optimized C, for each benchmark. secretbox results are for encryption and decryption overhead, respectively.

#### <span id="page-52-0"></span>1.4.1.2 Security

We prove that FaCT's transformations produce constant-time code ([§1.3.3\)](#page-47-0), but this applies only to the *unoptimized* LLVM IR produced by the FaCT compiler.<sup>[3](#page-53-2)</sup> Since we use clang to generate optimized object code, an LLVM optimization pass might break FaCT's constant-time guarantees.

To empirically check that our case study implementations run in constant time, even after optimization, we use the dudect [\[125\]](#page-185-0) analysis tool. At a high level, dudect tests for constant-time execution by running the code under test for a large number of iterations and collecting timing information using the CPU's cycle counters. It then tests the collected timing information for statistically significant variation in execution time that is correlated with changes to secret inputs. In our evaluation, we configure dudect to collect 50 million measurements for each benchmark. It finds no statistically significant timing variation.
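The statistical core of this check is Welch's t-test between two classes of timing measurements, for example runs with a fixed secret input versus runs with random secret inputs. The C sketch below computes the statistic with a simple two-pass mean and variance. It omits dudect's own measurement harness and any post-processing of samples, and the function name is hypothetical. The function returns $t^2$, so a test of $|t| > T$ becomes $t^2 > T^2$ and no square root is needed.

```c
#include <stddef.h>

/* Squared Welch t statistic between timing classes x and y.
 * Requires nx, ny >= 2 and at least one class with nonzero variance. */
static double welch_t_squared(const double *x, size_t nx, const double *y, size_t ny) {
    double mx = 0.0, my = 0.0, vx = 0.0, vy = 0.0;
    for (size_t i = 0; i < nx; i++) mx += x[i];
    for (size_t i = 0; i < ny; i++) my += y[i];
    mx /= (double)nx;
    my /= (double)ny;
    for (size_t i = 0; i < nx; i++) vx += (x[i] - mx) * (x[i] - mx);
    for (size_t i = 0; i < ny; i++) vy += (y[i] - my) * (y[i] - my);
    vx /= (double)(nx - 1); /* unbiased sample variances */
    vy /= (double)(ny - 1);
    double diff = mx - my;
    return diff * diff / (vx / (double)nx + vy / (double)ny);
}
```

Identically distributed classes give a statistic near zero. A large value means the two secret classes are distinguishable by timing.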

Several other works concerned with constant-time crypto implementation [\[14,](#page-177-1) [125,](#page-185-0) [139,](#page-187-0) [156\]](#page-188-0) have reported using dudect. In our testing, we found the tool to quickly and reliably find timing differences in buggy code. We note, however, that dudect is only a check (not a proof) of constant-time behavior; we discuss this further in Section [1.5](#page-56-0).

#### <span id="page-53-1"></span>1.4.1.3 Performance

Table [1.2](#page-52-1) shows the performance cost of porting C to FaCT. We benchmark each implementation on an Intel i7-6700K at 4GHz with 64GB of RAM using clang 6.0.1. We compare both unoptimized and optimized FaCT implementations with C implementations that are compiled at the corresponding project's default optimization level.[4](#page-53-3) Our optimized FaCT code uses the same optimization flags as the C code.

For libsodium and curve25519-donna, we use the library's benchmarking suites. We measure the mean of  $\sim$ 2<sup>24</sup> and  $\sim$ 2<sup>17</sup> iterations, respectively, and report the median of five such measurements. For the OpenSSL implementations, we use OpenSSL's s\_server and s\_client commands to measure throughput when transferring 256MB, 1GB, and 4GB files. We compute the median throughput of five transfers at each file size, and report the minimum and maximum result; overhead was uncorrelated with file size.

For most benchmarks, we find that optimized FaCT is comparable to C: the overhead is never more than 5%. Notably, the FaCT implementation of libsodium secretbox is 15–20% *faster* than the C reference implementation. We attribute this speedup to vectorization: inspecting the XSalsa20 assembly code, we find that clang generates vector instructions for the FaCT implementation, but not for C. To explore this discrepancy, we measure performance of secretbox with XSalsa20 explicitly vectorized (using vectors in FaCT, intrinsics in C). In this case, FaCT is still 5–6% faster than C, but this speedup appears to be an artifact of LLVM's applying different optimizations to different code.

## <span id="page-53-0"></span>1.4.2 User study

We evaluate the usability of FaCT by conducting a user study as part of an upper-level, undergraduate programming languages course at UC San Diego.<sup>[5](#page-53-4)</sup> Prior to the study, we dedicated three lectures to timing side-channels, constant-time programming in general, and constant-time programming specifically in C and FaCT. As an optional assignment, students were asked to (1) *explain* the behavior of constant-time code written in C and FaCT, and (2) *implement* constant-time algorithms in both C and FaCT. Of the 129 enrolled students, 77 completed the study over a nine-day period. We describe methods and conclusions below. In our extended paper [\[39\]](#page-179-1), we give further lessons from the study, e.g., compilation errors that participants frequently ran into.

<span id="page-53-2"></span><sup>3</sup>And to procedures that do not use declassify.

<span id="page-53-3"></span><sup>4</sup>For OpenSSL, `-O3`; for other projects, `-O2`.

<span id="page-53-4"></span><sup>5</sup>Our study was reviewed and exempted by the IRB.

*Method.* The user study is a sequence of web-based tasks. For each task, the participant is first given a warm-up code comprehension question, whose answer is subsequently revealed. The participant is then given a second, related question. This question is posed twice, once in C and once in FaCT. We randomize the order of the languages per participant, i.e., half the participants complete each task first in C and then in FaCT, and the other half in the reverse order. On a given question, participants can repeatedly check partial answers for correctness; once finished, the participant *submits* a final answer, which can no longer be viewed or revised. A task is *complete* if the participant submits a final answer for both C and FaCT; we discard incomplete tasks.

The user study was built on an earlier version of FaCT which did not enforce public memory safety. Nevertheless, we believe the results largely translate to the version presented in this chapter, because the surface language did not change significantly.

#### 1.4.2.1 Understanding constant-time code

To evaluate participants' understanding of C and FaCT code, we asked them to describe the behavior of two functions. The first function takes two input buffers (a header and a message), copies the header and message to an output buffer, and adds padding up to a fixed size. The second function implements long division: it computes a quotient and remainder, writes each to an output buffer, and returns a status code indicating success or failure.

We graded participants on their ability to correctly describe each function's behavior. In both cases, we find that participants showed slightly better understanding of FaCT than of C. For the first function, the mean score was 57% for FaCT and 53% for C; for the second, it was 40% for FaCT and 32% for C. Participants also reported a slight preference for FaCT: 31 participants found FaCT easier to understand, compared to 10 who found C easier and 28 who reported similar difficulty.

<span id="page-55-1"></span>Table 1.3: Number of participants (out of 77) that submitted correct and constant-time solutions for each task. The check\_pkcs7\_padding task was misconfigured and marked variable-time code as constant-time (16 submissions); we report these numbers for completeness ([§1.4.2.2\)](#page-55-0).

| <b>Programming task</b> | <b>FaCT</b> | C      |
|-------------------------|-------------|--------|
| remove_secret_padding   | 62          | 49     |
| check_pkcs7_padding     | 35          | 32(16) |
| remove_pkcs7_padding    | 34          | 24     |

#### <span id="page-55-0"></span>1.4.2.2 Writing constant-time code

To evaluate participants' ability to write constant-time code in FaCT and C, we had them implement three functions:

- remove\_secret\_padding: given a buffer and secret length, this function removes any secret padding, i.e., sets every element of the buffer past the length to zero.
- check\_pkcs7\_padding: this function checks whether a supplied buffer contains a valid PKCS#7 [\[79\]](#page-182-0) message.
- remove\_pkcs7\_padding: this function removes padding from a supplied buffer, if it contains a valid message.
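For a sense of what a constant-time C solution to the first two tasks involves, here is a sketch with assumed signatures: `uint32_t` lengths, `len >= 1`, and an explicit `block_size` parameter. These are not the study's reference solutions. The mask helpers use the same bit tricks as OpenSSL's internal `constant_time_lt` and `constant_time_is_zero`. Every byte is touched regardless of the secret length or pad value.

```c
#include <stdint.h>

/* All ones if a < b (unsigned), else zero; no branches. */
static uint32_t ct_lt_mask(uint32_t a, uint32_t b) {
    uint32_t x = a ^ ((a ^ b) | ((a - b) ^ b));
    return (uint32_t)0 - (x >> 31);
}

/* All ones if a == 0, else zero; no branches. */
static uint32_t ct_is_zero_mask(uint32_t a) {
    return (uint32_t)0 - ((~a & (a - 1)) >> 31);
}

/* Zero every byte at index >= secret_len, scanning the whole public length. */
static void remove_secret_padding(uint8_t *buf, uint32_t len, uint32_t secret_len) {
    for (uint32_t i = 0; i < len; i++) {
        buf[i] &= (uint8_t)ct_lt_mask(i, secret_len);
    }
}

/* Returns 1 iff the last byte n satisfies 1 <= n <= block_size, n <= len,
 * and the last n bytes all equal n; returns 0 otherwise. */
static uint32_t check_pkcs7_padding(const uint8_t *buf, uint32_t len, uint32_t block_size) {
    uint32_t pad = buf[len - 1];                  /* secret pad value */
    uint32_t ok = ~ct_is_zero_mask(pad);          /* pad >= 1 */
    ok &= ~ct_lt_mask(block_size, pad);           /* pad <= block_size */
    ok &= ~ct_lt_mask(len, pad);                  /* pad <= len */
    for (uint32_t i = 0; i < len; i++) {
        uint32_t from_end = len - 1 - i;          /* public position from the end */
        uint32_t in_pad = ct_lt_mask(from_end, pad);
        /* Bytes inside the padding must equal pad; other bytes are ignored. */
        ok &= ~in_pad | ct_is_zero_mask((uint32_t)buf[i] ^ pad);
    }
    return ok & 1u;
}
```

A C compiler may still introduce branches when it optimizes such code. This is why participants' C solutions were checked with ct-verif rather than trusted by construction.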

Participants could compile their code, run a test suite, and, for C code, check constant-time correctness with ct-verif [\[7\]](#page-176-0). They could also give up on a task and move to the next one.

Table [1.3](#page-55-1) summarizes our findings. Of the 68 participants that completed the first task, 62 submitted correct and constant-time FaCT code, and 49 submitted correct and constant-time C code. For the third task, 34 participants submitted correct, constant-time FaCT code compared to 24 participants for C. In the survey, 40 participants reported finding FaCT easier to write, 11 found C easier, and 18 found them similar.

We cannot draw conclusions from check\_pkcs7\_padding, because the task had a bug that incorrectly marked variable-time code as constant-time; only 16 of the 32 C submissions marked "correct" were constant-time. The bug was limited to this task, but because check\_pkcs7\_padding is required for remove\_pkcs7\_padding, some participants needed to correct their code to pass the third task.

# <span id="page-56-0"></span>1.5 Limitations and future work

FaCT makes it easier to write constant-time code, but it is not perfect. Limitations and future work include:

*The type system.* The type system lacks polymorphism and flow sensitivity [\[110,](#page-184-0) [129\]](#page-186-0), which reduces both expressivity and performance. For example, our type system cannot express a program that branches on a buffer's public contents and then decrypts the buffer in-place, upgrading its label to secret. We leave such extensions to future work.

*The public safety checker.* FaCT's public safety checker does not reason about mutable variables or properties across function calls. For example, indexing an array based on a mutable variable requires assume-ing the index is in bounds.

*The brittleness of constant-time behavior.* FaCT's compiler only guarantees constant-time behavior for the LLVM IR that it produces. Crucially, this means that LLVM's optimization passes and lowering to assembly can introduce variable-time behavior. Though many optimizations *do* preserve the constant-time property [\[17\]](#page-177-2), FaCT relies on dudect to empirically check that a piece of code is constant-time.

Sound, *symbolic* verification of constant-time behavior using ct-verif [\[7\]](#page-176-0) would give much stronger guarantees. Unfortunately, ct-verif currently has limited support for declassification and vector instructions. Extending ct-verif to support these primitives and applying it to optimized FaCT code is future work.

*The evaluation.* Our evaluation of FaCT is preliminary and thus incomplete. For example, we relied on extern versions of SHA-1 and AES ([§1.4.1](#page-50-0)) because we preferred to focus on porting higher-level OpenSSL functions with a history of timing attacks. Moreover, some of the low-level primitives we ported (XSalsa20, Poly1305, and Curve25519) were explicitly designed for ease of constant-time implementation [\[21,](#page-177-3)[22,](#page-177-4)[24\]](#page-177-5). Future work includes expanding FaCT's repertoire with potentially more challenging algorithms.

Finally, our user study has limited scope and involves only non-expert users; remedying these issues is also future work.

# 1.6 Related work

This work supersedes an initial design we previously described in [\[38\]](#page-179-2). In particular, we present a design and implementation of a DSL for writing constant-time crypto, provide a formal semantics and security guarantees for FaCT, and evaluate FaCT on several dimensions; in [\[38\]](#page-179-2) we outlined the vision for such a DSL. Our implementation and formalization efforts revealed insights previously missed in [\[38\]](#page-179-2)—e.g., the need for *public safety* ([§1.2.2.3\)](#page-37-1) and challenges with using ct-verif [\[7\]](#page-176-0) to verify code with inline declassifications. At the same time, in this chapter, we did not explore parts of the design space outlined in [\[38\]](#page-179-2)—e.g., we do not expose some hardware-specific instructions like add-with-carry, which could simplify asymmetric-key crypto implementations.

*Domain-specific languages.* There are several efforts designing DSLs for implementing cryptographic primitives and protocols. Bernstein's qhasm is a low-level portable assembly for writing high-speed crypto routines [\[23\]](#page-177-6). It does not distinguish secret data from public data, so it does not prevent information leaks by construction.

Vale [\[30\]](#page-178-1) and Jasmin [\[5\]](#page-176-1) are DSLs for writing and verifying high-performance assembly code. Vale developers write platform-independent assembly code and specify the target architecture; the Vale compiler uses Dafny to verify semantics and non-interference. Jasmin allows developers to use architecture-specific instructions alongside higher-level code, and the verified Jasmin compiler rejects non-constant-time programs. Low\* is a higher-level, embedded (in F\*) DSL that compiles to verified constant-time C [\[120\]](#page-185-2). The verified NaCl [\[25\]](#page-178-2) library, HACL\* [\[172\]](#page-189-0), is written in Low\*. CT-Wasm [\[156\]](#page-188-0) is a formally verified extension to the WebAssembly language [\[157\]](#page-188-1) for writing crypto code in the browser. CT-Wasm uses a strict label-based type system to enforce its constant-time policy. These languages provide support for high-level control flow constructs and procedures, but they require developers to manually write constant-time code.

Constant-Time Toolkit (CTTK) is a C library [\[119\]](#page-185-3) that follows recipes in [\[46,](#page-179-3) [118\]](#page-185-4) to provide functions—including low-level constant-time primitives—for crypto libraries, but developers must compose these low-level blocks.

*Verification.* There is a growing body of work on both building verified cryptographic implementations and verifying existing libraries. Bhargavan et al. verify an implementation of TLS, including low-level cryptographic primitives [\[27\]](#page-178-3). Barthe et al. [\[15\]](#page-177-0) verify constant-time properties of various PolarSSL implementations. Ye et al. [\[165\]](#page-189-1) verify the mbedTLS implementation of HMAC-DRBG. Appel [\[11\]](#page-177-7) and Beringer et al. [\[19\]](#page-177-8) respectively verify OpenSSL's implementation of SHA-256 and HMAC. Tsai et al. [\[147\]](#page-187-1) verify core parts of X25519. Almeida et al. [\[6\]](#page-176-2) verify AWS Lab's s2n MEE-CBC implementation (after identifying a vulnerability); they also verify security properties of NaCl libraries [\[8\]](#page-176-3). Erbsen et al. [\[53\]](#page-180-2) synthesize and verify elliptic curve implementations from high-level descriptions. Almeida et al. develop ct-verif [\[7\]](#page-176-0) and verify constant-time properties of several cryptographic algorithms. Many of these verification efforts are specific to the projects being analyzed. Additionally, developers still bear the burden of manually writing constant-time code, which FaCT aims to alleviate.

*General techniques for eliminating timing channels.* FaCT uses an information flow control type system to eliminate programs that may introduce information leaks or are otherwise inefficient (or impossible) to transform to constant-time. Our label-based type system is a standard IFC type system [\[129\]](#page-186-0) that borrows explicit mutability from ownership-based systems [\[43\]](#page-179-4). Previous solutions have also relied on type- and static-analysis techniques (e.g., [\[15,](#page-177-0) [52,](#page-180-3) [127,](#page-186-1) [143,](#page-187-2) [168\]](#page-189-2)) to address timing leaks. FaCT automatically transforms secret sub-computations into constant-time straight-line code. Our approach follows several previous efforts on eliminating timing channels via source code transformations [\[1,](#page-176-4) [18,](#page-177-9) [108,](#page-184-1) [112,](#page-184-2) [117,](#page-185-5) [122\]](#page-185-6). Most similar in ethos is SC-Eliminator [\[160\]](#page-188-2). This system takes as input a program and a list of secrets, and uses tag propagation to transform LLVM IR into its constant-time equivalent. Though both projects perform transformations, they use orthogonal approaches: SC-Eliminator repairs already-existing code, while FaCT is a language for writing such code from the start. Finally, many other efforts employ system-level techniques to eliminate and detect timing-channels [\[31,](#page-178-4)[59,](#page-180-4)[94,](#page-183-2)[125,](#page-185-0)[141,](#page-187-3)[171\]](#page-189-3).

## Acknowledgements

We thank the anonymous PLDI and PLDI AEC reviewers and our shepherd Limin Jia for their suggestions and insightful comments. We thank the participants of the Dagstuhl Seminar on Secure Compilation for early feedback on this work, especially Tamara Rezk. We thank Ariana Mirian for handling the IRB for our user study, Shravan Narayan for his help in understanding the subtleties of LLVM, and Joseph Jaeger and Jess Sorrell for helping us understand elliptic curve implementations. We also thank the CSE 130 TAs for their help in testing our user study, and the CSE 130 students for participating in the user study. This work was supported in part by gifts from Fujitsu and Cisco, by the National Science Foundation under Grant Number CNS-1514435, by ONR Grant N000141512750, and by the CONIX Research Center, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

[Chapter 1,](#page-23-0) in part, is a reprint of the material as it appears in 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '19). Cauligi, Sunjay; Soeller, Gary; Johannesmeyer, Brian; Brown, Fraser; Wahby, Riad S.; Renner, John; Grégoire, Benjamin; Barthe, Gilles; Jhala, Ranjit; Stefan, Deian, ACM, 2019. The dissertation author was the primary investigator and author of this paper.

# Chapter 2

# Constant-Time Foundations for the New Spectre Era

*In which we shore up the earth.*

The previous chapter demonstrated how we can compile a high-level language, FaCT, all the way down to low-level code, while enforcing sound constant-time guarantees—as long as we assume a standard sequential execution model. As we will see, *microarchitectural* features—and in particular, *speculative execution*—break many of our constant-time techniques. To reclaim constant-time properties even when accounting for such features, we must develop a new formal strategy for analyzing programs.

In this chapter, we lay the foundations for constant-time in the presence of microarchitectural features that have been exploited in recent attacks: out-of-order and speculative execution. We focus on constant-time for two key reasons. First, *impact*: Constant-time programming is widely used in real-world crypto libraries—and high-assurance code—where developers already go to great lengths to eliminate leaks via side channels. Second, *foundations:* Constant-time programming is already rooted in foundations, with well-defined semantics [\[15,](#page-177-0) [40\]](#page-179-5). These semantics consider very powerful attackers—e.g., attackers in [\[15\]](#page-177-0) have control over the cache and the scheduler. An advantage of considering powerful attackers is that the semantics can overlook many hardware details—e.g., since the cache is adversarially controlled, there is no point in modeling it precisely—making constant-time amenable to automated verification and enforcement.

*Contributions.* We first define a semantics for an abstract, three-stage (fetch, execute, and retire) machine. Our machine supports out-of-order and speculative execution by modeling *reorder buffers* and *transient instructions*, respectively. We assume that attackers have complete control over microarchitectural features (e.g., the branch target predictor) when executing a victim program and model the attacker's control over predictors using *directives*. This keeps our semantics simple yet powerful: our semantics abstracts over all predictors when proving security—of course, assuming that predictors themselves do not leak secrets. We further show how our semantics can be extended to capture new predictors—e.g., a hypothetical *memory aliasing* predictor.

We then define *speculative constant-time*, an extension of constant-time for machines with out-of-order and speculative execution. This definition allows us to discover microarchitectural side channels in a principled way—all four classes of Spectre attacks as classified by Canella et al. [\[35\]](#page-178-5), for example, manifest as violations of our constant-time property.

We further use our semantics as the basis for a prototype analysis tool, Pitchfork, built on top of the angr symbolic execution engine [\[138\]](#page-186-2). Like other symbolic analysis tools, Pitchfork suffers from path explosion, which limits the depth of speculation we can analyze. Nevertheless, we are able to use Pitchfork to detect multiple Spectre bugs in real code. We use Pitchfork to detect leaks in the well-known Kocher test cases [\[85\]](#page-182-1) for Spectre v1, as well as our more extensive test suite which includes Spectre v1.1 variants. More significantly, we use Pitchfork to analyze—and find leaks in—real cryptographic code from the libsodium, OpenSSL, and curve25519-donna libraries.

*Open source.* Pitchfork and our test suites are open source and available at [https://pitchfork.programming.systems](https://pitchfork.programming.systems).

# 2.1 Motivating examples

In this section, we show why classical constant-time programming is insufficient when attackers can exploit microarchitectural features. We do this via two example attacks and show how these attacks are captured by our semantics.

*Classical constant time is not enough.* Our first example consists of 3 lines of code, shown in [Figure 2.1](#page-64-0) (top right). The program, a variant of the classical Spectre v1 attack [\[86\]](#page-182-2), branches on the value of register  $r_a$  (line 1). If  $r_a$ 's value is smaller than 4, the program jumps to program location  $\underline{2}$ , where it uses  $r_a$  to index into a public array *A*, saves the value into register  $r_b$ , and uses  $r_b$  to index into another public array *B*. If  $r_a$  is larger than or equal to 4 (i.e., the index is out of bounds), the program skips the two load instructions and jumps to location 4. In a sequential execution, this program neither loads nor branches on secret values. It thus trivially satisfies the constant-time discipline.

However, modern processors do not execute sequentially. Instead, they continue fetching instructions before prior instructions are complete. In particular, a processor may continue fetching instructions beyond a conditional branch, before evaluating the branch condition. In that case, the processor *guesses* which branch will be taken. For example, the processor may erroneously guess that the branch condition at line  $1$  evaluates to true, even though  $r_a$  contains value 9. It will therefore continue down the "true" branch speculatively. In hardware, such guesses are made by a branch prediction unit, which may have been mistrained by an adversary.

These guesses, as well as additional choices such as execution order, are directly supplied by the adversary in our semantics. We model this through a series of *directives*, as shown on the bottom left of [Figure 2.1.](#page-64-0) The directive fetch: true instructs our model to speculatively follow the true branch and to place the fetched branch instruction at index  $\overline{1}$  in the *reorder buffer*. Similarly, the two following fetch directives place the loads at indices  $\overline{2}$  and  $\overline{3}$  in the buffer. The instructions in the reorder buffer, called *transient instructions*, do not necessarily match the original instructions, but can contain additional information (see Table [2.1\)](#page-66-0). For instance, the transient version of the branch instruction records which branch has been speculatively taken.

<span id="page-64-0"></span>

| Directive              | Speculative execution:<br>Effect on reorder buffer                                     | Leakage                          |
|------------------------|----------------------------------------------------------------------------------------|----------------------------------|
| fetch: true            | $\overline{1} \mapsto \mathsf{br}(>, (4, r_a), 2, (2, 4))$                             |                                  |
| fetch                  | $\overline{2} \mapsto (r_b = \mathsf{load}([40, r_a]))$                                |                                  |
| fetch                  | $\overline{3} \mapsto (r_c = \mathsf{load}([44, r_b]))$                                |                                  |
| execute $\overline{2}$ | $\overline{2} \mapsto (r_b = \mathit{Key}[1]_{\text{sec}})$                            | read $49_{\text{pub}}$           |
| execute $\overline{3}$ | $\overline{3} \mapsto (r_c = X)$, where $a = \mathit{Key}[1]_{\text{sec}} + 44$       | **read $a_{\text{sec}}$**        |

**Figure 2.1:** Example demonstrating a Spectre v1 attack. The branch at  $1$  acts as a bounds check for array *A*. The execution speculatively ignores the bounds check, and leaks a byte of the secret *Key*.

In our example, the attacker next instructs the model to execute the first load, using the directive execute  $\overline{2}$. Because the bounds check has not yet been executed, the load reads from the secret element *Key*[1] (at address $40 + 9 = 49$), placing the value in  $r_b$. The attacker then issues the directive execute  $\overline{3}$  to execute the following load; this load's address is calculated as  $44 + Key[1]$. Accessing this address affects externally visible cache state, allowing the attacker to recover *Key*[1] through a cache side-channel attack [\[59\]](#page-180-4). This is encoded by the leakage observation shown in bold on the bottom right. Though this leak cannot happen under sequential execution, our semantics clearly highlights it once we account for microarchitectural features.

*Modeling hypothetical attacks.* Next, we give an example of a hypothetical class of Spectre attack captured by our extended semantics. The attack is based on a microarchitectural feature which would allow processors to speculate whether a store and load pair might operate on the same address, and forward values between them [\[75,](#page-182-3) [131\]](#page-186-3).

We demonstrate this attack in [Figure 2.2.](#page-67-0) The reorder buffer, after all instructions have been fetched, is shown in the top right. The program stores the value of register  $r_b$  into the  $\mathit{secretKey}_{\text{sec}}$  array and eventually loads two values from public arrays. The attacker first issues the directive execute  $\overline{2}$ : value; this results in a buffer where the store instruction at  $\overline{2}$  has been modified to record the resolved value  $x_{\text{sec}}$. Next, the attacker issues the directive execute  $\overline{7}$ : fwd $\overline{2}$, which causes the model to mispredict that the load at  $\overline{7}$  aliases with the store at  $\overline{2}$, and thus to forward the value  $x_{\text{sec}}$  to the load. The forwarded value  $x_{\text{sec}}$  is then used in the address  $a = 48 + x_{\text{sec}}$  of the load instruction at index  $\overline{8}$. There, the loaded value *X* is irrelevant, but the address  $a$  is leaked to the attacker, allowing them to recover the secret value  $x_{\text{sec}}$. The speculative execution continues and rolls back when the misprediction is detected (details on this are given in [Section 2.2\)](#page-65-0), but at this point, the secret has already been leaked.

As with the example in [Figure 2.1,](#page-64-0) the program in this example follows the (sequential) constant-time discipline, yet leaks during speculative execution. However, both examples are insecure under our new notion of *speculative constant-time*, as we discuss next.

# <span id="page-65-0"></span>2.2 Speculative semantics and security

In this section we define the notion of speculative constant time, and propose a speculative semantics that models execution on modern processors. We start by laying the groundwork for our definitions and semantics.

*Configurations.* A configuration  $C \in$  Confs represents the state of execution at a given step. It is defined as a tuple  $(\rho, \mu, n, \text{buf})$  where:

- $\rho : \mathcal{R} \to \mathcal{V}$  is a map from a finite set of register names  $\mathcal{R}$  to values;
- $\mu : \mathcal{V} \to \mathcal{V}$  is a memory;
- $n : \mathcal{V}$  is the current program point;
- $\mathit{buf} : \mathbb{N} \rightharpoonup \mathsf{TransInstr}$  is the reorder buffer.

<span id="page-66-0"></span>

Table 2.1: Instructions and their transient instruction form.

<span id="page-67-0"></span>

| Directive                                   | Speculative execution:<br>Effect on *buf*                                                      | Leakage                             |
|---------------------------------------------|------------------------------------------------------------------------------------------------|-------------------------------------|
| execute $\overline{2}$ : value              | $\overline{2} \mapsto \mathsf{store}(x_{\text{sec}}, [40, r_a])$                               |                                     |
| execute $\overline{7}$ : fwd $\overline{2}$ | $\overline{7} \mapsto (r_c = \mathsf{load}([45], x_{\text{sec}}, \overline{2}))$               |                                     |
| execute $\overline{8}$                      | $\overline{8} \mapsto (r_c = X\{\bot, a\})$, where $a = x_{\text{sec}} + 48$                   | read $a_{\text{sec}}$               |
| execute $\overline{2}$ : addr               | $\overline{2} \mapsto \mathsf{store}(r_b, 42_{\text{pub}})$                                    | fwd $42_{\text{pub}}$               |
| execute $\overline{7}$                      | $\{\overline{7}, \overline{8}\} \notin \mathit{buf}$                                           | rollback, fwd $45_{\text{pub}}$     |

Figure 2.2: Example demonstrating a hypothetical attack abusing an aliasing predictor. This attack differs from prior speculative data forwarding attacks in that branch misprediction is not needed.

*Values and labels.* As a convention, we use *n* for memory addresses that map to instructions, and *a* for addresses that map to data. Each value is annotated with a label from a lattice of security labels with join operator  $\sqcup$. For brevity, we sometimes omit the public label annotation on values.

Using labels, we define an equivalence  $\simeq_{\text{pub}}$  on configurations. We say that two configurations are equivalent if they coincide on public values in registers and memories.

*Reorder buffer.* The *reorder buffer* maps buffer indices (natural numbers) to transient instructions. We write  $buf(i)$  to denote the instruction at index *i* in buffer *buf*, if *i* is in *buf*'s domain. We write  $buf[i \mapsto \text{instr}]$  to denote the result of extending *buf* with the mapping from *i* to instr, and  $buf \setminus buf(i)$  for the function formed by removing *i* from *buf*'s domain. We write  $buf[j : j < i]$  to denote the restriction of *buf*'s domain to all indices  $j$  s.t.  $j < i$  (i.e., removing all mappings at indices *i* and greater). Our rules add and remove indices in a way that ensures that *buf*'s domain will always be contiguous.

<span id="page-68-0"></span>
$$
(buf +_i \rho)(r) = \begin{cases} v_\ell & \text{if } max(j) < i : buf(j) = (r = \_) \land buf(j) = (r = v_\ell) \\ \rho(r) & \text{if } \forall j < i : buf(j) \neq (r = \_) \\ \bot & \text{otherwise} \end{cases}
$$

Figure 2.3: Definition of the register resolve function.

*Notation.* We let  $\text{MIN}(M)$  (resp.  $\text{MAX}(M)$) denote the minimum (maximum) index in the domain of a mapping *M*. We denote the empty mapping as  $\emptyset$  and let  $\text{MIN}(\emptyset) = \text{MAX}(\emptyset) = 0$.

For a formula  $\varphi$ , we may discuss the bounded highest (lowest) index for which a formula holds. We write  $max(j) < i : \varphi(j)$  to mean that *j* is the highest index less than *i* for which  $\varphi$ holds, and define  $min(j) > i : \varphi(j)$  analogously.

*Register resolve function.* In [Figure 2.3,](#page-68-0) we define the *register resolve function*, which we use to determine the value of a register in the presence of transient instructions in the reorder buffer. For index *i* and register *r*, the function may (1) return the latest assignment to *r* prior to position *i* in the buffer, if the corresponding operation is already resolved; (2) return the value from the register map  $\rho$, if there are no pending assignments to *r* before position *i* in the buffer; or (3) be undefined. Note that if the latest assignment to *r* is yet unresolved then  $(buf +_i \rho)(r) = \bot$. We extend this definition to values by defining  $(buf +_i \rho)(v_\ell) = v_\ell$  for all  $v_\ell \in \mathcal{V}$, and lift it to lists of registers or values using a pointwise lifting.
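The three cases of the register resolve function can be made concrete with a short Python sketch. This is a hypothetical encoding chosen for illustration only: a labeled value is a `Labeled(value, label)` pair with labels `"pub"`/`"sec"`, any transient instruction that assigns a register is an `Assign(reg, value)` whose `value` is `None` while unresolved, and `None` also plays the role of $\bot$. None of these names come from the formal development or from Pitchfork.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Labeled:
    """A value annotated with a security label ("pub" or "sec")."""
    value: int
    label: str


@dataclass(frozen=True)
class Assign:
    """Transient instruction (r = ...); it is resolved iff value is not None."""
    reg: str
    value: Optional[Labeled] = None


Operand = Union[str, Labeled]  # a register name or an immediate value


def resolve(buf: Dict[int, object], i: int, rho: Dict[str, Labeled],
            operand: Operand) -> Optional[Labeled]:
    """Compute (buf +_i rho)(operand); None stands for bottom."""
    if isinstance(operand, Labeled):
        return operand  # values resolve to themselves
    writers = [j for j, ins in buf.items()
               if j < i and isinstance(ins, Assign) and ins.reg == operand]
    if not writers:
        return rho[operand]  # case (2): no pending assignment before i
    # Cases (1) and (3): the latest assignment before i decides; its value
    # is None (bottom) if that assignment is still unresolved.
    return buf[max(writers)].value


def resolve_all(buf: Dict[int, object], i: int, rho: Dict[str, Labeled],
                operands: List[Operand]) -> List[Optional[Labeled]]:
    """Pointwise lifting of resolve to a list of registers or values."""
    return [resolve(buf, i, rho, o) for o in operands]
```

For example, with `buf = {1: Assign("rb", Labeled(7, "sec")), 2: Assign("ra")}`, resolving `"ra"` at index 3 yields `None` because the latest assignment is still unresolved, whereas resolving it at index 2 falls back to `rho["ra"]` since no earlier entry writes `ra`.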

## 2.2.1 Speculative constant-time

We present our new notion of constant-time security in terms of a small-step semantics, which relates program configurations, observations, and attacker directives.

Our semantics does not directly model caches, nor any of the predictors that drive speculative execution. Rather, we model externally visible effects—memory accesses and control flow—by producing a sequence of *observations*. We can thus reason about *any* possible cache implementation, as any cache eviction policy can be expressed as a function of the sequence of observations. Furthermore, exposing control flow observations directly in our semantics makes it unnecessary for us to track various other side channels. Indeed, while channels such as port contention or register renaming produce distinct measurable effects [\[86\]](#page-182-2), they only serve to leak the path taken through the code—and thus modeling these observations separately would be redundant. For the same reason, we do not model a particular branch prediction strategy; we instead let the attacker resolve scheduling non-determinism by supplying a series of *directives*.

This approach has two important consequences. First, the use of observations and directives allows our semantics to remain *tractable* and *amenable to verification*. For instance, we do not need to model the behavior of the cache or any branch predictor. Second, our notion of speculative constant-time is *robust*, i.e., it holds for all possible branch predictors and replacement policies—assuming that they do not leak secrets directly, a condition that is achieved by all practical hardware implementations.

Given an attacker directive *d*, we use  $C \xrightarrow[d]{o} C'$  to denote the execution step from configuration *C* to configuration  $C'$  that produces observation  $o$. Program execution is defined from the small-step semantics in the usual style. We use  $C \Downarrow_{D,O}^{N} C'$  to denote a sequence of execution steps from  $C$  to  $C'$. Here  $D$  and  $O$  are the concatenations of the single-step directives and leakages, respectively; *N* is the number of retired instructions, i.e., the number of retire directives in $D$. When such a big step from  $C$  to  $C'$  is possible, we say  $D$  is a *well-formed* schedule of directives for  $C$. We omit *D*, *N*, or *O* when not used.

Definition 2.2.1 (Speculative constant-time). *We say a configuration C with schedule D satisfies* speculative constant-time *(SCT) with respect to a low-equivalence relation*  $\simeq_{\text{pub}}$  *iff for every*  $C'$  *such that*  $C \simeq_{\text{pub}} C'$:

$$
C \Downarrow_{D,O} C_1 \quad \text{iff} \quad C' \Downarrow_{D,O'} C'_1 \text{ and } C_1 \simeq_{\text{pub}} C'_1 \text{ and } O = O'.
$$

*A program satisfies SCT iff every initial configuration satisfies SCT under any schedule.*
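To make the quantifier structure of Definition 2.2.1 concrete, the following Python sketch checks it against a *finite* set of candidate configurations $C'$, so it can only refute SCT, never prove it. It assumes a hypothetical `run(config, schedule)` that returns the final configuration and the observation list, a configuration layout `{"pub": ..., "sec": ...}` where low-equivalence means equal public parts, and that the schedule is well-formed for every candidate (so the "iff" on existence of executions is taken for granted). `run_fig21` is a hard-coded replay of the Figure 2.1 schedule, not a general interpreter.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Config = Dict[str, dict]  # hypothetical layout: {"pub": {...}, "sec": {...}}
Obs = Tuple               # e.g. ("read", address)
Runner = Callable[[Config, object], Tuple[Config, List[Obs]]]


def low_equiv(c1: Config, c2: Config) -> bool:
    """C ~pub C': the two configurations agree on all public components."""
    return c1["pub"] == c2["pub"]


def satisfies_sct(run: Runner, c: Config, schedule: object,
                  candidates: Iterable[Config]) -> bool:
    """Check SCT for C against a finite set of candidate C'.

    True only means that no counterexample was found among the candidates.
    """
    c1, obs = run(c, schedule)
    for c_prime in candidates:
        if not low_equiv(c, c_prime):
            continue  # the definition only quantifies over C ~pub C'
        c1_prime, obs_prime = run(c_prime, schedule)
        if obs != obs_prime or not low_equiv(c1, c1_prime):
            return False
    return True


def run_fig21(c: Config, schedule: object) -> Tuple[Config, List[Obs]]:
    """Replay of the Figure 2.1 schedule (the schedule argument is ignored):
    the bounds check is bypassed and both loads leak their addresses."""
    mem = {**c["pub"]["mem"], **c["sec"]["mem"]}
    addr1 = 40 + c["pub"]["ra"]  # execute 2: load([40, r_a])
    rb = mem[addr1]
    addr2 = 44 + rb              # execute 3: load([44, r_b])
    return c, [("read", addr1), ("read", addr2)]


def fig21_config(key1: int) -> Config:
    """r_a = 9 is public; address 49 holds the secret byte Key[1]."""
    return {"pub": {"ra": 9, "mem": {}}, "sec": {"mem": {49: key1}}}
```

Checking `fig21_config(5)` against `[fig21_config(k) for k in range(256)]` with `run_fig21` reports a violation: all candidates are low-equivalent, but the second `read` observation changes with `Key[1]`.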

*Aside, on sequential execution.* Processors work hard to create the illusion that assembly instructions are executed sequentially. We validate our semantics by proving equivalence with respect to sequential execution. Formally, we define *sequential schedules* as schedules that execute and retire instructions immediately upon fetching them. We attach to each program a canonical sequential schedule and write  $C \Downarrow_{\text{seq}}^{N} C'$  to model execution under this canonical schedule. Our sequential validation is defined relative to an equivalence  $\approx$  on configurations. Informally, two configurations are equivalent if their memories and register files are equal, even if their speculative states differ.

Theorem 2.2.2 (Sequential equivalence). *Let C be an initial configuration and D a well-formed schedule for C. If*  $C \Downarrow_{D}^{N} C_1$*, then*  $C \Downarrow_{\text{seq}}^{N} C_2$  *and*  $C_1 \approx C_2$*.*

Complete definitions, more properties, and proofs are given in [Appendix B.](#page-166-0)

## 2.2.2 Overview of the semantics

As shown in [Table 2.1,](#page-66-0) each instruction has a *physical* form and one or more *transient* forms. Our semantics operates on these instructions much like a multi-stage processor pipeline. Physical instructions are *fetched* from memory and become transient instructions in the reorder buffer. They are then *executed* until they are fully resolved. Finally, they are *retired*, updating the non-speculative state in the configuration.

In the rest of this section, we show how we model speculative execution [\(Section 2.2.3\)](#page-71-0), memory operations [\(Section 2.2.4\)](#page-74-0), aliasing prediction [\(Section 2.2.5\)](#page-81-0), and fence instructions [\(Section 2.2.6\)](#page-84-0). We also extend our semantics with indirect jumps [\(Section 2.2.7\)](#page-85-0) and function calls [\(Section 2.2.8\)](#page-87-0).

Our semantics captures a variety of existing Spectre variants, including v1 [\(Figure 2.1\)](#page-64-0), v1.1 [\(Figure 2.5\)](#page-79-0), and v4 [\(Figure 2.6\)](#page-80-0), as well as a new hypothetical variant [\(Figure 2.2\)](#page-67-0). Additional variants (e.g., v2 and *ret2spec*) can be expressed with the extended semantics given in [Sections 2.2.7](#page-85-0) and [2.2.8.](#page-87-0) Our semantics shows that these attacks violate SCT by producing observations depending on secrets.

## <span id="page-71-0"></span>2.2.3 Speculative execution

We start with the semantics for *conditional branches*, which introduce speculative execution.

*Conditional branching.* The physical instruction for conditional branches has the form  $\mathsf{br}(op, \vec{r}, n^{\text{true}}, n^{\text{false}})$, where  $op$  is a Boolean operator whose result determines whether or not to execute the jump,  $\vec{r}$  are the operands to *op*, and  $n^{\text{true}}$  and  $n^{\text{false}}$  are the program points for the *true* and *false* branches, respectively.

We show br's transient counterparts in Table [2.1.](#page-66-0) The unresolved form extends the physical instruction with a program point  $n_0$, which records the branch that is executed speculatively ($n^{\text{true}}$ or $n^{\text{false}}$); this branch may or may not correspond to the branch that is actually taken once *op* is resolved. The resolved form contains the final jump target.

*Fetch.* We give the rule for the fetch stage below.

COND-FETCH

$$
\frac{\mu(n) = \mathsf{br}(op, \vec{r}, n^{\text{true}}, n^{\text{false}}) \qquad i = \text{MAX}(buf) + 1 \qquad buf' = buf[i \mapsto \mathsf{br}(op, \vec{r}, n^{\text{true}}, (n^{\text{true}}, n^{\text{false}}))]}{(\rho, \mu, n, buf) \xrightarrow[\text{fetch: true}]{} (\rho, \mu, n^{\text{true}}, buf')}
$$
The COND-FETCH rule speculatively executes the branch determined by a Boolean value *b* given by the directive. We show the case for  $b = \text{true}$ ; the case for false is analogous. The rule updates the current program point *n*, allowing execution to continue along the specified branch. The rule then records the chosen branch  $n^{\text{true}}$  (resp.  $n^{\text{false}}$ ) in the transient jump instruction.

This semantics models the behavior of most modern processors. Since the target of the branch cannot be resolved in the fetch stage, speculation lets execution continue instead of stalling until the branch target is resolved. In hardware, a branch predictor chooses which branch to execute; in our semantics, the directives fetch: true and fetch: false determine which rule applies. This allows us to abstract over all possible predictor implementations.
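A minimal Python sketch of COND-FETCH shows how the directive replaces the predictor: the caller (the attacker) simply passes the Boolean `guess`. The dataclasses `Br` and `TransientBr`, the tuple layout `(rho, mu, n, buf)`, and the function name `cond_fetch` are hypothetical encodings for illustration, not part of the formal semantics or of Pitchfork.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Br:
    """Physical instruction br(op, rs, n_true, n_false)."""
    op: str
    operands: Tuple
    n_true: int
    n_false: int


@dataclass(frozen=True)
class TransientBr:
    """Unresolved transient form br(op, rs, n0, (n_true, n_false))."""
    op: str
    operands: Tuple
    n0: int
    targets: Tuple[int, int]


# Configuration (rho, mu, n, buf); mu maps program points to instructions.
Config = Tuple[dict, dict, int, Dict[int, object]]


def cond_fetch(config: Config, guess: bool) -> Tuple[Config, Optional[str]]:
    """Apply COND-FETCH for directive 'fetch: true' (guess=True) or 'fetch: false'."""
    rho, mu, n, buf = config
    ins = mu[n]
    if not isinstance(ins, Br):
        raise ValueError("COND-FETCH only applies to br instructions")
    i = max(buf, default=0) + 1  # MAX(buf) + 1, with MAX(empty) = 0
    n0 = ins.n_true if guess else ins.n_false
    new_buf = dict(buf)
    new_buf[i] = TransientBr(ins.op, ins.operands, n0,
                             (ins.n_true, ins.n_false))
    return (rho, mu, n0, new_buf), None  # fetching emits no observation
```

Fetching the branch of Figure 2.1, `Br(">", (4, "ra"), 2, 4)`, into an empty buffer with `guess=True` places `TransientBr(">", (4, "ra"), 2, (2, 4))` at index 1 and moves the program point to 2, matching the first row of that figure.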

*Execute.* Next, we describe the rules for the execute stage.

COND-EXECUTE-CORRECT

$$
\frac{\begin{array}{c} buf(i) = \mathsf{br}(op, \vec{r}, n_0, (n^{\text{true}}, n^{\text{false}})) \qquad \forall j < i : buf(j) \neq \mathsf{fence} \qquad (buf +_i \rho)(\vec{r}) = \vec{v}_\ell \\ \llbracket op(\vec{v}_\ell) \rrbracket = \text{true}_\ell \qquad n^{\text{true}} = n_0 \qquad buf' = buf[i \mapsto \mathsf{jump}\ n^{\text{true}}] \end{array}}{(\rho, \mu, n, buf) \xrightarrow[\text{execute } i]{\mathsf{jump}\ n^{\text{true}}_\ell} (\rho, \mu, n, buf')}
$$

COND-EXECUTE-INCORRECT

$$
\frac{\begin{array}{c} buf(i) = \mathsf{br}(op, \vec{r}, n_0, (n^{\text{true}}, n^{\text{false}})) \qquad \forall j < i : buf(j) \neq \mathsf{fence} \qquad (buf +_i \rho)(\vec{r}) = \vec{v}_\ell \\ \llbracket op(\vec{v}_\ell) \rrbracket = \text{true}_\ell \qquad n^{\text{true}} \neq n_0 \qquad buf' = buf[j : j < i][i \mapsto \mathsf{jump}\ n^{\text{true}}] \end{array}}{(\rho, \mu, n, buf) \xrightarrow[\text{execute } i]{\text{rollback},\ \mathsf{jump}\ n^{\text{true}}_\ell} (\rho, \mu, n^{\text{true}}, buf')}
$$

Both rules evaluate the condition  $op$  via an evaluation function  $\llbracket \cdot \rrbracket$. We show the rules for the case where the condition evaluates to true; the rules for false are analogous. The rules then compare the actual branch target  $n^{\text{true}}$  against the speculatively chosen target  $n_0$  from the fetch stage.

If the *correct* path was chosen during speculation, i.e.,  $n_0$  agrees with the correct branch  $n^{\text{true}}$, rule COND-EXECUTE-CORRECT updates *buf* with the fully resolved jump instruction and emits an observation:  $\mathsf{jump}\ n^{\text{true}}_{\ell}$. This models an attacker that can observe control flow, e.g., by timing executions along different paths. The leaked observation  $n^{\text{true}}$  has label  $\ell$, propagated from the evaluation of the condition.

<span id="page-73-0"></span>**Table 2.2:** Correct and incorrect branch prediction. Initially,  $r_a = 3$. In (b), the misprediction causes a rollback to  $\overline{4}$.

In case the *wrong* path was taken during speculation, i.e., the calculated branch  $n^{\text{true}}$  *disagrees* with  $n_0$, the semantics must roll back all execution steps along the erroneous path. For this, rule COND-EXECUTE-INCORRECT removes all entries in *buf* that are newer than the current instruction (i.e., all entries  $j > i$), sets the program point *n* to the correct branch, and updates *buf* at index *i* with the correct resolved jump instruction. Since an attacker can observe misspeculation through instruction timing [\[86\]](#page-182-0), the rule issues a rollback observation in addition to the jump observation.
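The two execute rules can be sketched as a single Python function that evaluates the condition and then dispatches on whether the speculative guess $n_0$ was right; this folds the true and false cases together, which the text leaves as "analogous". It is a minimal sketch under stated assumptions: the operands are passed in already resolved as `(value, label)` pairs (standing for $(buf +_i \rho)(\vec{r})$), labels form the two-point lattice `"pub"` < `"sec"`, and `TransientBr`, `Jump`, `OPS`, and `cond_execute` are hypothetical names.

```python
import operator
from dataclasses import dataclass
from typing import List, Tuple

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


@dataclass(frozen=True)
class TransientBr:
    """Unresolved branch br(op, rs, n0, (n_true, n_false))."""
    op: str
    operands: Tuple
    n0: int
    targets: Tuple[int, int]


@dataclass(frozen=True)
class Jump:
    """Resolved transient form: jump n."""
    target: int


def join(labels) -> str:
    """Join on the two-point lattice pub < sec."""
    return "sec" if "sec" in labels else "pub"


def cond_execute(config, i: int, resolved: List[Tuple[int, str]]):
    """Apply COND-EXECUTE-CORRECT or COND-EXECUTE-INCORRECT at index i."""
    rho, mu, n, buf = config
    br = buf[i]
    if any(buf[j] == "fence" for j in buf if j < i):
        raise ValueError("an earlier fence blocks execution")
    label = join([l for _, l in resolved])
    taken = OPS[br.op](*[v for v, _ in resolved])
    target = br.targets[0] if taken else br.targets[1]
    jump_obs = ("jump", target, label)
    if target == br.n0:
        # CORRECT: resolve the branch in place; program point unchanged.
        new_buf = dict(buf)
        new_buf[i] = Jump(target)
        return (rho, mu, n, new_buf), [jump_obs]
    # INCORRECT: keep only entries j < i, resolve index i, and redirect
    # the program point to the correct target.
    new_buf = {j: ins for j, ins in buf.items() if j < i}
    new_buf[i] = Jump(target)
    return (rho, mu, target, new_buf), [("rollback",), jump_obs]
```

With the buffer of Figure 2.1 and $r_a = 9$, `cond_execute(cfg, 1, [(4, "pub"), (9, "pub")])` discards both speculative loads, sets the program point to 4, and emits a rollback followed by `("jump", 4, "pub")`; if the condition's operands were secret, the jump observation would carry label `"sec"`.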

*Retire.* The rule for the retire stage is shown below; its only effect is to remove the jump instruction from the buffer.

JUMP-RETIRE

$$
\frac{\text{MIN}(buf) = i \qquad buf(i) = \mathsf{jump}\ n_0 \qquad buf' = buf \setminus buf(i)}{(\rho, \mu, n, buf) \xrightarrow[\text{retire}]{} (\rho, \mu, n, buf')}
$$

*Examples.* [Table 2.2](#page-73-0) shows how branch prediction affects the reorder buffer. In part (a), the branch at index  $\overline{4}$  is predicted correctly. The jump instruction is resolved, and execution proceeds as normal. In part (b), the branch at index  $\overline{4}$  is incorrectly predicted. Upon executing the branch, the misprediction is detected, and *buf* is rolled back to index  $\overline{4}$ .

## <span id="page-74-0"></span>2.2.4 Memory operations

The physical instruction for loads is  $(r = \mathsf{load}(\vec{r}, n'))$, while the form for stores is  $\mathsf{store}(r, \vec{r}, n')$. As before,  $n'$  is the program point of the next instruction. For loads, *r* is the register receiving the result; for stores, *r* is the register or value to be stored. For both loads and stores,  $\vec{r}$  is a list of operands (registers and values) which are used to calculate the operation's target address.

Transient counterparts of load and store are given in Table [2.1.](#page-66-0) We annotate unresolved load instructions with the program point of the physical instruction that generated them; we omit annotations whenever not used. Unresolved and resolved store instructions share the same syntax, but for resolved stores, both address and operand are required to be single values.

*Address calculation.* We assume an arithmetic operator *addr* which calculates target addresses for stores and loads from its operands. We leave this operation abstract in order to model a large variety of architectures. For example, in a simple addressing mode,  $\llbracket addr(\vec{v}) \rrbracket$  might compute the sum of its operands; in an x86-style addressing mode,  $\llbracket addr([v_1, v_2, v_3]) \rrbracket$  might instead compute  $v_1 + v_2 \cdot v_3$.
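Because *addr* is abstract, each architecture amounts to a different function from operand values to an address. The Python sketch below gives the two instantiations mentioned in the text; the function names are hypothetical.

```python
from typing import Sequence


def addr_simple(operands: Sequence[int]) -> int:
    """Simple addressing mode: [[addr(v1, ..., vk)]] = v1 + ... + vk."""
    return sum(operands)


def addr_x86_style(operands: Sequence[int]) -> int:
    """x86-style mode from the text: [[addr([v1, v2, v3])]] = v1 + v2 * v3."""
    base, index, scale = operands
    return base + index * scale
```

For instance, `addr_simple([40, 9])` gives 49, the address read by the first load of Figure 2.1. Real x86 memory operands additionally include a displacement and restrict the scale to 1, 2, 4, or 8; the three-operand form here is only the simplified shape given in the text.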

*Store forwarding.* Multiple transient load and store instructions may exist concurrently in the reorder buffer. In particular, there may be unresolved loads and stores that will read or write to the same address in memory. Under a naive model, we must wait to execute load instructions until all prior store instructions have been retired, in case they write to the address we will load from. Indeed, some real-world processors behave exactly this way [\[42\]](#page-179-0).

For performance, most modern processors implement *store-forwarding* for memory operations: if a load reads from the same address as a prior store and the store has already been resolved, the processor can *forward* the resolved value to the load. The load can then proceed without waiting for the store to commit to memory [\[159\]](#page-188-0).

To model these store forwarding semantics, we use annotations to record whether a load was resolved from memory or via forwarding. A resolved load has the form  $(r = v_{\ell}\{j, a\})^n$, where the index *j* records either the buffer index of the store instruction that forwarded its value to the load, or ⊥ if the value was taken from memory. We also record the memory address *a* associated with the data, and retain the program point *n* of the load instruction that generated the value instruction. The resolved load otherwise behaves as a resolved value instruction (e.g., for the register resolve function).

*Fetch.* We now discuss the inference rules for memory operations, starting with the fetch stage.

SIMPLE-FETCH

$$
\frac{\mu(n) \in \{\mathsf{op}, \mathsf{load}, \mathsf{store}, \mathsf{fence}\} \qquad n' = next(\mu(n)) \qquad i = \text{MAX}(buf) + 1 \qquad buf' = buf[i \mapsto transient(\mu(n))]}{(\rho, \mu, n, buf) \xrightarrow[\text{fetch}]{} (\rho, \mu, n', buf')}
$$

Given a fetch directive, rule SIMPLE-FETCH extends the reorder buffer *buf* with a new transient instruction (see Table [2.1\)](#page-66-0). In addition to load and store, the rule applies to op and fence instructions. The *transient*( $\cdot$ ) function simply translates the physical instruction at  $\mu(n)$  to its unresolved transient form. The rule inserts the new transient instruction at the first empty index in *buf*, and sets the current program point to the next instruction  $n'$. Note that *transient*( $\cdot$ ) annotates the transient load instruction with its program point.
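The following Python sketch illustrates SIMPLE-FETCH and the *transient* translation for loads and stores only; op and fence are deliberately omitted to keep it short. The dataclasses (`Load`, `Store`, `TLoad`, `TStore`) and the `next` field that stores $n'$ are hypothetical encodings, not the author's implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Load:
    """Physical load (r = load(rs, n'))."""
    reg: str
    operands: Tuple
    next: int


@dataclass(frozen=True)
class Store:
    """Physical store(r, rs, n')."""
    src: object
    operands: Tuple
    next: int


@dataclass(frozen=True)
class TLoad:
    """Unresolved transient load (r = load(rs))^n, tagged with its program point."""
    reg: str
    operands: Tuple
    pc: int


@dataclass(frozen=True)
class TStore:
    """Unresolved transient store(r, rs)."""
    src: object
    operands: Tuple


def transient(ins, n: int):
    """Translate the physical instruction at program point n to its unresolved form."""
    if isinstance(ins, Load):
        return TLoad(ins.reg, ins.operands, n)  # loads remember their program point
    if isinstance(ins, Store):
        return TStore(ins.src, ins.operands)
    raise NotImplementedError("op and fence are omitted from this sketch")


def simple_fetch(config):
    """Apply SIMPLE-FETCH to a configuration (rho, mu, n, buf)."""
    rho, mu, n, buf = config
    ins = mu[n]
    i = max(buf, default=0) + 1  # first empty index: MAX(buf) + 1
    new_buf = dict(buf)
    new_buf[i] = transient(ins, n)
    return (rho, mu, ins.next, new_buf)  # n' = next(mu(n))
```

Fetching twice from a memory `{5: Load("rb", (40, "ra"), 6), 6: Store("rb", (48,), 7)}` starting at program point 5 fills buffer indices 1 and 2 and leaves the program point at 7.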

*Load execution.* Next, we cover the rules for the load execute stage.

LOAD-EXECUTE-NODEP

$$
\frac{\begin{array}{c} buf(i) = (r = \mathsf{load}(\vec{r}))^n \qquad \forall j < i : buf(j) \neq \mathsf{fence} \qquad (buf +_i \rho)(\vec{r}) = \vec{v}_\ell \qquad \llbracket addr(\vec{v}_\ell) \rrbracket = a \\ \ell_a = \bigsqcup \vec{\ell} \qquad \forall j < i : buf(j) \neq \mathsf{store}(\_, a) \qquad \mu(a) = v_\ell \qquad buf' = buf[i \mapsto (r = v_\ell\{\bot, a\})^n] \end{array}}{(\rho, \mu, n, buf) \xrightarrow[\text{execute } i]{\text{read } a_{\ell_a}} (\rho, \mu, n, buf')}
$$

LOAD-EXECUTE-FORWARD

$$
\frac{\begin{array}{c} buf(i) = (r = \mathsf{load}(\vec{r}))^n \qquad \forall j < i : buf(j) \neq \mathsf{fence} \qquad (buf +_i \rho)(\vec{r}) = \vec{v}_\ell \qquad \llbracket addr(\vec{v}_\ell) \rrbracket = a \qquad \ell_a = \bigsqcup \vec{\ell} \\ max(j) < i : buf(j) = \mathsf{store}(\_, a) \land buf(j) = \mathsf{store}(v_\ell, a) \qquad buf' = buf[i \mapsto (r = v_\ell\{j, a\})^n] \end{array}}{(\rho, \mu, n, buf) \xrightarrow[\text{execute } i]{\text{fwd } a_{\ell_a}} (\rho, \mu, n, buf')}
$$

Given an execute directive for buffer index *i*, under the condition that *i* points to an unresolved load, rule LOAD-EXECUTE-NODEP applies if there are no prior store instructions in *buf* that have a resolved, matching address. The rule first resolves the operand list  $\vec{r}$  into a list of values  $\vec{v}_{\ell}$, and then uses  $\vec{v}_{\ell}$  to calculate the target address *a*. It then retrieves the current value  $v_{\ell}$  at address *a* from memory, and finally adds to the buffer a resolved value instruction assigning  $v_{\ell}$  to the target register *r*. We annotate the value instruction with the address *a* and ⊥, signifying that the value comes from memory. Finally, the rule produces the observation read  $a_{\ell_a}$, which renders the memory read at address *a* with label  $\ell_a$  visible to an attacker.

Rule LOAD-EXECUTE-FORWARD applies if the most recent store instruction in *buf* with a resolved, matching address has a resolved data value. Instead of accessing memory, the rule forwards the value from the store instruction, annotating the new value instruction with the calculated address *a* and the index *j* of the originating store instruction. The rule produces a fwd observation with the labeled address  $a_{\ell_a}$ . This observation captures that the attacker can determine (e.g., by observing the *absence* of memory access using a cache timing attack) that a forwarded value from address *a* was found in the buffer instead of loaded from memory.
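The two load rules can be approximated by the Python sketch below. It is an assumption-laden model rather than the formal semantics:

- operands are passed in already resolved;
- `addr` is modelled as addition, as in the example of Figure 2.4;
- labels form a two-point `pub`/`sec` lattice;
- all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

Labeled = Tuple[int, str]          # a value with its label, e.g. (43, "pub")


def join(labels: List[str]) -> str:
    """Two-point lattice (assumed): secret if any input label is secret."""
    return "sec" if "sec" in labels else "pub"


@dataclass
class Store:
    value: Optional[Labeled]       # None while the data is unresolved
    addr: Optional[int]            # None while the address is unresolved


@dataclass
class Load:
    operands: List[Labeled]        # operand values, already resolved via (buf +_i rho)
    pp: int                        # program point n of the physical load


@dataclass
class Value:
    value: Labeled
    src: Optional[int]             # forwarding store index, or None for bottom (memory)
    addr: int
    pp: int


Entry = Union[Store, Load, Value, str]   # the string "fence" marks a fence


def execute_load(buf: Dict[int, Entry], i: int, mem: Dict[int, Labeled]):
    """Sketch of LOAD-EXECUTE-NODEP / LOAD-EXECUTE-FORWARD for buf[i].

    Returns (new_buf, observation), or None when neither rule applies."""
    ld = buf[i]
    assert isinstance(ld, Load)
    if any(buf[j] == "fence" for j in buf if j < i):
        return None
    a = sum(v for v, _ in ld.operands)             # [[addr]] modelled as addition
    la = join([lab for _, lab in ld.operands])     # label of the address
    matching = [j for j in buf
                if j < i and isinstance(buf[j], Store) and buf[j].addr == a]
    new_buf = dict(buf)
    if not matching:                               # LOAD-EXECUTE-NODEP
        new_buf[i] = Value(mem[a], None, a, ld.pp)
        return new_buf, f"read {a}_{la}"
    j = max(matching)                              # most recent matching store
    if buf[j].value is None:                       # its data is not ready: stuck
        return None
    new_buf[i] = Value(buf[j].value, j, a, ld.pp)  # LOAD-EXECUTE-FORWARD
    return new_buf, f"fwd {a}_{la}"
```

Run on the starting buffer of Figure 2.4, `execute_load(buf, 4, mem)` forwards 12 from store $\overline{2}$ and yields the observation `fwd 43_pub`. If the most recent matching store still lacks its data, the sketch returns `None`, mirroring the fact that neither rule applies.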

Importantly, neither rule has to wait for prior stores to be resolved; both can proceed speculatively. This can lead to memory hazards when an earlier store to the load's address has not yet been resolved; we show how to deal with hazards in the rules for the store instruction.

*Store execution.* We show the rules for stores below.

STORE-EXECUTE-VALUE

$$
\frac{
\begin{array}{c}
buf(i) = \text{store}(r, \vec{r}) \qquad \forall j < i : buf(j) \neq \text{fence} \\
(buf +_i \rho)(r) = v_\ell \qquad buf' = buf[i \mapsto \text{store}(v_\ell, \vec{r})]
\end{array}
}{
(\rho, \mu, n, buf) \xrightarrow[\text{execute } i : \text{value}]{} (\rho, \mu, n, buf')
}
$$

STORE-EXECUTE-ADDR-OK

$$
\frac{
\begin{array}{c}
buf(i) = \text{store}(r, \vec{r}) \qquad \forall j < i : buf(j) \neq \text{fence} \\
(buf +_i \rho)(\vec{r}) = \vec{v}_\ell \qquad \llbracket \text{addr}(\vec{v}_\ell) \rrbracket = a \qquad \ell_a = \sqcup \vec{\ell} \\
\forall k > i : buf(k) = (r_k = \dots\{j_k, a_k\})^{n_k} \Rightarrow (a_k = a \Rightarrow j_k \geq i) \land {\color{gray}(j_k = i \Rightarrow a_k = a)} \\
buf' = buf[i \mapsto \text{store}(r, a_{\ell_a})]
\end{array}
}{
(\rho, \mu, n, buf) \xrightarrow[\text{execute } i : \text{addr}]{\text{fwd } a_{\ell_a}} (\rho, \mu, n, buf')
}
$$

STORE-EXECUTE-ADDR-HAZARD

$$
\frac{
\begin{array}{c}
buf(i) = \text{store}(r, \vec{r}) \qquad \forall j < i : buf(j) \neq \text{fence} \\
(buf +_i \rho)(\vec{r}) = \vec{v}_\ell \qquad \llbracket \text{addr}(\vec{v}_\ell) \rrbracket = a \qquad \ell_a = \sqcup \vec{\ell} \\
k = \min\{ k > i : buf(k) = (r_k = \dots\{j_k, a_k\})^{n_k} \land \big((a_k = a \land j_k < i) \lor {\color{gray}(j_k = i \land a_k \neq a)}\big) \} \\
buf' = buf[j : j < k][i \mapsto \text{store}(r, a_{\ell_a})]
\end{array}
}{
(\rho, \mu, n, buf) \xrightarrow[\text{execute } i : \text{addr}]{\text{rollback, fwd } a_{\ell_a}} (\rho, \mu, n_k, buf')
}
$$

The execution of store is split into two steps: value resolution, represented by the directive execute *i* : value, and address resolution, represented by the directive execute *i* : addr; a schedule may have either step first. Either step may be skipped if data or address are already in immediate form.

Rule STORE-EXECUTE-ADDR-OK applies if no misprediction has been detected, i.e., if no load instruction obtained its data from an outdated store or from memory. We check this by requiring that every value instruction *after* the current index (indices $k > i$) whose address *a* matches the current store uses a value forwarded from a store *at least as recent* as this one ($a_k = a \Rightarrow j_k \ge i$). We define ⊥ < *n* for any index *n*; that is, if a future load matches the address of the current store but loaded its value from memory, we consider this a hazard.

If there is indeed a hazard, i.e., if there was a resolved load with an outdated value, the rule STORE-EXECUTE-ADDR-HAZARD picks the *earliest* such instruction (index *k*) and restarts execution by resetting the instruction pointer to the program point  $n_k$  of this instruction. It then discards all transient instructions at indices at least *k* from the reorder buffer. As in the case of misspeculation, the rule issues a rollback observation.
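A hedged Python sketch makes the hazard check concrete. It covers both address-resolution rules for stores and rests on these assumptions:

- resolved loads carry their `(src, addr)` annotation and their program point (a hypothetical encoding);
- `None` stands for ⊥, so that ⊥ < *n* holds for every index;
- the second disjunct only matters once aliasing is predicted (Section 2.2.5).

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Union


@dataclass
class Store:
    value: Optional[int]       # None while the data is unresolved
    addr: Optional[int]        # None while the address is unresolved


@dataclass
class Value:                   # resolved load (r = v{src, addr})^pp
    value: int
    src: Optional[int]         # forwarding store index; None encodes bottom (memory)
    addr: int
    pp: int                    # program point n_k of the physical load


def older_than(src: Optional[int], i: int) -> bool:
    """src < i, where bottom (None) is smaller than every buffer index."""
    return src is None or src < i


def resolve_store_addr(buf: Dict[int, Union[Store, Value]], i: int, a: int,
                       label: str = "pub"):
    """Sketch of STORE-EXECUTE-ADDR-OK / -HAZARD for the store at buf[i].

    Returns (new_buf, restart_pp, observation); restart_pp is None when no
    hazard was found."""
    hazards = [
        k for k in sorted(buf)
        if k > i and isinstance(buf[k], Value) and (
            (buf[k].addr == a and older_than(buf[k].src, i))   # stale value
            or (buf[k].src == i and buf[k].addr != a))          # wrong forwarding
    ]
    resolved = replace(buf[i], addr=a)
    if not hazards:                                   # STORE-EXECUTE-ADDR-OK
        new_buf = dict(buf)
        new_buf[i] = resolved
        return new_buf, None, f"fwd {a}_{label}"
    k = hazards[0]                                    # earliest hazardous load
    new_buf = {j: e for j, e in buf.items() if j < k} # buf[j : j < k]
    new_buf[i] = resolved
    return new_buf, buf[k].pp, f"rollback, fwd {a}_{label}"
```

Replaying the second directive of Figure 2.4 resolves store $\overline{3}$ to address 43. The sketch finds the load at $\overline{4}$, which forwarded from the older store $\overline{2}$, discards it, and restarts at its program point.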

*Retire.* Resolved loads are retired using the following rule.

VALUE-RETIRE

$$
\frac{
\text{MIN}(buf) = i \qquad buf(i) = (r = v_\ell) \qquad \rho' = \rho[r \mapsto v_\ell] \qquad buf' = buf \setminus buf(i)
}{
(\rho, \mu, n, buf) \xrightarrow[\text{retire}]{} (\rho', \mu, n, buf')
}
$$

This is the same retire rule used for simple value instructions (e.g., resolved op instructions). The rule updates the register map  $\rho$  with the new value, and removes the instruction from the reorder buffer.

Stores are retired using the rule below.

STORE-RETIRE

$$
\frac{
\text{MIN}(buf) = i \qquad buf(i) = \text{store}(v_\ell, a_{\ell_a}) \qquad \mu' = \mu[a \mapsto v_\ell] \qquad buf' = buf \setminus buf(i)
}{
(\rho, \mu, n, buf) \xrightarrow[\text{retire}]{\text{write } a_{\ell_a}} (\rho, \mu', n, buf')
}
$$

A fully resolved store instruction retires similarly to a value instruction. However, instead of updating the register map  $\rho$ , rule STORE-RETIRE updates the memory  $\mu$ . Since an attacker can observe memory writes, the rule produces the observation write  $a_{\ell_a}$  with the labeled address of the store.
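A minimal sketch of the two retire rules follows. The tuple encodings and the name `retire` are hypothetical, and labels appear only in the store's observation.

```python
from typing import Dict

# Hypothetical encodings for fully resolved buffer entries:
#   ("value", register, v)      a resolved value instruction r = v
#   ("store", v, a, label)      a resolved store of v to address a (label of a)


def retire(regs: Dict[str, int], mem: Dict[int, int], buf: Dict[int, tuple]):
    """Sketch of VALUE-RETIRE / STORE-RETIRE on the oldest entry MIN(buf).

    Returns (regs', mem', buf', observation-or-None)."""
    i = min(buf)
    entry = buf[i]
    rest = {j: e for j, e in buf.items() if j != i}          # buf \ buf(i)
    if entry[0] == "value":
        _, reg, v = entry
        return {**regs, reg: v}, mem, rest, None             # rho' = rho[r -> v]
    if entry[0] == "store" and None not in entry[1:3]:
        _, v, a, label = entry
        return regs, {**mem, a: v}, rest, f"write {a}_{label}"  # mu' = mu[a -> v]
    raise ValueError(f"entry {i} is not fully resolved and cannot retire")
```

Only the store rule emits an observation, matching the write $a_{\ell_a}$ leakage above. A value instruction changes only the register map.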

*Example.* [Figure 2.4](#page-79-0) gives an example of store-to-load forwarding. In the starting configuration, the store at index $\overline{2}$ is fully resolved, while the store at index $\overline{3}$ has an unresolved address. The first directive executes the load at $\overline{4}$. This load accesses address 43, which matches the store at index $\overline{2}$. Since this is the most recent store with a resolved, matching address and it has a resolved value, the load gets the value 12 from this store. The following directive resolves the address of the store at index $\overline{3}$ to $3 + 40 = 43$, so this store also matches address 43. As this store is more recent than store $\overline{2}$, this directive triggers a hazard for the load at $\overline{4}$, leading to the rollback of the load from the reorder buffer.

<span id="page-79-0"></span>

| Registers | $\rho(r_a) = 40_{\text{pub}}$ |
|---|---|
| Directives | $D$ = execute $\overline{4}$; execute $\overline{3}$ : addr |
| Leakage for $D$ | fwd $43_{\text{pub}}$; rollback, fwd $43_{\text{pub}}$ |

| $i$ | starting $buf$ | $buf$ after execute $\overline{4}$ | $buf$ after $D$ |
|---|---|---|---|
| $\overline{2}$ | store$(12, 43_{\text{pub}})$ | store$(12, 43_{\text{pub}})$ | store$(12, 43_{\text{pub}})$ |
| $\overline{3}$ | store$(20, [3, r_a])$ | store$(20, [3, r_a])$ | store$(20, 43_{\text{pub}})$ |
| $\overline{4}$ | $(r_c = \text{load}([43]))$ | $(r_c = 12\{\overline{2}, 43\})$ | |

Figure 2.4: Store hazard caused by late execution of store addresses. The store address for $\overline{3}$ is resolved too late, causing the later load instruction to forward from the wrong store. When $\overline{3}$'s address is resolved, the execution must be rolled back. In this example, $\llbracket addr(\cdot) \rrbracket$ adds its arguments.

<span id="page-79-1"></span>

| Register $r$ | $\rho(r)$ |
|---|---|
| $r_a$ | $5_{\text{pub}}$ |
| $r_b$ | $x_{\text{sec}}$ |

| Address $a$ | $\mu(a)$ |
|---|---|
| 40..43 | $\textit{secretKey}_{\text{sec}}$ |
| 44..47 | $\textit{pubArrA}_{\text{pub}}$ |
| 48..4B | $\textit{pubArrB}_{\text{pub}}$ |

| $i$ | $buf(i)$ |
|---|---|
| $\overline{1}$ | br$(>, (4, r_a), 2, (2, 4))$ |
| $\overline{2}$ | store$(r_b, [40, r_a])$ |
| $\overline{7}$ | $(r_c = \text{load}([45]))$ |
| $\overline{8}$ | $(r_c = \text{load}([48, r_c]))$ |

| Directive | Effect on $buf$ | Leakage |
|---|---|---|
| execute $\overline{2}$ : addr | $\overline{2} \mapsto$ store$(r_b, 45_{\text{pub}})$ | fwd $45_{\text{pub}}$ |
| execute $\overline{2}$ : value | $\overline{2} \mapsto$ store$(x_{\text{sec}}, 45_{\text{pub}})$ | |
| execute $\overline{7}$ | $\overline{7} \mapsto (r_c = x_{\text{sec}}\{\overline{2}, 45\})$ | fwd $45_{\text{pub}}$ |
| execute $\overline{8}$ | $\overline{8} \mapsto (r_c = X\{\bot, a\})$ | read $a_{\text{sec}}$ |

where $a = x_{\text{sec}} + 48$.

Figure 2.5: Example demonstrating a store-to-load Spectre v1.1 attack. A speculatively stored value is forwarded and then leaked using a subsequent load instruction.

<span id="page-80-0"></span>

| Register $r$ | $\rho(r)$ |
|---|---|
| $r_a$ | $40_{\text{pub}}$ |

| Address $a$ | $\mu(a)$ |
|---|---|
| 40..43 | $\textit{secretKey}_{\text{sec}}$ |
| 44..47 | $\textit{pubArrA}_{\text{pub}}$ |

| $i$ | $buf(i)$ |
|---|---|
| $\overline{2}$ | store$(0, [3, r_a])$ |
| $\overline{3}$ | $(r_c = \text{load}([43]))$ |
| $\overline{4}$ | $(r_c = \text{load}([44, r_c]))$ |

| Directive | Effect on $buf$ | Leakage |
|---|---|---|
| execute $\overline{3}$ | $\overline{3} \mapsto (r_c = \textit{secretKey}[3]_{\text{sec}}\{\bot, 43\})$ | read $43_{\text{pub}}$ |
| execute $\overline{4}$ | $\overline{4} \mapsto (r_c = X\{\bot, a\})$ | read $a_{\text{sec}}$ |
| execute $\overline{2}$ : addr | $\{\overline{3}, \overline{4}\} \notin buf$; $\overline{2} \mapsto$ store$(0, 43_{\text{pub}})$ | rollback, fwd $43_{\text{pub}}$ |

where $a = \textit{secretKey}[3]_{\text{sec}} + 44$.

Figure 2.6: Example demonstrating a v4 Spectre attack. The store is executed too late, causing later load instructions to use outdated values.

*Capturing Spectre.* We now have enough machinery to capture several variants of Spectre attacks.

We discussed how our semantics model Spectre v1 in [Section 2.1](#page-63-0) [\(Figure 2.1\)](#page-64-0). [Figure 2.5](#page-79-1) shows a simple disclosure gadget using forwarding from an out-of-bounds write. In this example, a secret value $x_{\text{sec}}$ is supposed to be written to *secretKey* at an index $r_a$ as long as $r_a$ is within bounds. However, due to branch misprediction, the store instruction is executed despite $r_a$ being too large. The load instruction at index $\overline{7}$, normally benign, now aliases with the store at index $\overline{2}$, and receives the secret $x_{\text{sec}}$ instead of a public value from *pubArrA*. This value is then used as the address of another load instruction, causing $x_{\text{sec}}$ to leak.

[Figure 2.6](#page-80-0) shows a Spectre v4 vulnerability caused when a store *fails* to forward to a future load. In this example, the load at index  $\overline{3}$  executes before the store at  $\overline{2}$  calculates its address. As a result, this execution loads the outdated secret value at address 43 and leaks it, instead of using the public zeroed-out value that would be written.

## 2.2.5 Aliasing prediction

We extend the memory semantics from the previous section to model aliasing prediction by introducing a new transient instruction  $(r = \text{load}(\vec{r}, (v_{\ell}, j)))^n$ . This instruction represents a *partially resolved* load with speculatively forwarded data. As before, *r* is the target register,  $\vec{r}$ is the list of arguments for address calculation, and *n* is the program point of the physical load instruction. The new parameters are  $v_\ell$ , the forwarded data, and *j*, the index of the originating store.

*Forwarding via prediction.*

LOAD-EXECUTE-FORWARDED-GUESSED

$$
\frac{
\begin{array}{c}
buf(i) = (r = \text{load}(\vec{r}))^n \qquad j < i \qquad \forall k < i : buf(k) \neq \text{fence} \\
buf(j) = \text{store}(v_\ell, \vec{r}_j) \qquad buf' = buf[i \mapsto (r = \text{load}(\vec{r}, (v_\ell, j)))^n]
\end{array}
}{
(\rho, \mu, n, buf) \xrightarrow[\text{execute } i : \text{fwd } j]{} (\rho, \mu, n, buf')
}
$$
$$

Rule LOAD-EXECUTE-FORWARDED-GUESSED implements forwarding in the presence of unresolved target addresses. Instead of forwarding the value from a store with a matching address, as in [Section 2.2.4](#page-74-0), the attacker can now freely choose to forward from *any* earlier store with a resolved value, even if its target address is not known yet. Given a choice of which store *j* to forward from (supplied via directive), the rule updates the reorder buffer with the new partially resolved load and records both the forwarded value $v_\ell$ and the buffer index *j* of the store instruction.

*Register resolve function.* We extend the register resolve function $(buf +_i \rho)$ to allow using values from partially resolved loads. In particular, whenever the register resolve function computes the latest resolved assignment to some register *r*, it now considers not only fully resolved value instructions but also our new partially resolved loads: whenever the latest assignment in the buffer is a partially resolved load, the register resolve function returns its forwarded value $v_\ell$.
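The Python sketch below approximates both extensions: forwarding from an attacker-chosen store, and a register resolve function that also reads partially resolved loads. The dictionary encoding and all names are assumptions made for illustration. Invalid guesses raise `ValueError`, and `None` marks a register lookup where the formal function is undefined.

```python
from typing import Dict, Optional

# Hypothetical dict encoding of buffer entries:
#   {"kind": "store", "value": v or None, "addr": a or None}
#   {"kind": "load", "target": r, "fwd": (v, j) or None}   partially resolved if fwd set
#   {"kind": "value", "target": r, "value": v}
#   {"kind": "fence"}


def guess_forward(buf: Dict[int, dict], i: int, j: int) -> Dict[int, dict]:
    """Sketch of LOAD-EXECUTE-FORWARDED-GUESSED with attacker-chosen store j."""
    load, store = buf[i], buf.get(j)
    if load["kind"] != "load" or load.get("fwd") is not None:
        raise ValueError("buf[i] must be an unresolved load")
    if any(buf[k]["kind"] == "fence" for k in buf if k < i):
        raise ValueError("a prior fence blocks execution")
    if not (j < i and store is not None and store["kind"] == "store"
            and store["value"] is not None):
        raise ValueError("j must be an earlier store with resolved data")
    # The store's address may still be None: the guess is not checked here.
    return {**buf, i: {**load, "fwd": (store["value"], j)}}


def resolve_register(buf: Dict[int, dict], i: int, regs: Dict[str, int],
                     r: str) -> Optional[int]:
    """Sketch of (buf +_i rho)(r): the latest assignment to r before index i."""
    for k in sorted((k for k in buf if k < i), reverse=True):
        e = buf[k]
        if e.get("target") != r:
            continue
        if e["kind"] == "value":
            return e["value"]
        if e["kind"] == "load" and e.get("fwd") is not None:
            return e["fwd"][0]              # partially resolved load: forwarded value
        return None                         # latest assignment unresolved: undefined
    return regs[r]                          # no assignment in buf: fall back to rho
```

Because the register lookup accepts partially resolved loads, later instructions can already compute with a guessed value before the load's own address is known.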

We now discuss the execution rules, where partially resolved loads may fully resolve against either the originating store or against memory.

*Resolving when originating store is in the reorder buffer.*

LOAD-EXECUTE-ADDR-OK

$$
\frac{
\begin{array}{c}
buf(i) = (r = \text{load}(\vec{r}, (v_\ell, j)))^n \qquad (buf +_i \rho)(\vec{r}) = \vec{v}_\ell \qquad \llbracket \text{addr}(\vec{v}_\ell) \rrbracket = a \qquad \ell_a = \sqcup \vec{\ell} \\
buf(j) = \text{store}(v_\ell, \vec{r}_j) \land (\vec{r}_j = a' \Rightarrow a' = a) \\
\forall k : (j < k < i) : buf(k) \neq \text{store}(\_, a) \qquad buf' = buf[i \mapsto (r = v_\ell\{j, a\})^n]
\end{array}
}{
(\rho, \mu, n, buf) \xrightarrow{\text{fwd } a_{\ell_a}} (\rho, \mu, n, buf')
}
$$

LOAD-EXECUTE-ADDR-HAZARD

$$
\frac{
\begin{array}{c}
buf(i) = (r = \text{load}(\vec{r}, (v_\ell, j)))^{n'} \qquad (buf +_i \rho)(\vec{r}) = \vec{v}_\ell \qquad \llbracket \text{addr}(\vec{v}_\ell) \rrbracket = a \qquad \ell_a = \sqcup \vec{\ell} \\
(buf(j) = \text{store}(v_\ell, a') \land a' \neq a) \lor (\exists k : j < k < i \land buf(k) = \text{store}(\_, a)) \\
buf' = buf[m : m < i]
\end{array}
}{
(\rho, \mu, n, buf) \xrightarrow{\text{rollback, fwd } a_{\ell_a}} (\rho, \mu, n', buf')
}
$$

To resolve $(r = \text{load}(\vec{r}, (v_{\ell}, j)))^n$ when its originating store is still in *buf*, we calculate the load's actual target address *a* and compare it against the target address of the originating store at *buf*(*j*). Two conditions must hold. First, no store between the originating store and the load (indices $j < k < i$) may have resolved to address *a*. Second, either (1) the store's address is resolved and is indeed *a*, or (2) the store's address is still unresolved. If both hold, we update the reorder buffer with an annotated value instruction (rule LOAD-EXECUTE-ADDR-OK).

If, however, either the originating store resolved to a *different* address (mispredicted aliasing) or a later store resolved to the same address (hazard), we roll back our execution to just before the load (rule LOAD-EXECUTE-ADDR-HAZARD).

We allow the load to execute even if the originating store has not yet resolved its address. When the store does finally resolve its address, it must check that the addresses match and that the forwarding was correct. The gray formulas in STORE-EXECUTE-ADDR-OK and STORE-EXECUTE-ADDR-HAZARD ([Section 2.2.4](#page-74-0)) perform these checks. For forwarding to be correct, all values forwarded from a store at *buf*(*i*) must have a matching annotated address ($\forall k > i : j_k = i \Rightarrow a_k = a$). Otherwise, if any value annotation has a mismatched address, then the instruction is rolled back ($j_k = i \land a_k \neq a$).

*Resolving when originating store is not in the buffer.* We must also consider the case where we have delayed resolving the load address to the point where the originating store has already retired, and is no longer available in *buf*. If this is the case, and no other prior store instructions have a matching address, then we must check the forwarded data against memory.

LOAD-EXECUTE-ADDR-MEM-MATCH

$$
\frac{
\begin{array}{c}
buf(i) = (r = \text{load}(\vec{r}, (v_\ell, j)))^n \qquad j \notin buf \qquad (buf +_i \rho)(\vec{r}) = \vec{v}_\ell \qquad \ell_a = \sqcup \vec{\ell} \qquad \llbracket \text{addr}(\vec{v}_\ell) \rrbracket = a \\
\forall k < i : buf(k) \neq \text{store}(\_, a) \qquad \mu(a) = v_\ell \qquad buf' = buf[i \mapsto (r = v_\ell\{\bot, a\})^n]
\end{array}
}{
(\rho, \mu, n, buf) \xrightarrow{\text{read } a_{\ell_a}} (\rho, \mu, n, buf')
}
$$

LOAD-EXECUTE-ADDR-MEM-HAZARD

$$
\frac{
\begin{array}{c}
buf(i) = (r = \text{load}(\vec{r}, (v_\ell, j)))^{n'} \qquad j \notin buf \qquad (buf +_i \rho)(\vec{r}) = \vec{v}_\ell \qquad \ell_a = \sqcup \vec{\ell} \qquad \llbracket \text{addr}(\vec{v}_\ell) \rrbracket = a \\
\forall k < i : buf(k) \neq \text{store}(\_, a) \qquad \mu(a) = v'_{\ell'} \qquad v'_{\ell'} \neq v_\ell \qquad buf' = buf[m : m < i]
\end{array}
}{
(\rho, \mu, n, buf) \xrightarrow{\text{rollback, read } a_{\ell_a}} (\rho, \mu, n', buf')
}
$$

If the originating store has retired, and no intervening stores match the same address, we must load the value from memory to ensure we were originally forwarded the correct value. If the value loaded from memory matches the value we were forwarded, we update the reorder buffer with a resolved load annotated as if it had been loaded from memory (rule LOAD-EXECUTE-ADDR-MEM-MATCH).

If a store *different* from the originating store overwrote the originally forwarded value, the value loaded from memory may not match the value we were originally forwarded. In this case we roll back execution to just before the load (rule LOAD-EXECUTE-ADDR-MEM-HAZARD).
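The four resolution rules for partially resolved loads fit in one function. This is a sketch under several assumptions:

- entries are hypothetical dictionaries, and labels are omitted from the observations;
- the load's address *a* is passed in already computed;
- the function returns `None` in the one situation the text excludes, namely an older buffered store to *a* when the originating store has retired.

```python
from typing import Dict

# Hypothetical dict encoding of buffer entries:
#   {"kind": "store", "value": v, "addr": a or None}
#   {"kind": "load", "fwd": (v, j), "pp": n}                 partially resolved load
#   {"kind": "value", "value": v, "src": j or None, "addr": a, "pp": n}


def store_to(buf: Dict[int, dict], k: int, a: int) -> bool:
    """True if buf[k] is a store whose address has resolved to a."""
    return buf[k]["kind"] == "store" and buf[k]["addr"] == a


def resolve_guessed_load(buf: Dict[int, dict], i: int, a: int,
                         mem: Dict[int, int]):
    """Sketch of the LOAD-EXECUTE-ADDR-* rules once the load address a is known.

    Returns (new_buf, restart_pp, observation), with restart_pp None unless the
    load is rolled back, or None when none of the four rules applies."""
    load = buf[i]
    v, j = load["fwd"]
    keep_prefix = {k: e for k, e in buf.items() if k < i}   # buf[m : m < i]

    def resolved(src):
        entry = {"kind": "value", "value": v, "src": src, "addr": a, "pp": load["pp"]}
        return {**buf, i: entry}

    if j in buf:                                              # store still buffered
        st_addr = buf[j]["addr"]
        mismatch = st_addr is not None and st_addr != a       # mispredicted aliasing
        shadowed = any(store_to(buf, k, a) for k in buf if j < k < i)
        if mismatch or shadowed:                              # ADDR-HAZARD
            return keep_prefix, load["pp"], f"rollback, fwd {a}"
        return resolved(j), None, f"fwd {a}"                  # ADDR-OK
    if any(store_to(buf, k, a) for k in buf if k < i):
        return None                   # older buffered store to a: not covered here
    if mem[a] == v:                                           # ADDR-MEM-MATCH
        return resolved(None), None, f"read {a}"
    return keep_prefix, load["pp"], f"rollback, read {a}"    # ADDR-MEM-HAZARD
```

Both hazard branches return the load's own program point, which corresponds to rolling back "to just before the load".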

We demonstrate these semantics in the attack shown in [Figure 2.2.](#page-67-0) An earlier draft of this work [\[37\]](#page-179-1) incorrectly claimed to have a proof-of-concept exploit for this attack on real hardware.

## 2.2.6 Speculation barriers

We extend our semantics with a *speculation barrier* instruction, fence *n*, that prevents further speculative execution until all prior instructions have been retired.

FENCE-RETIRE

$$
\frac{
\text{MIN}(buf) = i \qquad buf(i) = \text{fence} \qquad buf' = buf \setminus buf(i)
}{
(\rho, \mu, n, buf) \xrightarrow[\text{retire}]{} (\rho, \mu, n, buf')
}
$$

The fence instruction uses SIMPLE-FETCH as its fetch rule, and its rule for retire only removes the instruction from the buffer. It does not have an execute rule. However, fence instructions affect the execution of all instructions in the reorder buffer that come *after* them. In prior sections, execute rules have the highlighted condition  $\forall j < i : buf(j) \neq \text{fence}$ . This condition ensures that as long as a fence instruction remains in *buf*, any instructions fetched after the fence cannot be executed.
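The fence premise and FENCE-RETIRE amount to two small checks, sketched below. Buffer entries are hypothetical strings, and the example buffer mirrors the one in Figure 2.7.

```python
from typing import Dict


def fence_blocks(buf: Dict[int, str], i: int) -> bool:
    """Negation of the premise forall j < i : buf(j) != fence."""
    return any(buf[j] == "fence" for j in buf if j < i)


def retire_fence(buf: Dict[int, str]) -> Dict[int, str]:
    """Sketch of FENCE-RETIRE: drop the fence once it is the oldest entry."""
    i = min(buf)
    if buf[i] != "fence":
        raise ValueError("oldest entry is not a fence")
    return {j: e for j, e in buf.items() if j != i}
```

The br at index 1 may still execute, while both loads behind the fence are blocked until the fence retires.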

We use fence instructions to restrict out-of-order execution in our semantics. Notably, we can use them to prevent attacks of the forms shown in [Figures 2.1,](#page-64-0) [2.5](#page-79-1) and [2.6.](#page-80-0)

*Example.* The example in [Figure 2.7](#page-85-0) shows how placing a fence instruction just after the br instruction prevents the Spectre v1 attack from [Figure 2.1.](#page-64-0) The fence in this example prevents the load instructions at $\overline{3}$ and $\overline{4}$ from executing and forces the br to be resolved first. Evaluating the br exposes the misprediction and causes the two loads (as well as the fence) to be rolled back.

| $i$ | $buf(i)$ before executing $\overline{1}$ | $buf(i)$ after |
|---|---|---|
| $\overline{1}$ | br$(\geq, (4, r_a), 2, (2, 5))$ | jump 5 |
| $\overline{2}$ | fence | |
| $\overline{3}$ | $(r_b = \text{load}([40, r_a]))$ | |
| $\overline{4}$ | $(r_c = \text{load}([44, r_b]))$ | |

<span id="page-85-0"></span>Figure 2.7: Example demonstrating fencing mitigation against Spectre v1 attacks. The fence instruction prevents the load instructions from executing before the br.

## 2.2.7 Indirect jumps

We introduce a new form of control flow to our semantics, *indirect jumps*, which allow the program to dynamically jump to arbitrary locations. The physical instruction for an indirect jump is jmpi$(\vec{r})$, where $\vec{r}$ is a list of operands used to calculate the jump target. The semantics for jmpi are given below:

JMPI-FETCH

$$
\frac{
\mu(n) = \text{jmpi}(\vec{r}) \qquad i = \text{MAX}(buf) + 1 \qquad buf' = buf[i \mapsto \text{jmpi}(\vec{r}, n')]
}{
(\rho, \mu, n, buf) \xrightarrow[\text{fetch} : n']{} (\rho, \mu, n', buf')
}
$$

JMPI-EXECUTE-CORRECT

$$
\frac{
\begin{array}{c}
buf(i) = \text{jmpi}(\vec{r}, n_0) \qquad \forall j < i : buf(j) \neq \text{fence} \\
(buf +_i \rho)(\vec{r}) = \vec{v}_\ell \qquad \ell = \sqcup \vec{\ell} \qquad \llbracket \text{addr}(\vec{v}_\ell) \rrbracket = n_0 \qquad buf' = buf[i \mapsto \text{jump } n_0]
\end{array}
}{
(\rho, \mu, n, buf) \xrightarrow{\text{jump } {n_0}_\ell} (\rho, \mu, n, buf')
}
$$

JMPI-EXECUTE-INCORRECT

$$
\frac{
\begin{array}{c}
buf(i) = \text{jmpi}(\vec{r}, n_0) \qquad \forall j < i : buf(j) \neq \text{fence} \qquad (buf +_i \rho)(\vec{r}) = \vec{v}_\ell \\
\ell = \sqcup \vec{\ell} \qquad \llbracket \text{addr}(\vec{v}_\ell) \rrbracket = n' \neq n_0 \qquad buf' = buf[j : j < i][i \mapsto \text{jump } n']
\end{array}
}{
(\rho, \mu, n, buf) \xrightarrow{\text{rollback, jump } n'_\ell} (\rho, \mu, n', buf')
}
$$

When fetching a jmpi instruction, the schedule guesses the jump target $n'$. The rule records the operands and the guessed program point in a new buffer entry. In a real processor, the jump target guess is supplied by an indirect branch predictor; as branch predictors can be arbitrarily influenced by an adversary [\[54\]](#page-180-0), we model the guess as an attacker directive.

<span id="page-86-0"></span>

| Register $r$ | $\rho(r)$ |
|---|---|
| $r_a$ | $1_{\text{pub}}$ |
| $r_b$ | $8_{\text{pub}}$ |

| Address $a$ | $\mu(a)$ |
|---|---|
| 44..47 | $\textit{arrayB}_{\text{pub}}$ |
| 48..4B | $\textit{Key}_{\text{sec}}$ |

| Program point $n$ | $\mu(n)$ |
|---|---|
| 1 | $(r_c = \text{load}([48, r_a], 2))$ |
| 2 | fence 3 |
| 3 | jmpi$([12, r_b])$ |
| 16 | fence 17 |
| 17 | $(r_d = \text{load}([44, r_c], 18))$ |

| Directive | Effect on $buf$ | Leakage |
|---|---|---|
| fetch | $\overline{1} \mapsto (r_c = \text{load}([48, r_a]))$ | |
| fetch | $\overline{2} \mapsto$ fence | |
| execute $\overline{1}$ | $\overline{1} \mapsto (r_c = \textit{Key}[1]_{\text{sec}})$ | read $49_{\text{pub}}$ |
| fetch: 17 | $\overline{3} \mapsto$ jmpi$([12, r_b], 17)$ | |
| fetch | $\overline{4} \mapsto (r_d = \text{load}([44, r_c]))$ | |
| retire | $\overline{1} \notin buf$ | |
| retire | $\overline{2} \notin buf$ | |
| execute $\overline{4}$ | $\overline{4} \mapsto (r_d = X\{\bot, a\})$ | read $a_{\text{sec}}$ |

where $a = \textit{Key}[1]_{\text{sec}} + 44$.

Figure 2.8: Example demonstrating a Spectre v2 attack from a mistrained indirect branch predictor. Speculation barriers are not a useful defense against this style of attack.

In the execute stage, we calculate the actual jump target and compare it to the guess. If the actual target and the guess match, we update the entry in the reorder buffer to the resolved jump instruction jump  $n_0$ . If they do not match, we roll back execution by removing all buffer entries with indices greater than or equal to *i*, update the buffer with the resolved jump to the correct address, and set the next program point to that address.

Like conditional branch instructions, indirect jumps leak the calculated jump target.
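As a rough illustration of the execute step for an indirect jump, the C sketch below compares the computed target with the predictor's guess and rolls back on a mismatch. It is a minimal sketch under our own assumptions, not part of the formal semantics: the reorder buffer is a fixed-size array indexed from 0 (the semantics uses 1-based indices), and the names `rob_entry`, `machine`, and `execute_jmpi` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define ROB_SIZE 64

/* Hypothetical reorder-buffer entry kinds: an unresolved indirect jump
   (carrying the predictor's guess), a resolved jump, or anything else. */
typedef enum { ENTRY_EMPTY, ENTRY_JMPI, ENTRY_JUMP, ENTRY_OTHER } entry_kind;

typedef struct {
    entry_kind kind;
    int target; /* guessed target for ENTRY_JMPI, resolved target for ENTRY_JUMP */
} rob_entry;

typedef struct {
    rob_entry buf[ROB_SIZE];
    size_t max;  /* one past the highest occupied index */
    int next_pc; /* program point of the next instruction to fetch */
} machine;

/* Execute the unresolved indirect jump at index i, whose computed target
   is `actual`. Returns true if the guess was wrong and a rollback occurred.
   (In the semantics, `actual` is also what this step leaks.) */
bool execute_jmpi(machine *m, size_t i, int actual) {
    bool mispredicted = (m->buf[i].target != actual);
    if (mispredicted) {
        /* Discard entry i and every younger entry ... */
        for (size_t j = i; j < m->max; j++)
            m->buf[j].kind = ENTRY_EMPTY;
        m->max = i + 1;
        /* ... and redirect fetch to the correct target. */
        m->next_pc = actual;
    }
    /* In both cases entry i becomes the resolved jump. */
    m->buf[i].kind = ENTRY_JUMP;
    m->buf[i].target = actual;
    return mispredicted;
}
```

Note that the rollback only discards buffer entries: anything leaked by instructions fetched from the wrong target before the jump was executed has already been observed.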

*Examples.* The example in [Figure 2.8](#page-86-0) shows how a mistrained indirect branch predictor can lead to disclosure vulnerabilities. After loading a secret value into  $r_c$  at program point  $\overline{1}$ , the program makes an indirect jump. An adversary can mistrain the predictor to send execution to program point 17 instead of the intended branch target, where the secret value in  $r_c$  is immediately leaked. Because indirect jumps can have arbitrary branch target locations, fence instructions do not prevent these kinds of attacks; an adversary can simply retarget the indirect jump to the instruction after the fence, as seen in this example.

## 2.2.8 Function calls

Finally, we present how our semantics models function calls. The physical instructions are call $(n_f, n_{ret})$, where  $n_f$  is the target program point of the call and  $n_{ret}$  is the return program point, and the return instruction ret. We "decode" calls and returns into multiple instructions; their own transient forms, call and ret, serve simply as markers.

*Call stack.* To track control flow in the presence of function calls, our semantics explicitly maintains a call stack in memory. For this, we use a dedicated register $r_{sp}$, which we call the *stack pointer register*, that points to the top of the call stack. On fetching a call instruction, we update $r_{sp}$ to point to the address of the next element of the stack using an abstract operation *succ*, and then save the return address to the newly computed address. On returning from a function call, our semantics transfers control to the return address at  $r_{sp}$  and then updates  $r_{sp}$  to point to the address of the previous element using an abstract operation *pred*. This step makes use of a temporary register $r_{tmp}$.

We use the abstract operations *succ* and *pred* to manipulate $r_{sp}$. On a 32-bit x86 processor with a downward-growing stack,  $\mathsf{op}(succ, r_{sp})$  would be implemented as  $r_{sp} - 4$, while  $\mathsf{op}(pred, r_{sp})$  would be implemented as  $r_{sp} + 4$; on an upward-growing system, the reverse would be true. Note that the stack pointer register $r_{sp}$ is not protected from illegal access and can be updated freely.
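For concreteness, here is a minimal C sketch of *succ* and *pred* for the 32-bit, downward-growing case, together with the architectural effect of pushing and popping a return address. The 4-byte word size, the toy word-addressed memory, and all names are illustrative assumptions; like $r_{sp}$ in the semantics, the sketch places no restrictions on the stack pointer.

```c
#include <stdint.h>

#define WORD_SIZE 4u
#define MEM_WORDS 256u

/* Downward-growing stack: the next element lives at a lower address. */
static uint32_t succ(uint32_t rsp) { return rsp - WORD_SIZE; }
static uint32_t pred(uint32_t rsp) { return rsp + WORD_SIZE; }

/* Toy machine state: only the stack pointer and a small word memory. */
typedef struct {
    uint32_t rsp;
    uint32_t mem[MEM_WORDS];
} toy_machine;

/* Architectural effect of a call: r_sp = op(succ, r_sp); store(n_ret, [r_sp]).
   The sketch does not bounds-check r_sp. */
static void push_return(toy_machine *m, uint32_t n_ret) {
    m->rsp = succ(m->rsp);
    m->mem[m->rsp / WORD_SIZE] = n_ret;
}

/* Architectural effect of a ret: r_tmp = load([r_sp]); r_sp = op(pred, r_sp). */
static uint32_t pop_return(toy_machine *m) {
    uint32_t r_tmp = m->mem[m->rsp / WORD_SIZE];
    m->rsp = pred(m->rsp);
    return r_tmp;
}
```

Swapping the bodies of `succ` and `pred` gives the upward-growing variant.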

*Return stack buffer.* For performance, modern processors speculatively predict return addresses. To model this, we extend configurations with a new piece of state called the *return stack buffer* (RSB), written as  $\sigma$ . The return stack buffer contains the expected return address at any execution point. Its implementation is simple: for a call instruction, the semantics pushes the return address to the RSB, while for a ret instruction, the semantics pops the address at the top of the RSB. Similar to the reorder buffer, we address the RSB through indices and roll it back on misspeculation or memory hazards.

<span id="page-88-0"></span>*Program*

| $n$ | $\mu(n)$ |
|-----|----------|
| $\underline{1}$ | call $(\underline{3}, \underline{2})$ |
| $\underline{2}$ | ret |
| $\underline{3}$ | ret |

| Directive | $n$ | *buf* | $\sigma$ |
|-----------|-----|-------|----------|
| fetch | $\underline{1} \rightarrow \underline{3}$ | $\overline{1} \mapsto$ call | $\overline{1} \mapsto push\ \underline{2}$ |
| | | $\overline{2} \mapsto r_{sp} = \mathsf{op}(succ, r_{sp})$ | |
| | | $\overline{3} \mapsto \mathsf{store}(\underline{2}, [r_{sp}])$ | |
| fetch | $\underline{3} \rightarrow \underline{2}$ | $\overline{4} \mapsto$ ret | $\overline{4} \mapsto pop$ |
| | | $\overline{5} \mapsto r_{tmp} = \mathsf{load}([r_{sp}])$ | |
| | | $\overline{6} \mapsto r_{sp} = \mathsf{op}(pred, r_{sp})$ | |
| | | $\overline{7} \mapsto \mathsf{jmpi}([r_{tmp}], \underline{2})$ | |
| fetch: $\boldsymbol{n}$ | $\underline{2} \rightarrow \boldsymbol{n}$ | $\overline{8} \mapsto$ ret | $\overline{8} \mapsto pop$ |
| | | $\overline{9} \mapsto r_{tmp} = \mathsf{load}([r_{sp}])$ | |
| | | $\overline{10} \mapsto r_{sp} = \mathsf{op}(pred, r_{sp})$ | |
| | | $\overline{11} \mapsto \mathsf{jmpi}([r_{tmp}], \boldsymbol{n})$ | |

Figure 2.9: Example demonstrating a ret2spec-style attack [\[97\]](#page-183-0). The attacker is able to send (speculative) execution to an arbitrary program point, shown in bold.

We model return prediction directly through the return stack buffer rather than relying on attacker directives, as most processors follow this simple strategy, and the predictions therefore cannot be (directly) controlled by an attacker.

#### *Calling.*

CALL-DIRECT-FETCH

$$
\mu(n) = \text{call}(n_f, n_{ret})
$$

$$
i = \text{MAX}(buf) + 1 \qquad \text{buf}_1 = \text{buf}[i \mapsto \text{call}][i + 1 \mapsto (r_{sp} = \text{op}(succ, r_{sp}))]
$$

$$
\text{buf}' = \text{buf}_1[i + 2 \mapsto \text{store}(n_{ret}, [r_{sp}])] \qquad \sigma' = \sigma[i \mapsto \text{push } n_{ret}] \qquad n' = n_f
$$

$$
(\rho, \mu, n, \text{buf}, \sigma) \xrightarrow[\text{fetch}]{} (\rho, \mu, n', \text{buf}', \sigma')
$$

CALL-RETIRE

$$
\text{MIN}(buf) = i \qquad \text{buf}(i) = \text{call} \qquad \text{buf}(i+1) = (r_{sp} = v_\ell) \qquad \text{buf}(i+2) = \text{store}(n_{ret}, a_{\ell_a})
$$

$$
\rho' = \rho[r_{sp} \mapsto v_\ell] \qquad \mu' = \mu[a \mapsto n_{ret}] \qquad \text{buf}' = \text{buf}[j : j > i+2]
$$

$$
(\rho, \mu, n, \text{buf}, \sigma) \xrightarrow[\text{retire}]{\text{write } a_{\ell_a}} (\rho', \mu', n, \text{buf}', \sigma)
$$

On fetching a call instruction, we add three transient instructions to the reorder buffer to model pushing the return address to the in-memory stack. The first transient instruction, call, simply indicates that the following two instructions come from fetching a call instruction. The remaining two instructions advance  $r_{sp}$  to point to a new stack entry, then store the return address $n_{ret}$ in the new entry. Neither of these transient instructions is fully resolved; they will need to be executed in later steps. We next add a new entry to the RSB, signifying a push of the return address $n_{ret}$ to the RSB. Finally, we set our program point to the target of the call, $n_f$.

When retiring a call, all three instructions generated during the fetch are retired together. The register file is updated with the new value of $r_{sp}$, and the return address is written to physical memory, producing the corresponding leakage.

The semantics for direct calls can be extended to cover indirect calls in a straightforward manner by imitating the semantics for indirect jumps. We omit them for brevity.
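The bookkeeping performed by CALL-DIRECT-FETCH and CALL-RETIRE can be sketched in C as follows. This is only an illustrative model under our own simplifications: entries are tagged but never executed, indices start at 0, the fixed buffer capacity is not checked, and the names (`core`, `fetch_call`, `retire_call`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define CAP 32

/* Hypothetical tags for the transient instructions a call decodes into. */
typedef enum { T_NONE, T_CALL_MARKER, T_RSP_SUCC, T_STORE_RET } call_tag;

typedef struct {
    call_tag kind;
    int n_ret; /* return program point, used by T_STORE_RET */
} entry;

typedef struct {
    entry buf[CAP];
    int rsb_push[CAP]; /* RSB command "push n" recorded at buffer index i */
    size_t min, max;   /* occupied buffer indices are [min, max) */
    int pc;
} core;

/* CALL-DIRECT-FETCH: append the marker, the r_sp update, and the store of
   n_ret at indices i, i+1, i+2; record "push n_ret" in the RSB at i; and
   continue fetching at n_f. */
void fetch_call(core *c, int n_f, int n_ret) {
    size_t i = c->max;
    c->buf[i]     = (entry){ T_CALL_MARKER, 0 };
    c->buf[i + 1] = (entry){ T_RSP_SUCC, 0 };
    c->buf[i + 2] = (entry){ T_STORE_RET, n_ret };
    c->rsb_push[i] = n_ret;
    c->max = i + 3;
    c->pc = n_f;
}

/* CALL-RETIRE: the oldest three entries leave the buffer together, and
   *written receives the return point written to memory. The RSB is left
   untouched, as in the rule. Because execution is not modeled, this does
   not check that entries i+1 and i+2 are already resolved. */
bool retire_call(core *c, int *written) {
    size_t i = c->min;
    if (c->max < i + 3 || c->buf[i].kind != T_CALL_MARKER)
        return false;
    *written = c->buf[i + 2].n_ret;
    for (size_t j = i; j < i + 3; j++)
        c->buf[j].kind = T_NONE;
    c->min = i + 3;
    return true;
}
```

For example, `fetch_call(&c, 3, 2)` mirrors the first fetch in Figure 2.9, which decodes call(3, 2).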

*Evaluating the RSB.* We define a function  $top(\sigma)$  that retrieves the value at the top of the RSB stack. For this, we let  $\llbracket \sigma \rrbracket$  be a function that transforms the RSB stack  $\sigma$  into a stack in the form of a partial map ( $st : \mathcal{N} \to \mathcal{V}$ ) from the natural numbers to program points, as follows: the function  $\llbracket \cdot \rrbracket$  applies the commands for each value in the domain of  $\sigma$ , in the order of the indices. For a *push n*, it adds *n* to the lowest empty index of *st*. For a *pop*, it removes the value with the highest index in *st*, if it exists. We then define  $top(\sigma)$  as  $st(\text{MAX}(st))$, where  $st = \llbracket \sigma \rrbracket$, or  $\bot$  if the domain of *st* is empty. For example, if  $\sigma$  is given as  $\emptyset[1 \mapsto push\ \underline{4}][2 \mapsto push\ \underline{5}][3 \mapsto pop]$, then  $\llbracket \sigma \rrbracket = \emptyset[1 \mapsto \underline{4}]$  and  $top(\sigma) = \underline{4}$.
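A direct C transcription of $top(\sigma)$ may help. It is a sketch under stated assumptions: $\sigma$ is given as an array of commands already sorted by buffer index (gaps in the index domain are simply omitted), program points are non-negative so that `-1` can stand in for $\bot$, and the names `rsb_cmd` and `rsb_top` are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

typedef enum { RSB_PUSH, RSB_POP } rsb_op;

typedef struct {
    rsb_op op;
    int n; /* pushed program point; unused for RSB_POP */
} rsb_cmd;

#define RSB_BOTTOM (-1) /* stands in for bottom; assumes program points are >= 0 */

/* top(sigma): replay the commands in index order into a plain stack st,
   i.e. compute [[sigma]], then return the entry with the highest index,
   or RSB_BOTTOM if st is empty. A pop on an empty st does nothing. */
int rsb_top(const rsb_cmd *sigma, size_t len) {
    int *st = malloc((len > 0 ? len : 1) * sizeof *st); /* depth never exceeds len */
    if (st == NULL)
        abort();
    size_t depth = 0;
    for (size_t k = 0; k < len; k++) {
        if (sigma[k].op == RSB_PUSH)
            st[depth++] = sigma[k].n;
        else if (depth > 0)
            depth--;
    }
    int result = depth > 0 ? st[depth - 1] : RSB_BOTTOM;
    free(st);
    return result;
}
```

Applied to the commands `{push 4, push 5, pop}` from the example above, this returns 4; applied to `{push 2, pop}`, the RSB state when the unmatched ret in Figure 2.9 is fetched, it returns `RSB_BOTTOM`.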

#### *Returning.*

RET-FETCH-RSB

$$
\mu(n) = \text{ret} \qquad top(\sigma) = n' \qquad i = \text{MAX}(buf) + 1 \qquad \text{buf}_1 = \text{buf}[i \mapsto \text{ret}]
$$

$$
\text{buf}_2 = \text{buf}_1[i + 1 \mapsto (r_{tmp} = \text{load}([r_{sp}]))] \qquad \text{buf}_3 = \text{buf}_2[i + 2 \mapsto (r_{sp} = \text{op}(pred, r_{sp}))]
$$

$$
\text{buf}_4 = \text{buf}_3[i + 3 \mapsto \text{jmpi}([r_{tmp}], n')] \qquad \sigma' = \sigma[i \mapsto pop]
$$

$$
(\rho, \mu, n, \text{buf}, \sigma) \xrightarrow[\text{fetch}]{} (\rho, \mu, n', \text{buf}_4, \sigma')
$$

RET-FETCH-RSB-EMPTY

$$
\mu(n) = \text{ret} \qquad top(\sigma) = \bot \qquad i = \text{MAX}(buf) + 1 \qquad \text{buf}_1 = \text{buf}[i \mapsto \text{ret}]
$$

$$
\text{buf}_2 = \text{buf}_1[i + 1 \mapsto (r_{tmp} = \text{load}([r_{sp}]))] \qquad \text{buf}_3 = \text{buf}_2[i + 2 \mapsto (r_{sp} = \text{op}(pred, r_{sp}))]
$$

$$
\text{buf}_4 = \text{buf}_3[i + 3 \mapsto \text{jmpi}([r_{tmp}], n')] \qquad \sigma' = \sigma[i \mapsto pop]
$$

$$
(\rho, \mu, n, \text{buf}, \sigma) \xrightarrow[\text{fetch}: n']{} (\rho, \mu, n', \text{buf}_4, \sigma')
$$

RET-RETIRE

$$
\text{MIN}(buf) = i \qquad \text{buf}(i) = \text{ret} \qquad \text{buf}(i+1) = (r_{tmp} = {v_1}_{\ell_1}) \qquad \text{buf}(i+2) = (r_{sp} = {v_2}_{\ell_2})
$$

$$
\text{buf}(i+3) = \text{jump } n' \qquad \rho' = \rho[r_{sp} \mapsto {v_2}_{\ell_2}] \qquad \text{buf}' = \text{buf}[j : j > i+3]
$$

$$
(\rho, \mu, n, \text{buf}, \sigma) \xrightarrow[\text{retire}]{} (\rho', \mu, n, \text{buf}', \sigma)
$$

On a fetch of ret, the next program point is set to the predicted return address, i.e., the top value of the RSB, $top(\sigma)$. Just as with call, we add the transient ret instruction to the reorder buffer, followed by three unresolved instructions: we load the value at address $r_{sp}$ into a temporary register  $r_{tmp}$, we "pop"  $r_{sp}$  to point back to the previous stack entry, and we add an indirect jump to the program point given by $r_{tmp}$, with the predicted return address as its guess. Finally, we add a *pop* entry to the RSB. As with call instructions, the set of instructions generated by a ret fetch is retired all at once.
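The fetch side of RET-FETCH-RSB and RET-FETCH-RSB-EMPTY can be sketched together in C. Again this is a simplified illustration, not the semantics itself: the RSB is kept as a plain stack of program points rather than an index-keyed map (so it cannot be rolled back), indices start at 0, capacities are not checked, and all names are hypothetical. The `directive_target` argument plays the role of the attacker's fetch: *n'* directive and is ignored when the RSB is non-empty.

```c
#include <stddef.h>

#define CAP 64

/* Hypothetical tags for the four transient instructions a ret decodes into. */
typedef enum { R_NONE, R_RET_MARKER, R_LOAD_TMP, R_RSP_PRED, R_JMPI } ret_tag;

typedef struct {
    ret_tag kind;
    int guess; /* predicted target n', used by R_JMPI */
} rentry;

typedef struct {
    rentry buf[CAP];
    size_t max;       /* next free buffer index */
    int rsb[CAP];     /* RSB as a plain stack of program points */
    size_t rsb_depth;
    int pc;
} cpu;

/* Fetch a ret. With a non-empty RSB the prediction is its top entry
   (RET-FETCH-RSB); with an empty RSB the attacker-supplied
   directive_target is used (RET-FETCH-RSB-EMPTY). The four entries end in
   an unresolved jmpi([r_tmp], n') whose guess is n'. */
void fetch_ret(cpu *c, int directive_target) {
    int predicted = directive_target;
    if (c->rsb_depth > 0)
        predicted = c->rsb[--c->rsb_depth]; /* read the top and pop it */
    size_t i = c->max;
    c->buf[i]     = (rentry){ R_RET_MARKER, 0 };
    c->buf[i + 1] = (rentry){ R_LOAD_TMP, 0 }; /* r_tmp = load([r_sp]) */
    c->buf[i + 2] = (rentry){ R_RSP_PRED, 0 }; /* r_sp = op(pred, r_sp) */
    c->buf[i + 3] = (rentry){ R_JMPI, predicted };
    c->max = i + 4;
    c->pc = predicted;
}
```

Starting from an RSB holding 2, calling `fetch_ret` twice replays the second and third fetches of Figure 2.9: the first ret is predicted from the RSB, while the unmatched second ret goes wherever the directive says.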

When the RSB is empty, the attacker can supply a speculative return address via the directive fetch: *n'*. This is consistent with the behavior of existing processors, although in practice processors differ in what they actually do when the RSB is empty [\[97\]](#page-183-0):

- AMD processors refuse to speculate. This can be modeled by defining  $top(\sigma)$  to be a failing predicate if it would result in  $\bot$.
- Intel Skylake/Broadwell processors fall back to using their branch target predictor. This can be modeled by allowing arbitrary *n'* for the fetch: *n'* directive in the RET-FETCH-RSB-EMPTY rule.
- "Most" Intel processors treat the RSB as a circular buffer, taking whichever value is produced when the RSB over- or underflows. This can be modeled by having  $top(\sigma)$ always produce a corresponding value, and never produce $\bot$.

<span id="page-91-0"></span>*Registers*

| $r$ | $\rho(r)$ |
|-----|-----------|
| $r_b$ | $8_{pub}$ |
| $r_{sp}$ | $7C_{pub}$ |

*Program*

| $n$ | $\mu(n)$ |
|-----|----------|
| $\underline{3}$ | call $(\underline{5}, \underline{4})$ |
| $\underline{4}$ | fence $\underline{4}$ |
| $\underline{5}$ | $r_d = \mathsf{op}(addr, [12, r_b], \underline{6})$ |
| $\underline{6}$ | $\mathsf{store}(r_d, [r_{sp}], \underline{7})$ |
| $\underline{7}$ | ret |

*Effect of successive fetch directives*

| $n$ | *buf* | $\sigma$ |
|-----|-------|----------|
| $\underline{3} \rightarrow \underline{5}$ | $\overline{3} \mapsto$ call | $\overline{3} \mapsto push\ \underline{4}$ |
| | $\overline{4} \mapsto r_{sp} = \mathsf{op}(succ, r_{sp})$ | |
| | $\overline{5} \mapsto \mathsf{store}(\underline{4}, [r_{sp}])$ | |
| $\underline{5} \rightarrow \underline{6}$ | $\overline{6} \mapsto r_d = \mathsf{op}(addr, [12, r_b])$ | |
| $\underline{6} \rightarrow \underline{7}$ | $\overline{7} \mapsto \mathsf{store}(r_d, [r_{sp}])$ | |
| $\underline{7} \rightarrow \underline{4}$ | $\overline{8} \mapsto$ ret | $\overline{8} \mapsto pop$ |
| | $\overline{9} \mapsto r_{tmp} = \mathsf{load}([r_{sp}])$ | |
| | $\overline{10} \mapsto r_{sp} = \mathsf{op}(pred, r_{sp})$ | |
| | $\overline{11} \mapsto \mathsf{jmpi}([r_{tmp}], \underline{4})$ | |
| $\underline{4} \rightarrow \underline{4}$ | $\overline{12} \mapsto$ fence | |

| Directive | Effect on *buf* | Leakage |
|-----------|-----------------|---------|
| execute $\overline{4}$ | $\overline{4} \mapsto r_{sp} = 7B$ | |
| execute $\overline{6}$ | $\overline{6} \mapsto r_d = 20$ | |
| execute $\overline{7}$ : value | $\overline{7} \mapsto \mathsf{store}(20, [r_{sp}])$ | |
| execute $\overline{7}$ : addr | $\overline{7} \mapsto \mathsf{store}(20, 7B)$ | fwd 7B |
| execute $\overline{9}$ | $\overline{9} \mapsto r_{tmp} = 20$ | fwd 7B |
| execute $\overline{11}$ | $\overline{12} \notin buf$, $\overline{11} \mapsto$ jump $20$ | rollback, jump 20 |

Figure 2.10: Example demonstrating a "retpoline" mitigation against a Spectre v2 attack. The program is able to jump to program point  $12 + r_b = 20$  without the schedule influencing prediction.
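To make the three empty-RSB behaviors listed above concrete, the following C sketch models a finite circular RSB and a prediction function parameterized by how an empty RSB is handled. The 16-entry size, the policy names, and the functions are illustrative assumptions rather than descriptions of any specific microarchitecture; the `attacker_target` argument stands for the branch-target-predictor fallback, which, as in our semantics, we treat as attacker controlled.

```c
#include <stdbool.h>

#define RSB_ENTRIES 16 /* assumed RSB size, for illustration only */

typedef enum {
    POLICY_REFUSE,   /* AMD-style: refuse to speculate */
    POLICY_ATTACKER, /* Skylake/Broadwell-style: BTB fallback, attacker-chosen */
    POLICY_CIRCULAR  /* circular buffer: reuse whatever stale entry is there */
} empty_rsb_policy;

typedef struct {
    int slot[RSB_ENTRIES];
    unsigned top;  /* next slot to write; wraps around */
    unsigned live; /* pushes not yet consumed by predictions, capped at RSB_ENTRIES */
} circ_rsb;

/* Push a return address; on overflow the oldest entry is overwritten. */
void rsb_push(circ_rsb *r, int n) {
    r->slot[r->top] = n;
    r->top = (r->top + 1) % RSB_ENTRIES;
    if (r->live < RSB_ENTRIES)
        r->live++;
}

/* Predict a return target into *out. Returns false if the policy refuses
   to speculate on an empty RSB. */
bool rsb_predict(circ_rsb *r, empty_rsb_policy p, int attacker_target, int *out) {
    if (r->live == 0) {
        if (p == POLICY_REFUSE)
            return false;
        if (p == POLICY_ATTACKER) {
            *out = attacker_target;
            return true;
        }
        /* POLICY_CIRCULAR: underflow; fall through and read a stale slot. */
    } else {
        r->live--;
    }
    r->top = (r->top + RSB_ENTRIES - 1) % RSB_ENTRIES;
    *out = r->slot[r->top];
    return true;
}
```

Under `POLICY_CIRCULAR` a prediction is always produced, matching the "never $\bot$" reading of the third item: after 17 pushes and 16 predictions, the next prediction returns the value that overwrote the oldest slot.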

*Examples.* We present an example of an RSB underflow attack in [Figure 2.9.](#page-88-0) After fetching a call and paired ret instruction, the RSB will be "empty". When one more (unmatched) ret instruction is fetched, since  $top(\sigma) = \perp$ , the program point *n* is no longer set by the RSB, and is instead set by the (attacker-controlled) schedule.

*Retpoline mitigation.* A mitigation for Spectre v2 attacks is to replace indirect jumps with *retpolines* [\[148\]](#page-187-0). [Figure 2.10](#page-91-0) shows a retpoline construction that would replace the indirect jump in [Figure 2.8](#page-86-0). The call sends execution to program point 5, while adding 4 to the RSB. The next two instructions, at 5 and 6, calculate the same target as the indirect jump in [Figure 2.8](#page-86-0) and overwrite the return address in memory with this jump target. When executed speculatively, the ret at 7 pops the top value off the RSB, 4, and jumps there, landing on a fence instruction that loops back on itself. Thus speculative execution cannot proceed beyond this point. When the transient instructions in the ret sequence finally execute, the indirect jump target 20 is loaded from memory, causing a rollback. However, execution is then directed to the proper jump target. Notably, at no point is an attacker able to hijack the jump target via misprediction.

## 2.3 Detecting violations

We develop a tool, Pitchfork, based on our semantics to check for SCT violations. Pitchfork exercises only a subset of our semantics: it detects only SCT violations stemming from branch misprediction or basic store-forwarding errors (Sections [2.2.3](#page-71-0) and [2.2.4](#page-74-0)). Even so, Pitchfork soundly exposes Spectre-PHT and Spectre-STL vulnerabilities.

Pitchfork constructs worst-case schedules to maximize speculation, parametrized by a *speculation bound* which limits the depth of speculation. When encountering conditional branches, Pitchfork examines both possible path outcomes as if they were (mis)predicted, delaying the execution of the branch condition itself as late as possible. To account for load-store forwarding hazards, Pitchfork similarly examines all possible forwarding outcomes for each load instruction. All other instructions are executed eagerly and in order. We formalize the soundness of Pitchfork's schedule construction in more detail in [Appendix B.3](#page-171-0).

We implement Pitchfork on top of the angr binary-analysis tool [\[138\]](#page-186-0). Pitchfork necessarily inherits the limitations of angr's symbolic execution—for instance, angr concretizes addresses for memory operations instead of keeping them symbolic. Furthermore, exploring every speculative branch and potential store-forward within a given speculation bound leads to an explosion in state space. In our tests, we were able to support speculation bounds of up to 20 instructions, though we can increase this bound to 250 instructions when we disable checks for store-forwarding hazards. Though these bounds do not capture the speculation depth of some modern processors, Pitchfork still correctly finds SCT violations in all our test cases, as well as SCT violations in real-world crypto code.
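A back-of-the-envelope model shows why the bound has to shrink when forwarding hazards are checked. The C sketch below is not Pitchfork's algorithm; it only counts the outcomes a worst-case exploration would have to consider over a straight-line trace, under the pessimistic assumptions that every conditional branch inside the speculation bound contributes two outcomes and that every load may alias, and so receive a value from, any earlier store in the window (or memory). The instruction kinds and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_ALU, OP_BRANCH, OP_STORE, OP_LOAD } op_kind;

/* Multiply without wrapping around: clamp at UINT64_MAX instead. */
static uint64_t sat_mul(uint64_t a, uint64_t b) {
    if (a != 0 && b > UINT64_MAX / a)
        return UINT64_MAX;
    return a * b;
}

/* Count the outcomes over the first `bound` instructions of `trace`.
   Each branch doubles the count; with forwarding enabled, each load can
   take its value from any of the earlier stores in the window or from
   memory, multiplying the count by (stores seen so far + 1). */
uint64_t count_schedules(const op_kind *trace, size_t len, size_t bound,
                         int forwarding) {
    uint64_t paths = 1;
    size_t stores = 0;
    for (size_t k = 0; k < len && k < bound; k++) {
        if (trace[k] == OP_BRANCH)
            paths = sat_mul(paths, 2);
        else if (trace[k] == OP_STORE)
            stores++;
        else if (trace[k] == OP_LOAD && forwarding)
            paths = sat_mul(paths, (uint64_t)stores + 1);
    }
    return paths;
}
```

On the six-instruction trace branch, store, load, branch, store, load, this gives 4 outcomes without forwarding and 24 with it; growth of this kind is consistent with the much smaller bound (20 rather than 250 instructions) that we could afford with forwarding hazard detection enabled.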

## 2.3.1 Evaluation procedure

To evaluate Pitchfork on real-world crypto implementations, we use the same case studies as FaCT [\[40\]](#page-179-2), a domain-specific language and compiler for constant-time crypto code. We use FaCT's case studies for two reasons: these implementations have been verified to be (sequentially) constant-time, and their inputs have already been annotated by the FaCT authors with secrecy labels.<sup>[1](#page-93-0)</sup>

We analyzed both the FaCT-generated binaries and the corresponding C binaries for the case studies. For each binary, we ran Pitchfork without forwarding hazard detection—only looking for Spectre v1 and v1.1 violations—and with a speculation bound of 250 instructions. If Pitchfork did not flag any violations, we re-enabled forwarding hazard detection—looking for Spectre v4 violations—and ran Pitchfork with a reduced bound of 20 instructions. The reduced bound ensured that the analysis was tractable.

<span id="page-93-0"></span><sup>1</sup><https://github.com/PLSysSec/fact-eval>

| <b>Case Study</b>            | C | FaCT          |
|------------------------------|---|---------------|
| curve25519-donna             |   |               |
| libsodium secretbox          | ⚑ |               |
| OpenSSL ssl3 record validate | ⚑ | ⚑<sup>f</sup> |
| OpenSSL MEE-CBC              | ⚑ | ⚑<sup>f</sup> |

<span id="page-94-0"></span>**Table 2.3:** SCT violations found by Pitchfork. A ⚑ indicates Pitchfork found an SCT violation. A ⚑<sup>f</sup> indicates the violation was found only with forwarding hazard detection.

## 2.3.2 Detected violations

[Table 2.3](#page-94-0) shows our results. Pitchfork did not flag any SCT violations in the curve25519-donna implementations; this is not surprising, as the curve25519-donna library is a straightforward implementation of crypto primitives. Pitchfork did, however, find SCT violations (without forwarding hazard detection) in both the libsodium and OpenSSL codebases. Specifically, Pitchfork found violations in the C implementations of these libraries, in code ancillary to the core crypto routines. This aligns with our intuition that crypto primitives will not themselves be vulnerable to Spectre attacks, but higher-level code that interfaces with these primitives may still leak secrets. Such higher-level code is not present in the corresponding FaCT implementations, and Pitchfork did not find any violations in the FaCT code with these settings. However, with forwarding hazard detection, Pitchfork was able to find vulnerabilities even in the FaCT versions of the OpenSSL implementations. We describe two of the violations Pitchfork flagged next.

*C libsodium secretbox.* The libsodium codebase compiles with stack protection [\[58\]](#page-180-1) turned on by default. This means that, for certain functions (e.g., functions with stack-allocated char buffers), the compiler inserts code in the function epilogue to check whether the stack was "smashed". If so, the program displays an error message and aborts. As part of printing the error message, the program calls a function \_\_libc\_message, which does printf-style string formatting.

<span id="page-94-1"></span><sup>2</sup>Code snippet taken from [https://github.com/lattera/glibc/blob/895ef79e04a953cac1493863bcae29ad85657ee1/sysdeps/posix/libc\_fatal.c](https://github.com/lattera/glibc/blob/895ef79e04a953cac1493863bcae29ad85657ee1/sysdeps/posix/libc_fatal.c)

```
1 for (int cnt = nlist - 1; cnt >= 0; --cnt) {
2   iov[cnt].iov_base = (char *) list->str;
3   // ...
4   list = list->next;
5 }
```
**Figure 2.11:** Vulnerable snippet from \_\_libc\_message().<sup>[2](#page-94-1)</sup>

```
1 aesni_cbc_encrypt(/* ... */);
2 // (len _out) is in %r14
3 secret mut uint32 pad = _out[len _out - 1];
4 public uint32 maxpad = tmppad > 255 ? 255 : tmppad;
5 if (pad > maxpad) {
6   pad = maxpad;
7   ret = 0; // overwrites %r14
8 }
9 // ...
10 _sha1_update(/* ... */); // can return to line 3
```


[Figure 2.11](#page-95-1) shows a snippet from this function, which traverses a linked list. When running the C secretbox implementation speculatively, the processor may misspeculate on the stack tampering check and jump into the error handling code, eventually calling \_\_libc\_message. Again due to misspeculation, the processor may incorrectly proceed through the loop extra times, traversing non-existent links and eventually storing secret data into list instead of a valid address (line 4). On the following iteration of the loop, dereferencing list (line 2) causes a secret-dependent memory access.

*FaCT OpenSSL MEE.* In [Figure 2.12,](#page-95-2) we show the code from the FaCT port of OpenSSL's authenticated encryption implementation. The FaCT compiler transforms the branch at lines 5-7 into straight-line constant-time code, since the variable pad is considered secret.

Initially, register %r14 holds the length of the array \_out. The processor leaks this value due to the array access on line 3; this is not a security violation, as the length is public. On line 7, the value of %r14 is overwritten with 0 if pad > maxpad, or 1 (the initial value of ret) otherwise. Afterwards, the processor calls \_sha1\_update.

<span id="page-95-0"></span><sup>3</sup>Code snippet taken from [https://github.com/PLSysSec/fact-eval/blob/888bc6c6898a06cef54170ea273de91868ea621e/openssl-mee/20170717\_latest.fact](https://github.com/PLSysSec/fact-eval/blob/888bc6c6898a06cef54170ea273de91868ea621e/openssl-mee/20170717_latest.fact)

To return from \_sha1\_update, the processor must first load the return address from memory. When forwarding hazard detection is enabled, Pitchfork allows this load to speculatively receive data from stores *older* than the most recent store to that address (see [Section 2.2.4](#page-74-0)). Specifically, it may receive the prior value that was stored at that location: the return address for the call to aesni\_cbc\_encrypt.

After the speculative return, the processor executes line 3 a second time. This time, %r14 does not hold the public value len \_out; it instead holds the value of ret, which was derived from the secret condition pad > maxpad. The processor thus accesses either \_out[0] or \_out[-1], leaking information about the secret value of pad via cache state.
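The arithmetic behind this leak is small enough to spell out. In the hypothetical C helper below (not FaCT compiler output), `r14` starts as the initial value of ret, is conditionally cleared by a select that stands in for the constant-time code FaCT emits for lines 5-7, and is then used in place of len \_out in the index expression from line 3.

```c
#include <stdint.h>

/* Index that line 3 computes when it re-executes after the speculative
   return, assuming %r14 still holds ret rather than len _out. */
int32_t speculative_index(uint32_t pad, uint32_t maxpad) {
    uint32_t r14 = 1;               /* initial value of ret */
    r14 = (pad > maxpad) ? 0 : r14; /* stands in for FaCT's branch-free select */
    return (int32_t)r14 - 1;        /* line 3's index with r14 in place of len _out */
}
```

The two possible results, -1 and 0, name different memory locations, so the location touched depends on whether pad > maxpad, which is what the cache state can reveal.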

## 2.4 Related work

Prior work on modeling speculative or out-of-order execution is concerned with correctness rather than security [\[4,](#page-176-0) [89\]](#page-183-1). We instead focus on security and model side-channel leakage explicitly. Moreover, we abstract away the specifics of microarchitectural features, considering them to be adversarially controlled.

Disselkoen et al. [\[51\]](#page-180-2) explore speculation and out-of-order effects through a relaxed memory model. Their semantics sits at a higher level, and is orthogonal to our approach. They do not define a semantic notion of security that prevents Spectre-like attacks, and do not provide support for verification.

McIlroy et al. [\[100\]](#page-184-0) reason about microarchitectural attacks using a multi-stage pipeline semantics (though they do not define a formal security property). Their semantics models branch predictor and cache state explicitly. However, they do not model the effects of speculation barriers or other microarchitectural features such as store-forwarding. Thus, their semantics can only capture Spectre v1 attacks.

Both Guarnieri et al. [\[64\]](#page-181-0) and Cheang et al. [\[41\]](#page-179-3) define speculative semantics that are supported by tools. Their semantics handle speculation through branch prediction—where the predictor is left abstract—but do not capture more general out-of-order execution nor other types of speculation. These works also propose new semantic notions of security (different from SCT); both essentially require that the speculative execution of a program not leak more than its sequential execution. If a program is sequentially constant-time, this additional security property is equivalent to our notion of speculative constant-time. Though our property is stronger, it is also simpler to verify: we can directly check SCT without first checking if a program is sequentially constant-time. And since we focus on cryptographic code, we directly require the stronger SCT property.

Balliu et al. [\[63\]](#page-181-1) define a semantics in a style similar to ours. Their semantics captures various Spectre attacks, including an attack similar to our alias prediction example [\(Figure 2.2\)](#page-67-0), and a new attack based on their memory ordering semantics, which our semantics cannot capture.

Finally, several tools detect Spectre vulnerabilities, but do not present semantics. The oo7 static analysis tool [\[155\]](#page-188-1), for example, uses taint tracking to find Spectre attacks and automatically insert mitigations for several variants. Wu and Wang [\[161\]](#page-188-2), on the other hand, perform cache analysis of LLVM programs under speculative execution, capturing Spectre v1 attacks.

## 2.5 Conclusion

We introduced a semantics for reasoning about side channels under adversarially controlled out-of-order and speculative execution. Our semantics captures existing transient execution attacks (namely, Spectre) and can be extended to future hardware predictors and potential attacks. We also defined a new notion of constant-time code under speculation, speculative constant-time (SCT), and implemented a prototype tool to check if code is SCT. Our prototype, Pitchfork, discovered new vulnerabilities in real-world crypto libraries.

There are several directions for future work. Our immediate plan is to use our semantics to prove the effectiveness of existing countermeasures (e.g., retpolines) and to justify new countermeasures.

# Acknowledgements

We thank the anonymous PLDI and PLDI AEC reviewers and our shepherd James Bornholt for their suggestions and insightful comments. We thank David Kaplan from AMD for his detailed analysis of our proof-of-concept exploit that we incorrectly thought to be abusing an aliasing predictor. We also thank Natalie Popescu for her aid in editing and formatting the original published paper. This work was supported in part by gifts from Cisco and Fastly, by the NSF under Grant Number CCF-1918573, by ONR Grant N000141512750, and by the CONIX Research Center, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

[Chapter 2,](#page-61-0) in part, is a reprint of the material as it appears in 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '20). Cauligi, Sunjay; Disselkoen, Craig; v. Gleissenthall, Klaus; Tullsen, Dean; Stefan, Deian; Rezk, Tamara; Barthe, Gilles, ACM, 2020. The dissertation author was the primary investigator and author of this paper.

# Chapter 3

# Towards Verified Spectre-Resistant SFI Sandboxing

*In which we build to new heights.*

Speculative constant-time (SCT) is difficult to achieve without some additional structure. One way we can approach SCT is by placing programs inside a "speculation sandbox": We prevent them from speculatively accessing any data that they would not otherwise be able to touch. If a given program is already sequentially constant-time—perhaps having been compiled by FaCT—then it will certainly be SCT after being sandboxed.

One popular technique for sandboxing untrusted code is *Software-based Fault Isolation* (SFI) [\[144\]](#page-187-1). Web browsers and cloud providers, for example, rely on SFI-based sandboxes to prevent buggy or malicious code from corrupting the memory of the host and other sandboxes [\[68,](#page-181-2) [101,](#page-184-1) [166\]](#page-189-0). Unfortunately, untrusted code can leverage speculative execution to break out of the sandbox and access trusted memory regions, thus making existing SFI implementations vulnerable to Spectre attacks [\[77,](#page-182-1) [86\]](#page-182-0).

Researchers have proposed different approaches to mitigate Spectre attacks in SFI-style sandboxes [\[78,](#page-182-2) [111,](#page-184-2) [137\]](#page-186-1). However, these are best-effort proposals: They rely on carefully combining several intricate software protections and hardware extensions to prevent unsafe speculative behaviors. It is unclear whether the combination of these countermeasures works as intended and so, in practice, these approaches may fail to provide the expected security guarantees against Spectre attacks.

In this chapter, we develop principled foundations to build reliable sandboxing mechanisms against Spectre attacks. Towards this goal, we have formulated security properties to formally capture the essence of Spectre SFI attacks, and have already uncovered bugs in the implementation of the *Swivel* SFI system [\[111\]](#page-184-2). We investigate Swivel's security claims and show which Spectre attacks it can soundly mitigate and where it falls short.

## 3.1 Formal model

To study SFI in the context of speculative execution attacks, we focus on a simple assembly-style language, *ZFI*. We present the syntax of ZFI, then formalize its architectural and speculative semantics.

### 3.1.1 Syntax

The syntax of ZFI programs is given in [Figure 3.1.](#page-102-0) In ZFI, expressions are constructed by combining immediate values $v$ and registers $r$ using basic arithmetic operations ⊕. ZFI supports standard control-flow instructions (direct and indirect jumps, function calls and returns), register assignments ($r := e$), memory loads ($r' := {\ast}(r + e)$), and memory stores (${\ast}(r + e) := e'$). Memory instructions always access an offset *e* from a base register *r*. ZFI also supports the dedicated instructions *flush* (modeling, e.g., clearing predictor state) and *endbranch* (modeling, e.g., control-flow integrity checks), which we use to model countermeasures against Spectre attacks.
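The following Python sketch makes the grammar concrete. It encodes a subset of ZFI's syntax as immutable dataclasses. It also defines a predicate that plays the role of ISCONTROLFLOW in the speculative rules. All class and field names are hypothetical. Classifying *flush* and *endbranch* as non-control-flow instructions is our own reading of the later rules.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Expressions: e ::= v | r | e (+) e
@dataclass(frozen=True)
class Val:
    v: int

@dataclass(frozen=True)
class Reg:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str  # hypothetical operator tag, e.g. "+" or "&"
    lhs: "Expr"
    rhs: "Expr"

Expr = Union[Val, Reg, BinOp]


# Instructions (hypothetical encoding of Figure 3.1)
@dataclass(frozen=True)
class Assign:  # r := e
    dst: str
    e: Expr

@dataclass(frozen=True)
class Load:  # r' := *(r + e)
    dst: str
    base: str
    off: Expr

@dataclass(frozen=True)
class Store:  # *(r + e) := e'
    base: str
    off: Expr
    src: Expr

@dataclass(frozen=True)
class Jmp:  # jmp ±v, or jmp ±v if e when cond is set
    delta: int
    cond: Optional[Expr] = None

@dataclass(frozen=True)
class JmpInd:  # jmp r
    target: str

@dataclass(frozen=True)
class Call:  # call ±v
    delta: int

@dataclass(frozen=True)
class CallInd:  # call r
    target: str

@dataclass(frozen=True)
class Ret:  # ret
    pass

@dataclass(frozen=True)
class Flush:  # flush
    pass

@dataclass(frozen=True)
class EndBranch:  # endbranch
    pass


def is_control_flow(insn) -> bool:
    """ISCONTROLFLOW: jumps, calls, and returns (flush/endbranch excluded)."""
    return isinstance(insn, (Jmp, JmpInd, Call, CallInd, Ret))
```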

### 3.1.2 Architectural semantics

We first cover the *architectural semantics* of ZFI, which models the execution of our basic assembly programs *without* any speculative behavior. The semantics is defined in terms of architectural configurations Ψ. Each configuration Ψ is a quadruple consisting of a program *P* mapping values to instructions, a program counter $pc \in \mathbb{V}$, a register file $Reg : \mathbb{R} \to \mathbb{V}$ mapping registers to values, and a memory $Mem : \mathbb{V} \to \mathbb{V}$ that maps memory addresses to values. We use dot-notation to access a configuration's elements, e.g., Ψ.*Mem* denotes the memory associated with Ψ. We use brace-notation to update an element within a configuration, e.g., $\Psi\{Reg := Reg'\}$ denotes the configuration obtained by updating the register file to $Reg'$. Furthermore, $\Psi[s]$ denotes that *s* is the instruction pointed to by the current program counter, and $\Psi^{++}$ denotes the configuration obtained by incrementing Ψ's program counter by 1.

The architectural semantics is formalized by the $\rightarrow$ relation in [Figure 3.2,](#page-103-0) which describes how architectural configurations are modified during the computation. In the rules, $\llbracket e \rrbracket_{\Psi}$ denotes the value of expression *e* in the configuration Ψ, and $r_{Stk}$ and $r_{Heap}$ represent the unique *stack pointer* and *heap pointer* registers.
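Below is a minimal executable sketch of architectural configurations. It assumes a compact tuple encoding of instructions, and all names are hypothetical. `Config.incr` plays the role of $\Psi^{++}$, and `dataclasses.replace` mirrors the $\Psi\{Reg := Reg'\}$ update notation. `evaluate` computes $\llbracket e \rrbracket_{\Psi}$.

Figure 3.2 is not reproduced here, so `step` covers only assignments, loads, and stores. It follows the usual reading of those rules and is not a transcription of the figure.

```python
import operator
from dataclasses import dataclass, replace
from typing import Dict, Union

# Hypothetical encoding: an expression is an int (immediate v), a str
# (register r), or a tuple (op, e1, e2) standing for e1 (+) e2.
Expr = Union[int, str, tuple]
OPS = {"+": operator.add, "-": operator.sub, "&": operator.and_}


@dataclass(frozen=True)
class Config:
    prog: Dict[int, tuple]  # P: values -> instructions
    pc: int                 # pc in V
    reg: Dict[str, int]     # Reg: R -> V
    mem: Dict[int, int]     # Mem: V -> V

    def current(self) -> tuple:
        """The instruction s such that Psi[s]."""
        return self.prog[self.pc]

    def incr(self) -> "Config":
        """Psi^{++}: increment the program counter by 1."""
        return replace(self, pc=self.pc + 1)


def evaluate(e: Expr, cfg: Config) -> int:
    """[[e]]_Psi: the value of expression e in configuration cfg."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return cfg.reg[e]
    op, lhs, rhs = e
    return OPS[op](evaluate(lhs, cfg), evaluate(rhs, cfg))


def step(cfg: Config) -> Config:
    """Assumed rules for assignment, load, and store (not Figure 3.2 verbatim)."""
    insn = cfg.current()
    if insn[0] == "assign":  # r := e
        _, r, e = insn
        return replace(cfg.incr(), reg={**cfg.reg, r: evaluate(e, cfg)})
    if insn[0] == "load":  # r' := *(r + e)
        _, dst, base, off = insn
        addr = cfg.reg[base] + evaluate(off, cfg)
        # Unwritten addresses read as 0 (an assumption of this sketch).
        return replace(cfg.incr(), reg={**cfg.reg, dst: cfg.mem.get(addr, 0)})
    if insn[0] == "store":  # *(r + e) := e'
        _, base, off, src = insn
        addr = cfg.reg[base] + evaluate(off, cfg)
        return replace(cfg.incr(), mem={**cfg.mem, addr: evaluate(src, cfg)})
    raise NotImplementedError(insn[0])
```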

### 3.1.3 Attackers and observations

To represent the power of attackers to observe and exfiltrate secret data, we have our semantics emit *leakage observations* that represent side-channel information an attacker can glean. The observations emitted by different instructions depend on the *leakage model* we wish to consider. We consider the following three leakage models, each giving increasing power to an attacker:

- *dmem*, where attackers can observe the state of the data cache,
- *ct*, where attackers can observe leaks considered by the *constant-time* paradigm [\[36\]](#page-179-4), and
- *arch*, where attackers can observe all values retrieved from memory [\[65\]](#page-181-3).

<span id="page-102-0"></span>

| Category | Form | Description |
|---|---|---|
| ***Basic types*** | | |
| (Values) | $v \in \mathbb{V}$ | |
| (Registers) | $r \in \mathbb{R}$ | |
| (Operators) | $\oplus$ | |
| ***Syntax*** | | |
| (Expressions) | $e ::= v \mid r \mid e \oplus e$ | |
| (Instructions) | $s ::= r := e$ | assignment |
| | $\mid\ r := {\ast}(r + e)$ | memory load |
| | $\mid\ {\ast}(r + e) := e$ | memory store |
| | $\mid\ \mathit{jmp}\ \pm v$ | unconditional jump |
| | $\mid\ \mathit{jmp}\ \pm v\ \text{if}\ e$ | conditional jump |
| | $\mid\ \mathit{jmp}\ r$ | indirect jump |
| | $\mid\ \mathit{call}\ \pm v$ | direct call |
| | $\mid\ \mathit{call}\ r$ | indirect call |
| | $\mid\ \mathit{ret}$ | return |
| | $\mid\ \mathit{flush}$ | BTB state flush |
| | $\mid\ \mathit{endbranch}$ | CET "endbranch" |

Figure 3.1: Syntax of the ZFI language.

The *dmem* model is the weakest of the three models, and considers the data cache as the only viable leakage channel. In this model, an attacker can observe cache state—specifically, the data cache—using attacks such as PRIME+PROBE [\[146\]](#page-187-2), but cannot determine the control flow trace of a program. In the *ct* model, we consider an attacker that can observe the standard *constant-time* leakages [\[36\]](#page-179-4) via timing or other microarchitectural leaks [\[62,](#page-181-4) [105,](#page-184-3) [164\]](#page-189-1). The data cache as well as the control flow trace are visible to the attacker in this model. Finally, in the *arch* model, we assume the attacker observes all values loaded from memory. Since the initial memory is the source of all values in the program, this is equivalent to an attacker seeing the full trace of all values during execution [\[65\]](#page-181-3).

<span id="page-102-1"></span>We formalize each leakage model by a function $\text{LEAKS}(\Psi)$ that takes as input a configuration Ψ[*insn*] and outputs a sequence of observations for each *jump*, *load*, or *store* operation that occurs during the semantic execution rule for *insn*; [Table 3.1](#page-104-0) informally illustrates our leakage models. For example, $\text{LEAKS}(\Psi[ret])$ under the *ct* model produces two observations: $v_{Stk}$, for loading the return address, and $Mem[v_{Stk}]$, for jumping to that location (see [RETURN](#page-103-1) in [Figure 3.2](#page-103-0)).
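Table 3.1 can be read as a small function. It maps a leakage model and a single jump, load, or store event to a tuple of observations. The Python sketch below encodes it using a hypothetical event format, and it treats empty table cells as "no observation". Composing the load of the return address with the jump to it reproduces the two *ct* observations for *ret* described above.

```python
from typing import Tuple

MODELS = ("dmem", "ct", "arch")


def leaks(model: str, event: tuple) -> Tuple[int, ...]:
    """Observations emitted for one event under a leakage model (Table 3.1).

    Hypothetical event format:
      ("jump", v)           for pc := v
      ("load", v_addr, v)   for Mem[v_addr] = v
      ("store", v_addr, v)  for Mem[v_addr] := v
    """
    if model not in MODELS:
        raise ValueError(f"unknown leakage model: {model}")
    kind = event[0]
    if kind == "jump":
        _, target = event
        return (target,) if model == "ct" else ()
    if kind == "load":
        _, addr, value = event
        return (addr, value) if model == "arch" else (addr,)
    if kind == "store":
        _, addr, _value = event
        return (addr,)  # every model observes only the store address
    raise ValueError(f"unknown event kind: {kind}")
```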

<span id="page-103-0"></span>

<span id="page-103-1"></span>**Figure 3.2:** Architectural semantics for ZFI.

Finally, we include a structure *Obs* in our configuration to collect the sequence of leakage observations during execution. We update *Obs* with each architectural step using the relation $\rightarrow_{\text{trace}}$ induced by the following rule:

TRACE

$$
\frac{\Psi \rightarrow \Psi' \qquad Obs' = \Psi.Obs + \text{LEAKS}(\Psi)}{\Psi[insn] \rightarrow_{\text{trace}} \Psi'\{Obs'\}}
$$

We refer to this extended relation as  $\rightarrow$  for brevity, as it merely adds bookkeeping to the semantics.

Table 3.1: Leakage models.

<span id="page-104-0"></span>

| $\text{LEAKS}(\cdot)$ | *dmem* | *ct* | *arch* |
|---|---|---|---|
| any jump $pc := v$ | | $v$ | |
| any load $Mem[v_{addr}] = v$ | $v_{addr}$ | $v_{addr}$ | $v_{addr}$, $v$ |
| any store $Mem[v_{addr}] := v$ | $v_{addr}$ | $v_{addr}$ | $v_{addr}$ |

<span id="page-104-2"></span><span id="page-104-1"></span>SPEC-[PREDICT](#page-105-0)

$$
\frac{\text{ISCONTROLFLOW}(insn) \qquad pc', \mu state' = \text{Oracle}(insn, \Psi.pc, \Psi.Reg, \Psi.\mu state) \qquad \Psi \to \Psi' \qquad correct = (pc' = \Psi'.pc)}{\Psi[insn] \leadsto \Psi'\{pc := pc',\ \mu state := \mu state',\ mispredicted := \Psi.mispredicted \lor \neg correct\}}
$$

SPEC-STEP

$$
\frac{\neg\text{ISCONTROLFLOW}(insn) \qquad insn \neq \mathit{flush} \qquad \Psi \to \Psi'}{\Psi[insn] \leadsto \Psi'}
$$

<span id="page-104-3"></span>**Figure 3.3:** Speculative semantics for ZFI.

## 3.1.4 Speculative semantics

To reason about speculative leaks, we equip ZFI with a speculative semantics that captures the effects of speculatively executed instructions.

We model microarchitectural predictors using a *prediction oracle*, which abstracts away the details of microarchitectural prediction. The oracle is defined in terms of a set of oracle states µ*state* (which contains a designated initial state ⊥) and a function

$$
Oracle(insn, pc, Reg, \mu state)
$$

that, given the current instruction, the current *pc*, the register file *Reg*, and the current oracle state, produces the predicted program counter $pc'$ and an updated oracle state $\mu state'$ (which is then used in subsequent predictions).
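The oracle is deliberately abstract. As one concrete instance, the Python sketch below models a branch target buffer for indirect jumps, using the signature Oracle(insn, pc, Reg, µstate). The following are our assumptions and not part of the formalism:

- the tuple encoding;
- the name `btb_oracle`;
- the policy of predicting the last target recorded at a given *pc*.

```python
from typing import Dict, Tuple

BOTTOM: Dict[int, int] = {}  # the designated initial oracle state ⊥ (empty BTB)


def btb_oracle(insn: tuple, pc: int, reg: Dict[str, int],
               mu_state: Dict[int, int]) -> Tuple[int, Dict[int, int]]:
    """Oracle(insn, pc, Reg, mu_state) -> (pc', mu_state') for indirect jumps.

    Hypothetical BTB policy: predict the last target recorded for this pc;
    an untrained entry predicts the actual target (no misprediction).
    """
    if insn[0] != "jmp_ind":  # hypothetical encoding of jmp r
        raise NotImplementedError(insn[0])
    actual = reg[insn[1]]  # the oracle may inspect the register file
    predicted = mu_state.get(pc, actual)
    # Train on the architecturally correct target; mu_state is not mutated.
    return predicted, {**mu_state, pc: actual}
```

Feeding the returned state into the next call shows how a stale entry produces a misprediction once the real target changes.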

The speculative semantics is formalized by the relation $\rightsquigarrow$ given in [Figure 3.3.](#page-104-1) In the rules defining $\rightsquigarrow$, configurations Ψ are extended to store the *µstate* of the prediction oracle (which is updated throughout the computation) as well as a flag *mispredicted* that is set as soon as an oracle prediction is incorrect.

<span id="page-105-0"></span>Our speculative semantics consists of three rules: The SPEC-[PREDICT](#page-104-2) rule describes the execution of control-flow statements, where the prediction oracle is invoked to obtain the new program counter and predictor state; the correctness of the prediction is recorded in the flag *mispredicted*. The SPEC-[FLUSH](#page-110-0) rule (described in [Section 3.2.4](#page-109-0)) models the execution of *flush* instructions, which reset the predictor state to $\perp$. Finally, the [SPEC-STEP](#page-104-3) rule handles the remaining statements by updating the configuration according to the architectural semantics $\rightarrow$.

Unlike prior semantics [\[36,](#page-179-4) [64\]](#page-181-0), our language does not have any form of speculative rollback. Instead, we track the speculative state through the *mispredicted* flag, which persists in the configuration for the duration of the program.
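The three rules can be prototyped as a single Python step function. This is a sketch under several assumptions. The tuple encoding and `SpecConfig` are hypothetical. So is `toy_arch_step`, which handles only `jmp r` and constant assignment. The empty dictionary stands in for ⊥.

The example shows that once a prediction is wrong, *mispredicted* stays set on later steps, because nothing rolls it back.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, Tuple

CONTROL_FLOW = {"jmp", "jmp_ind", "call", "call_ind", "ret"}


@dataclass(frozen=True)
class SpecConfig:
    prog: Dict[int, tuple]
    pc: int
    reg: Dict[str, int]
    mu_state: Dict[int, int] = field(default_factory=dict)  # {} stands for ⊥
    mispredicted: bool = False


Oracle = Callable[[tuple, int, Dict[str, int], Dict[int, int]],
                  Tuple[int, Dict[int, int]]]


def spec_step(cfg: SpecConfig, arch_step: Callable[[SpecConfig], SpecConfig],
              oracle: Oracle) -> SpecConfig:
    insn = cfg.prog[cfg.pc]
    if insn[0] in CONTROL_FLOW:  # SPEC-PREDICT
        pc2, mu2 = oracle(insn, cfg.pc, cfg.reg, cfg.mu_state)
        nxt = arch_step(cfg)  # Psi -> Psi'
        correct = pc2 == nxt.pc
        return replace(nxt, pc=pc2, mu_state=mu2,
                       mispredicted=cfg.mispredicted or not correct)
    if insn[0] == "flush":  # SPEC-FLUSH
        return replace(cfg, pc=cfg.pc + 1, mu_state={})
    return arch_step(cfg)  # SPEC-STEP: flag carried along, no rollback


def toy_arch_step(cfg: SpecConfig) -> SpecConfig:
    """Toy architectural semantics for jmp r and r := v only (hypothetical)."""
    insn = cfg.prog[cfg.pc]
    if insn[0] == "jmp_ind":
        return replace(cfg, pc=cfg.reg[insn[1]])
    if insn[0] == "assign":
        return replace(cfg, pc=cfg.pc + 1, reg={**cfg.reg, insn[1]: insn[2]})
    raise NotImplementedError(insn[0])
```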

## 3.2 Formalizing SFI security

Building atop the semantics for ZFI, we define what it means for a program to be *speculatively secure*. We examine the security properties that Swivel claims to provide, formalize them in terms of our security definition, and investigate whether Swivel can soundly uphold them.

### 3.2.1 Non-interference

We define the security of ZFI programs as a form of *non-interference property*. A program is *non-interferent* if an attacker cannot distinguish between two executions that differ only in their secret values. Formally, we define an *equivalence relation* on configurations: Two configurations are equivalent if and only if they differ only in their secret values. Then, if two equivalent configurations produce identical leakage observations, they are indistinguishable to an attacker.

Definition 3.2.1 (Speculative leakage security). *A program P is* speculatively secure *(up to n steps) with respect to an equivalence*  $\approx$  *and a given leakage model m if:* 

$$
\Psi_1 \approx \Psi_2 \ \text{ and } \ \Psi_1 \leadsto^n \Psi'_1 \ \text{ and } \ \Psi_2 \leadsto^n \Psi'_2 \implies \Psi'_1.Obs = \Psi'_2.Obs.
$$
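Definition 3.2.1 quantifies over all equivalent pairs, so testing cannot establish it, although a single pair can refute it. The Python sketch below runs two configurations for n steps and accumulates observations as in the TRACE rule. It then compares the two observation sequences. The configuration format and the function names are hypothetical.

```python
from typing import Callable, List, Tuple, TypeVar

C = TypeVar("C")


def observations(step: Callable[[C], C], leaks: Callable[[C], Tuple],
                 cfg: C, n: int) -> List:
    """Run n steps, appending LEAKS(Psi) before each step as in TRACE."""
    obs: List = []
    for _ in range(n):
        obs.extend(leaks(cfg))
        cfg = step(cfg)
    return obs


def secure_on_pair(step: Callable[[C], C], leaks: Callable[[C], Tuple],
                   equiv: Callable[[C, C], bool], c1: C, c2: C, n: int) -> bool:
    """Check Definition 3.2.1 for one pair; False is a counterexample."""
    if not equiv(c1, c2):
        return True  # premise does not hold, so the implication is vacuous
    return observations(step, leaks, c1, n) == observations(step, leaks, c2, n)
```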

When dealing with speculative execution, we can define what is secret (and thus our equivalence relation) in two different ways. If we already know which values in memory should not be leaked to an attacker, we can define an explicit *secrecy policy* that states which addresses are public and which are secret. Alternatively, we can define secrets to be any values that are not *already* observable by an attacker during architectural execution; that is, we require that speculative execution leak *no additional* information to an attacker.

**Definition 3.2.2** (Policy equivalence,  $\approx_{\pi}$ ).  $\Psi_1$  *and*  $\Psi_2$  *are* equivalent *with respect to a secrecy policy* π *iff:*

$$
\forall v_{addr} \in \pi : \Psi_1.Mem[v_{addr}] = \Psi_2.Mem[v_{addr}]
$$

*and all other structures in*  $\Psi_1$  *and*  $\Psi_2$  *are syntactically equal. We write this as*  $\Psi_1 \approx_{\pi} \Psi_2$ *.* 
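As a sketch, policy equivalence can be checked directly when configurations are modeled as dictionaries with a `"mem"` entry. This layout is a hypothetical encoding, and π is modeled as a set of public addresses. Memory at addresses outside π may differ, while every other component must match exactly.

```python
from typing import Any, Dict, Set


def policy_equiv(c1: Dict[str, Any], c2: Dict[str, Any], pi: Set[int]) -> bool:
    """c1 ~_pi c2: equal memory at public addresses, everything else equal."""
    mem1, mem2 = c1["mem"], c2["mem"]
    public_equal = all(mem1.get(a) == mem2.get(a) for a in pi)
    rest1 = {k: v for k, v in c1.items() if k != "mem"}
    rest2 = {k: v for k, v in c2.items() if k != "mem"}
    return public_equal and rest1 == rest2
```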

**Definition 3.2.3** (Architectural equivalence,  $\approx_m$ ).  $\Psi_1$  *and*  $\Psi_2$  *are* architecturally equivalent *(up to n steps) with respect to a leakage model m if:*

$$
\Psi_1 \rightarrow^n \Psi_1^* \ \text{ and } \ \Psi_2 \rightarrow^n \Psi_2^* \implies \Psi_1^*.Obs = \Psi_2^*.Obs
$$

*and all structures other than Mem in*  $\Psi_1$  *and*  $\Psi_2$  *are syntactically equal. We write this as*  $\Psi_1 \approx_m^n \Psi_2$ .
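The sketch below uses the same hypothetical dictionary encoding to check architectural equivalence for a fixed n. It first requires everything except memory to match. It then compares the observations of the two architectural runs under the leakage model passed in as `leaks`. In the example, two memories that differ only in a value loaded at address 0 are equivalent when only addresses leak, but not when loaded values leak.

```python
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]


def arch_equiv(arch_step: Callable[[Config], Config],
               leaks: Callable[[Config], Tuple],
               c1: Config, c2: Config, n: int) -> bool:
    """c1 ~_m^n c2 for the leakage model encoded by `leaks`."""
    rest1 = {k: v for k, v in c1.items() if k != "mem"}
    rest2 = {k: v for k, v in c2.items() if k != "mem"}
    if rest1 != rest2:
        return False  # only memory may differ

    def obs(c: Config) -> List:
        out: List = []
        for _ in range(n):
            out.extend(leaks(c))
            c = arch_step(c)
        return out

    return obs(c1) == obs(c2)
```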

### 3.2.2 SFI security properties

Swivel enforces two distinct notions of security. First, the host application does not trust the individual sandboxes: Swivel must prevent *breakout attacks*, where a sandbox accesses data outside of its defined memory regions. Second, Swivel's sandboxes are *mutually distrusting*: Swivel must prevent *poisoning attacks*, where an attacker is able to leak secrets from a victim sandbox. We can formalize both of these properties in terms of speculative leakage security.

The first property we formalize captures *sandbox breakout* attacks. A sandbox breakout occurs when a malicious sandbox is able to directly access the contents of memory outside of its own memory segments, e.g., from the host application or from another sandbox. As an example, the following program has a possible breakout attack:



Even though *architecturally* the final load is safe, as the two conditions are mutually exclusive, *speculatively* we might (mis)predict and enter both conditional blocks anyway. An attacker can exploit this if it can control  $r_A$ , as under these conditions the value in  $r_A$  is incorrectly used as the heap base address.

Formally, to prevent breakout attacks, we want non-interference under the *arch* leakage model. We use equivalence with respect to a policy  $\pi$  that only defines the sandbox's own memory segments to be public. By using the *arch* model, we consider even *accessing* a secret value to be a successful attack; since our policy  $\pi$  only considers the sandbox memory itself to be public, our property fully captures sandbox breakout attacks.

The second property we formalize captures what we term *poisoning attacks*. Even if a sandbox protects its own secrets from leaking architecturally, it may be speculatively *poisoned* and still leak these secrets on mispredicted execution paths. We present the following simple example, where *X* and *Y* are arrays of length 64 in the sandbox's heap and $r_A$ is an index into *X*.



Under architectural execution, any value within *X* may be leaked due to the final memory access, but values outside of *X* are not leaked due to the initial conditional check. However, during speculative execution, we may incorrectly predict that the branch should fall through even when $r_A$ is out of bounds for *X*. If an attacker is able to control the value of $r_A$, it can then leak any value in the victim sandbox's heap.

Since we do not know which memory locations a sandbox developer considers secret within their sandbox, we assume that sandboxed programs are architecturally constant-time, and impose non-interference using architectural equivalence under the *ct* leakage model. This way we can be certain that the sandbox, at the very least, leaks no more information than its architectural execution would.

Swivel offers two different implementations to mitigate these attacks: The first approach, Swivel-SFI, is intended for current x86 processors and relies heavily on rewriting control flow constructs. The second approach, Swivel-CET, relies on the Control-flow Enforcement Technology (CET) extensions developed by Intel in their latest hardware [\[136\]](#page-186-0).

We cover some useful properties common to both implementations, then examine whether each implementation in turn can soundly prevent breakout and poisoning attacks.

### 3.2.3 Establishing security

Since Swivel only operates on valid WebAssembly programs, we can make certain assumptions about the structure of our initial programs. For example, the stack region (represented in ZFI as $Mem[r_{Stk} + e_{off}]$) is only used for local variables and register spills; all stack loads and stores use constant (immediate) offsets from the stack pointer (i.e., $e_{off}$ for $r_{Stk}$ is always an immediate value $v$).

Furthermore, Swivel modifies the WebAssembly compiler to ensure the security of memory segment registers: The heap pointer ($r_{Heap}$ in ZFI) is never spilled to the stack, and the stack pointer ($r_{Stk}$) is only modified when establishing function stack frames.

*Linear blocks.* A fundamental building block of Swivel's mitigations is the *linear block*: a sequence of instructions that contains no control-flow instruction except its last, which may be *any* control-flow instruction. In our execution model, even during speculative execution, we can assume that all instructions within a linear block are executed sequentially in order. Thus linear blocks allow us to establish local invariants: e.g., if a heap offset is truncated to the size of the heap (say, via an arithmetic masking operation) at the beginning of a linear block, we can assume it will still be safe to use for the rest of the block.
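As an illustration of such a local invariant, consider a heap whose size is a power of two. Truncating an offset with a bitwise AND then keeps it inside the heap, whatever its original value. The 4 GiB size and the helper names below are hypothetical choices for this Python sketch. Swivel's actual masking instruction sequence is not shown here.

```python
HEAP_SIZE = 1 << 32  # hypothetical power-of-two heap size (4 GiB)


def mask_offset(off: int) -> int:
    """Truncate a heap offset into [0, HEAP_SIZE), like a 32-bit zero-extension."""
    return off & (HEAP_SIZE - 1)


def heap_address(heap_base: int, off: int) -> int:
    """Address used by a masked heap access *(r_Heap + e)."""
    return heap_base + mask_offset(off)
```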

*Chaining linear blocks*. If we can show that a program, upon leaving any linear block, will always land on the start of a new linear block, then we can inductively extend certain local block invariants to cover the whole program—in particular, invariants about memory safety. For example, if we show that within any linear block, all heap offsets are masked before they are used, then we can inductively show that all heap offsets in the program are safe.
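A Python sketch of such a check follows, over a hypothetical instruction encoding in which `mask` truncates a register to the heap size. The per-block check assumes nothing about registers at block entry, since a block may be reached from any predecessor, including a mispredicted one. The program-level check is sound only under the chaining assumption from the text: control flow, speculative or not, always enters a block at its start.

```python
from typing import Iterable, List


def block_masks_heap_offsets(block: List[tuple]) -> bool:
    """Check one linear block: every heap access uses an offset register that
    was masked earlier in the same block and not overwritten since."""
    masked = set()  # nothing is assumed about registers at block entry
    for insn in block:
        op = insn[0]
        if op == "mask":  # r := r & (HEAP_SIZE - 1)
            masked.add(insn[1])
        elif op == "load_heap":  # dst := *(r_Heap + off)
            _, dst, off = insn
            if off not in masked:
                return False
            masked.discard(dst)  # dst now holds an arbitrary loaded value
        elif op == "store_heap":  # *(r_Heap + off) := src
            if insn[1] not in masked:
                return False
        elif op == "assign":  # r := e, with e not known to be masked
            masked.discard(insn[1])
    return True


def program_masks_heap_offsets(blocks: Iterable[List[tuple]]) -> bool:
    """Lift the local check to a whole program, assuming control flow
    (including mispredicted control flow) only enters blocks at their start."""
    return all(block_masks_heap_offsets(b) for b in blocks)
```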

### 3.2.4 Swivel-SFI

Swivel-SFI provides security, somewhat counterintuitively, by replacing all non-trivial control flow with indirect jumps. Conditional jumps are emulated by selecting the target block's address based on the relevant condition; calls and returns are replaced with instructions that save return addresses to a *separate stack*, distinct from the existing stack memory region and with its own dedicated stack pointer register.
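Both transformations can be sketched in Python under stated assumptions. `select_target` chooses between two target addresses using bit masks instead of a branch; in machine code this would typically be a conditional move feeding an indirect jump. `SeparateReturnStack` keeps return addresses apart from ordinary stack data. Both names are hypothetical, and this is a model of the idea rather than Swivel's code.

```python
from typing import List


def select_target(cond: bool, taken: int, fallthrough: int) -> int:
    """Pick a jump target with bit masks instead of a conditional branch."""
    m = -int(bool(cond))  # all ones (-1) if cond holds, else 0
    return (m & taken) | (~m & fallthrough)


class SeparateReturnStack:
    """Hypothetical model of a return-address stack kept apart from data."""

    def __init__(self) -> None:
        self._slots: List[int] = []

    def call(self, return_addr: int, target: int) -> int:
        self._slots.append(return_addr)  # save where to come back to
        return target  # reached via an indirect jump

    def ret(self) -> int:
        return self._slots.pop()  # indirect jump to the saved address
```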

By converting all control flow to indirect jumps, Swivel-SFI can protect speculative control flow by flushing the indirect jump predictor (or *BTB* for *Branch Target Buffer* [\[35\]](#page-178-0)) state at the start of the program:

SPEC-[FLUSH](#page-105-0)

$$
\Psi[\mathit{flush}] \leadsto \Psi^{++}\{\mu state := \bot\}
$$

Since the only relevant predictor in Swivel-SFI is the BTB, we treat *flush* as clearing the entire µ*state* to the empty state ⊥. Flushing µ*state* will not prevent misprediction—it does, however, limit an attacker attempting to mistrain victim predictors. Since BTB predictions have no state to rely on beyond the program itself, any given jump instruction can only be trained to architecturally valid targets for that instruction.
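The Python sketch below illustrates the property the breakout argument relies on, using a hypothetical BTB policy: predict the last trained target, and treat untrained entries as predicting correctly. After a flush, every prediction at a given jump is a target that was architecturally correct for that jump at least once, even though some predictions are still wrong.

```python
from typing import Dict, List, Tuple


def btb_predict(mu: Dict[int, int], pc: int, actual: int) -> Tuple[int, Dict[int, int]]:
    """Predict the last target trained at pc (or the actual target if untrained),
    then train the entry on the actual target."""
    return mu.get(pc, actual), {**mu, pc: actual}


def run_after_flush(jumps: List[Tuple[int, int]]) -> List[int]:
    """Predictions for a sequence of (pc, actual target) indirect jumps."""
    mu: Dict[int, int] = {}  # flush at program start: mu_state := ⊥
    preds = []
    for pc, actual in jumps:
        pred, mu = btb_predict(mu, pc, actual)
        preds.append(pred)
    return preds
```

In the test below, the second and fourth jumps at pc 10 are mispredicted. Both predictions are still targets that pc 10 legitimately jumped to earlier.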

*Breakout security.* Since Swivel masks all memory operations within a linear block, we need only show that Swivel-SFI executes programs as a sequence of linear blocks. Because we flush the predictor state at the start of the program, we assume that the BTB can only be trained on valid jump targets; thus Oracle$(\cdot)$ will only provide values for *pc* that were *correct* at least once. Since all valid jump targets are linear blocks, we can be sure that predicted jump targets always land on linear blocks.

*Poisoning security.* Unfortunately, even if we assume that the BTB always predicts valid targets, we cannot prove security against poisoning attacks. As a trivial example, consider the program demonstrating a poisoning attack in [Section 3.2.2.](#page-107-0) Even after it is converted to use an indirect jump in place of the conditional branch, it may still mispredict and execute the vulnerable loads, since flushing the BTB does not prevent mispredictions from happening. However, by flushing the BTB, Swivel-SFI claims to prevent an attacker from *actively* mistraining a predictor, i.e., an attacker cannot force the victim sandbox to mispredict [\[111\]](#page-184-0). Our framework does not (yet) distinguish active attackers in its security model, and so cannot verify this claim.

### 3.2.5 Swivel-CET

The Swivel-CET implementation makes use of two features from Intel's CET hardware extensions: The *endbranch* instruction and the *shadow stack*. We formalize the CET hardware extensions as an augmented step relation  $\leadsto_{cet}$  built on top of our prior speculative relation  $\leadsto$ :

SPEC-CET-STEP

$$
\frac{\neg\text{ISCONTROLFLOW}(insn) \qquad insn \notin \{\mathit{call}\ \cdot, \mathit{ret}\} \qquad \Psi \leadsto \Psi'}{\Psi[insn] \leadsto_{cet} \Psi'}
$$

*Breakout security.* The *endbranch* instruction provides *forward-edge control-flow integrity* (CFI): Every control flow instruction (except *ret*) must land on an *endbranch* instruction, even when executing speculatively. For most control flow instructions, the augmented semantics simply ensures that the following instruction is indeed *endbranch*.

SPEC-CET-ENDBRANCH

$$
\frac{\text{ISCONTROLFLOW}(insn) \qquad insn \notin \{\mathit{call}\ \cdot, \mathit{ret}\} \qquad \Psi \leadsto \Psi' \qquad \Psi'[\mathit{endbranch}]}{\Psi[insn] \leadsto_{cet} \Psi'}
$$

For call and return instructions, CET provides a *shadow stack*: All calls and returns, in addition to pushing and popping return addresses off the regular stack, also push and pop return addresses on a separate protected memory region. On a *ret*, the processor will only jump to a predicted return location if it agrees with the address popped from the shadow stack [\[136\]](#page-186-0).

SPEC-CET-CALL

$$
\frac{\Psi \leadsto \Psi' \qquad \Psi'[\mathit{endbranch}] \qquad v_{SStk} = \llbracket r_{SStk} - 1 \rrbracket_{\Psi}}{\Psi[\mathit{call}\ \cdot] \leadsto_{cet} \Psi'\{Mem[v_{SStk}] := pc + 1,\ Reg[r_{SStk}] := v_{SStk}\}}
$$

SPEC-CET-RETURN

$$
\frac{\Psi \leadsto \Psi' \qquad v_{SStk} = \llbracket r_{SStk} \rrbracket_{\Psi} \qquad \Psi'.pc = \Psi.Mem[v_{SStk}]}{\Psi[\mathit{ret}] \leadsto_{cet} \Psi'\{Reg[r_{SStk}] := v_{SStk} + 1\}}
$$

The register  $r_{SStk}$  is the protected pointer to the latest shadow stack entry.
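The Python sketch below models only the shadow-stack bookkeeping and the *endbranch* premise of the CET rules. Several simplifications are assumptions of the sketch:

- It takes the (possibly mispredicted) target chosen by $\leadsto$ as an argument.
- It returns `None` where no rule applies.
- It omits the regular stack.
- It reads *pc* + 1 in SPEC-CET-CALL as the caller's program counter plus one.

All names are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class CetState:
    pc: int
    r_sstk: int  # protected pointer to the latest shadow stack entry
    mem: Dict[int, int] = field(default_factory=dict)


def lands_on_endbranch(prog: Dict[int, tuple], target: int) -> bool:
    return prog.get(target) == ("endbranch",)


def cet_jump(prog: Dict[int, tuple], s: CetState, target: int) -> Optional[CetState]:
    """SPEC-CET-ENDBRANCH: other control flow must land on endbranch."""
    if not lands_on_endbranch(prog, target):
        return None  # no rule applies: execution cannot proceed
    return replace(s, pc=target)


def cet_call(prog: Dict[int, tuple], s: CetState, target: int) -> Optional[CetState]:
    """SPEC-CET-CALL: check endbranch, then push pc + 1 onto the shadow stack."""
    if not lands_on_endbranch(prog, target):
        return None
    v_sstk = s.r_sstk - 1
    return replace(s, pc=target, r_sstk=v_sstk, mem={**s.mem, v_sstk: s.pc + 1})


def cet_ret(s: CetState, predicted: int) -> Optional[CetState]:
    """SPEC-CET-RETURN: follow the prediction only if the shadow stack agrees."""
    v_sstk = s.r_sstk
    if s.mem.get(v_sstk) != predicted:
        return None
    return replace(s, pc=predicted, r_sstk=v_sstk + 1)
```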

As with Swivel-SFI, Swivel-CET masks all memory operations within a linear block. By placing *endbranch*es only at the tops of linear blocks and by relying on the CET shadow stack, Swivel-CET ensures that programs always execute as a chain of linear blocks.

*Poisoning security.* To mitigate poisoning attacks, Swivel-CET constructs a *register interlock* at every linear block transition. The register interlock detects whether speculative control flow has been mispredicted, and if so, clears all the memory base registers (i.e., $r_{Heap}$ and $r_{Stk}$). As a result, all memory operations following a misprediction are directed to invalid addresses near address 0. Memory accesses to this faulting page will not leak the address, as they will not create a cache entry; we treat this behavior as a special exception to our established leakage models.

The interlock is implemented as follows: We first give each linear block in the program a unique label. At the end of each block we dynamically calculate the label of the target block without branching. For example, at a conditional branch, we use the same condition expression to select between the two target block labels. When we arrive at the new block, we compare the calculated label to the label of the executing block. If the labels do not match, we set $r_{Heap}$ and $r_{Stk}$ to ⊥.
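The two halves of the interlock can be sketched in Python as follows. On exiting a block, the correct successor's label is computed from the branch condition without branching. On entering a block, the memory base registers are kept only if that label matches the block's own. We model ⊥ as 0, following the description of faulting accesses near address 0. The function names and the register dictionary are hypothetical.

```python
from typing import Dict


def select(cond: bool, a: int, b: int) -> int:
    """Branch-free choice between two labels."""
    m = -int(bool(cond))  # -1 (all ones) if cond, else 0
    return (m & a) | (~m & b)


def exit_label(cond: bool, taken_label: int, fallthrough_label: int) -> int:
    """At block end: compute the correct successor's label from the same condition."""
    return select(cond, taken_label, fallthrough_label)


def enter_block(my_label: int, computed_label: int,
                regs: Dict[str, int]) -> Dict[str, int]:
    """At block start: keep base registers only if the labels agree."""
    keep = -int(my_label == computed_label)  # all ones on match, else 0
    return {**regs, "rHeap": regs["rHeap"] & keep, "rStk": regs["rStk"] & keep}
```

Other registers pass through unchanged. This matches the limitation discussed next: a secret already held in a general-purpose register survives the interlock.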

With register interlocks in place, we have the following lemma: If Ψ.*mispredicted* is true, then all following memory operations will fail without leaking (per our earlier assumption about near-zero addresses).

However, while this prevents leaking via memory operations, it does not stop leakage via control flow. For example, if a sandbox secret is already in a register before we mispredict, then a later linear block may branch on this register, leaking the secret value. Thus we can only prove poisoning security for Swivel-CET with respect to the weaker *dmem* leakage model instead of the stronger *ct* leakage model.

## 3.3 Conclusion

We present the first formal framework for SFI security in the face of Spectre attacks. Our language, ZFI, is expressive enough to verify the security claims of the Swivel SFI system; by formalizing Swivel's security properties, we reveal which of its security claims it soundly upholds, as well as the explicit assumptions about hardware execution that Swivel relies on.

# Acknowledgements

This work was supported in part by gifts from Cisco and Intel; by the NSF under Grant Numbers CNS-1514435, CCF-1918573, and CAREER CNS-2048262; by the Community of Madrid under the project S2018/TCS-4339 BLOQUES; by the Spanish Ministry of Science, Innovation, and University under the project RTI2018-102043-B-I00 SCUM and the Juan de la Cierva-Formación grant FJC2018-036513-I; by the German Federal Ministry of Education and Research (BMBF) through funding for the CISPA-Stanford Center for Cybersecurity; and by the CONIX Research Center, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

[Chapter 3,](#page-99-0) in part, is currently being prepared for submission for publication of the material. Cauligi, Sunjay; Guarnieri, Marco; Mehta, Aastha; Moghimi, Daniel; Narayan, Shravan; Stefan, Deian; Vahldiek-Oberwagner, Anjo; Vassena, Marco. The dissertation author was the primary investigator and author of this paper.

# Chapter 4

# Practical Foundations for Spectre Defenses

*Or, a view from the sky.*

As we have seen throughout this dissertation, *program semantics* and *formal security policies* can help us achieve *provable security guarantees*. These policies help us carefully and explicitly spell out our assumptions about the attacker's strength and ensure that our tools are sound with respect to this class of attackers: e.g., that Spectre vulnerability-detection tools find the vulnerabilities they claim to find, and that Spectre mitigation tools mitigate the vulnerabilities they claim to mitigate.

Alas, not all foundations are equally practical. The systems presented here, as well as other similar frameworks in the field, explore different design choices, many of which have important ramifications for defense tools and the software they produce or analyze. For instance, one key choice is the *leakage model* of the semantics, which determines what the attacker is allowed to observe. Another choice is the specific *execution model*, which captures both the attacker's strength and the Spectre variants that the resulting analysis (or mitigation) tool can reason about. These choices in turn determine which *security policies* can be verified or enforced by these tools.

While formal design decisions fundamentally impact the soundness and precision of Spectre analysis and mitigation tools, they have not been systematically explored by the security community. For example, while there are many choices for a leakage model, the constant-time [\[15\]](#page-177-0) and sandbox isolation [\[65\]](#page-181-0) models are the most pragmatic; leakage models that only consider the data cache trade off security for no clear benefit (e.g., in scalability or precision). As another example, the most practical execution models borrow (again) from work on constant-time: They are detailed enough to capture practical attacks, but abstract across different hardware, and are thus useful for both verification and mitigation of software. Other models, which capture microarchitectural details like cache structures, make the analysis unnecessarily complicated: They do not fundamentally capture additional attacks, and they give up on portability.

In this chapter, we systematize the community's knowledge on Spectre foundations and identify the different design choices made by existing work and their tradeoffs. This complements existing, excellent surveys [\[34,](#page-178-1) [35,](#page-178-0) [162\]](#page-188-0) on the low-level details of Spectre attacks and defenses which do not consider foundations or, for example, high-level security policies. Throughout, we discuss the limitations of existing formal frameworks, the defense tools built on top of these foundations, and future directions for research.

*Contributions.* We systematize knowledge of software Spectre defenses and their associated formalizations, by studying the choices available to developers of Spectre analysis and mitigation tools. Specifically, we:

- Study existing foundations for Spectre analysis in the form of semantics, discuss the different design choices which can be made in a semantics, and describe the tradeoffs of each choice.
- Compare many proposed Spectre defenses (both with and without formal foundations) using a unifying framework, which allows us to understand differences in the security guarantees they offer.
- Identify open research problems, both for foundations and for Spectre software defenses in general.
- Provide recommendations both for developers and for the research community that could result in tools with stronger security guarantees.

*Scope of systematization.* In our systematization, we focus on software-only defenses against Spectre attacks. We focus on *Spectre* because most other transient attacks (e.g., Meltdown [\[93\]](#page-183-0), LVI [\[150\]](#page-187-0), MDS [\[71\]](#page-181-1), or Foreshadow [\[149\]](#page-187-1)) can be addressed efficiently in hardware, through microcode updates or new hardware designs. (This is also why existing software-based tools against transient execution attacks focus solely on Spectre, as we discuss in Section [4.3.4](#page-146-0).) We focus on *defenses* because prior work, notably Canella et al. [\[35\]](#page-178-0), already gives an excellent overview of the types of Spectre vulnerabilities and the powerful capabilities they give attackers. And we focus on *software-only* defenses because, although proposals for hardware defenses are extremely valuable, hardware design cycles (and hardware upgrade cycles) are very long. Moreover, software foundations are useful for understanding hardware and hardware-software co-designs (e.g., they directly affect execution and leakage models). Having secure software foundations allows us to defend against today's attacks on today's hardware, and tomorrow's as well.

# 4.1 Preliminaries

In this section, we first discuss Spectre attacks and how they violate security in two particular application domains: high-assurance cryptography and isolation of untrusted code. Then, we provide an introduction to formal semantics for security and its relevance to secure speculation in these application domains.

## <span id="page-116-0"></span>4.1.1 Breaking cryptography with Spectre

```
if (i < arrALen) { // mispredicted
  int x = arrA[i]; // x is oob value
  int y = arrB[x]; // leaked via address!
  // ...
}
```
Figure 4.1: Code snippet which an attacker can exploit using Spectre. If an attacker can control i and cause the processor to transiently enter the branch, the attacker can load an arbitrary value from memory into x, which is then leaked via the following memory access.

High-assurance cryptography has long relied on *constant-time programming* [\[15\]](#page-177-0) in order to create software which is secure from timing side-channel attacks. Constant-time programming ensures that program execution does not depend on secrets. It does this via three rules of thumb [\[15,](#page-177-0) [17\]](#page-177-1): control flow (e.g., conditional branches) should not depend on secrets, memory access patterns (e.g., offsets into arrays) should not be influenced by secrets, and secrets should not be used as operands to variable-latency instructions (e.g., floating-point instructions or integer division on many processors). These rules ensure that secrets remain safe from an attacker powerful enough to perform cache attacks, exfiltrate data via branch predictor state, or snoop data via port contention [\[29\]](#page-178-2).

In the face of Spectre, constant-time programming is not sufficient. The snippet in [Figure 4.1](#page-117-0) is indeed constant-time if arrA contains only public data (and i and arrALen are also public). Yet, a Spectre attack can still abuse this code to leak secrets from anywhere in memory.

Cache-based leaks are not the only way for an attacker to learn cryptographic secrets: In the following example, an attacker can again (speculatively) leak out-of-bounds data, but this time the leak is via control flow.

```
if (i < arrALen) {
  int x = arrA[i];
  switch (x) { // leak via branching!
    case 'A': /* ... */
    case 'B': /* ... */
    // ...
  }
}
```
This code uses x as part of a branch condition (in a switch statement). Just as before, the attacker can speculatively read arbitrary memory into x. They can then leak the value of  $x$  in several ways, including: (1) based on the different execution times of the various cases; (2) through the data cache, based on differing (benign) memory accesses performed in the various cases; (3) through the instruction cache or micro-op cache [\[124\]](#page-185-0), based on which instructions were (speculatively) accessed; or (4) through port contention [\[29\]](#page-178-2), branch predictor state [\[76\]](#page-182-0), or other microarchitectural resources that differ among the branches.

#### <span id="page-118-0"></span>4.1.2 Breaking software isolation with Spectre

Spectre attacks also break important guarantees in the domain of *software isolation*. In this domain, a host application executes untrusted code and wants to ensure that the untrusted code cannot access any of the host's data. Common examples of software isolation include JavaScript or WebAssembly runtimes, or even the Linux kernel, through eBPF [\[56\]](#page-180-0). Spectre attacks can break the memory safety and isolation mechanisms commonly used in these settings [\[78,](#page-182-1) [98,](#page-183-1) [111,](#page-184-0) [137\]](#page-186-1).

We demonstrate with a small example:

```
int guest_func() {
  get_host_val(1);
  // ... repeat ...
  char c = get_host_val(99999);
  // ... leak c
}

char get_host_val(int idx) {
  if (idx < 100) { // check if within bounds
    return host_arr[idx];
  } else {
    return 0;
  }
}
```
Here, an attacker-supplied guest function guest\_func calls the host function get\_host\_val to get values from an array. Although get\_host\_val() implements a bounds check, the attacker can still speculatively access out-of-bounds data by mistraining the branch predictor (e.g., with the repeated in-bounds calls), breaking any isolation guarantees. Once the attacker (speculatively) obtains an out-of-bounds value of their choosing, they can leak the value (e.g., via the data cache) and recover it after the speculative rollback. In this setting, we need to ensure that, *even speculatively*, untrusted code cannot break isolation.

## <span id="page-119-0"></span>4.1.3 Security properties and execution semantics

Formally, we will define safety from Spectre attacks as a security property of a *formal (operational) semantics*. The semantics abstractly captures how a processor executes a program as a series of state transitions. The states, which we will write as  $\sigma$ , include any information the developer will need to track for their analysis, such as the current instruction or command and the contents of memory and registers. The developer then defines an *execution model*—a set of transition rules that specify how state changes during execution. For example, in a semantics for a low-level assembly, a rule for a store instruction will update the resulting state's memory with a new value.

The rules in the execution model determine how and when speculative effects happen. For example, in a sequential semantics, a conditional branch will evaluate its condition then step to the appropriate branch. A semantics that models branch prediction will instead *predict* the condition result and step to the predicted branch. We adapt notation from Guarnieri et al. [\[65\]](#page-181-0), writing  $\llbracket \cdot \rrbracket^{\text{seq}}$  to represent the execution model for standard sequential execution. We notate other execution models similarly; for example,  $\llbracket \cdot \rrbracket^{\text{pht}}$  models prediction for Spectre-PHT attacks, i.e., conditional branch prediction. Other execution models are listed in [Table 4.2.](#page-130-0)

Next, to precisely specify the attacker model, the developer must define which *leakage observations* (information produced during an execution step) are visible to an attacker. For example, we may decide that rules with memory accesses leak the addresses being accessed. The set of leakage observations in a semantics' rules is its *leakage model*. We again borrow notation from Guarnieri et al. [\[65\]](#page-181-0), which defines the leakage models  $\llbracket \cdot \rrbracket_{\text{ct}}$  and  $\llbracket \cdot \rrbracket_{\text{arch}}$. The  $\llbracket \cdot \rrbracket_{\text{ct}}$  model exposes leakage observations relevant to constant-time security: the sequence of control flow (the *execution trace*) and the sequence of addresses accessed in memory (the *memory trace*).[1](#page-120-0) The  $\llbracket \cdot \rrbracket_{\text{arch}}$  model, on the other hand, exposes all values loaded from memory in addition to the addresses themselves (or equivalently, it exposes the trace of register values) [\[65\]](#page-181-0). Under this model, an attacker is allowed to observe all architectural computation; for a value to remain unobserved, it cannot be accessed at all over the course of execution, adversarial or otherwise. Since the leakage observations in  $\llbracket \cdot \rrbracket_{\text{arch}}$  are a strict superset of those in  $\llbracket \cdot \rrbracket_{\text{ct}}$, we say that  $\llbracket \cdot \rrbracket_{\text{arch}}$  is *stronger* than  $\llbracket \cdot \rrbracket_{\text{ct}}$  (i.e., it models a more powerful attacker). These properties make  $\llbracket \cdot \rrbracket_{\text{arch}}$  most useful for software isolation, as any out-of-bounds access will immediately show up in an  $\llbracket \cdot \rrbracket_{\text{arch}}$  leakage trace.

Surprisingly, the  $\llbracket \cdot \rrbracket_{\text{ct}}$  and  $\llbracket \cdot \rrbracket_{\text{arch}}$  models both generalize well to speculative execution: for example, if we want to construct a semantics for Spectre-PHT attacks, we need only modify a sequential constant-time semantics to account for branch misprediction. Indeed, the execution model and leakage model of a semantics are orthogonal; we call the combination of the two the *contract* provided by the semantics. A sequential constant-time semantics has the contract  $\llbracket \cdot \rrbracket_{\text{ct}}^{\text{seq}}$, while our hypothetical Spectre-PHT semantics would provide the contract  $\llbracket \cdot \rrbracket_{\text{ct}}^{\text{pht}}$. Formally, the contract governs the attacker-visible information produced when executing a program: Given a program *p*, a semantics with contract  $\llbracket \cdot \rrbracket_{\ell}^{\alpha}$, and an initial state  $\sigma$, we write  $\llbracket p \rrbracket_{\ell}^{\alpha}(\sigma)$  for the sequence (or *trace*) of leakage observations the semantics produces when executing *p*.

<span id="page-120-0"></span><sup>1</sup>Like Guarnieri et al. [\[65\]](#page-181-0), we omit variable-latency instructions from our formal model for simplicity.

After determining a proper contract, the developer must finally define the *policy* that their security property enforces: precisely which data can and cannot be leaked to the attacker. Formally, a policy  $\pi$  is defined in terms of an equivalence relation  $\simeq_{\pi}$  over states, where  $\sigma_1 \simeq_{\pi} \sigma_2$  iff  $\sigma_1$  and  $\sigma_2$  agree on all values that are public (but may differ on sensitive values).

Armed with these definitions, we can state security as a *non-interference property*: A program satisfies *non-interference* if, for any two $\pi$-equivalent initial states for a program *p*, an attacker cannot distinguish the two resulting leakage traces when executing *p*. A developer has several choices when crafting a suitable semantics and security policy; these choices greatly influence how easy or difficult it is to detect or mitigate Spectre vulnerabilities. We cover these choices in detail in [Section 4.2:](#page-121-0) [Sections 4.2.1](#page-122-0) and [4.2.2](#page-128-0) discuss choices in leakage models  $\llbracket \cdot \rrbracket_\ell$  and security policies  $\pi$. [Sections 4.2.3](#page-132-0) and [4.2.4](#page-135-0) discuss tradeoffs for different execution models  $\llbracket \cdot \rrbracket^{\alpha}$  and the transition rules in a semantics. In [Section 4.2.5,](#page-137-0) we discuss how the input language of the semantics affects analysis; and finally, in [Section 4.2.6,](#page-140-0) we discuss which microarchitectural features to include in formal models.

## <span id="page-121-0"></span>4.2 Choices in semantics

The foundation of a well-designed Spectre analysis tool is a carefully constructed formal semantics. Developers face a wide variety of choices when designing their semantics—choices which heavily depend on the attacker model (and thus the intended application area) as well as specifics about the tool they want to develop. Cryptographic code requires different security properties, and therefore different semantics and tools, than in-process isolation. Many of these choices also look different for *detection* tools, focused only on finding Spectre vulnerabilities, vs. *mitigation* tools, which transform programs to be secure. In this section, we describe the important choices about semantics that developers face, and explain those choices' consequences for Spectre analysis tools and for their associated security guarantees. We also point out a number of open problems to guide future work in this area.

*What makes a practical semantics?* A practical semantics should make an appropriate tradeoff between *detail* and *abstraction*: It should be detailed enough to capture the microarchitectural behaviors which we're interested in, but it should also be abstract enough that it applies to all (reasonable) hardware. For example, we do not want the security of our code to be dependent on a specific cache replacement policy or branch predictor implementation.

In this respect, formalisms for constant-time have been successful in the non-speculative world: The principles of constant-time programming—no secrets for branches, no secrets for addresses—create secure code without introducing processor-specific abstractions. Speculative semantics should follow this trend, producing portable tools which can defend against powerful attackers on today's (and tomorrow's) microarchitectures.

#### <span id="page-122-0"></span>4.2.1 Leakage models

Any semantics intended to model side-channel attacks needs to precisely define its attacker model. An important part of the attacker model for a semantics is the *leakage model*—that is, what information does the attacker get to observe? Leakage models intended to support sound mitigation schemes should be *strong*—modeling a powerful attacker—and *hardware-agnostic*, so that security guarantees are portable. That said, the best choice for a leakage model depends in large part on the intended application domain.

*Leakage models for cryptography.* As we saw in [Section 4.1.1,](#page-116-0) high-assurance cryptography implementations have long relied on the constant-time programming model; thus, semantics intended for cryptographic programs naturally choose the  $\llbracket \cdot \rrbracket_{\text{ct}}$  leakage model. Like the constant-time programming model in the non-speculative world, the  $\llbracket \cdot \rrbracket_{\text{ct}}$  leakage model is strong and hardware-agnostic, making it a solid foundation for security guarantees. The  $\llbracket \cdot \rrbracket_{\text{ct}}$  leakage model is a popular choice among existing formalizations: As we highlight in [Table 4.1,](#page-123-0) over half of the formal semantics for Spectre use the  $\llbracket \cdot \rrbracket_{\text{ct}}$  leakage model (or an equivalent) [\[16,](#page-177-2) [36,](#page-179-0) [47,](#page-180-1) [63,](#page-181-2) [64,](#page-181-3) [116,](#page-185-1) [151\]](#page-188-1). Guarnieri et al. [\[65\]](#page-181-0) leave the leakage model abstract, allowing the semantics to be used with several different leakage models, including  $\llbracket \cdot \rrbracket_{\text{ct}}$.

<span id="page-123-0"></span>Table 4.1 legend: Win.: Can reason about speculation windows? ([Section 4.2.3](#page-132-0)); Nondet.: How is nondeterminism handled? ([Section 4.2.4](#page-135-0)); OOO: Models out-of-order execution? ([Section 4.2.6](#page-140-0))

*Leakage models for isolation.* Sections [4.1.2](#page-118-0) and [4.1.3](#page-119-0) describe the  $\llbracket \cdot \rrbracket_{\text{arch}}$  leakage model, which is a better fit for modeling speculative isolation, e.g., for a WebAssembly runtime executing untrusted code [\[111\]](#page-184-0) or a kernel defending against memory region probing [\[61\]](#page-181-5). Under  $\llbracket \cdot \rrbracket_{\text{arch}}$, *all* values in the program are observable; this is what lets it easily model properties for software isolation: If we define a policy  $\pi$  where all values and memory regions outside the isolation boundary are secret, then software isolation security (or speculative memory safety) is simply non-interference with respect to  $\llbracket \cdot \rrbracket_{\text{arch}}$  and this  $\pi$.

The  $\llbracket \cdot \rrbracket_{\text{arch}}$  leakage model appears less frequently than  $\llbracket \cdot \rrbracket_{\text{ct}}$  in formal models: Only two of the semantics in [Table 4.1](#page-123-0) ([\[41,](#page-179-1) [65\]](#page-181-0)) use the  $\llbracket \cdot \rrbracket_{\text{arch}}$  leakage model. On the other hand, Spectre sandbox isolation frameworks such as Swivel [\[111\]](#page-184-0), Venkman [\[137\]](#page-186-1), and ELFbac [\[78\]](#page-182-1) implicitly use the  $\llbracket \cdot \rrbracket_{\text{arch}}$  model, as do SpecFuzz [\[113\]](#page-185-2), ASTCVW [\[83\]](#page-182-2), SpecTaint [\[121\]](#page-185-3), and certain modes of oo7 [\[155\]](#page-188-4). The three isolation frameworks all explicitly prevent memory reads or writes to any locations outside the isolation boundary, i.e., they enforce non-interference under  $\llbracket \cdot \rrbracket_{\text{arch}}$. The four detection tools, SpecFuzz, ASTCVW, SpecTaint, and oo7 (in "weak" or "v1.1" mode), more generally look for gadgets that can speculatively access *arbitrary* or attacker-controlled memory locations, i.e., gadgets that break speculative memory safety. Unfortunately, these tools are not formalized, so their leakage models are neither explicit nor clear.

*Weaker leakage models.* The remaining semantics and tools in [Table 4.1](#page-123-0) consider only the memory trace of a program, but not its execution trace. The  $\llbracket \cdot \rrbracket_{\text{mem}}$  leakage model, like  $\llbracket \cdot \rrbracket_{\text{ct}}$, allows an attacker to observe the sequence of memory accesses during the execution of the program. The  $\llbracket \cdot \rrbracket_{\text{cache}}$  leakage model instead tracks (an abstraction of) cache state: the attacker in this model can only observe cached addresses at the granularity of cache lines. A few tools have leakage models even weaker than these. For instance, oo7 only emits leakages that it determines can be influenced by malicious input (see [Section 4.2.3](#page-132-0)), and KLEESpectre (with cache modeling enabled) only allows the attacker to observe the final state of the cache once the program terminates.

All of these models, including  $[\![\cdot]\!]_{\text{mem}}$  and  $[\![\cdot]\!]_{\text{cache}}$, are weaker than  $[\![\cdot]\!]_{\text{ct}}$: they model less powerful attackers who cannot observe control flow. As a result, they miss attacks which leak via the instruction cache or which otherwise exploit timing differences in the execution of the program. They even miss some attacks that exploit the data cache: If a sensitive value influences a branch, an attacker could infer the sensitive value through the data cache based on differing (benign) memory access patterns on the two sides of the branch, even if no sensitive value influences a memory address. For instance, in the following code, even though cond does not directly influence a memory address, an attacker could infer the value of cond based on whether arr[a] is cached or not:

```
if (cond)
  b = arr[a];
else
  b = 0;
```

Because the  $[\![\cdot]\!]_{\text{mem}}$  and  $[\![\cdot]\!]_{\text{cache}}$  leakage models miss these attacks, they cannot provide the strong guarantees necessary for secure cryptography or software isolation. Tools which want to provide sound verification or mitigation should choose a strong leakage model appropriate for their application domain, such as  $[\![\cdot]\!]_{\text{ct}}$  or  $[\![\cdot]\!]_{\text{arch}}$ .

That said, weaker leakage models are still useful in certain settings: Tools which are interested in only a certain vulnerability class can use these weaker models to reduce the number of false positives in their analysis or reduce the complexity of their mitigation. Even though these models may miss some Spectre attacks (even some data cache leakage, as discussed above), some detection tools still use the  $[\![\cdot]\!]_{\text{cache}}$  or  $[\![\cdot]\!]_{\text{mem}}$  models to find Spectre vulnerabilities in real codebases. Using a leakage model which ignores control-flow leakage may help the detection tool scale to larger codebases.

Some tools [\[66,](#page-181-4)[154\]](#page-188-3) also provide the ability to reason about what attacks are possible with particular cache configurations—e.g., with a particular associativity, cache size, or line size. This is a valuable capability for a detection tool: It helps an attacker zero in on vulnerabilities which are more easily exploitable on a particular target machine. However, security guarantees based on this kind of analysis are not portable, as executing a program on a different machine with a different cache model invalidates the security analysis. Tools that instead want to make guarantees for all possible architectures, such as verifiers or compilers, will need more conservative leakage models—models that assume the entire memory trace (and execution trace) is always leaked.

*Open problems: Leakage models for weak-memory-style semantics.* We have described leakage models only in terms of observations of execution traces; this is a natural way to define leakage for *operational semantics*, where execution is modeled simply as a set of program traces. However, the weak-memory-style speculative semantics proposed by Colvin and Winter [\[44\]](#page-179-2) and Disselkoen et al. [\[51\]](#page-180-2) have a more structured view of program execution, for instance, using pomsets [\[60\]](#page-181-6). Both of these semantics define leakages in a way equivalent to the  $[\![\cdot]\!]_{\text{mem}}$  leakage model; it remains an open problem to explore how to define  $[\![\cdot]\!]_{\text{ct}}$  or  $[\![\cdot]\!]_{\text{arch}}$  leakage in this more structured execution model—in particular, what it means for such a semantics to allow an attacker to observe control-flow leakage.

*Open problems: Leakage models for language-based isolation.* As with most work on Spectre foundations, we focus on cryptography and software-based isolation. Spectre, though, can be used to break most other software abstractions as well—from module systems [\[67\]](#page-181-7) and object capabilities [\[96\]](#page-183-2) to language-based isolation techniques like information flow control [\[129\]](#page-186-3). How do we adopt these abstractions in the presence of speculative execution? What formal security property should we prove? And what leakage model should be used?

#### <span id="page-128-0"></span>4.2.2 Non-interference and policies

After the leakage model, we must determine what *secrecy policy* we consider for our attacker model—i.e., which values can and cannot be leaked. Domains such as cryptography and isolation already have defined policies for sequential security properties. For cryptography, memory that contains secret data (e.g., encryption keys) is considered sensitive. Isolation simply declares that all memory outside the program's assigned sandbox region should not be leaked.

The straightforward extension of sequential non-interference to speculative execution is to simply enforce the same leakage model (e.g.,  $[\![\cdot]\!]_{ct}$ ) with the same security policy—no secrets should be leaked whether in normal or speculative execution. We refer to this straightforward extension as a *direct* non-interference property, or direct NI.

Definition 4.2.1 (Direct non-interference). *Program p satisfies* direct non-interference *with respect to a given contract*  $[\![\cdot]\!]$  *and policy*  $\pi$  *if, for all pairs of*  $\pi$ -equivalent initial states  $\sigma$  and  $\sigma'$ , *executing p* with each initial state produces the same trace. That is,  $p \vdash NI(\pi, \llbracket \cdot \rrbracket)$  is defined as

$$
\forall \sigma, \sigma': \sigma \simeq_{\pi} \sigma' \Rightarrow \llbracket p \rrbracket(\sigma) = \llbracket p \rrbracket(\sigma').
$$

We elide writing  $\pi$  for brevity—e.g.,  $NI([\![\cdot]\!]_{ct}^{pht})$  expresses constant-time security under Spectre-PHT semantics.

Alternatively, we may instead want to assert that the speculative trace of a program has no *new* sensitive leaks as compared to its sequential trace. This is a useful property for compilers and mitigation tools that may not know the secrecy policy of an input program, but want to ensure the resulting program does not leak any additional information. We term this a *relative* non-interference property, or relative NI; a program that satisfies relative NI is no less secure than its sequential execution.

Definition 4.2.2 (Relative non-interference). *Program p satisfies* relative non-interference *from contract*  $\llbracket \cdot \rrbracket_a^{\text{seq}}$  *to*  $\llbracket \cdot \rrbracket_b^{\beta}$  *with policy*  $\pi$  *if, for all pairs of*  $\pi$*-equivalent initial states*  $\sigma$  *and*  $\sigma'$*, whenever executing p under*  $\llbracket \cdot \rrbracket_a^{\text{seq}}$  *produces equal traces, executing p under*  $\llbracket \cdot \rrbracket_b^{\beta}$  *also produces equal traces. That is,*  $p \vdash NI(\pi, \llbracket \cdot \rrbracket_a^{\text{seq}} \Rightarrow \llbracket \cdot \rrbracket_b^{\beta})$  *is defined as*

$$
\forall \sigma, \sigma' : \sigma \simeq_{\pi} \sigma' \wedge \llbracket p \rrbracket_{a}^{\text{seq}}(\sigma) = \llbracket p \rrbracket_{a}^{\text{seq}}(\sigma') \Longrightarrow \llbracket p \rrbracket_{b}^{\beta}(\sigma) = \llbracket p \rrbracket_{b}^{\beta}(\sigma').
$$

For non-terminating programs, we can compare finite prefixes of  $\llbracket p \rrbracket^{\beta}$  against their sequential projections to  $\llbracket p \rrbracket^{\text{seq}}$: since speculative execution must preserve sequential semantics, there will always be a valid sequential projection. As before, we may elide  $\pi$  for brevity.

Interestingly, any relative non-interference property  $NI(\pi, \llbracket \cdot \rrbracket_a^{\text{seq}} \Rightarrow \llbracket \cdot \rrbracket_b^{\beta})$  for a program *p* can be expressed equivalently as a direct property  $NI(\pi', \llbracket \cdot \rrbracket_b^{\beta})$, where  $\pi' = \pi \setminus \text{canLeak}(p, \llbracket \cdot \rrbracket_a^{\text{seq}})$. That is, we treat anything that could possibly leak under contract  $\llbracket \cdot \rrbracket_a^{\text{seq}}$  as public. Relative NI is thus a weaker property than direct NI, as it implicitly declassifies anything that might leak during sequential execution.
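One way to see this correspondence, assuming we read the declassified policy $\pi'$ as a relation on states (an illustrative reading, not the set-based canLeak formulation itself), is to fold sequential trace equality into the policy. The intersection of two equivalence relations is again an equivalence relation, so the following $\simeq_{\pi'}$ is a valid policy relation, and unfolding Definition 4.2.1 with it yields exactly the statement of Definition 4.2.2:

$$
\sigma \simeq_{\pi'} \sigma' \;\triangleq\; \sigma \simeq_{\pi} \sigma' \wedge \llbracket p \rrbracket_{a}^{\text{seq}}(\sigma) = \llbracket p \rrbracket_{a}^{\text{seq}}(\sigma')
$$

$$
p \vdash NI(\pi', \llbracket \cdot \rrbracket_{b}^{\beta}) \iff \forall \sigma, \sigma' : \sigma \simeq_{\pi'} \sigma' \Rightarrow \llbracket p \rrbracket_{b}^{\beta}(\sigma) = \llbracket p \rrbracket_{b}^{\beta}(\sigma') \iff p \vdash NI(\pi, \llbracket \cdot \rrbracket_{a}^{\text{seq}} \Rightarrow \llbracket \cdot \rrbracket_{b}^{\beta})
$$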

However, relative NI is a stronger property than a conventional implication. For example, the property  $NI([\![\cdot]\!]_{ct}^{seq}) \Rightarrow NI([\![\cdot]\!]_{ct}^{pht})$  makes no guarantees at all about a program that is not sequentially constant-time. Conversely, the relative NI property  $NI([\![\cdot]\!]_{ct}^{seq} \Rightarrow [\![\cdot]\!]_{ct}^{pht})$  guarantees that even if a program is not sequentially constant-time, the sensitive information an attacker can learn during the program's speculative execution is limited to what it already might leak sequentially.

In [Table 4.2,](#page-130-0) we classify speculative security properties of different works by which direct or relative NI properties they verify or enforce. We find that tools focused on verifying cryptography or memory isolation verify direct NI properties, whereas frameworks concerned with compilation or inserting Spectre mitigations for general programs tend towards relative NI.

*Verifying programs.* Direct NI unconditionally guarantees that sensitive data is not leaked, whether executing sequentially or speculatively. This makes it ideal for domains that already have clear policies about what data is sensitive, such as cryptography (e.g., secret keys) or software isolation (e.g., memory outside the sandbox). Indeed, tools that target cryptographic applications ([\[16,](#page-177-2)[36,](#page-179-0)[47,](#page-180-1)[151\]](#page-188-1)) all verify that programs satisfy the direct *speculative constant-time* (SCT) property.

<span id="page-130-0"></span>Table 4.2: Speculative security properties in prior works and their equivalent non-interference statements (on following page; legend appears here). We write $\approx NI(\cdots)$ for unsound approximations of non-interference properties. <sup>1</sup>Tracks taint of *attacker influence* rather than value sensitivity. <sup>2</sup>These works all derive their property from the definition given in [\[36\]](#page-179-0) and share the same property name despite differences in execution mode. <sup>3</sup>The analysis tool of [\[36\]](#page-179-0), Pitchfork, only verifies the weaker property $NI(\llbracket \cdot \rrbracket^{\text{pht-stl}})$. <sup>4</sup>The definitions of SNI and wSNI are parameterized over the target leakage model. <sup>5</sup>The definition of wSNI in [\[65\]](#page-181-0) does not require that the initial states be $\pi$-equivalent.

Legend. *Precision of the defined security property:* hyper: non-interference hyperproperty, requires two $\pi$-equivalent executions; taint: sound approximation using taint tracking, requires only one execution. *Execution models* ([Section 4.2.3](#page-132-0)): $\llbracket \cdot \rrbracket^{\text{seq}}$: sequential execution; $\llbracket \cdot \rrbracket^{\text{pht}}$: captures Spectre-PHT; $\llbracket \cdot \rrbracket^{\text{pbrs}}$: captures Spectre-PHT/-BTB/-RSB/-STL; $\llbracket \cdot \rrbracket^{\text{pht-stl}}$: captures Spectre-PHT/-STL.

Additionally, we find that current tools that verify relative NI [\[41,](#page-179-1) [64\]](#page-181-3) are indeed capable of verifying direct NI, but intentionally add constraints to their respective checkers to "remove" sequential leaks from their speculative traces. Although this is just as precise, it is an open problem whether tools can verify relative NI for programs without relying on a direct NI analysis.

*Verifying compilers.* On the other hand, compilers and mitigation tools are better suited to verify or enforce relative NI properties: The compiler guarantees that its output program contains *no new* leakages as compared to its input program. This way, developers can reason about their programs assuming a sequential model, and the compiler will mitigate any speculative effects. For instance, if a program *p* is already *sequentially* constant-time $NI([\![\cdot]\!]_{\text{ct}}^{\text{seq}})$, then a compiler that enforces $NI([\![\cdot]\!]_{\text{ct}}^{\text{seq}} \Rightarrow [\![\cdot]\!]_{\text{ct}}^{\text{pht}})$ will compile *p* to a program that is *speculatively* constant-time $NI([\![\cdot]\!]_{\text{ct}}^{\text{pht}})$. Similarly, if a program is properly sandboxed under sequential execution $NI([\![\cdot]\!]_{\text{arch}}^{\text{seq}})$, and is compiled with a compiler that introduces no new *arch* leakage, the resulting program will remain sandboxed even speculatively. Indeed, these propositions are proven by Guarnieri et al. [\[65\]](#page-181-0).

Similarly, Patrignani and Guarnieri [\[116\]](#page-185-1) explore whether compilers *preserve robust* non-interference properties. A security property is *robust* if a program remains secure even when linked against adversarial code (i.e., if the program is called with arbitrary or adversarial inputs)—indeed, most other security properties listed in [Table 4.2](#page-130-0) are implicitly robust. A compiler *preserves* a non-interference property if, after compilation from a source to a target language, the property still holds. In Patrignani and Guarnieri's framework, the source language describes sequential execution while the target language has speculative semantics, making their notion of compiler preservation very similar to enforcing relative NI.
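To make the distinction between direct and relative NI concrete, the following Python sketch checks both properties by brute force over a toy model in which a program is represented only by its leakage function, mapping a (public, secret) input pair to the trace an attacker observes. The function names, the example gadget, and the trace encoding are hypothetical illustrations rather than code from any surveyed tool; real verifiers reason symbolically instead of enumerating inputs.

```python
from itertools import product

# Hypothetical toy model: a program is represented by its leakage function,
# mapping a (public, secret) input pair to the trace an attacker observes.

def direct_ni(leak, publics, secrets):
    """Direct NI: any two runs with the same public input (pi-equivalent
    runs) produce the same leakage trace."""
    return all(
        leak(p, s1) == leak(p, s2)
        for p in publics
        for s1, s2 in product(secrets, repeat=2)
    )

def relative_ni(leak_seq, leak_spec, publics, secrets):
    """Relative NI: runs that are indistinguishable sequentially must also
    be indistinguishable speculatively."""
    return all(
        leak_spec(p, s1) == leak_spec(p, s2)
        for p in publics
        for s1, s2 in product(secrets, repeat=2)
        if leak_seq(p, s1) == leak_seq(p, s2)
    )

# Example: a bounds-checked gadget `if p < 2: y = B[A[p]]`, where A occupies
# addresses 0-1 and address 2 holds the secret. A ct-style observer sees the
# index used by each of the two loads.
def gadget_seq(p, s):
    mem = [10, 11, s]
    return (p, mem[p]) if p < 2 else ()

def gadget_spec(p, s):
    mem = [10, 11, s]
    return (p, mem[p])  # the attacker mispredicts the bounds check

# Example: a program that already leaks its secret sequentially.
def leaky_seq(p, s):
    return (s,)

leaky_spec = leaky_seq  # speculation adds no new leakage here
```

In this toy model, the gadget satisfies NI sequentially but not speculatively, so it also fails relative NI. The second program fails direct NI, yet satisfies relative NI: speculation reveals nothing beyond what the source program already leaked, which is exactly the guarantee a compiler enforcing relative NI provides.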

#### <span id="page-132-0"></span>4.2.3 Execution models

To reason about Spectre attacks, a semantics must be able to reason about the leakage of sensitive data in a speculative *execution model*. A speculative execution model is what differentiates a speculative semantics from standard sequential analysis, and determines what speculation the abstract processor can perform. For developers, choosing a proper execution model is a tradeoff: On the one hand, the choice of behaviors their model allows—i.e., which microarchitectural predictors they include—determines which Spectre variants their tools can capture. On the other hand, considering additional kinds of mispredictions inevitably makes their analysis more complex.

*Spectre variants and predictors.* Most semantics and tools in [Table 4.1](#page-123-0) only consider the conditional branch predictor, and thus only Spectre-PHT attacks. (Mis)predictions from the conditional branch predictor are constrained—there are only two possible choices for every decision—so the analysis remains fairly tractable. Jasmin [\[16\]](#page-177-2), Binsec/Haunted [\[47\]](#page-180-1), and Pitchfork [\[36\]](#page-179-0) all additionally model *store-to-load* (STL) predictions, where a processor forwards data to a memory load from a prior store to the same address. If there are multiple pending stores to that address, the processor may choose the wrong store to forward the data—this is the root of a Spectre-STL attack. STL predictions are less constrained than predictions from the conditional branch predictor: In the absence of additional constraints, they allow for a load to draw data from any prior store to the same address.
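The following Python sketch spells out the nondeterminism that store-to-load prediction introduces. It is a hypothetical toy model, assuming a flat memory in which unwritten addresses read as 0, a list of not-yet-committed stores in program order, and that speculation may also bypass all pending stores and read the stale committed value (the classic store-bypass case). The function name and encoding are ours, not taken from Jasmin, Binsec/Haunted, or Pitchfork.

```python
def stl_candidates(memory, pending_stores, load_addr):
    """Return (sequential_value, possible_values) for a load in a toy STL model.

    memory:         committed memory, a dict from address to value
                    (addresses never written are assumed to read as 0)
    pending_stores: stores not yet committed, as (address, value) pairs
                    in program order
    load_addr:      the address being loaded
    """
    committed = memory.get(load_addr, 0)
    matching = [value for addr, value in pending_stores if addr == load_addr]
    # Sequentially, the load must see the most recent store to its address.
    sequential = matching[-1] if matching else committed
    # Speculatively, it may be forwarded any earlier matching store, or bypass
    # the pending stores entirely and read the stale committed value.
    possible = {committed, *matching}
    return sequential, possible
```

For instance, if a program overwrites a secret key (value `42` at address `0x10`) with a pending store of `0`, then `stl_candidates({0x10: 42}, [(0x10, 0)], 0x10)` returns `(0, {0, 42})`: sequentially the load sees `0`, but speculatively it may still observe the stale key.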

Other control-flow mechanisms are significantly more complex: Return instructions and indirect jumps can be *speculatively hijacked* to send execution to arbitrary (attacker-controlled) points in the program.[2](#page-132-1) An attacker can trivially hijack a victim program if they can control (mis)prediction of the RSB (for returns) [\[87\]](#page-183-3) or BTB (for indirect jumps) [\[86\]](#page-182-3). Even without this ability, an attacker can hijack control-flow if they speculatively overwrite the target address of a return or jump (e.g., by exploiting a prior PHT misprediction) [\[82,](#page-182-4) [99,](#page-183-4) [142\]](#page-187-2). Formally, these

<span id="page-132-1"></span><sup>2</sup> Including, on x86-family processors, into the *middle* of an instruction [\[28\]](#page-178-3).

attacks still fit within our non-interference framework—if a program can be arbitrarily hijacked, then it will be unable to satisfy any non-interference property. However, to formally verify that this is the case, our semantics needs to be able to model these behaviors in some fashion.

Although capturing all speculative behaviors in a semantics is possible, the resulting analysis is neither practical nor useful; in practice, developers need to make tradeoffs. For example, the semantics proposed by Cauligi et al. [\[36\]](#page-179-0) can simulate all of the aforementioned speculative attacks, but their analysis tool Pitchfork only detects PHT- and STL-based vulnerabilities. On the other hand, tools like oo7 (with the "v1.1" pattern) [\[155\]](#page-188-4) and SpecTaint [\[121\]](#page-185-3) conservatively assume that writes to transient addresses can overwrite *anything*, and thus immediately flag this behavior as vulnerable.

The InSpectre semantics [\[63\]](#page-181-2) proceeds in the opposite direction—it allows the processor to (mis)predict arbitrary values, even the values of constants. InSpectre also allows more out-of-order behavior than most other semantics (see [Section 4.2.6](#page-140-0)); in particular, it allows the processor to commit writes to memory out-of-order. As a result, InSpectre is very expressive: It is capable of describing a wide variety of Spectre variants, both known and as yet unrealized. But this expressivity comes at a cost: InSpectre cannot feasibly be used to verify programs; instead, the authors pose InSpectre as a framework for reasoning about and analyzing microarchitectural features themselves.

*Speculation windows.* As shown in [Table 4.1,](#page-123-0) several semantics and tools limit speculative execution by way of a *speculation window*. This models how hardware has finite resources for speculation, and can only speculate through a certain number of instructions or branches at a time.

Explicitly modeling a speculation window serves two purposes for detection tools. First, it reduces false positives: a mispredicted branch will not lead to a speculative leak thousands of instructions later. Second, it bounds the complexity of the semantics and thus the analysis. Since the abstract processor can only speculate up to a certain depth, an analysis tool need only consider the latest window of instructions under speculative execution. Some semantics refine this idea even further: Binsec/Haunted [\[47\]](#page-180-1), for example, uses different speculation windows for load-store forwarding than it uses for branch speculation.

Speculation windows are also valuable for mitigation tools: although tools like Blade [\[151\]](#page-188-1) and Jasmin [\[16\]](#page-177-2) are able to prove security without reasoning about speculation windows, modeling a speculation window would reduce the number of fences (or other mitigations) these tools need to insert, improving the performance of the compiled code.

*Eliminating variants.* Instead of modeling all speculative behaviors, compilers and mitigation tools can use clever tricks to sidestep particularly problematic Spectre variants. For example, even though Jasmin [\[16\]](#page-177-2) does not model the RSB, Jasmin programs do not suffer from Spectre-RSB attacks: The Jasmin compiler inlines all functions, so there are no returns to mispredict. Mitigation tools can also disable certain classes of speculation with hardware flags [\[70\]](#page-181-8). After eliminating complex or otherwise troublesome speculative behavior, a tool only needs to consider those that remain.

*Cross-address-space attacks.* Previous systematizations of Spectre attacks [\[35\]](#page-178-0) differentiate between *same-address-space* and *cross-address-space* attacks. Same-address-space attacks are generally simpler to perform, as they rely on repeatedly executing the victim code itself in order to train a microarchitectural predictor. Cross-address-space attacks are more powerful, as they allow an attacker to perform the training step on a branch within the attacker's own code.

Most of the semantics and tools in [Table 4.1](#page-123-0) make no distinction between same-address-space and cross-address-space attacks, as they ignore the mechanics of training and consider all predictions to be potentially malicious. A notable exception is oo7 [\[155\]](#page-188-4), which explicitly tracks *attacker influence*. Specifically, oo7 only considers mispredictions for conditional branches which can be influenced by attacker input. Thus, oo7 effectively models only same-address-space attacks. Unfortunately, as a result, oo7 misses Spectre vulnerabilities in real code, as demonstrated by Wang et al. [\[154\]](#page-188-3).

#### <span id="page-135-0"></span>4.2.4 Nondeterminism

Speculative execution is inherently *nondeterministic*: Any given branch in a program may proceed either correctly or incorrectly, regardless of the actual condition value. More generally, speculative hijack attacks can send execution to entirely indeterminate locations. The semantics in [Table 4.1](#page-123-0) all allow these nondeterministic choices to be actively adversarial—for instance, given by attacker-specified directives [\[36,](#page-179-0)[151\]](#page-188-1), or, equivalently, by consulting an abstract oracle [\[41,](#page-179-1) [64,](#page-181-3) [65,](#page-181-0) [100\]](#page-184-1). These semantics all (conservatively) assume that the attacker has full control of microarchitectural prediction and scheduling; we explore the different techniques they use to verify or enforce security in the face of adversarial nondeterminism.

*Exploring nondeterminism.* Several Spectre analysis tools are built on some form of abstract execution: They simulate speculative execution of the program by tracking ranges or properties of different values. By checking these properties throughout the program, they determine if sensitive data can be leaked. Standard tools for (non-speculative) abstract execution are designed only to consider concrete execution paths; they must be adapted to handle the many possible nondeterministic execution paths from speculation. SpecuSym [\[66\]](#page-181-4), KLEESpectre [\[154\]](#page-188-3), and AISE [\[161\]](#page-188-2) handle this nondeterminism by following an *always-mispredict* strategy. When they encounter a conditional branch, they first explore the execution path which mispredicts this branch, up to a given speculation depth. Then, when they exhaust this path, they return to the correct branch. This technique of course only handles the conditional branch predictor; i.e., Spectre-PHT attacks. Pitchfork [\[36\]](#page-179-0) and Binsec/Haunted [\[47\]](#page-180-1) adapt the always-mispredict strategy to additionally account for out-of-order execution and Spectre-STL. Although it may not be immediately clear that these always-mispredict strategies are sufficient to prove security, especially when the attacker can make any number of antagonistic prediction choices, these strategies do indeed form a sound analysis [\[36,](#page-179-0) [47,](#page-180-1) [64\]](#page-181-3).
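A minimal Python sketch of the always-mispredict strategy follows, under strong simplifying assumptions: the mini-IR, its three instructions, and all names are hypothetical; execution is concrete over a single input rather than symbolic or abstract; and every branch on the correct path is first explored in its mispredicted direction for up to `window` instructions, but a wrong path never mispredicts further (no nested speculation). It only illustrates the exploration order described above and does not reproduce SpecuSym, KLEESpectre, AISE, Pitchfork, or Binsec/Haunted.

```python
def explore(prog, regs, mem, secret_addrs, window):
    """Always-mispredict exploration of a toy program (hypothetical mini-IR).

    prog:         list of instructions, each one of
                    ("lt", dst, a, b)        dst = (a < b)
                    ("br_false", cond, tgt)  jump to tgt if cond is false
                    ("load", dst, addr)      dst = mem[addr]
    regs:         initial registers, name -> (value, is_secret)
    mem:          flat memory, address -> value (unmapped addresses read 0)
    secret_addrs: addresses whose contents are sensitive
    window:       max instructions executed on a mispredicted path

    Returns (pc, on_wrong_path) for each branch or load whose condition or
    address depends on a secret, i.e., a ct-style leak.
    """
    leaks = []

    def run(pc, regs, fuel):
        # fuel is None on the correct path, else the remaining window.
        while pc < len(prog) and (fuel is None or fuel > 0):
            op, *args = prog[pc]
            if fuel is not None:
                fuel -= 1
            if op == "lt":
                dst, a, b = args
                (va, ta), (vb, tb) = regs[a], regs[b]
                regs = {**regs, dst: (va < vb, ta or tb)}
            elif op == "load":
                dst, addr = args
                va, ta = regs[addr]
                if ta:
                    leaks.append((pc, fuel is not None))
                regs = {**regs, dst: (mem.get(va, 0), ta or va in secret_addrs)}
            elif op == "br_false":
                cond, tgt = args
                value, tainted = regs[cond]
                if tainted:
                    leaks.append((pc, fuel is not None))
                correct, wrong = (tgt, pc + 1) if not value else (pc + 1, tgt)
                if fuel is None:
                    # Always mispredict: explore the wrong successor first,
                    # bounded by the window, then resume on the correct one.
                    run(wrong, regs, window)
                pc = correct
                continue
            else:
                raise ValueError(f"unknown instruction {op!r}")
            pc += 1

    run(0, dict(regs), None)
    return leaks


# A bounds-checked gadget, `if i < n: y = B[A[i]]`, with one flat memory
# standing in for both arrays.
GADGET = [
    ("lt", "c", "i", "n"),   # 0: c = i < n
    ("br_false", "c", 4),    # 1: if not c, skip to the end
    ("load", "x", "i"),      # 2: x = A[i]
    ("load", "y", "x"),      # 3: y = B[x], whose address depends on x
]
MEM = {0: 10, 1: 11, 2: 42}  # A occupies addresses 0-1; address 2 is secret
```

With `i = 2` and `window=10`, the sketch reports a leak at the second load on the wrong path. With `window=1`, the toy processor rolls back before that load executes, so no leak is reported, which shows how the speculation window bounds both the exploration and what counts as a violation.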

Unfortunately, simulating execution only works for semantics where the nondeterminism is relatively constrained: Conditional branches are a simple boolean choice, and store-to-load predictions are limited to prior memory operations within the speculation window. If we pursue other Spectre variants, we will quickly become overwhelmed—again, an unconstrained hijack gadget can be exploited to land almost anywhere in a program. The always-mispredict strategy here is nonsensical at best; abstract execution is thus necessarily limited in what it can soundly explore.

*Abstracting out nondeterminism.* Mitigation tools have more flexibility dealing with nondeterminism: Tools like Blade [\[151\]](#page-188-1) and oo7 [\[155\]](#page-188-4) apply dataflow analysis to determine which values may be leaked along *any* path, instead of reasoning about each path individually. Then, these tools insert speculation barriers to preemptively block potential leaks of sensitive data. This style of analysis comes at the cost of some precision: Blade, for example, conservatively treats *all* memory accesses as if they may speculatively load sensitive values, as its analysis cannot reason about the contents of memory. Similarly, oo7's "v1.1" pattern detection conservatively flags all (attacker-controlled) transient *stores*, as they may lead to speculative hijack. However, Blade and oo7—and mitigation tools in general—can afford to be less precise than verification or detection tools; these, conversely, must maintain higher precision to avoid floods of false positives.
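The dataflow style of mitigation can be sketched in a few lines of Python. The sketch below is a hypothetical illustration in the spirit of Blade's conservative assumptions, not its actual algorithm: it treats every load result as possibly sensitive under speculation, treats every load or store address as a leakage sink, and inserts a fence directly after any load whose result can reach an address. It assumes a model in which a fence stalls execution until all earlier instructions have resolved. Blade itself places protections more economically; this naive placement only shows why dataflow-based tools tend to insert many barriers.

```python
def insert_fences(block):
    """Naive fence insertion for a straight-line block (hypothetical mini-IR).

    Each instruction is (dst, op, srcs). For "load" and "store", srcs[0] is
    the (already computed) address; any other op is treated as side-effect-
    free arithmetic. Every load result is conservatively treated as possibly
    secret when read speculatively, and every address is a leakage sink.
    """
    origin = {}          # register -> indices of the loads its value derives from
    needs_fence = set()  # loads whose result can flow into some address
    for i, (dst, op, srcs) in enumerate(block):
        if op in ("load", "store"):
            needs_fence |= origin.get(srcs[0], set())
        if op == "load":
            origin[dst] = {i}  # a fresh, possibly-secret transient value
        elif dst is not None:
            origin[dst] = set().union(*(origin.get(s, set()) for s in srcs))
    # Re-emit the block with a fence right after every offending load.
    out = []
    for i, instr in enumerate(block):
        out.append(instr)
        if i in needs_fence:
            out.append((None, "fence", ()))
    return out
```

On the classic gadget `x = A[i]; t = x << k; y = B[t]`, encoded as three instructions, the sketch inserts a single fence after the first load. A load whose result is only stored as data, never used as an address, is left unfenced.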

*Restricting nondeterminism.* Compilers such as Swivel [\[111\]](#page-184-0), Venkman [\[137\]](#page-186-1), and ELFbac [\[78\]](#page-182-1) restructure programs entirely, imposing their own restricted set of speculative behavior at the software layer. ELFbac allocates sensitive data in separate memory regions and uses page permission bits to disallow untrusted code from accessing these regions of memory; regardless of how a program may misspeculate, it will not be able to read (and thus leak) sensitive data. Swivel and Venkman compile code into carefully aligned blocks so that control flow always lands at the tops of protected code blocks, even speculatively; Swivel accomplishes this by clearing the BTB state after untrusted execution, while Venkman proposes to recompile all programs on the system to mask addresses before jumping. Both systems also enforce speculative control-flow integrity checks to prevent speculative hijacking, whether by relying on hardware features [\[74\]](#page-182-5) or by implementing custom CFI checks with branchless assembly instructions. Developers who use these compilers can then reason about their programs much more simply, as the set of speculative behaviors is restricted enough to make the analysis tractable. Of the techniques discussed in this section, this line of work seems the most promising: It produces mitigation tools with strong security guarantees, without relying on an abundance of speculation barriers (as often results from dataflow analysis) or resorting to heavyweight simulation (e.g., symbolic execution).
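The address masking that Venkman proposes can be illustrated with a single bit mask. In this Python sketch, the 32-byte bundle size and the function name are assumptions chosen for illustration only. The point is that rounding a jump target down to a bundle boundary with plain arithmetic, rather than with a bounds check that could itself be mispredicted, ensures that even a speculatively corrupted target lands at the start of some bundle.

```python
BUNDLE_SIZE = 32  # hypothetical bundle size in bytes; must be a power of two

def mask_jump_target(target: int) -> int:
    """Clear the low bits of an indirect-jump target so it lands on a bundle start.

    The mask is computed with plain bitwise arithmetic (no branch), so it is
    applied even if the target value was speculatively corrupted.
    """
    return target & ~(BUNDLE_SIZE - 1)
```

Because `~(BUNDLE_SIZE - 1)` clears the low five bits, every target from `0x1000` through `0x101F` maps to `0x1000`. Masking alone does not prevent a jump into the wrong bundle; that is the role of the additional CFI checks described above.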

*Open problems: Rigorous performance comparison.* To the best of our knowledge, no work has rigorously compared the performance of all of the tools in [Table 4.1.](#page-123-0) Perhaps the most complete comparison is by Daniel et al. [\[47\]](#page-180-1), who compare the detection tools KLEESpectre, Pitchfork, and Binsec/Haunted in terms of the analysis time required to detect known violations in a few chosen targets. A general and objective performance comparison is difficult, if not impossible: The tools in [Table 4.1](#page-123-0) operate on different types of programs (general-purpose, cryptographic, sandboxing) and different languages (x86, LLVM, WebAssembly). They also provide different security guarantees, as we discuss above. An intermediate step towards an expanded performance comparison, which would be a valuable contribution on its own, would be to develop a larger corpus of known attacks on realistic (medium-to-large-size) programs. This would help us evaluate both the security and performance of existing or newly-proposed tools.

#### <span id="page-137-0"></span>4.2.5 Higher-level abstractions

Spectre attacks—and speculative execution—fundamentally break our intuitive assumptions about how programs should execute. Higher-level guarantees about programs no longer apply: Type systems or module systems are meaningless when even basic control flow can go awry. In order to rebuild higher-level security guarantees, we first need to repair our model of how programs execute, starting from low-level semantics. Once these foundations are firmly in place, only then can we rebuild higher-level abstractions.

*Semantics for assembly or IRs.* The majority of formal semantics in [Table 4.1](#page-123-0) operate on abstract assembly-like languages, with commands that map to simple architectural instructions. Semantics at this level implement control flow directly in terms of jumps to *program points* (usually indices into memory or an array of program instructions) and treat memory as largely unstructured. Since these low-level semantics closely correspond to the behavior of real hardware, they capture speculative behaviors in a straightforward manner, and provide a foundational model for higher-level reasoning. Similarly, many concrete analysis tools for constant-time or Spectre operate directly on binaries or compiler intermediate representations (IRs) [\[36,](#page-179-0) [47,](#page-180-1) [48,](#page-180-3) [64,](#page-181-3) [154\]](#page-188-3). These tools operate at this lowest level so that their analysis will be valid for the program unaltered—compiler optimizations for higher-level languages can end up transforming programs in insecure ways [\[17,](#page-177-1) [47,](#page-180-1) [48\]](#page-180-3). As a result, however, these tools necessarily lose access to higher-level information such as control-flow structure or how variables are mapped in memory.

*Semantics for structured languages.* The semantics proposed by Jasmin [\[16\]](#page-177-2), Patrignani and Guarnieri [\[116\]](#page-185-1), and Blade [\[151\]](#page-188-1) build on top of these lower-level ideas to describe what we term "medium-level" languages—those with structured control flow and memory, e.g., explicit loops and arrays. For these medium-level semantics, it is less straightforward to express speculative behavior: For instance, instead of modeling speculation directly, Vassena et al. [\[151\]](#page-188-1) first translate programs in their source language to lower-level commands, then apply speculative execution at that lower level.

In exchange, the structure in a medium-level semantics lends itself well to program analysis. For example, Vassena et al. are able to use a simple type system to prove security properties about a program. Barthe et al. [\[16\]](#page-177-2) also take advantage of structured semantics: They prove that if a sequentially constant-time program is *speculatively (memory) safe*—i.e., all memory operations are in-bounds array accesses—then the program is also speculatively constant-time. Since their source semantics can only access memory through array operations, they can statically verify whether a program is speculatively safe (and thus speculatively secure). An interesting question for future work is whether their concept of speculative (memory) safety can combine with other sequential security properties to give corresponding speculative guarantees, such as for sandboxing, information flow, or rich type systems.

*Weak-memory-style semantics.* Colvin and Winter [\[44\]](#page-179-2) and Disselkoen et al. [\[51\]](#page-180-2) both present a further abstracted semantics in the style of weak memory models. These semantics represent a fundamentally different approach: Rather than creating operational models of speculative hardware, these authors lift the concept of speculative execution directly to a higher level and reason about it there.

These works provide interesting insights about the relation between Spectre attacks and the weak memory models which characterize modern hardware. They also open the door to adapting techniques from that community to defend against Spectre attacks in software. However, as these models are abstracted away from microarchitectural details, they are only suited for analyzing particular Spectre variants—both [\[44,](#page-179-2) [51\]](#page-180-2) focus only on Spectre-PHT—and are difficult to adapt to other attacks. In addition, it remains an open problem to translate a semantics of this style into a concrete analysis tool: Neither of these works presents a tool which can automatically perform a security analysis of a target program.<sup>[3](#page-139-0)</sup> That said, this high-level approach to speculative semantics is certainly underexplored compared to the larger body of work on operational semantics, and is worthy of further investigation.

<span id="page-139-0"></span><sup>3</sup>Colvin and Winter do present a tool, but it is only used to mechanically explore manually translated programs.

*Compiler mitigations.* With adequate foundations in place, one avenue to regaining higher-level abstractions is to modify compilers of higher-level languages to produce speculatively secure low-level programs. Many compilers already include options to conservatively insert speculation barriers or hardening into programs, which (when done properly) provides strong security guarantees. Although some such hardening passes have been verified [\[116\]](#page-185-1), they are overly conservative and incur a significant performance cost. Other compiler mitigations have been shown to be unsound [\[113\]](#page-185-2), or worse, to introduce new Spectre vulnerabilities [\[47\]](#page-180-1), further reinforcing that these techniques must be grounded in a formal semantics.

*Open problems: Formalization of new compilation techniques.* Swivel [\[111\]](#page-184-0), Venkman [\[137\]](#page-186-1), and ELFbac [\[78\]](#page-182-1) show how the structure of code itself can provide security guarantees at a reduced performance cost. For instance, Venkman [\[137\]](#page-186-1) and Swivel [\[111\]](#page-184-0) demonstrate that organizing instructions into *bundles* or *linear blocks* respectively can mitigate speculative hijacks, making these transient attacks tractable to analyze and prevent. However, none of these compiler-based approaches are yet grounded in a formal semantics. Formalizing these systems would increase our confidence in the strong guarantees they claim to provide.

*Open problems: New languages.* Another promising approach is to design new languages which are inherently safe from Spectre attacks. Prior work has produced secure languages like FaCT [\[40\]](#page-179-3), which is (sequentially) constant-time by construction. An extension of FaCT, or a new language built on its ideas, could prevent Spectre attacks as well. Vassena et al. [\[151\]](#page-188-1) have already taken a first step in this direction: They construct a simple while-language which is guaranteed safe from Spectre-PHT attacks when compiled with their fence insertion algorithm. It would be valuable to extend this further, both to more realistic (higher-level) languages, and to more Spectre variants. The key question is whether dedicated language support can provide a path to secure code that outperforms the de-facto approach: Compiling standard C code with Spectre mitigations.

### <span id="page-140-0"></span>4.2.6 Expressivity and microarchitectural features

One theme of this chapter is that a good (practical) semantics needs to have an appropriate amount of *expressivity*: On one hand, we want a semantics which is *expressive*—able to model a wide range of possible behaviors (e.g., Spectre variants). This allows us to model powerful attackers. On the other hand, a semantics which is too expressive—allows too many possible behaviors—makes many analyses intractable. One fundamental purpose of semantics is to provide a reasonable abstraction (simplification) of hardware to ease analysis; a semantics which is too expressive simply punts this problem to the analysis writer. Thus, choosing how much expressivity to include in a semantics represents an interesting tradeoff.

By far the most important choice for the expressivity of a semantics is which misprediction behaviors to allow—i.e., which Spectre variants to reason about. We discussed these tradeoffs in [Section 4.2.3.](#page-132-0) But beyond speculative execution itself, there are many other microarchitectural features which could be relevant for a security analysis, and which have been (or could be) modeled in a speculative semantics. These features also affect the expressivity of the semantics, which means that choosing whether to include them results in similar tradeoffs.

*Out-of-order execution.* Many speculative semantics simulate a processor feature called *out-of-order execution*: they allow instructions to be executed in any order, as long as those instructions' dependencies (operands) are ready. Out-of-order execution is mostly orthogonal to speculative execution; in fact, out-of-order execution is not required to model Spectre-PHT, -BTB, or -RSB—speculative execution alone is sufficient. However, out-of-order execution is included in most modern processors, and for that reason,<sup>[4](#page-141-0)</sup> many speculative semantics also model out-of-order execution. Modeling out-of-order execution may provide an easier or more elegant way to express a variety of Spectre attacks, as opposed to modeling speculative execution alone. Further, as a result of including out-of-order execution in their respective semantics, Disselkoen et al. [\[51\]](#page-180-2) and Guanciale et al. [\[63\]](#page-181-2) propose to abuse out-of-order execution to conduct (at least theoretical) novel side-channel attacks.<sup>[5](#page-141-1)</sup>

<span id="page-141-0"></span><sup>4</sup>Or, perhaps because out-of-order execution is often discussed alongside, or even confused with, speculative execution.

<span id="page-141-1"></span><sup>5</sup>Disselkoen et al. [\[51\]](#page-180-2) propose to abuse compile-time instruction reordering, which is different from microarchitectural out-of-order execution, but related.

Although modeling out-of-order execution might make the semantics simpler, the additional expressivity makes the resulting analysis more complex. Fully modeling out-of-order execution leads to an explosion in the number of possible executions of a program; naively incorporating out-of-order execution into a detection or mitigation tool results in an intractable analysis. Indeed, while Guarnieri et al. [\[65\]](#page-181-0) and Colvin and Winter [\[44\]](#page-179-2) present analysis tools based on their respective out-of-order semantics, they only analyze very simple Spectre gadgets, not code used in real programs. Instead, for analysis tools based on out-of-order semantics to scale to real programs, developers need to use lemmas to reduce the number of possibilities the analysis needs to consider. As one example, Pitchfork [\[36\]](#page-179-0) operates on a set of "worst-case schedules" which represent a small subset of all possible out-of-order schedules. The developers formally argue that this reduction does not affect the soundness of Pitchfork's analysis.
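The explosion is easy to see with a brute-force count. The Python sketch below, a hypothetical helper that is not part of any surveyed tool, counts the execution orders of `n` instructions that respect a given set of operand dependencies. Even a handful of independent instructions yields factorially many schedules, which is why a tool like Pitchfork reasons over a small set of representative worst-case schedules instead of enumerating them all.

```python
from itertools import permutations

def count_schedules(n, deps):
    """Count the orders in which n instructions (numbered 0..n-1) may execute.

    deps is a set of (producer, consumer) pairs: the consumer needs the
    producer's result, so it must execute after the producer.
    Brute force over all n! permutations, so only usable for tiny n.
    """
    count = 0
    for order in permutations(range(n)):
        position = {instr: slot for slot, instr in enumerate(order)}
        if all(position[p] < position[c] for p, c in deps):
            count += 1
    return count
```

Four independent instructions admit 24 orders, two independent two-instruction chains admit 6, and a fully dependent chain of four admits exactly 1; eight independent instructions already admit 40,320.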

*Caches and TLBs.* Some speculative semantics and tools [\[66,](#page-181-4) [100,](#page-184-1) [154,](#page-188-3) [161\]](#page-188-2) include abstract models of caches, tracking which addresses may be in the cache at a given time. One could imagine also including detailed models of TLBs. As discussed in [Section 4.2.1](#page-122-0), modeling caches or TLBs is probably not helpful, at least for mitigation or verification tools—not only does it make the semantics more complicated, but it potentially leads to non-portable guarantees. In particular, including a model of the cache usually leads to the $[\![\cdot]\!]_{\text{cache}}$ leakage model, rather than the $[\![\cdot]\!]_{\text{ct}}$ or $[\![\cdot]\!]_{\text{arch}}$ leakage models which provide stronger defensive guarantees. Following in the tradition of constant-time programming in the non-speculative world, it seems wiser for our analyses and mitigations to be based on microarchitecture-agnostic principles as much as possible, and not depend on details of the cache or TLB structure.

*Other leakage channels.* There are a variety of specific microarchitectural mechanisms which could result in leakages, beyond the ones we've been focusing on in this chapter. For instance, in the presence of multithreading, port contention in the processor's execution units can reveal sensitive information [\[29\]](#page-178-2); and many processor instructions, e.g., floating-point or SIMD instructions, can reveal information about their operands through timing side channels [\[10\]](#page-177-3). Most existing semantics do not model these specific effects. However, the commonly-used $[\![\cdot]\!]_{\text{ct}}$ and $[\![\cdot]\!]_{\text{arch}}$ leakage models are already strong enough to capture leakages from most of these sources: for instance, port contention can only reveal sensitive data if the sensitive data influenced which instructions are being executed—and the $[\![\cdot]\!]_{\text{ct}}$ leakage model would have already considered the sensitive data leaked once it influenced control flow. For variable-time instructions, most works' definitions of $[\![\cdot]\!]_{\text{ct}}$ do not capture this leakage, but extending those definitions to cover it is straightforward [\[7\]](#page-176-0). In both of these examples, the $[\![\cdot]\!]_{\text{arch}}$ leakage model would capture all of the leaks, because it (even more conservatively) would already consider the sensitive data leaked once it reached a register, long before it could influence control flow or be used in a variable-time instruction. Although modeling any of these effects more precisely could increase the precision with which an analysis detects potential vulnerabilities, the tradeoff in analysis complexity is probably not worth it, and for mitigation and verification tools, the $[\![\cdot]\!]_{\text{ct}}$ and $[\![\cdot]\!]_{\text{arch}}$ leakage models provide stronger and more generalizable guarantees.
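The containment between the two leakage models can be illustrated with a toy observer for each. In the Python sketch below, the event encoding, both observers, and the two example programs are hypothetical simplifications; the sketch only shows that the arch observer sees everything the ct observer sees, plus any value loaded into a register.

```python
# Hypothetical event encoding for one execution:
#   ("branch", taken)  a conditional branch and the direction it took
#   ("addr", address)  the address of a load or store
#   ("reg", value)     a value loaded from memory into a register

def leak_ct(trace):
    """ct-style observer: control flow and memory addresses only."""
    return tuple(e for e in trace if e[0] in ("branch", "addr"))

def leak_arch(trace):
    """arch-style observer: additionally sees every value reaching a register."""
    return tuple(e for e in trace if e[0] in ("branch", "addr", "reg"))

def run_xor(key):
    """Load a secret key from address 0x100, then XOR it into a message.
    The XOR is assumed to be constant-time, so it emits no event."""
    return [("addr", 0x100), ("reg", key)]

def run_branchy(key):
    """Load the same key, then branch on its lowest bit."""
    return [("addr", 0x100), ("reg", key), ("branch", (key & 1) == 1)]
```

Running `run_xor` with two different keys yields identical ct observations but different arch observations, so only the arch model flags it; `run_branchy`, whose control flow depends on the key, is flagged by both.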

In a similar vein, most semantics and tools do not explicitly model parallelism or concurrency: They reason only about single-threaded programs and processors. Instead, they abstract away these details by giving attackers broad powers in their models—e.g., complete power over all microarchitectural predictions, and the capability to observe the full cache state after every execution step. The notable exceptions are the weak-memory-style semantics presented by Colvin and Winter [\[44\]](#page-179-2) and Disselkoen et al. [\[51\]](#page-180-2)—multiple threads are an inherent feature for this style of semantics. These semantics may be a promising vehicle for further exploring the interaction between speculation and concurrency. For other semantics, adding detailed models of multithreading is probably not worth the increased analysis complexity.

*Open problems: Process isolation.* In practice, a common response to Spectre attacks has been to move all secret data into a separate process; e.g., Chrome isolates different *sites* in separate processes [\[123\]](#page-185-4). This shifts the burden from application and runtime-system engineers to OS engineers. Developing Spectre foundations that model the process abstraction would elucidate the security guarantees of such systems. This would be especially useful since there is evidence that the process boundary does not keep an attacker from performing out-of-place training of the conditional branch predictor, or from leaking secrets via the cache state [\[35\]](#page-178-0).

## 4.3 Related Work

There has been a lot of interest in Spectre and other transient execution attacks, both in industry and in academia. We discuss other systematization papers that address Spectre attacks and defenses, and we briefly survey related work which otherwise falls outside the scope of this chapter.

#### 4.3.1 Systematization of Spectre attacks and defenses

Canella et al. [\[35\]](#page-178-0) present a comprehensive systematization and analysis of Spectre and Meltdown attacks and defenses. They first classify transient execution attacks by whether they are a result of misprediction (Spectre) or an execution fault (Meltdown); then they further classify the attacks by their root microarchitectural cause, yielding the nomenclature we use in this chapter (e.g., *Spectre-PHT* is named for the pattern history table). They then categorize previously known Spectre attacks, revealing several new variants and exploitation techniques for each. Canella et al. also propose a sequence of "phases" for a successful Spectre or Meltdown attack, and group published defenses by the phase they target. A followup survey by Canella et al. [\[34\]](#page-178-1) expands on the idea of attack phases, categorizing both hardware and software Spectre defenses according to which attack phase they prevent: preparation, misspeculation, data access, data encoding, leakage, or decoding. Separately, Xiong et al. [\[162\]](#page-188-0) also survey transient execution attacks, with a specific focus on the mechanics of exploits for these attacks. In contrast, our systematization focuses on the formal semantics behind Spectre analysis and mitigation tools rather than the specifics of attack variants or types of defenses.

#### 4.3.2 Hardware-based Spectre defenses

In this chapter, we focus only on software-based techniques for existing hardware. The research community has also proposed several hardware-based Spectre defenses based on cache partitioning [\[81\]](#page-182-0), cleaning up the cache state after misprediction [\[130\]](#page-186-0), or making the cache invisible to speculation by incorporating some separate internal state [\[2,](#page-176-0) [80,](#page-182-1) [163\]](#page-189-0). Unfortunately, attackers can still use side channels other than the cache to exploit speculative execution [\[29,](#page-178-2) [134\]](#page-186-1). NDA [\[158\]](#page-188-1), DOLMA [\[95\]](#page-183-0), and Speculative Taint Tracking (STT) [\[167\]](#page-189-1) block additional speculative covert channels by analyzing and classifying instructions that can leak information.

Fadiheh et al. [\[55\]](#page-180-0) define a property for hardware execution that they term UPEC: hardware that satisfies UPEC does not speculatively leak anything more than it would leak sequentially. In other words, UPEC is equivalent to the relative non-interference property $NI(\pi, \llbracket \cdot \rrbracket_{\text{arch}}^{\text{seq}} \Rightarrow \llbracket \cdot \rrbracket_{\text{arch}}^{\text{pht}})$.
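The shape of relative non-interference can be illustrated by brute force on a toy program. This is only an analogy under stated assumptions, not UPEC itself (which concerns hardware designs): `trace` is a hypothetical arch-style observer for a bounds-checked load from a 4-element public array followed by one secret word, and speculative execution is modelled by assuming the bounds check is always predicted to pass. `relative_ni` checks that any two inputs with the same public part and the same sequential trace also have the same speculative trace; the optional index masking is one well-known way to restore the property.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

A_LEN = 4  # length of the public array; a power of two so index masking works


def trace(i: int, mem: List[int], speculative: bool, masked: bool = False) -> List[Tuple]:
    """Arch-style observations of `if i < A_LEN: r = mem[i]`.

    mem[:A_LEN] is the public array; mem[A_LEN:] holds secret data.
    With speculative=True the branch is assumed to be predicted taken, so
    the body also runs (transiently) when the bounds check fails.
    """
    obs: List[Tuple] = [("branch", i < A_LEN)]
    if i < A_LEN or speculative:
        j = i & (A_LEN - 1) if masked else i  # optional index masking
        obs.append(("addr", j))
        obs.append(("reg", mem[j]))  # arch: the loaded value leaks
    return obs


def relative_ni(inputs: Iterable, public: Callable, seq: Callable, spec: Callable) -> bool:
    """NI(pi, seq => spec): inputs that agree on public data and on their
    sequential trace must also agree on their speculative trace."""
    inputs = list(inputs)
    for a, b in product(inputs, repeat=2):
        if public(a) == public(b) and seq(a) == seq(b) and spec(a) != spec(b):
            return False
    return True


if __name__ == "__main__":
    # An input is (index, secret); memory is a public array followed by the secret.
    inputs = [(i, s) for i in range(A_LEN + 1) for s in (0, 1)]
    mem_of = lambda x: [10, 11, 12, 13, x[1]]
    pub = lambda x: x[0]
    seq = lambda x: trace(x[0], mem_of(x), speculative=False)
    print(relative_ni(inputs, pub, seq, lambda x: trace(x[0], mem_of(x), True)))               # False
    print(relative_ni(inputs, pub, seq, lambda x: trace(x[0], mem_of(x), True, masked=True)))  # True
```

The unmasked gadget fails the check because index 4 produces identical sequential traces for both secrets but exposes the secret word on the speculative path.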

The insights and recommendations from our work can guide future hardware mitigations; properties like  $[\![\cdot]\!]_{\text{ct}}$  or  $[\![\cdot]\!]_{\text{arch}}$  can serve as contracts of what software expects from hardware [\[65\]](#page-181-0) (or how defenses need to bridge the gap in software when hardware only offers partial mitigations).

#### 4.3.3 Software-hardware co-design

Although hardware-only approaches are promising for future designs, they require significant modifications and introduce non-negligible performance overhead for all workloads. Several works instead propose a software-hardware co-design approach. Taram et al. [\[145\]](#page-187-0) propose context-sensitive fencing, making various speculative barriers available to software. Li et al. [\[92\]](#page-183-1) propose memory instructions with a conditional speculation flag. Context [\[132\]](#page-186-2) and SpectreGuard [\[57\]](#page-180-1) allow software to mark secrets in memory. This information is propagated through the microarchitecture to block speculative access to the marked regions. SpecCFI [\[88\]](#page-183-2) suggests a hardware extension similar to Intel CET [\[74\]](#page-182-2) that provides target label instructions with speculative guarantees. Finally, several recent proposals allow partitioning branch predictors based on context provided by the software [\[153,](#page-188-2) [170\]](#page-189-2). As these approaches require both software and hardware changes, we will need a formal semantics to apply them correctly; this represents valuable future work.

#### 4.3.4 Other transient execution attacks

We focus exclusively on Spectre, as other transient execution attacks are probably better addressed in hardware. For completeness, we briefly discuss these other attacks.

*Meltdown variants.* The Meltdown attack [\[93\]](#page-183-3) bypasses implicit memory permission checks within the CPU during transient execution. Unlike Spectre, Meltdown does not rely on executing instructions in the victim domain, so it cannot be mitigated purely by changes to the victim's code. Foreshadow [\[149\]](#page-187-1) and microarchitectural data sampling (MDS) [\[33,](#page-178-3) [71\]](#page-181-1) demonstrate that transient faults and microcode assists can still leak data from other security domains, even on CPUs that are resistant to Meltdown. Researchers have extensively evaluated these Meltdown-style attacks, uncovering new vulnerabilities [\[106,](#page-184-0) [107,](#page-184-1) [133\]](#page-186-3), but most recent Intel CPUs have hardware-level mitigations for all of these vulnerabilities in the form of microcode patches or proprietary hardware fixes [\[73\]](#page-182-3).

*Load value injection.* Load value injection (LVI) [\[150\]](#page-187-2) exploits the same root cause as Meltdown, Foreshadow, and MDS, but reverses these attacks: the attacker induces the transient fault in the victim domain instead of crafting arbitrary gadgets in their own code space. This inverse effect enables an exploitation technique, similar to Spectre-BTB, for transiently hijacking control flow. Although software-based mitigations have been proposed against LVI [\[72,](#page-181-2) [150\]](#page-187-2), Intel only suggests applying them to legacy enclave software. Like Meltdown, LVI does not need software-based mitigation on recent Intel CPUs, so our systematization does not apply to it.

## 4.4 Conclusion

Spectre attacks break the abstractions afforded to us by conventional execution models, fundamentally changing how we must reason about security. We systematize the community's work towards rebuilding foundations for formal analysis atop the loose earth of speculative execution, evaluating current efforts in a shared formal framework and pointing out open areas for future work in this field.

We find that, as with previous work in the sequential domain, solid foundations for speculative analyses require proper choices for semantics and attacker models. Most importantly, developers must consider leakage models no weaker than  $[\![\cdot]\!]_{\text{arch}}$  or  $[\![\cdot]\!]_{\text{ct}}$ . Weaker models—those that only capture leaks via memory or the data cache—lead to weaker security guarantees with no clear benefit. Next, though many frameworks focus on Spectre-PHT, sound tools must consider all Spectre variants. Although this can increase the complexity of analysis, developers can combine analyses with structured compilation techniques to restrict or remove entire categories of Spectre attacks by construction. Finally, we recommend *against* modeling unnecessary (micro)architectural details in favor of the simpler  $[\![\cdot]\!]_{\text{arch}}$  and  $[\![\cdot]\!]_{\text{ct}}$  models; details like cache structures or port contention introduce complexity and give up on portability.

When properly rooted in formal guarantees, software Spectre defenses provide a firm foundation on which to rebuild secure systems. We intend this systematization to serve as a reference and guide for those seeking to build atop formal frameworks and to develop sound Spectre defenses with strong, precise security guarantees.

## Acknowledgements

We thank Matthew Kolosick for helping us understand some of the formal systems and in organizing our presentation. This work was supported in part by gifts from Cisco; by the NSF under Grant Numbers CNS-1514435, CCF-1918573, and CAREER CNS-2048262; and, by the CONIX Research Center, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA. Work by Gilles Barthe was supported by the Office of Naval Research (ONR) under project N00014-15-1-2750.

[Chapter 4,](#page-114-0) in part, has been submitted for publication of the material as it may appear in 43rd IEEE Symposium on Security and Privacy (Oakland '22), Cauligi, Sunjay; Disselkoen, Craig; Moghimi, Daniel; Barthe, Gilles; Stefan, Deian. The dissertation author was the primary investigator and author of this material.

# Conclusion

We see time and time again that timing side-channels thoroughly erode our mental models of how programs execute and how we can keep data confidential—even worse, the effects of speculative execution topple all semblance of basic security properties such as memory safety, type safety, or even simply basic control flow. These problems, however, are not insurmountable: With the proper groundwork, we can yet reclaim these security properties in the face of speculative execution. To that end, this dissertation has laid the foundation for rebuilding formal security atop the shaky ground of speculative execution.

We started with FaCT, a DSL for writing sequential constant-time code. Although FaCT doesn't consider speculative effects, it gives us a blueprint for automatic, sound, and secure compilation in the speculative domain. We introduced a formal type system for constant-time, allowing us to capture timing side-channels as a violation of typing judgements. We then demonstrated various compilation techniques that *automatically* transform potentially insecure, high-level FaCT programs all the way down to low-level constant-time bitcode.

We then developed the foundations of *speculative* constant-time with Pitchfork: We defined a formal semantics that captures the effects of microarchitectural predictors and speculative execution. Through this semantics, we were able to extend the traditional definition of constant-time to the speculative domain and show that Spectre attacks are simply a violation of this new property. We also showed that our formal foundation was indeed solid and practical: Our verification tool, Pitchfork, was able to find subtle Spectre vulnerabilities in real code.

We built upon Pitchfork's foundations, adapting its semantics for tackling speculative security in the higher-level context of SFI and software sandboxing. We generalized the notion of speculative constant-time to formally capture the SFI protections against both sandbox breakout and poisoning attacks. We demonstrated the structural soundness of our framework, showing how mitigations from existing tools serve (or fail) to uphold our speculative SFI properties.

Finally, we gave a bird's eye view of speculative software semantics at the time of writing: We categorized and systematized various design choices made in our and others' semantics and identified open areas that have yet to be filled in. We examined how each design choice taken either builds towards or works against our eventual goals; how each open problem solved is one less impediment in our pursuit of high-level software security.

Ultimately, we want to allow developers to program in high-level languages while being verifiably free from Spectre attacks. This dissertation presented formal foundations and frameworks for reclaiming these security goals: We defined type systems, semantics, and formal techniques for verifying and enforcing constant-time and sandbox properties even in the face of speculation; and we implemented these techniques in practical tools to detect and defend against constant-time and Spectre attacks.

# Appendix A

# FaCT: Deferred definitions and proofs

### A.1 Semantics

We define the behavior of expressions, statements and functions using an instrumented big-step semantics. Informally, the big-step semantics relates initial configurations, final configurations, and leakages. Initial configurations are triples of the form  $(C, \rho, h)$  where *C* is an expression, a statement or a function,  $\rho$  is an environment mapping variables to values, and *h* is a heap mapping pointers to values.

Definition A.1.1 (Values). *The set of values is defined by the following syntax:*



An environment is defined as a partial mapping from variables to values, and a heap is defined as a partial mapping from pointers to values. We say that a pointer *p* is allocated in a heap *h*, written $p \in h$, if $h(p)$ is defined. If $p \in h$, then the value associated with *p* can be updated, written $h[p \leftarrow v]$; the values associated with other pointers are unchanged. We assume we are given a deterministic operator FRESH for creating and initializing a fresh pointer: $\text{FRESH}(h, v) = (p, h')$. This operator satisfies:

- *p* is a fresh pointer, i.e., $p \notin h$.
- The value associated with *p* is *v*, i.e., $h'(p) = v$.
- Other pointers are unchanged, i.e., $\forall p' \in h,\ h'(p') = h(p')$.

<span id="page-152-0"></span>

EXPR-STEP

$$
(e, \rho, h) \xrightarrow{\psi} (v, h')
$$

STMT-STEP

$$
(S, \rho, h) \xrightarrow{\psi} (v, h')
$$

SEQ-NORET

$$
\frac{(i, \rho, h) \xrightarrow{\psi_1} (\rho', h') \qquad (S, \rho', h') \xrightarrow{\psi_2} (v, h'')}{(i; S, \rho, h) \xrightarrow{\psi_1 + \psi_2} (v, h'')}
$$

VARDEC

$$
\frac{(e, \rho, h) \xrightarrow{\psi} (v, h')}{(x = e, \rho, h) \xrightarrow{\psi} (\rho[x \leftarrow v], h')}
$$

ASSIGN

$$
\frac{(e_1, \rho, h) \xrightarrow{\psi_1} (p, h') \qquad (e_2, \rho, h') \xrightarrow{\psi_2} (v, h'')}{(e_1 := e_2, \rho, h) \xrightarrow{\psi_1 + \psi_2} (\rho, h''[p \leftarrow v])}
$$

RETURN

$$
\frac{(e, \rho, h) \xrightarrow{\psi} (v, h')}{(\text{return } e, \rho, h) \xrightarrow{\psi} (v, h')}
$$

BLOCK

$$
\frac{(S, \rho, h) \xrightarrow{\psi} (v, h')}{(\{S\}, \rho, h) \xrightarrow{\psi} (v, h')}
$$

$$
\frac{(e_1, \rho, h) \xrightarrow{\psi_1} (v_1, h_1) \quad \cdots \quad (e_n, \rho, h_{n-1}) \xrightarrow{\psi_n} (v_n, h_n) \qquad (F, \vec{v}, h_n) \xrightarrow{\psi} (v, h')}{(x = f(\vec{e}), \rho, h) \xrightarrow{\sum_i \psi_i + \psi} (\rho[x \leftarrow v], h')}
$$

FN

$$
\frac{(F.S, [F.\vec{x} \leftarrow \vec{v}], h) \xrightarrow{\psi} (v, h')}{(F, \vec{v}, h) \xrightarrow{\psi} (v, h')}
$$

IF

$$
\frac{(e, \rho, h) \xrightarrow{\psi} (b, h') \qquad (S_b, \rho, h') \xrightarrow{\psi_b} (v, h'')}{(\text{if}\ast_{\ell}\ e \text{ then } S_{\text{T}} \text{ else } S_{\text{F}}, \rho, h) \xrightarrow{\psi + \text{if}\ast(\ell, b) + \psi_b} (v, h'')}
$$

FOR

$$
\frac{\begin{array}{c}
(e_1, \rho, h) \xrightarrow{\psi_1} (n_1, h_1) \qquad (e_2, \rho, h_1) \xrightarrow{\psi_2} (n_2, h_2) \\
(\text{if}\ast_{\ell}\ n_1 < n_2 \text{ then } \{i = n_1;\ S;\ \text{for}\ast_{\ell}\ i = n_1 + 1 \text{ to } n_2 \text{ do } S\}, \rho, h_2) \xrightarrow{\psi} (v, h')
\end{array}}{(\text{for}\ast_{\ell}\ i = e_1 \text{ to } e_2 \text{ do } S, \rho, h) \xrightarrow{\psi_1 + \psi_2 + \psi} (v, h')}
$$

Figure A.1: Big-step semantics.

We further assume the existence of an equivalence relation  $\approx$  on heaps such that:

- $\approx$ is stable by allocation: if $\text{FRESH}(h_1, v_1) = (p_1, h'_1)$ and $\text{FRESH}(h_2, v_2) = (p_2, h'_2)$ and $h_1 \approx h_2$, then $p_1 = p_2$ and $h'_1 \approx h'_2$.
- $\approx$ is stable by update: if $h_1 \approx h_2$, then $h_1[p \leftarrow v_1] \approx h_2[p \leftarrow v_2]$.
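One concrete instance satisfying these assumptions, chosen here purely for illustration, takes heaps to be dictionaries from integer pointers to values, lets FRESH return one past the largest allocated pointer, and relates two heaps by $\approx$ exactly when they have the same domain. The functions `fresh`, `heap_equiv`, and `update` are hypothetical names; the key design choice is that FRESH depends only on the domain of the heap, which is what makes allocation stable under $\approx$.

```python
from typing import Any, Dict, Tuple

Heap = Dict[int, Any]


def fresh(h: Heap, v: Any) -> Tuple[int, Heap]:
    """FRESH(h, v) = (p, h'): allocate a pointer not in h and map it to v.

    The choice of p depends only on the domain of h, so heaps related by
    heap_equiv always receive the same fresh pointer.
    """
    p = max(h, default=-1) + 1  # p is not allocated in h
    h2 = dict(h)                # other pointers keep their values
    h2[p] = v
    return p, h2


def heap_equiv(h1: Heap, h2: Heap) -> bool:
    """A candidate for the relation ≈: heaps are related iff they have the same domain."""
    return h1.keys() == h2.keys()


def update(h: Heap, p: int, v: Any) -> Heap:
    """h[p <- v], defined only for an allocated pointer p."""
    if p not in h:
        raise KeyError(f"pointer {p} is not allocated")
    h2 = dict(h)
    h2[p] = v
    return h2


if __name__ == "__main__":
    h1, h2 = {0: "secret-a"}, {0: "secret-b"}
    p1, h1b = fresh(h1, 1)
    p2, h2b = fresh(h2, 2)
    print(p1 == p2, heap_equiv(h1b, h2b))                            # True True
    print(heap_equiv(update(h1b, 0, 5), update(h2b, 0, 6)))         # True
```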

A final configuration is either a pair consisting of a value and a heap, or a pair consisting of an environment and a heap. In particular, the semantics of expressions $(e, \rho, h) \xrightarrow{\psi} (v, h')$ returns a value and a new heap (since evaluation may create fresh references). Here, $\psi$ is the leakage of the evaluation of *e*. The semantics of statements is given by two judgments of similar form, $(S, \rho, h) \xrightarrow{\psi} (\rho', h')$ and $(S, \rho, h) \xrightarrow{\psi} (v, h')$, which correspond to statements that do not and do return values, respectively. Again, $\psi$ is the leakage produced by the evaluation of the statement. Finally, the semantics of a function is modelled by a judgment of the form $(f, \vec{v}, h) \xrightarrow{\psi} (v', h')$, where $\vec{v}$ denotes the values of the parameters of the function and $v'$ is the return value (we only consider functions that return a value). Figure [A.1](#page-152-0) presents the semantics. The rules are standard except for leakage: array accesses leak the index at which they are accessed, conditionals leak their control flow, and the other rules combine the leakage of sub-computations according to evaluation order. Note that in the rules for conditionals and loops, we assume that the guard of the statement is identified by a unique label, which we record in the leakage.
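The following minimal sketch shows how leakage is threaded through a big-step evaluator in the style of Figure A.1, for a tiny hypothetical fragment: constants, variables, `<`, array reads, `x = e`, sequencing, and labelled conditionals. The tuple-based AST and the names `eval_expr` and `exec_stmt` are assumptions for illustration. As in the rules, a conditional contributes the guard's leakage, then its label and outcome, then the leakage of the taken branch; an array read contributes its index.

```python
from typing import Any, Dict, List, Tuple

Env = Dict[str, Any]
Heap = Dict[int, list]  # pointers map to arrays
Leak = List[Tuple]


def eval_expr(e: Tuple, rho: Env, h: Heap) -> Tuple[Any, Heap, Leak]:
    kind = e[0]
    if kind == "const":
        return e[1], h, []
    if kind == "var":
        return rho[e[1]], h, []
    if kind == "lt":
        v1, h1, l1 = eval_expr(e[1], rho, h)
        v2, h2, l2 = eval_expr(e[2], rho, h1)
        return v1 < v2, h2, l1 + l2
    if kind == "index":  # e_arr[e_idx]
        p, h1, l1 = eval_expr(e[1], rho, h)
        i, h2, l2 = eval_expr(e[2], rho, h1)
        return h2[p][i], h2, l1 + l2 + [("index", i)]  # array reads leak the index
    raise ValueError(f"unknown expression {kind}")


def exec_stmt(s: Tuple, rho: Env, h: Heap) -> Tuple[Env, Heap, Leak]:
    kind = s[0]
    if kind == "assign":  # x = e
        v, h1, l = eval_expr(s[2], rho, h)
        return {**rho, s[1]: v}, h1, l
    if kind == "seq":  # S1; S2
        rho1, h1, l1 = exec_stmt(s[1], rho, h)
        rho2, h2, l2 = exec_stmt(s[2], rho1, h1)
        return rho2, h2, l1 + l2
    if kind == "if":  # if_l e then S_T else S_F
        _, label, cond, s_t, s_f = s
        b, h1, l = eval_expr(cond, rho, h)
        rho2, h2, lb = exec_stmt(s_t if b else s_f, rho, h1)
        return rho2, h2, l + [("if", label, b)] + lb  # leak (label, outcome)
    raise ValueError(f"unknown statement {kind}")


if __name__ == "__main__":
    prog = ("if", 1, ("lt", ("var", "k"), ("const", 5)),
            ("assign", "y", ("index", ("var", "A"), ("var", "k"))),
            ("assign", "y", ("const", 0)))
    heap = {0: [10, 11, 12, 13, 14]}
    print(exec_stmt(prog, {"k": 2, "A": 0}, heap)[2])  # [('if', 1, True), ('index', 2)]
    print(exec_stmt(prog, {"k": 7, "A": 0}, heap)[2])  # [('if', 1, False)]
```

If `k` were secret, the two differing traces would witness that this program is not constant-time.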

RULES $\quad \Gamma \vdash e : \beta \qquad pc, \beta_r \vdash S : \Gamma \rightarrow \Gamma' \qquad \omega \vdash \beta_r\ f(\vec{x} : \vec{\beta})\ \{S\}$

<span id="page-154-0"></span>

SEQ

$$
\frac{pc, \beta_r \vdash S_1 : \Gamma \rightarrow \Gamma' \qquad pc, \beta_r \vdash S_2 : \Gamma' \rightarrow \Gamma''}{pc, \beta_r \vdash S_1; S_2 : \Gamma \rightarrow \Gamma''}
$$

VAR-DEC

$$
\frac{\Gamma \vdash e : \beta \qquad \Gamma' = \Gamma, x : \beta}{pc, \beta_r \vdash \beta\ x = e : \Gamma \rightarrow \Gamma'}
$$

VAR-DEC-FN-CALL

$$
\frac{f : (\vec{\beta}) \rightarrow \beta \qquad \mathit{hasMut}(f) \Rightarrow pc \sqsubseteq \omega(f) \qquad \Gamma \vdash e_i : \beta_i \qquad \Gamma' = \Gamma, x : \beta}{pc, \beta_r \vdash \beta\ x = f(\vec{e}) : \Gamma \rightarrow \Gamma'}
$$

ASSIGN

$$
\frac{\Gamma \vdash e_1 : \text{REF}_W\ \beta \qquad \Gamma \vdash e_2 : \beta \qquad pc \sqsubseteq \beta}{pc, \beta_r \vdash e_1 := e_2 : \Gamma \rightarrow \Gamma}
$$

IF

$$
\frac{\Gamma \vdash e : \text{Bool}_\ell \qquad pc \sqcup \ell, \beta_r \vdash S_1 : \Gamma \rightarrow \Gamma_1 \qquad pc \sqcup \ell, \beta_r \vdash S_2 : \Gamma \rightarrow \Gamma_2}{pc, \beta_r \vdash \text{if}(e)\{S_1\}\text{else}\{S_2\} : \Gamma \rightarrow \Gamma}
$$

FOR-RANGE

$$
\frac{\Gamma \vdash e_1 : \text{UINT}_{\text{SEC}} \qquad \Gamma \vdash e_2 : \text{UINT}_{\text{SEC}} \qquad \Gamma' = \Gamma, x : \text{UINT}_{\text{SEC}} \qquad pc, \beta_r \vdash S : \Gamma' \rightarrow \Gamma''}{pc, \beta_r \vdash \text{for}(x \text{ from } e_1 \text{ to } e_2)\{S\} : \Gamma \rightarrow \Gamma}
$$

RETURN

$$
\frac{\Gamma \vdash e : \beta_r \qquad pc \sqsubseteq \beta_r}{pc, \beta_r \vdash \text{return } e : \Gamma \rightarrow \Gamma}
$$

FN-DEC

$$
\frac{pc = \omega(f) \qquad \Gamma = \{\vec{x} : \vec{\beta}\} \qquad pc, \beta_r \vdash S : \Gamma, \text{PUB} \rightarrow \Gamma'}{\omega \vdash \beta_r\ f(\vec{x} : \vec{\beta})\ \{S\}}
$$

Figure A.2: Type system  $\vdash_{rd}$  for return deferral.

## A.2 Return deferral

We prove that return deferral is correct, i.e., preserves the behavior of programs; and secure, which we formalize as a type-preservation result.
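For intuition, here is a minimal sketch of what return deferral does to a function with an early return under a secret condition, written in Python rather than FaCT. It mirrors the `notRet` and `rval` references that appear in Lemma A.2.3, but everything else is an assumption: the function names and the `trace` list (one event per loop iteration, as a stand-in for timing) are hypothetical. The remaining `if not_ret` guards are still secret-dependent branches here; in the compilation pipeline they would be handled by the later constant-time transformations.

```python
from typing import List, Tuple


def index_of_early(xs: List[int], key: int) -> Tuple[int, List[str]]:
    """Source program: returns as soon as `key` is found."""
    trace: List[str] = []
    for i, x in enumerate(xs):
        trace.append("iter")  # one event per loop iteration
        if x == key:          # secret-dependent early return
            return i, trace
    return -1, trace


def index_of_deferred(xs: List[int], key: int) -> Tuple[int, List[str]]:
    """The same function after a return-deferral-style rewrite (sketch)."""
    trace: List[str] = []
    not_ret = True  # plays the role of notRet
    rval = 0        # plays the role of rval, initialized to init(beta_r)
    for i, x in enumerate(xs):
        trace.append("iter")
        if not_ret:          # code reachable only before a return is guarded
            if x == key:     # `return i` becomes two stores
                rval = i
                not_ret = False
    if not_ret:              # the final `return -1`, also guarded
        rval = -1
        not_ret = False
    return rval, trace


if __name__ == "__main__":
    for key in (1, 9):
        print(index_of_early([3, 1, 4, 1], key)[0], index_of_deferred([3, 1, 4, 1], key)[0])
    # 1 1
    # -1 -1
```

Both versions compute the same result (the correctness half of the claim), while the deferred version always runs the loop `len(xs)` times regardless of where, or whether, the key occurs.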

*Type system and type-preservation.* We define a variant of the type system that only allows return statements in PUB contexts. The judgments are thus of the form  $pc, \beta_r \vdash S : \Gamma \to \Gamma'$ or  $\omega \vdash \beta_r f(\vec{x} : \vec{\beta})$  { *S* }, i.e., the return context label is omitted. The typing rules for statements are given in Figure [A.2;](#page-154-0) rules for expressions do not change.
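The security lattice and the side conditions used in Figure A.2 can be sketched directly. This is a hypothetical encoding (labels as strings, one helper per side condition) that covers only the $pc \sqsubseteq \beta$ condition of ASSIGN and the raised context $pc \sqcup \ell$ of IF; it is not FaCT's type checker. The final loop checks, exhaustively over the two-point lattice, the monotonicity that underlies PC subtyping: lowering *pc* never invalidates an accepted assignment.

```python
PUB, SEC = "PUB", "SEC"
LABELS = (PUB, SEC)


def leq(a: str, b: str) -> bool:
    """a ⊑ b in the two-point lattice PUB ⊑ SEC."""
    return a == b or (a == PUB and b == SEC)


def join(a: str, b: str) -> str:
    """a ⊔ b: SEC if either side is SEC."""
    return SEC if SEC in (a, b) else PUB


def assign_ok(pc: str, beta: str) -> bool:
    """ASSIGN side condition pc ⊑ β: no public write under a secret context."""
    return leq(pc, beta)


def branch_pc(pc: str, guard: str) -> str:
    """IF: both branches are checked under pc ⊔ ℓ."""
    return join(pc, guard)


if __name__ == "__main__":
    inner = branch_pc(PUB, SEC)  # inside `if` on a secret guard
    print(inner, assign_ok(inner, PUB), assign_ok(inner, SEC))  # SEC False True
    # Lowering pc preserves typability of assignments (cf. PC subtyping).
    print(all(assign_ok(p1, b)
              for p1 in LABELS for p2 in LABELS for b in LABELS
              if leq(p1, p2) and assign_ok(p2, b)))  # True
```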

We prove that return deferral transforms typeable expressions (resp. statements and procedures) of the source type system into typeable expressions (resp. statements and procedures) of the  $\vdash_{rd}$  type system.

First, we prove preliminary lemmas.

<span id="page-155-0"></span>**Lemma A.2.1.** *If* $\omega, pc, \beta_r \vdash S : \Gamma, rc \rightarrow \Gamma', rc'$ *then* $rc \sqsubseteq rc'$.

<span id="page-155-1"></span>**Lemma A.2.2** (PC subtyping type system for return deferral). *For all* $pc_1 \sqsubseteq pc_2$, *if* $pc_2, \beta_r \vdash S : \Gamma \rightarrow \Gamma'$ *then* $pc_1, \beta_r \vdash S : \Gamma \rightarrow \Gamma'$.

*Proof.* By induction on $pc_2, \beta_r \vdash S : \Gamma \to \Gamma'$.

$\Box$

**Lemma A.2.3** (Type preservation for return deferral). *If* $\omega, pc, \beta_r \vdash S : \Gamma, rc \rightarrow \Gamma', rc'$*, then:*

1. *if* $\Phi, pc, rc \vdash S \rightarrow S'$ *then* $pc \sqcup rc, \beta_r \vdash S' : \overline{\Gamma} \rightarrow \overline{\Gamma'}$;
2. *if* $\Phi, pc, rc \vdash S \leadsto S'$ *then* $pc \sqcup rc, \beta_r \vdash S' : \overline{\Gamma} \rightarrow \overline{\Gamma'}$,

*where* $\overline{\Gamma} = \Gamma[notRet, rval \leftarrow \text{REF}_{RW}[\text{Bool}], \text{REF}_{RW}[\beta_r]]$.

*Proof.* We start by proving (2). Assuming that (1) holds for a given *S*, we prove that (2) holds for *S*. By case on *rc*:

- If *rc* is PUB, then $\Phi, pc, rc \vdash S \leadsto S'$ is $\Phi, pc, rc \vdash S \rightarrow S'$, and we can trivially conclude using (1).
- If *rc* is SEC, we must prove $pc \sqcup \text{SEC}, \beta_r \vdash \text{if (deref } notRet)\{S'\} : \overline{\Gamma} \to \overline{\Gamma'}$. By hypothesis we have $pc \sqcup \text{SEC}, \beta_r \vdash S' : \overline{\Gamma} \to \overline{\Gamma'}$, and we can apply the IF rule of the $\vdash_{rd}$ type system (with $\ell = \text{SEC}$) to conclude.

We now prove (1) by induction on *S*. The cases for (VAR-DEC, ASSIGN, IF, FOR-RANGE, RETURN) are trivial.

<span id="page-156-0"></span>

BOOL

$$
b \simeq_m b
$$

$$
i \simeq_m i \qquad\qquad \frac{m(p_1) = p_2}{p_1 \simeq_m p_2}
$$

STRUCT

$$
\frac{v_i \simeq_m w_i \quad (1 \le i \le n)}{\{x_1 = v_1, \dots, x_n = v_n\} \simeq_m \{x_1 = w_1, \dots, x_n = w_n\}}
$$

HEAP

$$
\frac{\forall p_1\, p_2,\ m(p_1) = p_2 \Rightarrow h_1(p_1) \simeq_m h_2(p_2)}{h_1 \simeq_m h_2}
$$

ENV

$$
\frac{\forall x,\ \text{Defined } \rho(x) \Rightarrow \text{Defined } \rho'(x) \text{ and } \rho(x) \simeq_m \rho'(x)}{\rho \simeq_m \rho'}
$$

Figure A.3: The relation $\simeq_m$ on values, heaps, and environments.



If $S = S_1; S_2$, then $S' = S'_1; S'_2$, where

$$
\begin{array}{ll}
\omega, pc, \beta_r \vdash S_1 : \Gamma, rc \rightarrow \Gamma', rc' & \qquad \Phi, pc, rc \vdash S_1 \rightarrow S'_1 \\
\omega, pc, \beta_r \vdash S_2 : \Gamma', rc' \rightarrow \Gamma'', rc'' & \qquad \Phi', pc, rc' \vdash S_2 \rightsquigarrow S'_2
\end{array}
$$

By the induction hypothesis, we have $pc \sqcup rc, \beta_r \vdash S'_1 : \overline{\Gamma} \rightarrow \overline{\Gamma'}$, and by (2) (using the induction hypothesis on $S_2$) we have $pc \sqcup rc', \beta_r \vdash S'_2 : \overline{\Gamma'} \to \overline{\Gamma''}$. Since $rc \sqsubseteq rc'$ (by Lemma [A.2.1](#page-155-0)), we have $pc \sqcup rc \sqsubseteq pc \sqcup rc'$, so we can apply Lemma [A.2.2](#page-155-1) to obtain $pc \sqcup rc, \beta_r \vdash S'_2 : \overline{\Gamma'} \to \overline{\Gamma''}$ and conclude with the SEQ rule.

If $S = \beta\ x = f(\vec{e})$, we conclude by the induction hypothesis. Here $f.S$ can be seen as a substatement of $S$, since there is no recursion.

 $\Box$ 

*Preservation of semantics.* We now prove that return deferral preserves semantics. Since the compilation introduces new references and variables, the correctness lemmas must account for them. Given a partial mapping $m$ from pointers to pointers, we say that two values $v$ and $v'$ are in relation for $m$, written $v \simeq_m v'$, if they are equal up to the renaming of pointers given by $m$. Figure [A.3](#page-156-0) defines this relation. The relation is extended to heaps, $h \simeq_m h'$ (rule HEAP), if for all pointers $p$ in the domain of $m$ we have $h(p) \simeq_m h'(m(p))$. It is extended to environments (rule ENV) as follows: every variable $x$ defined in $\rho$ must also be defined in $\rho'$, and the associated values must be in relation for $m$.
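The following Python sketch is one way to read these rules as executable checks. It assumes a hypothetical encoding of our own: pointers are wrapped in a `Ptr` class, structs are dictionaries, and the mapping $m$ is a dictionary. The function names `val_rel`, `heap_rel`, and `env_rel` are illustrative and do not come from the source.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Ptr:
    """A heap pointer (hypothetical encoding used only in this sketch)."""
    addr: int


# Values: booleans, integers, pointers, or structs (field name -> value).
Value = Union[bool, int, Ptr, Dict[str, "Value"]]
PtrMap = Dict[Ptr, Ptr]


def val_rel(v: Value, w: Value, m: PtrMap) -> bool:
    """v ~_m w: the values are equal once pointers are renamed through m."""
    if isinstance(v, Ptr) or isinstance(w, Ptr):
        # Pointer case: m must map v to w.
        return isinstance(v, Ptr) and isinstance(w, Ptr) and m.get(v) == w
    if isinstance(v, dict) or isinstance(w, dict):
        # STRUCT: same field names, related field by field.
        return (isinstance(v, dict) and isinstance(w, dict)
                and v.keys() == w.keys()
                and all(val_rel(v[x], w[x], m) for x in v))
    # Base values must be identical, including their kind (True is not 1).
    return type(v) is type(w) and v == w


def heap_rel(h1: Dict[Ptr, Value], h2: Dict[Ptr, Value], m: PtrMap) -> bool:
    """HEAP: for every pair m(p1) = p2, h1(p1) ~_m h2(p2)."""
    return all(p1 in h1 and p2 in h2 and val_rel(h1[p1], h2[p2], m)
               for p1, p2 in m.items())


def env_rel(rho1: Dict[str, Value], rho2: Dict[str, Value], m: PtrMap) -> bool:
    """ENV: every variable defined in rho1 is defined in rho2 with a related value."""
    return all(x in rho2 and val_rel(rho1[x], rho2[x], m) for x in rho1)


# The compiled side may hold extra variables and cells (e.g. for notRet).
m = {Ptr(1): Ptr(7)}
h1 = {Ptr(1): 5}
h2 = {Ptr(7): 5, Ptr(8): True}
rho1 = {"x": Ptr(1)}
rho2 = {"x": Ptr(7), "notRet": Ptr(8)}
```

Because `heap_rel` only inspects pointers in the domain of `m`, cells that exist only on the compiled side are unconstrained. This matches the fact that the compiled program may allocate additional references.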

**Lemma A.2.4** (Preservation of semantics for return deferral). Let $\rho_1 \simeq_m \rho'_1$, $\rho'_1(notRet) = p_r$, $\rho'_1(rval) = p_v$, $h_1 \simeq_m h'_1$, $h'_1(p_r) = \text{true}$, and $h'_1(p_v) = init(\beta_r)$. If $\Phi, pc, rc \vdash S \rightarrow S'$ and $(S, \rho_1, h_1) \rightarrow (v, h_2)$, then there exist $v', m', h'_2$ such that $m \sqsubseteq m'$, $(S', \rho'_1, h'_1) \rightarrow (v', h'_2)$, and $h_2 \simeq_{m'} h'_2$, and moreover:

- If $v = \rho_2$ is an environment, then there exists $\rho'_2$ such that $v' = \rho'_2$, $\rho_2 \simeq_{m'} \rho'_2$, $\rho'_2(notRet) = p_r$, $\rho'_2(rval) = p_v$, $h'_2(p_r) = \text{true}$, and $h'_2(p_v) = init(\beta_r)$.
- If $v$ is a returned value, then either $v'$ is a value with $v \simeq_{m'} v'$, or there exist $\rho'_2$ and $w'$ such that $v' = \rho'_2$, $\rho'_2(notRet) = p_r$, $\rho'_2(rval) = p_v$, $h'_2(p_r) = \text{false}$, $h'_2(p_v) = w'$, and $v \simeq_{m'} w'$.

Furthermore, suppose $h_1 \simeq_m h'_1$, $\vec{v} \simeq_m \vec{v}'$, $\omega \vdash F \to F'$, and $(F, \vec{v}, h_1) \to (v, h_2)$. Then there exist $v', m', h'_2$ such that $v \simeq_{m'} v'$, $h_2 \simeq_{m'} h'_2$, and $(F', \vec{v}', h'_1) \to (v', h'_2)$.

*Proof.* The proof is by mutual induction on $\Phi, pc, rc \vdash S \rightarrow S'$ and $(F, \vec{v}, h_1) \rightarrow (v, h_2)$. The case for functions is a direct consequence of the case for statements. For statements, the interesting case is sequencing, i.e., $S = S_1; S_2$. If $S_1$ returns in a SEC context, then $S'_1$ does not return immediately; instead, *notRet* is false after its execution. Hence $S'_2 = \text{if } (notRet)\ \{ S''_2 \}$ terminates immediately without executing $S''_2$. $\Box$
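The sequencing argument above can be illustrated with a toy Python sketch. It compares a function that returns early under a secret condition with a hand-written version in the style of return deferral. In this sketch, the locals `not_ret` and `rval` stand in for the references *notRet* and *rval*, and `0` stands in for $init(\beta_r)$. For simplicity every return is deferred. The function names are hypothetical, and this is not the compiler's actual output.

```python
def original(secret: bool, x: int) -> int:
    """Source function: returns early under a secret condition."""
    if secret:
        return x + 1
    x = x * 2
    return x


def deferred(secret: bool, x: int) -> int:
    """Hand-written return-deferral version of `original` (illustrative only)."""
    not_ret = True            # stands in for notRet, initially true
    rval = 0                  # stands in for rval, initialised to a default
    if secret:
        rval = x + 1          # record the return value ...
        not_ret = False       # ... and remember that we have "returned"
    if not_ret:               # the continuation becomes: if (notRet) { S2'' }
        x = x * 2
        rval = x
        not_ret = False
    return rval               # single return at the end of the function
```

When `secret` holds, `not_ret` is false after the first statement, so the guarded continuation is skipped. This is exactly the reasoning used in the sequencing case of the proof.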

<span id="page-158-0"></span>

**SEQ**

$$
\frac{\beta_r \vdash S_1 : \Gamma \rightarrow \Gamma' \qquad \beta_r \vdash S_2 : \Gamma' \rightarrow \Gamma''}{\beta_r \vdash S_1; S_2 : \Gamma \rightarrow \Gamma''}
$$

**VARDEC**

$$
\frac{\Gamma \vdash e : \beta \qquad \Gamma' = \Gamma, x : \beta}{\beta_r \vdash \beta\ x = e : \Gamma \rightarrow \Gamma'}
$$

**FN-CALL**

$$
\frac{f : (\vec{\beta}) \rightarrow \beta \qquad \Gamma \vdash e_i : \beta_i \qquad \Gamma' = \Gamma, x : \beta}{\beta_r \vdash \beta\ x = f(\vec{e}) : \Gamma \rightarrow \Gamma'}
$$

**ASSIGN**

$$
\frac{\Gamma \vdash e_1 : \text{REF}_W[\beta] \qquad \Gamma \vdash e_2 : \beta}{\beta_r \vdash e_1 := e_2 : \Gamma \rightarrow \Gamma}
$$

**IF**

$$
\frac{\Gamma \vdash e : \text{Bool}_{\text{PUB}} \qquad \beta_r \vdash S_1 : \Gamma \rightarrow \Gamma_1 \qquad \beta_r \vdash S_2 : \Gamma \rightarrow \Gamma_2}{\beta_r \vdash \text{if } (e)\ \{ S_1 \} \text{ else } \{ S_2 \} : \Gamma \rightarrow \Gamma}
$$

**FOR-RANGE**

$$
\frac{\Gamma \vdash e_1 : \text{UINT}_{\text{PUB}} \qquad \Gamma \vdash e_2 : \text{UINT}_{\text{PUB}} \qquad \Gamma' = \Gamma, x : \text{UINT}_{\text{PUB}} \qquad \beta_r \vdash S : \Gamma' \rightarrow \Gamma''}{\beta_r \vdash \text{for } (x \text{ from } e_1 \text{ to } e_2)\ \{ S \} : \Gamma \rightarrow \Gamma}
$$

**RETURN**

$$
\frac{\Gamma \vdash e : \beta_r}{\beta_r \vdash \text{return } e : \Gamma \rightarrow \Gamma}
$$

**FN**

$$
\frac{\Gamma = \{\vec{x} : \vec{\beta}\} \qquad \beta_r \vdash S : \Gamma \rightarrow \Gamma'}{\vdash \beta_r\ f(\vec{x} : \vec{\beta})\ \{ S \}}
$$

Figure A.4: Type system $\vdash_{ct}$ for constant-time.

## A.3 Branch removal

We prove that branch removal is correct, i.e., preserves the behavior of programs, and secure. For the latter, we define a new type system  $\vdash_{ct}$ , show that branch removal returns programs that are typeable with respect to  $\vdash_{ct}$ , and that typeable programs are constant-time.
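To make the idea concrete, the following Python sketch shows the intuition behind branch removal on a toy example. This is our own illustration, not the transformation rules themselves. A conditional on a secret is replaced by straight-line code: each branch's assignment is guarded by its control predicate through a branch-free `select`, so the untaken branch's write has no effect. The names `branchy`, `branch_free`, and `select` are hypothetical. Python also gives no timing guarantees, so the sketch demonstrates only the data flow.

```python
def select(c: int, a: int, b: int) -> int:
    """Return a if c == 1 and b if c == 0, without a data-dependent branch."""
    mask = -c                        # all ones when c == 1, zero when c == 0
    return b ^ ((a ^ b) & mask)


def branchy(secret: bool, x: int) -> int:
    """Source code: branches on a secret."""
    if secret:
        x = x + 10
    else:
        x = x - 3
    return x


def branch_free(secret: bool, x: int) -> int:
    """Branch-removed version: both branches run, guarded by their predicates."""
    p = int(secret)                  # control predicate of the then-branch
    q = 1 - p                        # control predicate of the else-branch
    x = select(p, x + 10, x)         # then-branch write, effective only if p holds
    x = select(q, x - 3, x)          # else-branch write, effective only if q holds
    return x
```

When a predicate is false, the guarded write leaves `x` unchanged. This mirrors the property proved below, namely that code under a false control predicate does not modify the initial state.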

*Type system and type-preservation.* The type system manipulates judgments of the form $\beta_r \vdash S : \Gamma \to \Gamma'$ and $\vdash \beta_r\ f(\vec{x} : \vec{\beta})\ \{ S \}$. Notably, the path context label is omitted. Since we require that statements no longer branch on secrets, we can assume that the path context label is public throughout execution.

Figure [A.4](#page-158-0) presents the typing rules for statements in  $\vdash_{ct}$ . Rules for expressions do not change.

We prove that branch removal transforms expressions (resp. statements and procedures) typeable in  $\vdash_{rd}$  into expressions (resp. statements and procedures) typeable in  $\vdash_{ct}$ .

**Lemma A.3.1.** *If* $\Phi, p \vdash S \to S'$ *and* $\overline{p}, \beta_r \vdash S : \Gamma \to \Gamma'$, *then* $\beta_r \vdash S' : \overline{\Gamma}_p \to \overline{\Gamma'}_p$, *where* $\overline{p}$ *is* PUB *if* $p = \text{true}$ *and* SEC *otherwise,* $\overline{\Gamma}_p = \Gamma[\text{vars}(p) \leftarrow \text{Bool}_{\text{SEC}}]$, *and* $\text{vars}(p)$ *is the set of variables in* $p$.

*Proof.* By induction on *S*.

 $\Box$ 

*Typeable programs are constant-time.* We start by defining an equivalence between heaps. The equivalence is indexed by a partial mapping $t$ from pointers to types. Such partial mappings come naturally equipped with a partial order: we write $t_1 \sqsubseteq t_2$ if, for all $p, \beta$ such that $t_1(p) = \beta$, we have $t_2(p) = \beta$.

We define a relation $v_1 \equiv_{\beta,t} v_2$ between values, stating that $v_1$ and $v_2$ are related with respect to the type $\beta$ and the partial mapping $t$. The relation requires that both values have type $\beta$ and that they are equal whenever their security level requires it. For example, base values (booleans and integers) must be equal if their level is PUB, but can be arbitrary otherwise. For pointers, the relation requires that the two pointers are equal and that $t$ associates with them a type $\beta'$ such that $\beta' \sqsubseteq \beta$. The relation is extended to heaps, $h_1 \equiv_t h_2$, as follows: the two heaps must be in relation for $\approx$, and for all pointers $p$ such that $t(p) = \beta$, the associated values must be related with respect to $t$ and $\beta$, i.e., $h_1(p) \equiv_{\beta,t} h_2(p)$. The relation extends naturally to environments, $\rho_1, h_1 \equiv_{\Gamma,t} \rho_2, h_2$, and in a straightforward manner to final configurations. The formal definition is given in Figure [A.5](#page-160-0).
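As an illustration only, the Python sketch below encodes types as tuples such as `("uint", "PUB")`, `("struct", {...})`, or `("ref", pointee_type)`, and represents $t$ as a dictionary from integer pointers to types. Two simplifications are our own assumptions and are not fixed by the text. First, the ordering $\beta' \sqsubseteq \beta$ is taken to be equality with the pointee type. Second, $h_1 \approx h_2$ is approximated as "same set of pointers". The sketch also does not check that values actually have type $\beta$.

```python
from typing import Any, Dict, Tuple

Type = Tuple[str, Any]       # hypothetical encoding, e.g. ("uint", "PUB")
PtrTypes = Dict[int, Type]   # the partial map t from pointers to types


def val_equiv(v1: Any, v2: Any, ty: Type, t: PtrTypes) -> bool:
    """v1 =_{ty,t} v2 under the simplifying assumptions stated above."""
    kind, arg = ty
    if kind in ("bool", "int", "uint"):
        # Base values: must be equal at level PUB, unconstrained otherwise.
        return arg != "PUB" or v1 == v2
    if kind == "struct":
        # arg maps field names to field types.
        return all(val_equiv(v1[x], v2[x], fty, t) for x, fty in arg.items())
    if kind == "ref":
        # Pointers must coincide and t must record their (pointee) type.
        return v1 == v2 and t.get(v1) == arg
    raise ValueError(f"unknown type {ty!r}")


def heap_equiv(h1: Dict[int, Any], h2: Dict[int, Any], t: PtrTypes) -> bool:
    """h1 =_t h2: same shape, and every pointer typed by t holds related values."""
    return h1.keys() == h2.keys() and all(
        p in h1 and val_equiv(h1[p], h2[p], beta, t) for p, beta in t.items())


t = {0: ("bool", "SEC"), 1: ("uint", "PUB")}
```

With this `t`, two heaps may differ in the secret boolean at pointer `0`, but they must agree on the public integer at pointer `1`.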

We prove some preliminary lemmas.

<span id="page-159-0"></span>**Lemma A.3.2** (Stability of type interpretation). *For all partial maps t and t' such that*  $t \sqsubseteq t'$  *the following properties hold:*

*1. For all*  $v_1$   $v_2$ *, if*  $v_1 \equiv_{\beta,t} v_2$  *then*  $v_1 \equiv_{\beta,t'} v_2$ 

<span id="page-160-0"></span>Judgment forms: $v_1 \equiv_{\beta,t} v_2$, $h_1 \equiv_t h_2$, and $\rho_1, h_1 \equiv_{\Gamma,t} \rho_2, h_2$.

**BOOL**

$$
\frac{\ell = \text{PUB} \Rightarrow b_1 = b_2}{b_1 \equiv_{\text{Bool}_\ell, t} b_2}
$$

**INT**

$$
\frac{\ell = \text{PUB} \Rightarrow i_1^s = i_2^s}{i_1^s \equiv_{\text{INT}^s_\ell, t} i_2^s}
$$

**UINT**

$$
\frac{\ell = \text{PUB} \Rightarrow i_1^s = i_2^s}{i_1^s \equiv_{\text{UINT}^s_\ell, t} i_2^s}
$$

**STRUCT**

$$
\frac{v_i \equiv_{\beta_i, t} w_i}{\{x_1 = v_1, \dots, x_n = v_n\} \equiv_{\{x_1 : \beta_1, \dots, x_n : \beta_n\}, t} \{x_1 = w_1, \dots, x_n = w_n\}}
$$

**HEAP**

$$
\frac{h_1 \approx h_2 \qquad \forall p\, \beta,\ t(p) = \beta \Rightarrow h_1(p) \equiv_{\beta, t} h_2(p)}{h_1 \equiv_t h_2}
$$

**ENV**

$$
\frac{\forall x,\ x \in \Gamma \Rightarrow \rho_1(x) \equiv_{\Gamma(x),t} \rho_2(x) \qquad h_1 \equiv_t h_2}{\rho_1, h_1 \equiv_{\Gamma,t} \rho_2, h_2}
$$

**NORET**

$$
\frac{\rho_1, h_1 \equiv_{\Gamma, t} \rho_2, h_2}{\rho_1, h_1 \equiv_{\Gamma, \beta_r, t} \rho_2, h_2}
$$

**RET**

$$
\frac{v_1, h_1 \equiv_{\beta_r, t} v_2, h_2}{v_1, h_1 \equiv_{\Gamma, \beta_r, t} v_2, h_2}
$$

Figure A.5: Type interpretation.

*2. For all heaps* $h_1, h_2, h'_1, h'_2$ *such that* $h'_1 \equiv_{t'} h'_2$: $\rho_1, h_1 \equiv_{\Gamma,t} \rho_2, h_2 \Rightarrow \rho_1, h'_1 \equiv_{\Gamma,t'} \rho_2, h'_2$.

*Proof.* We prove (1) by induction on $v_1 \equiv_{\beta,t} v_2$. The only interesting case is REF, which follows directly from the definition of $t \sqsubseteq t'$. (2) is a direct consequence of (1). $\Box$

<span id="page-161-0"></span>**Lemma A.3.3** (Reference creation). *If* $h_1 \equiv_t h_2$, $\text{FRESH}(h_1, v_1) = (p_1, h'_1)$, $\text{FRESH}(h_2, v_2) = (p_2, h'_2)$, *and* $v_1 \equiv_{\beta,t} v_2$, *then* $p_1 = p_2$ *and* $h'_1 \equiv_{t[p_1 \leftarrow \beta]} h'_2$.

*Proof.* $h_1 \equiv_t h_2$ implies $h_1 \approx h_2$, so $p_1 = p_2$ and $h'_1 \approx h'_2$. It remains to prove $\forall p\, \beta',\ t[p_1 \leftarrow \beta](p) = \beta' \Rightarrow h'_1(p) \equiv_{\beta',t} h'_2(p)$. If $p = p_1$, then $t[p_1 \leftarrow \beta](p) = \beta$ and $h'_i(p) = v_i$, and we have $v_1 \equiv_{\beta,t} v_2$ by hypothesis. Otherwise $p \neq p_1$, so $t[p_1 \leftarrow \beta](p) = t(p)$ and $h'_i(p) = h_i(p)$, and the property follows from $h_1 \equiv_t h_2$. $\Box$
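The step $p_1 = p_2$ relies on allocation depending only on the shape of the heap, not on its contents. The Python sketch below makes this concrete with a hypothetical allocator `fresh` that always picks the smallest unused address. This is our own choice of allocator, used for illustration; the source does not specify how FRESH chooses pointers. Under this choice, two heaps with the same domain but different secret contents receive the same new pointer.

```python
from typing import Any, Dict, Tuple


def fresh(heap: Dict[int, Any], value: Any) -> Tuple[int, Dict[int, Any]]:
    """Allocate `value` at the smallest unused address (depends only on the domain)."""
    p = 0
    while p in heap:
        p += 1
    new_heap = dict(heap)    # leave the input heap untouched
    new_heap[p] = value
    return p, new_heap


# Same domain, different (secret) contents at address 1.
h1 = {0: 5, 1: 42}
h2 = {0: 5, 1: 99}
p1, h1_new = fresh(h1, True)
p2, h2_new = fresh(h2, False)
```

Any allocator whose choice of address is a function of the heap's domain alone would serve equally well for this argument.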

We now prove that typeable expressions and statements are constant-time.

<span id="page-161-1"></span>**Lemma A.3.4** (Typing constant-time: expressions).

$$
\left\{
\begin{aligned}
&\rho_1, h_1 \equiv_{\Gamma, t} \rho_2, h_2 \\
&\Gamma \vdash e : \beta \\
&(e, \rho_1, h_1) \xrightarrow{\psi_1} (v_1, h'_1) \\
&(e, \rho_2, h_2) \xrightarrow{\psi_2} (v_2, h'_2)
\end{aligned}
\right\}
\Rightarrow \exists t',
\left\{
\begin{aligned}
&t \sqsubseteq t' \\
&h'_1 \equiv_{t'} h'_2 \\
&\psi_1 = \psi_2 \\
&v_1 \equiv_{\beta, t'} v_2
\end{aligned}
\right.
$$

*Proof.* By induction on  $\Gamma \vdash e : \beta$ . We do only the interesting cases:

 $\blacktriangleright$  Case  $e = e_1[e_2]$ , we have

$$
\begin{gathered}
(e_1, \rho_1, h_1) \xrightarrow{\psi'_1} ([w_1; \dots; w_{k_1}], h''_1) \qquad
(e_1, \rho_2, h_2) \xrightarrow{\psi'_2} ([w'_1; \dots; w'_{k_2}], h''_2) \\
(e_2, \rho_1, h''_1) \xrightarrow{\psi''_1} (n_1, h'_1) \qquad
(e_2, \rho_2, h''_2) \xrightarrow{\psi''_2} (n_2, h'_2) \\
v_1 = w_{n_1} \qquad v_2 = w'_{n_2} \\
\psi_1 = \psi'_1 + \psi''_1 + \text{ARR}[n_1] \qquad
\psi_2 = \psi'_2 + \psi''_2 + \text{ARR}[n_2] \\
\Gamma \vdash e_1 : \text{ARR}[\beta, e_{len}] \qquad
\Gamma \vdash e_2 : \text{UINT}^{\text{PUB}}
\end{gathered}
$$

By the induction hypothesis on $e_1$, there exists $t''$ such that

$$
t \sqsubseteq t'' \qquad h_1'' \equiv_{t''} h_2'' \qquad \psi_1' = \psi_2'
$$

$$
[w_1; \dots; w_{k_1}] \equiv_{\text{ARR}[\beta, e_{len}], t''} [w_1'; \dots; w_{k_2}']
$$

By lemma [A.3.2](#page-159-0), we have $\rho_1, h''_1 \equiv_{\Gamma, t''} \rho_2, h''_2$, and we can apply the induction hypothesis on $e_2$ to get:

$$
t'' \sqsubseteq t' \qquad h'_1 \equiv_{t'} h'_2 \qquad \psi_1'' = \psi_2''
$$

$$
n_1 \equiv_{\text{UINT}^{\text{PUB}}, t'} n_2
$$

So  $n_1 = n_2$  and by lemma [A.3.2](#page-159-0) we get

$$
[w_1; \ldots; w_{k_1}] \equiv_{\text{ARR}[\beta, e_{len}], t'} [w'_1; \ldots; w'_{k_2}]
$$

which allows us to conclude $v_1 \equiv_{\beta,t'} v_2$. Moreover, $\psi_1 = \psi_2$, since $\psi'_1 = \psi'_2$, $\psi''_1 = \psi''_2$, and $n_1 = n_2$. We conclude by using $t'$ as witness.

 $\blacktriangleright$  Case  $e = \text{REF}[e']$ , we have

$$
\begin{gathered}
(e', \rho_1, h_1) \xrightarrow{\psi_1} (v'_1, h''_1) \qquad
(e', \rho_2, h_2) \xrightarrow{\psi_2} (v'_2, h''_2) \\
\text{FRESH}(h''_1, v'_1) = (p_1, h'_1) \qquad
\text{FRESH}(h''_2, v'_2) = (p_2, h'_2) \\
v_1 = p_1 \qquad v_2 = p_2 \\
\Gamma \vdash e' : \beta' \qquad \beta = \text{REF}_{\text{RW}}[\beta']
\end{gathered}
$$

By induction hypothesis on  $e'$  we get

$$
t \sqsubseteq t'' \qquad h_1'' \equiv_{t''} h_2'' \qquad \psi_1 = \psi_2
$$

$$
v_1' \equiv_{\beta',t''} v_2'
$$

We can conclude the proof by using

$$
t' = t''[p_1 \leftarrow \beta']
$$

and applying lemma [A.3.3](#page-161-0).

 $\hfill \square$ 

**Lemma A.3.5** (Typing constant-time: statements).

$$
\left\{
\begin{aligned}
&\rho_1, h_1 \equiv_{\Gamma, t} \rho_2, h_2 \\
&\beta_r \vdash S : \Gamma \to \Gamma' \\
&(S, \rho_1, h_1) \xrightarrow{\psi_1} (v_1, h'_1) \\
&(S, \rho_2, h_2) \xrightarrow{\psi_2} (v_2, h'_2)
\end{aligned}
\right\}
\Rightarrow \exists t',
\left\{
\begin{aligned}
&t \sqsubseteq t' \\
&\psi_1 = \psi_2 \\
&v_1, h'_1 \equiv_{\Gamma', \beta_r, t'} v_2, h'_2
\end{aligned}
\right.
$$

*Proof.* By induction on $(S, \rho_1, h_1) \xrightarrow{\psi_1} (v_1, h'_1)$.

- Cases SEQ-RET, SEQ-NORET, and BLOCK are trivial.
- Cases VARDEC, ASSIGN, RETURN, IF, and FN-CALL follow from lemmas [A.3.4](#page-161-1) and [A.3.2](#page-159-0).
- The last case, FOR, is almost a direct consequence of the induction hypothesis. The only difficulty is proving that the unrolled statement is well-typed:

$$
\beta_r \vdash \text{if } *_\ell\, n_1 < n_2 \text{ then } \{ \{ i = n_1; S \};\ \text{for } *_\ell\, i = n_1 + 1 \text{ to } n_2 \text{ DO } S \} : \Gamma \to \Gamma
$$

 $\Box$ 

*Preservation of semantics.* Finally, we prove that branch removal preserves the semantics of programs. The proof is performed in two steps. First, we show that if the value of the control predicate is false then the code does not modify the initial heap; it can only create fresh references.

**Lemma A.3.6.** Let $m$ be a partial mapping on pointers. Assume the following: $p$ is not trivially true (i.e., $p$ is not the literal $\text{true}$); $\Phi, p \vdash S \rightarrow S'$; $h \simeq_m h'_1$; $\rho \simeq_m \rho'_1$; $\text{SEC}, \beta_r \vdash S : \Gamma \to \Gamma'$; and $(S', \rho'_1, h'_1) \rightarrow (v', h'_2)$ (i.e., $S'$ is safe). If $(p, \rho'_1, h'_1) \rightarrow \text{false}$, then $h \simeq_m h'_2$ and there exists $\rho'_2$ such that $v' = \rho'_2$ and $\rho \simeq_m \rho'_2$.

Furthermore, assume the following: $p$ is not trivially true; $\omega \vdash F \rightarrow F'$; $\omega(f) = \text{SEC}$; $h \simeq_m h'_1$; $F$ is well typed; and $(F', (\vec{v}', \text{false}), h'_1) \rightarrow (v', h'_2)$. Then $h \simeq_m h'_2$.

*Proof.* By mutual induction on *S* and *F*. The key point of the proof is to notice that if *p* is not trivially true then the *pc* used for type-checking is necessarily SEC, so there is no return statement in  $S'$ .  $\Box$ 

We now prove that if the control predicate evaluates to true then the semantics of statements and functions are preserved.

**Lemma A.3.7.** Let $m$ be a partial mapping on pointers. Assume the following: $\Phi, p \vdash S \rightarrow S'$; $h_1 \simeq_m h'_1$; $\rho_1 \simeq_m \rho'_1$; $pc, \beta_r \vdash S : \Gamma \rightarrow \Gamma'$ with $pc = (\text{if } p = \text{true} \text{ then PUB else SEC})$; $(S, \rho_1, h_1) \rightarrow (v, h_2)$; and $(S', \rho'_1, h'_1) \rightarrow (v', h'_2)$ (i.e., $S'$ is safe). If $(p, \rho'_1, h'_1) \rightarrow \text{true}$, then there exists $m'$ such that $m \sqsubseteq m'$, $h_2 \simeq_{m'} h'_2$, and $v \simeq_{m'} v'$.

Furthermore, assume the following: $\omega \vdash F \rightarrow F'$; $F$ is well typed; $(F, \vec{v}, h_1) \rightarrow (v, h_2)$; $h_1 \simeq_m h'_1$; and $\vec{v} \simeq_m \vec{v}'$. If $\omega(f) = \text{PUB}$ and $(F', \vec{v}', h'_1) \rightarrow (v', h'_2)$, then there exists $m'$ such that $m \sqsubseteq m'$, $h_2 \simeq_{m'} h'_2$, and $v \simeq_{m'} v'$. Otherwise, if $\omega(f) = \text{SEC}$ and $(F', (\vec{v}', \text{true}), h'_1) \rightarrow (v', h'_2)$, then there exists $m'$ such that $m \sqsubseteq m'$, $h_2 \simeq_{m'} h'_2$, and $v \simeq_{m'} v'$.

*Proof.* By mutual induction on *S* and *F*.

 $\Box$ 

# Appendix B

# Pitchfork: Full proofs

### B.1 Consistency

**Lemma B.1.1** (Determinism). *If*  $C \xrightarrow[d]{o'} C'$  *and*  $C \xrightarrow[d]{o''} C''$  *then*  $C' = C''$  *and*  $o' = o''$ .

*Proof.* The tuple  $(C, d)$  fully determines which rule of the semantics can be executed.  $\Box$ 

Definition B.1.2 (Initial/terminal configuration). *A configuration C is an* initial *(or* terminal*) configuration if*  $|C.buf| = 0$ *.* 

Definition B.1.3 (Sequential schedule). *Given a configuration C, we say a schedule D is* sequential *if every instruction that is fetched is executed and retired before any further instruction is fetched.*

**Definition B.1.4** (Sequential execution). $C_{O} \Downarrow_{D}^{N} C'$ *is a sequential execution if C is an initial configuration, D is a sequential schedule for C, and C' is a terminal configuration.*

We write $C_O \Downarrow_{seq}^N C'$ if we execute sequentially.
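Definitions B.1.3 and B.1.4 can be turned into a simple check on schedules. The following Python sketch does this under a simplified directive encoding of our own: `("fetch",)`, `("execute", i)`, and `("retire",)`. It assumes that every fetch adds exactly one buffer entry, so it does not model the transient instructions produced by call and ret. It also does not check instruction-specific requirements, such as which execute directives each instruction needs. The names `is_sequential` and `retires` are hypothetical.

```python
from typing import Sequence, Tuple

Directive = Tuple  # hypothetical encoding: ("fetch",), ("execute", i), ("retire",)


def is_sequential(schedule: Sequence[Directive]) -> bool:
    """True if no instruction is fetched while an earlier one is still in flight
    and the schedule ends with an empty buffer (a terminal configuration)."""
    in_flight = 0
    for d in schedule:
        kind = d[0]
        if kind == "fetch":
            if in_flight:
                return False       # fetched past an unretired instruction
            in_flight = 1
        elif kind in ("execute", "retire"):
            if not in_flight:
                return False       # nothing in the buffer to execute or retire
            if kind == "retire":
                in_flight = 0
        else:
            raise ValueError(f"unknown directive {d!r}")
    return in_flight == 0          # terminal: |buf| = 0


def retires(schedule: Sequence[Directive]) -> int:
    """The N in C_O ⇓^N_D C': the number of retire directives."""
    return sum(1 for d in schedule if d[0] == "retire")


# An op (fetch, execute, retire) followed by a fence (fetch, retire).
seq = [("fetch",), ("execute", 0), ("retire",), ("fetch",), ("retire",)]
```

A schedule that fetches twice before retiring, or that stops with an instruction still in the buffer, is rejected. Such schedules can still be well formed for the speculative semantics; they are simply not sequential.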

<span id="page-166-0"></span>**Lemma B.1.5** (Sequential equivalence). *If* $C_{O_1} \Downarrow_{D_1}^N C_1$ *is sequential and* $C_{O_2} \Downarrow_{D_2}^N C_2$ *is sequential, then* $C_1 = C_2$.

*Proof.* Suppose $N = 0$. Then neither $D_1$ nor $D_2$ may contain any retire directives. Since we assume that both $C_1.buf$ and $C_2.buf$ have size 0, neither $D_1$ nor $D_2$ may contain any fetch directives. Therefore, both $D_1$ and $D_2$ are empty, and both $C_1$ and $C_2$ are equal to $C$.

We proceed by induction on *N*.

Let $D'_1$ be a sequential prefix of $D_1$ up to the $(N-1)$th retire, and let $D''_1$ be the remainder of $D_1$. That is, $\#\{d \in D'_1 \mid d = \text{retire}\} = N-1$ and $D'_1 || D''_1 = D_1$. Let $D'_2$ and $D''_2$ be defined similarly.

By our induction hypothesis, we know $C_{O'_1} \Downarrow_{D'_1}^{N-1} C'$ and $C_{O'_2} \Downarrow_{D'_2}^{N-1} C'$ for some $C'$. Since $D'_1$ (resp. $D'_2$) is sequential and $|C'.buf| = 0$, the first directive in $D''_1$ (resp. $D''_2$) must be a fetch directive. Furthermore, $C'_{O''_1} \Downarrow_{D''_1}^1 C_1$ and $C'_{O''_2} \Downarrow_{D''_2}^1 C_2$.

We can now proceed by cases on  $C'.\mu[C'.n]$ , the final instruction to be fetched.

- For op, the only valid sequence of directives is (fetch, execute *i*, retire), where *i* is the sole valid index in the buffer. Similarly for fence, with the sequence (fetch, retire).
- For load, alias prediction is not possible, as no prior stores exist in the buffer. Therefore, just as with op, the only valid sequence of directives is (fetch, execute *i*, retire).
- For store, the only possible difference between $D''_1$ and $D''_2$ is the ordering of the execute *i* : value and execute *i* : addr directives. However, both orderings result in the same configuration, since they independently resolve the components of the store.
- For br, $D''_1$ and $D''_2$ may have different guesses for their initial fetch directives. However, both COND-EXECUTE-CORRECT and COND-EXECUTE-INCORRECT result in the same configuration regardless of the initial guess, as the br is the only instruction in the buffer. Similarly for jmpi.
- For call and ret, the ordering of execution of the resulting transient instructions does not affect the final configuration.

Thus for all cases we have  $C_1 = C_2$ .

 $\Box$ 

To make our discussion easier, we say that a directive *d applies to* a buffer index *i* if, when executing a step $C \xrightarrow[d]{o'} C'$, one of the following holds:

- *d* is a fetch directive, and would fetch an instruction into index *i* in *buf*.
- *d* is an execute directive, and would execute the instruction at index *i* in *buf*.
- *d* is a retire directive, and would retire the instruction at index *i* in *buf*.

We would like to reason about schedules that do not contain *misspeculated steps*, i.e., directives that are superfluous due to their effects getting wiped away by rollbacks.

<span id="page-168-1"></span>**Definition B.1.6** (Misspeculated steps). *Given an execution* $C_{O} \Downarrow_{D}^{N} C'$, *we say that D contains* misspeculated steps *if there exists* $d \in D$ *such that* $D' = D \setminus d$ *and* $C_{O} \Downarrow_{D'}^N C'' = C'$.

Given an execution $C_O \Downarrow_D^N C'$ that may contain rollbacks, we can create an alternate schedule $D^*$ without any rollbacks by removing all misspeculated steps. Note that sequential schedules have no misspeculated steps<sup>[1](#page-168-0)</sup> as defined in Definition [B.1.6](#page-168-1).

<span id="page-168-2"></span>Theorem B.1.7 (Equivalence to sequential execution). *Let C be an initial configuration and D a well-formed schedule for C. If* $C_{O_1} \Downarrow^N_D C_1$, *then there exists a sequential execution* $C_{O_2} \Downarrow^N_{seq} C_2$ *with* $C_1 \approx C_2$. *Furthermore, if* $C_1$ *is terminal, then* $C_1 = C_2$.

*Proof.* Since we can always remove all misspeculated steps from any well-formed execution without affecting the final configuration, we assume  $D_1$  has no misspeculated steps.

Suppose  $N = 0$ . Then the theorem is trivially true. We proceed by induction on N.

Let $D'_1$ be the subsequence of $D_1$ containing the first $N-1$ retire directives and the directives that apply to the same indices as those retire directives. Let $D''_1$ be the complement of $D'_1$ with respect to $D_1$. All directives in $D''_1$ apply to indices later than any directive in $D'_1$, and thus cannot affect the execution of directives in $D'_1$. Thus $D'_1$ is a well-formed schedule and produces execution $C_{O'_1} \Downarrow_{D'_1}^{N-1} C'_1$.

<span id="page-168-0"></span><sup>1</sup>Sequential schedules may still misspeculate on conditional branches, but the rollback does not remove any reorder buffer instructions, so it does not constitute a misspeculated step as defined in Definition [B.1.6](#page-168-1).

Since $D_1$ contains no misspeculated steps, the directives in $D''_1$ can be reordered after the directives in $D'_1$. Thus $D''_1$ is a well-formed schedule for $C'_1$, producing execution ${C'_1}_{O''_1} \Downarrow_{D''_1}^{1} C''_1$ with $C''_1 \approx C_1$. If $C_1$ is terminal, then $C''_1$ is also terminal and $C''_1 = C_1$.

By our induction hypothesis, we know there exists $D'_{seq}$ such that $C_{O'_2} \Downarrow_{D'_{seq}}^{N-1} C'_2$. Since $D'_1$ contains equal numbers of fetch and retire directives, ends with a retire, and contains no misspeculated steps, $C'_1$ is terminal. Thus $C'_1 = C'_2$.

Let $D''_{seq}$ be the subsequence of $D''_1$ containing the retire directive in $D''_1$ and the directives that apply to the same index. $D''_{seq}$ is sequential with respect to $C'_1$ and produces execution ${C'_1}_{O''_2} \Downarrow_{D''_{seq}}^{1} C''_2$ with $C''_2 \approx C''_1 \approx C_1$. If $C''_1$ is terminal, then $D''_{seq} = D''_1$, and thus $C''_2 = C''_1 = C_1$.

Let $D_{seq} = D'_{seq} || D''_{seq}$. $D_{seq}$ is thus itself sequential and produces execution $C_{(O'_2 || O''_2)} \Downarrow_{seq}^N C''_2$, completing our proof. $\Box$

**Corollary B.1.8** (General consistency). Let $C$ be an initial configuration. If $C_{O_1} \Downarrow_{D_1}^N C_1$ and $C_{O_2} \Downarrow_{D_2}^N C_2$, then $C_1 \approx C_2$. Furthermore, if $C_1$ and $C_2$ are both terminal, then $C_1 = C_2$.

*Proof.* By [Theorem B.1.7](#page-168-2), there exists $D'_{seq}$ such that executing with $C$ produces $C'_1 \approx C_1$ (resp. $C'_1 = C_1$). Similarly, there exists $D''_{seq}$ that produces $C'_2 \approx C_2$ (resp. $C'_2 = C_2$). By [Lemma B.1.5](#page-166-0), we have $C'_1 = C'_2$. Thus $C_1 \approx C_2$ (resp. $C_1 = C_2$). $\Box$

### B.2 Security

**Theorem B.2.1** (Label stability). Let $\ell$ be a label in the lattice $\mathscr{L}$. If $C_{O_1} \Downarrow_{D_1}^N C_1$ and $\forall o \in O_1 : \ell \notin o$, then $C_{O_2} \Downarrow_{seq}^N C_2$ and $\forall o \in O_2 : \ell \notin o$.

*Proof.* Let $D^*_1$ be the schedule obtained by removing all misspeculated steps from $D_1$. The corresponding trace $O^*_1$ is a subsequence of $O_1$, and hence $\forall o \in O^*_1 : \ell \notin o$. We thus proceed assuming that execution of $D_1$ contains no misspeculated steps.

Our proof closely follows that of [Theorem B.1.7](#page-168-2). When constructing $D'_1$ and $D''_1$ from $D_1$ in the inductive step, we know that all directives in $D''_1$ apply to indices later than any directive in $D'_1$, and cannot affect execution of any directive in $D'_1$. This implies that $O'_1$ is the subsequence of $O_1$ that corresponds to the mapping of $D'_1$ to $D_1$.

Reordering the directives in $D''_1$ after $D'_1$ does not affect the observations produced by most directives. The exceptions are execute directives for load instructions that would have received a forwarded value: after reordering, the store instruction they forwarded from may have been retired, and they must fetch their value from memory. However, even in this case, the address $a_{\ell_a}$ attached to the observation does not change. Thus $\forall o \in O''_2 : \ell \notin o$.

Continuing the proof as in [Theorem B.1.7](#page-168-2), we create schedule $D'_{seq}$ (with trace $O'_2$) from the induction hypothesis. We also create $D''_{seq}$ (with trace $O''_2$) as the subsequence of $D''_1$ of directives applying to the remaining instruction to be retired. As noted before, executing a subsequence of a schedule produces the corresponding subsequence of the original trace; hence $\forall o \in O''_2 : \ell \notin o$.

The trace of the final (sequential) schedule $D_{seq} = D'_{seq} || D''_{seq}$ is $O'_2 || O''_2$. Since $O'_2$ satisfies the label stability property via the induction hypothesis, we have $\forall o \in O'_2 || O''_2 : \ell \notin o$.

 $\Box$ 

Instantiating $\ell$ with the label secret, we obtain the following corollary:

<span id="page-170-0"></span>Corollary B.2.2 (Secrecy). *If speculative execution of C under schedule D produces a trace O that contains no secret labels, then sequential execution of C will never produce a trace that contains any secret labels.*

With this, we can prove the following proposition:

Proposition B.2.3. *For a given initial configuration C and well-formed schedule D, if C is SCT with respect to D, and execution of C with D results in a terminal configuration* $C_1$, *then C is also sequentially constant-time.*

*Proof.* Since $C$ is SCT, we know that for all $C' \simeq_{\text{pub}} C$, we have $C_O \Downarrow_D^N C_1$ and $C'_{O'} \Downarrow_D^N C'_1$, where $C_1 \simeq_{\text{pub}} C'_1$ and $O = O'$. By [Theorem B.1.7](#page-168-2), we know there exist sequential executions $C_{O_{seq}} \Downarrow_{seq}^N C_2$ and $C'_{O'_{seq}} \Downarrow_{seq}^N C'_2$. Note that the two sequential schedules need not be the same.

$C_1$ is terminal by hypothesis. Execution of $C'$ uses the same schedule *D*, so $C'_1$ is also terminal. Since we have $C_1 = C_2$ and $C'_1 = C'_2$, we can lift $C_1 \simeq_{\text{pub}} C'_1$ to get $C_2 \simeq_{\text{pub}} C'_2$.

To prove the trace property $O_{seq} = O'_{seq}$, we note that if $O_{seq} \neq O'_{seq}$, then, since $C_2 \simeq_{\text{pub}} C'_2$, there must exist some $o \in O_{seq}$ such that secret $\in o$. Since this is also true for *O* and *O'*, and $O = O'$, we know that no observation in either *O* or *O'* contains a secret label. By [Corollary B.2.2](#page-170-0), it follows that no secret labels appear in either $O_{seq}$ or $O'_{seq}$, and thus $O_{seq} = O'_{seq}$. $\Box$
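Proposition B.2.3 turns on the trace-equality half of constant-time: every configuration that is pub-equivalent to $C$ must produce the same trace. A minimal brute-force sketch of that check follows, assuming (hypothetically) that a configuration is a flat dictionary of named values, that pub-equivalent configurations are obtained by overriding only the secret entries, and that a program is a function from configuration to trace. The two example programs, `table_lookup` and `fixed_reads`, are illustrative only. Checking finitely many secret assignments can refute constant-time behavior but never prove it; the argument above is what establishes the property in general.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Config = Dict[str, int]
Observation = Tuple[str, int]  # e.g. ("read", address); a hypothetical encoding
Program = Callable[[Config], List[Observation]]


def traces_agree(run: Program, public: Config, secrets: Iterable[Config]) -> bool:
    """True iff every configuration built from `public` plus one secret
    assignment produces the same trace."""
    traces = [run({**public, **sec}) for sec in secrets]
    return all(t == traces[0] for t in traces)


def table_lookup(cfg: Config) -> List[Observation]:
    # The read address depends on the secret key, so traces differ.
    return [("read", 0x1000 + cfg["key"])]


def fixed_reads(cfg: Config) -> List[Observation]:
    # Reads both table entries whatever the key is, so traces coincide.
    return [("read", 0x1000), ("read", 0x1001)]


keys = [{"key": 0}, {"key": 1}]
print(traces_agree(table_lookup, {"len": 2}, keys))  # False
print(traces_agree(fixed_reads, {"len": 2}, keys))   # True
```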

### B.3 Soundness of Pitchfork

Definition B.3.1 (Affecting an index). *We say a directive d* affects an index *i if:*

- ▶ *d is a fetch-type directive and would produce a new mapping in buf at index i.*
- ▶ *d is an execute-type directive and specifies index i directly (e.g., execute i).*
- ▶ *d is a retire directive and would cause the instruction at i in buf to be removed.*

Definition B.3.2 (Path function). *The function Path(C, D) produces the sequence of branch choices (from fetching br instructions) and store-forwarding information (from executing load instructions) obtained when executing D with initial configuration C. That is, for a schedule D without misspeculated steps:*

$$
\begin{aligned}
Path(C, \emptyset) &= [\,] \\
Path(C, D \| d) &= \begin{cases}
Path(C, D); (i, b), & d = \mathsf{fetch}: b \\
Path(C, D); (i, j), & d \text{ produces } v_{\ell}\{j, a\} \\
Path(C, D); (i, \bot), & d \text{ produces } v_{\ell}\{\bot, a\} \\
Path(C, D), & \text{otherwise}
\end{cases}
\end{aligned}
$$

*where d affects index i. If D has misspeculated steps, then* $Path(C, D) = Path(C, D^*)$*, where* $D^*$ *is the subsequence of D with the misspeculated steps removed. We write simply Path(D) when C is clear from context.*
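As an illustration of Definition B.3.2, the Python sketch below computes Path from a summary of a schedule that has already been executed. It assumes a hypothetical encoding of our own, not the dissertation's: the semantics has already determined, for each directive, the index it affects, whether it was misspeculated, and either the branch guess $b$ (for a fetch of a br) or the store index $j$ that a load forwarded from (`None` for $\bot$). The names `Step`, `path`, and `BOTTOM` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

BOTTOM = "⊥"  # stands in for ⊥: the load read memory rather than a forwarded store

PathEntry = Tuple[int, Union[bool, int, str]]


@dataclass
class Step:
    """Hypothetical summary of one executed directive d."""
    index: int                               # buffer index i that d affects
    kind: str                                # "fetch_br", "exec_load", or anything else
    info: Optional[Union[bool, int]] = None  # guess b, or forwarding store index j
    misspeculated: bool = False


def path(steps: List[Step]) -> List[PathEntry]:
    """Compute Path(C, D) from the executed steps of schedule D."""
    out: List[PathEntry] = []
    for s in steps:
        if s.misspeculated:
            continue  # Path(C, D) = Path(C, D*): misspeculated steps are dropped
        if s.kind == "fetch_br":
            out.append((s.index, s.info))       # fetch: b     -> (i, b)
        elif s.kind == "exec_load":
            j = BOTTOM if s.info is None else s.info
            out.append((s.index, j))            # v{j, a} -> (i, j); v{⊥, a} -> (i, ⊥)
        # every other directive leaves Path unchanged
    return out


steps = [Step(0, "fetch_br", True), Step(1, "exec_load", 0), Step(2, "exec_load"),
         Step(3, "other"), Step(4, "fetch_br", False, misspeculated=True)]
print(path(steps))  # [(0, True), (1, 0), (2, '⊥')]
```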

For [Lemmas B.3.3](#page-172-0), [B.3.5](#page-173-0), and [B.3.6](#page-174-0), we make the following shared assumptions:

- ▶ *C* is an initial configuration.
- ▶ $D_1$ and $D_2$ are nonempty schedules.
- ▶ $C \;{}_{D_1}{\Downarrow}_{O_1}\; C_1$ and $C \;{}_{D_2}{\Downarrow}_{O_2}\; C_2$.
- ▶ $Path(C, D_1) = Path(C, D_2)$.
- ▶ $D_1 = D'_1 \| d_1$ and $D_2 = D'_2 \| d_2$, and $d_1 = d_2$.
- ▶ $d_1$ and $d_2$ affect the same index *i* in their respective reorder buffers.

<span id="page-172-0"></span>Let  $o_1$  (resp.  $o_2$ ) be the observation produced during execution of  $d_1$  (resp.  $d_2$ ).

**Lemma B.3.3** (Fetch). If  $d_1$  and  $d_2$  are both fetch-type directives, then  $C_1.n = C_2.n$  and  $C_1.buf[i] = C_2.buf[i].$ 

*Proof.* Since fetches happen in order, the index *i* of a given physical instruction along a control-flow path is deterministic. $D_1$ and $D_2$ have the same (control-flow) path. Since by hypothesis both $d_1$ and $d_2$ affect the same index *i*, $d_1$ and $d_2$ must necessarily both be fetching the same physical instruction. Furthermore, since $Path(D_1) = Path(D_2)$, if the fetched instruction is a br instruction, then both $d_1$ and $d_2$ must have made the same guess. The lemma statements all hold accordingly. $\Box$
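The first step of this proof, that in-order fetch makes the buffer index of each physical instruction a function of the control-flow path, can be illustrated with a small Python sketch. Everything here is a hypothetical toy: the CFG encoding, the `fetch_indices` helper, and the assumption that buffer indices come from a counter incremented on every fetch.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical toy CFG: each program point maps to ("op", next_pc)
# or to ("br", pc_if_true, pc_if_false).
Node = Union[Tuple[str, int], Tuple[str, int, int]]


def fetch_indices(cfg: Dict[int, Node], start: int,
                  guesses: List[bool], count: int) -> Dict[int, int]:
    """Map each buffer index to the program point fetched into it.

    Fetches happen in order; `guesses` lists the branch choices in fetch
    order (the branch component of Path). Raises StopIteration if a branch
    is reached after `guesses` is exhausted.
    """
    placed: Dict[int, int] = {}
    pc = start
    choices = iter(guesses)
    for i in range(count):
        if pc not in cfg:
            break  # ran off the end of the program
        placed[i] = pc
        node = cfg[pc]
        if node[0] == "br":
            pc = node[1] if next(choices) else node[2]
        else:
            pc = node[1]
    return placed


cfg = {0: ("br", 1, 2), 1: ("op", 3), 2: ("op", 3), 3: ("op", 4)}
print(fetch_indices(cfg, 0, [True], 3))   # {0: 0, 1: 1, 2: 3}
print(fetch_indices(cfg, 0, [False], 3))  # {0: 0, 1: 2, 2: 3}
```

Two runs with the same guesses always place the same program point at every index, which is the determinism the proof relies on; changing the single guess changes which instruction lands at index 1.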

<span id="page-173-1"></span>Corollary B.3.4. *If* $D_1^*$ *and* $D_2^*$ *are nonempty schedules such that* $C \;{}_{D_1^*}{\Downarrow}\; C_1^*$ *and* $C \;{}_{D_2^*}{\Downarrow}\; C_2^*$ *and* $Path(C, D_1^*) = Path(C, D_2^*)$*, then for any* $i \in C_1^*.buf$*, if* $i \in C_2^*.buf$*, then both* $C_1^*.buf[i]$ *and* $C_2^*.buf[i]$ *were derived from the same physical instruction.*

*Proof.* Let $D_1$ be the prefix of $D_1^*$ whose final directive is the latest fetch that affects *i*. Let $D_2$ be defined similarly with respect to $D_2^*$. Then by [Lemma B.3.3](#page-172-0), $D_1$ and $D_2$ both fetch the same physical instruction to index *i*. $\Box$

<span id="page-173-0"></span>**Lemma B.3.5.** *If* $d_1$ *and* $d_2$ *are both execute-type directives, then* $C_1.buf[i] = C_2.buf[i]$ *and* $o_1 = o_2$*.*

*Proof.* We proceed by strong induction on the length of $D_1$.

For the base case: if  $|D_1| = 1$ , then the lemma statements are trivial regardless of the directive  $d_1$ .

For the inductive step, we know from [Corollary B.3.4](#page-173-1) that, since $d_1$ and $d_2$ both affect the same index *i*, the two transient instructions must be derived from the same physical instruction, and thus have the same register dependencies. For each register dependency *r*, if the register was calculated by a transient instruction at a prior index *j*, we can create prefixes $D_{1,j}$ and $D_{2,j}$ of $D_1$ and $D_2$, respectively, that end at the execute directive that resolves *r* at buffer index *j*. By our induction hypothesis, both $D_{1,j}$ and $D_{2,j}$ calculate the same value $v_\ell$ for *r*.

We now proceed by cases on the transient instruction being executed.

*Op, Store (value).* Since all dependencies calculate the same values, both instructions calculate the same value.

*Store (address).* Both instructions calculate the same address. Since $Path(D_1) = Path(D_2)$, both schedules have the same pattern of store-forwarding behavior. Thus execution of $d_1$ causes a hazard if and only if $d_2$ causes a hazard.

*Load.* Both instructions calculate the same address, producing the same observations $o_1$ and $o_2$. Since $Path(D_1) = Path(D_2)$, either $d_1$ and $d_2$ both retrieve their values from the same prior store, or they both load their values from the same address in memory. By our induction hypothesis, these values are the same, so both instructions resolve to the same value.

*Branch.* Both instructions calculate the same branch condition, producing the same observations $o_1$ and $o_2$. Since $Path(D_1) = Path(D_2)$, execution of $d_1$ causes a misspeculation hazard if and only if $d_2$ also causes a misspeculation hazard. $\Box$

<span id="page-174-0"></span>**Lemma B.3.6.** *If* $d_1$ *and* $d_2$ *are both retire directives, then* $o_1 = o_2$*.*

*Proof.* From [Lemmas B.3.3](#page-172-0) and [B.3.5](#page-173-0) we know that for both  $d_1$  and  $d_2$ , the transient instructions to be retired are the same. Thus the produced observations  $o_1$  and  $o_2$  are also the same.  $\Box$ 

We now formally define the set of schedules examined by Pitchfork:

<span id="page-174-1"></span>Definition B.3.7 (Tool schedules). *Given an initial configuration C and a speculative window size n, we define the set of tool schedules* $D_T(n)$ *recursively as follows. The empty schedule* $\emptyset$ *is in* $D_T(n)$*. If* $D_0 \in D_T(n)$*,* $C \;{}_{D_0}{\Downarrow}\; C_0$*, and* $|C_0.buf| < n$*, then we extend* $D_0$ *based on the next instruction to be fetched (where i is the index of the fetched instruction):*

- ▶ *op:* $D_0 \| \mathsf{fetch}; \mathsf{execute}\ i \in D_T(n)$.
- ▶ *load:* $D_0 \| \mathsf{fetch}; \mathsf{execute}\ i \in D_T(n)$.
- ▶ *store:* $D_0 \| \mathsf{fetch}; \mathsf{execute}\ i : \mathsf{value} \in D_T(n)$ and $D_0 \| \mathsf{fetch}; \mathsf{execute}\ i : \mathsf{value}; \mathsf{execute}\ i : \mathsf{addr} \in D_T(n)$.
- ▶ *br:* Let $b$ be the "correct" path for the branch condition. Then $D_0 \| \mathsf{fetch}: b; \mathsf{execute}\ i \in D_T(n)$ and $D_0 \| \mathsf{fetch}: \neg b \in D_T(n)$.

*Otherwise, if* $|C_0.buf| = n$*, then we instead extend based on the oldest instruction in the reorder buffer (where i is now the index of that instruction). If the oldest instruction is a store with an unresolved address that will not cause a hazard, then* $D_0 \| \mathsf{execute}\ i : \mathsf{addr}; \mathsf{retire} \in D_T(n)$*. Otherwise, if the oldest instruction is fully resolved, then* $D_0 \| \mathsf{retire} \in D_T(n)$*.*
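To make the recursion in Definition B.3.7 concrete, the Python sketch below enumerates a toy version of $D_T(n)$. It is not Pitchfork's implementation. Its assumptions are all hypothetical simplifications: the program is a straight-line list of instruction kinds, so both directions of a branch fall through to the next element; the correct direction $b$ of every branch is `true`; directives are plain strings such as `"execute 3: addr"`; and resolving a store address never causes a hazard.

```python
from typing import Iterator, List, Tuple

# A reorder-buffer entry: (buffer index, instruction kind, fully resolved?)
Entry = Tuple[int, str, bool]


def tool_schedules(program: List[str], n: int) -> Iterator[List[str]]:
    """Yield every schedule of a toy D_T(n) for a straight-line program
    of kinds "op", "load", "store", and "br"."""
    if n < 1:
        raise ValueError("speculative window n must be at least 1")

    def extend(sched: List[str], pc: int, buf: List[Entry], nxt: int) -> Iterator[List[str]]:
        yield sched  # every intermediate schedule is itself in D_T(n)
        if len(buf) < n:
            if pc == len(program):
                return  # nothing left to fetch
            kind, i = program[pc], nxt
            if kind in ("op", "load"):
                yield from extend(sched + ["fetch", f"execute {i}"],
                                  pc + 1, buf + [(i, kind, True)], nxt + 1)
            elif kind == "store":
                # Leave the address unresolved, so later loads may skip this store...
                yield from extend(sched + ["fetch", f"execute {i}: value"],
                                  pc + 1, buf + [(i, kind, False)], nxt + 1)
                # ...or resolve the address immediately.
                yield from extend(sched + ["fetch", f"execute {i}: value", f"execute {i}: addr"],
                                  pc + 1, buf + [(i, kind, True)], nxt + 1)
            elif kind == "br":
                # Correct guess b, resolved at once.
                yield from extend(sched + ["fetch: true", f"execute {i}"],
                                  pc + 1, buf + [(i, kind, True)], nxt + 1)
                # Guess not-b, never resolved, so it blocks retirement.
                yield from extend(sched + ["fetch: false"],
                                  pc + 1, buf + [(i, kind, False)], nxt + 1)
            else:
                raise ValueError(f"unknown instruction kind: {kind}")
        else:
            i, kind, resolved = buf[0]  # oldest instruction in the buffer
            if kind == "store" and not resolved:
                yield from extend(sched + [f"execute {i}: addr", "retire"], pc, buf[1:], nxt)
            elif resolved:
                yield from extend(sched + ["retire"], pc, buf[1:], nxt)
            # otherwise (an unresolved mispredicted branch) the schedule ends here

    yield from extend([], 0, [], 0)


print(list(tool_schedules(["op"], 1)))
# [[], ['fetch', 'execute 0'], ['fetch', 'execute 0', 'retire']]
```

For example, with the program `["store", "store", "load"]` and $n = 3$, exactly four schedules end by executing the load, one for each subset of the two prior stores whose addresses were resolved first. This mirrors the "skip any combination of prior stores" argument in the proof of Proposition B.3.8.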

<span id="page-175-0"></span>Proposition B.3.8 (Path coverage). *If* $D_1$ *is a well-formed schedule for C whose reorder buffer never grows beyond size n, then* $\exists D_2 : Path(D_1) = Path(D_2) \land D_2 \in D_T(n)$*.*

*Proof.* The proof stems directly from the definition of  $D_T(n)$ ; at every branch, both branches are added to the set of schedules, and every load is able to "skip" any combination of prior stores.  $\Box$ 

Theorem B.3.9 (Soundness of tool). *If speculative execution of C under a schedule D with speculation bound n produces a trace O that contains at least one secret label, then there exists a schedule*  $D_t \in D_T(n)$  *that produces a trace*  $O_t$  *that also contains at least one secret label.* 

*Proof.* We can truncate $D$ to a schedule $D^*$ that ends at the first directive to produce a secret observation. By [Proposition B.3.8](#page-175-0), there exists a schedule $D_0 \in D_T(n)$ such that $Path(D_0) = Path(D^*)$. By following the construction of tool schedules given in [Definition B.3.7](#page-174-1), we can find a schedule $D_t \in D_T(n)$ that satisfies the preconditions of [Lemma B.3.5](#page-173-0). Then, by that same lemma, $D_t$ produces the same final observation as $D^*$, which contains a secret label. $\Box$

# Bibliography

- [1] Johan Agat. Transforming out timing leaks. In *27th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages*. ACM, 2000.
- <span id="page-176-0"></span>[2] Sam Ainsworth and Timothy M Jones. MuonTrap: Preventing cross-domain Spectre-like attacks by capturing speculative state. In *47th Annual International Symposium on Computer Architecture*. ACM/IEEE, 2020.
- [3] Nadhem J. Al Fardan and Kenneth G. Paterson. Lucky thirteen: Breaking the TLS and DTLS record protocols. In *34th IEEE Symposium on Security and Privacy*. IEEE, 2013.
- [4] Jade Alglave, Anthony Fox, Samin Ishtiaq, Magnus O. Myreen, Susmit Sarkar, Peter Sewell, and Francesco Zappa Nardelli. The semantics of power and ARM multiprocessor machine code. In *Proceedings of the 4th Workshop on Declarative Aspects of Multicore Programming*, 2009.
- [5] José Bacelar Almeida, Manuel Barbosa, Gilles Barthe, Arthur Blot, Benjamin Grégoire, Vincent Laporte, Tiago Oliveira, Hugo Pacheco, Benedikt Schmidt, and Pierre-Yves Strub. Jasmin: High-assurance and high-speed cryptography. In *Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security*. ACM, 2017.
- [6] José Bacelar Almeida, Manuel Barbosa, Gilles Barthe, and François Dupressoir. Verifiable side-channel security of cryptographic implementations: Constant-time MEE-CBC. In *Fast Software Encryption*. Springer, 2016.
- [7] José Bacelar Almeida, Manuel Barbosa, Gilles Barthe, François Dupressoir, and Michael Emmi. Verifying constant-time implementations. In *25th USENIX Security Symposium*. USENIX Association, 2016.
- [8] José Bacelar Almeida, Manuel Barbosa, Jorge S. Pinto, and Bárbara Vieira. Formal verification of side-channel countermeasures using self-composition. *Science of Computer Programming*, 2013.
- [9] AMD. Security analysis of AMD predictive store forwarding. [https://www.amd.com/system/files/documents/security-analysis-predictive-store-forwarding.pdf,](https://www.amd.com/system/files/documents/security-analysis-predictive-store-forwarding.pdf) 2020.
- [10] Marc Andrysco, David Kohlbrenner, Keaton Mowery, Ranjit Jhala, Sorin Lerner, and Hovav Shacham. On subnormal floating point and abnormal timing. In *36th IEEE Symposium on Security and Privacy*. IEEE, 2015.
- [11] Andrew W. Appel. Verification of a cryptographic primitive: SHA-256. *ACM Transactions on Programming Languages and Systems*, 2015.
- [12] ARM. Straight-line speculation. [https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/downloads/straight-line-speculation,](https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/downloads/straight-line-speculation) 2020.
- [13] Arm Mbed. mbed TLS. [https://github.com/armmbed/mbedtls,](https://github.com/armmbed/mbedtls) 2018.
- [14] Jean-Philippe Aumasson and Yolan Romailler. Automated testing of crypto software using differential fuzzing. *Black Hat USA*, 2017.
- [15] Gilles Barthe, Gustavo Betarte, Juan Campo, Carlos Luna, and David Pichardie. System-level non-interference for constant-time cryptography. In *Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security*. ACM, 2014.
- [16] Gilles Barthe, Sunjay Cauligi, Benjamin Grégoire, Adrien Koutsos, Kevin Liao, Tiago Oliveira, Swarn Priya, Tamara Rezk, and Peter Schwabe. High-assurance cryptography in the Spectre era. In *IEEE S&P*, 2021.
- [17] Gilles Barthe, Benjamin Grégoire, and Vincent Laporte. Secure compilation of side-channel countermeasures: the case of cryptographic "constant-time". In *Computer Security Foundations Symposium*, 2018.
- [18] Gilles Barthe, Tamara Rezk, and Martijn Warnier. Preventing timing leaks through transactional branching instructions. *Electronic Notes in Theoretical Computer Science*, 2006.
- [19] Lennart Beringer, Adam Petcher, Katherine Q. Ye, and Andrew W. Appel. Verified correctness and security of OpenSSL HMAC. In *24th USENIX Security Symposium*, 2015.
- [20] Daniel J. Bernstein. Cache-timing attacks on AES. Technical report, 2005. [https://cr.yp.to/antiforgery/cachetiming-20050414.pdf.](https://cr.yp.to/antiforgery/cachetiming-20050414.pdf)
- [21] Daniel J. Bernstein. The Poly1305-AES message-authentication code. In *Fast Software Encryption*. IACR, 2005.
- [22] Daniel J. Bernstein. Curve25519: New Diffie-Hellman speed records. In *International Workshop on Public Key Cryptography*. Springer, 2006.
- [23] Daniel J. Bernstein. qhasm: Tools to help write high-speed software. [https://cr.yp.to/qhasm.html,](https://cr.yp.to/qhasm.html) 2007.
- [24] Daniel J. Bernstein. The Salsa20 family of stream ciphers. In *New Stream Cipher Designs*. Springer, 2008.
- [25] Daniel J. Bernstein. Cryptography in NaCl. Technical report, 2009. [http://cr.yp.to/highspeed/naclcrypto-20090310.pdf.](http://cr.yp.to/highspeed/naclcrypto-20090310.pdf)
- [26] Daniel J. Bernstein, Tanja Lange, and Peter Schwabe. The security impact of a new cryptographic library. In *International Conference on Cryptology and Information Security in Latin America*. Springer, 2012.
- [27] Karthikeyan Bhargavan, Cédric Fournet, Markulf Kohlweiss, Alfredo Pironti, and Pierre-Yves Strub. Implementing TLS with verified cryptographic security. In *2013 IEEE Symposium on Security and Privacy*. IEEE, 2013.
- [28] Atri Bhattacharyya, Andrés Sánchez, Esmaeil M Koruyeh, Nael Abu-Ghazaleh, Chengyu Song, and Mathias Payer. SpecROP: Speculative exploitation of ROP chains. In *23rd International Symposium on Research in Attacks, Intrusions and Defenses*, 2020.
- <span id="page-178-2"></span>[29] Atri Bhattacharyya, Alexandra Sandulescu, Matthias Neugschwandtner, Alessandro Sorniotti, Babak Falsafi, Mathias Payer, and Anil Kurmus. SMoTherSpectre: Exploiting speculative execution through port contention. In *Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security*, 2019.
- [30] Barry Bond, Chris Hawblitzel, Manos Kapritsos, K. Rustan M. Leino, Jacob R. Lorch, Bryan Parno, Ashay Rane, Srinath Setty, and Laure Thompson. Vale: Verifying high-performance cryptographic assembly code. In *26th USENIX Security Symposium*. USENIX Association, 2017.
- [31] Benjamin A. Braun, Suman Jana, and Dan Boneh. Robust and efficient elimination of cache and timing side channels. [https://arxiv.org/abs/1506.00189,](https://arxiv.org/abs/1506.00189) 2015.
- [32] David Brumley and Dan Boneh. Remote timing attacks are practical. *Computer Networks*, 2005.
- <span id="page-178-3"></span>[33] Claudio Canella, Daniel Genkin, Lukas Giner, Daniel Gruss, Moritz Lipp, Marina Minkin, Daniel Moghimi, Frank Piessens, Michael Schwarz, Berk Sunar, Jo Van Bulck, and Yuval Yarom. Fallout: Leaking data on Meltdown-resistant CPUs. In *Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security*. ACM, 2019.
- <span id="page-178-1"></span>[34] Claudio Canella, Sai Manoj Pudukotai Dinakarrao, Daniel Gruss, and Khaled N Khasawneh. Evolution of defenses against transient-execution attacks. In *Great Lakes Symposium on VLSI*, 2020.
- <span id="page-178-0"></span>[35] Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, and Daniel Gruss. A systematic evaluation of transient execution attacks and defenses. In *28th USENIX Security Symposium*. USENIX Association, 2019.
- [36] Sunjay Cauligi, Craig Disselkoen, Klaus v Gleissenthall, Dean Tullsen, Deian Stefan, Tamara Rezk, and Gilles Barthe. Constant-time foundations for the new Spectre era. In *Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation*, 2020.
- [37] Sunjay Cauligi, Craig Disselkoen, Klaus von Gleissenthall, Dean Tullsen, Deian Stefan, Tamara Rezk, and Gilles Barthe. Towards constant-time foundations for the new Spectre era. [https://arxiv.org/pdf/1910.01755v2.pdf,](https://arxiv.org/pdf/1910.01755v2.pdf) 2019.
- [38] Sunjay Cauligi, Gary Soeller, Fraser Brown, Brian Johannesmeyer, Yunlu Huang, Ranjit Jhala, and Deian Stefan. FaCT: A flexible, constant-time programming language. In *Secure Development Conference*. IEEE, 2017.
- [39] Sunjay Cauligi, Gary Soeller, Brian Johannesmeyer, Fraser Brown, Riad S. Wahby, John Renner, Benjamin Grégoire, Gilles Barthe, Ranjit Jhala, and Deian Stefan. FaCT: A DSL for timing-sensitive computation. [https://fact.programming.systems/FaCT\_extended.pdf,](https://fact.programming.systems/FaCT_extended.pdf) 2019.
- [40] Sunjay Cauligi, Gary Soeller, Brian Johannesmeyer, Fraser Brown, Riad S. Wahby, John Renner, Benjamin Grégoire, Gilles Barthe, Ranjit Jhala, and Deian Stefan. FaCT: A DSL for timing-sensitive computation. In *40th ACM SIGPLAN Conference on Programming Language Design and Implementation*. ACM, 2019.
- [41] Kevin Cheang, Cameron Rasmussen, Sanjit Seshia, and Pramod Subramanyan. A formal approach to secure speculation. In *2019 IEEE 32nd Computer Security Foundations Symposium*, 2019.
- [42] Tien-Fu Chen and Jean-Loup Baer. Reducing memory latency via non-blocking and prefetching caches. In *5th ACM International Conference on Architectural Support for Programming Languages and Operating Systems*. ACM, 1992.
- [43] David G. Clarke, John M. Potter, and James Noble. Ownership types for flexible alias protection. In *Proceedings of the 13th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications*. ACM, 1998.
- [44] Robert J Colvin and Kirsten Winter. An abstract semantics of speculative execution for reasoning about security vulnerabilities. In *International Symposium on Formal Methods*, 2019.
- [45] Bart Coppens, Ingrid Verbauwhede, Koen De Bosschere, and Bjorn De Sutter. Practical mitigations for timing-based side-channel attacks on modern x86 processors. In *30th IEEE Symposium on Security and Privacy*. IEEE, 2009.
- [46] Cryptography Coding Standard. Coding rules. [https://cryptocoding.net/index.php/Coding\_rules,](https://cryptocoding.net/index.php/Coding_rules) 2016.
- [47] Lesly-Ann Daniel, Sébastien Bardin, and Tamara Rezk. Hunting the haunter: Efficient relational symbolic execution for Spectre with Haunted RelSE. In *Network and Distributed Systems Security Symposium 2021*. Internet Society, 2021.
- [48] Lesly-Ann Daniel, Sébastien Bardin, and Tamara Rezk. Binsec/Rel: Efficient relational symbolic execution for constant-time at binary-level. In *41st IEEE Symposium on Security and Privacy*. IEEE, 2020.
- [49] Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. *Tools and Algorithms for the Construction and Analysis of Systems*, 2008.
- [50] Frank Denis. libsodium. [https://github.com/jedisct1/libsodium,](https://github.com/jedisct1/libsodium) 2019.
- [51] Craig Disselkoen, Radha Jagadeesan, Alan Jeffrey, and James Riely. The code that never ran: Modeling attacks on speculative evaluation. In *40th IEEE Symposium on Security and Privacy*. IEEE, 2019.
- [52] Goran Doychev, Boris Köpf, Laurent Mauborgne, and Jan Reineke. CacheAudit: A tool for the static analysis of cache side channels. *ACM Transactions on Information and System Security*, 2015.
- [53] Andres Erbsen, Jade Philipoom, Jason Gross, Robert Sloan, and Adam Chlipala. Systematic generation of fast elliptic curve cryptography implementations. Technical report, 2018. [https://people.csail.mit.edu/jgross/personal-website/papers/2018-fiat-crypto-pldi-draft.pdf.](https://people.csail.mit.edu/jgross/personal-website/papers/2018-fiat-crypto-pldi-draft.pdf)
- [54] Dmitry Evtyushkin, Ryan Riley, Nael Abu-Ghazaleh, and Dmitry Ponomarev. BranchScope: A new side-channel attack on directional branch predictor. In *23rd International Conference on Architectural Support for Programming Languages and Operating Systems*. ACM, 2018.
- [55] Mohammad Rahmani Fadiheh, Johannes Müller, Raik Brinkmann, Subhasish Mitra, Dominik Stoffel, and Wolfgang Kunz. A formal approach for detecting vulnerabilities to transient execution attacks in out-of-order processors. In *57th ACM/IEEE Design Automation Conference*. ACM/IEEE, 2020.
- [56] Matt Fleming. A thorough introduction to eBPF. [https://lwn.net/Articles/740157/,](https://lwn.net/Articles/740157/) 2017.
- [57] Jacob Fustos, Farzad Farshchi, and Heechul Yun. SpectreGuard: An efficient data-centric defense mechanism against Spectre attacks. In *56th ACM/IEEE Design Automation Conference*, 2019.
- [58] GCC Team. Using the GNU Compiler Collection (GCC): Instrumentation options. [https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html,](https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html) 2019.
- [59] Qian Ge, Yuval Yarom, David Cock, and Gernot Heiser. A survey of microarchitectural timing attacks and countermeasures on contemporary hardware. *Journal of Cryptographic Engineering*, 2018.
- [60] Jay L Gischer. The equational theory of pomsets. *Theoretical Computer Science*, 1988.
- [61] Enes Göktas, Kaveh Razavi, Georgios Portokalidis, Herbert Bos, and Cristiano Giuffrida. Speculative probing: Hacking blind in the Spectre era. In *Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security*, 2020.
- [62] Ben Gras, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. Translation leak-aside buffer: Defeating cache side-channel protections with TLB attacks. In *27th USENIX Security Symposium*, 2018.
- [63] Roberto Guanciale, Musard Balliu, and Mads Dam. InSpectre: Breaking and fixing microarchitectural vulnerabilities by formal analysis. In *CCS*, 2020.
- [64] Marco Guarnieri, Boris Köpf, José F. Morales, Jan Reineke, and Andrés Sánchez. SPECTECTOR: Principled detection of speculative information flows. In *41st IEEE Symposium on Security and Privacy*. IEEE, 2020.
- [65] Marco Guarnieri, Boris Köpf, Jan Reineke, and Pepe Vila. Hardware-software contracts for secure speculation. In *42nd IEEE Symposium on Security and Privacy*. IEEE, 2021.
- [66] Shengjian Guo, Yueqi Chen, Peng Li, Yueqiang Cheng, Huibo Wang, Meng Wu, and Zhiqiang Zuo. SpecuSym: Speculative symbolic execution for cache timing leak detection. In *Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering*, 2020.
- [67] Andreas Haas, Andreas Rossberg, Derek L. Schuff, Ben L. Titzer, Michael Holman, Dan Gohman, Luke Wagner, Alon Zakai, and Jf Bastien. Bringing the web up to speed with WebAssembly. In *Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation*, 2017.
- [68] Pat Hickey. Announcing Lucet: Fastly's native WebAssembly compiler and runtime. [https://www.fastly.com/blog/announcing-lucet-fastly-native-webassembly-compiler-runtime,](https://www.fastly.com/blog/announcing-lucet-fastly-native-webassembly-compiler-runtime) 2019.
- [69] Jann Horn. Speculative execution, variant 4: speculative store bypass. [https://bugs.chromium.org/p/project-zero/issues/detail?id=1528,](https://bugs.chromium.org/p/project-zero/issues/detail?id=1528) 2018.
- [70] Intel. Speculative store bypass / CVE-2018-3639 / INTEL-SA-00115. [https://software.intel.com/security-software-guidance/software-guidance/speculative-store-bypass,](https://software.intel.com/security-software-guidance/software-guidance/speculative-store-bypass) 2018.
- [71] Intel. Deep dive: Intel analysis of microarchitectural data sampling. [https://software.intel.com/security-software-guidance/software-guidance/microarchitectural-data-sampling,](https://software.intel.com/security-software-guidance/software-guidance/microarchitectural-data-sampling) 2019.
- [72] Intel. An Optimized Mitigation Approach for Load Value Injection. [https://software.intel.com/security-software-guidance/best-practices/optimized-mitigation-approach-load-value-injection,](https://software.intel.com/security-software-guidance/best-practices/optimized-mitigation-approach-load-value-injection) 2020.
- [73] Intel. Side channel mitigation by product CPU model. [https://software.intel.com/security-software-guidance/processors-affected-transient-execution-attack-mitigation-product-cpu-model,](https://software.intel.com/security-software-guidance/processors-affected-transient-execution-attack-mitigation-product-cpu-model) 2020.
- [74] Intel 64 and IA-32 architectures software developer's manual, 2021.
- [75] Saad Islam, Ahmad Moghimi, Ida Bruhns, Moritz Krebbel, Berk Gulmezoglu, Thomas Eisenbarth, and Berk Sunar. SPOILER: Speculative load hazards boost rowhammer and cache attacks. In *28th USENIX Security Symposium*. USENIX Association, 2019.
- [76] Md Hafizul Islam Chowdhuryy, Hang Liu, and Fan Yao. BranchSpec: Information leakage attacks exploiting speculative branch instruction executions. In *38th International Conference on Computer Design*. IEEE, 2020.
- [77] Jann Horn. Reading privileged memory with a side-channel. [https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html,](https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html) 2018.
- [78] Ira Ray Jenkins, Prashant Anantharaman, Rebecca Shapiro, J. Peter Brady, Sergey Bratus, and Sean W. Smith. Ghostbusting: Mitigating Spectre with intraprocess memory isolation. In *Proceedings of the 7th Symposium on Hot Topics in the Science of Security*. ACM, 2020.
- [79] Burt Kaliski. PKCS #7: Cryptographic Message Syntax Version 1.5. RFC 2315, 1998.
- [80] Khaled N Khasawneh, Esmaeil Mohammadian Koruyeh, Chengyu Song, Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh. SafeSpec: Banishing the Spectre of a Meltdown with leakage-free speculation. In *56th ACM/IEEE Design Automation Conference*. ACM/IEEE, 2019.
- [81] Vladimir Kiriansky, Ilia Lebedev, Saman Amarasinghe, Srinivas Devadas, and Joel Emer. DAWG: A defense against cache timing attacks in speculative execution processors. In *51st Annual IEEE/ACM International Symposium on Microarchitecture*. IEEE, 2018.
- [82] Vladimir Kiriansky and Carl Waldspurger. Speculative Buffer Overflows: Attacks and Defenses. [https://arxiv.org/pdf/1807.03757.pdf,](https://arxiv.org/pdf/1807.03757.pdf) 2018.
- [83] Ofek Kirzner and Adam Morrison. An analysis of speculative type confusion vulnerabilities in the wild. In *30th USENIX Security Symposium*, 2021.
- [84] Paul Kocher. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In *Advances in Cryptology*. Springer, 1996.
- [85] Paul Kocher. Spectre mitigations in Microsoft's C/C++ compiler. [https://www.paulkocher.com/doc/MicrosoftCompilerSpectreMitigation.html,](https://www.paulkocher.com/doc/MicrosoftCompilerSpectreMitigation.html) 2018.
- [86] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre attacks: Exploiting speculative execution. In *40th IEEE Symposium on Security and Privacy*. IEEE, 2019.
- [87] Esmaeil Mohammadian Koruyeh, Khaled N. Khasawneh, Chengyu Song, and Nael Abu-Ghazaleh. Spectre returns! Speculation attacks using the return stack buffer. In *12th USENIX Workshop on Offensive Technologies (WOOT 18)*. USENIX Association, 2018.
- [88] Esmaeil Mohammadian Koruyeh, Shirin Haji Amin Shirazi, Khaled N Khasawneh, Chengyu Song, and Nael Abu-Ghazaleh. SPECCFI: Mitigating Spectre attacks using CFI informed speculation. In *41st IEEE Symposium on Security and Privacy*, 2020.
- [89] Shuvendu K. Lahiri, Sanjit A. Seshia, and Randal E. Bryant. Modeling and verification of out-of-order microprocessors in UCLID. In *International Conference on Formal Methods in Computer-Aided Design*. Springer, 2002.
- [90] Adam Langley. curve25519-donna. [https://github.com/agl/curve25519-donna.](https://github.com/agl/curve25519-donna)
- [91] Adam Langley. ImperialViolet: Lucky thirteen attack on TLS CBC. [https://www.imperialviolet.org/2013/02/04/luckythirteen.html,](https://www.imperialviolet.org/2013/02/04/luckythirteen.html) 2013.
- [92] Peinan Li, Lutan Zhao, Rui Hou, Lixin Zhang, and Dan Meng. Conditional speculation: An effective approach to safeguard out-of-order execution against Spectre attacks. In *IEEE International Symposium on High Performance Computer Architecture*, 2019.
- [93] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. Meltdown: Reading kernel memory from user space. In *27th USENIX Security Symposium*. USENIX Association, 2018.
- [94] Chang Liu, Michael Hicks, and Elaine Shi. Memory trace oblivious program execution. In *IEEE 26th Computer Security Foundations Symposium*. IEEE, 2013.
- [95] Kevin Loughlin, Ian Neal, Jiacheng Ma, Elisa Tsai, Ofir Weisse, Satish Narayanasamy, and Baris Kasikci. DOLMA: Securing speculation with the principle of transient nonobservability. In *30th USENIX Security Symposium*, 2021.
- [96] Sergio Maffeis, John C Mitchell, and Ankur Taly. Object capabilities and isolation of untrusted web applications. In *31st IEEE Symposium on Security and Privacy*, 2010.
- [97] Giorgi Maisuradze and Christian Rossow. ret2spec: Speculative execution using return stack buffers. In *Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security*. ACM, 2018.
- [98] Andrea Mambretti, Matthias Neugschwandtner, Alessandro Sorniotti, Engin Kirda, William Robertson, and Anil Kurmus. Speculator: a tool to analyze speculative execution attacks and mitigations. In *Proceedings of the 35th Annual Computer Security Applications Conference*, 2019.
- [99] Andrea Mambretti, Alexandra Sandulescu, Alessandro Sorniotti, William Robertson, Engin Kirda, and Anil Kurmus. Bypassing memory safety mechanisms through speculative control flow hijacks. [https://arxiv.org/pdf/2003.05503.pdf,](https://arxiv.org/pdf/2003.05503.pdf) 2020.
- [100] Ross McIlroy, Jaroslav Sevcik, Tobias Tebbi, Ben L. Titzer, and Toon Verwaest. Spectre is here to stay: An analysis of side-channels and speculative execution. [https://arxiv.org/pdf/1902.05178,](https://arxiv.org/pdf/1902.05178) 2019.
- [101] Tyler McMullen. Lucet: A compiler and runtime for high-concurrency low-latency sandboxing. Principles of Secure Compilation, 2020.
- [102] Microsoft. Spectre mitigations in MSVC. [https://devblogs.microsoft.com/cppblog/spectre-mitigations-in-msvc/,](https://devblogs.microsoft.com/cppblog/spectre-mitigations-in-msvc/) 2018.
- [103] John C. Mitchell, Rahul Sharma, Deian Stefan, and Joe Zimmerman. Information-flow control for programming on encrypted data. In *Computer Security Foundations Symposium*. IEEE, 2012.
- [104] Bodo Moeller. Security of CBC ciphersuites in SSL/TLS: Problems and countermeasures. [https://www.openssl.org/~bodo/tls-cbc.txt,](https://www.openssl.org/~bodo/tls-cbc.txt) 2004.
- [105] Ahmad Moghimi, Jan Wichelmann, Thomas Eisenbarth, and Berk Sunar. MemJam: A false dependency attack against constant-time crypto implementations. *International Journal of Parallel Programming*, 2019.
- [106] Daniel Moghimi. Data sampling on MDS-resistant 10th Generation Intel Core (Ice Lake). *arXiv:2007.07428*, 2020.
- [107] Daniel Moghimi, Moritz Lipp, Berk Sunar, and Michael Schwarz. Medusa: Microarchitectural data leakage via automated attack synthesis. In *29th USENIX Security Symposium*, 2020.
- [108] David Molnar, Matt Piotrowski, David Schultz, and David Wagner. The program counter security model: Automatic detection and removal of control-flow side channel attacks. In *Information Security and Cryptology*. Springer, 2006.
- [109] Andrew C. Myers. JFlow: Practical mostly-static information flow control. In *26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages*. ACM, 1999.
- [110] Andrew C. Myers, Lantian Zheng, Steve Zdancewic, Stephen Chong, and Nathaniel Nystrom. Jif: Java information flow, 2006. [http://www.cs.cornell.edu/jif.](http://www.cs.cornell.edu/jif)
- [111] Shravan Narayan, Craig Disselkoen, Daniel Moghimi, Sunjay Cauligi, Evan Johnson, Zhao Gang, Anjo Vahldiek-Oberwagner, Ravi Sahita, Hovav Shacham, Dean Tullsen, and Deian Stefan. Swivel: Hardening WebAssembly against Spectre. In *30th USENIX Security Symposium*, 2021.
- [112] Van Chan Ngo, Mario Dehesa-Azuara, Matthew Fredrikson, and Jan Hoffmann. Verifying and synthesizing constant-resource implementations with types. In *38th IEEE Symposium on Security and Privacy*. IEEE, 2017.
- [113] Oleksii Oleksenko, Bohdan Trach, Mark Silberstein, and Christof Fetzer. SpecFuzz: Bringing Spectre-type vulnerabilities to the surface. In *29th USENIX Security Symposium*, 2020.
- [114] The OpenSSL Project. OpenSSL. [https://github.com/openssl/openssl.](https://github.com/openssl/openssl)
- [115] Dag Arne Osvik, Adi Shamir, and Eran Tromer. Cache attacks and countermeasures: the case of AES. In *Cryptographers' Track at the RSA Conference*. Springer, 2006.
- [116] Marco Patrignani and Marco Guarnieri. Exorcising Spectres with secure compilers. [https://arxiv.org/pdf/1910.08607,](https://arxiv.org/pdf/1910.08607) 2020.
- [117] Jérémy Planul and John C. Mitchell. Oblivious program execution and path-sensitive non-interference. In *26th IEEE Computer Security Foundations Symposium*. IEEE, 2013.
- [118] Thomas Pornin. Why constant-time crypto? [https://www.bearssl.org/constanttime.html,](https://www.bearssl.org/constanttime.html) 2016.
- [119] Thomas Pornin. Constant-time toolkit. [https://github.com/pornin/CTTK,](https://github.com/pornin/CTTK) 2018.
- [120] Jonathan Protzenko, Jean-Karim Zinzindohoué, Aseem Rastogi, Tahina Ramananandro, Peng Wang, Santiago Zanella-Béguelin, Antoine Delignat-Lavaud, Cătălin Hrițcu, Karthikeyan Bhargavan, Cédric Fournet, and Nikhil Swamy. Verified low-level programming embedded in F\*. *Proceedings of the ACM on Programming Languages*, 2017.
- [121] Zhenxiao Qi, Qian Feng, Yueqiang Cheng, Mengjia Yan, Peng Li, Heng Yin, and Tao Wei. SpecTaint: Speculative taint analysis for discovering Spectre gadgets. In *Network and Distributed System Security Symposium*, 2021.
- [122] Ashay Rane, Calvin Lin, and Mohit Tiwari. Raccoon: Closing digital side-channels through obfuscated execution. In *24th USENIX Security Symposium*. USENIX Association, 2015.
- [123] Charles Reis, Alexander Moshchuk, and Nasko Oskov. Site isolation: Process separation for web sites within the browser. In *28th USENIX Security Symposium*, 2019.
- [124] Xida Ren, Logan Moody, Mohammadkazem Taram, Matthew Jordan, Dean M Tullsen, and Ashish Venkat. I see dead µops: Leaking secrets via Intel/AMD micro-op caches. In *ACM/IEEE 48th Annual International Symposium on Computer Architecture*, 2021.
- [125] Oscar Reparaz, Josep Balasch, and Ingrid Verbauwhede. Dude, is my code constant time? In *2017 Design, Automation & Test in Europe Conference & Exhibition*. IEEE, 2017.
- [126] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In *Proceedings of the 16th ACM Conference on Computer and Communications Security*. ACM, 2009.
- [127] Bruno Rodrigues, Fernando Magno Quintão Pereira, and Diego F. Aranha. Sparse representation of implicit flows with applications to side-channel detection. In *25th International Conference on Compiler Construction*. ACM, 2016.
- [128] Stephen Röttger and Artur Janc. A Spectre proof-of-concept for a Spectre-proof web. [https://security.googleblog.com/2021/03/a-spectre-proof-of-concept-for-spectre.html,](https://security.googleblog.com/2021/03/a-spectre-proof-of-concept-for-spectre.html) 2021.
- [129] Andrei Sabelfeld and Andrew C. Myers. Language-based information-flow security. *IEEE Journal on Selected Areas in Communications*, 2003.
- [130] Gururaj Saileshwar and Moinuddin K Qureshi. CleanupSpec: An "undo" approach to safe speculation. In *52nd Annual IEEE/ACM International Symposium on Microarchitecture*, 2019.
- [131] Michael Schwarz, Claudio Canella, Lukas Giner, and Daniel Gruss. Store-to-leak forwarding: Leaking data on Meltdown-resistant CPUs. [https://arxiv.org/pdf/1905.05725,](https://arxiv.org/pdf/1905.05725) 2019.
- [132] Michael Schwarz, Moritz Lipp, Claudio Canella, Robert Schilling, Florian Kargl, and Daniel Gruss. ConTExT: A generic approach for mitigating Spectre. In *NDSS*, 2020.
- [133] Michael Schwarz, Moritz Lipp, Daniel Moghimi, Jo Van Bulck, Julian Stecklina, Thomas Prescher, and Daniel Gruss. ZombieLoad: Cross-Privilege-Boundary Data Sampling. In *Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security*. ACM, 2019.
- [134] Michael Schwarz, Martin Schwarzl, Moritz Lipp, Jon Masters, and Daniel Gruss. NetSpectre: Read arbitrary memory over network. In *European Symposium on Research in Computer Security*, 2019.
- [135] Martin Schwarzl, Claudio Canella, Daniel Gruss, and Michael Schwarz. Specfuscator: Evaluating branch removal as a Spectre mitigation. In *Financial Cryptography and Data Security*, 2021.
- [136] Vedvyas Shanbhogue, Deepak Gupta, and Ravi Sahita. Security analysis of processor instruction set architecture for enforcing control-flow integrity. In *Proceedings of the 8th International Workshop on Hardware and Architectural Support for Security and Privacy*, 2019.
- [137] Zhuojia Shen, Jie Zhou, Divya Ojha, and John Criswell. Restricting control flow during speculative execution with Venkman. [https://arxiv.org/pdf/1903.10651,](https://arxiv.org/pdf/1903.10651) 2019.
- [138] Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Audrey Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis. In *37th IEEE Symposium on Security and Privacy*. IEEE, 2016.
- [139] Laurent Simon, David Chisnall, and Ross J. Anderson. What you get is what you C: controlling side effects in mainstream C compilers. In *3rd IEEE European Symposium on Security and Privacy*. IEEE, 2018.
- [140] Juraj Somorovsky. Curious padding oracle in OpenSSL (CVE-2016-2107). [https://web-in-security.blogspot.co.uk/2016/05/curious-padding-oracle-in-openssl-cve.html,](https://web-in-security.blogspot.co.uk/2016/05/curious-padding-oracle-in-openssl-cve.html) 2016.
- [141] Deian Stefan, Pablo Buiras, Edward Z. Yang, Amit Levy, David Terei, Alejandro Russo, and David Mazières. Eliminating cache-based timing attacks with instruction-based scheduling. In *European Symposium on Research in Computer Security*. Springer, 2013.
- [142] Marius Sternberger. Spectre-ng: An avalanche of attacks. In *Wiesbaden Workshop on Advanced Microkernel Operating Systems*, 2018.
- [143] Josef Svenningsson and David Sands. Specification and verification of side channel declassification. In *International Workshop on Formal Aspects in Security and Trust*. Springer, 2009.
- [144] Gang Tan. *Principles and Implementation Techniques of Software-Based Fault Isolation*. Now Publishers Inc., 2017.
- [145] Mohammadkazem Taram, Ashish Venkat, and Dean Tullsen. Context-sensitive fencing: Securing speculative execution via microcode customization. In *Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems*, 2019.
- [146] Eran Tromer, Dag Arne Osvik, and Adi Shamir. Efficient cache attacks on AES, and countermeasures. *Journal of Cryptology*, 2010.
- [147] Ming-Hsien Tsai, Bow-Yaw Wang, and Bo-Yin Yang. Certified verification of algebraic properties on low-level mathematical constructs in cryptographic programs. In *Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security*. ACM, 2017.
- [148] Paul Turner. Retpoline: a software construct for preventing branch-target-injection. [https://support.google.com/faqs/answer/7625886,](https://support.google.com/faqs/answer/7625886) 2019.
- [149] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, Yuval Yarom, and Raoul Strackx. Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution. In *27th USENIX Security Symposium*, 2018.
- [150] Jo Van Bulck, Daniel Moghimi, Michael Schwarz, Moritz Lipp, Marina Minkin, Daniel Genkin, Yuval Yarom, Berk Sunar, Daniel Gruss, and Frank Piessens. LVI: Hijacking transient execution through microarchitectural load value injection. In *41st IEEE Symposium on Security and Privacy*, 2020.
- [151] Marco Vassena, Craig Disselkoen, Klaus V Gleissenthall, Sunjay Cauligi, Rami Gökhan Kici, Ranjit Jhala, Dean Tullsen, and Deian Stefan. Automatically eliminating speculative leaks from cryptographic code with Blade. In *Proceedings of the ACM on Programming Languages*, 2021.
- [152] Serge Vaudenay. Security flaws induced by CBC padding: Applications to SSL, IPSEC, WTLS... In *International Conference on the Theory and Applications of Cryptographic Techniques*. Springer, 2002.
- [153] Ilias Vougioukas, Nikos Nikoleris, Andreas Sandberg, Stephan Diestelhorst, Bashir M Al-Hashimi, and Geoff V Merrett. BRB: Mitigating branch predictor side-channels. In *2019 IEEE International Symposium on High Performance Computer Architecture*, 2019.
- [154] Guanhua Wang, Sudipta Chattopadhyay, Arnab Kumar Biswas, Tulika Mitra, and Abhik Roychoudhury. KLEESpectre: Detecting information leakage through speculative cache attacks via symbolic execution. *ACM Transactions on Software Engineering and Methodology*, 2020.
- [155] Guanhua Wang, Sudipta Chattopadhyay, Ivan Gotovchits, Tulika Mitra, and Abhik Roychoudhury. oo7: Low-overhead defense against Spectre attacks via program analysis. *IEEE Transactions on Software Engineering*, 2019.
- [156] Conrad Watt, John Renner, Natalie Popescu, Sunjay Cauligi, and Deian Stefan. CT-Wasm: Type-driven secure cryptography for the web ecosystem. *Proceedings of the ACM on Programming Languages*, 2019.
- [157] WebAssembly Community Group. WebAssembly. [http://webassembly.org,](http://webassembly.org) 2018.
- [158] Ofir Weisse, Ian Neal, Kevin Loughlin, Thomas F Wenisch, and Baris Kasikci. NDA: Preventing speculative execution attacks at their source. In *Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture*, 2019.
- [159] Henry Wong. Store-to-load forwarding and memory disambiguation in x86 processors. [https://blog.stuffedcow.net/2014/01/x86-memory-disambiguation/,](https://blog.stuffedcow.net/2014/01/x86-memory-disambiguation/) 2014.
- [160] Meng Wu, Shengjian Guo, Patrick Schaumont, and Chao Wang. Eliminating timing side-channel leaks using program repair. In *Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis*. ACM, 2018.
- [161] Meng Wu and Chao Wang. Abstract interpretation under speculative execution. In *40th ACM SIGPLAN Conference on Programming Language Design and Implementation*. ACM, 2019.
- [162] Wenjie Xiong and Jakub Szefer. Survey of transient execution attacks and their mitigations. *ACM Computing Surveys*, 2021.
- [163] Mengjia Yan, Jiho Choi, Dimitrios Skarlatos, Adam Morrison, Christopher Fletcher, and Josep Torrellas. InvisiSpec: Making speculative execution invisible in the cache hierarchy. In *51st Annual IEEE/ACM International Symposium on Microarchitecture*, 2018.
- [164] Yuval Yarom, Daniel Genkin, and Nadia Heninger. CacheBleed: a timing attack on OpenSSL constant-time RSA. *Journal of Cryptographic Engineering*, 2017.
- [165] Katherine Q. Ye, Matthew Green, Naphat Sanguansin, Lennart Beringer, Adam Petcher, and Andrew W. Appel. Verified correctness and security of mbedTLS HMAC-DRBG. In *Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security*. ACM, 2017.
- [166] Bennet Yee, David Sehr, Gregory Dardyk, J Bradley Chen, Robert Muth, Tavis Ormandy, Shiki Okasaka, Neha Narula, and Nicholas Fullagar. Native client: A sandbox for portable, untrusted x86 native code. In *30th IEEE Symposium on Security and Privacy*, 2009.
- [167] Jiyong Yu, Mengjia Yan, Artem Khyzha, Adam Morrison, Josep Torrellas, and Christopher W Fletcher. Speculative taint tracking (STT): A comprehensive protection for speculatively accessed data. In *Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture*, 2019.
- [168] Danfeng Zhang, Yao Wang, G. Edward Suh, and Andrew C. Myers. A hardware design language for timing-sensitive information-flow security. *ACM SIGPLAN Notices*, 2015.
- [169] Tao Zhang, Kenneth Koltermann, and Dmitry Evtyushkin. Exploring branch predictors for constructing transient execution trojans. In *Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems*, 2020.
- [170] Lutan Zhao, Peinan Li, Rui Hou, Jiazhen Li, Michael C Huang, Lixin Zhang, Xuehai Qian, and Dan Meng. A lightweight isolation mechanism for secure branch predictors. [https://arxiv.org/pdf/2005.08183,](https://arxiv.org/pdf/2005.08183) 2020.
- [171] Ziqiao Zhou, Michael K. Reiter, and Yinqian Zhang. A software approach to defeating side channels in last-level caches. In *Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security*. ACM, 2016.
- [172] Jean-Karim Zinzindohoué, Karthikeyan Bhargavan, Jonathan Protzenko, and Benjamin Beurdouche. HACL\*: a verified modern cryptographic library. In *Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security*. ACM, 2017.