Building efficient concurrent data structures that scale to the level of GPU parallelism is a challenging problem. Solutions designed for the CPU do not scale to the thousands of cores that modern GPUs offer. In this dissertation, we show how to efficiently build a lock-based B-Tree on the GPU; then, we show how to extend the B-Tree to support snapshots and linearizable multipoint queries. Finally, we show how to compose a special-purpose data structure from general-purpose ones (e.g., hash tables) and discuss the design decisions that make composing data structures easy.
Our B-Tree supports concurrent queries (point, range, and successor) and updates (insertions and deletions). The B-Tree design use fine-grain locks to synchronize between concurrent updates, yet with clever design designs that reduce contention, the tree provide high update throughput. We show how our cache-aware B-Tree design take advantage of the cache-hierarchy of the GPU, achieving lookup throughput that exceeds the DRAM bandwidth of the GPU.
We address the critical question of understanding and providing semantics for concurrent updates and multipoint queries (e.g., range query). Using linearizability, we offer an intuitive understanding of concurrent operations. We show how to build a linearizable GPU B-Tree that uses snapshots to ensure linearizable multipoint queries. To support our snapshot B-Tree, we design a GPU epoch-based-reclamation scheme.
Finally, we show how to compose a graph data structure from general-purpose hash tables resulting in a hash-based graph data structure that excels at updates and point queries. Our graph data structure outperforms sorted-array-based solutions. The design of the data structure is a general one such that we can replace the hash tables with a B-Tree to address graph problems that require sorted adjacency lists.