Demands for storing huge volumes of data and limited communication networkbandwidth call for effective data compression with high performance and energy efficiency to
reduce high storage and communication costs for a wide range of systems and applications.
Data compression consists of two types: lossy and lossless. Canonical Huffman encoding and
Gzip are two popular lossless compression techniques. Hardware implementation of such
lossless compression techniques in many-core processors and other hardware platforms is
crucial in achieving optimized performance and energy-efficiency. This dissertation analyzes
static and dynamic canonical Huffman codec algorithms, Gzip compression techniques, and
presents high throughput and energy-efficient hardware and software implementations with
good compression ratios.
Canonical Huffman encoding is naturally sequential which consists of several tasks:finding a histogram of symbol-frequencies, sorting of symbols, Huffman tree creation, building
canonical code tables, and encoding symbols. This dissertation demonstrates energy-efficient
canonical Huffman encoder architectures exploiting task-level parallelism for the above tasks
and introduces a concurrent approach to execute sorting, Huffman tree creation, and code
length computation tasks that yields better memory efficiency and performance than the
conventional approach. The proposed architectures are implemented on a many-core array,
Intel i7, Nvidia GT 750M, Intel FPGA, and 45-nm ASIC.
The many-core encoder implementations achieve a scaled throughput per chip areathat is 89.2x and 4.7x greater on average and 44.7x and 8.2x greater in terms of scaled
energy efficiency (compressed bits encoded per energy) than the Intel i7 and Nvidia GT
750M, respectively executing the common Corpus benchmarks for data compression. Encoder
implementations on the many-core processor array yield scaled throughput per chip area and
scaled energy efficiency that is 58x and 4.8x greater on average than the state-of-the-art
efficient canonical Huffman encoder implementation on Tesla V100 GPUs executing the
enwik8 dataset.
Scaled synthesis results from a proposed pipelined canonical Huffman encoder 45-nm ASIC results in 2.44x greater on throughput, 3.5x lower on total power dissipation, and7.6x lower energy dissipation over Intel FPGA implementations.
Next, this dissertation presents bit-parallel static and dynamic canonical Huffmandecoder implementations using an optimized lookup table approach on a fine-grain manycore
array, Intel i7, Nvidia GT 750M, Intel FPGA, and 45-nm ASIC. The many-core
implementations achieve a scaled throughput per chip area that is 891x and 7x greater on
average and scaled energy efficiency (compressed bits decoded per energy) that is 149.5x and
3.9x greater on average than the i7 and GT 750M, respectively.
The 45-nm ASIC synthesis results show that the pipelined and memory-efficient staticdecoder yields a 5.1x throughput improvement and 13.4x energy efficiency improvement
over the FPGA implementation.
Furthermore, this dissertation presents energy-efficient and high throughput Gziphardware architectures using a chained hash bank memory design of depth three for LZ77
encoder and synthesis results of the Gzip compression engines implemented in a 45-nm ASIC.
The proposed encoder architectures exploit both static and dynamic canonical Huffman
encoders along with a pipelined LZ77 encoder. The pipelined Gzip engine using a static
canonical Huffman encoder with a parallel window size (PWS) of 16 bytes per clock cycle,
achieves a maximum input throughput of 2.53 GB/s, while the dynamic canonical Huffman
encoder-based Gzip compressor achieves a maximum input throughput of 0.52 GB/s. To the
best of our knowledge, this Gzip compressor offers the highest reported compression ratio in
the literature of 2.47 for the Calgary Corpus benchmark.
Finally, this dissertation presents DeepScaleTool, an open-source tool for the accurateestimation of deep-submicron technology scaling by modeling and curve fitting published data
by a leading commercial fabrication company for silicon fabrication technology generations
from 130 nm to 7 nm for the key parameters of area, delay, and energy.