H.264/AVC is a popular video coding standard used in the fields of communication, video streaming and broadcasting. The H.264/AVC standard as specified in ITU-T — ISO/IEC has two methods of entropy coding, namely Context-based Adaptive Variable Length Coding (CAVLC) and Context-based Adaptive Binary Arithmetic Coding (CABAC). CABAC utilizes probability estimation to achieve a bit-rate reduction of 19% compared to CAVLC.
In order to deal with the higher level of computational complexity of the CABAC entropy coding over the CAVLC coding, the CABAC algorithm is sometimes implemented in hardware to achieve real- time high resolution video coding. The CABAC algorithm can be broken into smaller tasks that can be performed independently. This makes the many-core processor array an appropriate option for the hardware implementation of the CABAC encoding algorithm. The independent tasks within the algorithm can be assigned to individual cores of the array. The KiloCore II, which this thesis uses for its hardware platform, contains hundreds of programmable processors and multiple 64kB shared memories per chip. Lanes of processors are constructed to perform the functions of the blocks within the CABAC.
The aim of this thesis is to compare the throughput results with existing hardware and software implementations of CABAC and to show that the throughput, power and energy in the case of the KiloCore II is competitive. The CABAC algorithm was mapped on the KiloCore II array using the Project manager and Simulator platform. The total area occupied by the algorithm was 3.52 mm 2 in 32 nm technology with 64 cores and 177 routing links. The implementation achieved a throughput of 37 million bins per second at 1.1 V operating voltage and an energy of 34.37 µJ per individual bin at 0.8 V operating voltage.
This implementation of the CABAC has an improvement of 57 times in throughput, when compared to the software implementation, that is, the JM software reference run on the Intel Xeon Processor E5-2680 v2. Despite being a fully-software implementation, the presented KiloCore design achieves a throughput within a factor of five when compared to hardware CABAC implementations scaled to the same 32 nm fabrication technology.