Advancements in Higher Order Ambisonics Compression and Loss Concealment Techniques
- Namazi, Mahmoud
- Advisor(s): Rose, Kenneth
Abstract
The resurgence of virtual reality has intensified interest in higher order Ambisonics (HOA), which can reproduce spatial audio over diverse loudspeaker setups, a capability that is crucial for many applications. HOA represents a 3D soundfield using spherical harmonics and is widely used for spatial audio storage and transmission. However, an HOA signal can comprise up to 64 audio channels, so delivering immersive experiences requires effective compression. This thesis addresses this challenge by proposing new algorithms tailored to the compression and loss concealment of HOA signals. The first part of the thesis presents a new adaptive framework for HOA compression. It uses both the reconstructed data of the previous frame and the data of the current frame to obtain a more relevant set of singular value decomposition (SVD) basis vectors spanning the null space. These vectors extend the set of dominant basis vectors available at the decoder at little bitrate cost, leading to significantly improved audio quality. The second part focuses on low-delay HOA compression. Modern codecs either use a combination of inter-channel and inter-frame linear predictors or combine frame-based SVD with the MDCT. This thesis shows that lower delay and a better bitrate can be achieved by instead applying an adaptive SVD transform for inter-channel decorrelation, one that relies on previously decoded data rather than the current time samples, together with LPC and cascaded long-term prediction (which can capture the periodic components of polyphonic signals) to capture short-term and long-term temporal correlations, respectively.
The third part focuses on loss concealment for HOA. Current methods essentially apply a predictor trained on past and future data to predict the lost frame, but they do not consider the spatial aspects of HOA. The thesis shows that significant improvements can be made by decorrelating the signal using SVD, handling the audio component with a predictor and the spatial basis vectors with sample-by-sample interpolation, and then recombining the audio and spatial components to arrive at a superior estimate of the lost frame.
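As a rough illustration of the SVD-based decorrelation mentioned above, the following Python sketch splits one HOA frame into spatial basis vectors and decorrelated audio signals, then recombines them. It is a minimal sketch under stated assumptions and not the codec or concealment method of the thesis: the function names, the frame size, and the synthetic two-source frame are all hypothetical, and the adaptive, prediction, and interpolation stages are omitted. The channel count uses the standard relation that an order-N HOA signal has (N+1)^2 channels, so 64 channels corresponds to order 7.

```python
import numpy as np


def hoa_channel_count(order: int) -> int:
    """Number of channels of an order-N HOA signal: (N + 1)^2."""
    return (order + 1) ** 2


def svd_decorrelate(frame: np.ndarray, num_dominant: int):
    """Split a (channels x samples) frame into spatial and audio parts.

    Returns the num_dominant left singular vectors (spatial basis) and
    the matching decorrelated signals (singular values times right
    singular vectors). Hypothetical helper, for illustration only.
    """
    u, s, vt = np.linalg.svd(frame, full_matrices=False)
    basis = u[:, :num_dominant]                           # channels x k
    signals = s[:num_dominant, None] * vt[:num_dominant]  # k x samples
    return basis, signals


def recombine(basis: np.ndarray, signals: np.ndarray) -> np.ndarray:
    """Recombine spatial basis and audio signals into an HOA frame."""
    return basis @ signals


# Synthetic example (assumption): two sources mixed into an order-3 frame.
rng = np.random.default_rng(0)
channels = hoa_channel_count(3)
sources = rng.standard_normal((2, 1024))
mixing = rng.standard_normal((channels, 2))
frame = mixing @ sources

basis, signals = svd_decorrelate(frame, num_dominant=2)
approx = recombine(basis, signals)
```

Because the synthetic frame has rank two, keeping two dominant components reproduces it up to numerical precision. A real HOA frame is generally not low rank, so truncation discards the residual that the remaining basis vectors would carry.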