Towards optimal prediction and transforms for video compression
- Bharath Vishwanath, Fnu
- Advisor(s): Rose, Kenneth
Abstract
The focus of the dissertation is on optimal prediction paradigms and the complementary design of transforms for video compression. One main line of research is on motion compensated prediction for spherical videos. Standard approaches project spherical videos onto planes for processing with traditional 2D video coding standards. Such approaches are significantly sub-optimal, as standard video coders only allow for block translations in the critical tool of motion compensated prediction, which is incompatible with the expected motion in projected spherical video. Specifically, the effective sampling density varies over the sphere, and the resulting locally varying warping yields complex non-linear motion in the projected domain. Moreover, a motion vector in the projected domain does not have a useful physical interpretation. To address these shortcomings, the thesis presents a rotational motion model that performs motion compensation in terms of rotations along geodesics on the sphere. Rotation preserves object shape and size on the sphere. A motion vector in this model implicitly specifies an axis of rotation and the degree of rotation about that axis, to convey the actual motion of objects on the sphere. Complementary to the novel motion model, an effective motion search technique that is tailored to the sphere’s geometry is presented for improved motion estimation.
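As a minimal illustration of rotation along a geodesic, the sketch below derives an axis and angle from a pair of unit vectors and applies the rotation with Rodrigues' formula. It assumes the motion vector is represented by a block-center point `p` and its displaced position `q` on the unit sphere; this parameterization and the function names are hypothetical, chosen only for illustration, and are not taken from the thesis.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(v):
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)

def rotation_from_motion(p, q):
    """Axis and angle of the rotation carrying unit vector p to unit
    vector q along the geodesic (great-circle arc) joining them."""
    axis = cross(p, q)
    s = math.sqrt(dot(axis, axis))   # sin(angle)
    c = dot(p, q)                    # cos(angle)
    if s < 1e-12:
        # Degenerate case: p and q coincide. Antipodal points are not
        # handled in this sketch. Return a zero rotation.
        return (0.0, 0.0, 1.0), 0.0
    return normalize(axis), math.atan2(s, c)

def rotate(v, axis, angle):
    """Rodrigues' rotation of v about a unit axis by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    k_x_v = cross(axis, v)
    k_dot_v = dot(axis, v)
    return tuple(v[i] * c + k_x_v[i] * s + axis[i] * k_dot_v * (1.0 - c)
                 for i in range(3))

# Every sample of a block is rotated by the same axis and angle, so
# angular distances between samples (shape and size) are unchanged.
p = (1.0, 0.0, 0.0)
q = (0.0, 1.0, 0.0)
axis, angle = rotation_from_motion(p, q)
block = [normalize((1.0, 0.1, 0.2)), normalize((1.0, -0.1, 0.05))]
moved = [rotate(v, axis, angle) for v in block]
```

Because a single rotation is applied to the whole block, the inner product between any two samples is preserved, which is the sense in which rotation preserves object shape and size on the sphere.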
The thesis then considers an important class of spherical videos whose dynamics involve camera motion. The thesis presents a new geodesic translation motion model that captures the motion field on the sphere, and capitalizes on insights into the perceived motion on the sphere due to camera translation. Specifically, surrounding static points are perceived as moving along their respective geodesics, which all intersect at the poles corresponding to the camera velocity axis. The method further exploits insights into the displacement rate of static points, which depends on object depth and degree of elevation on the sphere (with respect to the camera velocity axis). Complementary to the new motion model, a search grid tailored to capture expected geodesic motion on the sphere for effective motion estimation is presented.
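The dependence of displacement on depth and elevation can be illustrated with elementary geometry. The sketch below assumes a coordinate frame whose z-axis is the camera velocity axis, a static point at distance `depth` from the camera, and a camera displacement `d` along that axis; the function names and this parameterization are illustrative assumptions, not the formulation used in the thesis.

```python
import math

def translate_static_point(theta, phi, depth, d):
    """Apparent position of a static point after the camera moves a
    distance d along the velocity axis (the z-axis of this frame).

    theta: angle from the velocity axis, phi: azimuth about it.
    The azimuth is unchanged, so the point slides along the geodesic
    (meridian) that passes through both poles of the velocity axis.
    """
    theta_new = math.atan2(depth * math.sin(theta),
                           depth * math.cos(theta) - d)
    return theta_new, phi

def to_unit_vector(theta, phi):
    """Point on the unit sphere for the given angles."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def angular_displacement(theta, depth, d):
    """Change in angle from the velocity axis caused by the translation."""
    theta_new, _ = translate_static_point(theta, 0.0, depth, d)
    return theta_new - theta

# Compare displacement near the pole vs. at the equator, and near vs. far.
near_pole = angular_displacement(0.1, depth=1.0, d=0.1)
equator = angular_displacement(math.pi / 2, depth=1.0, d=0.1)
far_equator = angular_displacement(math.pi / 2, depth=10.0, d=0.1)
```

Under these assumptions, a point close to the velocity axis moves less than a point at the equator, and a distant point moves less than a nearby one, which is consistent with the depth and elevation dependence described above.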
Another focus related to predictor optimization is on the design of prediction filters for adaptive compression of non-stationary signals, with applications to video coding. The design poses several challenges, including: i) catastrophic instability due to statistical mismatch driven by error propagation through the prediction loop, and ii) severe non-convexity of the cost surface, which is often riddled with poor local minima. Motivated by these challenges, the thesis presents a near-optimal method for designing prediction modes for adaptive compression. The design builds on a stable, open-loop platform, but with a subterfuge that ensures that it is asymptotically optimized for closed-loop operation. The non-convexity is handled by deterministic annealing, a powerful optimization tool for avoiding poor local minima. The impact of the design paradigm on practical applications is demonstrated by designing temporal prediction filters in video coding.
The second line of research focuses on offline design of transforms for inter-prediction residuals, a complementary step to the effective prediction paradigms. Existing codecs for regular 2D videos use standard trigonometric transforms. These transforms are only optimal under certain assumptions that are highly questionable for inter-prediction residue. For projected spherical videos, derivation of transforms even under certain assumptions is a highly challenging task. Thus, there is a strong motivation for a data-driven approach to learn these transforms. The joint design of multiple transform modes is highly challenging due to critical stability problems inherent to feedback through the codec’s prediction loop, wherein training updates inadvertently impact the signal statistics the transform ultimately operates on, and are often counter-productive. It is the premise of this work that a truly effective switched transform design procedure must account for and circumvent this shortcoming. We introduce a data-driven approach to design optimal transform modes for adaptive switching by the encoder. Most importantly, to overcome the critical stability issues, the approach is derived within an asymptotic closed loop (ACL) design framework. The design yields transforms that outperform the transforms obtained by standard design procedures.