Video stabilization is one of the most widely sought features in video processing. The problem typically consists of two steps: video motion analysis and stabilized frame synthesis. Traditional video motion analysis relies heavily on local image features and is prone to error in the presence of occlusions and motion blur. It is also challenging to avoid visual artifacts during stabilized frame synthesis due to the complexity of real scenes, e.g., parallax and dynamic objects. In this dissertation, we explore video stabilization methods that generate high-quality, stable results and generalize to real-world videos.
We start with selfie videos, a class of videos that is challenging for all video stabilization methods due to dynamic occlusion. Our solution analyzes the foreground and background motion separately and stabilizes them jointly. Specifically, we use a 3D human face model as a prior on the foreground motion. For the background, our approach tracks randomly sampled pixels using optical flow. We then stabilize the foreground and background motion with nonlinear least-squares optimization. To make the process practical for commercial applications, we also design an online version of the pipeline based on a sliding-window scheme. Finally, we replace the optimization with deep learning, yielding a speedup of several orders of magnitude over state-of-the-art optimization-based approaches.
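As a minimal sketch of the joint optimization step (assuming synthetic trajectories and an illustrative residual design, not our actual formulation), the problem can be posed as a nonlinear least-squares fit over per-frame corrections, solved here with SciPy:

    # Hypothetical sketch: jointly smooth a foreground (face) track and a
    # background track via nonlinear least squares. The data, residual
    # terms, and weights are illustrative placeholders.
    import numpy as np
    from scipy.optimize import least_squares

    rng = np.random.default_rng(0)
    T = 60                                              # number of frames
    fg = np.cumsum(rng.normal(0, 1.0, (T, 2)), axis=0)  # shaky face-center track
    bg = np.cumsum(rng.normal(0, 1.0, (T, 2)), axis=0)  # shaky background track

    def residuals(x, fg, bg, w_smooth=5.0, w_data=1.0):
        """x holds per-frame 2D corrections; penalize deviation from the
        input tracks (data term) and second differences (smoothness)."""
        d = x.reshape(-1, 2)             # per-frame translation correction
        fg_s, bg_s = fg + d, bg + d      # corrected trajectories
        data = w_data * d.ravel()        # stay close to the original path
        smooth = w_smooth * np.concatenate([
            np.diff(fg_s, n=2, axis=0).ravel(),  # foreground smoothness
            np.diff(bg_s, n=2, axis=0).ravel(),  # background smoothness
        ])
        return np.concatenate([data, smooth])

    sol = least_squares(residuals, np.zeros(2 * T), args=(fg, bg))
    corrections = sol.x.reshape(-1, 2)   # per-frame warp to apply

The point of the sketch is the structure of the objective: foreground and background terms share one set of unknowns, so both are stabilized jointly rather than independently.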
Our work also generalizes to general video stabilization. Occlusions in general videos are free-form, so a more general scheme is required to appropriately constrain the stabilization. We propose two solutions based on this principle. Both use optical flow to analyze motion and generate a dense warp field that stabilizes the input frames. The first approach constrains the frame warp field with a global linear transformation. The second uses metrics computed from optical flow to exclude dynamic occlusions and motion boundaries. Our experiments show that our approaches generalize better and produce visually and quantitatively better results than previous video stabilization methods.
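To illustrate the first idea (a minimal sketch under assumed inputs; the flow field is synthetic and the blending weight is a hypothetical knob, not a parameter from the dissertation), a dense warp field can be constrained toward a globally fitted affine transformation as follows:

    # Hypothetical sketch: fit a global affine model to a dense flow field
    # and blend the raw warp toward it, suppressing local distortions.
    import numpy as np

    def fit_global_affine(flow):
        """Least-squares fit of a 2D affine transform to a dense flow
        field of shape (H, W, 2); returns the affine-induced flow."""
        h, w = flow.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w]
        src = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1)
        dst = src[:, :2] + flow.reshape(-1, 2)
        A, *_ = np.linalg.lstsq(src, dst, rcond=None)  # (3, 2) affine params
        return (src @ A - src[:, :2]).reshape(h, w, 2)

    def constrain_warp(flow, alpha=0.7):
        """Blend the raw warp field toward its global affine fit; alpha
        controls how strongly the linear model dominates (assumed knob)."""
        return alpha * fit_global_affine(flow) + (1 - alpha) * flow

    flow = np.random.default_rng(1).normal(0, 0.5, (48, 64, 2))
    stab_flow = constrain_warp(flow)     # smoother, globally consistent warp

The global fit acts as a regularizer: pixels dominated by dynamic objects or motion boundaries contribute outlier flow vectors, and pulling the warp toward a single linear transformation keeps them from distorting the synthesized frame.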