In carbon capture and sequestration, rapid and effective imaging techniques are crucial for real-time monitoring of the spatial and temporal dynamics of CO2 propagation during and after injection. With continuing improvements in computational power and data storage, data-driven techniques based on machine learning (ML) have been effectively applied to seismic inverse problems. In particular, ML helps alleviate the ill-posedness and high computational cost of full-waveform inversion (FWI). However, such data-driven inversion techniques require massive, high-quality training data sets to ensure prediction accuracy, which hinders their application to time-lapse monitoring of CO2 sequestration. We develop an efficient “hybrid” time-lapse workflow that combines physics-based FWI and data-driven ML inversion. We address the scarcity of available training data with a new physics-constrained data-generation technique. The method is validated on a synthetic CO2-sequestration model based on the Kimberlina storage reservoir in California. Our approach synthesizes a large volume of high-quality, physically realistic training data, which is critical for accurately characterizing CO2 movement in the reservoir. The hybrid methodology can also simultaneously predict variations in velocity and saturation, and it achieves high spatial resolution in the presence of realistic noise in the data.