Optimizing Diffusion Models for Efficiency and Accountability
- Vora, Jayneel
- Advisor(s): Mohapatra, Prasant
Abstract
Recent advancements in diffusion models have significantly enhanced their application across image, audio, and video generation, yet they continue to face key challenges in efficiency, scalability, privacy, and accountability. This thesis addresses these issues by introducing novel solutions for federated learning, post-training quantization, 3D semantic segmentation, and identity accountability in diffusion models. First, the FedDM framework is developed to improve communication efficiency and handle data heterogeneity, enabling federated training of diffusion models; it uses quantized model updates and proximal terms to ensure model convergence and high-quality generation in non-IID scenarios. Second, a post-training quantization strategy for text-conditional audio diffusion models is proposed, reducing model size by up to 70% while maintaining synthesis fidelity through coverage-driven prompt augmentation and activation-aware calibration. Third, an inference framework is presented to ensure identity accountability in diffusion models, allowing verification of unauthorized data usage; three attacks are explored: membership inference, identity inference, and identity-based data extraction. Finally, a hybrid 2D-3D vision technique is introduced to improve the computational efficiency of 3D semantic segmentation, reducing memory usage and inference time without compromising accuracy. These contributions collectively advance the deployment of diffusion models in decentralized, resource-constrained, and privacy-sensitive environments, paving the way for more efficient and ethical generative models across various domains.
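The abstract mentions that FedDM combines quantized client updates with proximal terms but does not give implementation details. As a minimal, hypothetical sketch (not the thesis implementation), a FedProx-style proximal penalty together with uniform quantization of a client's weight delta could look like this; the function names, the penalty coefficient `mu`, and the 8-bit symmetric quantizer are all illustrative assumptions:

```python
import numpy as np

def proximal_loss(task_loss, local_w, global_w, mu=0.01):
    # FedProx-style penalty: keeps local weights close to the global model
    # so that training remains stable under non-IID client data (assumed form).
    prox = 0.5 * mu * sum(np.sum((lw - gw) ** 2)
                          for lw, gw in zip(local_w, global_w))
    return task_loss + prox

def quantize_update(update, num_bits=8):
    # Symmetric uniform quantization of a weight delta before upload,
    # reducing per-round communication cost (illustrative scheme).
    max_abs = float(np.abs(update).max())
    scale = max_abs / (2 ** (num_bits - 1) - 1) if max_abs > 0 else 1.0
    q = np.round(update / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Server-side reconstruction of the quantized delta.
    return q.astype(np.float32) * scale

# Example: quantize a simulated weight delta and check the reconstruction error.
rng = np.random.default_rng(0)
delta = rng.normal(scale=0.05, size=1000).astype(np.float32)
q, s = quantize_update(delta)
err = float(np.abs(delta - dequantize(q, s)).max())
# Rounding bounds the error by half a quantization step.
```

In a full federated round, each client would minimize `proximal_loss` locally, upload `quantize_update(local_w - global_w)`, and the server would dequantize and average the deltas.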