Curricula-Driven Approaches for Efficient Model Training
- Nunez, Elvis
- Advisor(s): Soatto, Stefano
Abstract
As deep learning models and datasets continue to scale up, the financial and environmental costs associated with training these models have grown considerably over the past decade. While hardware accelerators, such as GPUs and TPUs, have made strides in efficiency, these gains are insufficient to offset the training demands of contemporary models. To move beyond the limits of hardware improvements alone, we pursue a complementary approach of developing more algorithmically efficient training methods. In particular, we explore dynamically modulating the capacity of neural networks during training in service of improving training efficiency. Traditional training methods often assume a fixed model capacity and dataset complexity throughout the training process. However, we demonstrate that utilizing the model's full capacity for the entire duration of training is unnecessary and can lead to excessive resource consumption, overfitting, and slow convergence. To this end, we explore several approaches for dynamically modulating both model capacity and data complexity during training. For model capacity, we consider techniques such as dynamic model compression and modifications to the underlying architecture. For data complexity, we consider dynamic adjustments to the input resolution. Central to our approaches are distinct curricula, each governing how the model's capacity and data complexity evolve throughout training. By modulating capacity in alignment with the model's learning stage, we reduce unnecessary computation while maintaining the model's performance. We demonstrate the effectiveness of this modulation across a wide range of vision and language tasks, as well as for various architectures.
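To make the idea of a data-complexity curriculum concrete, the sketch below shows one way an input-resolution schedule could be wired into an ordinary training loop. It is a minimal illustration rather than the dissertation's actual method: the linear schedule, the `resolution_at` helper, and every constant in it are assumptions introduced here for exposition.

```python
import torch
import torch.nn.functional as F
import torchvision

# Hypothetical linear curriculum: train at low resolution early on and
# ramp up to full resolution as training progresses. The schedule shape
# and all constants below are illustrative assumptions, not the thesis method.
def resolution_at(step, total_steps, low=96, high=224, multiple=32):
    frac = min(step / max(total_steps, 1), 1.0)
    res = low + frac * (high - low)
    return int(round(res / multiple) * multiple)  # keep sizes hardware-friendly

model = torchvision.models.resnet18(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
total_steps = 1000

for step in range(total_steps):
    # Stand-in for a real data loader batch.
    images = torch.randn(8, 3, 224, 224)
    labels = torch.randint(0, 10, (8,))

    # Downsample the batch to the resolution dictated by the curriculum,
    # so early steps cost far fewer FLOPs than late ones.
    res = resolution_at(step, total_steps)
    if res != images.shape[-1]:
        images = F.interpolate(images, size=(res, res), mode="bilinear",
                               align_corners=False)

    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
```

The same loop structure could host a capacity curriculum instead, for example by pruning or widening the network on a schedule; the resolution variant is shown only because it requires no changes to the model itself.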