Unlearning and Privacy in Deep Neural Networks
- Golatkar, Aditya
- Advisor(s): Soatto, Stefano
Abstract
We explore the problem of selectively forgetting (or unlearning) a particular subset of the data used for training a deep neural network. While the effects of the data to be forgotten can be hidden from the output of the network, insights may still be gleaned by probing deep into its weights. We propose methods for "scrubbing" the weights clean of information about a particular set of training data without requiring retraining from scratch. In the "white-box" setting, the weights are modified so that any probing function of the weights is indistinguishable from the same function applied to the weights of a network trained without the data to be forgotten. This condition is a generalized and weaker form of Differential Privacy. We then improve upon the white-box forgetting method by generalizing it across different readout functions, and show that it can be extended to ensure forgetting in the final activations of the network in a "black-box" setting. We introduce a new bound on how much information can be extracted per query about the forgotten cohort from a black-box network for which only the input-output behavior is observed. The proposed forgetting procedure has a deterministic part derived from the differential equations of a linearized version of the model, and a stochastic part that ensures information destruction by adding noise tailored to the geometry of the loss landscape. We exploit the connections between the final activations and weight dynamics of a DNN, inspired by Neural Tangent Kernels, to compute the information in the final activations. To improve the deterministic part of the forgetting procedure, we present the first method for linearizing a pre-trained model (Linear Quadratic Fine-tuning, LQF) that achieves performance comparable to non-linear fine-tuning on most of the real-world image classification tasks tested, thus enjoying the interpretability of linear models without incurring punishing losses in performance.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification, which are sufficient to approach the performance of non-linear fine-tuning. We use this to introduce a novel notion of forgetting in a mixed-privacy setting, where we know that a "core" subset of the training samples does not need to be forgotten. While this variation of the problem is conceptually simple, we show that working in this setting significantly improves the accuracy and guarantees of forgetting methods applied to vision classification tasks. Moreover, our method allows efficient removal of all information contained in non-core data by simply setting to zero a subset of the weights with minimal loss in performance, and can achieve close to state-of-the-art accuracy on large-scale vision tasks. To cover the other end of the privacy spectrum, we introduce AdaMix, an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data. AdaMix incorporates few-shot training, or cross-modal zero-shot learning, on public data prior to private fine-tuning, to improve the privacy-accuracy trade-off. AdaMix reduces the error increase from the non-private upper bound from the 167-311% of the baseline, on average across 6 datasets, to 68-92%, depending on the desired privacy level selected by the user. AdaMix tackles the trade-off arising in visual classification, whereby the most privacy-sensitive data, corresponding to isolated points in representation space, are also critical for high classification accuracy. In addition, AdaMix comes with strong theoretical privacy guarantees and convergence analysis.
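The two-part structure of the forgetting procedure described above (a deterministic update followed by noise shaped by the loss geometry) can be illustrated on a toy problem. The following is a minimal sketch, not the thesis's implementation: it assumes a one-parameter ridge regression, where the loss is exactly quadratic, so a single Newton step on the retained data reproduces retraining from scratch. The function names (`ridge_fit`, `scrub`), the `noise_scale` parameter and the inverse-square-root curvature scaling of the noise are all hypothetical choices made for illustration.

```python
import random


def ridge_fit(xs, ys, lam):
    """Closed-form minimizer of 0.5 * sum((w*x - y)^2) + 0.5 * lam * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def scrub(w, xs_retain, ys_retain, lam, noise_scale=0.0, rng=None):
    """Newton step on the retained-data loss, plus optional Gaussian noise.

    Deterministic part: for a quadratic loss, one Newton step moves w to the
    minimizer of the retained-data loss, i.e. the retrained weight.
    Stochastic part (illustrative assumption): noise whose standard deviation
    shrinks as the curvature of the retained-data loss grows.
    """
    grad = sum(x * (w * x - y) for x, y in zip(xs_retain, ys_retain)) + lam * w
    hess = sum(x * x for x in xs_retain) + lam
    w_new = w - grad / hess
    if noise_scale > 0.0:
        rng = rng or random.Random()
        w_new += noise_scale * hess ** -0.5 * rng.gauss(0.0, 1.0)
    return w_new


# Synthetic data: y is roughly 2 * x with a little noise.
data_rng = random.Random(0)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(50)]
ys = [2.0 * x + data_rng.gauss(0.0, 0.1) for x in xs]
lam = 0.1

# Train on everything, then forget the first 10 samples.
w_full = ridge_fit(xs, ys, lam)
xs_retain, ys_retain = xs[10:], ys[10:]

w_retrained = ridge_fit(xs_retain, ys_retain, lam)   # gold standard
w_scrubbed = scrub(w_full, xs_retain, ys_retain, lam)  # no retraining
w_noisy = scrub(w_full, xs_retain, ys_retain, lam,
                noise_scale=0.1, rng=random.Random(1))
```

In this toy case `w_scrubbed` matches `w_retrained` up to floating-point error. For a deep network the loss is not quadratic, so the Newton step is only approximate, which is the role the abstract assigns to the added noise and to linearizing the model.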