Recently, deep learning models have been extensively adopted in numerous applications, ranging from health care to finance and entertainment. This widespread deployment has raised concerns about the privacy of the data used to train these models. The concern is especially acute for applications built on sensitive data, such as health records, personal information, and biometric data. As a result, a new line of research has emerged that studies attacks aiming to identify the training data of deep models, known as membership inference. Membership inference (MI) attacks determine which samples were used during training and which were not. The first generation of MI attacks mainly used a deep model's prediction confidence as the feature for identifying training samples. The intuition is that deep models are more confident on samples they have seen during training than on samples they have not.
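To make the first-generation attack concrete, the following is a minimal Python sketch, assuming the attacker has access to the target model's confidence on each sample's true label. The function name `confidence_threshold_attack` and the fixed `threshold` are hypothetical illustrations; practical attacks typically tune the threshold, for example with shadow models.

```python
from typing import List, Sequence


def confidence_threshold_attack(confidences: Sequence[float], threshold: float) -> List[bool]:
    """Hypothetical first-generation MI attack: label a sample as a member
    (True) if the model's confidence on its true label exceeds `threshold`,
    and as a non-member (False) otherwise."""
    return [c > threshold for c in confidences]


# Hypothetical confidences of the target model on each sample's true label.
confs = [0.99, 0.62, 0.97, 0.40]
print(confidence_threshold_attack(confs, threshold=0.9))  # [True, False, True, False]
```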
Despite their sound intuition and apparently successful reports, we, along with a few other parallel studies, showed that the first generation of MI attacks is ineffective in practice for several reasons. First, these attacks could not significantly outperform a naive baseline that labels a sample as a member (training sample) if the deep model classifies it correctly and as a non-member (non-training sample) otherwise. Second, for correctly classified samples, which make up the majority of a dataset, the confidence distributions of training and non-training samples are indistinguishable. Only the small portion of misclassified samples exhibits distinct distributions. Third, all these MI attacks report average-case success metrics (e.g., accuracy or ROC-AUC). However, privacy is not an average-case property, and it should be treated like other security and privacy problems, where an attack is considered reliable if it can identify even a few training samples while almost no non-training samples are falsely labeled as training samples. In other words, a reliable MI attack should achieve a decent true-positive rate (TPR) at low false-positive rates (FPR).
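The naive baseline and the TPR-at-low-FPR criterion described above can be sketched in Python as follows. This is a minimal illustration, not the evaluation code used in this work: the helper names `naive_baseline` and `tpr_at_fpr` are hypothetical, higher attack scores are assumed to indicate membership, and the threshold sweep is a simple exhaustive search over the observed scores.

```python
from typing import List, Sequence


def naive_baseline(predictions: Sequence[int], labels: Sequence[int]) -> List[bool]:
    """Label a sample as a member iff the model classifies it correctly."""
    return [p == y for p, y in zip(predictions, labels)]


def tpr_at_fpr(scores: Sequence[float], is_member: Sequence[bool], max_fpr: float) -> float:
    """Highest TPR reachable by predicting 'member' when score >= t,
    over all thresholds t whose FPR stays at or below max_fpr."""
    members = [s for s, m in zip(scores, is_member) if m]
    non_members = [s for s, m in zip(scores, is_member) if not m]
    best = 0.0  # a threshold above every score gives TPR = FPR = 0
    for t in sorted(set(scores)):
        fpr = sum(s >= t for s in non_members) / len(non_members)
        if fpr <= max_fpr:
            tpr = sum(s >= t for s in members) / len(members)
            best = max(best, tpr)
    return best


# Hypothetical attack scores: three members followed by three non-members.
scores = [0.99, 0.98, 0.90, 0.97, 0.60, 0.50]
is_member = [True, True, True, False, False, False]
print(tpr_at_fpr(scores, is_member, max_fpr=0.0))  # 2 of 3 members found with zero false positives
```

Reporting the TPR at a fixed, small FPR (rather than accuracy or ROC-AUC) exposes whether an attack can confidently single out any training sample at all, which is the security-style criterion argued for above.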
In this dissertation, we aim to move membership inference research in a more practical direction, either by exposing the limitations of current attacks or by proposing more reliable ones. As stated earlier, we first show that the current generation of MI attacks is not reliable in practice. We then propose several new MI attacks that achieve more reliable performance in more realistic scenarios. The first attack considers the model's behavior on an entire sub-population rather than on a single sample in isolation. More specifically, we compare the model's confidence on a target sample with its confidence on other samples from the same sub-population. If the confidence on the target sample is significantly higher than the average confidence on that sub-population, this indicates that the sample was used during training. We show that this attack achieves a moderate true-positive rate at a very low false-positive rate. Additionally, we propose a BiGAN architecture to generate samples from the same sub-population when such samples are not available. The second attack targets user-level rather than record-level MI. In this scenario, we determine whether a user's data has been used during training, rather than which of the user's samples have been used. Not only is this attack more realistic in the privacy domain, but we also show that it achieves state-of-the-art accuracy when multiple samples from a user are used to draw the membership inference. In a further study, we show that MI attacks are generally more successful against deep ensembles: ensembling shifts the distributions of training and non-training samples differently, making them significantly more distinguishable. Finally, we show that a few simple aggregation mechanisms, used in place of ensemble averaging, can improve both the accuracy and the privacy of deep models in the deep ensemble setting.
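The sub-population comparison and the user-level aggregation can be illustrated with the minimal Python sketch below. It rests on assumptions not stated above: "significantly higher" is modeled here as a z-score above a hypothetical cutoff `k`, and user-level membership is decided by averaging per-sample scores against a hypothetical `threshold`. The function names are illustrative only, and the sketch does not reproduce the dissertation's actual statistics or the BiGAN-based sample generation.

```python
import statistics
from typing import Sequence


def subpopulation_score(target_conf: float, reference_confs: Sequence[float]) -> float:
    """How many standard deviations the target sample's confidence lies above
    the mean confidence on reference samples from the same sub-population."""
    mu = statistics.mean(reference_confs)
    sigma = max(statistics.pstdev(reference_confs), 1e-12)  # avoid division by zero
    return (target_conf - mu) / sigma


def subpopulation_attack(target_conf: float, reference_confs: Sequence[float], k: float = 2.0) -> bool:
    """Hypothetical rule: flag a member if the confidence is more than k std devs above the mean."""
    return subpopulation_score(target_conf, reference_confs) > k


def user_level_attack(sample_scores: Sequence[float], threshold: float) -> bool:
    """Hypothetical user-level rule: average per-sample membership scores
    over all of a user's samples and compare the mean to a threshold."""
    return statistics.mean(sample_scores) > threshold


# Hypothetical confidences on other samples from the target's sub-population.
reference = [0.70, 0.72, 0.68, 0.71, 0.69]
print(subpopulation_attack(0.95, reference))  # True: far above the sub-population average
print(subpopulation_attack(0.71, reference))  # False: within normal variation
print(user_level_attack([3.1, 0.4, 2.5], threshold=1.5))  # True: mean score is 2.0
```

Aggregating over several samples per user averages out the noise of any single sample's score, which is one intuition for why user-level inference can be more accurate than record-level inference.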
Finally, we illustrate a fundamental issue with current MI attacks, including state-of-the-art attacks, that limits their application in certain scenarios. We elaborate on this issue through a practical scenario in which an auditor (investigator) uses MI attacks to prove to a judge or jury that an auditee unlawfully used sensitive data during training. Although current state-of-the-art attacks can identify some training samples at a low false-positive rate in the common experimental setting widely used for MI attacks, an auditee can generate an unlimited number of samples on which these attacks fail catastrophically. The auditee can use these samples in court to easily discredit the auditor's allegation and have the case dismissed. Interestingly, we show that the auditee does not need to know anything about the auditor's MI attack to generate these challenging samples. We call this problem discredibility. Currently, no attack is immune to discredibility. We hope that our research sheds light on this newly discovered issue and encourages researchers to investigate it.