* External authors




Privacy for Free: How does Dataset Condensation Help Privacy?

Tian Dong

Bo Zhao*

Lingjuan Lyu

* External authors

ICML 2022



To prevent unintentional data leakage, research community has resorted to data generators that can produce differentially private data for model training. However, for the sake of the data privacy, existing solutions suffer from either expensive training cost or poor generalization performance. Therefore, we raise the question whether training efficiency and privacy can be achieved simultaneously. In this work, we for the first time identify that dataset condensation (DC) which is originally designed for improving training efficiency can be a better solution to replace data generators for private data generation, thus providing privacy for free. To demonstrate the privacy benefit of DC, we build a connection between DC and differential privacy (DP), and theoretically prove on linear feature extractors (and then extended to non-linear feature extractors) that the existence of one sample has limited impact (O(m/n)) on the parameter distribution of networks trained on m samples synthesized from n (n >> m) raw data by DC. We also empirically validate the vision privacy and membership privacy of DC-synthesized data by launching both the loss-based and the state-of-the-art likelihood-based membership inference attacks. We envision this work as a milestone for data-efficient and privacy-preserving machine learning.

Related Publications

MocoSFL: enabling cross-client collaborative self-supervised learning

ICLR, 2023
Jingtao Li, Lingjuan Lyu, Daisuke Iso, Chaitali Chakrabarti*, Michael Spranger

Existing collaborative self-supervised learning (SSL) schemes are not suitable for cross-client applications because of their expensive computation and large local data requirements. To address these issues, we propose MocoSFL, a collaborative SSL framework based on Split Fe…

IDEAL: Query-Efficient Data-Free Learning from Black-Box Models

ICLR, 2023
Jie Zhang, Chen Chen, Lingjuan Lyu

Knowledge Distillation (KD) is a typical method for training a lightweight student model with the help of a well-trained teacher model. However, most KD methods require access to either the teacher's training data or model parameter, which is unrealistic. To tackle this prob…

Twofer: Tackling Continual Domain Shift with Simultaneous Domain Generalization and Adaptation

ICLR, 2023
Chenxi Liu*, Lixu Wang, Lingjuan Lyu, Chen Sun*, Xiao Wang*, Qi Zhu*

In real-world applications, deep learning models often run in non-stationary environments where the target data distribution continually shifts over time. There have been numerous domain adaptation (DA) methods in both online and offline modes to improve cross-domain adaptat…

  • HOME
  • Publications
  • Privacy for Free: How does Dataset Condensation Help Privacy?


Shape the Future of AI with Sony AI

We want to hear from those of you who have a strong desire
to shape the future of AI.