Authors

* External authors

Venue

Date

Share

Privacy for Free: How does Dataset Condensation Help Privacy?

Tian Dong

Bo Zhao*

Lingjuan Lyu

* External authors

ICML 2022

2022

Abstract

To prevent unintentional data leakage, research community has resorted to data generators that can produce differentially private data for model training. However, for the sake of the data privacy, existing solutions suffer from either expensive training cost or poor generalization performance. Therefore, we raise the question whether training efficiency and privacy can be achieved simultaneously. In this work, we for the first time identify that dataset condensation (DC) which is originally designed for improving training efficiency can be a better solution to replace data generators for private data generation, thus providing privacy for free. To demonstrate the privacy benefit of DC, we build a connection between DC and differential privacy (DP), and theoretically prove on linear feature extractors (and then extended to non-linear feature extractors) that the existence of one sample has limited impact (O(m/n)) on the parameter distribution of networks trained on m samples synthesized from n (n >> m) raw data by DC. We also empirically validate the vision privacy and membership privacy of DC-synthesized data by launching both the loss-based and the state-of-the-art likelihood-based membership inference attacks. We envision this work as a milestone for data-efficient and privacy-preserving machine learning.

Related Publications

FLoRA: Federated Fine-Tuning Large Language Models with Heterogeneous Low- Rank Adaptations

NeurIPS, 2024
Lingjuan Lyu, Ziyao Wang, Zheyu Shen, Yexiao He, Guoheng Sun, Hongyi Wang, Ang Li

The rapid development of Large Language Models (LLMs) has been pivotal in advancing AI, with pre-trained LLMs being adaptable to diverse downstream tasks through fine-tuning. Federated learning (FL) further enhances fine-tuning in a privacy-aware manner by utilizing clients'…

pFedClub: Controllable Heterogeneous Model Aggregation for Personalized Federated Learning

NeurIPS, 2024
Jiaqi Wang*, Lingjuan Lyu, Fenglong Ma*, Qi Li

Federated learning, a pioneering paradigm, enables collaborative model training without exposing users’ data to central servers. Most existing federated learning systems necessitate uniform model structures across all clients, restricting their practicality. Several methods …

CURE4Rec: A Benchmark for Recommendation Unlearning with Deeper Influence

NeurIPS, 2024
Chaochao Chen*, Yizhao Zhang*, Lingjuan Lyu, Yuyuan Li*, Jiaming Zhang, Li Zhang, Biao Gong, Chenggang Yan

With increasing privacy concerns in artificial intelligence, regulations have mandated the right to be forgotten, granting individuals the right to withdraw their data from models. Machine unlearning has emerged as a potential solution to enable selective forgetting in model…

  • HOME
  • Publications
  • Privacy for Free: How does Dataset Condensation Help Privacy?

JOIN US

Shape the Future of AI with Sony AI

We want to hear from those of you who have a strong desire
to shape the future of AI.