GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data

Abstract

We propose a generative agent that augments training datasets with synthetic data
for model fine-tuning. Unlike prior work, which uniformly samples synthetic data,
our agent iteratively generates relevant samples on-the-fly, aligning with the target
distribution. It prioritizes synthetic data that complements difficult training samples,
focusing on those with high variance in gradient updates. Experiments across
several image classification tasks demonstrate the effectiveness of our approach.
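
The prioritization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "high variance in gradient updates" can be approximated by the variance of each training sample's gradient norm recorded over several training steps. The function names (`gradient_variance_scores`, `select_hard_samples`), the sample ids, and the toy numbers are all hypothetical.

```python
import statistics


def gradient_variance_scores(grad_norm_history):
    """Map each sample id to the population variance of its recorded gradient norms.

    grad_norm_history: dict of sample id -> list of gradient norms, one per
    training step at which the sample was seen (an assumed bookkeeping format).
    """
    scores = {}
    for sample_id, norms in grad_norm_history.items():
        # A sample seen fewer than twice has no measurable variance yet.
        scores[sample_id] = statistics.pvariance(norms) if len(norms) >= 2 else 0.0
    return scores


def select_hard_samples(grad_norm_history, k):
    """Return the k sample ids with the highest gradient-norm variance."""
    scores = gradient_variance_scores(grad_norm_history)
    # Sort by descending score; break ties by sample id for determinism.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:k]


# Toy history (made-up values): img_b fluctuates the most, img_a the least.
history = {
    "img_a": [0.10, 0.11, 0.10],
    "img_b": [0.90, 0.20, 1.40],
    "img_c": [0.50, 0.70, 0.40],
}
hard = select_hard_samples(history, k=2)
print(hard)
```

In a pipeline of this kind, the selected ids would presumably serve as the targets for which complementary synthetic samples are requested from the generator; that step is omitted here because the abstract does not specify it.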

Authors

  • Zhiteng Li
  • Lele Chen
  • Jerone Andrews
  • Yunhao Ba
  • Yulun Zhang
  • Alice Xiang

Venue

ICLR-25

Date

2026
