GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data
Abstract
We propose a generative agent that augments training datasets with synthetic data
for model fine-tuning. Unlike prior work, which uniformly samples synthetic data,
our agent iteratively generates relevant samples on-the-fly, aligning with the target
distribution. It prioritizes synthetic data that complements difficult training samples,
focusing on those with high variance in gradient updates. Experiments across
several image classification tasks demonstrate the effectiveness of our approach.
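
The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how gradient variance is measured, so the code assumes that a per-sample gradient norm is recorded at each epoch and that difficulty is the variance of those norms. The names `pick_difficult_samples`, `augment_round`, and the `generate` callback (a stand-in for a call to a generative model) are hypothetical.

```python
from statistics import pvariance


def pick_difficult_samples(grad_norm_history, k):
    """Return the ids of the k samples whose gradient norms vary most.

    grad_norm_history maps a sample id to the list of gradient norms
    recorded for that sample across epochs (an assumed bookkeeping format).
    Ties are broken by sample id so the result is deterministic.
    """
    scored = {sid: pvariance(norms) for sid, norms in grad_norm_history.items()}
    return sorted(scored, key=lambda sid: (-scored[sid], sid))[:k]


def augment_round(grad_norm_history, k, generate):
    """Request synthetic data only for the k highest-variance samples.

    generate is a hypothetical callback standing in for the generative
    model; it receives a sample id and returns synthetic data for it.
    """
    targets = pick_difficult_samples(grad_norm_history, k)
    return {sid: generate(sid) for sid in targets}


# Toy history: three training images, three epochs each.
history = {
    "img_a": [0.50, 0.52, 0.49],  # stable gradients, treated as easy
    "img_b": [0.10, 0.90, 0.20],  # strongly fluctuating, treated as difficult
    "img_c": [0.30, 0.60, 0.35],  # moderately fluctuating
}
selected = pick_difficult_samples(history, k=2)
print(selected)
```

In an iterative setting, a round like this would be repeated as training progresses, with the history updated after each epoch so that new synthetic samples follow the samples that are currently hardest.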
Authors
- Zhiteng Li
- Lele Chen
- Jerone Andrews
- Yunhao Ba
- Yulun Zhang
- Alice Xiang
Venue
ICLR-25
Date
2026