Towards Exact Gradient-based Training on Analog In-memory Computing

Abstract

Analog in-memory accelerators present a promising solution for energy-efficient training and inference of large vision and language models. While inference on analog accelerators has been studied recently, analog training remains under-explored. Recent studies have shown that the vanilla analog stochastic gradient descent (Analog SGD) algorithm converges inexactly and thus performs poorly when applied to model training on non-ideal devices. To tackle this issue, various analog-friendly gradient-based algorithms have been proposed, such as Tiki-Taka and its variants. Even though Tiki-Taka exhibits superior empirical performance compared to Analog SGD, it is a heuristic algorithm that lacks theoretical underpinnings. This paper puts forth a theoretical foundation for gradient-based training on analog devices. We begin by characterizing the non-convergence of Analog SGD, which is caused by the asymptotic error arising from asymmetric updates and gradient noise. We then provide a convergence analysis of Tiki-Taka, showing that it converges exactly to a critical point and hence eliminates the asymptotic error. Simulations verify the correctness of the analyses.
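The following is a minimal numerical sketch, not the authors' algorithm or any analog-hardware simulator: the soft-bounds asymmetric update rule, the toy quadratic loss, and all parameter names and values (tau, lr, noise, transfer_every, transfer_lr) are illustrative assumptions. It is meant only to convey the mechanism described in the abstract: asymmetric device updates combined with gradient noise bias Analog SGD away from the minimizer, while a two-array, Tiki-Taka-style transfer scheme reduces that bias.

```python
import numpy as np

def analog_update(w, delta, tau=1.0):
    """Apply one asymmetric pulsed update to a scalar analog weight.

    Assumed soft-bounds model: positive pulses shrink as w approaches +tau,
    negative pulses shrink as w approaches -tau, so up/down steps are
    asymmetric whenever w != 0.
    """
    if delta >= 0:
        return w + delta * (1.0 - w / tau)
    return w + delta * (1.0 + w / tau)

def grad(w, w_star=0.6):
    """Gradient of a toy quadratic loss 0.5 * (w - w_star)**2."""
    return w - w_star

rng = np.random.default_rng(0)
lr, noise, steps = 0.1, 0.2, 2000

# --- Analog SGD: apply the noisy gradient directly to the analog weight. ---
# The asymmetric update plus noise pulls the weight toward the device's
# symmetric point, leaving an asymptotic error at convergence.
w = 0.0
for _ in range(steps):
    g = grad(w) + noise * rng.standard_normal()
    w = analog_update(w, -lr * g)

# --- Tiki-Taka-style two-array scheme (simplified sketch): a fast auxiliary
# weight `a` accumulates noisy gradients and is periodically transferred into
# the slow weight, which damps the asymmetric-update error. ---
w_tt, a = 0.0, 0.0
transfer_every, transfer_lr = 10, 0.05
for t in range(steps):
    g = grad(w_tt) + noise * rng.standard_normal()
    a = analog_update(a, -lr * g)                    # fast array absorbs gradients
    if (t + 1) % transfer_every == 0:
        w_tt = analog_update(w_tt, transfer_lr * a)  # slow array follows the fast one

print(f"Analog SGD final weight: {w:+.3f}  (minimizer +0.600)")
print(f"Tiki-Taka  final weight: {w_tt:+.3f}  (minimizer +0.600)")
```

Under this toy model, the Analog SGD weight settles short of the minimizer because zero-mean gradient noise, filtered through the asymmetric update, acts like a decay toward the symmetric point; routing the noisy gradients through the auxiliary array first, as in the Tiki-Taka-style scheme above, confines most of that decay to the auxiliary array.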

Authors

  • Zhaoxian Wu*
  • Tayfun Gokmen*
  • Malte J. Rasch
  • Tianyi Chen*

*External Authors

Venue

NeurIPS 2024

Date

2024
