* External authors




FedBERT: When Federated Learning Meets Pre-Training

Yuanyishu Tian*

Yao Wan*

Lingjuan Lyu

Dezhong Yao*

Hai Jin*

Lichao Sun*

* External authors

ACM Transactions on Intelligent Systems and Technology



The fast growth of pre-trained models (PTMs) has brought natural language processing to a new era, which becomes a dominant technique for various natural language processing (NLP) applications. Every user can download weights of PTMs, then fine-tune the weights on a task on the local side. However, the pre-training of a model relies heavily on accessing a large-scale of training data and requires a vast amount of computing resources. These strict requirements make it impossible for any single client to pre-train such a model. In order to grant clients with limited computing capability to participate in pre-training a large model, in this paper, we propose a new learning approach FedBERT that takes advantage of the federated learning and split learning approaches, resorting to pre-training BERT in a federated way. FedBERT can prevent sharing the raw data information and obtain excellent performance. Extensive experiments on seven GLUE tasks demonstrate that FedBERT can maintain its effectiveness without communicating the sensitive local data of clients.

Related Publications

MocoSFL: enabling cross-client collaborative self-supervised learning

ICLR, 2023
Jingtao Li, Lingjuan Lyu, Daisuke Iso, Chaitali Chakrabarti*, Michael Spranger

Existing collaborative self-supervised learning (SSL) schemes are not suitable for cross-client applications because of their expensive computation and large local data requirements. To address these issues, we propose MocoSFL, a collaborative SSL framework based on Split Fe…

IDEAL: Query-Efficient Data-Free Learning from Black-Box Models

ICLR, 2023
Jie Zhang, Chen Chen, Lingjuan Lyu

Knowledge Distillation (KD) is a typical method for training a lightweight student model with the help of a well-trained teacher model. However, most KD methods require access to either the teacher's training data or model parameter, which is unrealistic. To tackle this prob…

Twofer: Tackling Continual Domain Shift with Simultaneous Domain Generalization and Adaptation

ICLR, 2023
Chenxi Liu*, Lixu Wang, Lingjuan Lyu, Chen Sun*, Xiao Wang*, Qi Zhu*

In real-world applications, deep learning models often run in non-stationary environments where the target data distribution continually shifts over time. There have been numerous domain adaptation (DA) methods in both online and offline modes to improve cross-domain adaptat…


Shape the Future of AI with Sony AI

We want to hear from those of you who have a strong desire
to shape the future of AI.