SAFA: Handling Sparse and Scarce Data in Federated Learning with Accumulative Learning

Published in IEEE Transactions on Computers, 2025

Abstract: Federated Learning (FL) has emerged as an effective paradigm that allows multiple parties to collaboratively train a global model while protecting their private data. However, the performance of FL approaches tends to degrade significantly when data are sparsely distributed across clients with small datasets. We refer to this as the sparse-and-scarce challenge, where the data held by each client is both sparse (it does not contain examples of all classes) and scarce (the dataset is small). Sparse-and-scarce data diminishes the generalizability of clients' data, leading to severe over-fitting and large domain shifts in the local models and, ultimately, degrading the aggregated model's performance. Interestingly, although this scenario is a specific manifestation of the well-known non-IID challenge in FL (i.e., local data distributions are not independent and identically distributed), it has not been distinctly addressed. Our empirical investigation highlights that generic approaches to the non-IID challenge often prove inadequate at mitigating the sparse-and-scarce issue. To bridge this gap, we develop SAFA, a novel FL algorithm that specifically addresses the sparse-and-scarce challenge via a continual model iteration procedure. SAFA maximally exposes local models to the inter-client diversity of data while minimizing the effects of catastrophic forgetting. Our experiments show that SAFA outperforms existing FL solutions by up to 17.86% over the most prominent baseline. The code is available at https://github.com/HungNguyen20/SAFA
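To make the sparse-and-scarce setting concrete, the sketch below (our illustration, not the authors' code) simulates such a partition over a labeled dataset: each client receives only a few classes (sparse) and only a handful of samples (scarce). The helper name `make_sparse_scarce_partition` and all parameter values are hypothetical.

```python
# Illustration only: simulate a sparse-and-scarce data partition as
# described in the abstract. All names and values here are assumptions.
import random
from collections import defaultdict

def make_sparse_scarce_partition(labels, num_clients, classes_per_client,
                                 samples_per_client, seed=0):
    """Give each client a few classes (sparse) and few samples (scarce)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    all_classes = sorted(by_class)
    partition = []
    for _ in range(num_clients):
        # Sparse: the client only sees a small subset of the label space.
        client_classes = rng.sample(all_classes, classes_per_client)
        pool = [i for c in client_classes for i in by_class[c]]
        # Scarce: the client only holds a small number of examples.
        partition.append(rng.sample(pool, min(samples_per_client, len(pool))))
    return partition

# Example: 10-class labels, 5 clients, 2 classes and 20 samples each.
labels = [i % 10 for i in range(1000)]
parts = make_sparse_scarce_partition(labels, num_clients=5,
                                     classes_per_client=2,
                                     samples_per_client=20)
for cid, idxs in enumerate(parts):
    classes = sorted({labels[i] for i in idxs})
    print(f"client {cid}: {len(idxs)} samples, classes {classes}")
```

Under such a partition, each local model over-fits its narrow label subset, which is the failure mode the paper's continual model iteration procedure is designed to counter; the procedure itself is detailed in the paper.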
