Learning Reconfigurable Representations for Multimodal Federated Learning with Missing Data

Published in the 39th Annual Conference on Neural Information Processing Systems (NeurIPS), 2025

Abstract: Multimodal federated learning in real-world settings often encounters incomplete and heterogeneous data across clients, resulting in misaligned local feature representations that limit the effectiveness of direct model aggregation. Unlike prior work that assumes either differing modality sets without missing input features or a shared modality set with missing features across clients, we consider a more general and realistic setting where each client observes a different subset of modalities and may also have missing input features within each modality. To address the resulting misalignment in learned representations, we propose a new federated learning framework featuring locally adaptive representations based on learnable client-side embedding controls that encode each client's missing-data patterns. These embeddings serve as reconfiguration signals that align the globally aggregated representation with each client's local context, enabling more effective use of shared information. Furthermore, the embedding controls can be algorithmically aggregated across clients with similar missing-data patterns to enhance the robustness of the reconfiguration signals in adapting the global representation. Empirical results on multiple federated multimodal benchmarks with diverse missing-data patterns across clients demonstrate the efficacy of the proposed method, achieving up to a 36.45% performance improvement under severe data incompleteness. The method is further supported by a theoretical analysis with an explicit performance bound that matches our empirical observations.
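
To make the idea of client-side reconfiguration concrete, below is a minimal PyTorch sketch of how a learnable embedding control conditioned on a client's missing-data pattern could modulate the globally aggregated representation. The class name, dimensions, and the FiLM-style scale-and-shift modulation are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class ReconfigurableClientHead(nn.Module):
    """Hypothetical sketch of a client-side reconfiguration module.

    A learnable embedding control, indexed by the client's missing-data
    pattern, is mapped to per-feature scale and shift parameters that
    adapt the globally aggregated representation to the local context.
    The scale-and-shift form is an assumption for illustration only.
    """

    def __init__(self, num_patterns: int, feat_dim: int, ctrl_dim: int = 64):
        super().__init__()
        # One learnable control vector per observed missing-data pattern.
        self.pattern_embed = nn.Embedding(num_patterns, ctrl_dim)
        # Map the control vector to per-feature scale and shift.
        self.to_scale_shift = nn.Linear(ctrl_dim, 2 * feat_dim)

    def forward(self, global_feat: torch.Tensor, pattern_id: torch.Tensor) -> torch.Tensor:
        ctrl = self.pattern_embed(pattern_id)                  # (B, ctrl_dim)
        scale, shift = self.to_scale_shift(ctrl).chunk(2, dim=-1)
        # Reconfigure the aggregated representation for the local client.
        return global_feat * (1.0 + scale) + shift


# Usage sketch: reconfigure a batch of globally aggregated features on a
# client whose missing-data pattern has index 2 (hypothetical example).
head = ReconfigurableClientHead(num_patterns=8, feat_dim=256)
global_feat = torch.randn(4, 256)
pattern_id = torch.full((4,), 2, dtype=torch.long)
local_feat = head(global_feat, pattern_id)
```

Aggregating the embedding controls across clients with similar missing-data patterns could then amount to, for example, averaging the corresponding rows of `pattern_embed.weight` on the server; the paper's actual aggregation rule may differ.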
