My current research focuses on the theoretical foundations of decentralized learning (also known as swarm learning). I am also dedicated to turning these theoretical insights into fast and generalizable decentralized learning algorithms. Please refer to my publications below.
2024
Preprint 2024
Lie Symmetry Net: Preserving Conservation Laws in Modelling Financial Market Dynamics via Differential Equations
Xuelian Jiang, Tongtian Zhu, Can Wang, Yingxiang Xu, and Fengxiang He
This paper introduces Lie Symmetry Net (LSN), a symmetry-aware approach that addresses a fundamental challenge in AI-driven SDE solvers: ensuring that models learn and preserve the intrinsic symmetries of the underlying dynamics from data. By incorporating Lie symmetry principles, LSN reduces test error by over an order of magnitude compared to state-of-the-art AI-driven methods. The framework is not tied to specific equations or solvers; it provides a general recipe that can be applied across various AI-driven differential equation solvers.
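For intuition, here is a minimal, hedged sketch of that general recipe: a PINN-style surrogate solver whose training loss is augmented with a penalty for violating an assumed Lie symmetry of the equation. The toy equation, the time-translation symmetry, and the names (`SurrogateSolver`, `symmetry_penalty`) are illustrative assumptions, not the actual LSN loss or architecture.

```python
import torch
import torch.nn as nn

class SurrogateSolver(nn.Module):
    """Small MLP surrogate u(t, x) for a 1-D differential equation (illustrative)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, t, x):
        return self.net(torch.cat([t, x], dim=-1))

def pde_residual(model, t, x):
    # Residual of a toy equation u_t + 0.5 * x^2 * u_xx = 0 (an assumption, not LSN's equation).
    t = t.detach().requires_grad_(True)
    x = x.detach().requires_grad_(True)
    u = model(t, x)
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return u_t + 0.5 * x ** 2 * u_xx

def symmetry_penalty(model, t, x, eps=1e-2):
    # Assumed one-parameter symmetry (time translation): shifting t by eps should
    # leave the residual unchanged; penalize any deviation. A crude stand-in for
    # the paper's symmetry-based loss terms.
    return ((pde_residual(model, t, x) - pde_residual(model, t + eps, x)) ** 2).mean()

model = SurrogateSolver()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
t, x = torch.rand(256, 1), torch.rand(256, 1)
loss = (pde_residual(model, t, x) ** 2).mean() + 0.1 * symmetry_penalty(model, t, x)
opt.zero_grad()
loss.backward()
opt.step()
```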
2023
ICML 2023 (My Favorite)
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Decentralized stochastic gradient descent (D-SGD) enables collaborative learning across a massive number of devices simultaneously without the control of a central server. Sharpness-aware minimization, or SAM, is a popular optimization technique that effectively improves model generalization by explicitly minimizing a sharpness-based measure alongside the training loss. In this paper, we prove that D-SGD asymptotically minimizes the loss function of an average-direction SAM. This asymptotic equivalence further demonstrates three advantages of D-SGD: (1) D-SGD exhibits a gradient smoothing effect; (2) there exists an uncertainty self-estimating mechanism in D-SGD that improves posterior estimation; and (3) the sharpness regularization effect of D-SGD does not decrease as the total batch size increases, which justifies the superiority of D-SGD over centralized SGD (C-SGD) in large-batch settings. We conduct extensive experiments which are in full agreement with our theory. Our code will be made publicly available.
TLDR: The first work on the surprising sharpness-aware minimization nature (i.e., a kind of unique implicit bias) of decentralized learning. We provide a completely new perspective for understanding model decentralization, which helps to bridge the gap between practice and existing theory in decentralized learning.
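For readers unfamiliar with the setup, the following is a minimal, illustrative simulation of one vanilla D-SGD round: each worker gossip-averages parameters with its neighbors according to a doubly-stochastic mixing matrix and then takes a local stochastic gradient step. The ring topology, toy model, and names (`dsgd_step`, `grads_fn`) are assumptions for illustration, not the code released with the paper.

```python
import torch

def dsgd_step(models, grads_fn, mixing_matrix, lr=0.1):
    """One synchronous round of vanilla D-SGD (illustrative sketch).

    models:        list of m workers' models (identical architecture)
    grads_fn(i):   returns the local stochastic gradients of worker i
    mixing_matrix: doubly-stochastic m x m matrix encoding the communication topology
    """
    m = len(models)
    # Snapshot parameters so every worker mixes the same pre-round iterate.
    params = [[p.detach().clone() for p in model.parameters()] for model in models]
    for i, model in enumerate(models):
        local_grads = grads_fn(i)
        with torch.no_grad():
            for k, p in enumerate(model.parameters()):
                # Gossip-average neighbor parameters, then take a local SGD step.
                mixed = sum(mixing_matrix[i][j] * params[j][k] for j in range(m))
                p.copy_(mixed - lr * local_grads[k])

# Toy usage: 4 workers on a ring, each holding a small linear model.
m = 4
models = [torch.nn.Linear(10, 1) for _ in range(m)]
W = [[1.0 / 3.0 if (j - i) % m in (0, 1, m - 1) else 0.0 for j in range(m)]
     for i in range(m)]

def grads_fn(i):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(models[i](x), y)
    return torch.autograd.grad(loss, list(models[i].parameters()))

dsgd_step(models, grads_fn, W)
```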
AAAI 2023 (Oral)
Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition
Value Decomposition (VD) aims to deduce the contributions of agents for decentralized policies in the presence of only global rewards, and has recently emerged as a powerful credit assignment paradigm for tackling cooperative Multi-Agent Reinforcement Learning (MARL) problems. One of the main challenges in VD is to promote diverse behaviors among agents, and existing methods directly encourage the diversity of learned agent networks with various strategies. However, we argue that these dedicated designs for agent networks are still limited by the indistinguishable VD network, leading to homogeneous agent behaviors and thus degrading the cooperation capability. In this paper, we propose a novel Contrastive Identity-Aware learning (CIA) method that explicitly boosts the credit-level distinguishability of the VD network to break the bottleneck of multi-agent diversity. Specifically, our approach leverages contrastive learning to maximize the mutual information between the temporal credits and identity representations of different agents, encouraging the full expressiveness of credit assignment and, in turn, the emergence of individualities. The proposed CIA module is simple yet effective and can be readily incorporated into various VD architectures. Experiments on the SMAC benchmarks and across different VD backbones demonstrate that the proposed method yields results superior to the state-of-the-art counterparts. Our code is available at this https URL.
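A hedged sketch of the core contrastive idea: an InfoNCE-style objective that pulls each agent's temporal credit embedding toward its own identity representation and away from the others'. The shapes, names, and exact form of the loss below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def cia_style_contrastive_loss(credits, identities, temperature=0.1):
    """Illustrative InfoNCE-style loss (names and shapes are assumptions).

    credits:    (n_agents, d) per-agent credit embeddings at one timestep
    identities: (n_agents, d) learnable per-agent identity representations
    Matching (credit, identity) pairs are pulled together while mismatched
    pairs are pushed apart, encouraging distinguishable credit assignment.
    """
    credits = F.normalize(credits, dim=-1)
    identities = F.normalize(identities, dim=-1)
    logits = credits @ identities.t() / temperature   # (n_agents, n_agents) similarity matrix
    labels = torch.arange(credits.size(0))             # agent i matches identity i
    return F.cross_entropy(logits, labels)

# Toy usage: 5 agents with 16-dimensional embeddings.
loss = cia_style_contrastive_loss(torch.randn(5, 16),
                                  torch.randn(5, 16, requires_grad=True))
loss.backward()
```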
KDD 2023
Improving Expressivity of GNNs with Subgraph-Specific Factor Embedded Normalization
Graph Neural Networks (GNNs) have emerged as a powerful category of learning architecture for handling graph-structured data. However, existing GNNs typically ignore crucial structural characteristics in node-induced subgraphs, which limits their expressiveness for various downstream tasks. In this paper, we strive to strengthen the representational capabilities of GNNs by devising a dedicated plug-and-play normalization scheme, termed SUbgraph-sPEcific FactoR Embedded Normalization (SuperNorm), that explicitly considers the intra-connection information within each node-induced subgraph. To this end, we embed the subgraph-specific factor at the beginning and the end of the standard BatchNorm, and incorporate graph instance-specific statistics for improved distinguishability. Meanwhile, we provide theoretical analysis showing that, with the elaborated SuperNorm, an arbitrary GNN is at least as powerful as the 1-WL test in distinguishing non-isomorphic graphs. Furthermore, the proposed SuperNorm scheme is also demonstrated to alleviate the over-smoothing phenomenon. Experimental results on graph, node, and link property prediction across eight popular datasets demonstrate the effectiveness of the proposed method. The code is available at https://github.com/chenchkx/SuperNorm.
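To make the idea concrete, here is an illustrative sketch of a subgraph-aware normalization layer in the spirit of SuperNorm: a per-node factor derived from graph-instance statistics is applied both before and after a standard BatchNorm. The degree-based factor and the module name `SubgraphFactorNorm` are my own simplifying assumptions; the actual SuperNorm factors and statistics differ (see the released code for the real implementation).

```python
import torch
import torch.nn as nn

class SubgraphFactorNorm(nn.Module):
    """Illustrative subgraph-aware BatchNorm sketch (not the actual SuperNorm)."""

    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.bn = nn.BatchNorm1d(dim)
        self.pre_scale = nn.Parameter(torch.ones(1))
        self.post_scale = nn.Parameter(torch.ones(1))
        self.eps = eps

    def forward(self, x, adj):
        # x:   (num_nodes, dim) node features
        # adj: (num_nodes, num_nodes) dense adjacency of one graph instance
        degree = adj.sum(dim=1, keepdim=True)             # crude proxy for subgraph size
        factor = degree / (degree.mean() + self.eps)      # graph-instance-specific statistic
        x = x * (1 + self.pre_scale * factor)             # factor embedded before BatchNorm
        x = self.bn(x)
        x = x * (1 + self.post_scale * factor)            # factor embedded after BatchNorm
        return x

# Toy usage: 6 nodes with 8-dimensional features on a random graph.
adj = (torch.rand(6, 6) > 0.5).float()
out = SubgraphFactorNorm(8)(torch.randn(6, 8), adj)
```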
ECAI 2023
Adversarial Erasing with Pruned Elements: Towards Better Graph Lottery Ticket
Yuwen Wang, Shunyu Liu, Kaixuan Chen, Tongtian Zhu, Ji Qiao, Mengjie Shi, Yuanyu Wan, and Mingli Song
In European Conference on Artificial Intelligence (ECAI), 30 Sep – 4 Oct 2023
Graph Lottery Ticket (GLT), a combination of a core subgraph and a sparse subnetwork, has been proposed to mitigate the computational cost of deep Graph Neural Networks (GNNs) on large input graphs while preserving the original performance. However, the winning GLTs in existing studies are obtained by applying iterative magnitude-based pruning (IMP) without re-evaluating and re-considering the pruned information, which disregards the dynamic changes in the significance of edges/weights during graph/model structure pruning, and thus limits the appeal of the winning tickets. In this paper, we formulate a conjecture: there exists overlooked valuable information in the pruned graph connections and model parameters that can be re-grouped into the GLT to enhance the final performance. Specifically, we propose an adversarial complementary erasing (ACE) framework to explore the valuable information in the pruned components, thereby developing a more powerful GLT, referred to as ACE-GLT. The main idea is to mine valuable information from pruned edges/weights after each round of IMP, and to employ the ACE technique to refine the GLT processing. Finally, experimental results demonstrate that our ACE-GLT outperforms existing methods for searching GLTs in diverse tasks.
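As a rough illustration of the regrouping idea, the sketch below runs magnitude-based pruning and, after each round, re-admits a small fraction of pruned weights that still look valuable. The magnitude-based re-admission criterion and the function name are assumptions for illustration; the actual ACE framework uses an adversarial complementary erasing criterion rather than raw magnitude.

```python
import torch

def imp_with_regrouping(weight, rounds=3, prune_frac=0.2, regroup_frac=0.05):
    """Illustrative IMP loop that re-groups some pruned weights after each round."""
    mask = torch.ones_like(weight)
    for _ in range(rounds):
        # 1) Prune the smallest-magnitude surviving weights.
        alive = weight[mask.bool()].abs()
        threshold = alive.quantile(prune_frac)
        mask[(weight.abs() < threshold) & mask.bool()] = 0.0
        # 2) (Re-)training of weight * mask would happen here in a full pipeline.
        # 3) Re-evaluate pruned weights and re-admit the most promising ones.
        pruned_scores = weight.abs() * (1 - mask)
        k = int(regroup_frac * mask.numel())
        if k > 0:
            topk = torch.topk(pruned_scores.flatten(), k).indices
            mask.view(-1)[topk] = 1.0
    return mask

mask = imp_with_regrouping(torch.randn(64, 64))
```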
2022
ICML 2022 (Spotlight)
Topology-aware Generalization of Decentralized SGD
This paper studies the algorithmic stability and generalizability of decentralized stochastic gradient descent (D-SGD). We prove that the consensus model learned by D-SGD is $\mathcal{O}(N^{-1} + m^{-1} + \lambda^2)$-stable in expectation in the non-convex non-smooth setting, where $N$ is the total sample size, $m$ is the worker number, and $1-\lambda$ is the spectral gap that measures the connectivity of the communication topology. These results then deliver an $\mathcal{O}(N^{-(1+\alpha)/2} + m^{-(1+\alpha)/2} + \lambda^{1+\alpha} + \phi_S)$ in-average generalization bound, which is non-vacuous even when $\lambda$ is close to $1$, in contrast to the vacuous bounds suggested by existing literature on the projected version of D-SGD. Our theory indicates that the generalizability of D-SGD is positively correlated with the spectral gap, and explains why consensus control in the initial training phase can ensure better generalization. Experiments with VGG-11 and ResNet-18 on CIFAR-10, CIFAR-100 and Tiny-ImageNet justify our theory. To the best of our knowledge, this is the first work on the topology-aware generalization of vanilla D-SGD.
One-sentence summary: The first work on the topology-aware generalization analysis of the vanilla Decentralized SGD algorithm.
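To make the role of $\lambda$ concrete, here is a small, illustrative computation of the spectral gap for uniform-weight ring topologies; the topology and weights are my own illustrative choices, not tied to the paper's experiments.

```python
import numpy as np

def spectral_quantities(W):
    """For a symmetric doubly-stochastic gossip matrix W, return lambda
    (second-largest eigenvalue magnitude) and the spectral gap 1 - lambda."""
    eigvals = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return eigvals[1], 1.0 - eigvals[1]

def ring_gossip_matrix(m):
    """Uniform-weight ring: each worker averages with itself and its two neighbors."""
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = W[i, (i - 1) % m] = W[i, (i + 1) % m] = 1.0 / 3.0
    return W

for m in (4, 16, 64):
    lam, gap = spectral_quantities(ring_gossip_matrix(m))
    print(f"ring with m={m:>3}: lambda={lam:.4f}, spectral gap={gap:.4f}")
# Larger rings are more poorly connected: lambda approaches 1 and the spectral gap
# shrinks, which is exactly the regime where a non-vacuous bound matters.
```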