My current research focuses on the theoretical foundations of decentralized learning (also known as swarm learning). I am also dedicated to turning these theoretical insights into fast and generalizable decentralized learning algorithms. Please refer to my publications below.
2024
Preprint 2024
Lie Symmetry Net: Preserving Conservation Laws in Modelling Financial Market Dynamics via Differential Equations
Xuelian Jiang, Tongtian Zhu, Can Wang, Yingxiang Xu, and Fengxiang He
This paper introduces Lie Symmetry Net (LSN), a symmetry-aware approach that addresses a fundamental challenge in AI-driven SDE solvers: ensuring that models learn and preserve the intrinsic symmetries of the underlying dynamics from data. By incorporating Lie symmetry principles, LSN reduces test error by over an order of magnitude compared to state-of-the-art AI-driven methods. The framework is not tied to specific equations or solvers; it provides a general recipe that can be applied across various AI-driven differential equation solvers.
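For intuition, here is a minimal, hedged sketch of that general recipe: a PINN-style surrogate solver whose training loss is augmented with a penalty for violating an assumed Lie symmetry of the equation. The toy equation, the time-translation symmetry, and the names (`SurrogateSolver`, `symmetry_penalty`) are illustrative assumptions, not the actual LSN loss or architecture.

```python
import torch
import torch.nn as nn

class SurrogateSolver(nn.Module):
    """Small MLP surrogate u(t, x) for a 1-D differential equation (illustrative)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, t, x):
        return self.net(torch.cat([t, x], dim=-1))

def pde_residual(model, t, x):
    # Residual of a toy equation u_t + 0.5 * x^2 * u_xx = 0 (an assumption, not LSN's equation).
    t = t.detach().requires_grad_(True)
    x = x.detach().requires_grad_(True)
    u = model(t, x)
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return u_t + 0.5 * x ** 2 * u_xx

def symmetry_penalty(model, t, x, eps=1e-2):
    # Assumed one-parameter symmetry (time translation): shifting t by eps should
    # leave the residual unchanged; penalize any deviation. A crude stand-in for
    # the paper's symmetry-based loss terms.
    return ((pde_residual(model, t, x) - pde_residual(model, t + eps, x)) ** 2).mean()

model = SurrogateSolver()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
t, x = torch.rand(256, 1), torch.rand(256, 1)
loss = (pde_residual(model, t, x) ** 2).mean() + 0.1 * symmetry_penalty(model, t, x)
opt.zero_grad()
loss.backward()
opt.step()
```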
2023
ICML 2023 (My Favorite)
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Decentralized stochastic gradient descent (D-SGD) enables collaborative learning across a massive number of devices simultaneously without the control of a central server. Sharpness-aware minimization, or SAM, is a popular optimization technique that effectively improves model generalization by explicitly minimizing a sharpness-based measure alongside the training loss. In this paper, we prove that D-SGD asymptotically minimizes the loss function of an average-direction SAM. This asymptotic equivalence further demonstrates three advantages of D-SGD: (1) D-SGD exhibits a gradient smoothing effect; (2) there exists an uncertainty self-estimating mechanism in D-SGD that improves posterior estimation; and (3) the sharpness regularization effect of D-SGD does not decrease as the total batch size increases, which justifies the superiority of D-SGD over centralized SGD (C-SGD) in large-batch settings. We conduct extensive experiments which are in full agreement with our theory. Our code will be made publicly available.
TLDR: The first work on the surprising sharpness-aware minimization nature (i.e., a kind of unique implicit bias) of decentralized learning. We provide a completely new perspective for understanding model decentralization, which helps to bridge the gap between practice and existing theory in decentralized learning.
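For readers unfamiliar with the setup, the following is a minimal, illustrative simulation of one vanilla D-SGD round: each worker gossip-averages parameters with its neighbors according to a doubly-stochastic mixing matrix and then takes a local stochastic gradient step. The ring topology, toy model, and names (`dsgd_step`, `grads_fn`) are assumptions for illustration, not the code released with the paper.

```python
import torch

def dsgd_step(models, grads_fn, mixing_matrix, lr=0.1):
    """One synchronous round of vanilla D-SGD (illustrative sketch).

    models:        list of m workers' models (identical architecture)
    grads_fn(i):   returns the local stochastic gradients of worker i
    mixing_matrix: doubly-stochastic m x m matrix encoding the communication topology
    """
    m = len(models)
    # Snapshot parameters so every worker mixes the same pre-round iterate.
    params = [[p.detach().clone() for p in model.parameters()] for model in models]
    for i, model in enumerate(models):
        local_grads = grads_fn(i)
        with torch.no_grad():
            for k, p in enumerate(model.parameters()):
                # Gossip-average neighbor parameters, then take a local SGD step.
                mixed = sum(mixing_matrix[i][j] * params[j][k] for j in range(m))
                p.copy_(mixed - lr * local_grads[k])

# Toy usage: 4 workers on a ring, each holding a small linear model.
m = 4
models = [torch.nn.Linear(10, 1) for _ in range(m)]
W = [[1.0 / 3.0 if (j - i) % m in (0, 1, m - 1) else 0.0 for j in range(m)]
     for i in range(m)]

def grads_fn(i):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(models[i](x), y)
    return torch.autograd.grad(loss, list(models[i].parameters()))

dsgd_step(models, grads_fn, W)
```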
AAAI 2023 (Oral)
Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition
Value Decomposition (VD) aims to deduce the contributions of agents for decentralized policies in the presence of only global rewards, and has recently emerged as a powerful credit assignment paradigm for tackling cooperative Multi-Agent Reinforcement Learning (MARL) problems. One of the main challenges in VD is to promote diverse behaviors among agents, and existing methods directly encourage the diversity of learned agent networks with various strategies. However, we argue that these dedicated designs for agent networks are still limited by the indistinguishable VD network, leading to homogeneous agent behaviors and thus degrading the cooperation capability. In this paper, we propose a novel Contrastive Identity-Aware learning (CIA) method that explicitly boosts the credit-level distinguishability of the VD network to break the bottleneck of multi-agent diversity. Specifically, our approach leverages contrastive learning to maximize the mutual information between the temporal credits and identity representations of different agents, encouraging the full expressiveness of credit assignment and, in turn, the emergence of individualities. The proposed CIA module is simple yet effective and can be readily incorporated into various VD architectures. Experiments on the SMAC benchmarks and across different VD backbones demonstrate that the proposed method yields results superior to the state-of-the-art counterparts. Our code is available at this https URL.
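A hedged sketch of the core contrastive idea: an InfoNCE-style objective that pulls each agent's temporal credit embedding toward its own identity representation and away from the others'. The shapes, names, and exact form of the loss below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def cia_style_contrastive_loss(credits, identities, temperature=0.1):
    """Illustrative InfoNCE-style loss (names and shapes are assumptions).

    credits:    (n_agents, d) per-agent credit embeddings at one timestep
    identities: (n_agents, d) learnable per-agent identity representations
    Matching (credit, identity) pairs are pulled together while mismatched
    pairs are pushed apart, encouraging distinguishable credit assignment.
    """
    credits = F.normalize(credits, dim=-1)
    identities = F.normalize(identities, dim=-1)
    logits = credits @ identities.t() / temperature   # (n_agents, n_agents) similarity matrix
    labels = torch.arange(credits.size(0))             # agent i matches identity i
    return F.cross_entropy(logits, labels)

# Toy usage: 5 agents with 16-dimensional embeddings.
loss = cia_style_contrastive_loss(torch.randn(5, 16),
                                  torch.randn(5, 16, requires_grad=True))
loss.backward()
```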
KDD 2023
Improving Expressivity of GNNs with Subgraph-Specific Factor Embedded Normalization
Graph Neural Networks (GNNs) have emerged as a powerful category of learning architecture for handling graph-structured data. However, existing GNNs typically ignore crucial structural characteristics in node-induced subgraphs, which limits their expressiveness for various downstream tasks. In this paper, we strive to strengthen the representational capabilities of GNNs by devising a dedicated plug-and-play normalization scheme, termed SUbgraph-sPEcific FactoR Embedded Normalization (SuperNorm), that explicitly considers the intra-connection information within each node-induced subgraph. To this end, we embed the subgraph-specific factor at the beginning and the end of the standard BatchNorm, and incorporate graph instance-specific statistics for improved distinguishability. Meanwhile, we provide theoretical analysis showing that, with the elaborated SuperNorm, an arbitrary GNN is at least as powerful as the 1-WL test in distinguishing non-isomorphic graphs. Furthermore, the proposed SuperNorm scheme is also demonstrated to alleviate the over-smoothing phenomenon. Experimental results on graph, node, and link property prediction across eight popular datasets demonstrate the effectiveness of the proposed method. The code is available at https://github.com/chenchkx/SuperNorm.
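To make the idea concrete, here is an illustrative sketch of a subgraph-aware normalization layer in the spirit of SuperNorm: a per-node factor derived from graph-instance statistics is applied both before and after a standard BatchNorm. The degree-based factor and the module name `SubgraphFactorNorm` are my own simplifying assumptions; the actual SuperNorm factors and statistics differ (see the released code for the real implementation).

```python
import torch
import torch.nn as nn

class SubgraphFactorNorm(nn.Module):
    """Illustrative subgraph-aware BatchNorm sketch (not the actual SuperNorm)."""

    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.bn = nn.BatchNorm1d(dim)
        self.pre_scale = nn.Parameter(torch.ones(1))
        self.post_scale = nn.Parameter(torch.ones(1))
        self.eps = eps

    def forward(self, x, adj):
        # x:   (num_nodes, dim) node features
        # adj: (num_nodes, num_nodes) dense adjacency of one graph instance
        degree = adj.sum(dim=1, keepdim=True)             # crude proxy for subgraph size
        factor = degree / (degree.mean() + self.eps)      # graph-instance-specific statistic
        x = x * (1 + self.pre_scale * factor)             # factor embedded before BatchNorm
        x = self.bn(x)
        x = x * (1 + self.post_scale * factor)            # factor embedded after BatchNorm
        return x

# Toy usage: 6 nodes with 8-dimensional features on a random graph.
adj = (torch.rand(6, 6) > 0.5).float()
out = SubgraphFactorNorm(8)(torch.randn(6, 8), adj)
```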
ECAI 2023
Adversarial Erasing with Pruned Elements: Towards Better Graph Lottery Ticket
Yuwen Wang, Shunyu Liu, Kaixuan Chen, Tongtian Zhu, Ji Qiao, Mengjie Shi, Yuanyu Wan, and Mingli Song
In European Conference on Artificial Intelligence (ECAI), 30 Sep – 4 Oct 2023
Graph Lottery Ticket (GLT), a combination of a core subgraph and a sparse subnetwork, has been proposed to mitigate the computational cost of deep Graph Neural Networks (GNNs) on large input graphs while preserving the original performance. However, the winning GLTs in existing studies are obtained by applying iterative magnitude-based pruning (IMP) without re-evaluating and re-considering the pruned information, which disregards the dynamic changes in the significance of edges/weights during graph/model structure pruning, and thus limits the appeal of the winning tickets. In this paper, we formulate a conjecture: there exists overlooked valuable information in the pruned graph connections and model parameters that can be re-grouped into the GLT to enhance the final performance. Specifically, we propose an adversarial complementary erasing (ACE) framework to explore the valuable information in the pruned components, thereby developing a more powerful GLT, referred to as ACE-GLT. The main idea is to mine valuable information from pruned edges/weights after each round of IMP, and to employ the ACE technique to refine the GLT processing. Finally, experimental results demonstrate that our ACE-GLT outperforms existing methods for searching GLTs in diverse tasks.
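As a rough illustration of the regrouping idea, the sketch below runs magnitude-based pruning and, after each round, re-admits a small fraction of pruned weights that still look valuable. The magnitude-based re-admission criterion and the function name are assumptions for illustration; the actual ACE framework uses an adversarial complementary erasing criterion rather than raw magnitude.

```python
import torch

def imp_with_regrouping(weight, rounds=3, prune_frac=0.2, regroup_frac=0.05):
    """Illustrative IMP loop that re-groups some pruned weights after each round."""
    mask = torch.ones_like(weight)
    for _ in range(rounds):
        # 1) Prune the smallest-magnitude surviving weights.
        alive = weight[mask.bool()].abs()
        threshold = alive.quantile(prune_frac)
        mask[(weight.abs() < threshold) & mask.bool()] = 0.0
        # 2) (Re-)training of weight * mask would happen here in a full pipeline.
        # 3) Re-evaluate pruned weights and re-admit the most promising ones.
        pruned_scores = weight.abs() * (1 - mask)
        k = int(regroup_frac * mask.numel())
        if k > 0:
            topk = torch.topk(pruned_scores.flatten(), k).indices
            mask.view(-1)[topk] = 1.0
    return mask

mask = imp_with_regrouping(torch.randn(64, 64))
```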
2022
ICML 2022 (Spotlight)
Topology-aware Generalization of Decentralized SGD
This paper studies the algorithmic stability and generalizability of decentralized stochastic gradient descent (D-SGD). We prove that the consensus model learned by D-SGD is $\mathcal{O}(N^{-1} + m^{-1} + \lambda^2)$-stable in expectation in the non-convex non-smooth setting, where $N$ is the total sample size, $m$ is the worker number, and $1-\lambda$ is the spectral gap that measures the connectivity of the communication topology. These results then deliver an $\mathcal{O}(N^{-(1+\alpha)/2} + m^{-(1+\alpha)/2} + \lambda^{1+\alpha} + \phi_S)$ in-average generalization bound, which is non-vacuous even when $\lambda$ is close to $1$, in contrast to the vacuous bounds suggested by existing literature on the projected version of D-SGD. Our theory indicates that the generalizability of D-SGD is positively correlated with the spectral gap, and explains why consensus control in the initial training phase can ensure better generalization. Experiments with VGG-11 and ResNet-18 on CIFAR-10, CIFAR-100 and Tiny-ImageNet justify our theory. To the best of our knowledge, this is the first work on the topology-aware generalization of vanilla D-SGD.
One-sentence summary: The first work on the topology-aware generalization analysis of the vanilla Decentralized SGD algorithm.
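To make the role of $\lambda$ concrete, here is a small, illustrative computation of the spectral gap for uniform-weight ring topologies; the topology and weights are my own illustrative choices, not tied to the paper's experiments.

```python
import numpy as np

def spectral_quantities(W):
    """For a symmetric doubly-stochastic gossip matrix W, return lambda
    (second-largest eigenvalue magnitude) and the spectral gap 1 - lambda."""
    eigvals = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return eigvals[1], 1.0 - eigvals[1]

def ring_gossip_matrix(m):
    """Uniform-weight ring: each worker averages with itself and its two neighbors."""
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = W[i, (i - 1) % m] = W[i, (i + 1) % m] = 1.0 / 3.0
    return W

for m in (4, 16, 64):
    lam, gap = spectral_quantities(ring_gossip_matrix(m))
    print(f"ring with m={m:>3}: lambda={lam:.4f}, spectral gap={gap:.4f}")
# Larger rings are more poorly connected: lambda approaches 1 and the spectral gap
# shrinks, which is exactly the regime where a non-vacuous bound matters.
```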