DICE: Data Influence Cascade in Decentralized Learning

DICE

Updates in progress. More coming soon! 😉

🗓️ 2025-07-10 — Updated main results

Main Results

Theorem (Approximation of r-hop DICE-GT)

The r-hop DICE-GT influence \(\mathcal{I}_{\mathrm{DICE-GT}}^{(r)}(\boldsymbol{z}_j^t, \boldsymbol{z}^{\prime})\) can be approximated by the r-hop DICE-E estimator:

\[\begin{equation} \begin{split} &\mathcal{I}_{\mathrm{DICE-E}}^{(r)}(\boldsymbol{z}_j^t, \boldsymbol{z}^{\prime})\\ & = - \sum_{\rho=0}^{r} \sum_{ (k_1, \dots, k_{\rho}) \in P_j^{(\rho)} } \eta^{t} q_{k_\rho}  \underbrace{ \left( \prod_{s=1}^{\rho} \boldsymbol{W}_{k_s, k_{s-1}}^{t+s-1} \right) }_{\text{communication graph-related term}} \times \underbrace{ \nabla L\bigl(\boldsymbol{\theta}_{k_{\rho}}^{t+\rho}; \boldsymbol{z}^{\prime}\bigr)^\top }_{\text{test gradient}} \\ & \quad \times \underbrace{ \left( \prod_{s=2}^{\rho} \left( \boldsymbol{I} - \eta^{t+s-1} \boldsymbol{H}(\boldsymbol{\theta}_{k_s}^{t+s-1}; \boldsymbol{z}_{k_s}^{t+s-1}) \right) \right) }_{\text{curvature-related term}} \times \underbrace{ \Delta_j(\boldsymbol{\theta}_j^t,\boldsymbol{z}_j^t) }_{\text{optimization-related term}} \end{split} \end{equation}\]

where \(\Delta_j(\boldsymbol{\theta}_j^t,\boldsymbol{z}_j^t) = \mathcal{O}_j(\boldsymbol{\theta}_j^t,\boldsymbol{z}_j^t)-\boldsymbol{\theta}_j^t\), \(k_0 = j\). Here \(P_j^{(\rho)}\) denotes the set of all sequences \((k_1, \dots, k_{\rho})\) such that \(k_s \in \mathcal{N}_{\mathrm{out}}^{(1)}(k_{s-1})\) for \(s=1,\dots,\rho\) and \(\boldsymbol{H}(\boldsymbol{\theta}_{k_s}^{t+s}; \boldsymbol{z}_{k_s}^{t+s})\) is the Hessian matrix of \(L\) with respect to \(\boldsymbol{\theta}\) evaluated at \(\boldsymbol{\theta}_{k_s}^{t+s}\) and data \(\boldsymbol{z}_{k_s}^{t+s}\).

For \(\rho = 0\) and \(\rho = 1\), the corresponding empty products are defined as identity matrices, so that the r-hop DICE-E remains well-defined.
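To make the estimator concrete, here is a minimal NumPy sketch that evaluates the formula above on a toy quadratic loss. It is illustrative scaffolding rather than the paper's reference implementation: the mixing matrices W, step sizes \(\eta^t\), node weights \(q_k\), and the logged parameters and samples are hypothetical stand-ins for quantities recorded during a decentralized SGD run, the local optimizer \(\mathcal{O}_j\) is taken to be a single SGD step, and full Hessians are formed explicitly only because the toy problem is low-dimensional (in practice one would use Hessian-vector products). The path set \(P_j^{(\rho)}\) is enumerated by repeatedly extending each path with the out-neighbors of its last node.

# Minimal NumPy sketch of the r-hop DICE-E formula above (toy quadratic loss).
# All names below are hypothetical stand-ins, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, T = 4, 3, 6                            # toy sizes

def loss_grad(theta, z):                             # L(theta; z) = 0.5 * (x @ theta - y)^2
    x, y = z
    return (x @ theta - y) * x

def loss_hessian(theta, z):                          # Hessian of the toy loss: x x^T
    x, _ = z
    return np.outer(x, x)

def local_update(theta, z, lr):                      # O_j: one local SGD step (example choice)
    return theta - lr * loss_grad(theta, z)

# Hypothetical logs from a decentralized SGD run.
W = [np.full((n_nodes, n_nodes), 1.0 / n_nodes) for _ in range(T)]   # mixing matrices W^t
eta = [0.1] * T                                                      # step sizes eta^t
q = np.full(n_nodes, 1.0 / n_nodes)                                  # node weights q_k
z_log = [[(rng.normal(size=dim), rng.normal()) for _ in range(n_nodes)] for _ in range(T)]
theta_log = [[rng.normal(size=dim) for _ in range(n_nodes)] for _ in range(T + 1)]

def out_neighbors(k, step):
    """1-hop out-neighbors of node k under W^step (nodes that receive from k)."""
    return [m for m in range(n_nodes) if W[step][m, k] != 0.0]

def dice_e(j, t, z_test, r):
    """r-hop DICE-E influence of z_j^t on the test point z_test."""
    # optimization-related term: Delta_j = O_j(theta_j^t, z_j^t) - theta_j^t
    delta_j = local_update(theta_log[t][j], z_log[t][j], eta[t]) - theta_log[t][j]

    # rho = 0: both products are empty (identity), the path is just k_0 = j
    influence = -eta[t] * q[j] * loss_grad(theta_log[t][j], z_test) @ delta_j

    paths = [[j]]                                    # paths (k_0 = j, k_1, ..., k_rho)
    for rho in range(1, r + 1):
        paths = [p + [k] for p in paths for k in out_neighbors(p[-1], t + rho - 1)]
        for p in paths:
            # communication graph-related term: product of W_{k_s, k_{s-1}}^{t+s-1}
            w_term = np.prod([W[t + s - 1][p[s], p[s - 1]] for s in range(1, rho + 1)])
            # curvature-related term: product over s = 2..rho of (I - eta^{t+s-1} H)
            curv = np.eye(dim)
            for s in range(2, rho + 1):
                H = loss_hessian(theta_log[t + s - 1][p[s]], z_log[t + s - 1][p[s]])
                curv = curv @ (np.eye(dim) - eta[t + s - 1] * H)
            # test gradient, evaluated at node k_rho after rho further steps
            g_test = loss_grad(theta_log[t + rho][p[rho]], z_test)
            influence += -eta[t] * q[p[rho]] * w_term * (g_test @ curv @ delta_j)
    return influence

print(dice_e(j=0, t=1, z_test=(rng.normal(size=dim), 0.0), r=2))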

Key Insights from DICE

Our theory uncovers the intricate interplay of factors that shape data influence in decentralized learning:

  1. Asymmetric Influence and Topological Importance: The influence of identical data is not uniform across the network; nodes with greater topological importance exert stronger influence.
  2. The Role of Intermediate Nodes and the Loss Landscape: Intermediate nodes actively contribute to an “influence chain”, and the local loss landscape at these nodes (the curvature-related terms above) shapes the influence as it propagates through the network.
  3. Influence Cascades with Damped Decay: Data influence cascades with a “damped decay” induced by the mixing matrix \(\boldsymbol{W}\). This decay, which can be exponential in the number of hops, keeps the influence “localized” (see the numeric sketch after this list).
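As a rough numeric illustration of the third point, the sketch below uses a hypothetical 8-node ring with uniform mixing weights (not a topology from the paper) and looks at the communication graph-related term in isolation: any single \(\rho\)-hop path carries a weight of \((1/3)^{\rho}\), which decays exponentially, while summing the path weights leaving node 0 amounts to reading off a column of \(\boldsymbol{W}^{\rho}\), which stays concentrated around the source.

# Toy illustration of damped, localized spread of the communication graph-related term.
import numpy as np

n = 8
W = np.zeros((n, n))                         # hypothetical ring: each node mixes with itself
for i in range(n):                           # and its two neighbors, weight 1/3 each
    for j in (i - 1, i, i + 1):
        W[i, j % n] = 1.0 / 3.0

for rho in range(0, 7, 2):
    per_path = (1.0 / 3.0) ** rho            # weight of any single rho-hop path: exponential decay
    spread = np.linalg.matrix_power(W, rho)[:, 0]   # aggregated rho-hop path weights from node 0
    print(f"rho={rho}: per-path weight {per_path:.4f}, aggregated spread {np.round(spread, 3)}")

Nodes more than \(\rho\) hops from the source receive exactly zero aggregated weight, and the weight that does arrive concentrates near node 0, which is the sense in which the cascade is damped and localized in this toy setup.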

Citation

Cite Our Paper 😀

If you find our work insightful, we would greatly appreciate it if you could cite our paper.

@inproceedings{zhu2025dice,
  title="{DICE: Data Influence Cascade in Decentralized Learning}",
  author="Tongtian Zhu and Wenhao Li and Can Wang and Fengxiang He",
  booktitle="The Thirteenth International Conference on Learning Representations",
  year="2025",
  url="https://openreview.net/forum?id=2TIYkqieKw"
}

Background

Motivation: Quantifying Contribution in Decentralized Learning

Decentralized learning offers a promising approach to crowdsource computational workloads across geographically distributed compute nodes connected through P2P networks. However, in these systems, “proper incentives are still in absence, considerably discouraging participation”. Our vision is that a fair incentive mechanism relies on the fair attribution of contributions from participating nodes. This leads to the fundamental problem:

How to quantify individual contributions in decentralized learning?

Why Centralized Influence Estimation is Not Applicable

Data influence estimation has been well studied in the centralized paradigm. These methods, however, do not carry over to decentralized learning: the challenges are non-trivial and arise from the unique characteristics of decentralized networks. We highlight two key observations:

  1. Neighbors as “Customers”: In decentralized learning, the neighbors who serve as “customers” hold the right to determine data influence. This aligns with the “customer-centric principle in determining value” (Drucker, 1985).
  2. Dynamic, Cascading Influence: Data influence is not static; it “spreads across participants through gossips during training”. The influence of data on one node propagates to its neighbors and further cascades to multi-hop neighbors. We term this mechanism cascading influence.

Existing estimators, designed for a single, static model, cannot account for this recursive propagation and are therefore unsuitable.