DICE: Data Influence Cascade in Decentralized Learning


Updates in progress. More coming soon! 😉

🗓️ 2025-07-10 — Updated main results

Main Results

Theorem (Approximation of r-hop DICE-GT)

The r-hop DICE-GT influence \(\mathcal{I}_{\mathrm{DICE-GT}}^{(r)}(\boldsymbol{z}_j^t, \boldsymbol{z}^{\prime})\) can be approximated by the following r-hop DICE-E estimator:

\[\begin{equation} \begin{split} &\mathcal{I}_{\mathrm{DICE-E}}^{(r)}(\boldsymbol{z}_j^t, \boldsymbol{z}^{\prime})\\ & = - \sum_{\rho=0}^{r} \sum_{ (k_1, \dots, k_{\rho}) \in P_j^{(\rho)} } \eta^{t} q_{k_\rho}  \underbrace{ \left( \prod_{s=1}^{\rho} \boldsymbol{W}_{k_s, k_{s-1}}^{t+s-1} \right) }_{\text{communication graph-related term}} \times \underbrace{ \nabla L\bigl(\boldsymbol{\theta}_{k_{\rho}}^{t+\rho}; \boldsymbol{z}^{\prime}\bigr)^\top }_{\text{test gradient}} \\ & \quad \times \underbrace{ \left( \prod_{s=2}^{\rho} \left( \boldsymbol{I} - \eta^{t+s-1} \boldsymbol{H}(\boldsymbol{\theta}_{k_s}^{t+s-1}; \boldsymbol{z}_{k_s}^{t+s-1}) \right) \right) }_{\text{curvature-related term}} \times \underbrace{ \Delta_j(\boldsymbol{\theta}_j^t,\boldsymbol{z}_j^t) }_{\text{optimization-related term}} \end{split} \end{equation}\]

where \(\Delta_j(\boldsymbol{\theta}_j^t,\boldsymbol{z}_j^t) = \mathcal{O}_j(\boldsymbol{\theta}_j^t,\boldsymbol{z}_j^t)-\boldsymbol{\theta}_j^t\) and \(k_0 = j\). Here \(P_j^{(\rho)}\) denotes the set of all paths \((k_1, \dots, k_{\rho})\) such that \(k_s \in \mathcal{N}_{\mathrm{out}}^{(1)}(k_{s-1})\) for \(s=1,\dots,\rho\), and \(\boldsymbol{H}(\boldsymbol{\theta}_{k_s}^{t+s-1}; \boldsymbol{z}_{k_s}^{t+s-1})\) is the Hessian matrix of \(L\) with respect to \(\boldsymbol{\theta}\), evaluated at \(\boldsymbol{\theta}_{k_s}^{t+s-1}\) on data \(\boldsymbol{z}_{k_s}^{t+s-1}\), matching the curvature-related term above.

For \(\rho = 0\) and \(\rho = 1\), the empty products above are defined as identity matrices, so the r-hop DICE-E remains well-defined.
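Unrolling the sum for \(r = 1\) makes the structure concrete: the \(\rho = 0\) term captures the sample's direct effect at node \(j\), and the \(\rho = 1\) terms capture its first-hop effect on the out-neighbors of \(j\):

\[\mathcal{I}_{\mathrm{DICE-E}}^{(1)}(\boldsymbol{z}_j^t, \boldsymbol{z}^{\prime}) = - \eta^{t} q_{j}\, \nabla L(\boldsymbol{\theta}_{j}^{t}; \boldsymbol{z}^{\prime})^\top \Delta_j(\boldsymbol{\theta}_j^t,\boldsymbol{z}_j^t) - \eta^{t} \sum_{k_1 \in \mathcal{N}_{\mathrm{out}}^{(1)}(j)} q_{k_1} \boldsymbol{W}_{k_1, j}^{t}\, \nabla L(\boldsymbol{\theta}_{k_1}^{t+1}; \boldsymbol{z}^{\prime})^\top \Delta_j(\boldsymbol{\theta}_j^t,\boldsymbol{z}_j^t)\]

The general formula can also be evaluated directly by enumerating paths. Below is a minimal NumPy sketch of this computation on a toy problem; the quadratic loss, the unit-step gradient oracle, and all names (dice_e, local_delta, the trajectory containers) are illustrative assumptions, not code from the paper:

# Minimal sketch of r-hop DICE-E on a toy scalar-regression loss
# L(theta; (x, y)) = 0.5 * (x @ theta - y)**2. Everything here (loss choice,
# oracle, container layout) is an illustrative assumption, not the paper's code.
import numpy as np
from itertools import product as cartesian_product

d, n = 3, 4                       # parameter dimension, number of nodes
rng = np.random.default_rng(0)

def grad(theta, z):
    """Gradient of L(theta; z) = 0.5 * (x @ theta - y)**2 for z = (x, y)."""
    x, y = z
    return (x @ theta - y) * x

def hessian(z):
    """Hessian of the same loss: the rank-one matrix x x^T (theta-free here)."""
    x, _ = z
    return np.outer(x, x)

def local_delta(theta, z):
    """Delta_j = O_j(theta, z) - theta for an illustrative unit-step
    gradient oracle O_j(theta, z) = theta - grad(theta, z)."""
    return -grad(theta, z)

def dice_e(j, t, r, W, eta, q, theta, data, z_test):
    """r-hop DICE-E influence of node j's sample z_j^t on test point z_test.

    W[s]        : n x n mixing matrix at step s (entry [k, m] weights m -> k)
    eta[s]      : learning rate at step s
    q[k]        : aggregation weight of node k
    theta[s][k] : parameters of node k at step s
    data[s][k]  : training sample z_k^s at node k and step s
    """
    delta = local_delta(theta[t][j], data[t][j])
    total = 0.0
    for rho in range(r + 1):
        # Enumerate rho-hop paths (k_1, ..., k_rho) from k_0 = j; paths that
        # leave the out-neighborhoods carry zero mixing weight and are pruned.
        for path in cartesian_product(range(n), repeat=rho):
            ks = (j,) + path
            comm = np.prod([W[t + s - 1][ks[s], ks[s - 1]]
                            for s in range(1, rho + 1)])
            if comm == 0.0:
                continue
            curv = np.eye(d)      # empty product (rho <= 1) is the identity
            for s in range(2, rho + 1):
                H = hessian(data[t + s - 1][ks[s]])
                curv = curv @ (np.eye(d) - eta[t + s - 1] * H)
            g_test = grad(theta[t + rho][ks[rho]], z_test)
            total -= eta[t] * q[ks[rho]] * comm * (g_test @ curv @ delta)
    return total

# Toy usage on a directed 4-node ring with synthetic trajectory snapshots.
T = 5
W = [np.zeros((n, n)) for _ in range(T)]
for Ws in W:
    for k in range(n):
        Ws[k, k] = Ws[k, (k + 1) % n] = 0.5   # node k mixes from itself and k+1
eta = [0.1] * T
q = np.full(n, 1.0 / n)
theta = [[rng.normal(size=d) for _ in range(n)] for _ in range(T)]
data = [[(rng.normal(size=d), rng.normal()) for _ in range(n)] for _ in range(T)]
z_test = (rng.normal(size=d), rng.normal())
print(dice_e(j=0, t=1, r=2, W=W, eta=eta, q=q, theta=theta, data=data, z_test=z_test))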

Key Insights from DICE

Our theory uncovers the intricate interplay of factors that shape data influence in decentralized learning:

  1. Asymmetric Influence and Topological Importance: Identical data does not exert uniform influence across the network; nodes with greater topological significance exert stronger influence.
  2. The Role of Intermediate Nodes and the Loss Landscape: Intermediate nodes actively contribute to an "influence chain", and the local loss landscape of their models (the curvature-related term above) shapes the influence as it propagates through the network.
  3. Influence Cascades with Damped Decay: Data influence cascades with a "damped decay" induced by the mixing matrix \(\boldsymbol{W}\). This decay, which can be exponential in the number of hops, keeps influence "localized"; see the short illustration after this list.
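To make the damped decay concrete, note that the communication graph-related term of any single \(\rho\)-hop path is a product of \(\rho\) mixing weights. The following toy check (an assumed example, not from the paper's code) shows the resulting geometric per-path bound:

import numpy as np

# Symmetric, doubly stochastic mixing matrix on an assumed 4-node ring.
W = np.array([
    [0.5,  0.25, 0.0,  0.25],
    [0.25, 0.5,  0.25, 0.0 ],
    [0.0,  0.25, 0.5,  0.25],
    [0.25, 0.0,  0.25, 0.5 ],
])
w_max = W.max()   # every mixing weight is at most w_max = 0.5 < 1
for rho in range(6):
    # Any rho-hop path multiplies rho such weights, so its communication
    # weight is at most w_max**rho: exponential decay in the hop count.
    print(f"rho={rho}: per-path communication weight <= {w_max ** rho:.4f}")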

Citation

Cite Our Paper 😀

If you find our work insightful, we would greatly appreciate it if you could cite our paper.

@inproceedings{zhu2025dice,
  title={{DICE}: Data Influence Cascade in Decentralized Learning},
  author={Zhu, Tongtian and Li, Wenhao and Wang, Can and He, Fengxiang},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=2TIYkqieKw}
}