
Problem Formulation

Assumptions

We formalize the anti-causal representation learning problem as follows: given a causal structure where label $Y$ causes the observation $X$ and environment $E$ also influences $X$ ($Y \rightarrow X \leftarrow E$), our goal is to learn representations that capture the causal generative invariant from $Y$ to $X$.

Note: A causal generative invariant is a stable mechanism by which $Y$ produces $X$, formally $X = f(Y, \epsilon)$, where $\epsilon$ denotes exogenous noise and $f$ is invariant across environments.

Learning Goal

Given observations from environments $\mathcal{E} = \{e_i\}_{i=1}^n$ with corresponding datasets $\mathcal{D} = \{D_{e_i}\}_{i=1}^n$, we aim to learn two-level representations:

  1. A low-level representation map $\phi_L: \mathcal{X} \rightarrow \mathcal{Z}_L$ that captures the causal generative mechanisms.
  2. A high-level representation map $\phi_H: \mathcal{Z}_L \rightarrow \mathcal{Z}_H$ that abstracts the label-relevant invariants.

The predictor $\mathcal{C}: \mathcal{Z}_H \rightarrow \mathcal{Y}$ then maps high-level representations to labels. With a loss function $\ell: \mathcal{Y} \times \mathcal{Y} \rightarrow \mathbb{R}_+$ defined across all environments $\mathcal{E}$, the full model $f = \mathcal{C} \circ \phi_H \circ \phi_L$ can be trained in an end-to-end fashion.
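As a minimal sketch of the composed model $f = \mathcal{C} \circ \phi_H \circ \phi_L$ (the encoder architectures and dimensions below are illustrative placeholders, not the paper's actual networks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear maps with tanh nonlinearities; real architectures differ.
W_L = rng.standard_normal((8, 16))   # phi_L: X (16-dim) -> Z_L (8-dim)
W_H = rng.standard_normal((4, 8))    # phi_H: Z_L -> Z_H (4-dim)
W_C = rng.standard_normal((2, 4))    # predictor C: Z_H -> logits over labels

def phi_L(x):        # low-level representation
    return np.tanh(W_L @ x)

def phi_H(z_l):      # high-level (abstracted) representation
    return np.tanh(W_H @ z_l)

def predictor(z_h):  # C: Z_H -> Y (logits)
    return W_C @ z_h

def f(x):            # full model f = C ∘ phi_H ∘ phi_L
    return predictor(phi_H(phi_L(x)))

x = rng.standard_normal(16)
logits = f(x)
```

Because the model is a plain composition, the task loss $\ell$ can be backpropagated end to end through all three components.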

Learning the Two-Level Representations

More specifically, we introduce causal dynamics (Theorem 3) to facilitate learning the low-level representations $\mathcal{Z}_L$ by jointly optimizing the loss with a causal structure consistency regularizer ($R_2$), whose minimization encourages the low-level representations to align with the true causal mechanisms underlying the data.
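The joint objective for the low-level stage can be sketched as a weighted sum of the task loss and $R_2$ (the exact form of $R_2$ follows from Theorem 3 and is not reproduced here; `lam2` is an illustrative trade-off weight):

```python
import numpy as np

def prediction_loss(logits, y):
    # Cross-entropy for a single example (a common choice for ell).
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[y]

def low_level_objective(logits, y, r2, lam2=0.1):
    # Joint objective for learning Z_L: task loss plus the causal structure
    # consistency regularizer R_2. r2 is assumed to be computed elsewhere
    # from the low-level representations; lam2 is a hypothetical weight.
    return prediction_loss(logits, y) + lam2 * r2
```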

On top of $\mathcal{Z}_L$, we further introduce causal abstraction (Theorem 4) to learn high-level representations, guided by another environment independence regularizer ($R_1$). This regularizer measures the discrepancy between the expected high-level representations across environments conditioned on the label $Y$. Minimizing it can remove environment-specific information while retaining label-relevant causal features.
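One way to realize such a discrepancy, sketched here with a squared distance between per-environment conditional means (the paper's actual discrepancy measure may differ):

```python
import numpy as np

def env_independence_r1(z_h_by_env, y_by_env, labels):
    """Illustrative R_1: for each label y, compare the mean high-level
    representation E[Z_H | Y = y] across environments, summing squared
    distances over all environment pairs. Zero iff the conditional means
    coincide across environments."""
    total = 0.0
    for y in labels:
        means = []
        for z_h, ys in zip(z_h_by_env, y_by_env):
            mask = ys == y
            if mask.any():
                means.append(z_h[mask].mean(axis=0))
        for i in range(len(means)):
            for j in range(i + 1, len(means)):
                total += float(((means[i] - means[j]) ** 2).sum())
    return total
```

When the high-level representation carries no environment-specific information, the conditional means agree across environments and this term vanishes.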

Two-Level Design Rationale

This hierarchical structure enables two complementary procedures:

  1. Interventional effect calculation: $\phi_L$ captures how the anti-causal setting responds to interventions, handling both perfect and imperfect intervention scenarios.
  2. Information bottleneck: $\phi_H$ retains only label-relevant invariants while discarding environment-specific noise.
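Putting the pieces together, the overall training objective combines the task loss with both regularizers (the weights `lam1` and `lam2` are illustrative hyperparameters, not values from the paper):

```python
def full_objective(task_loss, r1, r2, lam1=1.0, lam2=1.0):
    # Overall objective for the two-level design: prediction loss, plus the
    # environment-independence regularizer R_1 acting on phi_H, plus the
    # causal structure consistency regularizer R_2 acting on phi_L.
    return task_loss + lam1 * r1 + lam2 * r2
```

Minimizing this single objective trains $\phi_L$, $\phi_H$, and $\mathcal{C}$ jointly in an end-to-end fashion.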