We formalize the anti-causal representation learning problem as follows: given a causal structure where label $Y$ causes the observation $X$ and environment $E$ also influences $X$ ($Y \rightarrow X \leftarrow E$), our goal is to learn representations that capture the causal generative invariant from $Y$ to $X$.
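As a concrete illustration of this anti-causal generating process, the following toy sketch (a hypothetical example, not the paper's actual data model) samples the label first and then generates the observation from both the label and an environment-specific shift, so the $Y \rightarrow X$ mechanism is invariant while $E$'s contribution varies across environments:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, env_shift):
    """Toy anti-causal generator: Y and E jointly cause X (Y -> X <- E)."""
    y = rng.integers(0, 2, size=n)             # label Y is drawn first
    # X is generated FROM the label (anti-causal direction), with an
    # additive environment-specific effect and observation noise.
    x = 2.0 * y + env_shift + rng.normal(0.0, 0.5, size=n)
    return x, y

# Two environments with different shifts: the Y -> X mechanism
# (coefficient 2.0) is shared, while the environment offset differs.
x1, y1 = sample(1000, env_shift=0.0)
x2, y2 = sample(1000, env_shift=3.0)
```

Here the class-conditional gap $\mathbb{E}[X \mid Y{=}1] - \mathbb{E}[X \mid Y{=}0]$ is the same in both environments, which is exactly the kind of invariant the learned representations should capture.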
Given observations from environments $\mathcal{E} = \{e_i\}_{i=1}^n$ with corresponding datasets $\mathcal{D} = \{D_{e_i}\}_{i=1}^n$, we aim to learn two-level representations: a low-level encoder $\phi_L: \mathcal{X} \rightarrow \mathcal{Z}_L$ and, on top of it, a high-level encoder $\phi_H: \mathcal{Z}_L \rightarrow \mathcal{Z}_H$.
The predictor $\mathcal{C}: \mathcal{Z}_H \rightarrow \mathcal{Y}$ then maps high-level representations to labels. With a loss function $\ell: \mathcal{Y} \times \mathcal{Y} \rightarrow \mathbb{R}_+$ defined across all environments $\mathcal{E}$, the full model $f = \mathcal{C} \circ \phi_H \circ \phi_L$ can be trained in an end-to-end fashion.
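A minimal sketch of this composition and the multi-environment training objective (hypothetical linear maps and a squared loss, chosen purely for illustration):

```python
import numpy as np

# Hypothetical linear maps standing in for phi_L, phi_H, and C.
W_L = np.array([[1.0, 0.5]])      # phi_L: X (2-d) -> Z_L (1-d)
W_H = np.array([[2.0]])           # phi_H: Z_L -> Z_H
w_C = np.array([[0.5]])           # C:    Z_H -> predicted label

def f(x):
    """Full model f = C ∘ phi_H ∘ phi_L applied to a batch of observations."""
    z_l = x @ W_L.T               # low-level representation Z_L
    z_h = z_l @ W_H.T             # high-level representation Z_H
    return z_h @ w_C.T            # prediction in label space

def risk(datasets):
    """Average squared loss over all environments' datasets."""
    losses = [np.mean((f(x) - y) ** 2) for x, y in datasets]
    return float(np.mean(losses))
```

In practice the encoders and predictor would be neural networks trained jointly by gradient descent on this averaged risk plus the regularizers introduced below.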
More specifically, we introduce causal dynamics (Theorem 3) to facilitate learning low-level representations $\mathcal{Z}_L$ by jointly optimizing the loss with a causal structure consistency regularizer ($R_2$); minimizing this regularizer encourages the low-level representations to align with the true causal mechanisms underlying the data.
On top of $\mathcal{Z}_L$, we further introduce causal abstraction (Theorem 4) to learn high-level representations, guided by another environment independence regularizer ($R_1$). This regularizer measures the discrepancy between the expected high-level representations across environments conditioned on the label $Y$. Minimizing it can remove environment-specific information while retaining label-relevant causal features.
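One plausible instantiation of $R_1$, sketched under the assumption that the discrepancy is measured as a squared distance between per-environment, label-conditional means of $\mathcal{Z}_H$ (the paper may use a different discrepancy measure):

```python
import numpy as np
from itertools import combinations

def r1(env_reps, env_labels):
    """Environment-independence penalty: squared distance between the
    label-conditional means of high-level representations Z_H, summed
    over all environment pairs and all labels.

    env_reps:   list of (n_i, d) arrays of high-level representations
    env_labels: list of (n_i,) arrays of labels (each label must occur
                in every environment for the conditional mean to exist)
    """
    labels = np.unique(np.concatenate(env_labels))
    penalty = 0.0
    for i, j in combinations(range(len(env_reps)), 2):
        for y in labels:
            mu_i = env_reps[i][env_labels[i] == y].mean(axis=0)
            mu_j = env_reps[j][env_labels[j] == y].mean(axis=0)
            penalty += float(np.sum((mu_i - mu_j) ** 2))
    return penalty
```

When the conditional means of $\mathcal{Z}_H$ given $Y$ agree across environments the penalty is zero, so driving it down pushes the high-level representation toward environment independence given the label.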
This hierarchical structure enables two procedures: