Table 1: Summarizing Causal and Non-Causal Invariant Representation Learning Methods
| Method | Anti-causal Structure | SCM Requirements | Imperfect Interventions | Intervention Inference | Nonparametric | High-dim Data | OOD |
|---|---|---|---|---|---|---|---|
| **Distribution/Domain-invariant Learning** | |||||||
| (C-)ADA | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| Domain adaptation | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| DDAIG | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| L2A-OT | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| ERM | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| DOMAINBED | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| StableNet | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| SagNets | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| SWAD | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| FACT | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Evaluation Protocol | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| Ratatouille | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| XRM | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| FeAT | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| AIA | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| IRM | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
| Rex | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| CI to Spurious | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Information Bottleneck | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| CausalDA | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Transportable Rep | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| **Structure-based Causal Representation Learning** | |||||||
| DISRL | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Causal Disentanglement | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ |
| ICP | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| ICP for nonlinear | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ |
| Active ICP | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ |
| CSG | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| LECI | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| KCDC | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Separation & Risk | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| **Intervention-based Causal Learning** | |||||||
| Nonparametric ICR | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| General Nonlinear Mixing | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Weakly supervised | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| iCaRL | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| CIRL | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ICRL | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| LCA | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| ICA | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| AIT | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| ACIA | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
In Table 1, we broadly summarize causal and non-causal invariant representation learning methods.
Early non-causal methods such as (C-)ADA and DDAIG focused on domain-adaptation strategies, while more recent works such as FeAT and AIA develop more sophisticated objective functions for robustness under distribution shift. Domain adaptation, L2A-OT, ERM, DOMAINBED, StableNet, SagNets, SWAD, FACT, Evaluation Protocol, Ratatouille, XRM, IRM, Rex, CI to Spurious, Information Bottleneck, CausalDA, and Transportable Rep also fall under this category, as they aim to learn representations that are invariant across distributions or domains.
Foundational works in the causal domain model causal effects through interventions: Nonparametric ICR jointly learns encoders and intervention targets under SCM-based structure, while General Nonlinear Mixing addresses non-linear relationships in latent spaces. Weakly supervised methods, iCaRL, CIRL, LCA, and ICRL likewise leverage interventions to learn causal representations. Weak distributional invariance considers perfect single-node interventions and addresses multi-node imperfect interventions by identifying latent variables whose distributional properties remain stable. Independent Component Analysis (ICA) focuses on unsupervised identification of latent causal variables through component analysis and performs disentanglement via taxonomic distance measures and graph-based analysis. AIT builds on SCMs with explicit DAG assumptions and focuses primarily on the standard causal direction.
Methods such as DISRL and Causal Disentanglement explicitly model causal structures. Foundational works such as ICP and its nonlinear and active extensions, as well as CSG and LECI (which identifies causal subgraphs while removing spurious correlations), also incorporate causal structure, but they typically require explicit Directed Acyclic Graphs (DAGs) or focus on identifying causal subgraphs. KCDC employs kernel methods primarily for causal discovery and orientation, relying on statistical independence tests based on kernel measures. In anti-causal separation and risk-invariance methods, inputs are generated as functions of target labels and protected attributes; these methods use conventional causal modeling with DAGs and do-calculus.
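The distinction between perfect and imperfect interventions, which separates several columns of Table 1, can be made concrete with a toy simulation. The following sketch (a hypothetical two-variable linear SCM, not any specific method from the table) shows that a perfect intervention `do(Z2 = 1)` severs the dependence of `Z2` on its parent `Z1`, whereas an imperfect intervention only alters the mechanism while the parent dependence persists:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scm(n, intervention=None):
    """Toy linear SCM Z1 -> Z2, with the intervention targeting Z2."""
    z1 = rng.normal(0.0, 1.0, n)
    if intervention is None:
        # observational mechanism
        z2 = 2.0 * z1 + rng.normal(0.0, 0.1, n)
    elif intervention == "perfect":
        # do(Z2 = 1): the Z1 -> Z2 edge is severed entirely
        z2 = np.full(n, 1.0)
    elif intervention == "imperfect":
        # mechanism is altered (new weight, shift, and noise scale),
        # but Z2 still depends on its parent Z1
        z2 = 0.5 * z1 + 1.0 + rng.normal(0.0, 0.5, n)
    else:
        raise ValueError(intervention)
    return z1, z2

# Parent-child correlation: ~1 observationally, exactly 0 under a perfect
# intervention, and merely weakened under an imperfect one.
results = {}
for kind in (None, "perfect", "imperfect"):
    z1, z2 = sample_scm(10_000, kind)
    results[kind] = 0.0 if np.std(z2) == 0 else float(np.corrcoef(z1, z2)[0, 1])
print({k: round(v, 2) for k, v in results.items()})
```

This is why methods restricted to perfect interventions (e.g., the rows marked ❌ under "Imperfect Interventions") cannot rely on independence from the intervened variable's parents as an identification signal when interventions are imperfect.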
Table 2: Key Notations in Anti-Causal Representation Learning Framework
| Name | Symbol | Name | Symbol |
|---|---|---|---|
| Environment | \(e_i\) | Product sample space | \(\Omega = \Omega_{e_i} \times \Omega_{e_j}\) |
| Set of environments | \(\mathcal{E}\) | Product \(\sigma\)-algebra | \(\mathscr{H} = \mathscr{H}_{e_i} \otimes \mathscr{H}_{e_j}\) |
| Environment sample space | \(\Omega_{e_i}\) | Product probability measure | \(\mathbb{P} = \mathbb{P}_{e_i} \otimes \mathbb{P}_{e_j}\) |
| \(\sigma\)-algebra on \(\Omega_{e_i}\) | \(\mathscr{H}_{e_i}\) | Product causal kernel family | \(\mathbb{K} = \{K_S : S \in \mathscr{P}(T)\}\) |
| Probability measure | \(\mathbb{P}_{e_i}\) | Input space | \(\mathcal{X}\) |
| Causal kernel | \(K_{e_i}\) | Low-level latent space domain | \(\mathcal{D}_{\mathbb{Z}_L}\) |
| Environment causal space | \((\Omega_{e_i}, \mathscr{H}_{e_i}, \mathbb{P}_{e_i}, K_{e_i})\) | High-level latent space | \(\mathcal{D}_{Z_H}\) |
| Causal product space | \((\Omega, \mathscr{H}, \mathbb{P}, \mathbb{K})\) | Label space | \(\mathcal{Y}\) |
| Sub-\(\sigma\)-algebra | \(\mathscr{H}_S\) | Low-level representation | \(\phi_L: \mathcal{X} \rightarrow \mathcal{D}_{\mathbb{Z}_L}\) |
| Index set | \(T = T_{e_i} \cup T_{e_j}\) | High-level representation | \(\phi_H: \mathcal{D}_{\mathbb{Z}_L} \rightarrow \mathcal{D}_{Z_H}\) |
| Interventional kernel | \(K_S^{do(\mathcal{X}, \mathbb{Q})}(\omega, A)\) | Predictor | \(\mathcal{C}: \mathcal{D}_{Z_H} \rightarrow \mathcal{Y}\) |
| Intervention measure | \(\mathbb{Q}(\cdot|\cdot)\) | Full predictive model | \(f = \mathcal{C} \circ \phi_H \circ \phi_L\) |
| Marginal measure on \(\mathcal{Y}\) | \(\mu_Y\) | Loss function | \(\ell: \mathcal{Y} \times \mathcal{Y} \rightarrow \mathbb{R}_+\) |
| Causal dynamic | \(\mathcal{Z}_L = \langle \mathcal{X}, \mathbb{Q}, \mathbb{K}_L\rangle\) | Environment independence reg. | \(R_1\) |
| Causal abstraction | \(\mathcal{Z}_H = \langle \mathbf{V}_H, \mathbb{K}_H \rangle\) | Causal structure alignment reg. | \(R_2\) |
| Set of low-level kernels | \(\mathbb{K}_L = \{K_S^{\mathbb{Z}_{L}}(\omega, A)\}\) | Regularization parameters | \(\lambda_1, \lambda_2\) |
| Set of high-level kernels | \(\mathbb{K}_H = \{K_S^{\mathbb{Z}_{H}}(\omega, A)\}\) | Conditional mutual information | \(I(X; E=e \mid Y)\) |
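The pipeline in Table 2 composes a full predictive model \(f = \mathcal{C} \circ \phi_H \circ \phi_L\) and trains it with the loss \(\ell\) plus the two penalties \(R_1\) and \(R_2\) weighted by \(\lambda_1, \lambda_2\). A minimal numpy sketch follows; the dimensions and linear maps are purely illustrative stand-ins (in practice \(\phi_L\), \(\phi_H\), and \(\mathcal{C}\) are learned networks, and \(R_1\), \(R_2\) are computed regularizers rather than the given scalars used here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: input -> low-level latent -> high-level latent -> logits
d_x, d_zl, d_zh, n_classes = 8, 4, 2, 3

# Fixed linear stand-ins for the learned maps in Table 2.
W_L = rng.normal(size=(d_x, d_zl))        # phi_L : X -> D_{Z_L}
W_H = rng.normal(size=(d_zl, d_zh))       # phi_H : D_{Z_L} -> D_{Z_H}
W_C = rng.normal(size=(d_zh, n_classes))  # C : D_{Z_H} -> Y

def phi_L(x): return x @ W_L
def phi_H(z): return z @ W_H
def C(z):     return z @ W_C

def f(x):
    """Full predictive model f = C o phi_H o phi_L."""
    return C(phi_H(phi_L(x)))

def cross_entropy(logits, y):
    """Loss ell: Y x Y -> R_+, here softmax cross-entropy."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(y)), y].mean()

def objective(x, y, R1, R2, lam1=0.1, lam2=0.1):
    """Penalized risk ell(f(x), y) + lambda1*R1 + lambda2*R2, where R1 is the
    environment-independence penalty and R2 the causal-structure alignment
    penalty (both treated as precomputed scalars in this sketch)."""
    return cross_entropy(f(x), y) + lam1 * R1 + lam2 * R2

x = rng.normal(size=(16, d_x))
y = rng.integers(0, n_classes, size=16)
total = objective(x, y, R1=0.5, R2=0.25)
```

The objective is linear in \(R_1\) and \(R_2\), so \(\lambda_1\) and \(\lambda_2\) directly trade prediction accuracy against the two invariance penalties.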
Table 3 compares the prior knowledge requirements across state-of-the-art causal representation learning methods. Our results demonstrate that despite requiring less prior information, ACIA outperforms these methods.
Table 3: Comparison of prior information requirements across causal representation learning methods
| Method | Causal Structure | SCM Knowledge | Intervention Type |
|---|---|---|---|
| ACTIR | Anti-causal \(Y \rightarrow X \leftarrow E\) | Variable roles only | Perfect only |
| CausalDA | DAG structure | Variable types | Perfect only |
| LECI | Partial connectivity | Variable relationships | Perfect only |
| **ACIA** | **Anti-causal \(Y \rightarrow X \leftarrow E\)** | **Variable roles only** | **Both perfect and imperfect** |