10. More Related Work

Table 1: Summarizing Causal and Non-Causal Invariant Representation Learning Methods

Comparison axes: Anti-causal Structure, SCM Requirements, Imperfect Interventions, Intervention Inference, Nonparametric, High-dim Data, OOD.

Distribution/Domain-invariant Learning: (C-)ADA, Domain adaptation, DDAIG, L2A-OT, ERM, DOMAINBED, StableNet, SagNets, SWAD, FACT, Evaluation Protocol, Ratatouille, XRM, FeAT, AIA, IRM, Rex, CI to Spurious, Information Bottleneck, CausalDA, Transportable Rep

Structure-based Causal Representation Learning: DISRL, Causal Disentanglement, ICP, ICP for nonlinear, Active ICP, CSG, LECI, KCDC, Separation & Risk

Intervention-based Causal Learning: Nonparametric ICR, General Nonlinear Mixing, Weakly supervised, iCaRL, CIRL, ICRL, LCA, ICA, AIT, ACIA

In Table 1, we broadly summarize causal and non-causal invariant representation learning methods.

Distribution/Domain-invariant Learning

Early non-causal methods such as (C-)ADA and DDAIG focus on domain-adaptation strategies, while more recent works such as FeAT and AIA develop sophisticated objective functions for robustness under distribution shift. Domain adaptation methods, along with L2A-OT, ERM, DOMAINBED, Adversarial and Pre-training, StableNet, SagNets, SWAD, Theoretical Framework, FACT, Evaluation Protocol, Ratatouille, XRM, IRM, Rex, CI to Spurious, Information Bottleneck, CausalDA, and Transportable Rep, also fall under this category, as they all aim to learn representations that are invariant across distributions or domains.

Intervention-based Causal Learning

Foundational works in the causal domain model causal effects through interventions: Nonparametric ICR jointly learns encoders and intervention targets under SCM-based structures, while General Nonlinear Mixing addresses non-linear relationships in latent spaces. Weakly supervised methods, iCaRL, CIRL, and LCA also leverage interventions to learn causal representations, and ICRL falls under this category as well. Weak distributional invariance considers perfect single-node interventions and addresses multi-node imperfect interventions by identifying latent variables whose distributional properties remain stable. Independent Component Analysis (ICA) focuses on unsupervised identification of latent causal variables through component analysis and performs disentanglement via taxonomic distance measures and graph-based analysis. AIT builds on SCMs with explicit DAG assumptions and primarily focuses on the standard causal direction.
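The perfect/imperfect distinction above can be made concrete on a toy anti-causal SCM \(Y \rightarrow X\). The model below is an illustrative sketch only (all mechanisms and coefficients are invented for this example, not taken from any of the cited methods): a perfect intervention replaces \(X\)'s mechanism outright and severs the \(Y \rightarrow X\) edge, whereas an imperfect intervention only modifies the mechanism, so \(X\) still depends on \(Y\).

```python
import numpy as np

# Toy anti-causal SCM Y -> X (illustrative only), contrasting perfect and
# imperfect interventions on X's structural mechanism.
rng = np.random.default_rng(0)
N = 50_000

def sample(intervention=None):
    """Perfect: X's mechanism is replaced outright, severing Y -> X.
    Imperfect: the mechanism is only shifted, so X still depends on Y."""
    y = rng.normal(0.0, 1.0, N)
    noise = rng.normal(0.0, 1.0, N)
    if intervention == "perfect":
        x = 5.0 + noise                 # do(X): dependence on Y removed
    elif intervention == "imperfect":
        x = 2.0 * y + 5.0 + noise       # shifted mechanism, dependence kept
    else:
        x = 2.0 * y + noise             # observational regime
    return y, x

for kind in (None, "perfect", "imperfect"):
    y, x = sample(kind)
    print(f"{str(kind):>9}: corr(Y, X) = {np.corrcoef(y, x)[0, 1]:+.2f}")
```

Under the perfect intervention the correlation between \(Y\) and \(X\) vanishes, while under the imperfect one it is preserved, which is exactly why methods restricted to perfect interventions cannot exploit imperfectly intervened environments.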

Structure-based Causal Representation Learning

Methods like DISRL and Causal Disentanglement explicitly model causal structures. Foundational works such as ICP and its nonlinear extension, as well as CSG and LECI (which identifies causal subgraphs while removing spurious correlations), also incorporate causal structure, but they often require explicit Directed Acyclic Graphs (DAGs) or focus on identifying causal subgraphs. KCDC employs kernel methods primarily for causal discovery and orientation, relying on statistical independence tests based on kernel measures. In anti-causal separation and risk invariance, inputs are generated as functions of target labels and protected attributes; these methods use conventional causal modeling with DAGs and do-calculus.
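The core idea behind ICP can be sketched in a few lines: for each candidate parent set \(S\), regress \(Y\) on \(X_S\) pooled over environments and keep \(S\) only if the residual distribution looks the same in every environment; the plausible causal parents are the intersection of all accepted sets. The snippet below is a simplified illustration of this idea, not the original algorithm: the data-generating process is invented, and a crude residual-mean tolerance stands in for a proper statistical invariance test.

```python
import numpy as np
from itertools import combinations

# ICP-style invariance search (a sketch of the idea, not the original method).
rng = np.random.default_rng(1)

def make_env(shift, n=2000):
    x1 = rng.normal(shift, 1.0, n)                  # true cause of Y
    y = 1.5 * x1 + rng.normal(0.0, 1.0, n)          # invariant mechanism for Y
    x2 = 0.8 * y + rng.normal(0.0, 1.0 + shift, n)  # effect of Y, unstable
    return np.column_stack([x1, x2]), y

envs = [make_env(s) for s in (0.0, 2.0)]

def env_residual_means(S):
    """Pooled OLS of Y on X_S, then per-environment residual means."""
    X = np.vstack([e[0] for e in envs])[:, S]
    y = np.concatenate([e[1] for e in envs])
    A = np.column_stack([X, np.ones(len(y))])       # design with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    res = y - A @ beta
    parts = np.split(res, np.cumsum([len(e[1]) for e in envs])[:-1])
    return [p.mean() for p in parts]

accepted = []
for k in (1, 2):
    for S in combinations(range(2), k):
        means = env_residual_means(list(S))
        # crude invariance check (tolerance in place of a proper test)
        if max(means) - min(means) < 0.2:
            accepted.append(set(S))

# plausible causal parents of Y = intersection of all accepted sets
causal = set.intersection(*accepted) if accepted else set()
print("accepted:", accepted, "-> causal parents:", causal)
```

Because \(x_2\) is an effect of \(Y\) whose noise scale changes across environments, every set relying on it alone fails the invariance check, and only the true cause \(x_1\) survives the intersection.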

Table 2: Key Notations in Anti-Causal Representation Learning Framework

Name | Symbol | Name | Symbol
Environment | \(e_i\) | Product sample space | \(\Omega = \Omega_{e_i} \times \Omega_{e_j}\)
Set of environments | \(\mathcal{E}\) | Product \(\sigma\)-algebra | \(\mathscr{H} = \mathscr{H}_{e_i} \otimes \mathscr{H}_{e_j}\)
Environment sample space | \(\Omega_{e_i}\) | Product probability measure | \(\mathbb{P} = \mathbb{P}_{e_i} \otimes \mathbb{P}_{e_j}\)
\(\sigma\)-algebra on \(\Omega_{e_i}\) | \(\mathscr{H}_{e_i}\) | Product causal kernel family | \(\mathbb{K} = \{K_S : S \in \mathscr{P}(T)\}\)
Probability measure | \(\mathbb{P}_{e_i}\) | Input space | \(\mathcal{X}\)
Causal kernel | \(K_{e_i}\) | Low-level latent space domain | \(\mathcal{D}_{\mathbb{Z}_L}\)
Environment causal space | \((\Omega_{e_i}, \mathscr{H}_{e_i}, \mathbb{P}_{e_i}, K_{e_i})\) | High-level latent space | \(\mathcal{D}_{Z_H}\)
Causal product space | \((\Omega, \mathscr{H}, \mathbb{P}, \mathbb{K})\) | Label space | \(\mathcal{Y}\)
Sub-\(\sigma\)-algebra | \(\mathscr{H}_S\) | Low-level representation | \(\phi_L: \mathcal{X} \rightarrow \mathcal{D}_{\mathbb{Z}_L}\)
Index set | \(T = T_{e_i} \cup T_{e_j}\) | High-level representation | \(\phi_H: \mathcal{D}_{\mathbb{Z}_L} \rightarrow \mathcal{D}_{Z_H}\)
Interventional kernel | \(K_S^{do(\mathcal{X}, \mathbb{Q})}(\omega, A)\) | Predictor | \(\mathcal{C}: \mathcal{D}_{Z_H} \rightarrow \mathcal{Y}\)
Intervention measure | \(\mathbb{Q}(\cdot|\cdot)\) | Full predictive model | \(f = \mathcal{C} \circ \phi_H \circ \phi_L\)
Marginal measure on \(\mathcal{Y}\) | \(\mu_Y\) | Loss function | \(\ell: \mathcal{Y} \times \mathcal{Y} \rightarrow \mathbb{R}_+\)
Causal dynamic | \(\mathcal{Z}_L = \langle \mathcal{X}, \mathbb{Q}, \mathbb{K}_L\rangle\) | Environment independence reg. | \(R_1\)
Causal abstraction | \(\mathcal{Z}_H = \langle \mathbf{V}_H, \mathbb{K}_H \rangle\) | Causal structure alignment reg. | \(R_2\)
Set of low-level kernels | \(\mathbb{K}_L = \{K_S^{\mathbb{Z}_{L}}(\omega, A)\}\) | Regularization parameters | \(\lambda_1, \lambda_2\)
Set of high-level kernels | \(\mathbb{K}_H = \{K_S^{\mathbb{Z}_{H}}(\omega, A)\}\) | Conditional mutual information | \(I(X; E = e \mid Y)\)
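To make the last column of Table 2 concrete, the sketch below wires the two-level model \(f = \mathcal{C} \circ \phi_H \circ \phi_L\) into a regularized objective of the form \(\ell + \lambda_1 R_1 + \lambda_2 R_2\). Everything concrete here is an assumption for illustration: the linear maps, the squared loss, and the use of a cross-environment risk variance as a stand-in for \(R_1\); \(R_2\) is left as a stub, since the framework's actual penalties are method-specific.

```python
import numpy as np

# Sketch of f = C ∘ φ_H ∘ φ_L and the objective ℓ + λ1·R1 + λ2·R2 (Table 2).
# Linear maps, squared loss, and the variance surrogate for R1 are assumed.
rng = np.random.default_rng(0)
d_x, d_zl, d_zh = 8, 4, 2

W_L = rng.normal(size=(d_zl, d_x))    # φ_L : X -> D_{Z_L}
W_H = rng.normal(size=(d_zh, d_zl))   # φ_H : D_{Z_L} -> D_{Z_H}
w_C = rng.normal(size=d_zh)           # C   : D_{Z_H} -> Y

def f(x):
    """Full predictive model f = C ∘ φ_H ∘ φ_L."""
    return w_C @ (W_H @ (W_L @ x))

def objective(env_data, lam1=0.1, lam2=0.1):
    """env_data: list of (X, y) pairs, one per environment e in E."""
    risks = [np.mean([(f(x) - t) ** 2 for x, t in zip(X, y)])  # ℓ per env
             for X, y in env_data]
    R1 = np.var(risks)   # cross-environment risk variance (assumed surrogate)
    R2 = 0.0             # placeholder for the structure-alignment penalty
    return np.mean(risks) + lam1 * R1 + lam2 * R2

# tiny usage example on synthetic data from two environments
env_data = [(rng.normal(size=(16, d_x)), rng.normal(size=16)) for _ in range(2)]
print("objective:", objective(env_data))
```

The composition makes explicit that the two representations can be trained jointly while the two regularizers act at different levels: \(R_1\) on how risk varies across environments, \(R_2\) on the learned causal structure.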

Comparison of Prior Information Requirements

Table 3 compares the prior knowledge requirements across state-of-the-art causal representation learning methods. Our results demonstrate that despite requiring less prior information, ACIA outperforms these methods.

Table 3: Comparison of prior information requirements across causal representation learning methods

Method | Causal Structure | SCM Knowledge | Intervention Type
ACTIR | Anti-causal \(Y \rightarrow X \leftarrow E\) | Variable roles only | Perfect only
CausalDA | DAG structure | Variable types | Perfect only
LECI | Partial connectivity | Variable relationships | Perfect only
ACIA | Anti-causal \(Y \rightarrow X \leftarrow E\) | Variable roles only | Both perfect and imperfect