Tutorial 4

Causal Random Forests: Estimating Heterogeneous Treatment Effects with Machine Learning 

Date: TBA
Time: TBA
Place: TBA

Recent developments in Artificial Intelligence and Machine Learning have transformed empirical research across the social sciences. Yet a persistent gap remains between prediction and causation: standard supervised learning excels at predicting outcomes but cannot, on its own, answer the counterfactual questions that motivate most applied research. Did the policy work, for whom, and by how much? The emergence of causal machine learning, and Causal Random Forests in particular, offers a principled bridge between the flexibility of ensemble methods and the rigour of econometric identification. 

This tutorial seeks to raise awareness and understanding of causal machine learning methods for estimating heterogeneous treatment effects, while providing a comprehensive overview of the underlying theory and the tools available for applying them. The tutorial is structured in three parts. It moves from foundations and interpretation, to a discussion of applications and limitations, to hands-on estimation.

The first part will begin with the motivation for causal inference and the limitations of standard machine learning. We will introduce the potential outcomes framework, the core identification assumptions, and the central estimand, the Conditional Average Treatment Effect (CATE). The focus will then shift to the Robinson decomposition and the key ideas behind Causal Random Forests: honest sample splitting, the causal splitting criterion, and the Generalized Random Forests framework, concluding with a discussion of valid inference and confidence intervals. 
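For reference, the estimand and the decomposition named above can be written compactly. The notation ($Y_i(1)$, $Y_i(0)$, $T_i$, $X_i$, $e(x)$, $m(x)$, $\alpha_i(x)$) is our own choice, since the text does not fix symbols. The last line gives the weighted residual-on-residual estimator only in its simplest form, without a local intercept.

```latex
% Observed outcome under a binary treatment (SUTVA: each unit has exactly these two potential outcomes)
Y_i = T_i\,Y_i(1) + (1 - T_i)\,Y_i(0), \qquad T_i \in \{0, 1\}

% Central estimand: the Conditional Average Treatment Effect
\tau(x) = \mathbb{E}\left[\,Y_i(1) - Y_i(0) \mid X_i = x\,\right]

% Unconfoundedness and overlap, with propensity score e(x)
\{Y_i(0), Y_i(1)\} \perp\!\!\!\perp T_i \mid X_i, \qquad
0 < e(x) = \Pr(T_i = 1 \mid X_i = x) < 1

% Robinson decomposition, with m(x) = E[Y_i | X_i = x] and
% epsilon_i = Y_i - E[Y_i | X_i, T_i]
Y_i - m(X_i) = \tau(X_i)\,\bigl(T_i - e(X_i)\bigr) + \varepsilon_i,
\qquad \mathbb{E}[\varepsilon_i \mid X_i, T_i] = 0

% Residual-on-residual estimate at x, with weights alpha_i(x)
% (in Causal Random Forests these are the forest weights)
\hat\tau(x) = \frac{\sum_i \alpha_i(x)\,\bigl(T_i - \hat e(X_i)\bigr)\bigl(Y_i - \hat m(X_i)\bigr)}
                   {\sum_i \alpha_i(x)\,\bigl(T_i - \hat e(X_i)\bigr)^2}
```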

The second part will be a discussion of applications, limitations, and extensions of the method. It offers participants an opportunity to reflect on when these methods are appropriate, how to validate them, and where the open questions lie.

The final part will be a hands-on session in Python using the EconML package. Using an example dataset, participants will fit a causal forest, predict individualised treatment effects, estimate their variances, and run a standard battery of diagnostics. This interactive session aims to give participants practical experience and a deeper, applied understanding of the concepts presented.
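The first diagnostic, checking overlap, can be sketched without EconML as follows. Everything here is an illustrative assumption and not part of the tutorial material: the simulated data, the logistic propensity model fitted by Newton-Raphson in plain NumPy (a stand-in for whatever classifier one would actually use), the trimming band [0.05, 0.95], and the helper names `fit_logistic` and `overlap_report`.

```python
import numpy as np

def fit_logistic(F, t, n_iter=25):
    """Logistic regression by Newton-Raphson (illustrative stand-in for a propensity model)."""
    beta = np.zeros(F.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-F @ beta))
        grad = F.T @ (t - p)                          # score of the log-likelihood
        hess = F.T @ (F * (p * (1.0 - p))[:, None])   # Fisher information matrix
        beta += np.linalg.solve(hess, grad)
    return beta

def overlap_report(e_hat, t, lo=0.05, hi=0.95):
    """Propensity quantiles by arm and the share of units outside the assumed band [lo, hi]."""
    qs = [0.0, 0.05, 0.5, 0.95, 1.0]
    return {
        "treated_quantiles": np.quantile(e_hat[t == 1], qs),
        "control_quantiles": np.quantile(e_hat[t == 0], qs),
        "share_outside": float(np.mean((e_hat < lo) | (e_hat > hi))),
    }

# Hypothetical data: treatment strongly driven by x1, so overlap is poor in the tails
rng = np.random.default_rng(0)
n = 5_000
X = rng.uniform(-2, 2, size=(n, 2))
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-2.5 * X[:, 1]))).astype(float)

F = np.column_stack([np.ones(n), X])   # intercept plus covariates
e_hat = 1.0 / (1.0 + np.exp(-F @ fit_logistic(F, t)))
report = overlap_report(e_hat, t)
print(report["share_outside"])          # a large share signals limited overlap
```

In this simulated example a sizeable fraction of units falls outside the band. In practice, that would prompt trimming, a change of estimand, or at least caution when reading CATEs in those regions.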

By participating in this tutorial, attendees will gain a comprehensive understanding of causal machine learning methods, the intuition and theory behind Causal Random Forests, and the practical skills to estimate and validate heterogeneous treatment effects in their own research.

Outline

Presentation

  • Motivation: why prediction is not causation, and where standard ML fails 
  • The potential outcomes framework (Rubin Causal Model) 
  • The three identification assumptions: SUTVA, unconfoundedness, overlap 
  • Heterogeneous treatment effects and the CATE as the central estimand 
  • The Robinson decomposition and residual-on-residual regression 
  • Standard random forests as adaptive kernel smoothers  
  • Causal Random Forests: honesty and the causal splitting criterion 
  • Generalized Random Forests (GRF): a unifying framework 
  • Confidence intervals and the Infinitesimal Jackknife variance estimator 
  • Interpretability and explainability: SHAP values and the Best Linear Projection (BLP) 
  • Empirical cases and applications 
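The link between the Robinson decomposition and the view of forests as adaptive kernel smoothers can be illustrated by estimating a CATE at a few points with a weighted residual-on-residual regression. The sketch makes two substitutions, and both should be read as assumptions. First, the nuisance functions are fitted by cross-fitted OLS on a hand-picked basis that happens to contain the true functions of this simulated example. Second, a fixed Gaussian kernel on the first covariate replaces the data-adaptive weights a causal forest would learn. The data-generating process and the helper names (`features`, `cross_fit_ols`, `local_rlearner_cate`) are hypothetical.

```python
import numpy as np

def features(X):
    """Hand-picked basis (assumption): intercept, x0, x1 and their interaction."""
    return np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])

def cross_fit_ols(F, target, n_folds=2, seed=0):
    """Out-of-fold OLS predictions: each row is predicted by a model fit on the other folds."""
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_folds, size=len(target))
    pred = np.empty(len(target))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        coef, *_ = np.linalg.lstsq(F[train], target[train], rcond=None)
        pred[test] = F[test] @ coef
    return pred

def local_rlearner_cate(x0, X, y_res, t_res, bandwidth=0.25):
    """Kernel-weighted residual-on-residual slope at x0 (kernel stands in for forest weights)."""
    w = np.exp(-0.5 * ((X[:, 0] - x0) / bandwidth) ** 2)
    return np.sum(w * t_res * y_res) / np.sum(w * t_res ** 2)

# Hypothetical observational data: tau(x) = 1 + x0, propensity e(x) = 0.5 + 0.15 * x1
rng = np.random.default_rng(42)
n = 20_000
X = rng.uniform(-2, 2, size=(n, 2))
e = 0.5 + 0.15 * X[:, 1]
T = rng.binomial(1, e).astype(float)
tau = 1.0 + X[:, 0]
Y = 0.5 * X[:, 0] - X[:, 1] + tau * T + rng.normal(size=n)

F = features(X)
y_res = Y - cross_fit_ols(F, Y)   # Y - m_hat(X)
t_res = T - cross_fit_ols(F, T)   # T - e_hat(X)

cate = {x: local_rlearner_cate(x, X, y_res, t_res) for x in (-1.0, 0.0, 1.0)}
print(cate)  # true values at these points are 0, 1 and 2
```

Roughly speaking, Causal Random Forests replace the fixed kernel with weights learned from the data. These weights reflect how often, and in how small a leaf, a training point shares a leaf with the target point across honest trees grown with a causal splitting criterion.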

Hands-On Session 

  • Environment setup (Python and the EconML package) 
  • Fitting a causal forest on an example dataset 
  • Predicting CATEs and estimating prediction variance 
  • Diagnostics I: checking overlap and the propensity score distribution 
  • Diagnostics II: calibration via the Best Linear Projection (BLP) 
  • Interpreting heterogeneity and variable importance 
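The calibration diagnostic can be sketched as a no-intercept regression of outcome residuals on two constructed regressors. These mirror the "mean forest prediction" and "differential forest prediction" terms used by the R package grf in `test_calibration`. The remaining ingredients are assumed for illustration: a simulated randomised experiment, OLS for the outcome model, and a deliberately under-dispersed, hypothetical vector `tau_hat` standing in for held-out causal forest predictions. Standard errors and the one-sided tests that grf reports are omitted.

```python
import numpy as np

def calibration_coefficients(y_res, t_res, tau_hat):
    """No-intercept OLS of Y - m_hat(X) on the mean and differential prediction terms."""
    tau_bar = tau_hat.mean()
    Z = np.column_stack([
        tau_bar * t_res,              # mean prediction term
        (tau_hat - tau_bar) * t_res,  # differential prediction term
    ])
    coef, *_ = np.linalg.lstsq(Z, y_res, rcond=None)
    return coef  # [mean_coef, differential_coef]; values near 1 suggest good calibration

# Hypothetical randomised experiment with true CATE tau(x) = 1 + x0
rng = np.random.default_rng(1)
n = 20_000
X = rng.uniform(-2, 2, size=(n, 2))
t = rng.binomial(1, 0.5, size=n).astype(float)
y = X[:, 1] + (1.0 + X[:, 0]) * t + rng.normal(size=n)

F = np.column_stack([np.ones(n), X])
m_coef, *_ = np.linalg.lstsq(F, y, rcond=None)
y_res = y - F @ m_coef          # Y - m_hat(X)
t_res = t - t.mean()            # T - e_hat, with e_hat the sample share treated

tau_hat = 1.0 + 0.5 * X[:, 0]   # hypothetical predictions: heterogeneity understated by half
coef = calibration_coefficients(y_res, t_res, tau_hat)
print(coef)  # a differential coefficient well above 1 flags understated heterogeneity
```

Here the true heterogeneity is twice the predicted heterogeneity, so the differential coefficient lands near 2 while the mean coefficient stays near 1.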

Target Audience

This tutorial is open to any conference participant who wants to learn about recent developments in causal machine learning and the estimation of heterogeneous treatment effects. While there are no strict prerequisites, we recommend that participants have a basic background in probability, regression, and applied statistics. For the hands-on session, participants only need a laptop with an internet connection.

Presenters

Eugeni Gil-Ocaña is a PhD candidate at the Universitat Politècnica de València (UPV), where he works as a predoctoral researcher at the eSMART Research Center. His research interests lie in applied microeconomics, with a specific focus on the causal analysis of public policy.

Ana García-Bernabeu is a Professor of Economics at the Universitat Politècnica de València (UPV) and a researcher at the eSMART Research Center. Her research focuses on multicriteria decision analysis, sustainability assessment, and the application of artificial intelligence and causal machine learning to public policy evaluation.