This is really a collection of resources for myself. It covers applications of machine learning methods to quantitative problems in economics/econometrics. It doesn’t cover the economics of machine learning or artificial intelligence.
It’s broken down by resource type:
I’ll try and keep it updated — let me know if any of the links get broken or you think there’s something I should add!
The NBER summer institutes in econometrics from 2013 and 2015 are heavily ML-related. Both include full videos of the lectures and downloadable slides. The 2013 series focuses on high-dimensional data, including text-as-data. The 2015 series focuses on more generic ML methods and applications to causal problems.
The Becker Friedman Institute at UChicago held an event “Machine Learning: What’s in it for Economics?” in 2016. It covers a relatively broad set of topics including some network analysis and choice modelling. The slides for the sessions are available here, and videos of the lectures are available here.
This is a course run by Victor Chernozhukov at various institutions. I went to a version at cemmap. I never figured out how to get updates on events from cemmap, but there is still an old course page online. I’m not sure whether it will be run again any time soon. If it is, it covers some basics of machine learning methods and then applies them to typical treatment effect estimation problems.
Susan Athey hosts a public Google Drive folder with a range of materials for different talks, and tutorials in R for some treatment effect methods. The ate_tutorial.html and hte_tutorial.html are particularly useful.
Victor Chernozhukov hosts a public Dropbox containing a full set of course materials. It includes lecture notes, labs, and code. This is the background material that’s used for the course discussed above.
If you’re not familiar with this as a topic area, these are probably the best place to start. This is a series of Journal of Economic Perspectives papers. They’re all very readable and cover a good introduction to the basics of machine learning and how it can be applied to quantitative economic problems.
I’d probably start with the first and the last; the middle two are a bit more technical.
You’re almost certainly going to need to look outside Stata to implement these methods in real-world problems, though the latest version of Stata does offer tools for high-dimensional inference using LASSO.
Below are some R and Python packages I’ve found useful.
Microsoft makes the ALICE/EconML package for Python. It uses a consistent API to estimate treatment effects in a variety of settings, using base learners from scikit-learn.
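To give a flavour of what estimators of this kind do with scikit-learn base learners, here is a minimal sketch of cross-fitted partialling-out (double/debiased ML for a partially linear model). It is written directly against scikit-learn and is not EconML’s own API. The function name dml_partial_linear, the random forest settings, and the simulated data are all assumptions made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold


def dml_partial_linear(y, t, X, n_splits=5, seed=0):
    """Cross-fitted partialling-out estimate of a constant treatment effect.

    Hypothetical helper for illustration; not part of any library.
    """
    y_res = np.zeros(len(y))
    t_res = np.zeros(len(t))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(X):
        # Fit the nuisance models E[y|X] and E[t|X] on the training folds only
        model_y = RandomForestRegressor(
            n_estimators=100, min_samples_leaf=5, random_state=seed
        )
        model_t = RandomForestRegressor(
            n_estimators=100, min_samples_leaf=5, random_state=seed
        )
        model_y.fit(X[train], y[train])
        model_t.fit(X[train], t[train])
        # Residualise the held-out fold with those out-of-fold predictions
        y_res[test] = y[test] - model_y.predict(X[test])
        t_res[test] = t[test] - model_t.predict(X[test])
    # Final stage: regress outcome residuals on treatment residuals
    theta = float(t_res @ y_res / (t_res @ t_res))
    # Heteroskedasticity-robust standard error for that residual regression
    eps = y_res - theta * t_res
    se = float(np.sqrt(np.sum((t_res * eps) ** 2)) / (t_res @ t_res))
    return theta, se


# Simulated data (an assumption for the example): X[:, 0] confounds t and y
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))
t = 0.5 * X[:, 0] + rng.normal(size=n)
y = 1.0 * t + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)  # true effect is 1.0

theta, se = dml_partial_linear(y, t, X)
print(f"estimate: {theta:.3f} (se {se:.3f})")
```

In this simulated setup a naive regression of y on t alone would be biased upwards, because X[:, 0] drives both the treatment and the outcome. Residualising both on X first is what removes that confounding, and the cross-fitting keeps the flexible nuisance models from overfitting the observations they are used to residualise.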
Uber makes the CausalML package for Python. It focuses almost exclusively on heterogeneous treatment effect estimation problems, though it also provides interpretability tools for those problems.
There’s also an Uber engineering blogpost on similar topics.
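For anyone new to heterogeneous treatment effects, the simplest version of the idea is the so-called T-learner: fit one outcome model on treated units, another on control units, and take the difference in predictions as the estimated effect for a given set of covariates. The sketch below uses plain least squares in numpy as the base learner. It is not CausalML’s API, and the helper names and simulated data are assumptions for illustration only.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ 1 + X."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def t_learner_cate(X, t, y, X_new):
    """T-learner: separate outcome models for treated and control units."""
    coef_treated = fit_ols(X[t == 1], y[t == 1])
    coef_control = fit_ols(X[t == 0], y[t == 0])
    # Estimated conditional effect = difference in predicted outcomes
    return predict_ols(coef_treated, X_new) - predict_ols(coef_control, X_new)


# Simulated data (an assumption for the example)
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 2))
t = rng.integers(0, 2, size=n)   # randomised binary treatment
tau = 1.0 + 2.0 * X[:, 0]        # true effect varies with the first covariate
y = X[:, 1] + tau * t + rng.normal(size=n)

# Evaluate at three covariate profiles; the true effects there are -1, 1 and 3
X_new = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
cate = t_learner_cate(X, t, y, X_new)
print(np.round(cate, 2))
```

Swapping the least-squares fits for a more flexible learner gives the same estimator with ML base learners, which is the kind of setup these packages are built around.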
GRF-labs is a collection of researchers at Stanford. They made the R packages grf (or “generalised random forests”), and now policytree. These are packages implementing various forest-based methods, mostly focusing on heterogeneous treatment effect estimation and optimal policy choices.
These packages are extremely high-quality relative to most academic releases. (That is not intended as a slight on anyone! It just isn’t an academic’s job to also be a professional developer.)
This is an R package implementing various high-dimensional methods related to LASSO estimation. The CRAN mirror of it is here. Personally, I found this quite buggy, to the point that I abandoned using it.