We will cover techniques in modern data analysis: estimation, regression and econometrics, prediction, experimental design, randomized controlled trials (and A/B testing), machine learning, and data visualization. We will illustrate these concepts with applications drawn from real-world examples and frontier research. Finally, we will provide instruction in the statistical package R and opportunities for students to perform self-directed empirical analyses.
This course is designed for anyone who wants to learn how to work with data and communicate data-driven findings effectively.
What you'll learn:
- Intuition behind probability and statistical analysis
- How to summarize and describe data
- A basic understanding of various methods of evaluating social programs
- How to present results in a compelling and truthful way
- Skills and tools for using R for data analysis
- MODULE 0: THE BASICS OF R
Introduction to the software R, with exercises. Suggested web resources for learning more.
- MODULE 1: INTRODUCTION
Introduction to the power of data and data analysis, and an overview of what will be covered in the course.
- MODULE 2: FUNDAMENTALS OF PROBABILITY, RANDOM VARIABLES, DISTRIBUTIONS, AND JOINT DISTRIBUTIONS
Basics of probability and introduction to random variables.
Discussion of distributions and joint distributions.
- MODULE 3: GATHERING AND COLLECTING DATA, ETHICS, AND KERNEL DENSITY ESTIMATES
Introduction to collecting data through surveys, web scraping, and other data collection methods.
Principles and practical steps for protection of human subjects in research.
Discussion of kernel density estimates.
- MODULE 4: JOINT, MARGINAL, AND CONDITIONAL DISTRIBUTIONS & FUNCTIONS OF RANDOM VARIABLES
Builds on the basics from Module 2 to cover joint, marginal, and conditional distributions.
Similarly builds on the basics from Module 2 to cover functions of random variables.
- MODULE 5: MOMENTS OF A RANDOM VARIABLE, APPLICATIONS TO AUCTIONS, & INTRO TO REGRESSION
Discussion of moments of a distribution, expectation, and variance.
Application of some principles of probability to the analysis of auctions.
Basics of regression analysis.
- MODULE 6: SPECIAL DISTRIBUTIONS, THE SAMPLE MEAN, CENTRAL LIMIT THEOREM, AND ESTIMATION
Discussion of properties of special distributions with several examples.
Statistics: Introduction to the sample mean, central limit theorem, and estimation.
- MODULE 7: ASSESSING AND DERIVING ESTIMATORS, CONFIDENCE INTERVALS, AND HYPOTHESIS TESTING
Deriving and assessing estimators.
Constructing and interpreting confidence intervals.
Introduction to hypothesis testing.
- MODULE 8: CAUSALITY, ANALYZING RANDOMIZED EXPERIMENTS, & NONPARAMETRIC REGRESSION
Understanding randomization in the context of experimentation.
Introduction to nonparametric regression techniques.
- MODULE 9: SINGLE AND MULTIVARIATE LINEAR MODELS
In-depth discussion of the linear model and the multivariate linear model.
- MODULE 10: PRACTICAL ISSUES IN RUNNING REGRESSIONS, AND OMITTED VARIABLE BIAS
Covariates, fixed effects, and other functional forms.
Introduction to regression discontinuity design.
- MODULE 11: INTRO TO MACHINE LEARNING AND DATA VISUALIZATION
Introduction to the use of machine learning for prediction. Covers tuning and training.
Principles of data visualization with examples of well-crafted visual presentations of data.
- MODULE 12: ENDOGENEITY, INSTRUMENTAL VARIABLES, AND EXPERIMENTAL DESIGN
Understanding the problem of endogeneity. Introduction to instrumental variables and two-stage least squares, with a discussion of how to assess the validity of an instrument.
Discussion of how to design an effective experiment, followed by an example from Indonesia.