We will cover techniques in modern data analysis: estimation, regression and econometrics, prediction, experimental design, randomized controlled trials (and A/B testing), machine learning, and data visualization. We will illustrate these concepts with applications drawn from real-world examples and frontier research. Finally, we will provide instruction in how to use the statistical package R, along with opportunities for students to perform self-directed empirical analyses.

This course is designed for anyone who wants to learn how to work with data and communicate data-driven findings effectively.

**What you'll learn:**

- Intuition behind probability and statistical analysis

- How to summarize and describe data

- A basic understanding of various methods for evaluating social programs

- How to present results in a compelling and truthful way

- Skills and tools for data analysis in R

### Course Syllabus

- MODULE 0: THE BASICS OF R

Introduction to the software R, with exercises and suggested online resources for learning more.

- MODULE 1: INTRODUCTION

Introduction to the power of data and data analysis, and an overview of what the course covers.

- MODULE 2: FUNDAMENTALS OF PROBABILITY, RANDOM VARIABLES, DISTRIBUTIONS AND JOINT DISTRIBUTIONS

Basics of probability and introduction to random variables.

Discussion of distributions and joint distributions.

- MODULE 3: GATHERING AND COLLECTING DATA, ETHICS, AND KERNEL DENSITY ESTIMATES

Introduction to collecting data through surveys, web scraping, and other data collection methods.

Principles and practical steps for protecting human subjects in research.

Discussion of kernel density estimates.

- MODULE 4: JOINT, MARGINAL, AND CONDITIONAL DISTRIBUTIONS & FUNCTIONS OF RANDOM VARIABLES

Builds on the basics from Module 2 to cover joint, marginal, and conditional distributions.

Also builds on the basics from Module 2 to cover functions of random variables.

- MODULE 5: MOMENTS OF A RANDOM VARIABLE, APPLICATIONS TO AUCTIONS, & INTRO TO REGRESSION

Discussion of the moments of a distribution, including expectation and variance.

Application of principles of probability to the analysis of auctions.

Basics of regression analysis.

- MODULE 6: SPECIAL DISTRIBUTIONS, THE SAMPLE MEAN, CENTRAL LIMIT THEOREM, AND ESTIMATION

Discussion of the properties of special distributions, with several examples.

Statistics: introduction to the sample mean, the central limit theorem, and estimation.

- MODULE 7: ASSESSING AND DERIVING ESTIMATORS, CONFIDENCE INTERVALS, AND HYPOTHESIS TESTING

Deriving and assessing estimators.

Constructing and interpreting confidence intervals.

Introduction to hypothesis testing.

- MODULE 8: CAUSALITY, ANALYZING RANDOMIZED EXPERIMENTS, & NONPARAMETRIC REGRESSION

Understanding randomization in the context of experimentation.

Introduction to nonparametric regression techniques.

- MODULE 9: SINGLE AND MULTIVARIATE LINEAR MODELS

In-depth discussion of the linear model and the multivariate linear model.

- MODULE 10: PRACTICAL ISSUES IN RUNNING REGRESSIONS AND OMITTED VARIABLE BIAS

Covariates, fixed effects, and alternative functional forms.

Introduction to regression discontinuity design.

- MODULE 11: INTRO TO MACHINE LEARNING AND DATA VISUALIZATION

Introduction to the use of machine learning for prediction, including model training and tuning.

Principles of data visualization with examples of well-crafted visual presentations of data.

- MODULE 12: ENDOGENEITY, INSTRUMENTAL VARIABLES, AND EXPERIMENTAL DESIGN

Understanding the problem of endogeneity. Introduction to instrumental variables and two-stage least squares, with a discussion of how to assess the validity of an instrument.

Discussion of how to design an effective experiment, followed by an example from Indonesia.