American University: Statistics 618/SPA 696 (Every FALL): Bayesian Statistics for Social and Biomedical Sciences





Course Description
Principles and applications of modern statistical decision theory, with a special focus on Bayesian modeling, data analysis, inference, and optimal decision making. Prior and posterior; comparison of Bayesian and frequentist approaches, including minimax decision making and elementary game theory. Bayesian estimation, hypothesis testing, credible sets, and Bayesian prediction. Introduction to Bayesian computing software and applications to diverse fields. Grading: A-F only. Prerequisite: STAT-514 or permission of instructor. 

Time and Location: Wednesday, 11:20 AM-2:10 PM, Kerwin Hall Room 201.

Learning Outcomes: By the end of this course, students will be able to:
1. Demonstrate a basic understanding of Bayesian model specification, Bayesian posterior inference, and model assessment and comparison.

2. Use this understanding of Bayesian statistics to specify and estimate Bayesian multilevel (hierarchical) models with linear and nonlinear outcomes, treat missing data in a principled manner using multiple imputation, gain facility in the R and BUGS statistical languages, carry out appropriate sample size and power calculations for Bayesian models, gain exposure to Bayesian approaches including MCMC computation, and assess model reliability and fit in complex models.

3. Apply this understanding of Bayesian statistics to data in the social and biomedical sciences.

4. Convey analytical results from these models to both lay and technical audiences clearly in both writing and speech.

Prerequisite Details: This course assumes a knowledge of basic statistics as taught in a first-year undergraduate or graduate sequence. Topics should include: probability, cross-tabulation, basic statistical summaries, and linear regression in either scalar or matrix form. Knowledge of R, basic matrix algebra, and calculus is helpful.

Course Requirements and Expectations: The final grade will be based on two components: weekly attendance and participation (20%) and exercises (80%). Graduate students will have one additional component of their exercise grade that constitutes 30 points out of the 80 points total: submission of an analysis of real research using a multilevel model applied to data in their field, along with 5-10 pages of discussion that includes a description of the data, model diagnostics, and the subsequent findings. Consider this assignment to be the start of a research manuscript to be eventually submitted to an academic journal. Graduate students will still submit all exercises assigned below in addition to this work.

Office Hours: By appointment.

Incompletes: Due to the scheduled nature of the course, no incompletes will be given.

Teaching Assistant: Kumail Wasif, location by arrangement. Office Hours: Wednesday 9-11.

Required Reading: Gelman and Hill, "Data Analysis Using Regression and Multilevel/Hierarchical Models" (Cambridge University Press, 2007). Some papers will be made available or distributed by the instructor. Readings should be completed before class. A NYT story about radon.

Statement Regarding Student Resources: see the following link

The Academic Integrity Code:

Datasets: The data are either provided in the links below or available on Gelman's webpage for the book.

Topics (subject to minor change):
August 28: No class meeting (conference commitment). Reading: R For Beginners (to make sure you are fluent in the basics).

September 4: Introducing Bayesian Inference. Reading: Gelman & Hill, Chapters 1 and 2, MLE Review, Intro code from the lecture, Bayesian mechanics slides. Exercises: Gelman & Hill 2.2, 2.3, 2.4.

September 11: Linear Model Theory Review. Reading: Gelman & Hill, Chapters 3 and 4, Chapter 3-4 code from the lecture, Binomial PMF likelihood grid search, lecture slides (do not print!). Anaemia data. Tweed data. clx.R. Exercises: Gelman & Hill 3.4, 4.4, 5.4, 6.1.

September 18: Multilevel Structures and Multilevel Linear Models: the Basics. Reading: Gelman & Hill, Chapters 11 and 12, Introductory Chapter (Gill and Womack, from the SAGE Handbook of Multilevel Modeling). Lecture slides and Chapter 11-12 code. Radon data. Uranium data. Smoking data. Exercises: Gelman & Hill 11.4, 12.2, 12.5.

September 25: Multilevel Linear Models: Varying Slopes, Non-Nested Models and Other Complexities. Reading: Gelman & Hill, Chapter 13, Lecture slides, Chapter 13 code from the lecture. Exercises: Gelman & Hill 13.2, 13.4, 13.5.

October 2: Multilevel Logistic Regression, Multilevel Generalized Linear Models. Reading: Gelman & Hill, Chapter 14 (skip Section 14.3), Chapter 15, Lecture slides, Chapter 14 code from the lecture. Exercises: Gelman & Hill 14.5, 14.6, 15.1, 15.2. Speed Dating Data, NES Data (remove the .txt extension, load with the foreign library), polls.dta file (remove the .txt extension, load with the foreign library).

October 9: Multilevel Modeling in Bugs and R: the Basics, MCMC Theory. Part 1. Reading: Bayesian Estimation Case Study (Gill and Witko 2012), R to JAGS code for the model (get data from here), Numerical methods slides. Exercise: Replicate the model in Gill and Witko (2012).

October 16: Multilevel Modeling in Bugs and R: the Basics, MCMC Theory. Part 2. Reading: Gelman & Hill Chapter 16, Chapter 16 code from the lecture. Lecture slides. Exercises: Gelman & Hill 16.1, 16.2, 16.3 (due November 7).

October 23: Causal Inference. Guest lecture by Dr. Ryan Moore. Reading: Gelman & Hill, Chapter 9. Exercises: Gelman & Hill 9.4.

October 30: Fitting Multilevel Linear and Generalized Linear Models in Bugs and R, MCMC Coding. Reading: Gelman & Hill, Chapter 17, Chapter 17 code from the lecture. Exercises: Rerun Gelman & Hill 16.3 using the instructions in 17.2, 17.3, 17.5.

November 6: Likelihood and Bayesian Inference, Computation, MCMC Diagnostics and Customization. Reading: Gelman & Hill, Chapter 18. Chapter 18 code from the lecture. Exercises: Gelman & Hill 18.1, 18.2, 18.4.

November 13: Treatment of Missing Data. Reading: Gelman & Hill, Chapter 25, Paper by van Buuren and Groothuis-Oudshoorn, Chapter 25 code from the lecture. Exercises: missing data problem set (use this dataset).

November 20: Thanksgiving Holiday. No homework.

November 27: Understanding and Summarizing the Fitted Models, Multilevel Analysis of Variance. Reading: Gelman & Hill, Chapters 21 and 22, Chapter 21 code from the lecture, Chapter 22 code from the lecture. Exercises: Gelman & Hill 21.1, 21.3, 21.4, 22.1.

December 5: Model Checking and Comparison. Reading: Gelman & Hill, Chapter 24. Chapter 24 code from the lecture. Exercises: none. All remaining homework due this day. Data analysis project due Friday, December 7.