Principles and applications of modern statistical decision theory, with a special focus on Bayesian modeling, data analysis, inference, and optimal decision making. Prior and posterior; comparison of Bayesian and frequentist approaches, including minimax decision making and elementary game theory. Bayesian estimation, hypothesis testing, credible sets, and Bayesian prediction. Introduction to Bayesian computing software and applications to diverse fields. Grading: A-F only. Prerequisite: STAT-514 or permission of instructor.
Location: Wednesday, 11:20 AM-2:10 PM, Kerwin Hall Room 207. Some sessions may be held online; details to be determined.
Learning Outcomes: By the end of this course, students will be able to:
1. Demonstrate a basic understanding of Bayesian model specification, Bayesian posterior inference, and model assessment and comparison.
2. Use this understanding of Bayesian statistics to specify and estimate Bayesian multilevel (hierarchical) models with linear and nonlinear outcomes; treat missing data in a principled and correct manner using multiple imputation; gain facility in the R and BUGS statistical languages; compute appropriate sample sizes and power for Bayesian models; gain exposure to Bayesian approaches including MCMC computation; and assess model reliability and fit in complex models.
3. Apply this understanding of Bayesian statistics to data in the social and biomedical sciences.
4. Convey analytical results from these models to both lay and technical audiences clearly in both writing and speech.
Prerequisite Details: This course assumes a knowledge of basic statistics as taught in a first-year undergraduate or graduate sequence. Topics should include: probability, cross-tabulation, basic statistical summaries, and linear regression in either scalar or matrix form. Knowledge of R is essential; knowledge of basic matrix algebra and calculus is helpful.
Course Requirements and Expectations: The final grade will be based on two components: weekly attendance and participation (20%) and exercises (80%). Graduate students will have one additional component of their exercise grade that constitutes 30 points out of the 80 points total: submission of an analysis of real research using a multilevel model applied to data in their field, along with 5-10 pages of discussion that includes a description of the data, model diagnostics, and the subsequent findings. Consider this assignment to be the start of a research manuscript to be eventually submitted to an academic journal. Graduate students will still submit all exercises assigned below in addition to this work. Some guidelines are here.
Office Hours: Zoom meeting by appointment.
Incompletes: Due to the scheduled nature of the course, no incompletes will be given.
Teaching Assistant: Kumail Wasif, Zoom hours TBD.
Required Reading: Gelman and Hill, "Data Analysis Using Regression and Multilevel/Hierarchical Models" (Cambridge University Press, 2007). Some papers will be available at jstor.org or distributed by the instructor. Readings should be completed before class. A NYT story about radon.
Statement Regarding Student Resources: see the following link https://edspace.american.edu/ctrl/classroomsupport/.
The Academic Integrity Code: http://www.american.edu/academics/integrity/code.cfm.
Emergency Preparedness. In the event of an emergency, students should refer to the AU Web site (http://www.american.edu/emergency) and the AU information line at (202) 885-1100 for general university-wide information. In case of a prolonged closure of the University, I will send updates to you by email and will post all announcements on the course web site.
Support Services. A wide range of services is available to support you in your efforts to meet the course requirements. Mathematics & Statistics Tutoring Lab (x3154, x3120, Don Myers Building, Room 103) provides tutoring in Mathematics and Statistics. Lab hours are Mo-Th 11 am – 8 pm, Fr 11 am – 3 pm, and Su 3 pm – 8 pm. http://www.american.edu/cas/mathstat/tutoring.cfm. Academic Support and Access Center (x3360, MGC 243) offers study skills workshops, individual instruction, tutor referrals, Supplemental Instruction, writing support, and technical and practical support and assistance with accommodations for students with physical, medical, or psychological disabilities. Writing support is also available in the Writing Center, Battelle-Tompkins 228. CTRL Connect – software support with R (firstname.lastname@example.org, x2117). Counseling Center (x3500, MGC 214) offers counseling and consultations regarding personal concerns, self-help information, and connections to off-campus mental health resources.
Datasets: the data are either provided in links below or are available at Gelman's webpage for the book.
Topics (subject to minor change):
August 26: Introducing Bayesian Inference. Reading: R For Beginners (to make sure you are fluent in the basics), Gelman & Hill, Chapters 1 and 2, MLE Review, Intro code from the lecture, Bayesian mechanics slides. Exercises: Gelman & Hill 2.2, 2.3, 2.4.
September 2: Linear Model Theory Review. Reading: Gelman & Hill, Chapters 3 and 4, Chapter 3-4 code from the lecture, Binomial PMF likelihood grid search, lecture slides (do not print!). Anaemia data. Tweed data. clx.R. Exercises: Gelman & Hill 3.4, 4.4, 5.4, 6.1.
September 9: Multilevel Structures and Multilevel Linear Models: the Basics. Reading: Gelman & Hill, Chapters 11 and 12, Introductory Chapter (Gill and Womack, from the SAGE Handbook of Multilevel Modeling). Lecture slides and chapter 11-12 code. Radon data. Uranium data. Smoking data. Exercises: Gelman & Hill 11.4, 12.2, 12.5.
September 16: Multilevel Linear Models: Varying Slopes, Non-Nested Models and Other Complexities. Reading: Gelman & Hill, Chapter 13, Lecture slides, Chapter 13 code from the lecture. Exercises: Gelman & Hill 13.2, 13.4, 13.5.
September 23: Multilevel Logistic Regression, Multilevel Generalized Linear Models. Reading: Gelman & Hill, Chapter 14 (skip Section 14.3), Chapter 15, Lecture slides, Chapter 14 code from the lecture. Exercises: Gelman & Hill 14.5, 14.6, 15.1, 15.2. Speed Dating Data, NES Data (remove the .txt extension, then load with the R foreign package), polls.dta file (remove the .txt extension, then load with the R foreign package), cheney.asia.sub.txt, police_stops_data.txt.
September 30: Multilevel Modeling in Bugs and R: the Basics, MCMC Theory. Part 1. Reading: Bayesian Estimation Case Study (Gill and Witko 2012), R to JAGS code for the model (get data from here), Lecture slides. Exercise: Replicate the model in Gill and Witko (2012).
October 7: Causal Inference. Guest lecture by Dr. Ryan Moore. Reading: Gelman & Hill, Chapters 9 and 10. Exercises: Gelman & Hill 9.4.
October 14: Multilevel Modeling in Bugs and R: the Basics, MCMC Theory. Part 2. Reading: Gelman & Hill Chapter 16, Chapter 16 code from the lecture. Lecture slides. Exercises: Gelman & Hill 16.1, 16.2, 16.3.
October 21: Fitting Multilevel Linear and Generalized Linear Models in Bugs and R, MCMC Coding. Reading: Gelman & Hill, Chapter 16, Chapter 17 code from the lecture. Exercises: Gelman & Hill Rerun 16.3 with instructions from 17.2 & 17.3, AND do 17.5 using the age guessing data.
October 28: Understanding and Summarizing the Fitted Models, Multilevel Analysis of Variance. Reading: Gelman & Hill, Chapter 21 slides, Chapter 22 slides, Chapter 21 code from the lecture, Chapter 22 code from the lecture. CD4 data. Caesarian data. Bypass data. Depression data. Exercises: Gelman & Hill 21.1, 21.3, 21.4, 22.1.
November 11: Treatment of Missing Data. Reading: Gelman & Hill, Chapter 25, Paper by van Buuren and Groothuis-Oudshoorn. Lecture slides. Chapter 25 code from the lecture. Exercises: missing data problem set (use this dataset).
November 25: Online Wrap Up and Discussion.