This simple (somewhat Pythonesque) example uses data from a 1854 study on mental health in the fourteen counties of Massachusettts conducted by Edward Jarvis, who was then president of the American Statistical Association and discussed in Hunter (Hunter, J.M. 1987. ``Need and Demand for Mental Health Care: Massachussetts 1854.'' \emph{The Geographic Review} 77, No. 2 (April), 139-156. In the vernacular of the time, Dr. Jarvis investigated the number of lunatics per county and the tendency to care for them at home (without using a linear model). The explanatory variables are: number of lunatics per county (NBR), distance to the nearest mental healthcare center (DIST), population in the county by thousands (POP), population per square county mile (PDEN). The outcome variable is the percent of lunatics cared for in the home (PHOME).

Three variables can be transformed in order to improve linearity of the relationship and the subsequent distribution of the residuals. Can you see how?


The Data:
COUNTY NBR DIST POP PDEN PHOME
BERKSHIRE 119 97 26.656 56 77
FRANKLIN 84 62 22.260 45 81
HAMPSHIRE 94 54 23.312 72 75
HAMPDEN 105 52 18.900 94 69
WORCESTER 351 20 82.836 98 64
MIDDLESEX 357 14 66.759 231 47
ESSEX 377 10 95.004 3252 47
SUFFOLK 458 4 123.202 3042 6
NORFOLK 241 14 62.901 235 49
BRISTOL 158 14 29.704 151 60
PLYMOUTH 139 16 32.526 91 68
BARNSTABLE 78 44 16.692 93 76
NANTUCKET 12 77 1.740 179 25
DUKES 19 52 7.524 46 79