# A case study

Mar 13 2019

## Can you use residual plots to look for correlated errors?

Correlated errors might show up in the residual plots we’ve already talked about.

There are also some special-purpose plots for this.
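As a minimal sketch of such special-purpose plots, the example below simulates a regression whose errors follow an AR(1) process, then plots the residuals in observation order and each residual against the next one. The simulated data, the AR coefficient of 0.8, and the names `fit`, `r`, and `lag1_cor` are all assumptions made for illustration; none of this comes from the case study data.

```r
# Simulated data with serially correlated errors (illustrative assumption only)
set.seed(1)
n <- 100
x <- seq_len(n)
e <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))  # AR(1) errors
y <- 1 + 0.5 * x + e

fit <- lm(y ~ x)
r <- residuals(fit)

# Plot 1: residuals in observation order; long runs above or below zero
# suggest positive correlation between neighboring errors
plot(r, type = "b", xlab = "Observation order", ylab = "Residual")
abline(h = 0, lty = 2)

# Plot 2: each residual against the next one; a visible trend suggests
# correlated errors
plot(head(r, -1), tail(r, -1),
     xlab = "Residual i", ylab = "Residual i + 1")

# Numerical companion to plot 2: the lag-1 correlation of the residuals
lag1_cor <- cor(head(r, -1), tail(r, -1))
lag1_cor
```

These plots only make sense when the observations have a natural ordering, such as time. For data without one, the idea needs adapting, for example by ordering on a spatial variable.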

… illustrates several of the ambiguities and difficulties encountered in statistical practice. – Faraway

Indeed, it has been said that democracy is the worst form of Government except for all those other forms that have been tried from time to time. – Winston Churchill

We might say the same about statistics with respect to how it helps us reason in the face of uncertainty. It is not entirely satisfying but the alternatives are worse. – Faraway

## Background

Insurance redlining refers to the practice of refusing to issue insurance to certain types of people or within some geographic area.

It seems reasonable to refuse insurance to someone based on their past behavior (e.g. drunk driving), but it would be wrong to refuse someone based on their race.

It is the 1970s. Chicago neighborhoods are being redlined. Residents claim racial discrimination. The insurance companies claim it’s based on historical losses. (This is a simplification.)

## Available data

We don’t actually have data on insurance refusals for individuals.

We do have the number of applications for FAIR plans (a city plan for people who get refused insurance) at the zip code level.

```r
data(chredlin, package = "faraway")
head(chredlin)
##       race fire theft  age involact income side
## 60626 10.0  6.2    29 60.4      0.0 11.744    n
## 60640 22.2  9.5    44 76.5      0.1  9.323    n
## 60613 19.6 10.5    36 73.5      1.2  9.948    n
## 60657 17.3  7.7    37 66.9      0.5 10.656    n
## 60614 24.5  8.6    53 81.4      0.7  9.730    n
## 60610 54.0 34.1    68 52.6      0.3  8.231    n
```

## Getting started

• Do some basic exploration of the data.

• Verify the observation: “Zips with higher proportions of the minority have higher rates of FAIR plan application”

• What would the insurance company argue? How could we address their argument?
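One possible way to start on these three tasks is sketched below. It assumes the `faraway` package is installed, and it assumes (from the package documentation, not from these notes) that `race` is the percent minority in a zip code, `involact` is the FAIR plan activity per 100 housing units, and `fire`, `theft`, `age`, and `income` are risk-related variables the insurance company might point to. The model names `fit_race` and `fit_adj`, and the choice to log-transform `income`, are illustrative choices rather than a prescribed analysis.

```r
data(chredlin, package = "faraway")

# 1. Basic exploration: numerical summaries and pairwise scatterplots
summary(chredlin)
pairs(chredlin[, c("race", "fire", "theft", "age", "income", "involact")])

# 2. Check the observation: does FAIR plan activity rise with percent minority?
plot(involact ~ race, data = chredlin,
     xlab = "race (assumed: percent minority)",
     ylab = "involact (assumed: FAIR plan activity)")
fit_race <- lm(involact ~ race, data = chredlin)
abline(fit_race)
coef(summary(fit_race))

# 3. The insurance company might argue that race is standing in for risk.
#    One way to probe this: adjust for the risk-related variables and see
#    whether an association with race remains.
fit_adj <- lm(involact ~ race + fire + theft + age + log(income),
              data = chredlin)
coef(summary(fit_adj))
```

Comparing the `race` coefficient in `fit_race` and `fit_adj` gives a first look at how much of the association survives adjustment. Because the data are aggregated by zip code, neither model says anything directly about how individual applicants were treated.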