Multiple Regression and Model Selection

# Multiple Regression and Model Selection
## <br><br> Introduction to Global Health Data Science
### <a href="https://sta198f2021.github.io/website/">Back to Website</a>
### <br> Prof. Amy Herring

---

layout: true
  
<div class="my-footer">
<span>
<a href="https://sta198f2021.github.io/website/" target="_blank">Back to website</a>
</span>
</div>

---

## Read in Data and Restrict to One Person Per Household

```r
library(tidyverse)
library(tidymodels)
library(readr)
library(patchwork)
library(plotly)
library(widgetframe)
mercury <- readr::read_csv("mercury_reg.csv")
mercury <-
  mercury %>%
  # scale() subtracts the mean and divides by the SD to make the units "standard deviations" like a z-score
  mutate(assets_sc=scale(SESassets)) %>%
    #another variable we may use later
  mutate(form_min_sc=scale(FM_buffer)) %>%
  #so I don't have to remember coding
  mutate(sex,sex_cat=ifelse(sex==1,"Male","Female")) %>%
  mutate(native,native_cat=ifelse(native==1,"Native","Non-native")) 
# limit to one observation per household (household ID=1)
mercury1 <-
  mercury %>%
  filter(withinid==1)
```

---

# The linear model with multiple predictors

---

## Hair mercury vs assets and native status

.pull-left[
Linear regression model with assets (standardized to mean 0 and sd 1) and community native status as predictors and log(hair Hg) as the response.

```r
linear_reg() %>%
  set_engine("lm") %>%
  fit(lhairHg ~ assets_sc + native_cat,
      data = mercury1) %>%
  tidy()
```

```
#> # A tibble: 3 × 5
#>   term                 estimate std.error statistic  p.value
#>   <chr>                   <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)            1.00      0.0994     10.1  2.65e-22
#> 2 assets_sc             -0.0945    0.0474     -1.99 4.66e- 2
#> 3 native_catNon-native  -0.957     0.115      -8.32 4.58e-16
```

]
.pull-right[
<img src="w9-l02-multiple-reg_files/figure-html/unnamed-chunk-3-1.png" width="100%" style="display: block; margin: auto 0 auto auto;" />

]

---

## Interpretation of estimates

Model: `$lhairHg_i=\beta_0+\beta_1 assets\_sc_i + \beta_2 native\_cat_i + \varepsilon_i$`, where the `$\varepsilon_i$` have mean 0 (and are typically assumed to follow a normal distribution)

`$$\hat{\beta_0}=1.00, ~~ \hat{\beta_1}=-0.0945, ~~\hat{\beta_2}=-0.957$$`

We can predict log hair mercury as:
`$$\widehat{lhairHg}_i=\hat{\beta_0}+\hat{\beta_1} assets\_sc_i + \hat{\beta_2} native\_cat_i$$`
so
`$\widehat{lhairHg}_i=1.00 - 0.0945 \times assets\_sc_i - 0.957 \times native\_cat_i$`

---
## Interpretation of estimates

--
- **Slope - assets:** *All else held constant*, for each additional standard deviation that assets are larger, we would expect the log hair mercury to be lower, on average, by 0.09 log ppm. Alternatively, *all else held constant*, for each additional standard deviation that assets are larger, we would expect hair mercury to be lower by a factor of `$e^{-0.0945}=0.91$`.

--
- **Slope - non-native:** *All else held constant*, those living in non-native communities have hair mercury levels that are on average 0.957 log ppm lower than those living in native communities. Alternatively,  *all else held constant*, those living in non-native communities are expected to have hair mercury that is lower by a factor of `$e^{-0.957}=0.38$`.

--
- **Intercept:** Individuals with the mean level of assets (remember this is standardized!) living in native communities are expected to have hair mercury levels of 1 log ppm, on average.

---

## Interpretation of estimates

The column labeled "statistic" contains the t-statistic calculated by subtracting the hypothesized value (0 by default) from the parameter estimate, and then dividing by the appropriate standard error (so the estimate column entries divided by the std.error column entries). These are t-statistics because we had to estimate the variance (you'll learn more about how that is done if you take STA 210!).

Each of the p-values here reflects the result of a t-test of `$H_0: \beta_k=0$` for the `$k^{th}$` regression coefficient, versus the alternative `$H_A: \beta_k \neq 0$`. Generally, we don't care too much about the model intercept and focus on tests of the slope.  If we use `$\alpha=0.05$` as our significance level, then both slope parameter estimates appear to be important.

---

## Interpretation of estimates

```r
linear_reg() %>%
  set_engine("lm") %>%
  fit(lhairHg ~ assets_sc + native_cat, data = mercury1) %>%
  tidy(conf.int=TRUE)
```

```
#> # A tibble: 3 × 7
#>   term   estimate std.error statistic  p.value conf.low conf.high
#>   <chr>     <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 (Inte…   1.00      0.0994     10.1  2.65e-22    0.805   1.20   
#> 2 asset…  -0.0945    0.0474     -1.99 4.66e- 2   -0.187  -0.00143
#> 3 nativ…  -0.957     0.115      -8.32 4.58e-16   -1.18   -0.732
```

We can use a CLT-based confidence interval here. Because we had to estimate the standard error of our estimates `$\hat{\beta}$`, we use a t-interval, and R can calculate this for us upon request.

This type of model, with one continuous predictor and the rest categorical, can also be formulated as an ANCOVA (analysis of covariance) model.

---

## Main vs. interaction effects

.question[
Suppose we want to predict hair mercury from assets and native status of the community. Do you think a model with main effects or 
interaction effects is more appropriate? Explain your reasoning.

**Hint:** Main effects would mean rate at which hair mercury changes as assets 
increase would be the same in native and non-native communities, and interaction 
effects would mean the rate at which hair mercury changes as assets increase would be different for native and non-native communities.
]

---

---

## In pursuit of Occam's razor

- Occam's Razor states that among competing hypotheses that predict equally well, the one with the fewest assumptions should be selected.

--
- Model selection follows this principle.

--
- We only want to add another variable to the model if the addition of that variable brings something valuable in terms of predictive power to the model.

--
- In other words, we prefer the simplest best model, i.e. **parsimonious** model.

---

.pull-left[
<img src="w9-l02-multiple-reg_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
.question[
Visually, which of the two models is preferable under Occam's razor?
]
]

---

## R-squared

- `$R^2$` is the percentage of variability in the response variable explained by 
the regression model.

```r
Hg_main_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(lhairHg ~ assets_sc + native_cat, data = mercury1)
glance(Hg_main_fit)$r.squared
```

```
#> [1] 0.1537472
```

```r
Hg_int_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(lhairHg ~ assets_sc + native_cat + assets_sc*native_cat, data = mercury1)
glance(Hg_int_fit)$r.squared
```

```
#> [1] 0.1630589
```

--
- The model with interactions has a higher `$R^2$`.

---

## Adjusted R-squared

- Using `$R^2$` for model selection in models with multiple explanatory variables is not a good idea, as `$R^2$` never decreases when **any** variable is added to the model.

... a (more) objective measure for model selection

- Adjusted `$R^2$` doesn't increase if the new variable does not provide any new 
informaton or is completely unrelated, as it applies a penalty for number of 
variables included in the model.
- This makes adjusted `$R^2$` a preferable metric for model selection in multiple
regression models.

---

## Comparing models

```r
glance(Hg_main_fit)$r.squared
```

```
#> [1] 0.1537472
```

```r
glance(Hg_int_fit)$r.squared
```

```
#> [1] 0.1630589
```
]
.pull-right[

```r
glance(Hg_main_fit)$adj.r.squared
```

```
#> [1] 0.1512836
```

```r
glance(Hg_int_fit)$adj.r.squared
```

```
#> [1] 0.1593988
```
]

```r
glance(Hg_int_fit)$r.squared > glance(Hg_main_fit)$r.squared 
```

```
#> [1] TRUE
```

- Is R-sq adj. higher for int model?

```r
glance(Hg_int_fit)$adj.r.squared > glance(Hg_main_fit)$adj.r.squared
```

```
#> [1] TRUE
```
]
]

---

# Two numerical predictors

---

## Multiple predictors

- Response variable: `lhairHg` 
- Explanatory variables: Assets (scaled) and proximity to formal mining operations (scaled)

```r
mr_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(lhairHg ~ assets_sc + form_min_sc, data = mercury1)
tidy(mr_fit)
```

```
#> # A tibble: 3 × 5
#>   term        estimate std.error statistic       p.value
#>   <chr>          <dbl>     <dbl>     <dbl>         <dbl>
#> 1 (Intercept)    0.249    0.0412      6.04 0.00000000254
#> 2 assets_sc     -0.258    0.0455     -5.67 0.0000000216 
#> 3 form_min_sc   -0.126    0.0432     -2.92 0.00366
```

---

##  Linear model with multiple predictors

<br>

`$$\widehat{log\_Hg}_i = 0.249 - 0.258 \times {assets\_sc}_i  - 0.126 \times {form\_min\_sc}_i$$`

---

# Numerical and categorical predictors

---

## Mercury, assets, and native communities

- Explore the relationship between hair mercury and assets, conditioned 
on whether or not the individual lives in a native community
- First visualize and explore, then model
- But first, prep the data

```r
mercury1 %>%
  count(native)
```

```
#> # A tibble: 2 × 2
#>   native     n
#>    <dbl> <int>
#> 1      0   899
#> 2      1   227
```
]

---

## Typical surface area

.panelset[
.panel[.panel-name[Plot]
.pull-left-narrow[
Note that asset levels differ across native and non-native communities.
]
.pull-right-wide[
<img src="w9-l02-multiple-reg_files/figure-html/unnamed-chunk-14-1.png" width="90%" style="display: block; margin: auto;" />
]
]
.panel[.panel-name[Code]

```r
ggplot(data = mercury1, aes(x = assets_sc, fill = native_cat)) +
  geom_histogram() +
  facet_grid(native_cat ~ .) +
  scale_fill_manual(values = c("#E48957", "#071381")) +
  guides(fill = FALSE) +
  labs(x = "Assets", y = NULL) 
```
]
]

---

## Facet to get a better look

.panelset[
.panel[.panel-name[Plot]
<img src="w9-l02-multiple-reg_files/figure-html/unnamed-chunk-15-1.png" width="80%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]

```r
ggplot(data = mercury1,
       aes(
         y = lhairHg,
         x = assets_sc,
         color = native_cat,
         shape = native_cat
       )) +
  geom_point(alpha = 0.7) +
  facet_wrap( ~ native_cat) +
  scale_color_manual(values = c("#E48957", "#071381")) +
  theme(legend.position = "NONE") +
  labs(y = "Hg (log ppm)",
       x = "Assets (standardized)")
```
]
]

---

## Two ways to model

- **Main effects:** Assuming relationship between  log hair mercury and assets
**does not vary** by whether or not the individual resides in a native community.
- **Interaction effects:** Assuming relationship between log hair mercury and assets **varies** by whether or not the person is a resident of a native community.

---

## Interacting explanatory variables

- Including an interaction effect in the model allows for different slopes, i.e. 
non-parallel lines.

- This implies that the regression coefficient for an explanatory variable would 
change as another explanatory variable changes.

- This can be accomplished by adding an interaction variable: the product of two 
explanatory variables.

---

## Two ways to model

.pull-left-narrow[
- **Main effects:** Assuming relationship between logged mercury and assets **does not vary** by whether or not the individual lives in a native community
- **Interaction effects:** Assuming relationship between logged mercury and assets **varies** by whether or not the individual lives in a native community
]
.pull-right-wide[
<img src="w9-l02-multiple-reg_files/figure-html/hg-main-int-viz-1.png" width="70%" style="display: block; margin: auto;" />
]

---

## Fit model with main effects

- Response variable: `lhairHg`
- Explanatory variables: `assets_sc` and `native_cat`

```r
hg_main_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(lhairHg ~ assets_sc + native_cat, data = mercury1)
tidy(hg_main_fit)
```

`$$\widehat{log\_Hg}_i = 1.00 - 0.0945 \times assets\_sc_i - 0.957 \times I(nonnative_i=1)$$`
The notation `$I(nonnative_i=1)$` indicates that predictor takes value 1 if the person's community is non-native, and 0 otherwise. 
---

## Solving the model

- Native community: Plug in 0 for `native_cat`

`$$\widehat{log\_Hg}_i = 1.00 - 0.0945 \times assets\_sc_i - 0.957  \times 0 = 1.00 - 0.0945 \times assets\_sc_i$$`

--
- Non-native community: Plug in 1 for `native_cat`

`$$\widehat{log\_Hg}_i = 1.00 - 0.0945 \times assets\_sc_i - 0.957 = 0.043 - 0.0945 \times assets\_sc_i$$`

---

## Visualizing main effects

.pull-left-narrow[
- **Same slope:** Average rate of change in log Hg as assets increase does not vary between individuals living in native and non-native communities.
- **Different intercept:** Mercury levels in hair of those living in native communities are consistently higher than levels in those living in non-native communities.
]
.pull-right-wide[
<img src="w9-l02-multiple-reg_files/figure-html/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Interpreting main effects

```r
tidy(hg_main_fit) %>% 
  mutate(exp_estimate = exp(estimate)) %>%
  select(term, estimate, exp_estimate)
```

```
#> # A tibble: 3 × 3
#>   term                 estimate exp_estimate
#>   <chr>                   <dbl>        <dbl>
#> 1 (Intercept)            1.00          2.72 
#> 2 assets_sc             -0.0945        0.910
#> 3 native_catNon-native  -0.957         0.384
```
]

- All else held constant, for each additional standard deviation in assets, the hair Hg level is predicted, on average, to be lower by a factor of `$e^{-0.0945}=0.91$`.
- All else held constant, residents of non-native communities are predicted, on average, to have hair Hg concentrations that are lower by a factor of `$e^{-0.957}=0.38$` than concentrations among residence of native communities.
- Hair Hg concentrations of native community residents who have the mean level of assets are predicted, on average, to be 2.72 ppm

---

## Main vs. interaction effects

- The way we specified our main effects model only lets `native_cat` affect the intercept.

- Model implicitly assumes that residents in each type of community have the *same slope* and only allows for *different intercepts*.

.question[
What seems more appropriate in this case?
+ Same slope and same intercept for both colors
+ Same slope and different intercept for both colors
+ Different slope and different intercept for both colors
]

---

## Interaction: `Assets_sc*native_cat`

---

## Fit model with interaction effects

- Response variable: lhairHg
- Explanatory variables: `assets_sc`, `native_cat`, and their interaction

```r
Hg_int_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(lhairHg ~ assets_sc * native_cat, data = mercury1)
```

---

```r
tidy(Hg_int_fit)
```

```
#> # A tibble: 4 × 5
#>   term                   estimate std.error statistic     p.value
#>   <chr>                     <dbl>     <dbl>     <dbl>       <dbl>
#> 1 (Intercept)               0.779    0.127       6.11     1.66e-9
#> 2 assets_sc                -0.320    0.0944     -3.39     7.29e-4
#> 3 native_catNon-native     -0.757    0.135      -5.59     3.25e-8
#> 4 assets_sc:native_catN…    0.301    0.109       2.76     5.89e-3
```

$$\widehat{log\_Hg}_i = 0.779 - 0.320 \times assetssc_i $$
$$ - 0.757 \times I(nonnative_i=1) + 0.301 \times {assetssc}_i \times I(nonnative_i=1)$$

---

## Interpretation of interaction effects

Rate of change in log hair Hg as assets increase does 
vary between residents of native and non-native communities (different slopes), and those in native communities have generally higher levels of hair mercury than those in non-native communities (different intercept + lines do not cross in range of data).

- Native community resident:

$$\widehat{log\_Hg}_i = 0.779 - 0.320 \times assetssc_i - 0.757 \times 0 + 0.301 \times assetssc_i \times 0 $$
`$$\widehat{log\_Hg}_i = 0.779 - 0.320 \times assetssc_i$$`

- Non-native community resident: 
`$$\widehat{log\_Hg}_i = 0.779 - 0.320 \times assetssc_i - 0.757 \times 1 + 0.301 \times assetssc_i \times 1$$`

`$$\widehat{log\_Hg}_i = 0.022 - 0.019 \times assetssc_i$$`

---

## Interpretation of interaction effects

`$$\widehat{log\_Hg}_i = 0.779 - 0.320 \times assetssc_i$$`

- Non-native community resident:

`$$\widehat{log\_Hg}_i = 0.022 - 0.019 \times assetssc_i$$`

]
]
.pull-right[
<img src="w9-l02-multiple-reg_files/figure-html/viz-interaction-effects2-1.png" width="80%" style="display: block; margin: auto;" />
]

---

## Comparing models

It appears that adding the interaction actually increased adjusted `$R^2$`, so we 
should indeed use the model with the interactions.

```r
glance(hg_main_fit)$adj.r.squared
```

```
#> [1] 0.1512836
```

```r
glance(hg_int_fit)$adj.r.squared
```

```
#> [1] 0.1593988
```

---

## Third order interactions

- Can you? Yes
- Should you? Probably not if you want to interpret these interactions in context
of the data.