Are mercury concentrations in hair related to household socioeconomic position, as measured by assets?
library(tidyverse)library(tidymodels)library(readr)mercury <- readr::read_csv("mercury_reg.csv")mercury <- mercury %>% # scale() subtracts the mean and divides by the SD to make the units "standard deviations" like a z-score mutate(assets_sc=scale(SESassets)) %>% #another variable we may use later mutate(form_min_sc=scale(FM_buffer)) %>% #so I don't have to remember coding mutate(sex,sex_cat=ifelse(sex==1,"Male","Female")) %>% mutate(native,native_cat=ifelse(native==1,"Native","Non-native"))
ˆyi=ˆβ0+ˆβ1×xi
Here yi represents log hair mercury for subject i, and xi represents standardized household income.
linear_reg()
#> Linear Regression Model Specification (regression)#> #> Computational engine: lm
linear_reg() %>% set_engine("lm") # lm: linear model
#> Linear Regression Model Specification (regression)#> #> Computational engine: lm
... using formula syntax
linear_reg() %>% set_engine("lm") %>% fit(lhairHg ~ assets_sc, data = mercury)
#> parsnip model object#> #> Fit time: 1ms #> #> Call:#> stats::lm(formula = lhairHg ~ assets_sc, data = data)#> #> Coefficients:#> (Intercept) assets_sc #> 0.2795 -0.3577
#> parsnip model object#> #> Fit time: 1ms #> #> Call:#> stats::lm(formula = lhairHg ~ assets_sc, data = data)#> #> Coefficients:#> (Intercept) assets_sc #> 0.2795 -0.3577
ˆyi=0.2795−0.3577×xi
linear_reg() %>% set_engine("lm") %>% fit(lhairHg ~ assets_sc, data = mercury) %>% tidy()
#> # A tibble: 2 × 5#> term estimate std.error statistic p.value#> <chr> <dbl> <dbl> <dbl> <dbl>#> 1 (Intercept) 0.280 0.0213 13.1 7.49e-38#> 2 assets_sc -0.358 0.0216 -16.5 3.90e-58
ˆyi=0.28−0.358×xi
ˆyi=0.2795−0.3577×xi
Intercept: Individuals with household assets at the mean level (assets_sc
=0) are expected to have hair mercury concentrations of 0.2795 log ppm, on average
assets_sc
was standardized, so assets_sc
=0 does not mean "no assets" but instead means "average assets" as it corresponds to an assets z-score of 0The slope coefficient for the log transformed model is -0.3577, meaning the log mercury difference between people whose household incomes are one SD apart is predicted to be -0.3577 log ppm.
The slope coefficient for the log transformed model is -0.3577, meaning the log mercury difference between people whose household incomes are one SD apart is predicted to be -0.3577 log ppm.
Using this information, and properties of logs that we just reviewed, fill in the blanks in the following alternate interpretation of the slope:
For each additional SD the household assets are greater, the hair mercury concentration is expected to be
___
, on average, by a factor of___
.
For each additional increase in scaled assets, hair mercury content is expected to be
___
, on average, by a factor of___
.
log(hair Hg for assets x+1)−log(hair Hg for assets x)=−0.3577
For each additional increase in scaled assets, hair mercury content is expected to be
___
, on average, by a factor of___
.
log(hair Hg for assets x+1)−log(hair Hg for assets x)=−0.3577
log(hair Hg for assets x+1hair Hg for assets x)=−0.3577
For each additional increase in scaled assets, hair mercury content is expected to be
___
, on average, by a factor of___
.
log(hair Hg for assets x+1)−log(hair Hg for assets x)=−0.3577
log(hair Hg for assets x+1hair Hg for assets x)=−0.3577
elog(Hg for assets x+1Hg for assets x)=e−0.3577
For each additional increase in scaled assets, hair mercury content is expected to be
___
, on average, by a factor of___
.
log(hair Hg for assets x+1)−log(hair Hg for assets x)=−0.3577
log(hair Hg for assets x+1hair Hg for assets x)=−0.3577
elog(Hg for assets x+1Hg for assets x)=e−0.3577
Hg for assets x+1Hg for assets x≈0.70
For each additional increase in scaled assets, hair mercury content is expected to be
___
, on average, by a factor of___
.
log(hair Hg for assets x+1)−log(hair Hg for assets x)=−0.3577
log(hair Hg for assets x+1hair Hg for assets x)=−0.3577
elog(Hg for assets x+1Hg for assets x)=e−0.3577
Hg for assets x+1Hg for assets x≈0.70
For each SD increase in assets, the hair Hg is expected to be lower, on average, by a factor of 0.70.
Remember this when interpreting model coefficients
Source: XKCD, Cell phones
yi=β0+β1 xi+εi where ε represents random error around our mean
yi=β0+β1 xi+εi where ε represents random error around our mean
yi=β0+β1 xi+εi where ε represents random error around our mean
ˆyi=b0+b1 xi or ˆyi=ˆβ0+ˆβ1 xi
ˉy=b0+b1ˉx → b0=ˉy−b1ˉx
ˉy=b0+b1ˉx → b0=ˉy−b1ˉx
ˉy=b0+b1ˉx → b0=ˉy−b1ˉx
ˉy=b0+b1ˉx → b0=ˉy−b1ˉx
The correlation coefficient describes the linear relationship between x and y on the standard deviation scale.
cor(mercury$lhairHg,mercury$assets_sc,use="complete.obs")
#> [,1]#> [1,] -0.326132
Here, for each one SD increase in assets, we expect a -0.33 SD increase (so thus a 0.33 decrease) in standard deviations of log hair mercury.
#> # A tibble: 2,308 × 3#> lhairHg assets_sc[,1] native_cat#> <dbl> <dbl> <chr> #> 1 0.676 -0.837 Non-native#> 2 0.0908 0.197 Non-native#> 3 1.67 -0.280 Non-native#> 4 0.450 -0.280 Non-native#> 5 0.704 -0.280 Non-native#> 6 -0.512 -0.280 Non-native#> 7 -0.125 0.826 Non-native#> 8 -0.103 0.826 Non-native#> 9 0.351 -1.27 Non-native#> 10 0.0444 -1.27 Non-native#> 11 0.662 -0.359 Non-native#> 12 1.01 1.29 Non-native#> 13 -0.149 1.29 Non-native#> 14 0.881 0.517 Non-native#> 15 1.51 0.517 Non-native#> 16 0.328 -0.658 Non-native#> 17 0.0553 0.533 Non-native#> 18 0.768 1.29 Non-native#> 19 0.431 1.30 Non-native#> 20 1.55 1.30 Non-native#> # … with 2,288 more rows
Native
: community classified as native by Peruvian governmentNon-native
: community does not have native classificationlinear_reg() %>% set_engine("lm") %>% fit(lhairHg ~ native_cat, data = mercury) %>% tidy()
#> # A tibble: 2 × 5#> term estimate std.error statistic p.value#> <chr> <dbl> <dbl> <dbl> <dbl>#> 1 (Intercept) 1.12 0.0416 26.9 4.97e-139#> 2 native_catNon-native -1.11 0.0476 -23.2 2.94e-107
Define Non-nativei=1 if non-native community and 0 otherwise.
^yi=1.12−1.11 Non-nativei
Slope: Residents of non-native communities, on average, are expected to have Hg concentrations that are -1.11 log ppm lower, on average, than residents of native communities. Alternatively, the hair Hg is expected to be lower, on average, by a factor of e−1.11=0.33 among residents of non-native communities.
Native
) to the other level (Non-native
)Intercept: Individuals living in native communities (Non-native
=0) are expected to have hair mercury exposures of 1.12 log ppm.
linear_reg() %>% set_engine("lm") %>% fit(lhairHg ~ community, data = mercury) %>% tidy()
#> # A tibble: 23 × 5#> term estimate std.error statistic p.value#> <chr> <dbl> <dbl> <dbl> <dbl>#> 1 (Intercept) 0.614 0.0644 9.54 3.55e-21#> 2 communityBoca Isiriwe 1.20 0.159 7.56 5.77e-14#> 3 communityBoca Manu 0.833 0.129 6.43 1.56e-10#> 4 communityCaychihue -1.10 0.132 -8.36 1.09e-16#> 5 communityChoque -0.778 0.117 -6.68 3.04e-11#> 6 communityDiamante 1.06 0.109 9.67 1.00e-21#> 7 communityHuepetuhe -0.694 0.0732 -9.48 5.96e-21#> 8 communityIsla de los Valles 1.12 0.245 4.57 5.18e- 6#> 9 communityMasenawa 1.20 0.401 3.00 2.74e- 3#> 10 communityPalotoa Teparo 0.350 0.148 2.36 1.82e- 2#> # … with 13 more rows
#> # A tibble: 23 × 5#> term estimate std.error statistic p.value#> <chr> <dbl> <dbl> <dbl> <dbl>#> 1 (Intercept) 0.614 0.0644 9.54 3.55e-21#> 2 communityBoca Isiriwe 1.20 0.159 7.56 5.77e-14#> 3 communityBoca Manu 0.833 0.129 6.43 1.56e-10#> 4 communityCaychihue -1.10 0.132 -8.36 1.09e-16#> 5 communityChoque -0.778 0.117 -6.68 3.04e-11#> 6 communityDiamante 1.06 0.109 9.67 1.00e-21#> 7 communityHuepetuhe -0.694 0.0732 -9.48 5.96e-21#> 8 communityIsla de los Valles 1.12 0.245 4.57 5.18e- 6#> 9 communityMasenawa 1.20 0.401 3.00 2.74e- 3#> 10 communityPalotoa Teparo 0.350 0.148 2.36 1.82e- 2#> # … with 13 more rows
community | BI | BM | Cay | Cho | Dia | Hue | Isl | Mas |
---|---|---|---|---|---|---|---|---|
Boca Colorado | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Boca Isiriwe | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Boca Manu | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Caychihue | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Choque | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
Diamante | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
Huepetuhe | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Isla de los Valles | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
Masenawa | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
etc. for the other communities. Note Boca Colorado, the referent community, =0 for all vars
#> # A tibble: 23 × 5#> term estimate std.error statistic p.value#> <chr> <dbl> <dbl> <dbl> <dbl>#> 1 (Intercept) 0.614 0.0644 9.54 3.55e-21#> 2 communityBoca Isiriwe 1.20 0.159 7.56 5.77e-14#> 3 communityBoca Manu 0.833 0.129 6.43 1.56e-10#> 4 communityCaychihue -1.10 0.132 -8.36 1.09e-16#> 5 communityChoque -0.778 0.117 -6.68 3.04e-11#> 6 communityDiamante 1.06 0.109 9.67 1.00e-21#> 7 communityHuepetuhe -0.694 0.0732 -9.48 5.96e-21#> 8 communityIsla de los Valles 1.12 0.245 4.57 5.18e- 6#> 9 communityMasenawa 1.20 0.401 3.00 2.74e- 3#> 10 communityPalotoa Teparo 0.350 0.148 2.36 1.82e- 2#> # … with 13 more rows
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |