class: center, middle, inverse, title-slide

# Visualising data with ggplot2
## Introduction to Global Health Data Science
### Course Website
### Prof. Amy Herring

---

layout: true

<div class="my-footer">
<span>
<a href="https://sta198f2021.github.io/website/" target="_blank">Back to website</a>
</span>
</div>

---

## ggplot2 `\(\in\)` tidyverse

.pull-left[
<img src="img/ggplot2-part-of-tidyverse.png" width="80%" style="display: block; margin: auto;" />
]

.pull-right[
- **ggplot2** is tidyverse's data visualization package
- Structure of the code for plots can be summarized as

```r
ggplot(data = [dataset],
       mapping = aes(x = [x-variable], y = [y-variable])) +
   geom_xxx() +
   other options
```
]

---

## Data: Life expectancy from infancy

IHME data on location (mostly countries), World Bank region, binary sex, and estimated life expectancy and population in 2019 will be used to explore the men's health gap (later we will add more years)

.pull-left-narrow[
<img src="img/drabernathy.png" width="50%" style="display: block; margin: auto;" />
Dr. Abernathy of *The Simpsons* (aged 40)
]

.pull-right-wide[

```r
glimpse(lifeexpwide2019)
```

```
## Rows: 204
## Columns: 6
## $ location        <chr> "Afghanistan", "Albania", "Algeria", "A…
## $ worldbankregion <chr> "South Asia", "Europe and Central Asia"…
## $ pop             <dbl> 38041757, 2854191, 43053054, 55312, 771…
## $ year            <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 201…
## $ Female          <dbl> 63.22770, 81.38516, 76.81345, 74.60923,…
## $ Male            <dbl> 63.46025, 75.83942, 75.61707, 69.94064,…
```
Grampa Simpson (attr.
Matt Groening)
<img src="img/Abraham_Simpson.png" width="10%" style="display: block; margin: auto;" />
]

---

.panelset[
.panel[.panel-name[Plot]
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-6-1.png" width="70%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]

```r
ggplot(data = lifeexpwide2019,
       mapping = aes(x = Female, y = Male,
                     color = worldbankregion)) +
  geom_point() +
  labs(title = "Life expectancy",
       subtitle = "2019",
       x = "Female life expectancy", y = "Male life expectancy",
       color = "World Bank Region")
```
]
]

---

class: middle

# Coding out loud

---

.midi[
> **Start with the `lifeexpwide2019` data**
]

.pull-left[

```r
*ggplot(data = lifeexpwide2019)
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `lifeexpwide2019` data
> **map female life expectancy to the x-axis**
]

.pull-left[

```r
ggplot(data = lifeexpwide2019,
*      mapping = aes(x = Female))
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `lifeexpwide2019` data
> map female life expectancy to the x-axis
> **and map male life expectancy to the y-axis.**
]

.pull-left[

```r
ggplot(data = lifeexpwide2019,
       mapping = aes(x = Female,
*                    y = Male))
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-9-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `lifeexpwide2019` data,
> map female life expectancy to the x-axis
> and map male life expectancy to the y-axis.
> **Represent each observation with a point**
]

.pull-left[

```r
ggplot(data = lifeexpwide2019,
       mapping = aes(x = Female,
                     y = Male)) +
* geom_point()
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-10-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `lifeexpwide2019` data,
> map female life expectancy to the x-axis
> and map male life expectancy to the y-axis.
> Represent each observation with a point
> **and map region to the color of each point.**
]

.pull-left[

```r
ggplot(data = lifeexpwide2019,
       mapping = aes(x = Female,
                     y = Male,
*                    color = worldbankregion)) +
  geom_point()
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `lifeexpwide2019` data,
> map female life expectancy to the x-axis
> and map male life expectancy to the y-axis.
> Represent each observation with a point
> and map region to the color of each point.
> **Title the plot "Life expectancy"**
]

.pull-left[

```r
ggplot(data = lifeexpwide2019,
       mapping = aes(x = Female,
                     y = Male,
                     color = worldbankregion)) +
  geom_point() +
* labs(title = "Life expectancy")
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-12-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `lifeexpwide2019` data,
> map female life expectancy to the x-axis
> and map male life expectancy to the y-axis.
> Represent each observation with a point
> and map region to the color of each point.
> Title the plot "Life expectancy",
> **add the subtitle "2019"**
]

.pull-left[

```r
ggplot(data = lifeexpwide2019,
       mapping = aes(x = Female,
                     y = Male,
                     color = worldbankregion)) +
  geom_point() +
  labs(title = "Life expectancy",
*      subtitle = "2019")
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `lifeexpwide2019` data,
> map female life expectancy to the x-axis
> and map male life expectancy to the y-axis.
> Represent each observation with a point
> and map region to the color of each point.
> Title the plot "Life expectancy",
> add the subtitle "2019",
> **label the x and y axes as "Female life expectancy" and "Male life expectancy", respectively**
]

.pull-left[

```r
ggplot(data = lifeexpwide2019,
       mapping = aes(x = Female,
                     y = Male,
                     color = worldbankregion)) +
  geom_point() +
  labs(title = "Life expectancy",
       subtitle = "2019",
*      x = "Female life expectancy", y = "Male life expectancy")
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `lifeexpwide2019` data,
> map female life expectancy to the x-axis
> and map male life expectancy to the y-axis.
> Represent each observation with a point
> and map region to the color of each point.
> Title the plot "Life expectancy",
> add the subtitle "2019",
> label the x and y axes as "Female life expectancy" and "Male life expectancy", respectively,
> **label the legend "World Bank Region"**
]

.pull-left[

```r
ggplot(data = lifeexpwide2019,
       mapping = aes(x = Female,
                     y = Male,
                     color = worldbankregion)) +
  geom_point() +
  labs(title = "Life expectancy",
       subtitle = "2019",
       x = "Female life expectancy", y = "Male life expectancy",
*      color = "World Bank Region")
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.midi[
> Start with the `lifeexpwide2019` data,
> map female life expectancy to the x-axis
> and map male life expectancy to the y-axis.
> Represent each observation with a point
> and map region to the color of each point.
> Title the plot "Life expectancy",
> add the subtitle "2019",
> label the x and y axes as "Female life expectancy" and "Male life expectancy", respectively,
> label the legend "World Bank Region",
> **and add a caption for the data source.**
]

.pull-left[

```r
ggplot(data = lifeexpwide2019,
       mapping = aes(x = Female,
                     y = Male,
                     color = worldbankregion)) +
  geom_point() +
  labs(title = "Life expectancy",
       subtitle = "2019",
       x = "Female life expectancy", y = "Male life expectancy",
       color = "World Bank Region",
*      caption = "Source: IHME")
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.panelset[
.panel[.panel-name[Plot]
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-17-1.png" width="70%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]

```r
ggplot(data = lifeexpwide2019,
       mapping = aes(x = Female,
                     y = Male,
                     color = worldbankregion)) +
  geom_point() +
  labs(title = "Life expectancy",
       subtitle = "2019",
       x = "Female life expectancy", y = "Male life expectancy",
       color = "World Bank Region",
       caption = "Source: IHME")
```
]
.panel[.panel-name[Narrative]
.pull-left-wide[
.midi[
Start with the `lifeexpwide2019` data, map female life expectancy to the x-axis, and map male life expectancy to the y-axis. Represent each observation with a point and map region to the color of each point. Title the plot "Life expectancy", add the subtitle "2019", label the x and y axes as "Female life expectancy" and "Male life expectancy", respectively, label the legend "World Bank Region", and add a caption for the data source.
]
]
]
]

---

## Argument names

.tip[
You can omit the names of the first two arguments (`data` and `mapping`) when building plots with `ggplot()`.
]

.pull-left[

```r
ggplot(data = lifeexpwide2019,
       mapping = aes(x = Female,
                     y = Male,
                     color = worldbankregion)) +
  geom_point()
```
]
.pull-right[

```r
ggplot(lifeexpwide2019,
       aes(x = Female,
           y = Male,
           color = worldbankregion)) +
  geom_point()
```
]

---

class: middle

# Aesthetics

---

## Aesthetics options

Commonly used characteristics of plotting characters that can be **mapped to a specific variable** in the data are

- `color`
- `shape`
- `size`
- `alpha` (transparency)

---

## Color

.pull-left[

```r
ggplot(lifeexpwide2019,
       aes(x = Female,
           y = Male,
*          color = worldbankregion)) +
  geom_point()
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-18-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Shape

.pull-left[

```r
ggplot(lifeexpwide2019,
       aes(x = Female,
           y = Male,
*          shape = worldbankregion)) +
  geom_point()
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-19-1.png" width="100%" style="display: block; margin: auto;" />
]

Oops! The default shape palette provides only 6 shapes, so this isn't a good option for our more than 6 regions!
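
---

## Shape: more than 6 groups

If shape is still wanted for a variable with more than 6 levels, one possible workaround is to supply the shapes by hand with `scale_shape_manual()`. The sketch below is only an illustration: the `toy` data frame, its 7 made-up regions, and the shape codes 0 to 6 are assumptions, not part of the course data.

```r
library(ggplot2)

# Hypothetical stand-in for lifeexpwide2019: 7 made-up regions, 3 locations each.
toy <- data.frame(
  Female = seq(60, 80, length.out = 21),
  Male   = seq(57, 75, length.out = 21),
  region = rep(paste("Region", LETTERS[1:7]), each = 3)
)

# One shape code per region; the codes 0 to 6 are an arbitrary choice.
p_shapes <- ggplot(toy, aes(x = Female, y = Male, shape = region)) +
  geom_point() +
  scale_shape_manual(values = 0:6)

p_shapes
```

Even with enough shapes, seven different symbols can be hard to tell apart, so color may remain the more readable choice.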
---

## Shape

Shape can also be mapped to the same variable as `color` (this helps viewers with color vision deficiency)

.pull-left[

```r
ggplot(lifeexpwide2019,
       aes(x = Female,
           y = Male,
*          shape = worldbankregion,
*          color = worldbankregion)) +
  geom_point()
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-20-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Size

.pull-left[

```r
ggplot(lifeexpwide2019,
       aes(x = Female,
           y = Male,
           color = worldbankregion,
*          size = pop)) +
  geom_point()
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Alpha

.pull-left[

```r
ggplot(lifeexpwide2019,
       aes(x = Female,
           y = Male,
           color = worldbankregion,
*          alpha = pop)) +
  geom_point()
```
]
.pull-right[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.pull-left[
**Mapping**

```r
ggplot(lifeexpwide2019,
       aes(x = Female,
           y = Male,
*          size = pop)) +
  geom_point()
```
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-23-1.png" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
**Setting**

```r
ggplot(lifeexpwide2019,
       aes(x = Female,
           y = Male)) +
* geom_point(size = 2)
```
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Mapping vs. setting

- **Mapping:** Determine the size, alpha, etc. of points based on the values of a variable in the data
  - goes into `aes()`
- **Setting:** Determine the size, alpha, etc. of points **not** based on the values of a variable in the data
  - goes into `geom_*()` (this was `geom_point()` in the previous example, but we'll learn about other geoms soon!)
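
---

## Mapping vs. setting: a quick check

The difference can be checked on a small example. This is a minimal sketch that assumes a made-up `toy` data frame in place of `lifeexpwide2019`; it uses `layer_data()` to look at the sizes ggplot2 actually draws in each case.

```r
library(ggplot2)

# Hypothetical stand-in for lifeexpwide2019; all values are made up.
toy <- data.frame(
  Female = c(65, 72, 78, 83),
  Male   = c(62, 68, 73, 79),
  pop    = c(1e6, 5e6, 2e7, 1e8)
)

# Mapping: size depends on a variable, so it goes inside aes().
p_map <- ggplot(toy, aes(x = Female, y = Male, size = pop)) +
  geom_point()

# Setting: size is a fixed value, so it goes inside geom_point().
p_set <- ggplot(toy, aes(x = Female, y = Male)) +
  geom_point(size = 2)

# layer_data() returns the values used to draw the first layer.
unique(layer_data(p_map)$size)  # several sizes, one per distinct pop
unique(layer_data(p_set)$size)  # a single size shared by every point
```

The mapped plot also gets a size legend, while the plot with a set size does not.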
---

class: middle

# Faceting

---

## Faceting

- Smaller plots that display different subsets of the data
- Useful for exploring conditional relationships and large data

---

.panelset[
.panel[.panel-name[Plot]
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-25-1.png" width="70%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]

```r
lifeexpwide %>%
  filter(year %in% c(1999, 2009, 2019),
         worldbankregion %in% c("North America", "South Asia", "Sub-Saharan Africa")) %>%
  ggplot(aes(x = Female, y = Male)) +
  geom_point() +
* facet_grid(year ~ worldbankregion)
```
]
]

---

## Various ways to facet

---

```r
lifeexpwide %>%
  filter(year %in% c(1999, 2009, 2019),
         worldbankregion %in% c("North America", "South Asia", "Sub-Saharan Africa")) %>%
  ggplot(aes(x = Female, y = Male)) +
  geom_point() +
  facet_wrap(~ worldbankregion)
```

<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-26-1.png" width="60%" style="display: block; margin: auto;" />

---

```r
lifeexpwide %>%
  filter(year %in% c(1999, 2009, 2019),
         worldbankregion %in% c("North America", "South Asia", "Sub-Saharan Africa")) %>%
  ggplot(aes(x = Female, y = Male)) +
  geom_point() +
  facet_grid(. ~ worldbankregion)
```

<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-27-1.png" width="60%" style="display: block; margin: auto;" />

---

## Faceting summary

- `facet_grid()`:
    - 2d grid
    - `rows ~ cols`
    - use `.` for no split
- `facet_wrap()`: 1d ribbon wrapped according to number of rows and columns specified or available plotting area

---

## Facet and color

.pull-left-narrow[

```r
lifeexpwide %>%
  filter(year %in% c(1999, 2009, 2019),
         worldbankregion %in% c("North America", "South Asia", "Sub-Saharan Africa")) %>%
  ggplot(aes(x = Female, y = Male, color = year)) +
  geom_point() +
  facet_wrap(~ worldbankregion)
```
]
.pull-right-wide[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-28-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Facet and color, no legend

.pull-left-narrow[

```r
lifeexpwide %>%
  filter(year %in% c(1999, 2009, 2019),
         worldbankregion %in% c("North America", "South Asia", "Sub-Saharan Africa")) %>%
  ggplot(aes(x = Female, y = Male, color = year)) +
  geom_point() +
  facet_wrap(~ worldbankregion) +
* guides(color = "none")
```
]
.pull-right-wide[
<img src="w1-l02-ggplot2_files/figure-html/unnamed-chunk-29-1.png" width="100%" style="display: block; margin: auto;" />
]
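
---

## Faceting: comparing layouts

To see how the faceting functions differ, the three layouts from the preceding slides can be built on a small example and their panels counted. This is a sketch under stated assumptions: `toy` is a made-up data frame (3 hypothetical regions, 3 years, 2 locations each) standing in for `lifeexpwide`.

```r
library(ggplot2)

# Hypothetical stand-in for lifeexpwide; all values are made up.
toy <- expand.grid(
  region = c("Region A", "Region B", "Region C"),
  year   = c(1999, 2009, 2019),
  id     = 1:2
)
toy$Female <- 60 + seq_len(nrow(toy))
toy$Male   <- 57 + seq_len(nrow(toy))

base <- ggplot(toy, aes(x = Female, y = Male)) +
  geom_point()

p_grid <- base + facet_grid(year ~ region)  # rows = year, columns = region
p_row  <- base + facet_grid(. ~ region)     # "." means no split on rows
p_wrap <- base + facet_wrap(~ region)       # 1d ribbon of panels

# Count the panels that contain data in each layout.
length(unique(layer_data(p_grid)$PANEL))
length(unique(layer_data(p_row)$PANEL))
length(unique(layer_data(p_wrap)$PANEL))
```

With this toy data the grid has 9 panels (3 years by 3 regions), while the other two layouts each have 3, one per region.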