Lab #04: Spatial Data and Handling Merge Conflicts

due Wednesday, September 22, 4:00 PM

Goals

Getting Started

Log in to GitHub to determine your team number and members for Lab 4.

Merge conflicts

You may have already run into merge conflicts while collaborating on Lab 03 last week. When two collaborators change the same file and push to the shared repository, git tries to merge the two versions.

If the two versions have conflicting content on the same line, git cannot decide which one to keep and reports a merge conflict. Merge conflicts must be resolved manually, because they require human judgment.

To resolve the merge conflict, decide if you want to keep only your text/code, the text/code on GitHub, or incorporate changes from both sets. Delete the conflict markers <<<<<<<, =======, >>>>>>> and make the changes you want in the final merge.
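As an illustration, the base R sketch below stores a made-up conflicted file as a character vector and flags the lines that carry conflict markers, which can be a handy check before knitting. The `author:` line, the team names, the commit label `1a2b3c4`, and the helper `find_conflict_markers()` are all hypothetical; your own conflict will sit wherever the team name lives in your Rmd file.

```r
# Hypothetical contents of an Rmd header after a pull that produced a conflict.
# The part between <<<<<<< and ======= is the local version; the part between
# ======= and >>>>>>> is the version that was already on GitHub.
conflicted <- c(
  "---",
  'title: "Lab 04"',
  "<<<<<<< HEAD",
  'author: "Not Our Team Name"',
  "=======",
  'author: "Our Team Name"',
  ">>>>>>> 1a2b3c4",
  "---"
)

# Return the line numbers that look like git conflict markers.
find_conflict_markers <- function(lines) {
  grep("^(<{7} |={7}$|>{7} )", lines)
}

find_conflict_markers(conflicted)  # lines 3, 5, and 7

# Resolved by hand: keep one version and delete the three marker lines.
resolved <- c(
  "---",
  'title: "Lab 04"',
  'author: "Our Team Name"',
  "---"
)

length(find_conflict_markers(resolved)) == 0  # TRUE when no markers remain
```

To run the same check on a real file, pass `readLines("your-file.Rmd")` to the helper, replacing the placeholder file name with your own.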

Assign the numbers 1, 2, and 3 to your team members (on a two-person team, Members 1 and 2 share the work of Member 3). Go through the following steps in detail; they simulate a merge conflict. Completing this exercise is part of the lab grade.

Resolving a merge conflict

Step 1: Everyone clones the team's lab repo and opens the Rmd file.

Member 3 should look at the group’s repo on GitHub to ensure that the other members’ files are pushed to GitHub after every step.

Step 2: Member 1 should change the team name to your team name. Knit, commit, and push.

Step 3: Member 2 should change the team name to something different (i.e., not your team name). Knit, commit, and push.

Member 2 should get an error on the attempted push.

Pull and review the document with the merge conflict. Member 2 should display and read the error to the entire team. A merge conflict occurred because Member 2 edited the same part of the document as Member 1. Resolve the conflict with whichever name you want to keep (please keep your real team name), then knit, commit with a message that clearly states you fixed the merge conflict, and push again.

Step 4: Member 3 verifies the commit shows the merge conflict on GitHub. Then Member 3 writes some narrative below the last code chunk in your Rmd file. Knit, commit, and push.

This time, no merge conflict should occur, since Member 3 edited a different part of the document from Members 1 and 2. Member 3 should display and read the message to the entire team.

Everyone pull and delete the narrative. All team members should have the same content in the Rmd file before proceeding to the exercises.

Packages

library(tidyverse)

Getting started

In case you forgot…

library(usethis)
use_git_config(user.name = "GitHub username", user.email = "your email")

Men’s Health Gap: Life Expectancy

Life Expectancy

life <- readr::read_csv("04/lifeexpectancy_infant.csv")

names(life)
## [1] "location" "sex"      "year"     "lifeexp"

R packages

The R packages ggplot2 and sf (for “simple features”) have made it relatively straightforward to make great spatial maps. We’ll use these packages along with rnaturalearth and rnaturalearthdata to access free spatial mapping tools for world maps. You’ll need to install these yourself before proceeding.

# install.packages("rnaturalearth") 
# install.packages("rnaturalearthdata")
# install.packages("sf")
# install.packages("rgeos") # needed to pull countries for plot 
# ggplot2 is part of the tidyverse (loaded above), so we don't need to load it separately
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
library(rgeos)
library(scales)
# the classic dark-on-light theme for ggplot2 is nice for maps
theme_set(theme_bw())
# world contains the country information for plotting in addition to a lot of other information about the countries
world <- ne_countries(scale = "medium", returnclass = "sf")
names(world)
##  [1] "scalerank"  "featurecla" "labelrank"  "sovereignt" "sov_a3"    
##  [6] "adm0_dif"   "level"      "type"       "admin"      "adm0_a3"   
## [11] "geou_dif"   "geounit"    "gu_a3"      "su_dif"     "subunit"   
## [16] "su_a3"      "brk_diff"   "name"       "name_long"  "brk_a3"    
## [21] "brk_name"   "brk_group"  "abbrev"     "postal"     "formal_en" 
## [26] "formal_fr"  "note_adm0"  "note_brk"   "name_sort"  "name_alt"  
## [31] "mapcolor7"  "mapcolor8"  "mapcolor9"  "mapcolor13" "pop_est"   
## [36] "gdp_md_est" "pop_year"   "lastcensus" "gdp_year"   "economy"   
## [41] "income_grp" "wikipedia"  "fips_10"    "iso_a2"     "iso_a3"    
## [46] "iso_n3"     "un_a3"      "wb_a2"      "wb_a3"      "woe_id"    
## [51] "adm0_a3_is" "adm0_a3_us" "adm0_a3_un" "adm0_a3_wb" "continent" 
## [56] "region_un"  "subregion"  "region_wb"  "name_len"   "long_len"  
## [61] "abbrev_len" "tiny"       "homepart"   "geometry"

Now let’s get started!

ggplot(data = world) +
  geom_sf()

OK, that’s the world (or one projection of it!). Let’s dress up the map a bit.

ggplot(data = world) +
  geom_sf() +
  labs(x = "Longitude",
       y = "Latitude",
       title = "World Map")

By default, the map is drawn in plain longitude/latitude coordinates (an equirectangular layout, similar in appearance to the familiar Mercator projection), with latitude and longitude at right angles. As you can see, this distorts the size of land masses near the poles. The ETRS89 Lambert Azimuthal Equal-Area projection (EPSG:3035) is instead centered on Europe and does not force the map into a rectangular shape.

ggplot(data = world) +
  geom_sf() +
  coord_sf(crs = st_crs(3035)) +
  labs(title = "World Map")
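To get a feel for how much rectangular maps stretch regions far from the equator, the base R sketch below computes two standard scale factors at a few latitudes: the east-west stretch of a plain longitude/latitude plot, 1/cos(latitude), and the area inflation of a Mercator map, 1/cos²(latitude). The latitudes are arbitrary illustrative choices. An equal-area projection, such as the Lambert Azimuthal Equal-Area projection used above, keeps the area factor at 1 everywhere and distorts shapes instead.

```r
# Latitudes (in degrees) chosen only for illustration.
lat_deg <- c(0, 30, 60, 75)
lat_rad <- lat_deg * pi / 180

stretch <- data.frame(
  latitude         = lat_deg,
  # plain longitude/latitude plot: east-west distances are stretched
  lonlat_east_west = 1 / cos(lat_rad),
  # Mercator: stretched equally in both directions, so areas grow by the square
  mercator_area    = 1 / cos(lat_rad)^2
)

round(stretch, 2)
```

At 60 degrees, for example, east-west distances are drawn about twice as long as they should be, and a Mercator map inflates areas about fourfold.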

We could also center our map on Asia and the Pacific, here using the Equal Earth Asia-Pacific projection (EPSG:8859).

ggplot(data = world) +
  geom_sf() +
  coord_sf(crs = st_crs(8859)) +
  labs(title = "World Map")

We can color the world Duke blue…

ggplot(data = world) +
  geom_sf(color = "black", fill = "#00539B") +
  labs(x = "Longitude",
       y = "Latitude",
       title = "World Map")

but perhaps it’s more informative to use color to convey information. In this plot, color is mapped to the square root of population (you can remove trans = "sqrt" to see why the square root is used).

ggplot(data = world) +
  geom_sf(aes(fill = pop_est)) +
  scale_fill_viridis_c(option = "plasma", #color scheme
                       trans = "sqrt", #map color to sqrt(pop_est)
                       labels = label_comma()) + #avoid 1e9 notation
  labs(x = "Longitude",
       y = "Latitude",
       fill = "Population",
       title = "World Map")
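The reason for the square root becomes clearer with a small numeric sketch. The populations below are made up, and `rescale01()` is a hypothetical helper that places each value on a 0 to 1 scale, roughly the way a continuous color scale positions values between its two ends.

```r
# Made-up populations spanning several orders of magnitude.
pop <- c(1e5, 1e6, 1e7, 1e8, 1.4e9)

# Hypothetical helper: position of each value between the minimum (0)
# and the maximum (1).
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

data.frame(
  pop    = pop,
  linear = round(rescale01(pop), 3),       # no transformation
  sqrt   = round(rescale01(sqrt(pop)), 3)  # square-root transformation
)
```

On the linear scale, every value except the largest sits very close to 0, so almost all countries would share nearly the same color. The square root spreads the smaller values out, making differences among them visible.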

Pretty nifty!

We will be making plots of our life expectancy data. In order to do so, we need to do a bit of data wrangling.

  1. First, restrict the life expectancy data to 2019, then transform the data from long to wide format so that male and female life expectancies are in the same row (one row per country). This makes it easier to plot the 2019 data and to combine them with the mapping data. Create four new variables: an indicator of whether the country’s female life expectancy is > 80 years, an indicator of whether the country’s male life expectancy is > 80 years, an indicator of whether male life expectancy in the country is greater than female life expectancy, and the difference in years between female and male life expectancy (female minus male). Be sure to show the code chunk that creates these variables. Print a three-column table of country names with their female and male life expectancies, including only those countries in which male life expectancy exceeds female life expectancy or in which female life expectancy is 8 or more years greater than male life expectancy, and write a few sentences describing the findings in your table. Finally, provide the probability that a randomly selected country’s female life expectancy is > 80, the probability that a randomly selected country’s male life expectancy is > 80, and the probability that a randomly selected country’s male life expectancy is greater than its female life expectancy.

  2. Let’s get ready to plot! The world data set is great because it provides ISO country codes that conform to international standards (ISO = International Organization for Standardization), which makes it easy to link data sets. Sadly, the life expectancy data don’t contain these handy codes. The code below creates a variable name_long in the lifewide data set (I called my wide-format data set lifewide, but feel free to use any name) to allow linkage with the world data set. Use it to join the two data sets, keeping only those countries present in the world data set (we can’t plot a country without its location data anyway). Then create three maps: one showing female life expectancy worldwide, another showing male life expectancy worldwide, and a third showing the difference/disparity in life expectancy by gender (for this last map, use the ETRS89 Lambert Azimuthal Equal-Area projection).

library(sf)

# create a variable in lifewide with the same name as the country variable
# in world, for merging the data sets later
lifewide$name_long <- lifewide$location

# fix inconsistent names; using ISO country codes would avoid this step!
lifewide <- lifewide %>%
  mutate(name_long = case_when(
    name_long == "Bolivia (Plurinational State of)" ~ "Bolivia",
    name_long == "Cabo Verde" ~ "Cape Verde",
    name_long == "Cote d'Ivoire" ~ "Côte d'Ivoire",
    name_long == "Congo" ~ "Republic of Congo",
    name_long == "Czechia" ~ "Czech Republic",
    name_long == "Democratic People's Republic of Korea" ~ "Dem. Rep. Korea",
    name_long == "Micronesia (Federated States of)" ~ "Federated States of Micronesia",
    name_long == "Gambia" ~ "The Gambia",
    name_long == "Iran (Islamic Republic of)" ~ "Iran",
    name_long == "Lao People's Democratic Republic" ~ "Lao PDR",
    name_long == "North Macedonia" ~ "Macedonia",
    name_long == "Republic of Moldova" ~ "Moldova",
    name_long == "Northern Mariana Islands" ~ "N. Mariana Is.",
    name_long == "Sao Tome and Principe" ~ "São Tomé and Principe",
    name_long == "Syrian Arab Republic" ~ "Syria",
    name_long == "Taiwan (Province of China)" ~ "Taiwan",
    name_long == "United Republic of Tanzania" ~ "Tanzania",
    name_long == "United States of America" ~ "United States",
    name_long == "Venezuela (Bolivarian Republic of)" ~ "Venezuela",
    name_long == "Viet Nam" ~ "Vietnam",
    TRUE ~ name_long
  ))
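If the reshaping and joining steps in the two exercises above are unfamiliar, the toy sketch below shows the general pattern on made-up data. Everything in it is an assumption for illustration: the countries and values are invented, the `sex` labels "Female" and "Male" may not match the real file, `toy_world` is a plain tibble rather than an sf object, and `diff_f_minus_m` is just one possible variable name. It is one way to approach the task, not the required solution.

```r
library(dplyr)
library(tidyr)

# Made-up long-format data with the same column names as `life`.
# The sex labels are an assumption; check the values in the real data.
toy_life <- tibble(
  location = rep(c("Country A", "Country B"), each = 2),
  sex      = rep(c("Female", "Male"), times = 2),
  year     = 2019,
  lifeexp  = c(82.5, 77.5, 70.5, 71.0)
)

# Long to wide: one row per country, one column per sex.
toy_wide <- toy_life %>%
  filter(year == 2019) %>%
  pivot_wider(names_from = sex, values_from = lifeexp) %>%
  mutate(
    diff_f_minus_m = Female - Male,  # female minus male, in years
    name_long      = location        # key used for the join below
  )

# The mean of a logical vector is the proportion of TRUE values.
mean(toy_wide$Female > 80)

# Stand-in for `world` (hypothetical; the real object is an sf data frame).
toy_world <- tibble(name_long = c("Country A", "Country C"))

# Starting from the map data keeps every row of toy_world and fills in
# life expectancy where the names match (NA where they do not).
toy_map <- left_join(toy_world, toy_wide, by = "name_long")
toy_map
```

Putting the map data on the left of the join is a deliberate choice in this sketch: the result keeps one row per mapped country, and with the real `world` object it stays an sf data frame that `geom_sf()` can plot.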

Grading

Total: 50 pts