Homework #01: COVID in Israel

due Friday, September 10 at 4:00pm

Clone the repo & start new RStudio project (Lab 1 recap!)

How to Clone using SSH in GitHub

Version Control and Cloning Repo

Israeli COVID data

In August, the Israeli government reported that of 515 people aged 12 and up hospitalized with severe COVID, 301 (58.4%) were fully vaccinated (2 doses of the Pfizer vaccine). This prompted speculation on Twitter that the Pfizer vaccine’s efficacy against severe disease is waning over time (prior studies put efficacy at around 95% in people under 75 and 91% in those 75 and up). In this homework, you will use the Israeli data on vaccinated and unvaccinated individuals (the partially vaccinated group is excluded from your data) to evaluate to what extent the Israeli government’s data (as of August 15, 2021) indicate that vaccine efficacy is waning.

Because vaccination rates in Israel increase with age, and risk of hospitalization also increases with age, age is a potential confounder of the true relationship between vaccination and hospitalization with severe COVID. Thus a principled analysis must be careful to account for age.
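To see why age matters, here is a minimal sketch of confounding using invented counts (these are hypothetical numbers, not the Israeli data, and the helper `ve_from_counts()` is made up for this illustration). In both toy age groups the vaccinated have one tenth the risk of the unvaccinated, yet pooling over age makes the vaccine look useless, because the vaccinated are concentrated in the high-risk older group.

```r
library(dplyr)

# Hypothetical counts, invented for illustration (NOT the Israeli data)
toy <- data.frame(
  agegroup   = c("young", "young", "old", "old"),
  vaxstatus  = c("unvaccinated", "vaccinated", "unvaccinated", "vaccinated"),
  population = c(800000, 200000, 50000, 950000),
  severe     = c(40, 1, 100, 190)
)

# Hypothetical helper: 1 - P(severe | vaccinated) / P(severe | unvaccinated)
ve_from_counts <- function(severe, population, vaxstatus) {
  risk <- severe / population
  1 - risk[vaxstatus == "vaccinated"] / risk[vaxstatus == "unvaccinated"]
}

# Age-specific comparison: compute the ratio within each age group
ve_by_age <- toy %>%
  group_by(agegroup) %>%
  summarise(ve = ve_from_counts(severe, population, vaxstatus))

# Crude comparison: pool counts over age first, then compute the ratio
ve_crude <- toy %>%
  group_by(vaxstatus) %>%
  summarise(severe = sum(severe), population = sum(population)) %>%
  summarise(ve = ve_from_counts(severe, population, vaxstatus)) %>%
  pull(ve)

print(ve_by_age)  # 0.9 in both toy age groups
print(ve_crude)   # close to 0 after pooling over age
```

The toy numbers are deliberately extreme; how large the effect is in the real data is for you to work out in the exercises.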

First, let’s read in the data from the file provided. The variables include agegroup (12-15, 16-19, 20-29, …, 80-89, 90+), vaxstatus, population (number of people in Israel in each age and vaccination group), and severe (number of severe COVID cases requiring hospitalization in each age and vaccination group).

library(readr)
library(tidyverse)
library(knitr)
covid <- readr::read_csv("israeli_data_long.csv")
  1. Create a compelling visualization of vaccination rates by age group. Be sure to include a title and label axes nicely. Then, comment briefly on any trends, and provide point estimates (e.g., calculate percentages from the data) of the conditional probability of vaccination given a person’s age is 12-15, and of the conditional probability of vaccination given a person’s age is 90 or above. HINT: lab 2 contains an example plot based on summary data; you can peek ahead or check out this reference showing how geom_col() can be useful with summary data.
# starter code to print out probabilities
covidrate <- covid %>%
  #the first three lines of code are just
  #generating the vaccination rate for each age
  #group in the data
  group_by(agegroup, vaxstatus) %>%
  summarise(n = sum(population)) %>%
  mutate(rate = n / sum(n)) %>%
  print()
## # A tibble: 20 x 4
## # Groups:   agegroup [10]
##    agegroup vaxstatus         n   rate
##    <chr>    <chr>         <dbl>  <dbl>
##  1 12-15    unvaccinated 383649 0.675 
##  2 12-15    vaccinated   184549 0.325 
##  3 16-19    unvaccinated 127745 0.229 
##  4 16-19    vaccinated   429109 0.771 
##  5 20-29    unvaccinated 265871 0.211 
##  6 20-29    vaccinated   991408 0.789 
##  7 30-39    unvaccinated 194213 0.167 
##  8 30-39    vaccinated   968837 0.833 
##  9 40-49    unvaccinated 145355 0.136 
## 10 40-49    vaccinated   927214 0.864 
## 11 50-59    unvaccinated  84545 0.102 
## 12 50-59    vaccinated   747949 0.898 
## 13 60-69    unvaccinated  65205 0.0892
## 14 60-69    vaccinated   665717 0.911 
## 15 70-79    unvaccinated  20512 0.0423
## 16 70-79    vaccinated   464336 0.958 
## 17 80-89    unvaccinated  12683 0.0572
## 18 80-89    vaccinated   208911 0.943 
## 19 90+      unvaccinated   3132 0.0630
## 20 90+      vaccinated    46602 0.937
#starter code for plot
covid %>%
  #the first three lines of code are just
  #generating the vaccination rate for each age 
  #group in the data
  group_by(agegroup, vaxstatus) %>%
  summarise(n = sum(population)) %>%
  mutate(rate = n / sum(n)) %>%
  #here I assume we're just going to plot % vax
  #and not plot % unvax
  filter(vaxstatus == "vaccinated") 
## # A tibble: 10 x 4
## # Groups:   agegroup [10]
##    agegroup vaxstatus       n  rate
##    <chr>    <chr>       <dbl> <dbl>
##  1 12-15    vaccinated 184549 0.325
##  2 16-19    vaccinated 429109 0.771
##  3 20-29    vaccinated 991408 0.789
##  4 30-39    vaccinated 968837 0.833
##  5 40-49    vaccinated 927214 0.864
##  6 50-59    vaccinated 747949 0.898
##  7 60-69    vaccinated 665717 0.911
##  8 70-79    vaccinated 464336 0.958
##  9 80-89    vaccinated 208911 0.943
## 10 90+      vaccinated  46602 0.937
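The hint in the first exercise mentions geom_col() for summary data. As a rough sketch of that idea, the toy example below plots precomputed rates directly; the data frame `toy_rates` and its values are invented for illustration and are not the homework data, so you will still need to adapt the aesthetics, title, and labels to your own summary table.

```r
library(ggplot2)

# Hypothetical summary data, invented for illustration (not the homework data)
toy_rates <- data.frame(
  group = c("A", "B", "C"),
  rate  = c(0.25, 0.60, 0.90)
)

# geom_col() takes bar heights directly from a column of precomputed values,
# whereas geom_bar() counts rows by default
p <- ggplot(toy_rates, aes(x = group, y = rate)) +
  geom_col() +
  scale_y_continuous(labels = scales::label_percent(), limits = c(0, 1)) +
  labs(
    title = "Toy example: rate by group",
    x = "Group",
    y = "Rate"
  )

print(p)
```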
  2. Create a compelling visualization of the probability of severe disease by age group and vaccination status. Be sure to include a title and good labels. Compare these conditional probabilities to the probability of severe disease in the unvaccinated (collapsing over all age groups) and to the probability of severe disease in the vaccinated (collapsing over all age groups). Comment briefly on trends with age.
# starter code
# suppress scientific notation 
options(scipen=999)

# to see the raw probabilities
covid %>%
  mutate(probcovid = severe / population) %>%
  as.data.frame() %>%
  print()
##    agegroup    vaxstatus population severe      probcovid
## 1     12-15 unvaccinated     383649      1 0.000002606549
## 2     12-15   vaccinated     184549      0 0.000000000000
## 3     16-19 unvaccinated     127745      2 0.000015656190
## 4     16-19   vaccinated     429109      0 0.000000000000
## 5     20-29 unvaccinated     265871      4 0.000015044890
## 6     20-29   vaccinated     991408      0 0.000000000000
## 7     30-39 unvaccinated     194213     12 0.000061787831
## 8     30-39   vaccinated     968837      2 0.000002064331
## 9     40-49 unvaccinated     145355     24 0.000165112999
## 10    40-49   vaccinated     927214      9 0.000009706497
## 11    50-59 unvaccinated      84545     34 0.000402152700
## 12    50-59   vaccinated     747949     22 0.000029413770
## 13    60-69 unvaccinated      65205     50 0.000766812361
## 14    60-69   vaccinated     665717     58 0.000087124108
## 15    70-79 unvaccinated      20512     39 0.001901326053
## 16    70-79   vaccinated     464336     92 0.000198132387
## 17    80-89 unvaccinated      12683     32 0.002523062367
## 18    80-89   vaccinated     208911    100 0.000478672736
## 19      90+ unvaccinated       3132     16 0.005108556833
## 20      90+   vaccinated      46602     18 0.000386249517
#helpful code for summarizing over all ages by vax status
risksummary <- covid %>%
  group_by(vaxstatus) %>%
  summarise(totpop = sum(population), totcovid = sum(severe)) %>%
  mutate(prob = 1) %>% #edit to calculate correct answer
  print()
## # A tibble: 2 x 4
##   vaxstatus     totpop totcovid  prob
##   <chr>          <dbl>    <dbl> <dbl>
## 1 unvaccinated 1302910      214     1
## 2 vaccinated   5634632      301     1
  3. The relative risk (RR) of severe COVID, comparing vaccinated to unvaccinated people, is defined as \(RR = P(\text{severe} \mid \text{vaccinated}) / P(\text{severe} \mid \text{unvaccinated})\), which is simply the ratio of conditional probabilities of severe COVID given vaccination status. The RR is used to calculate vaccine efficacy (VE), which is given by \(VE = 1 - RR\). Calculate the vaccine efficacy for each age group, and provide an informative visualization showing the efficacy. How do the values from the Israeli data compare to studies that show VE = 95% in those under 75 and 91% for adults aged 75+?
#starter code gets you to the RR
covidwide <- covid %>%
  mutate(probcovid = severe / population) %>%
  group_by(agegroup, vaxstatus) %>%
  pivot_wider(id_cols = agegroup,
              names_from = vaxstatus,
              values_from = probcovid) %>%
  mutate(RR = vaccinated / unvaccinated) 

print(covidwide[,c("agegroup","RR")])
## # A tibble: 10 x 2
## # Groups:   agegroup [10]
##    agegroup     RR
##    <chr>     <dbl>
##  1 12-15    0     
##  2 16-19    0     
##  3 20-29    0     
##  4 30-39    0.0334
##  5 40-49    0.0588
##  6 50-59    0.0731
##  7 60-69    0.114 
##  8 70-79    0.104 
##  9 80-89    0.190 
## 10 90+      0.0756
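As a quick check on the definitions of RR and VE in this exercise, here is a worked example with made-up risks (2 severe cases per 100,000 vaccinated people and 40 per 100,000 unvaccinated people; these figures are hypothetical and do not come from the Israeli data):

```latex
\[
RR = \frac{P(\text{severe} \mid \text{vaccinated})}{P(\text{severe} \mid \text{unvaccinated})}
   = \frac{2 / 100{,}000}{40 / 100{,}000} = 0.05,
\qquad
VE = 1 - RR = 0.95
\]
```

In other words, an RR of 0.05 corresponds to the 95% efficacy benchmark, and an RR of 0 (no severe cases among the vaccinated) gives VE = 1.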
  4. Suppose we’re back in August, and you are advising a communications group within the State of Israel Ministry of Health on crafting their press release based on these data. You can include one data visualization in the press release. What type of visualization do you advise, and why? Include your visualization here!

TikTok SciComm

  5. On July 31, Howie Hua posted a TikTok video in which he stated that \(P(\text{vacc} \mid \text{infected})\) “cannot be reliable.” In what sense is he correct, and in what sense is he incorrect?

  6. In the video, he mentions \(P(\text{vacc} \mid \text{infected})\) and \(P(\text{infected} \mid \text{vacc})\). Write the formula for Bayes’ theorem to show how to go between these two conditional probabilities, and provide the values corresponding to the formula based on his sample data.

🧶 ✅ ⬆️ Knit and commit remaining changes, use the commit message “Done with hw 1!” and push.

Submission

Once you are finished with the homework, submit to Gradescope the PDF document produced by your final knit, commit, and push. Be sure you clean up any “mess” in the output and file before submitting.

Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to commit and push changes.

Remember – you must turn in a .pdf file to the Gradescope page by the submission deadline to be considered “on time”.

To submit your assignment:

Grading (50 pts)


Component              Points
Ex 1                   8
Ex 2                   8
Ex 3                   8
Ex 4                   8
Ex 5                   6
Ex 6                   6
Workflow & formatting  6

Grading notes: