Go to the STA198F2021 organization on GitHub. Click on the repo with the prefix hw-01. It contains the starter documents you need to complete the lab.
Click on the green CODE button and select Use SSH (this might already be selected by default, and if it is, you’ll see the text Clone with SSH). Click on the clipboard icon to copy the repo URL.
Go to https://vm-manage.oit.duke.edu/containers and log in with your Duke NetID and password.
Click STA198-199 to log into the Docker container. You should now see the RStudio environment.
Go to File \(\rightarrow\) New Project \(\rightarrow\) Version Control \(\rightarrow\) Git.
Copy and paste the URL of your assignment repo into the dialog box Repository URL. Again, please make sure to have SSH highlighted under Clone when you copy the address.
Click Create Project, and the files from your GitHub repo will be displayed in the Files pane in RStudio.
Open the template R Markdown file. This is where you will write up your code and narrative for the lab.
In August, the Israeli government reported that out of 515 people aged 12 and up hospitalized with severe COVID cases, 301 (58.4%) were fully vaccinated (2 doses of the Pfizer vaccine). Subsequent chatter broke out on Twitter suggesting that the Pfizer vaccine efficacy in protecting against severe disease (based on prior studies, efficacy is around 95% in people under 75 and 91% in those 75 and up) is waning over time. In this homework, you will use the Israeli data on vaccinated and unvaccinated individuals (the partially vaccinated group is excluded from your data) to evaluate to what extent the Israeli government’s data (as of August 15, 2021) indicate that vaccine efficacy is waning.
Because vaccination rates in Israel increase with age, and risk of hospitalization also increases with age, age is a potential confounder of the true relationship between vaccination and hospitalization with severe COVID. Thus a principled analysis must be careful to account for age.
First, let’s read in the data from the file provided. The variables include `agegroup` (12-15, 16-19, 20-29, …, 80-89, 90+), `vaxstatus`, `population` (number of people in Israel in each age and vaccination group), and `severe` (number of severe COVID cases requiring hospitalization in each age and vaccination group).
library(readr)
library(tidyverse)
library(knitr)
covid <- readr::read_csv("israeli_data_long.csv")
# starter code to print out probabilities
covidrate <- covid %>%
  # the first three lines of code are just
  # generating the vaccination rate for each age
  # group in the data
  group_by(agegroup, vaxstatus) %>%
  summarise(n = sum(population)) %>%
  mutate(rate = n / sum(n)) %>%
  print()
## # A tibble: 20 x 4
## # Groups: agegroup [10]
## agegroup vaxstatus n rate
## <chr> <chr> <dbl> <dbl>
## 1 12-15 unvaccinated 383649 0.675
## 2 12-15 vaccinated 184549 0.325
## 3 16-19 unvaccinated 127745 0.229
## 4 16-19 vaccinated 429109 0.771
## 5 20-29 unvaccinated 265871 0.211
## 6 20-29 vaccinated 991408 0.789
## 7 30-39 unvaccinated 194213 0.167
## 8 30-39 vaccinated 968837 0.833
## 9 40-49 unvaccinated 145355 0.136
## 10 40-49 vaccinated 927214 0.864
## 11 50-59 unvaccinated 84545 0.102
## 12 50-59 vaccinated 747949 0.898
## 13 60-69 unvaccinated 65205 0.0892
## 14 60-69 vaccinated 665717 0.911
## 15 70-79 unvaccinated 20512 0.0423
## 16 70-79 vaccinated 464336 0.958
## 17 80-89 unvaccinated 12683 0.0572
## 18 80-89 vaccinated 208911 0.943
## 19 90+ unvaccinated 3132 0.0630
## 20 90+ vaccinated 46602 0.937
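The `rate` column comes out as a within-age-group proportion because `summarise()` drops only the last grouping variable (`vaxstatus`), so the result is still grouped by `agegroup` when `mutate()` evaluates `sum(n)`. The sketch below makes that explicit on a hand-typed subset of the printed output above (two age groups only). The names `toy` and `toy_rate` are illustrative assumptions, not part of the starter code.

```r
library(dplyr)

# Hand-typed subset of the printed output above (two age groups only);
# `toy` is a hypothetical name, not part of the starter code.
toy <- tibble(
  agegroup   = c("12-15", "12-15", "16-19", "16-19"),
  vaxstatus  = c("unvaccinated", "vaccinated", "unvaccinated", "vaccinated"),
  population = c(383649, 184549, 127745, 429109)
)

toy_rate <- toy %>%
  group_by(agegroup, vaxstatus) %>%
  # "drop_last" is the default; it is written out here so the
  # remaining grouping (by agegroup only) is visible
  summarise(n = sum(population), .groups = "drop_last") %>%
  # sum(n) is taken within each age group, so the two rates
  # in an age group add up to 1
  mutate(rate = n / sum(n)) %>%
  ungroup()

print(toy_rate)
```

Calling `ungroup()` at the end is optional; it simply prevents the leftover `agegroup` grouping from affecting later steps.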
# starter code for plot
covid %>%
  # the first three lines of code are just
  # generating the vaccination rate for each age
  # group in the data
  group_by(agegroup, vaxstatus) %>%
  summarise(n = sum(population)) %>%
  mutate(rate = n / sum(n)) %>%
  # here I assume we're just going to plot % vax
  # and not plot % unvax
  filter(vaxstatus == "vaccinated")
## # A tibble: 10 x 4
## # Groups: agegroup [10]
## agegroup vaxstatus n rate
## <chr> <chr> <dbl> <dbl>
## 1 12-15 vaccinated 184549 0.325
## 2 16-19 vaccinated 429109 0.771
## 3 20-29 vaccinated 991408 0.789
## 4 30-39 vaccinated 968837 0.833
## 5 40-49 vaccinated 927214 0.864
## 6 50-59 vaccinated 747949 0.898
## 7 60-69 vaccinated 665717 0.911
## 8 70-79 vaccinated 464336 0.958
## 9 80-89 vaccinated 208911 0.943
## 10 90+ vaccinated 46602 0.937
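One possible way to turn a table like this into a plot is a simple bar chart of the vaccinated proportion by age group. The sketch below is only an illustration: it re-types the rates from the printed output above instead of reading the data file, and the names `vaxrate` and `p`, the choice of `geom_col()`, and the axis labels are all assumptions.

```r
library(ggplot2)

# Vaccinated share by age group, hand-copied from the printed output above;
# `vaxrate` is a hypothetical name, not part of the starter code.
vaxrate <- data.frame(
  agegroup = c("12-15", "16-19", "20-29", "30-39", "40-49",
               "50-59", "60-69", "70-79", "80-89", "90+"),
  rate = c(0.325, 0.771, 0.789, 0.833, 0.864,
           0.898, 0.911, 0.958, 0.943, 0.937)
)

# One bar per age group; the age labels happen to sort correctly as text
p <- ggplot(vaxrate, aes(x = agegroup, y = rate)) +
  geom_col() +
  labs(x = "Age group", y = "Proportion vaccinated",
       title = "Vaccination rate by age group")

p
```

In the assignment itself, the same `ggplot()` call could be piped directly from the filtered summary instead of a hand-typed data frame.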
# starter code
# suppress scientific notation
options(scipen=999)
# to see the raw probabilities
covid %>%
  mutate(probcovid = severe / population) %>%
  as.data.frame() %>%
  print()
## agegroup vaxstatus population severe probcovid
## 1 12-15 unvaccinated 383649 1 0.000002606549
## 2 12-15 vaccinated 184549 0 0.000000000000
## 3 16-19 unvaccinated 127745 2 0.000015656190
## 4 16-19 vaccinated 429109 0 0.000000000000
## 5 20-29 unvaccinated 265871 4 0.000015044890
## 6 20-29 vaccinated 991408 0 0.000000000000
## 7 30-39 unvaccinated 194213 12 0.000061787831
## 8 30-39 vaccinated 968837 2 0.000002064331
## 9 40-49 unvaccinated 145355 24 0.000165112999
## 10 40-49 vaccinated 927214 9 0.000009706497
## 11 50-59 unvaccinated 84545 34 0.000402152700
## 12 50-59 vaccinated 747949 22 0.000029413770
## 13 60-69 unvaccinated 65205 50 0.000766812361
## 14 60-69 vaccinated 665717 58 0.000087124108
## 15 70-79 unvaccinated 20512 39 0.001901326053
## 16 70-79 vaccinated 464336 92 0.000198132387
## 17 80-89 unvaccinated 12683 32 0.002523062367
## 18 80-89 vaccinated 208911 100 0.000478672736
## 19 90+ unvaccinated 3132 16 0.005108556833
## 20 90+ vaccinated 46602 18 0.000386249517
# helpful code for summarizing over all ages by vax status
risksummary <- covid %>%
  group_by(vaxstatus) %>%
  summarise(totpop = sum(population), totcovid = sum(severe)) %>%
  mutate(prob = 1) %>% # edit to calculate correct answer
  print()
## # A tibble: 2 x 4
## vaxstatus totpop totcovid prob
## <chr> <dbl> <dbl> <dbl>
## 1 unvaccinated 1302910 214 1
## 2 vaccinated 5634632 301 1
# starter code gets you to the RR
covidwide <- covid %>%
  mutate(probcovid = severe / population) %>%
  group_by(agegroup, vaxstatus) %>%
  pivot_wider(id_cols = agegroup,
              names_from = vaxstatus,
              values_from = probcovid) %>%
  mutate(RR = vaccinated / unvaccinated)
print(covidwide[,c("agegroup","RR")])
## # A tibble: 10 x 2
## # Groups: agegroup [10]
## agegroup RR
## <chr> <dbl>
## 1 12-15 0
## 2 16-19 0
## 3 20-29 0
## 4 30-39 0.0334
## 5 40-49 0.0588
## 6 50-59 0.0731
## 7 60-69 0.114
## 8 70-79 0.104
## 9 80-89 0.190
## 10 90+ 0.0756
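Vaccine efficacy is conventionally defined as the proportional reduction in risk among the vaccinated, that is, 1 minus the relative risk. As a rough sketch of how the RR values above could be converted, the code below re-types them by hand and applies that definition. The names `rr` and `efficacy` are illustrative assumptions, and the rounded RR values limit the precision of the result.

```r
# Age-specific relative risks, hand-copied from the printed output above;
# `rr` and `efficacy` are hypothetical names, not part of the starter code.
rr <- c("12-15" = 0, "16-19" = 0, "20-29" = 0, "30-39" = 0.0334,
        "40-49" = 0.0588, "50-59" = 0.0731, "60-69" = 0.114,
        "70-79" = 0.104, "80-89" = 0.190, "90+" = 0.0756)

# Efficacy against severe disease: proportional reduction in risk
efficacy <- 1 - rr

# Express as percentages, rounded to one decimal place
print(round(100 * efficacy, 1))
```

With the `covidwide` table from the starter code, the same calculation could be added as one more `mutate()` column.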
On July 31, Howie Hua posted a TikTok video in which he stated that \(P(vacc|infected)\) “cannot be reliable.” In what sense is he correct, and in what sense is he incorrect?
In the video, he mentions \(P(vacc|infected)\) and \(P(infected|vacc)\). Write the formula for Bayes’ theorem to show how to go between these two conditional probabilities, and provide the values corresponding to the formula based on his sample data.
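As a reminder of the general form, Bayes’ theorem for two complementary groups (vaccinated and unvaccinated) can be written as below. The event labels are chosen here to match the notation in the question; the numerical values from the video are not filled in.

```latex
\[
P(\text{vacc} \mid \text{infected})
  = \frac{P(\text{infected} \mid \text{vacc})\, P(\text{vacc})}
         {P(\text{infected} \mid \text{vacc})\, P(\text{vacc})
          + P(\text{infected} \mid \text{unvacc})\, P(\text{unvacc})}
\]
```

The denominator is the law of total probability for \(P(\text{infected})\), which is why the overall vaccination rate \(P(\text{vacc})\) matters when moving between the two conditional probabilities.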
🧶 ✅ ⬆️ Knit and commit remaining changes, use the commit message “Done with hw 1!” and push.
Once you are finished with the lab, make your final knit, commit, and push, then submit the PDF document produced by that final knit to Gradescope. Be sure you clean up any “mess” in the output and file before submitting.
Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to commit and push changes.
Remember – you must turn in a .pdf file to the Gradescope page by the submission deadline to be considered “on time”.
To submit your assignment:
Go to http://www.gradescope.com and click Log in in the top right corner.
Click School Credentials \(\rightarrow\) Duke NetID and log in using your NetID credentials.
Click on your STA 198 course.
Click on the assignment, and you’ll be prompted to submit it.
Mark the pages associated with each exercise, 1 - 4. All of the pages of your lab should be associated with at least one question (i.e., should be “checked”).
Select the first page of your .pdf submission to be associated with the “Workflow” section.
| Component | Points |
|---|---|
| Ex 1 | 8 |
| Ex 2 | 8 |
| Ex 3 | 8 |
| Ex 4 | 8 |
| Ex 5 | 6 |
| Ex 6 | 6 |
| Workflow & formatting | 6 |
Grading notes: