Lab #03: Probability and Teamwork

due Wednesday, September 15, 4:00 PM

Goals

Getting started

Log in to GitHub to determine your team number and members for Lab 3.

Every team member should now go to the course GitHub organization and locate your Lab 3 repository, which should have the prefix lab03. Copy the URL of the repository and clone it in RStudio. If you have trouble, see the first lab for step-by-step instructions or ask a teammate for help.

Then, configure git in the console as we have done in previous labs, using your GitHub username and email address.

library(usethis)
use_git_config(user.name = "GitHub username", user.email = "your email")

Do not edit the .Rmd file until explicitly asked to do so in the instructions.

Preterm Birth

Just over 1 in 10 babies worldwide are born too soon, and the preterm birth rate has been increasing in most countries, according to the Global Action Report on Preterm Birth. In the poorest countries, roughly 12% of births are preterm, while this rate is only 9% in the highest-income countries. Within countries, poorer families are at higher risk. Because preterm birth is associated with increased risk of morbidity and mortality, it is important to offer better pre-conception and pregnancy healthcare to reduce the likelihood that an infant is born too early.

We consider 10,000 observations from a random sample of births in three countries. Variables include PTBstatus (which takes the value preterm for births before 37 weeks and term for births at \(\geq\) 37 weeks) and Country (Malawi, Nigeria, or South Africa).

Team workflow

Assign each team member a number 1 through 3 and write your number down on a piece of paper. This lab will walk you through the basics of team workflow step-by-step. If your team has just two members, then team member 1 should complete the first task (item 4) of member 3, and team member 2 should complete the other tasks of member 3.

Do the following exercises in order, following each step carefully.

Only one person at a time should type in the .Rmd file and push updates.

The person working should share their screen, and the others should follow along.

Team member 1: Open the lab3.Rmd file and change the author field in the YAML header to “Team Number: Member 1, Member 2, Member 3”, using your team number (for example, Team 3) and the first and last names of all team members.

Team member 1: Run the load-data code chunk to read in the data and print the first six rows. Share the results with your team members. Then, answer the questions below.

library(tidyverse)
load("preterm.Rdata")
head(preterm, 2) # you'll want to modify this line!
##   PTBstatus      Country
## 1   preterm South Africa
## 2      term      Nigeria
  1. Compute the probability that a baby in the study was born in Nigeria. Do the same for South Africa and for Malawi. Summarize your results in a table directly generated by R, using informative variable names.
preterm %>%
  group_by(Country) %>%
  summarize(renamethis = n()) %>% # count the births in each country
  mutate(moreinformativename = renamethis / sum(renamethis)) # convert counts to probabilities
## # A tibble: 3 x 3
##   Country      renamethis moreinformativename
##   <chr>             <int>               <dbl>
## 1 Malawi              820               0.082
## 2 Nigeria            7860               0.786
## 3 South Africa       1320               0.132
  2. Compute the conditional probability that a baby born in Nigeria was born preterm. Do the same for babies born in South Africa and in Malawi, respectively. Summarize your results in a table directly generated by R, using informative variable names. Include a sentence or two describing your findings.
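As a hedged illustration of the general pattern (not a solution), the sketch below computes conditional probabilities within groups. The `toy` data set and the names `n_births` and `prob_given_country` are made up for this example; only the column names `Country` and `PTBstatus` mirror the lab data.

```r
library(dplyr)

# Hypothetical toy data (NOT preterm.Rdata); it only reuses the lab's column names.
toy <- tibble(
  Country = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  PTBstatus = c(
    "preterm", "term", "term", "term",
    "preterm", "preterm", "preterm", "term", "term", "term"
  )
)

# Count each Country/PTBstatus combination, then divide by the total
# within each country to get P(PTBstatus | Country).
toy_cond <- toy %>%
  count(Country, PTBstatus, name = "n_births") %>%
  group_by(Country) %>%
  mutate(prob_given_country = n_births / sum(n_births)) %>%
  ungroup()

toy_cond
```

Because the division happens after `group_by(Country)`, the probabilities sum to 1 within each country rather than across the whole table.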

Team member 1: When you have finished, knit to PDF, then stage, commit, and push your .Rmd and PDF to GitHub with an appropriate commit message.

All other team members: Once your team member has pushed the work, pull to get the updated documents from GitHub. Click on the .Rmd file and you should see the responses to the first two exercises. Knit the file to update your own documents.

Team member 2: It’s your turn. Answer the question below.

  3. Create a segmented bar chart, with each bar going from 0-1, with the country names along the y-axis and horizontal bars illustrating the fraction of term and preterm births for each country. Use informative labels and titles.
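One possible way to build this kind of chart is sketched below on made-up data. The `toy` data frame, the object name `toy_plot`, and the label text are assumptions for illustration; the sketch also assumes a version of ggplot2 (3.3.0 or later) that accepts a bar chart with only the `y` aesthetic.

```r
library(ggplot2)

# Hypothetical toy data (NOT preterm.Rdata); it only reuses the lab's column names.
toy <- data.frame(
  Country = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  PTBstatus = c(
    "preterm", "term", "term", "term",
    "preterm", "preterm", "preterm", "term", "term", "term"
  )
)

# Mapping Country to y gives horizontal bars; position = "fill"
# rescales every bar so that its segments span 0 to 1.
toy_plot <- ggplot(toy, aes(y = Country, fill = PTBstatus)) +
  geom_bar(position = "fill") +
  labs(
    title = "Share of preterm and term births by country (toy data)",
    x = "Proportion of births",
    y = "Country",
    fill = "Birth status"
  )

toy_plot
```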

Team member 2: Knit to PDF, then stage, commit, and push your .Rmd and PDF to GitHub with an appropriate commit message.

All other team members: Once your team member has pushed the work, pull to get the updated documents from GitHub. Click on the .Rmd file and you should see the responses to the first three exercises. Knit the file.

Team member 3: It’s your turn. Complete the exercise below.

  4. Given that a baby in the study is preterm, what is the probability it was born in Malawi? Calculate this conditional probability along with the corresponding probabilities that a preterm baby was born in Nigeria and that a preterm baby was born in South Africa. Summarize your results in a table directly generated by R (a tibble is fine) as well as in an informative bar plot. Include a sentence describing your results in comparison to those from the first exercise.
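For reference, Bayes' rule connects this probability to the quantities from the first two exercises. Writing \(c\) for a country, the relationship for Malawi is

\[
P(\text{Malawi} \mid \text{preterm}) =
\frac{P(\text{preterm} \mid \text{Malawi}) \, P(\text{Malawi})}
{\sum_{c} P(\text{preterm} \mid c) \, P(c)},
\]

where the sum in the denominator runs over Malawi, Nigeria, and South Africa and equals the overall probability \(P(\text{preterm})\). Computing the proportion directly from the preterm rows of the data should give the same value.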

Team member 3: Knit to PDF, then stage, commit, and push your .Rmd and PDF to GitHub with an appropriate commit message.

All other team members: Once your team member has pushed the work, pull to get the updated documents from GitHub. Click on the .Rmd file and you should see the responses to the first four exercises. Knit the file.

Team member 1: It’s your turn. Complete the exercise below.

  5. Are the two random variables Country and PTBstatus independent, or not? Provide numeric calculations to support your answer.
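Two variables are independent when every joint probability equals the product of the corresponding marginal probabilities. The sketch below checks that condition on made-up counts; `toy_counts` and the other object names are assumptions for this example and are not taken from the lab data.

```r
# Hypothetical counts (NOT the lab data), laid out as a two-way table.
toy_counts <- matrix(
  c(10, 30,
    30, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(Country = c("A", "B"), PTBstatus = c("preterm", "term"))
)

joint <- prop.table(toy_counts)       # P(Country, PTBstatus)
p_country <- rowSums(joint)           # marginal P(Country)
p_status <- colSums(joint)            # marginal P(PTBstatus)
product <- outer(p_country, p_status) # P(Country) * P(PTBstatus)

# Under independence, every entry of joint would match product.
max_gap <- max(abs(joint - product))
max_gap
```

With sample data the two tables will rarely match exactly, so it helps to report how large the gaps are rather than only whether they are nonzero.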

Team member 1: Knit to PDF, then stage, commit, and push your .Rmd and PDF to GitHub with an appropriate commit message.

All other team members: Once your team member has pushed the work, pull to get the updated documents from GitHub. Click on the .Rmd file and you should see the responses to the first five exercises. Knit the file.

Team member 2: Almost done! Your job is to explore how the preterm birth rate in Malawi compares with the overall rate in poorer nations.

  6. Suppose you want to quantify how different the preterm birth rate in Malawi is from the preterm birth rate in poorer countries (12%). Let’s assume (before seeing the data!) that the preterm birth rate in Malawi is 12%, and that the number of preterm births out of a fixed number of births in Malawi is a binomial random variable. Under this distribution, how many preterm births would you expect to see in a sample of 820 births in Malawi? In addition, what is the probability that a sample of 820 births (the number we observed in Malawi) would contain 148 (the number we observed) or more preterm births, if the true preterm birth probability were 0.12? CODING TIP: you can use R to print output inline; see Section 3.1 here for details.
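The sketch below shows the two binomial calculations with deliberately different, made-up numbers (20 trials, success probability 0.1, threshold 5), so the object names and values are assumptions rather than the lab's answer. It uses base R's `pbinom()`, whose `lower.tail = FALSE` option returns P(X > q).

```r
# Hypothetical values for illustration only; substitute the lab's own numbers.
n_trials <- 20
p_success <- 0.1
threshold <- 5

# The mean of a binomial random variable is n * p.
expected_count <- n_trials * p_success

# P(X >= threshold) equals P(X > threshold - 1),
# which is why threshold - 1 is passed to pbinom().
p_at_least <- pbinom(threshold - 1, size = n_trials, prob = p_success, lower.tail = FALSE)

expected_count
p_at_least
```

In an R Markdown document, an inline expression such as `` `r round(p_at_least, 3)` `` prints the computed value within a sentence.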

Team member 2: Check to confirm all code chunks are named and all code follows the tidyverse style guidelines. Make changes as necessary.

Team member 2: When you have finished, knit to PDF, then stage, commit, and push your .Rmd and PDF to GitHub with an appropriate commit message.

All other team members: Once your team member has pushed the work, pull to get the updated documents from GitHub. Click on the .Rmd file to see your final version of the lab.

Team member 3: Upload your team’s PDF to Gradescope. Include every team member’s name in the Gradescope submission and identify which problems are on each page in Gradescope. Associate the “Overall” section with the first page of your PDF.

There should only be one submission per team on Gradescope.

Grading

Total: 50 pts