Homework #02: Baby Boom or Baby Bust?

due Friday, October 1 at 4:00pm

Clone the repo & start new RStudio project

Baby Boom or Baby Bust?

Philip Cohen of the University of Maryland reported that the U.S. experienced a decline of 3.8% in births in 2020 relative to 2019. Because the decline in December was 8%, Dr. Cohen suggests that we may be experiencing a “baby bust” related to COVID-19, given that dramatic social restrictions began in the U.S. in March 2020 (counting back 9 months from December).

The file monthly births.xlsx contains data on the number of births each month from January 2016 through June 2021 (not all months are available for all places) for a variety of countries (Denmark, Finland, Italy, Japan, the Netherlands, Spain, Sweden, and the U.S.) and a subset of U.S. states, along with one Canadian province. We will explore these data with assessment of a possible “baby bust” in mind. One challenge in making this assessment is that demographers have documented a general decline in the birth rate in some areas in recent years, so it is important to explore a “baby bust” in the context of a more broad-scale decline unrelated to the epidemic.

Variables in the data set, which is essentially a convenience sample of geographical units with fairly accessible information, include the following.

For this analysis, the primary outcome of interest is the standardized count of births by month and year (permday).

  1. First, we will compare births in 2020 and 2021 to average monthly births in 2016-2019. Create a new data set that contains the births data along with a new column of monthly averages for each place and month from 2016-2019. Then create a new variable, ratio, that is constructed as the ratio of births in any given month, year, and place, based on the variable permday to the relevant monthly average over 2016-2019 (for the month and place, using permday). Show your code in your response. What is this ratio for the entire United States for December 2020?

The starter code below reads the data from the Excel spreadsheet and formats the date variable.

library(tidyverse)
library(readxl)
library(lubridate)
births <- read_excel("monthly births.xlsx",sheet=1)

#you can start with this to work with dates
#  mutate(date=ymd(yyyymmdd)) 
#  mutate(monthnames = factor(month.abb[month], levels = month.abb)) 
  1. Now create two plots that show the ratios of monthly births for 2020 vs 2016-2019 and 2021 vs 2016-2019 (where data are available), separately. Use nice labels in both plots (e.g., no numbers representing the months). In the first plot, use faceting to show a separate frame for each of the 8 countries in the data set, and in each facet distinguish the years 2020 and 2021 from each other using different colors. The second plot should accomplish the same goal but should focus on the states and province in the data. Comment on the trends you see in the data.

  2. In order to explore general trends in births over time, now create a line graph separately for each country (but not states or the province) in the data set, showing the count of births by month, with color used to distinguish the lines for each year separately (2016, 2017, etc.). Comment on (a) seasonality in birth patterns for each country, (b) general trends across years in birth patterns and counts for each country, and (c) whether there appears to be evidence for a disruption of the birth pattern in each country due to COVID-19.

  3. Bonus: fun visuals! Feel free to submit an innovative and informative visual for up to 2 bonus points. Visual submission indicates your willingness for your code to be shared (with acknowledgement) in the solution sketch.

Submission

Once you are finished with the lab, you will submit the PDF document produced from your final knit, commit, and push to Gradescope.

Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to commit and push changes.

Remember – you must turn in a .pdf file to the Gradescope page by the submission deadline to be considered “on time”.

To submit your assignment:

Grading (50 pts)


Component Points
Ex 1 16
Ex 2 14
Ex 3 18
Workflow & formatting 2
Bonus +2

Grading notes: