There are many ways to draw maps and show data on maps in R. In this computer lab, we will cover some of the packages available, but keep in mind that many more tools are available and new packages are released regularly.

Case study: COVID-19 cases in the US.

For this example, we are using real data compiled by the New York Times and available on their github repository at: https://github.com/nytimes/covid-19-data

Download from the github repository given above the following files into a data/NYT subfolder: us.csv, us-states.csv, us-counties.csv. You can open these files (I would recommend opening us.csv as it is the smallest file) using a text editor (such as Notepad in windows) and then with Excel (or LibreOffice).

We will now load this file in R:

setwd("C:/lab/MSSTATE/Class/CMB8013/R")

t = read.csv("data/NYT/us.csv")
head(t)
##         date cases deaths
## 1 2020-01-21     1      0
## 2 2020-01-22     1      0
## 3 2020-01-23     1      0
## 4 2020-01-24     2      0
## 5 2020-01-25     3      0
## 6 2020-01-26     5      0
t$time = as.Date(t$date)
plot(t$deaths~t$time)

Now, let’s have a look at the file containing the information per state since the beginning of the pandemic (January 2020):

ts = read.csv("data/NYT/us-states.csv")
ts$Date = as.Date(ts$date)
head(ts)
##         date      state fips cases deaths       Date
## 1 2020-01-21 Washington   53     1      0 2020-01-21
## 2 2020-01-22 Washington   53     1      0 2020-01-22
## 3 2020-01-23 Washington   53     1      0 2020-01-23
## 4 2020-01-24   Illinois   17     1      0 2020-01-24
## 5 2020-01-24 Washington   53     1      0 2020-01-24
## 6 2020-01-25 California    6     1      0 2020-01-25

We also need information about states population to compute rates per capita. This information is in the file: data/NYT/states-population.csv

tps = read.csv("data/NYT/states-population.csv")

ts = ts[which(is.element(ts$state, tps$state)==T) , ] # Keeping only states for which we have data on the population (--> removing territories)

Look up the documentation for the function merge and use it to merge the COVID (ts) and population (tps) data frames.
Then, you can compute the COVID death rate with the following formula:

tms$deathRate = tms$deaths / tms$population

Drawing maps

We will now represent this data on maps. To do so, you will need to install the following packages: usmap, ggplot2, animation, magick

Let’s start with a simple map of the US, and then a simple map of the counties in Mississippi and Alabama:

library(usmap)
library(ggplot2)
library(animation)
library(magick)
## Linking to ImageMagick 6.9.11.34
## Enabled features: cairo, freetype, fftw, ghostscript, lcms, pango, rsvg, webp
## Disabled features: fontconfig, x11
plot_usmap(regions = "states") + 
  labs(title = "US States",
       subtitle = "This is a blank map of the states of the United States.") + 
  theme(panel.background = element_rect(color = "black", fill = "lightblue"))

plot_usmap(regions="counties", include = c("MS", "AL") ) + 
  labs(title = "Mississippi & Alabama")

The next step consists in plotting the death rate per state as of 2020-12-25. This is accomplished in two steps:
Step 1: select the data for 2020-12-25
Step 2: Perform the map plot, using this subset of the data.

tmp = ts[which(ts$date == "2020-12-25") , ]
plot_usmap(regions = "states", data = tmp, value="deaths") + 
  labs(title = "Total COVID-19 deaths per state") + 
  theme(panel.background = element_rect(color = "black", fill = "lightblue"))

As you can see, the system automatically chose a blue gradient to represent the data on the map. Let’s see how we can customize this.

plotDeathRateAtDate <- function(t, myDate, minRate, maxRate, plotMap=F, myRegions=c("state", "counties")[1], includeStates = c()){
  tmp = t[which(t$date==myDate) , ]
  myTitle = paste("Death rate as of ", tmp$date[1], sep="")
  
  myInclude = unique(tmp$state)
  if( length(includeStates)>0 ){
    myInclude = includeStates
  }
  
  myMap <- plot_usmap(regions=myRegions, include=myInclude, data=tmp, value="deathRate") + labs(title = myTitle) + 
    scale_fill_gradient2( low = "white", mid="cornflowerblue", high = "red", 
                          #midpoint = median(tmp$deathRate), na.value="white", 
                          midpoint = mean(c(minRate, maxRate)), na.value="white",
                          name = "Death rate", 
                          label = scales::comma, 
                          limits = c(minRate, maxRate) ) + 
    theme(legend.position = c(0.9,0.2))
  if( plotMap==T ){ plot(myMap) }
  
  myMap
}

minRate = 0
maxRate = max(tms$deathRate, na.rm=T)

vStates = unique(tms$state)
myMap = plotDeathRateAtDate(tms, max(tms$date), minRate, maxRate, F, myRegions="states", includeStates = vStates)
plot(myMap)

We now have all the tools in hand to use the function saveHTML from the animation package and generate an animated map show the progression of COVID-19 across the US. The result should look like this:

And we can do the same specifically for Mississippi (with details at the county level):

For the curious students who want to see the code used for the county-level data,here are the details. First, you will need to download the file with population estimates per county here: data/NYT/co-est2019-alldata.csv

tcs = read.csv("./data/NYT/us-counties.csv")
tcp = read.csv("data/NYT/co-est2019-alldata.csv")

tcs$date = as.Date(tcs$date) # Making sure the dates are in a format that R can understand.

tcp$fips = tcp$STATE*1000 + tcp$COUNTY # The FIPS is the unique identifier per county. The FIPS number for all counties of a given state start by the same number (which is the FIPS number for that state)

tcm = merge(tcs, tcp, by="fips") # Merging the COVID data with the population data
tcm$deathRate = tcm$deaths/tcm$POPESTIMATE2019 # Computing the death rate

# The "tcm" data frame is now ready to be used for any downstream analysis/plot.

Exercise (graded):

Write a function that computes the increase/decrease in COVID-19 death rate between two dates. Make it as generic as possible, so that it could work on states or counties, or any data.frame that is given as an argument. After you have this function, generate a plot of the COVID-19 difference in death rate from October 1st and November 1st 2020.

Your function should take the following arguments:

Here is what a call to this function should look like, and the corresponding result:

tdelta = getDeltaRate(tms, as.Date("2020-10-01"), as.Date("2020-11-01"), "Date", "deathRate", "state")
head(tdelta)
##        state    startRate      endRate    deltaRate
## 1    Alabama 5.196622e-04 0.0006063406 8.667835e-05
## 2     Alaska 7.244941e-05 0.0001066237 3.417425e-05
## 3    Arizona 7.795330e-04 0.0008217107 4.217776e-05
## 4   Arkansas 4.586084e-04 0.0006488116 1.902032e-04
## 5 California 4.046596e-04 0.0004472287 4.256911e-05
## 6   Colorado 3.584120e-04 0.0004014770 4.306501e-05

And the corresponding map:

Here is a recipie about how to write the function:

As usual, there are many other ways to write a function doing the same job. You could use a for loop and compute the delta rate state by state (or county by county) for example.

Here is the call to the function for data at the county level: (note that correctly plotting the data at the county level requires going through a few more hoops than the state-level data)

tdelta = getDeltaRate(tcm, as.Date("2020-10-01"), as.Date("2020-11-01"), "date", "deathRate", "fips")
head(tdelta)
##   fips    startRate      endRate    deltaRate
## 1 1001 0.0005011724 0.0005548694 5.369704e-05
## 2 1003 0.0002374190 0.0003180519 8.063288e-05
## 3 1005 0.0002835615 0.0003645791 8.101758e-05
## 4 1007 0.0004465482 0.0006698223 2.232741e-04
## 5 1009 0.0002593989 0.0004323315 1.729326e-04
## 6 1011 0.0015840016 0.0016830017 9.900010e-05

And the corresponding result ploted for the states of Mississippi and Louisiana:

Going the extra mile (optional / not graded)

Use the function saveHTML from the animate package to generate an animated map showing the evolution of COVID-19 rate tendency throughout the year 2020.

Looking for another optional exercise? Write a function that computes the death rate per state from the county-level data.

END OF THIS ROADMAP.