More fun with maps in R…

Note: this lab does not have an attached graded assignment. It is only intended to illustrate some more useful things that can be done in R with maps. More specifically, in this lab, we will learn how to plot data on maps based on geographical coordinates and how to generate heatmaps overlaid over geographical maps.

Let’s start by loading the packages that we will need (you might need to install some of these packages, depending on your current configuration):

library(usmap)
library(ggplot2)

Adding points based on geographical coordinates.

In this first exercise, we will add to a usmap a number of points defined by geographical coordinates (longitude and latitude). To illustrate this principle, I will plot the location of the 3 places that I’ve lived in (in the US) with dots of size proportional to the amount of time that I’ve spent in each city.

The three cities are:

  • Bloomington, IN (86.52 West, 39.15 North) - 7 years
  • Scottsdale, AZ (111.89 West, 33.46 North) - 1 year
  • Starkville, MS (88.73 West, 33.35 North) - 3 years

Let’s start by creating a data.frame containing all this information:

places <- data.frame(
  longitude = c(-86.52, -111.89, -88.73),
  latitude = c(39.15,  33.46, 33.35),
  years = c(7, 1, 3),
  labels = c("Bloomington, IN", "Phoenix, AZ", "Starkville, MS")
)

Note how longitudes West of the Greenwitch meridian are negative values.
Unfortunately, we can’t simply use the latitude/longitude as our coordinates on the plot. We need to use the usmap_transform function to convert these coordinates into values to match on the usmap:

# Note how the following call to the "usmap_transform" function returns an error message.
# Read the documentation for this function and find how to modify the code to make it work...
transformed_data <- usmap_transform(places)

Look at the fields inside transformed_data. What do you notice?

We can now plot the map and add the corresponding text:

myMap <- plot_usmap() + geom_point(
  data = transformed_data,
  aes(x = longitude.1, y = latitude.1, size = years),
  color = "red", alpha = 0.5
)

myMap + geom_text(data = transformed_data, aes(label = labels, x = longitude.1-100000, y = latitude.1+120000), hjust = 0)

Take a minute to generate a map of the places you’ve lived in (or places you like most in the US). Try to adjust the location of labels so that they are nicely positioned relatively to the dots on the map.

Let’s look at some more biologically-relevent data. The file saguaros.csv contains the list of saguaro cactus observations reported in the iNaturalist.org website.

A short note: iNaturalist.org is an app (with corresponding website) allowing users to report observations of fauna and flora. It is sometimes used by researchers.
If you are not familiar with saguaros, they are the most majestic type of cactus, frequently seen in the American South-West. Here is an example:

.

Download the file to your computer, load it in R and generate a map showing the location of every data point in this file. This is what the result should look like:

Notice how data point outside of the US (in Mexico) end up being drawn over Alaska. It’s time to zoom out and plot maps of the entire world…

Beyond the US, let’s take over the world!

While the usmap package is extremely convenient to represent data in the US, the previous case study illustrated how drawing maps of the world can also be useful. Here, we’ll have a look at some options available for this.

Install the package rworldmap. You can use it in a similar way to the usmap package to draw simple world maps with country bondaries:

library(rworldmap)
worldmap <- getMap(resolution = "coarse")

plot(worldmap, asp = 1, bg = "lightblue", col = "grey")

Let’s overlay the location of saguaros observations to this map. This time, you do not need to transform the data and can directly use the latitude/longitude values to plot the data:

head(saguaro)
##   longitude latitude    id nb
## 1 -112.0119 33.57258  3762  1
## 2 -111.9855 33.52526 12298  1
## 3 -112.9059 32.08776 13279  1
## 4 -111.1645 32.24429 26966  1
## 5 -113.6463 33.61488 37607  1
## 6 -111.0478 32.19916 51379  1
plot(worldmap, asp = 1, bg = "lightblue", col = "grey")
points(saguaro$longitude, saguaro$latitude, col="red", cex=0.01)

Looks like we’re a bit too zoomed out. Let’s find a package that can display a map of any region in the world, at any zoom level. Here, we will use ggmap package. This package allows you to download map data directly from the internet and generate map for any region of the world and at any level of zoom (= details) that you want:

library(ggplot2)
library(ggmap)

# Coordinates of the map boundaries
map_bounds = c(left = -129, bottom = 11, right = -60, top = 50)

# A word about zoom values: 3 is typically a good value for continent/world level maps, 10 start showing details inside cities.
# If you use a high zoom level (for example: 10) while generating a map covering a large area (the whole North American continent for example), it will require downloading a lot of data from the internet and require lots of resources for your computer to generate the map.
# On the other hand, generating a map covering a small area but with a small zoom level will generate a blurry-looking map.
myMap <- get_map(location=map_bounds, zoom = 4, source="stamen")
plot(myMap)

It used to be easy to fetch data from Google maps. However, Google now requires an API Key, making it more difficult to access their map data. Here ,we use data from Stamen maps.

Let’s add the location of saguaros observations to the map:

# We need to transform the "myMap" into an object that can be manipulated with ggplot:
myMap <- ggmap(myMap, extent="device", legend="none")

myMapWithDots <- myMap + geom_point(data=saguaro,  aes(x=longitude, y=latitude), fill="red", shape=23, alpha=0.8)
# This plot might take a bit of time to generate, there are a lot of data points to display...
plot(myMapWithDots)

With so many points on the map, it is difficult to get an accurate representation of where the observations of saguaros are really the most frequent. We will now zoom in on the region that contains the most observations and use a spatial heatmap to represent the “density” of observations. We will be using one more package (for the color gradients in the heatmap): RColorBrewer

library(RColorBrewer)

# This is the region that we are zooming in:
map_bounds <- c(left = -115, bottom = 33, right = -112.6, top = 35.1) # NW Arizona

# Let's select only the data that is contained inside the map_bounds coordinates:
data = saguaro[which(saguaro$longitude>map_bounds["left"] & saguaro$longitude<map_bounds["right"] & saguaro$latitude>map_bounds["bottom"] & saguaro$latitude<map_bounds["top"]) , ]

head(data)
##     longitude latitude      id nb
## 5   -113.6463 33.61488   37607  1
## 66  -114.2709 34.37088  672218  1
## 139 -114.0515 33.67353 1431256  1
## 140 -114.0510 33.67293 1431258  1
## 160 -113.0664 33.03680 1839524  1
## 161 -112.6555 33.13089 1857413  1
dim(data)
## [1] 897   4
coords.map <- get_map(location=map_bounds, zoom = 8, source="stamen")
coords.map <- ggmap(coords.map, extent="device", legend="none")
coords.map <- coords.map + stat_density2d(data=data,  aes(x=longitude, y=latitude, fill=..level.., alpha=..level..), geom="polygon")
coords.map <- coords.map +   scale_fill_gradientn(colours=rev(brewer.pal(7, "Spectral")))

plot(coords.map)

We can overlay the actual observations to this map:

coords.map <- coords.map + geom_point(data=data,  aes(x=longitude, y=latitude), fill="red", shape=23, alpha=0.8)
plot(coords.map)

Let’s look at two other regions. First, a zoom in the region around Tucson, Arizona:

map_bounds <- c(left = -111.3, bottom = 32.1, right = -110.8, top = 32.5) # Tucson area
# Let's select only the data that is contained inside the map_bounds coordinates:
data = saguaro[which(saguaro$longitude>map_bounds["left"] & saguaro$longitude<map_bounds["right"] & saguaro$latitude>map_bounds["bottom"] & saguaro$latitude<map_bounds["top"]) , ]

coords.map <- get_map(location=map_bounds, zoom = 10, source="stamen")
coords.map <- ggmap(coords.map, extent="device", legend="none")
coords.map <- coords.map + stat_density2d(data=data,  aes(x=longitude, y=latitude, fill=..level.., alpha=..level..), geom="polygon")
coords.map <- coords.map +   scale_fill_gradientn(colours=rev(brewer.pal(7, "Spectral")))

plot(coords.map)

And now a zoom on the city of Phoenix, Arizona:

map_bounds <- c(left = -112.2, bottom = 33.3, right = -111.8, top = 33.6) # Phoenix area
data = saguaro[which(saguaro$longitude>map_bounds["left"] & saguaro$longitude<map_bounds["right"] & saguaro$latitude>map_bounds["bottom"] & saguaro$latitude<map_bounds["top"]) , ]
coords.map <- get_map(location=map_bounds, zoom = 10, source="stamen")
coords.map <- ggmap(coords.map, extent="device", legend="none")
coords.map <- coords.map + stat_density2d(data=data,  aes(x=longitude, y=latitude, fill=..level.., alpha=..level..), geom="polygon")
coords.map <- coords.map +   scale_fill_gradientn(colours=rev(brewer.pal(7, "Spectral")))
plot(coords.map)

Being curious, go look on google maps (or any other similar tool you would like) what could explain these two hotspots of Saguaros observations? Are these areas with conditions exceptionally favorable for saguaros toe grow? Is there another explanation?

Exercise (not graded - no need to submit a report):

Download the file aligators-MS.csv which contains the location of all iNaturalist reported observations of alligators in the state of Mississippi. Write some R code to generate a map showing the location of these observations.

First: using us_map and simply displaying observations as dots on the map:

Still using us_map, but with a spatial heatmap overlay:

Still using us_map but this time with a spatial heatmap overlay:

Zooming on Starkville:

Animating the maps with gganimate

In this final part of today’s lab, we will use the package gganimate to animate graphics, including maps. But let’s start with something easier: animating a simple plot. For this, we will reuse the weather station dataset from the previous lab.

# Make sure that you have installed both of these packages:
library(gganimate)
library(transformr)

theme_set(theme_bw())

# Loading the weather station datat
tw <- read.csv("data/2020-12-weather.csv", comment.char = "#")
head(tw)
##        Date temperature_max temperature_avg temperature_low humidity_high
## 1 12/1/2020            11.8             0.7            -5.6          0.90
## 2 12/2/2020            13.4             2.1            -6.3          0.90
## 3 12/3/2020            11.8             7.3             0.8          0.91
## 4 12/4/2020            11.2             6.6             0.3          0.91
## 5 12/5/2020            17.1             5.9            -1.3          0.92
## 6 12/6/2020            14.5             6.8            -0.8          0.92
##   humidity_avg humidity_low wind_max wind_avg
## 1         0.64         0.17      9.0      0.4
## 2         0.67         0.19      9.0      0.3
## 3         0.66         0.42      9.0      0.5
## 4         0.84         0.60     11.2      0.6
## 5         0.75         0.30      6.5      0.2
## 6         0.74         0.43      4.0      0.0
tw$Date <- as.Date(tw$Date, format = "%m/%d/%Y") # Converting dates from strings into actual Date objects
tw$day <- as.numeric(format(tw$Date, format="%d")) # Extracting the day information

# Let's start with a regular (not animated) plot:
gp <- ggplot(data = tw, aes(x = Date, y = temperature_max)) +
  geom_line() +
  geom_point()
gp

# And let's reveal the data day by day:
gp <- ggplot(data = tw, aes(x = Date, y = temperature_max)) +
  geom_line() +
  geom_point() +
  transition_reveal(day)
gp

A word of caution: just because you can animate graphs does not mean that you should! Use this with moderation, and always ask yourself: what is the best way to represent this data? Is it really necessary to use an animation?

Now, let’s animate our maps… You will need to download another iNat dataset, this time with the observation date included in it: gators-MS-with-dates.csv

# Using a data set with the observation date (to perform an animation based on time)
tg <- read.csv("data/iNat/gators-MS-with-dates.csv")
tg$Date <- as.Date(tg$observed_on)
tg$year <- as.numeric(format(tg$Date, format="%Y")) # Extracting the year information

map_bounds <- c(left = -88.9, bottom = 33.21, right = -88.67, top = 33.5) # Starkville
tgr = tg[which(tg$longitude>map_bounds["left"] & tg$longitude<map_bounds["right"] & tg$latitude>map_bounds["bottom"] & tg$latitude<map_bounds["top"]) , ] #No need to keep the data outside of the plotting area.
tgr <- tgr[which(tgr$year > 2016) , ] # Keeping only the data more recent than 2016

coords.map <- get_map(location=map_bounds, zoom = 12, source="stamen") # For Starkville-level
## ℹ Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.
gp <- ggmap(coords.map, extent="device", legend="none") +
  stat_density2d(data=tgr,  aes(x=longitude, y=latitude, fill=..level.., alpha=..level..), geom="polygon") +
  scale_fill_gradientn(colours=rev(brewer.pal(7, "Spectral"))) +
  transition_time(year)

# Not showing the animated map here, due to size limitation on github...
#gp

A few really good resources.

You will find many excellent resources for free on the web. Most of the questions you might have when using R have already been answered on https://stackoverflow.com/

Here are a few link to some online, open source books that are an absolute gold mine of information about R:

END OF ROADMAP.