For this exercise, you will need to download the file: temp_stations.csv
Load the content of this file into a variable in your R environment with the following command:
t <- read.csv("data/temp_stations.csv") # You may need to adjust the path to the file, depending on how you have organized your folders.
This file contains daily temperature records for weather
stations located in Starkville, Columbus, Meridian and
Biloxi:
head(t)
## datetime starkville columbus meridian biloxi
## 1 2022-01-01 18.1 15.3903241 19.1763887 19.8906805
## 2 2022-01-02 -0.9 0.7630577 -0.3803543 0.8910247
## 3 2022-01-03 -1.9 -7.3876835 -0.8971593 1.0825752
## 4 2022-01-04 NA 0.3948917 -0.1458191 5.6266367
## 5 2022-01-05 1.1 -1.7254413 -1.8997366 8.0543028
## 6 2022-01-06 -1.6 -4.0392856 -0.9410461 -0.2371790
Before we start the actual exercise, let’s have a quick look at
the data. Your first task is to plot the temperatures for the Starkville
station as a function of the date. Your plot should look like this:
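One possible way to build such a plot is sketched below on a small made-up data frame. The name toy is hypothetical and the values are copied from the first rows of head(t) above; the axis labels and line type are assumptions, since the reference plot is not reproduced here.

```r
# Toy stand-in for the first six rows of temp_stations.csv (hypothetical object).
toy <- data.frame(
  datetime   = c("2022-01-01", "2022-01-02", "2022-01-03",
                 "2022-01-04", "2022-01-05", "2022-01-06"),
  starkville = c(18.1, -0.9, -1.9, NA, 1.1, -1.6)
)

# read.csv() imports the dates as text, so convert them to Date objects first.
toy$date <- as.Date(toy$datetime)

# Line plot of temperature against date; the missing value leaves a gap in the line.
plot(toy$date, toy$starkville, type = "l",
     xlab = "Date", ylab = "Temperature (Starkville)")
```

The same two steps (convert the date column, then call plot) should carry over to the full dataframe t.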
For the following exercise, you will not write a single for loop! The functions you will need for part 1 of this exercise are listed below (do not use any other functions!):
colnames (optional)
apply
min
which.min
names (optional)
table
months
by
t.test
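Start by adding a column mintemp containing, for each day, the lowest temperature recorded across the four stations (ignoring missing values). Your dataframe should now look like this: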
date | starkville | columbus | meridian | biloxi | mintemp |
---|---|---|---|---|---|
2022-01-01 | 18.1 | 15.3903241 | 19.1763887 | 19.8906805 | 15.3903241 |
2022-01-02 | -0.9 | 0.7630577 | -0.3803543 | 0.8910247 | -0.9000000 |
2022-01-03 | -1.9 | -7.3876835 | -0.8971593 | 1.0825752 | -7.3876835 |
2022-01-04 | NA | 0.3948917 | -0.1458191 | 5.6266367 | -0.1458191 |
2022-01-05 | 1.1 | -1.7254413 | -1.8997366 | 8.0543028 | -1.8997366 |
2022-01-06 | -1.6 | -4.0392856 | -0.9410461 | -0.2371790 | -4.0392856 |
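A minimal sketch of a row-wise minimum with apply, assuming a small hypothetical data frame toy built from the first four rows shown above (the helper vector stations is also an assumption, used only to select the station columns):

```r
# Hypothetical toy data: first four rows of the table above.
toy <- data.frame(
  date       = c("2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"),
  starkville = c(18.1, -0.9, -1.9, NA),
  columbus   = c(15.3903241, 0.7630577, -7.3876835, 0.3948917),
  meridian   = c(19.1763887, -0.3803543, -0.8971593, -0.1458191),
  biloxi     = c(19.8906805, 0.8910247, 1.0825752, 5.6266367)
)
stations <- c("starkville", "columbus", "meridian", "biloxi")

# MARGIN = 1 applies min() to each row; na.rm = TRUE is passed on to min()
# so that a missing reading does not turn the whole row into NA.
toy$mintemp <- apply(toy[, stations], 1, min, na.rm = TRUE)
toy$mintemp
```

Without na.rm = TRUE, the row for 2022-01-04 would return NA instead of -0.1458191.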
Next, add a column coldestStation indicating which
station was the coldest on each day. We'll start by recording the index (starkville
= 1, columbus = 2, meridian = 3, …) and we can match the index to the
name later.
date | starkville | columbus | meridian | biloxi | mintemp | coldestStation |
---|---|---|---|---|---|---|
2022-01-01 | 18.1 | 15.3903241 | 19.1763887 | 19.8906805 | 15.3903241 | 2 |
2022-01-02 | -0.9 | 0.7630577 | -0.3803543 | 0.8910247 | -0.9000000 | 1 |
2022-01-03 | -1.9 | -7.3876835 | -0.8971593 | 1.0825752 | -7.3876835 | 2 |
2022-01-04 | NA | 0.3948917 | -0.1458191 | 5.6266367 | -0.1458191 | 3 |
2022-01-05 | 1.1 | -1.7254413 | -1.8997366 | 8.0543028 | -1.8997366 | 3 |
2022-01-06 | -1.6 | -4.0392856 | -0.9410461 | -0.2371790 | -4.0392856 | 2 |
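The index of the coldest station can be obtained with the same row-wise pattern, swapping min for which.min. This is a sketch on a hypothetical toy data frame (toy and stations are illustrative names, not part of the exercise files):

```r
# Hypothetical toy data: first four rows of the table above.
toy <- data.frame(
  starkville = c(18.1, -0.9, -1.9, NA),
  columbus   = c(15.3903241, 0.7630577, -7.3876835, 0.3948917),
  meridian   = c(19.1763887, -0.3803543, -0.8971593, -0.1458191),
  biloxi     = c(19.8906805, 0.8910247, 1.0825752, 5.6266367)
)
stations <- c("starkville", "columbus", "meridian", "biloxi")

# which.min() returns the position of the smallest value in each row
# and skips NA values on its own, so no na.rm argument is needed.
toy$coldestStation <- apply(toy[, stations], 1, which.min)
toy$coldestStation
```

The positions refer to the order of the columns passed to apply, which is why the station columns are selected explicitly.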
You can now use the function table to find how
many times each index appears in the column
coldestStation. Store the result in a variable named
cold:
##
## 1 2 3 4
## 82 143 137 3
Use the function names to make your variable
cold look nicer:
## starkville columbus meridian biloxi
## 82 143 137 3
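The counting and renaming steps can be sketched as follows. The index vector below is made up for illustration (it is not the real column), and it deliberately contains every index from 1 to 4, because table only reports values that actually occur.

```r
# Made-up vector of station indices (hypothetical, for illustration only).
coldestStation <- c(2, 1, 2, 3, 3, 2, 4)

# Count how many times each index appears.
cold <- table(coldestStation)

# Replace the numeric labels with station names, in column order.
# This direct assignment assumes all four indices are present in the table.
names(cold) <- c("starkville", "columbus", "meridian", "biloxi")
cold
```

If one station were never the coldest, the table would have fewer entries than names and the assignment would need to be adapted.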
Add a column containing the month of each observation and then
use the function by to obtain the minimum temperature
recorded in Starkville for each month.
## t$month: April
## [1] 0.1
## ------------------------------------------------------------
## t$month: August
## [1] 20.1
## ------------------------------------------------------------
## t$month: December
## [1] -13.9
## ------------------------------------------------------------
## t$month: February
## [1] -6.1
## ------------------------------------------------------------
## t$month: January
## [1] -8.3
## ------------------------------------------------------------
## t$month: July
## [1] 21.1
## ------------------------------------------------------------
## t$month: June
## [1] 16.1
## ------------------------------------------------------------
## t$month: March
## [1] -4.9
## ------------------------------------------------------------
## t$month: May
## [1] 10.1
## ------------------------------------------------------------
## t$month: November
## [1] -4.9
## ------------------------------------------------------------
## t$month: October
## [1] -1.9
## ------------------------------------------------------------
## t$month: September
## [1] 8.1
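A sketch of the grouping step on made-up data. It assumes the month column is created with base R's months(), which returns month names as text (in the language of the current locale); that would be consistent with the alphabetical ordering in the output above. The data frame toy is hypothetical.

```r
# Hypothetical toy data covering two months.
toy <- data.frame(
  date       = as.Date(c("2022-01-01", "2022-01-02", "2022-02-01", "2022-02-02")),
  starkville = c(18.1, -0.9, NA, 3.2)
)

# months() extracts the month name from a Date object.
toy$month <- months(toy$date)

# by() splits the first argument by the groups in the second one and applies
# min() to each piece; na.rm = TRUE is forwarded to min().
res <- by(toy$starkville, toy$month, min, na.rm = TRUE)
res
```

Because the month names are plain text, the groups are listed alphabetically rather than in calendar order.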
According to these data, were the 2022 temperatures statistically
significantly different between Starkville and Columbus (at the 5%
significance level)?
How about between Starkville and Biloxi?
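The comparison can be sketched with t.test on two simulated series. The vectors a and b are random numbers generated for illustration only and say nothing about the real stations; the sketch also assumes the default two-sample (Welch) test, which the exercise does not specify.

```r
# Simulated temperature-like series (hypothetical, not the real data).
set.seed(1)
a <- rnorm(50, mean = 15, sd = 8)
b <- rnorm(50, mean = 16, sd = 8)

# Two-sample t-test; by default R runs Welch's version, which does not
# assume equal variances. Missing values are dropped automatically.
res <- t.test(a, b)

# The difference is called significant at the 5% level when p < 0.05.
res$p.value
res$p.value < 0.05
```

With the real data, the two arguments would be the corresponding station columns of t.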
We will use the functionalities of tibbles and
a few R packages to quickly generate a beautiful plot of the
temperatures as a function of the date for each station.
Convert the dataframe t into a tibble
tb:
library("tibble")
library("dplyr")
library("tidyr")
library("ggplot2")
# Let's restart from a fresh version of the t dataframe:
t <- read.csv("data/temp_stations.csv")
tb <- tibble(t)
tb
## # A tibble: 365 × 5
## datetime starkville columbus meridian biloxi
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2022-01-01 18.1 15.4 19.2 19.9
## 2 2022-01-02 -0.9 0.763 -0.380 0.891
## 3 2022-01-03 -1.9 -7.39 -0.897 1.08
## 4 2022-01-04 NA 0.395 -0.146 5.63
## 5 2022-01-05 1.1 -1.73 -1.90 8.05
## 6 2022-01-06 -1.6 -4.04 -0.941 -0.237
## 7 2022-01-07 -3.9 -4.84 -4.30 1.73
## 8 2022-01-08 -2.9 -2.87 -2.16 -0.184
## 9 2022-01-09 6.2 2.84 3.11 9.73
## 10 2022-01-10 -0.9 2.45 -1.92 3.30
## # … with 355 more rows
Compare what you see (when you type tb) with
what happens when you display a dataframe (by typing t
in the R console).
Before we can plot the data with ggplot, we need to reshape our
tibble so that there is only one column with the
value to be plotted (temperature) and one column indicating the name of
the station. Use the function pivot_longer (from the
package tidyr) to perform this miracle and make sure that your tibble
now looks like this:
## # A tibble: 1,460 × 3
## datetime station temperature
## <chr> <chr> <dbl>
## 1 2022-01-01 starkville 18.1
## 2 2022-01-01 columbus 15.4
## 3 2022-01-01 meridian 19.2
## 4 2022-01-01 biloxi 19.9
## 5 2022-01-02 starkville -0.9
## 6 2022-01-02 columbus 0.763
## 7 2022-01-02 meridian -0.380
## 8 2022-01-02 biloxi 0.891
## 9 2022-01-03 starkville -1.9
## 10 2022-01-03 columbus -7.39
## # … with 1,450 more rows
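A minimal sketch of the reshaping step with tidyr's pivot_longer, using a hypothetical two-row tibble tb_toy with rounded values from the output above:

```r
library("tibble")
library("tidyr")

# Hypothetical two-row tibble in the original "wide" layout.
tb_toy <- tibble(
  datetime   = c("2022-01-01", "2022-01-02"),
  starkville = c(18.1, -0.9),
  columbus   = c(15.4, 0.763),
  meridian   = c(19.2, -0.380),
  biloxi     = c(19.9, 0.891)
)

# Stack every column except datetime: the old column names go into
# "station" and the cell values go into "temperature".
tb_long <- pivot_longer(
  tb_toy,
  cols      = -datetime,
  names_to  = "station",
  values_to = "temperature"
)
tb_long
```

Each input row becomes four output rows, one per station, which is why the full tibble grows from 365 to 1,460 rows.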
Use the function mutate to generate a column
date which contains a date object.
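The conversion can be sketched as follows, assuming the text dates are in the default year-month-day format so that as.Date needs no format argument (tb_long here is a small hypothetical tibble):

```r
library("tibble")
library("dplyr")

# Hypothetical long-format tibble with dates stored as text.
tb_long <- tibble(
  datetime    = c("2022-01-01", "2022-01-02"),
  station     = "starkville",
  temperature = c(18.1, -0.9)
)

# mutate() adds a new column computed from existing ones.
tb_long <- mutate(tb_long, date = as.Date(datetime))
tb_long
```

The new column is printed with the type tag <date> instead of <chr>.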
You now have everything ready to generate a plot like this:
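One way such a plot could be assembled with ggplot2 is sketched below on a small made-up tibble. The choice of geom_line, the mapping of station to colour, and the axis labels are assumptions, since the reference figure is not reproduced here.

```r
library("tibble")
library("ggplot2")

# Hypothetical long-format data: two stations, three days.
tb_long <- tibble(
  date        = rep(as.Date(c("2022-01-01", "2022-01-02", "2022-01-03")), each = 2),
  station     = rep(c("starkville", "biloxi"), times = 3),
  temperature = c(18.1, 19.9, -0.9, 0.891, -1.9, 1.08)
)

# One line per station, distinguished by colour.
p <- ggplot(tb_long, aes(x = date, y = temperature, colour = station)) +
  geom_line() +
  labs(x = "Date", y = "Temperature", colour = "Station")
p
```

The long format is what makes this short: a single colour mapping replaces one plotting call per station.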