A quick post showing how to extract data from a website and make a few plots. I chose the Mount Shasta Avalanche Center data because I check it every day throughout the season to see how the avalanche forecast changes and how the snowpack is developing.
rvest
There is a great web scraping package called rvest that is part of the tidyverse. Check out the documentation.
library(rvest)
library(tidyverse)
library(lubridate)
html <- read_html("https://www.shastaavalanche.org/page/seasonal-weather-history-mount-shasta")
# right-click the table on the page and inspect it to find its CSS class
html %>%
html_element(".msac-wx-history-table") %>%
html_table()
## # A tibble: 13 × 2
## `Weather History Summary from Oct 1, 2021 to Feb 9, 2022` `Weather History S…
## <chr> <dbl>
## 1 Temp Max (°F) 58
## 2 Temp Min (°F) 7.5
## 3 Temp Avg (°F) 32
## 4 Wind Max (mi/hr) 57.5
## 5 Wind Min (mi/hr) 0
## 6 Wind Avg (mi/hr) 12
## 7 Wind Gust Max (mi/hr) 92.0
## 8 Total Snowfall (in) 152.
## 9 Total Accumulated Precipitation (Water Equivalent) (in) 12.7
## 10 Max Snowfall in 24 Hrs (in) 20.6
## 11 Snow Depth Max (in) 81.5
## 12 Snow Depth Min (in) 0
## 13 Snow Depth Avg (in) 48
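To see what the class selector is doing without hitting the live site, here is a minimal sketch on a tiny hand-written page. The HTML snippet and its `Metric`/`Value` headers are made up for illustration; only the class name matches the real page.

```r
library(rvest)

# Hypothetical stand-in for the real page: one table carrying the same class
page <- minimal_html('
  <table class="msac-wx-history-table">
    <tr><th>Metric</th><th>Value</th></tr>
    <tr><td>Temp Max</td><td>58</td></tr>
    <tr><td>Temp Min</td><td>7.5</td></tr>
  </table>
')

summary_tbl <- page %>%
  html_element(".msac-wx-history-table") %>%  # the leading "." selects by CSS class
  html_table()                                # parse the <table> into a tibble

summary_tbl
```

`html_element()` returns the first match only, which is why the result here is a single tibble rather than a list.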
# right-click the table in the browser's inspector and copy its XPath
xpath <- "/html/body/div[2]/main/div/article/div/table[2]"
weather <- html_elements(html, xpath = xpath)
html_table(weather)
## [[1]]
## # A tibble: 100 × 21
## `Observed and Fore… `Observed and Fore… `Observed and For… `Observed and For…
## <chr> <chr> <chr> <chr>
## 1 "" Ob Temp (°F) Ob Temp (°F) Ob Temp (°F)
## 2 "Date" Min Max Avg
## 3 "2022 02/08" 37 51 42
## 4 "2022 02/07" 37.5 53 43.5
## 5 "2022 02/06" 37 47 42.5
## 6 "2022 02/05" 34.5 48.5 42
## 7 "2022 02/04" 31 46.5 37
## 8 "2022 02/03" 28 39.5 32.5
## 9 "2022 02/02" 29 43.5 37
## 10 "2022 02/01" 19.5 39 31
## # … with 90 more rows, and 17 more variables:
## # Observed and Forecast Weather by Day <chr>,
## # Observed and Forecast Weather by Day <chr>,
## # Observed and Forecast Weather by Day <chr>,
## # Observed and Forecast Weather by Day <chr>,
## # Observed and Forecast Weather by Day <chr>,
## # Observed and Forecast Weather by Day <chr>, …
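An absolute XPath copied from the browser can break if the page layout shifts. One possible alternative, sketched below on a made-up two-table page, is to grab every table and pick one by position. Whether the daily table is always the second table on the real page is an assumption I have not checked.

```r
library(rvest)

# Hypothetical page with two tables, standing in for the real one
page <- minimal_html('
  <table><tr><th>a</th></tr><tr><td>1</td></tr></table>
  <table><tr><th>b</th></tr><tr><td>2</td></tr></table>
')

tables <- html_elements(page, "table")  # every <table> on the page
length(tables)

second <- html_table(tables[[2]])       # pick the second table by position
second
```

Indexing with `[[2]]` hands `html_table()` a single node, so it returns one tibble instead of a list of tibbles.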
# make a data.frame with the table
weather2 <- as.data.frame(html_table(weather))
# rename columns
names(weather2) <- paste(weather2[1,], weather2[2,])
names(weather2)
## [1] " Date" "Ob Temp (°F) Min"
## [3] "Ob Temp (°F) Max" "Ob Temp (°F) Avg"
## [5] "Ob Wind (mi/hr) Min" "Ob Wind (mi/hr) Max"
## [7] "Ob Wind (mi/hr) Avg" "Ob Wind (mi/hr) Gust"
## [9] "Ob Wind (mi/hr) Dir" "Ob Snow (in) HS"
## [11] "Ob Snow (in) HN24" "Ob Snow (in) SWE"
## [13] "Ob Snow (in) Total Snowfall" "Fx Temp (°F) Min"
## [15] "Fx Temp (°F) Max" "Fx Wind (mi/hr) Min"
## [17] "Fx Wind (mi/hr) Max" "Fx Snow (in) Min"
## [19] "Fx Snow (in) Max" "Fx Snow (in) SWE"
## [21] "Fx Rating "
names(weather2)[1] <- "date"
# remove rows that are now column names
weather2 <- weather2[-c(1,2),]
# take a look
glimpse(weather2)
## Rows: 98
## Columns: 21
## $ date <chr> "2022 02/08", "2022 02/07", "2022 02/06"…
## $ `Ob Temp (°F) Min` <chr> "37", "37.5", "37", "34.5", "31", "28", …
## $ `Ob Temp (°F) Max` <chr> "51", "53", "47", "48.5", "46.5", "39.5"…
## $ `Ob Temp (°F) Avg` <chr> "42", "43.5", "42.5", "42", "37", "32.5"…
## $ `Ob Wind (mi/hr) Min` <chr> "2.5", "1.5", "0.5", "1", "0", "2", "5.5…
## $ `Ob Wind (mi/hr) Max` <chr> "23", "7", "24.5", "33.5", "15.5", "16",…
## $ `Ob Wind (mi/hr) Avg` <chr> "9", "3.5", "12.5", "16.5", "5.5", "7.5"…
## $ `Ob Wind (mi/hr) Gust` <chr> "36.8", "24.54", "36.8", "49.07", "30.66…
## $ `Ob Wind (mi/hr) Dir` <chr> "ESE", "SE", "E", "ENE", "WNW", "ESE", "…
## $ `Ob Snow (in) HS` <chr> "60.9", "60.75", "60.66", "61.3", "61.72…
## $ `Ob Snow (in) HN24` <chr> "0", "0", "0", "0", "0", "0", "0", "0", …
## $ `Ob Snow (in) SWE` <chr> "0", "0", "0", "0", "0", "0", "0", "0", …
## $ `Ob Snow (in) Total Snowfall` <chr> "151.7", "151.7", "151.7", "151.7", "151…
## $ `Fx Temp (°F) Min` <chr> "38", "38", "39", "35", "30", "30", "24"…
## $ `Fx Temp (°F) Max` <chr> "50", "50", "47", "45", "43", "34", "35"…
## $ `Fx Wind (mi/hr) Min` <chr> "15", "15", "10", "15", "10", "20", "50"…
## $ `Fx Wind (mi/hr) Max` <chr> "20", "35", "20", "25", "15", "25", "60"…
## $ `Fx Snow (in) Min` <chr> "0", "0", "0", "0", "0", "0", "0", "0", …
## $ `Fx Snow (in) Max` <chr> "0", "0", "0", "0", "0", "0", "0", "0", …
## $ `Fx Snow (in) SWE` <chr> "0", "0", "0", "0", "0", "0", "0", "0", …
## $ `Fx Rating ` <chr> "LOW", "LOW", "LOW", "LOW", "LOW", "LOW"…
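Notice that pasting the two header rows leaves a trailing space in `Fx Rating `, because its second header cell is empty. A small sketch of the same rename on a toy data frame, with `trimws()` added to strip those stray spaces. The toy columns are made up to mimic the scraped layout.

```r
# Toy data frame mimicking the scraped layout: two header rows, then data
raw <- data.frame(
  V1 = c("", "Date", "2022 02/08"),
  V2 = c("Ob Temp", "Min", "37"),
  V3 = c("Fx Rating", "", "LOW"),
  stringsAsFactors = FALSE
)

# paste() joins row 1 and row 2 element-wise; trimws() drops the edge spaces
combined <- trimws(paste(unlist(raw[1, ]), unlist(raw[2, ])))
names(raw) <- combined

# drop the two rows that are now the column names
raw <- raw[-c(1, 2), ]
names(raw)
```

With clean names, the columns can be referenced without having to remember where the extra spaces are.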
# Convert the numeric columns back to numeric. They were read in as character vectors because the first two rows of the table held header text.
weather2 <- weather2 %>%
mutate(across(c(2:8, 10:20), as.numeric))
# coerce the date column from character to Date
weather2 <- weather2 %>%
mutate(date = as_date(date))
# take a quick look
head(weather2)
## date Ob Temp (°F) Min Ob Temp (°F) Max Ob Temp (°F) Avg
## 3 2022-02-08 37.0 51.0 42.0
## 4 2022-02-07 37.5 53.0 43.5
## 5 2022-02-06 37.0 47.0 42.5
## 6 2022-02-05 34.5 48.5 42.0
## 7 2022-02-04 31.0 46.5 37.0
## 8 2022-02-03 28.0 39.5 32.5
## Ob Wind (mi/hr) Min Ob Wind (mi/hr) Max Ob Wind (mi/hr) Avg
## 3 2.5 23.0 9.0
## 4 1.5 7.0 3.5
## 5 0.5 24.5 12.5
## 6 1.0 33.5 16.5
## 7 0.0 15.5 5.5
## 8 2.0 16.0 7.5
## Ob Wind (mi/hr) Gust Ob Wind (mi/hr) Dir Ob Snow (in) HS Ob Snow (in) HN24
## 3 36.80 ESE 60.90 0
## 4 24.54 SE 60.75 0
## 5 36.80 E 60.66 0
## 6 49.07 ENE 61.30 0
## 7 30.66 WNW 61.72 0
## 8 30.66 ESE 61.71 0
## Ob Snow (in) SWE Ob Snow (in) Total Snowfall Fx Temp (°F) Min
## 3 0 151.7 38
## 4 0 151.7 38
## 5 0 151.7 39
## 6 0 151.7 35
## 7 0 151.7 30
## 8 0 151.7 30
## Fx Temp (°F) Max Fx Wind (mi/hr) Min Fx Wind (mi/hr) Max Fx Snow (in) Min
## 3 50 15 20 0
## 4 50 15 35 0
## 5 47 10 20 0
## 6 45 15 25 0
## 7 43 10 15 0
## 8 34 20 25 0
## Fx Snow (in) Max Fx Snow (in) SWE Fx Rating
## 3 0 0 LOW
## 4 0 0 LOW
## 5 0 0 LOW
## 6 0 0 LOW
## 7 0 0 LOW
## 8 0 0 LOW
glimpse(weather2)
## Rows: 98
## Columns: 21
## $ date <date> 2022-02-08, 2022-02-07, 2022-02-06, 202…
## $ `Ob Temp (°F) Min` <dbl> 37.0, 37.5, 37.0, 34.5, 31.0, 28.0, 29.0…
## $ `Ob Temp (°F) Max` <dbl> 51.0, 53.0, 47.0, 48.5, 46.5, 39.5, 43.5…
## $ `Ob Temp (°F) Avg` <dbl> 42.0, 43.5, 42.5, 42.0, 37.0, 32.5, 37.0…
## $ `Ob Wind (mi/hr) Min` <dbl> 2.5, 1.5, 0.5, 1.0, 0.0, 2.0, 5.5, 6.5, …
## $ `Ob Wind (mi/hr) Max` <dbl> 23.0, 7.0, 24.5, 33.5, 15.5, 16.0, 47.5,…
## $ `Ob Wind (mi/hr) Avg` <dbl> 9.0, 3.5, 12.5, 16.5, 5.5, 7.5, 25.0, 20…
## $ `Ob Wind (mi/hr) Gust` <dbl> 36.80, 24.54, 36.80, 49.07, 30.66, 30.66…
## $ `Ob Wind (mi/hr) Dir` <chr> "ESE", "SE", "E", "ENE", "WNW", "ESE", "…
## $ `Ob Snow (in) HS` <dbl> 60.90, 60.75, 60.66, 61.30, 61.72, 61.71…
## $ `Ob Snow (in) HN24` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Ob Snow (in) SWE` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Ob Snow (in) Total Snowfall` <dbl> 151.7, 151.7, 151.7, 151.7, 151.7, 151.7…
## $ `Fx Temp (°F) Min` <dbl> 38, 38, 39, 35, 30, 30, 24, 19, 19, 31, …
## $ `Fx Temp (°F) Max` <dbl> 50, 50, 47, 45, 43, 34, 35, 32, 32, 43, …
## $ `Fx Wind (mi/hr) Min` <dbl> 15, 15, 10, 15, 10, 20, 50, 25, 20, 0, 5…
## $ `Fx Wind (mi/hr) Max` <dbl> 20, 35, 20, 25, 15, 25, 60, 30, 45, 5, 1…
## $ `Fx Snow (in) Min` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Fx Snow (in) Max` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Fx Snow (in) SWE` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Fx Rating ` <chr> "LOW", "LOW", "LOW", "LOW", "LOW", "LOW"…
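Coercion can silently turn unparseable cells into `NA`, so it is worth counting them afterwards. A rough sketch in base R: the date strings use the same "YYYY MM/DD" layout as the table, while the blank and "n/a" values are hypothetical examples of cells that would fail to parse.

```r
# Dates in the same "YYYY MM/DD" layout as the scraped table
dates_chr <- c("2022 02/08", "2022 02/07")
dates <- as.Date(dates_chr, format = "%Y %m/%d")  # spell out the format explicitly
dates

# Hypothetical column with two cells that cannot be read as numbers
vals <- c("37", "37.5", "", "n/a")
nums <- suppressWarnings(as.numeric(vals))

# how many values were lost in the conversion?
sum(is.na(nums))
```

Running the same `sum(is.na(...))` check on each converted column of `weather2` would show whether anything was dropped.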
# A few quick plots to make sure everything looks reasonable
weather_plot <- ggplot(weather2, aes(x=date, y=`Fx Snow (in) Min`)) +
geom_point()
weather_plot
weather_plot2 <- ggplot(weather2, aes(x=date, y=`Fx Wind (mi/hr) Max`)) +
geom_point()
weather_plot2
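Another sanity check could be to plot the observed and forecast values together. This is a sketch only, using the first six days of max temperatures shown in the output above; the short column names `ob_max` and `fx_max` are made up for the example.

```r
library(ggplot2)
library(tidyr)

# First six days of observed and forecast max temperature from the table above
temps <- data.frame(
  date   = as.Date("2022-02-08") - 0:5,
  ob_max = c(51, 53, 47, 48.5, 46.5, 39.5),
  fx_max = c(50, 50, 47, 45, 43, 34)
)

# reshape to long form so each series gets its own colour
temps_long <- pivot_longer(temps, c(ob_max, fx_max),
                           names_to = "series", values_to = "temp_f")

temp_plot <- ggplot(temps_long, aes(x = date, y = temp_f, colour = series)) +
  geom_line() +
  geom_point() +
  labs(x = "Date", y = "Max temperature (F)", colour = NULL)
temp_plot
```

The long format is what lets a single `colour = series` mapping draw both lines.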
Up next: Making sure the data is cleaned up after the scrape and coercion.