A quick post showing how to extract data from a website and make a few plots. I chose the Mount Shasta Avalanche Center data because I check it every day throughout the season to see how the avalanche forecast changes and how the snowpack is developing.

rvest

There is a great web scraping package in the tidyverse called rvest. Check out the documentation.

library(rvest)
library(tidyverse)
library(lubridate)
html <- read_html("https://www.shastaavalanche.org/page/seasonal-weather-history-mount-shasta")

# right-click the page and choose Inspect to find the CSS class of the table
html %>%
    html_element(".msac-wx-history-table") %>%
    html_table()
## # A tibble: 13 × 2
##    `Weather History Summary from Oct 1, 2021 to Feb 9, 2022` `Weather History S…
##    <chr>                                                                   <dbl>
##  1 Temp Max (°F)                                                            58  
##  2 Temp Min (°F)                                                             7.5
##  3 Temp Avg (°F)                                                            32  
##  4 Wind Max (mi/hr)                                                         57.5
##  5 Wind Min (mi/hr)                                                          0  
##  6 Wind Avg (mi/hr)                                                         12  
##  7 Wind Gust Max (mi/hr)                                                    92.0
##  8 Total Snowfall (in)                                                     152. 
##  9 Total Accumulated Precipitation (Water Equivalent) (in)                  12.7
## 10 Max Snowfall in 24 Hrs (in)                                              20.6
## 11 Snow Depth Max (in)                                                      81.5
## 12 Snow Depth Min (in)                                                       0  
## 13 Snow Depth Avg (in)                                                      48
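
To see what the class selector is doing without hitting the live site, here is a minimal sketch on a made-up page built with rvest's minimal_html(). The class name is the one used above; the second table and all of the values are assumptions for illustration only.

```r
library(rvest)

# A tiny stand-in page with two tables; only one carries the class we want.
page <- minimal_html('
  <table class="other">
    <tr><th>x</th></tr>
    <tr><td>1</td></tr>
  </table>
  <table class="msac-wx-history-table">
    <tr><th>Metric</th><th>Value</th></tr>
    <tr><td>Temp Max (F)</td><td>58</td></tr>
    <tr><td>Temp Min (F)</td><td>7.5</td></tr>
  </table>
')

# html_element() returns the first node matching the CSS selector,
# and html_table() turns that single node into a tibble.
summary_tbl <- page %>%
  html_element(".msac-wx-history-table") %>%
  html_table()

summary_tbl
```

Because html_element() returns a single node, the result is one tibble rather than a list of tibbles.
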
# Right-click the table in the browser inspector and copy its XPath
xpath <- "/html/body/div[2]/main/div/article/div/table[2]"
weather <- html_elements(html, xpath = xpath)
html_table(weather)
## [[1]]
## # A tibble: 100 × 21
##    `Observed and Fore… `Observed and Fore… `Observed and For… `Observed and For…
##    <chr>               <chr>               <chr>              <chr>             
##  1 ""                  Ob Temp (°F)        Ob Temp (°F)       Ob Temp (°F)      
##  2 "Date"              Min                 Max                Avg               
##  3 "2022 02/08"        37                  51                 42                
##  4 "2022 02/07"        37.5                53                 43.5              
##  5 "2022 02/06"        37                  47                 42.5              
##  6 "2022 02/05"        34.5                48.5               42                
##  7 "2022 02/04"        31                  46.5               37                
##  8 "2022 02/03"        28                  39.5               32.5              
##  9 "2022 02/02"        29                  43.5               37                
## 10 "2022 02/01"        19.5                39                 31                
## # … with 90 more rows, and 17 more variables:
## #   Observed and Forecast Weather by Day <chr>,
## #   Observed and Forecast Weather by Day <chr>,
## #   Observed and Forecast Weather by Day <chr>,
## #   Observed and Forecast Weather by Day <chr>,
## #   Observed and Forecast Weather by Day <chr>,
## #   Observed and Forecast Weather by Day <chr>, …
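
The output above is a list because an XPath query can match any number of nodes. A small sketch of that behavior on a made-up two-table page (the page contents and the shorter "//table[2]" path are assumptions; the real page needs the full path shown earlier):

```r
library(rvest)

# Two sibling tables; the XPath below picks the second one.
page <- minimal_html('
  <table><tr><th>a</th></tr><tr><td>1</td></tr></table>
  <table><tr><th>b</th></tr><tr><td>2</td></tr></table>
')

# html_elements() keeps every match, so html_table() returns a list of tibbles.
tables <- page %>%
  html_elements(xpath = "//table[2]") %>%
  html_table()

length(tables)

# Pull the single tibble out of the list before working with it.
second <- tables[[1]]
names(second)
```

Absolute paths copied from the browser depend on the exact page layout, so they may stop matching if the site is redesigned; a class or id selector tends to hold up better when one is available.
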
# make a data.frame with the table
weather2 <- as.data.frame(html_table(weather, fill=TRUE))

# rename columns
names(weather2) <- paste(weather2[1,], weather2[2,])
names(weather2)
##  [1] " Date"                       "Ob Temp (°F) Min"           
##  [3] "Ob Temp (°F) Max"            "Ob Temp (°F) Avg"           
##  [5] "Ob Wind (mi/hr) Min"         "Ob Wind (mi/hr) Max"        
##  [7] "Ob Wind (mi/hr) Avg"         "Ob Wind (mi/hr) Gust"       
##  [9] "Ob Wind (mi/hr) Dir"         "Ob Snow (in) HS"            
## [11] "Ob Snow (in) HN24"           "Ob Snow (in) SWE"           
## [13] "Ob Snow (in) Total Snowfall" "Fx Temp (°F) Min"           
## [15] "Fx Temp (°F) Max"            "Fx Wind (mi/hr) Min"        
## [17] "Fx Wind (mi/hr) Max"         "Fx Snow (in) Min"           
## [19] "Fx Snow (in) Max"            "Fx Snow (in) SWE"           
## [21] "Fx Rating "
names(weather2)[1] <- "date"

# remove rows that are now column names
weather2 <- weather2[-c(1,2),]
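
The pasted names above keep stray spaces (" Date", "Fx Rating "). Here is a rough sketch of the same two-header-row trick with trimws() added, using a small made-up data frame in place of the scraped table; the column contents are assumptions.

```r
# Stand-in for a scraped table whose first two rows are really headers.
raw <- data.frame(
  V1 = c("", "Date", "2022 02/08", "2022 02/07"),
  V2 = c("Ob Temp", "Min", "37", "37.5"),
  V3 = c("Fx Rating", "", "LOW", "LOW"),
  stringsAsFactors = FALSE
)

# Combine the two header rows, then trim leading and trailing spaces
# left behind where one of the two cells was empty.
header <- trimws(paste(unlist(raw[1, ]), unlist(raw[2, ])))

names(raw) <- header
clean <- raw[-c(1, 2), ]   # drop the rows that became column names
rownames(clean) <- NULL    # restart row numbering at 1

names(clean)
```
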

# take a look
glimpse(weather2)
## Rows: 98
## Columns: 21
## $ date                          <chr> "2022 02/08", "2022 02/07", "2022 02/06"…
## $ `Ob Temp (°F) Min`            <chr> "37", "37.5", "37", "34.5", "31", "28", …
## $ `Ob Temp (°F) Max`            <chr> "51", "53", "47", "48.5", "46.5", "39.5"…
## $ `Ob Temp (°F) Avg`            <chr> "42", "43.5", "42.5", "42", "37", "32.5"…
## $ `Ob Wind (mi/hr) Min`         <chr> "2.5", "1.5", "0.5", "1", "0", "2", "5.5…
## $ `Ob Wind (mi/hr) Max`         <chr> "23", "7", "24.5", "33.5", "15.5", "16",…
## $ `Ob Wind (mi/hr) Avg`         <chr> "9", "3.5", "12.5", "16.5", "5.5", "7.5"…
## $ `Ob Wind (mi/hr) Gust`        <chr> "36.8", "24.54", "36.8", "49.07", "30.66…
## $ `Ob Wind (mi/hr) Dir`         <chr> "ESE", "SE", "E", "ENE", "WNW", "ESE", "…
## $ `Ob Snow (in) HS`             <chr> "60.9", "60.75", "60.66", "61.3", "61.72…
## $ `Ob Snow (in) HN24`           <chr> "0", "0", "0", "0", "0", "0", "0", "0", …
## $ `Ob Snow (in) SWE`            <chr> "0", "0", "0", "0", "0", "0", "0", "0", …
## $ `Ob Snow (in) Total Snowfall` <chr> "151.7", "151.7", "151.7", "151.7", "151…
## $ `Fx Temp (°F) Min`            <chr> "38", "38", "39", "35", "30", "30", "24"…
## $ `Fx Temp (°F) Max`            <chr> "50", "50", "47", "45", "43", "34", "35"…
## $ `Fx Wind (mi/hr) Min`         <chr> "15", "15", "10", "15", "10", "20", "50"…
## $ `Fx Wind (mi/hr) Max`         <chr> "20", "35", "20", "25", "15", "25", "60"…
## $ `Fx Snow (in) Min`            <chr> "0", "0", "0", "0", "0", "0", "0", "0", …
## $ `Fx Snow (in) Max`            <chr> "0", "0", "0", "0", "0", "0", "0", "0", …
## $ `Fx Snow (in) SWE`            <chr> "0", "0", "0", "0", "0", "0", "0", "0", …
## $ `Fx Rating `                  <chr> "LOW", "LOW", "LOW", "LOW", "LOW", "LOW"…
# Columns that hold numbers should be converted back to numeric. They were read as
# character vectors because the first two rows contained header text.
# Column 9 (wind direction) and column 21 (rating) stay as character.
weather2 <- weather2 %>%
  mutate(across(c(2:8, 10:20), as.numeric))

# coerce the date column
weather2 <- weather2 %>%
  mutate(across(1, as_date))
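
One thing worth checking after a coercion like this is whether any value silently turned into NA. Below is a minimal sketch on made-up data; the "--" placeholder is an assumption, not something I saw in the table, and base as.Date() with an explicit format is swapped in here only to pin down the "2022 02/08" layout.

```r
# Stand-in for a few scraped rows, all still character.
obs <- data.frame(
  date     = c("2022 02/08", "2022 02/07", "2022 02/06"),
  temp_min = c("37", "37.5", "--"),
  wind_dir = c("ESE", "SE", "E"),
  stringsAsFactors = FALSE
)

# Parse the date with an explicit format so a layout change fails loudly.
obs$date <- as.Date(obs$date, format = "%Y %m/%d")

# Convert to numeric, then find entries that were present but failed to parse.
temp_num <- suppressWarnings(as.numeric(obs$temp_min))
bad_rows <- which(is.na(temp_num) & !is.na(obs$temp_min))
obs$temp_min <- temp_num

bad_rows
```

Any row numbers in bad_rows point at values that need a closer look before plotting.
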

# take a quick look
head(weather2)
##         date Ob Temp (°F) Min Ob Temp (°F) Max Ob Temp (°F) Avg
## 3 2022-02-08             37.0             51.0             42.0
## 4 2022-02-07             37.5             53.0             43.5
## 5 2022-02-06             37.0             47.0             42.5
## 6 2022-02-05             34.5             48.5             42.0
## 7 2022-02-04             31.0             46.5             37.0
## 8 2022-02-03             28.0             39.5             32.5
##   Ob Wind (mi/hr) Min Ob Wind (mi/hr) Max Ob Wind (mi/hr) Avg
## 3                 2.5                23.0                 9.0
## 4                 1.5                 7.0                 3.5
## 5                 0.5                24.5                12.5
## 6                 1.0                33.5                16.5
## 7                 0.0                15.5                 5.5
## 8                 2.0                16.0                 7.5
##   Ob Wind (mi/hr) Gust Ob Wind (mi/hr) Dir Ob Snow (in) HS Ob Snow (in) HN24
## 3                36.80                 ESE           60.90                 0
## 4                24.54                  SE           60.75                 0
## 5                36.80                   E           60.66                 0
## 6                49.07                 ENE           61.30                 0
## 7                30.66                 WNW           61.72                 0
## 8                30.66                 ESE           61.71                 0
##   Ob Snow (in) SWE Ob Snow (in) Total Snowfall Fx Temp (°F) Min
## 3                0                       151.7               38
## 4                0                       151.7               38
## 5                0                       151.7               39
## 6                0                       151.7               35
## 7                0                       151.7               30
## 8                0                       151.7               30
##   Fx Temp (°F) Max Fx Wind (mi/hr) Min Fx Wind (mi/hr) Max Fx Snow (in) Min
## 3               50                  15                  20                0
## 4               50                  15                  35                0
## 5               47                  10                  20                0
## 6               45                  15                  25                0
## 7               43                  10                  15                0
## 8               34                  20                  25                0
##   Fx Snow (in) Max Fx Snow (in) SWE Fx Rating 
## 3                0                0        LOW
## 4                0                0        LOW
## 5                0                0        LOW
## 6                0                0        LOW
## 7                0                0        LOW
## 8                0                0        LOW
glimpse(weather2)
## Rows: 98
## Columns: 21
## $ date                          <date> 2022-02-08, 2022-02-07, 2022-02-06, 202…
## $ `Ob Temp (°F) Min`            <dbl> 37.0, 37.5, 37.0, 34.5, 31.0, 28.0, 29.0…
## $ `Ob Temp (°F) Max`            <dbl> 51.0, 53.0, 47.0, 48.5, 46.5, 39.5, 43.5…
## $ `Ob Temp (°F) Avg`            <dbl> 42.0, 43.5, 42.5, 42.0, 37.0, 32.5, 37.0…
## $ `Ob Wind (mi/hr) Min`         <dbl> 2.5, 1.5, 0.5, 1.0, 0.0, 2.0, 5.5, 6.5, …
## $ `Ob Wind (mi/hr) Max`         <dbl> 23.0, 7.0, 24.5, 33.5, 15.5, 16.0, 47.5,…
## $ `Ob Wind (mi/hr) Avg`         <dbl> 9.0, 3.5, 12.5, 16.5, 5.5, 7.5, 25.0, 20…
## $ `Ob Wind (mi/hr) Gust`        <dbl> 36.80, 24.54, 36.80, 49.07, 30.66, 30.66…
## $ `Ob Wind (mi/hr) Dir`         <chr> "ESE", "SE", "E", "ENE", "WNW", "ESE", "…
## $ `Ob Snow (in) HS`             <dbl> 60.90, 60.75, 60.66, 61.30, 61.72, 61.71…
## $ `Ob Snow (in) HN24`           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Ob Snow (in) SWE`            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Ob Snow (in) Total Snowfall` <dbl> 151.7, 151.7, 151.7, 151.7, 151.7, 151.7…
## $ `Fx Temp (°F) Min`            <dbl> 38, 38, 39, 35, 30, 30, 24, 19, 19, 31, …
## $ `Fx Temp (°F) Max`            <dbl> 50, 50, 47, 45, 43, 34, 35, 32, 32, 43, …
## $ `Fx Wind (mi/hr) Min`         <dbl> 15, 15, 10, 15, 10, 20, 50, 25, 20, 0, 5…
## $ `Fx Wind (mi/hr) Max`         <dbl> 20, 35, 20, 25, 15, 25, 60, 30, 45, 5, 1…
## $ `Fx Snow (in) Min`            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Fx Snow (in) Max`            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Fx Snow (in) SWE`            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Fx Rating `                  <chr> "LOW", "LOW", "LOW", "LOW", "LOW", "LOW"…
# A few quick plots to make sure everything looks reasonable
weather_plot <- ggplot(weather2, aes(x=date, y=`Fx Snow (in) Min`)) +
  geom_point()
weather_plot

weather_plot2 <- ggplot(weather2, aes(x=date, y=`Fx Wind (mi/hr) Max`)) +
  geom_point()
weather_plot2
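
Another sanity check is to put two related columns on one plot. This is only a sketch: it uses three observed temperature rows copied from the table above, and the short column names (temp_min, temp_max) are made up for the example.

```r
library(ggplot2)
library(tidyr)

# Observed min and max temperatures for three days from the table above.
temps <- data.frame(
  date     = as.Date(c("2022-02-06", "2022-02-07", "2022-02-08")),
  temp_min = c(37, 37.5, 37),
  temp_max = c(47, 53, 51)
)

# Reshape to one row per date and statistic so colour can map to the statistic.
temps_long <- pivot_longer(
  temps,
  cols = c(temp_min, temp_max),
  names_to = "stat",
  values_to = "temp_f"
)

temp_plot <- ggplot(temps_long, aes(x = date, y = temp_f, colour = stat)) +
  geom_line() +
  geom_point() +
  labs(x = "Date", y = "Observed temperature (°F)", colour = NULL)

temp_plot
```

If the max line ever dips below the min line, something went wrong in the scrape or the column renaming.
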

Up next: Making sure the data is cleaned up after the scrape and coercion.