
Scraping Tables from Wikipedia for Visualizing Climate Data

If you are anything like me, sooner or later you will look up a future destination and stumble across the climate data table on Wikipedia. There is a lot of great information in it, but if you are planning a trip you might just want to see, at a glance, the temperature ranges for the months you are interested in traveling.

This script should help you scrape tables from Wikipedia.

The first step, as always, is loading packages, which are part of what makes the R ecosystem so wonderful.

Required Packages
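As a rough sketch, the packages named in this post can be loaded as shown here. Note that melt is assumed to come from reshape2 (data.table offers a function of the same name), so swap that line if your setup differs.

```r
# Packages mentioned in this post; install.packages() them first if needed.
library(rvest)     # read_html(), html_elements(), html_table()
library(reshape2)  # melt() -- assumption: the post does not say which package supplies it
library(dplyr)     # filter() and the %>% pipe
```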

Next up is specifying what you want to scrape and grabbing the data using rvest.
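A minimal sketch of that step might look like the following. The helper name read_page_tables, the Phoenix URL in the comment, and the table index are all assumptions for illustration, and the inline HTML uses made-up values so the snippet runs without a network connection.

```r
library(rvest)

# Hypothetical helper: parse every <table> on a page into a list of data frames.
# header = FALSE keeps the header row as ordinary data so it can be promoted later.
read_page_tables <- function(page) {
  page %>%
    html_elements("table") %>%
    html_table(header = FALSE)
}

# For a live page you would use something like:
#   page <- read_html("https://en.wikipedia.org/wiki/Phoenix,_Arizona")
# Inline stand-in with made-up values:
page <- minimal_html('
  <table>
    <tr><td>Month</td><td>Jan</td><td>Feb</td></tr>
    <tr><td>Average high</td><td>20</td><td>22</td></tr>
  </table>')

tables  <- read_page_tables(page)
climate <- as.data.frame(tables[[1]])  # which index holds the climate table varies by page
```

On a real article the page usually contains many tables, so it is worth inspecting the list and picking the right index by hand.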

Now that the data is in our data frame, we can set the column names. The tables don’t get read in with proper column names, but we can use the first row of data as our column names. The first column has to be named measurement by hand, though.
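In base R, promoting the first row to column names could be sketched like this. The toy data frame and its values are made up to stand in for the scraped table.

```r
# Toy stand-in for the freshly scraped table (made-up values)
climate <- data.frame(
  X1 = c("Month", "Average high", "Average low"),
  X2 = c("Jan", "20", "8"),
  X3 = c("Feb", "22", "10"),
  stringsAsFactors = FALSE
)

names(climate) <- as.character(unlist(climate[1, ]))  # promote the first row to column names
names(climate)[1] <- "measurement"                    # first column holds the row labels
climate <- climate[-1, ]                              # drop the row that became the header
```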

Now we get into the data manipulation part. The data first gets piped into the melt function, which essentially converts it from a wide format to a long format (for lack of a better term). Once melted, the data is piped into filter from the dplyr package. This lets us easily keep only the melted rows we need.
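The melt-then-filter pipeline could look roughly like this. It assumes melt comes from reshape2, and the row labels and values are invented placeholders, since the real labels depend on the table you scraped.

```r
library(reshape2)
library(dplyr)

# Toy wide-format table (made-up values and labels)
climate <- data.frame(
  measurement = c("Average high", "Average low", "Average rainy days"),
  Jan = c("20", "8", "4"),
  Feb = c("22", "10", "4"),
  stringsAsFactors = FALSE
)

climate_long <- climate %>%
  # wide -> long: one row per (measurement, month) pair
  melt(id.vars = "measurement", variable.name = "month", value.name = "value") %>%
  # keep only the measurements we want to plot
  filter(measurement %in% c("Average high", "Average low"))
```

Here the three-row wide table becomes six long rows, and filter keeps the four that belong to the two temperature measurements.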

Once manipulated, we can clean the data. In this case, we are just removing the extra data included in the tables (_°c), trimming the whitespace, and replacing the long hyphen with the short hyphen. The hyphen matters because a value written with the long hyphen cannot be cast from character to numeric.
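A small base R sketch of that cleanup follows. The helper name clean_value is hypothetical, the "long hyphen" is assumed to be the Unicode minus sign (U+2212), and the unit suffix is assumed to look like "°C", so adjust the patterns to whatever your table actually contains.

```r
# Hypothetical helper: turn a scraped cell into a number
clean_value <- function(x) {
  x <- gsub("\u2212", "-", x, fixed = TRUE)  # Unicode minus sign -> plain hyphen (assumed culprit)
  x <- gsub("\u00b0C", "", x, fixed = TRUE)  # drop the degree-Celsius suffix (assumed form)
  x <- trimws(x)                             # strip leading and trailing whitespace
  as.numeric(x)
}

clean_value(c(" 20.5 \u00b0C", "\u22123.5 \u00b0C"))  # returns 20.5 and -3.5
```

Without the first replacement, as.numeric would return NA for the negative value, which is why the hyphen has to be fixed before the cast.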

Now for the fun part

Phoenix Climate Data
