Temperature Change in Ireland

Has Ireland gotten any warmer? Ask any punter on the street and they will happily inform you of wild swings, trends and dips. “Back when I was a child”, “when I was younger”, or “years ago” are the usual refrains.

What’s the evidence? To answer this, I will use the temperature data from my previous post alongside the R package bcp. The name stands for Bayesian Change Point, and the package implements the change point methodology of Barry & Hartigan (1993). A good overview of how to use the bcp package is offered by Erdman & Emerson (2007).
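Before turning to the real data, here is a minimal sketch of what bcp does on simulated data with a single known change point (the simulated series and all settings below are mine, purely for illustration):

library(bcp)

# simulated series: mean 0 for fifty observations, then mean 2 for fifty more
set.seed(1)
y <- c(rnorm(50, mean = 0), rnorm(50, mean = 2))

fit <- bcp(y, burnin = 500, mcmc = 5000)
plot(fit)                  # posterior means and posterior change-point probabilities
head(fit$posterior.prob)   # probability of a change point at each position

The posterior probability of a change point should spike around the fiftieth observation, and that is the sort of behaviour I am looking for in the temperature series.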

These series are monthly and run from 1866 to 2011, with two measures for each month: the average daily high and the average daily low. I use both, running the multivariate bcp procedure on each calendar month separately with the average high and low as the two series.

The bcp output for the January series shows no major changes: the January temperature in Ireland is essentially the same now as it was in 1866. The results for the other months are very similar. The exception is October, where the posterior means for both the maximum and minimum temperatures increase by around 1 degree Celsius (the second plot in the code below).
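One way to put a rough number on that October shift is to compare the posterior means at the start and end of the series in the fitted bcp object; a sketch, using the October fit (bcp.10 in the code below):

pm <- bcp.10$posterior.mean          # one row per year, one column per series (min, max)
round(pm[nrow(pm), ] - pm[1, ], 2)   # change in posterior mean from 1866 to 2011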

Has Ireland gotten warmer? Yes, but only in October, for some reason unknown to me.

# with web-scraped data from the previous post

library(bcp)

# average daily minimum temperature: one column per calendar month
# (column 2 of armagh is the month, column 4 is tmin)
cold <- NULL
for (i in 1:12){
  mins <- armagh[armagh[,2]==i,4]
  cold <- cbind(cold,mins)
}
cold <- cold[14:159,]   # keep 1866 to 2011

# average daily maximum temperature: one column per calendar month
# (column 3 of armagh is tmax)
warm <- NULL
for (i in 1:12){
  maxs <- armagh[armagh[,2]==i,3]
  warm <- cbind(warm,maxs)
}
warm <- warm[14:159,]   # keep 1866 to 2011

# multivariate bcp on the January series (average daily min and max together)
bcp.1 <- bcp(cbind(cold[,1],warm[,1]), burnin = 1000, mcmc = 5000)
plot(bcp.1)

# the same for October, the one month that shows a clear shift
bcp.10 <- bcp(cbind(cold[,10],warm[,10]), burnin = 1000, mcmc = 5000)
plot(bcp.10)
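The same fit can be run for all twelve months in one go; a small sketch of how I would loop over the months and keep the fitted objects in a list (the loop and the fits object are mine, not part of the original script):

fits <- vector("list", 12)
for (m in 1:12){
  fits[[m]] <- bcp(cbind(cold[,m], warm[,m]), burnin = 1000, mcmc = 5000)
}
plot(fits[[10]])   # e.g. inspect the October fit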

Web-Scraping in R

Web-scraping, or web-crawling, sounds like a seedy activity worthy of an Interpol investigative department. The reality, however, is far less nefarious. Web-scraping is any procedure by which someone extracts data from the internet. Given that it’s possible to get the internet on computers these days, web-scraping opens up an array of interesting possibilities for social-science researchers, as it allows massive datasets to be harvested in short periods of time.

In the following code, I illustrate a very simple web-scraping routine. The object of this exercise is to scrape some Irish weather time-series, which I will look at in a future post. The main function of interest here is:

readLines(...)

which reads these data into the R workspace. The url in my example points to a .txt file; if the url address instead returns html, the readLines command will read in all of the lines of the html, so the resulting object is a character vector with a length equal to the number of lines of html code. In that case, extracting the data you need from the crap you don’t requires a bit of jiggery-pokery formally referred to as parsing. Thankfully, there are a whole bunch of functions in Hmisc and other packages that can be used to do this in a systematic way. I have found the following particularly useful for parsing:

substring.location(...) ; grep(...) ; substr(...) ; strsplit(...)
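To make the readLines idea concrete, here is a minimal sketch that pulls the same file down and splits one line into its fields (the choice of which line to split, and the snippet as a whole, is mine and purely illustrative):

site <- "http://www.metoffice.gov.uk/climate/uk/stationdata/armaghdata.txt"
raw  <- readLines(site)    # one character string per line of the file
length(raw)                # how many lines came back
grep("yyyy", raw)          # locate the header line containing the column names

# split one line on spaces and drop the empty strings left behind
fields <- unlist(strsplit(raw[grep("yyyy", raw)[1]], " "))
fields[nchar(fields) > 0]

The full routine below uses read.table(site, sep="\t") instead; since the file contains no tab characters, this has much the same effect, pulling each full line in as a single string to be split up afterwards.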
# clear the workspace
rm(list=ls())

# web url
site <- "http://www.metoffice.gov.uk/climate/uk/stationdata/armaghdata.txt"

# call in the data with try() inside a while loop:
# if the connection fails, loop again and retry until it succeeds
i <- 1
while (i < 2){
  aa <- try(read.table(site,sep="\t"))
  if (inherits(aa, "try-error")) {
    next
  } else {
    i <- i + 1
  }
}

# grand! now inspect and trim off the header lines at the top of the file
aa <- aa[6:dim(aa)[1],]

# each row arrives as one long string, so some tidying is required:
# split on spaces, drop empty strings, and keep the first seven fields
bb <- cc <- dd <- c()
for (i in (1:length(aa))){
  bb <- unlist(strsplit(as.character(aa[i]), " "))
  cc <- bb[nchar(bb)>0]
  cc <- cc[1:7]
  dd <- rbind(dd,cc)
}

row.names(dd) <- dd[,1]   # label rows by their first field (the year for data rows)

# build the column names from the two header rows, e.g. "tmax degC"
colnm <- c(dd[1,1], dd[1,2],
           paste(dd[1,3], dd[2,1], sep=" "), paste(dd[1,4], dd[2,2], sep=" "),
           paste(dd[1,5], dd[2,3], sep=" "), paste(dd[1,6], dd[2,4], sep=" "),
           paste(dd[1,7], dd[2,5], sep=" "))

colnames(dd) <- colnm

# drop the two header rows and convert to a data frame
armagh <- data.frame(dd[-c(1,2),])

# convert each column to numeric; non-numeric entries (missing-value markers) become NA
for (i in (1:dim(armagh)[2])){
  armagh[,i] <- as.numeric(as.character(armagh[,i]))
}

# pull out the December minimum temperatures and the matching years
decmin <- armagh[armagh[,2]==12,4]
year <- armagh[armagh[,2]==12,1]
wh1 <- data.frame(cbind(armagh$tmin.degC[armagh$mm==12], armagh$yyyy[armagh$mm==12]))
wh1 <- na.omit(wh1)   # drop years with missing December readings

# nice plot: cbind() above leaves the columns named X1 (tmin) and X2 (year)
library(ggplot2)
ggplot(wh1, aes(X2,X1)) + 
  geom_line(colour="red") + 
  theme_bw() +
  scale_x_continuous('Year') + 
  scale_y_continuous('Minimum Temperature - Degree Celsius') +
  ggtitle("December Average Daily Minimum Temperature - Armagh 1865-2011")

In the script above, I call in these data, tidy them up, and then draw a pretty graph with the excellent ggplot2 package. I use a while loop with the try function to call in the data, which can be very important for anyone interested in scraping data systematically. Sometimes R will not be able to establish a connection to the url address of interest. The try function inside the while loop ensures that, if R fails to make the connection, it simply tries again until a connection is established, the equivalent of pressing refresh in your internet browser. When scraping data iteratively from a large number of url addresses, connection difficulties are inevitable, so using try inside a while loop can save a lot of hassle.
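For large scraping jobs it can also help to cap the number of retries and pause between attempts; a small sketch of that variation (the ten-attempt cap and the five-second pause are arbitrary choices of mine, not part of the routine above):

max_tries <- 10
for (attempt in 1:max_tries){
  aa <- try(read.table(site, sep="\t"), silent = TRUE)
  if (!inherits(aa, "try-error")) break   # connection made, stop retrying
  Sys.sleep(5)                            # wait five seconds before the next attempt
}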

Happy scraping, ya filthy animal.