Kalkalash! Pinpointing the Moments “The Simpsons” became less Cromulent

Whenever somebody mentions “The Simpsons” it stirs up feelings of nostalgia in me. The characters, uproarious gags, zingy one-liners, and edgy animation all contributed to making it, arguably, the greatest TV show ever. However, it’s easy to forget that “The Simpsons” is still ongoing, and in its twenty-fourth season no less.

For me, and most others, the later episodes bear little resemblance to the older ones. The current incarnation of the show is stale, and has been for a long time. I haven’t watched a new episode in over ten years, and don’t intend to any time soon. When did this decline begin? Was it part of a slow secular trend, or was there a sudden drop in quality from which there was no recovery?

To answer these questions I use the Global Episode Opinion Survey (GEOS) episode ratings data, which are published online. A simple web scrape of the “all episodes” page provides me with 423 episode ratings, spanning the first episode of season 1 to the third episode of season 20. After S20E03 the ratings become too sparse, which is probably a reflection of how bad the show is in its current condition. To detect changepoints in the show's ratings, I have used the R package changepoint. An informative introduction to both the package and changepoint analysis can be found in the accompanying vignette.
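To make the idea of a changepoint in the mean concrete, here is a minimal base R sketch on simulated data (not the GEOS ratings). It is a deliberately simplified stand-in: it looks for a single split by brute force, minimising the residual sum of squares around two segment means, whereas the analysis in this post uses the PELT method from the changepoint package, which searches for multiple changepoints. The series `x` and the helper `split_cost` are hypothetical names introduced only for this illustration.

```r
# Toy series: 60 values around 8, then 40 values around 6.5
# (simulated for illustration, not GEOS data)
set.seed(1)
x <- c(rnorm(60, mean = 8, sd = 0.4), rnorm(40, mean = 6.5, sd = 0.4))

# For a candidate split k, fit one mean to x[1:k] and another to the rest,
# and return the total residual sum of squares
split_cost <- function(x, k) {
  left  <- x[1:k]
  right <- x[(k + 1):length(x)]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
}

# Try every split that leaves at least two points on each side
candidates <- 2:(length(x) - 2)
costs <- sapply(candidates, function(k) split_cost(x, k))

# The estimated changepoint is the last index of the first segment
best_k <- candidates[which.min(costs)]
best_k
```

With a shift this large relative to the noise, the estimated split should land close to the true break at position 60. Real ratings are noisier and may contain several shifts, which is where a multiple-changepoint method becomes useful.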

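[Figure: “Changepoints in 'The Simpsons' Ratings”. Average rating by episode, with the detected changepoint segments drawn in red.]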

The figure above provides a summary of my results. Five breakpoints were detected. The first occurs in the first episode of the ninth season: The City of New York Vs. Homer Simpson. Most will remember this one: Homer goes to New York to collect his clamped car and ends up going berserk. A good episode, although it essentially marked the beginning of the end.

According to the changepoint results, the decline occurred in three stages. The first lasted from the New York episode up until episode 11 of season 10. The shows in this stage have an average rating of about 7, and the episode at which the next breakpoint is detected is Wild Barts Can’t Be Broken. The second stage roughly marks my own transition, as it is about this point that I stopped watching. This stage lasts as far as S15E09, whereupon the show suffers the further ignominy of another ratings downgrade. Things couldn’t possibly get any worse, and they don’t, as the show earns a minor reprieve after the twentieth episode of season 18.

So now you know. R code for the analysis can be found below.

# packages
library(Hmisc) ; library(changepoint)
# clear ws
rm(list=ls())

# webscrape data
page1 = "http://www.geos.tv/index.php/list?sid=159&collection=all"
home1 = readLines(con<-url(page1)); close(con)

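# pick out lines with ratings
means = '<td width="60px" align="right" nowrap>'
epis = home1[grep(means, home1)]
# two-step subset by hard-coded position: 57:531 keeps 475 rows, then
# 49:474 keeps 426 of those, matching episode = 50:475 further down
epis = epis[57:531]
epis = epis[49:474]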

# prune data
loc = function(x) substring.location(x,"</span>")$first[1]
epis = data.frame(epis)
epis = cbind(epis,apply(epis, 1, loc))
epis$cut = NA
for(i in 1:dim(epis)[1]){
  epis[i,3] = substr(epis[i,1], epis[i,2]-4, epis[i,2]-1) 
}
#create data frame
ts1 = data.frame(rate=epis$cut, episode=50:475)
# remove out of season shows and movie
ts1 = ts1[!(ts1$episode %in% c(178,450,451)),]
# make numeric
ts1$rate = as.numeric(as.character(ts1$rate))

# changepoint function
mean2.pelt = cpt.mean(ts1$rate,method='PELT')

# plot results
plot(mean2.pelt,type='l',cpt.col='red',xlab='Episode',
     ylab='Average Rating',cpt.width=2, main="Changepoints in 'The Simpsons' Ratings")

# which episodes?
# The City of New York vs. Homer Simpson
# Wild Barts Can't Be Broken
# I, (Annoyed Grunt)-Bot
# Stop Or My Dog Will Shoot!
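
Because three rows are dropped before fitting, a changepoint's row position is not the same as its episode number. The sketch below shows one way to translate between the two and to read off the mean rating of each segment, using `cpts()` and `param.est()` from the changepoint package. It runs on a simulated stand-in (`toy`, a hypothetical data frame shaped like `ts1`), not the scraped ratings, so the numbers it prints say nothing about the show.

```r
library(changepoint)

# Simulated stand-in for ts1: ratings with two mean shifts, and an episode
# column with gaps (as happens after dropping specials and the movie)
set.seed(42)
toy <- data.frame(
  rate    = c(rnorm(80, 8.2, 0.3), rnorm(60, 7.0, 0.3), rnorm(60, 6.3, 0.3)),
  episode = setdiff(50:252, c(100, 150, 200))
)

fit <- cpt.mean(toy$rate, method = "PELT")

# Row positions of the detected changepoints
# (each is the last observation of a segment)
idx <- cpts(fit)

# Translate row positions into episode numbers
toy$episode[idx]

# Estimated mean rating within each segment
param.est(fit)$mean
```

Applied to the real fit, the same two calls would be `ts1$episode[cpts(mean2.pelt)]` and `param.est(mean2.pelt)$mean`.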

20 thoughts on “Kalkalash! Pinpointing the Moments ‘The Simpsons’ became less Cromulent”

  1. Interesting! I’m always interested in what data people like to analyze. (I might have to borrow this example for one of my classes.)

    I’m not sure whether the ratings are repeated measures for the same people, or by different people, or both.

    Anyway, I would be more comfortable with continuous changes rather than jumps (people gradually gaining or losing interest), and would try lowess for example here.

    There’s probably also some short-term dependence that might be included by using time-series methods on these data.

  2. I tried lowess and a time series model, though they don’t shed much light. Code, to add to the bottom of yours:

    # lowess
    plot(ts1$rate,type="l")
    lines(lowess(ts1$rate))
    # auto.arima
    library(forecast)
    ts1.aa=auto.arima(ts1$rate)
    ts1.aa
    plot(forecast(ts1.aa,h=100))

    The lowess just shows a general downward trend, and the time series obtains an MA(1) model on the first differences.

    I played with the lowess a little. Though the default f=2/3 usually works pretty well for me, I found that f=0.2 seemed to track the trend better. There’s always a tradeoff between insight and over-smoothing, though: is that really a decrease and increase at the end, or just chance?

    Thanks for sharing your code, so that this mini-analysis took me about 2 minutes!
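
    The span trade-off described in this comment can be seen on simulated data. The sketch below is only an illustration under stated assumptions: the series `rate` and its underlying `trend` are made up (a slow decline with a small upturn at the end), and the two `f` values are the ones mentioned above.

```r
# Simulated ratings: slow decline plus a small late upturn (not GEOS data)
set.seed(7)
n <- 400
trend <- c(seq(8.5, 6.5, length.out = 350), seq(6.5, 6.9, length.out = 50))
rate <- trend + rnorm(n, sd = 0.5)

smooth_wide   <- lowess(rate, f = 2/3)  # default span
smooth_narrow <- lowess(rate, f = 0.2)  # narrower span, follows local movement

plot(rate, type = "l", col = "grey", xlab = "Episode", ylab = "Rating")
lines(smooth_wide, col = "blue", lwd = 2)
lines(smooth_narrow, col = "red", lwd = 2)

# Mean squared distance of each smooth from the known underlying trend
c(wide   = mean((smooth_wide$y - trend)^2),
  narrow = mean((smooth_narrow$y - trend)^2))
```

    On simulated data the true trend is known, so the two spans can be compared directly. On the real ratings there is no such reference, which is why the late decrease and increase remain hard to tell apart from chance.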

    • Thanks for the reply Ken.

      You are right, there are loads of different ways in which one could model this series. However, the question at hand is to pinpoint the episode where “The Simpsons” went sour and to trace the show’s evolution from that point. For that purpose, the changepoint technique seems to me the most appropriate.

    • Hi Wes, That was just me being lazy. I think I just used the plot function that comes with the changepoint package. I thought this would be fine because the first 50 shows weren’t really “The Simpsons” per se, but more part of another show.
