Identical Champions League Draw: What Were the Odds?

Posted in Football, R, Simulation by diffuseprior

A number of news outlets have reported a peculiar quirk that arose during Friday’s Champions League draw. Apparently, the sport’s European governing body, UEFA, ran a trial run the day before the main event, and the schedule chosen during this event was identical to that of the actual draw on Friday.

Given this strange coincidence, a number of people have been expressing the various odds of this occurrence. For example, the author of this newspaper article claimed that ‘bookies’ calculated the odds at 5,000 to 1. In other words, the probability of this event was 0.0002.

The same article also says that the probability of this event (two random draws being identical) occurring is not as low as one might think. However, this article does not give the probability or odds of this event occurring. The oblivious reason for this is that such a calculation is difficult. Since teams from the same domestic league and teams from the same country cannot play each other, such a calculation involves using conditional probabilities over a variety of scenarios.

Despite my training in Mathematics and interest in quantitative pursuits, I have always struggled to calculate the probability of multiple conditional events. Given that there are many different ways in which two identical draws can be made, such a calculation is, unfortunately, beyond my admittedly limited ability.

Thankfully, there’s a cheats way to getting a rough answer: use Monte Carlo simulation. The code below shows how to write up a function in R that performs synthetic draws for the Champions League given the aforementioned conditions. With this function, I performed two draws 200,000 times, and calculated that the probability of the identical draws is: 0.00011, so the odds are around: 1 in 9,090. This probability is subject to some sampling error, however getting a more accurate measure via simulation would require more computing power like that enabled by Rcpp (which I really need to learn). Nevertheless, the answer is clearly lower than that proposed either by the ‘bookies’ or the newspaper article’s author.

# cl draw
rm(list=ls())
setwd("C:/Users/Alan/Desktop")
dat <- read.csv("cldraw.csv")

#=============================
> dat
team iso pos group
1       Galatasaray TUR  RU     H
2           Schalke GER  WI     B
3            Celtic SCO  RU     G
4          Juventus ITA  WI     E
5           Arsenal ENG  RU     B
6            Bayern GER  WI     F
7  Shakhtar Donetsk UKR  RU     E
8          Dortmund GER  WI     D
9             Milan ITA  RU     C
10        Barcelona ESP  WI     G
11      Real Madrid ESP  RU     D
12      Man. United ENG  WI     H
13         Valencia ESP  RU     F
14              PSG FRA  WI     A
15            Porto POR  RU     A
16           Malaga ESP  WI     C
#=============================

draw <- function(x){
  fixtures <- matrix(NA,nrow=8,ncol=2)
  p <- 0
  while(p==0){
    for(j in 1:8){
      k <- 0
      n <- 0
      while(k==0){
        n <- n + 1
        if(n>50){break}
        aa <- x[x[,"pos"]=="RU",]
        t1 <- aa[sample(1:dim(aa)[1],1),]
        bb <- x[x[,"pos"]=="WI",]
        t2 <- bb[sample(1:dim(bb)[1],1),]
        k <- ifelse(t1[,"iso"]!=t2[,"iso"] & t1[,"group"]!=t2[,"group"],1,0)
      }
      fixtures[j,1] <- as.character(t1[,"team"])
      fixtures[j,2] <- as.character(t2[,"team"])
      x <- x[!(x[,"team"] %in% c(as.character(t1[,"team"]),
                                 as.character(t2[,"team"]))),]
    }
    if(n>50){p <- 0}    
    p <- ifelse(sum(as.numeric(is.na(fixtures)))==0,1,0)
  }
  return(fixtures)
}

drawtwo <- function(x){ 
  f1 <- as.vector(unlist(x))
  joinup <- data.frame(team=f1[1:16], iso=f1[17:32],
                       pos=f1[33:48], group=f1[49:64])
  check1 <- data.frame(draw(joinup))
  check2 <- data.frame(draw(joinup))
  rightdraw <- ifelse(sum(na.omit(check1[order(check1),2])==
    na.omit(check2[order(check2),2]))==8, 1, 0)
  return(rightdraw)
}

drawtwo(dat)

dat2 <- rbind(as.vector(unlist(dat)),
              as.vector(unlist(dat)))
dat3 <- dat2[rep(1,1000),]

vals <- 0
for(i in 1:200){
  yy <- apply(dat3, 1, drawtwo)
  vals <- sum(yy) + vals
}
#=============================
# Probability 
> vals/200000
[1] 0.00011
# Odds
> 1/(vals/200000)-1
[1] 9089.909
#=============================

7 thoughts on “Identical Champions League Draw: What Were the Odds?”

Dinre

December 27, 2012 at 8:10 pm

Well, I can’t help myself looking at your work. I just have to provide the theoretical results to compliment your simulation results and to perhaps illuminate how seemingly complex systems like this can be broken down into theoretical probabilities.

There are really several answers to the odds question, and here are the summarized results:

1) If the ORDER of the draw means anything, then drawing each team in a specific order is just a serial set of independent probabilities. Just calculate the probability of each draw in order and multiply the results. These odds are extremely low (about 1 in 169,344,000 – 254,016,000, depending on whether you draw runners-up or winners first).

2) If the order is completely irrelevant, the problem becomes one of counting the allowable permutations of the matches. The odds will be 1 in [number of permutations]. The odds here are very close to the simulation results in the post (1 in 10,926). I used an inelegant, brute-force method, since the brute-force method is easier to understand for anyone reading this.

Now for the curious, hungry, and/or masochistic, here is an R script to actually explain and calculate all these probabilities:

# Load data frame of team data.
draw_Data <- read.csv("UEFA Draw.csv")

# This is what the resulting data looks like:
# Team Country RU_WI Group
#1 Galatasaray TUR RU H
#2 Schalke GER WI B
#3 Celtic SCO RU G
#4 Juventus ITA WI E
#5 Arsenal ENG RU B
#6 Bayern GER WI F
#7 Shakhtar Donetsk UKR RU E
#8 Dortmund GER WI D
#9 Milan ITA RU C
#10 Barcelona ESP WI G
#11 Real Madrid ESP RU D
#12 Man. United ENG WI H
#13 Valencia ESP RU F
#14 PSG FRA WI A
#15 Porto POR RU A
#16 Malaga ESP WI C

get_Draw_Odds <- function(draw_Data, first_Set, second_Set){
# Initialize array of odds for each draw.
odds_Array <- integer(16)

# Initialize final odds variable.
final_Odds <- 1

# Calculate odds for each first set draw
# which have independent and decreasing odds.
# 1/8, 1/7, 1/6, etc.
odds_Array[first_Set] <- 1 / 8:1

# Calculate odds for each second set draw
# restricted to non-matching group
# and non-matching country compared to
# first set draw.
for (i in 1:8){
RU_Country <- draw_Data$Country[first_Set[i]]
RU_Group <- draw_Data$Group[first_Set[i]]

# Filter list of second set to meet the criteria.
valid_Set <- subset(draw_Data[second_Set[i:8],], Country != RU_Country & Group != RU_Group)

# Calculate odds (1/n).
odds_Array[second_Set[i]] <- 1 / nrow(valid_Set)
}

# Calculate theoretical odds of the draw
# happening in exact order.
# This is assuming independent, serial
# probabilities, where
# Pr(A followed by B) = Pr(A) * Pr(B).
for (i in 1:16){
final_Odds <- final_Odds * odds_Array[i]
}
return(final_Odds)
}

# Initialize arrays of odd and even numbers
# for use in referring to runners-up and winners.
# This is really just for my convenience.
runner_Up <- c(1,3,5,7,9,11,13,15)
winner <- c(2,4,6,8,10,12,14,16)

# Here are the results if we draw a runner-up
# and then draw a suitable group winner:
get_Draw_Odds(draw_Data, runner_Up, winner)
# final odds = 3.93676e-09
# or 1 in 254,016,000

# If we draw winners first and then match,
# we get a slightly different result:
get_Draw_Odds(draw_Data, runner_Up, winner)
# final odds = 5.90514e-09
# or 1 in 169,344,000

# Now, if we assume the draw can be performed in any
# order and we are only concerned with the final set
# of match-ups, then this actually becomes fairly simple.
# We just find the number of final permutations.
# Below is a brute-force method.
library(permute)

# For my own sanity, I am breaking the data
# up into runners-up and winners, since every
# runner-up must be matched with a winner.

# This is how new data is envisioned:
# Runners-up
# Team Country RU_WI Group
#1 Galatasaray TUR RU H
#2 Celtic SCO RU G
#3 Arsenal ENG RU B
#4 Shakhtar Donetsk UKR RU E
#5 Milan ITA RU C
#6 Real Madrid ESP RU D
#7 Valencia ESP RU F
#8 Porto POR RU A

# Winners
# Team Country RU_WI Group
#1 Schalke GER WI B
#2 Juventus ITA WI E
#3 Bayern GER WI F
#4 Dortmund GER WI D
#5 Barcelona ESP WI G
#6 Man. United ENG WI H
#7 PSG FRA WI A
#8 Malaga ESP WI C

# Convert all of this into a matrix of
# teams allowed to be matched.
match_Allowed <- matrix(0,8,8)
match_Allowed[1,] <- c(1,1,1,1,1,0,1,1)
match_Allowed[2,] <- c(1,1,1,1,0,1,1,1)
match_Allowed[3,] <- c(0,1,1,1,1,0,1,1)
match_Allowed[4,] <- c(1,0,1,1,1,1,1,1)
match_Allowed[5,] <- c(1,0,1,1,1,1,1,0)
match_Allowed[6,] <- c(1,1,1,0,0,1,1,0)
match_Allowed[7,] <- c(1,1,0,1,0,1,1,0)
match_Allowed[8,] <- c(1,1,1,1,1,1,0,1)

# With runners-up as the rows and winners
# as the columns, the matrix looks like this:
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#[1,] 1 1 1 1 1 0 1 1
#[2,] 1 1 1 1 0 1 1 1
#[3,] 0 1 1 1 1 0 1 1
#[4,] 1 0 1 1 1 1 1 1
#[5,] 1 0 1 1 1 1 1 0
#[6,] 1 1 1 0 0 1 1 0
#[7,] 1 1 0 1 0 1 1 0
#[8,] 1 1 1 1 1 1 0 1

# Create a matrix of all possible permutations.
all_Matches <- allPerms(1:8, observed=TRUE, max=50000)

# I'm going to use vector math, because it
# makes more sense to me. I'm going to convert
# the all_Matches data into 8 different identity
# matrices to represent each of the 8 runners-up.
# Then, I will use each row of the match_Allowed
# matrix as a "mask" on the matching identity
# matrix. I will use the resulting identity
# matrices as an aggregate mask on the original
# all_Matches data. Then, we will just count
# the allowable rows.

# This is a function that checks to see if
# one number is the same as another. If
# true, it returns a 1, and if false, a 0.
identity_Select <- function(identity_Number, data_Number){
if (identity_Number==data_Number){
return(1)
} else {
return(0)
}
}

# Calculate the total number of items in
# the all_Matches matrix.
# We don't want to keep doing this.
total_Items <- length(all_Matches)

# Transfer the all_Matches data into
# individual matrices.
matches_1 <- all_Matches
matches_2 <- all_Matches
matches_3 <- all_Matches
matches_4 <- all_Matches
matches_5 <- all_Matches
matches_6 <- all_Matches
matches_7 <- all_Matches
matches_8 <- all_Matches

# Loop through every item in each matrix,
# converting it into an identity for each
# number 1 through 8. This takes a little bit
# and really isn't efficient.
for (i in 1:total_Items){
matches_1[i] <- identity_Select(1,matches_1[i])
matches_2[i] <- identity_Select(2,matches_2[i])
matches_3[i] <- identity_Select(3,matches_3[i])
matches_4[i] <- identity_Select(4,matches_4[i])
matches_5[i] <- identity_Select(5,matches_5[i])
matches_6[i] <- identity_Select(6,matches_6[i])
matches_7[i] <- identity_Select(7,matches_7[i])
matches_8[i] <- identity_Select(8,matches_8[i])
}

# Use the rows in the matches allowed matrix
# to mask the appropriate identity matrices.
for (i in 1:nrow(all_Matches)){
matches_1[i,] <- matches_1[i,] * match_Allowed[1,]
matches_2[i,] <- matches_2[i,] * match_Allowed[2,]
matches_3[i,] <- matches_3[i,] * match_Allowed[3,]
matches_4[i,] <- matches_4[i,] * match_Allowed[4,]
matches_5[i,] <- matches_5[i,] * match_Allowed[5,]
matches_6[i,] <- matches_6[i,] * match_Allowed[6,]
matches_7[i,] <- matches_7[i,] * match_Allowed[7,]
matches_8[i,] <- matches_8[i,] * match_Allowed[8,]
}

# Merge all of the identity matrices, so we don't have
# to keep track of them all.
matches_Merged <- matches_1 + matches_2 + matches_3 + matches_4 + matches_5 + matches_6 + matches_7 + matches_8

# Filter disallowed matches from the all_Matches matrix.
filtered_Matches <- all_Matches * matches_Merged

# Initialize counter for the number of allowable
# permutations.
allowed_Permutations 0){
allowed_Permutations <- allowed_Permutations + 1
}
}

# Here is the result:
# allowed_Permutations = 10,926
# so the odds are 9.15248e-05
# or 1 in 10,926

Reply
- diffuseprior
  
  December 27, 2012 at 8:39 pm
  
  Thanks for the comment Dinre.
  
  Reply
Dinre

December 28, 2012 at 7:52 pm

A small correction for posterity. I noticed that the end of my R script didn’t paste in correctly and should read like so:

# Initialize counter for the number of allowable
# permutations.
allowed_Permutations 0){
allowed_Permutations <- allowed_Permutations + 1
}
}

Also, I re-ran the script and received a new result! I don't know what I did the first time, but something was apparently off in the permutations file. The "real" answer is half the originally posted answer: 1 in 5463. I guess the bookies were actually pretty close.

Reply
amit

December 30, 2012 at 1:32 pm

You may wonder why your MC simulation result differs from the analytic result (1:5463. Dinre is correct, except that there is an easier way to compute the result if you know that what your are looking for is the value of the permanent of the constraint matrix Dinre showed).
Your code is correct, except that it doesn’t sample draws uniformly from the possible draw space. when you “build” a random draw you try to assign teams randomly, until you find a legit draw. but some legit draws are more likely to be found than others, or in other words, some initial values are more likely to later fail than others. This results in a distorted distribution of the sample space.

Reply
- diffuseprior
  
  December 30, 2012 at 4:34 pm
  
  Thanks for the comment but I’m confused. Do you mean that certain random seed numbers are more likely to fail than others, or that my function is not replicating the draw correctly?
  
  Reply
  - amit
    
    December 31, 2012 at 9:01 pm
    
    Let’s consider an example. Suppose your objects are not as complex as the uefa draw (the matching of 16 teams into 8 pairs under constraints), for example increasing series of 10 integers each chosen from between 1 and 100. suppose you are interested in estimating the probability that the number 5 appears in a series, using MC method. So you start building an increasing series. you first choose the smallest, and to avoid any future conflict you draw it from 1:91, let’s call it x1. then you use your first choice to help you draw the second number from the range (x1+1):92. and x3 is chosen from (x2+1):93, and so on.
    Surely, you’ll get a random draw of an increasing series. but does this draw has the same distribution of the distribution you are trying to estimate?
    because you build your object of interest sequentially, and because a choice made at step X might affect the choices that remain available at step Y (Y>X), then your random draw of objects might have a distribution that is dependent on the how you build your objects, and not only the kind and number of “legit” objects.
    In the case of increasing series – the way I built it will result with extremely high probabilities for numbers greater than 90, while in reality the probability of any number is equal. So might be the case of your drawing algorithm that builds the match sequentially.
    
    You could argue that the way you built it is exactly how the real draw goes, in which case I’ll admit that your simulation is better than my (and others’) formal mathematical solution/counting.
Mehmet Suzen

January 7, 2013 at 10:00 am

I did simple direct monte carlo simulations on this before the draws.
Here is my post dated 7.12.2012:
http://memosisland.blogspot.de/2012/12/uefa-champions-league-knockout-phase.html

This is a direct simulation that repeatedly creates possible pairs randomly, I run this 20M times.

Reply

DiffusePrioR

Identical Champions League Draw: What Were the Odds?

7 thoughts on “Identical Champions League Draw: What Were the Odds?”

Leave a comment Cancel reply

Share this:

Related

7 thoughts on “Identical Champions League Draw: What Were the Odds?”

Leave a comment Cancel reply