Discrete Probability

Michael Taylor


Combinations and Permutations

Construct a deck of cards using R. For this we will use the function expand.grid() and paste(). The function paste is used to create strings by joining smaller strings. For example, if we have the number and the suit for a card, in 2 different variables we can create the card name using Paste like this.

number <- "Three"
suit <- "Hearts"
paste(number, suit)
## [1] "Three Hearts"

It also works on pairs of vectors. So if we type this line of code, we get the following result.

paste(letters[1:5], as.character(1:5))
## [1] "a 1" "b 2" "c 3" "d 4" "e 5"

The function expand.grid() gives us all the combination of 2 lists. So for example, if you have blue and black pants and white, gray, and plaid shirt, all your combinations can be computed using the expand.grid function like this. You can see all 6 combination.

expand.grid(pants=c("blue", "black"), 
            shirt=c("white", "grid", "plaid"))
##   pants shirt
## 1  blue white
## 2 black white
## 3  blue  grid
## 4 black  grid
## 5  blue plaid
## 6 black plaid

So here’s how we generate a deck of cards. We define the four suits, we define the 13 numbers, and then we create the deck using expand.grid and then pasting together the 2 columns that expand grid creates.

suits <- c("Diamonds", "Clubs", "Hearts", "Spades")
numbers <- c("Ace", "Deuce", "Three", "Four", "Five", "Six", "Seven",
             "Eight", "Nine", "Ten", "Jack", "Queen", "King")
deck <- expand.grid(number=numbers, suit=suits)
( deck <- paste(deck$number, deck$suit) )
##  [1] "Ace Diamonds"   "Deuce Diamonds" "Three Diamonds" "Four Diamonds" 
##  [5] "Five Diamonds"  "Six Diamonds"   "Seven Diamonds" "Eight Diamonds"
##  [9] "Nine Diamonds"  "Ten Diamonds"   "Jack Diamonds"  "Queen Diamonds"
## [13] "King Diamonds"  "Ace Clubs"      "Deuce Clubs"    "Three Clubs"   
## [17] "Four Clubs"     "Five Clubs"     "Six Clubs"      "Seven Clubs"   
## [21] "Eight Clubs"    "Nine Clubs"     "Ten Clubs"      "Jack Clubs"    
## [25] "Queen Clubs"    "King Clubs"     "Ace Hearts"     "Deuce Hearts"  
## [29] "Three Hearts"   "Four Hearts"    "Five Hearts"    "Six Hearts"    
## [33] "Seven Hearts"   "Eight Hearts"   "Nine Hearts"    "Ten Hearts"    
## [37] "Jack Hearts"    "Queen Hearts"   "King Hearts"    "Ace Spades"    
## [41] "Deuce Spades"   "Three Spades"   "Four Spades"    "Five Spades"   
## [45] "Six Spades"     "Seven Spades"   "Eight Spades"   "Nine Spades"   
## [49] "Ten Spades"     "Jack Spades"    "Queen Spades"   "King Spades"

Now that we have a deck constructed, we can now start answering questions about probability. Let’s double-check that the probability of a king in the first card is 1 in 13. We simply compute the proportion of possible outcomes that satisfy our condition. So we create a vector that contains the four ways we can get a king.

note: %in% is a binary operator, which returns a logical vector indicating if there is a match or not for its left operand.

## create a vector that contains the four ways we can get a king.
kings <- paste("King", suits)
mean(deck %in% kings)
## [1] 0.07692308

Now, how about the conditional probably of the second card being a king, given that the first was a king?
\[\Pr(A\ and\ B)=\Pr(A)\times\Pr(B\mid A)\]

Earlier we deduced that if 1 king is already out, then there’s 51 left. So the probability is 3 in 51. Let’s confirm by listing out all possible outcomes.

The function combinations() and permutaions() will be used from the gtools package. The permutation functions computes for any list of size n all the different ways we can select r items.

Below is all the ways we can choose two numbers from the list c(1:5).

permutations(5, 2)
##       [,1] [,2]
##  [1,]    1    2
##  [2,]    1    3
##  [3,]    1    4
##  [4,]    1    5
##  [5,]    2    1
##  [6,]    2    3
##  [7,]    2    4
##  [8,]    2    5
##  [9,]    3    1
## [10,]    3    2
## [11,]    3    4
## [12,]    3    5
## [13,]    4    1
## [14,]    4    2
## [15,]    4    3
## [16,]    4    5
## [17,]    5    1
## [18,]    5    2
## [19,]    5    3
## [20,]    5    4

Optionally for this function we can add a vector. So for example, if you want to see 5 random 7-digit phone numbers out of all possible phone numbers, you could type code like this. Here we’re defining a vector of digits that goes from 0 to 9 rather than 1 through 10. So these four lines of code generate all phone numbers, picks 5 at random, and then shows to you.

all_phone_numbers <- permutations(10, 7, v = 0:9)
n <- nrow(all_phone_numbers)
index <- sample(n ,5)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    8    1    7    3    0    6    5
## [2,]    7    1    2    4    6    8    9
## [3,]    8    6    1    9    0    4    5
## [4,]    3    0    4    8    2    6    9
## [5,]    5    9    8    3    2    0    4

To compute all possible ways that we can choose 2 cards when the order matters, we simply type the following piece of code. Here we use permutations. There’s 52 cards, we’re going to choose 2, and we’re going to select them out of the vector that includes our card names, which we called deck earlier. This is going to be a matrix with 2 dimensions, 2 columns, and in this case, each column is going to have 2,652 rows. Those are all the permutations.

Now, we’re going to define the first card and the second card by grabbing the first and second columns using this simple piece of code.

hands <- permutations(52, 2, v = deck)
first_card <- hands[,1]
second_card <- hands[,2]
sum(first_card %in% kings)
## [1] 204

And now we can, for example, check how many cases have a first card that is a king– that’s 204.

And now to find the conditional probability, we ask what fraction of these 204 have also a king in the second card. So this case we type the following piece of code.

( king_second <- sum(first_card %in% kings & second_card %in% kings) / sum(first_card %in% kings) )
## [1] 0.05882353

We add all the cases that have king in the first, king in the second, and divide by the cases that have a king in the first. And now we get the answer 0.0588235, which is exactly 3 out of 51, which we had already deduced.

Note that the code we just saw is equivalent to this piece of code where we compute the proportions instead of the totals.

mean(first_card %in% kings & 
       second_card %in% kings) / 
  mean(first_card %in% kings)
## [1] 0.05882353

This is in R version of the multiplication rule, which tells us that probably of P, given A, is equal to proportion of A and B, or the probability of A and B, divided by the proportion of \(A\) or the probability of \(A\).

\[\Pr(B \mid A)=\frac{\Pr(A\ and\ B)}{\Pr(A)}\]

Now, what if the order does not matter? For example, in blackjack, if you get an ace and a face card or a 10, it’s called a natural 21, and you win automatically. If we want to compute the probability of this happening, we want to enumerate the combinations, not permutations, since the order doesn’t matter.

So if we get an \(A\) and a king, king and an \(A\), it’s still a 21. We don’t want to count that twice. So notice the difference between the permutations functions, which lists all permutations,

permutations(3, 2)
##      [,1] [,2]
## [1,]    1    2
## [2,]    1    3
## [3,]    2    1
## [4,]    2    3
## [5,]    3    1
## [6,]    3    2

and the combination function, where order does not matter.

##      [,1] [,2]
## [1,]    1    2
## [2,]    1    3
## [3,]    2    3

This means that 2, 1 doesn’t appear because 1, 2 already appeared. Similarly, 3, 1 and 3, 2 don’t appear.

So to compute the probability of a natural 21 in blackjack, we can do this. We can define a vector that includes all the aces, a vector that includes all the face cards, then we generate all the combinations of picking 2 cards out of 52, and then we simply count.

aces <- paste("Ace", suits)

facecard <- c("King", "Queen", "Jack", "Ten")
facecard <- expand.grid(number=facecard, suit=suits)
facecard <- paste( facecard$number, facecard$suit)

hands <- combinations(52, 2, v = deck)

How often do we get aces and a face card?

( mene <- mean(hands[,1] %in% aces & hands[,2] %in% facecard) )
## [1] 0.04826546

The answer is 0.0482655.

The Birthday Problem

Suppose you’re in a classroom with 50 people. If we assume this is a randomly selected group, what is the chance that at least two people have the same birthday? Although it is somewhat advanced, we can actually deduce this mathematically, and we do this later, but now, we’re going to use Monte Carlo simulations. For simplicity, we assumed that nobody was born on February 29th. This actually doesn’t change the answer much. All right, first, note that birthdays can be represented as numbers between 1 and 365.

So a sample of 50 random birthdays can be obtained simply using the sample function, like this.

n <- 50
bdays <- sample(1:365, n, replace = TRUE)

To check if, in this particular set of 50 people we have at least two with the same birthday, we can use the function duplicated, which returns true whenever an element of a vector has already appeared in that vector.

So here’s an example. If I type duplicated 1, 2, 3, 1, 4, 3, 5,


we get true’s for the 1 and the 3 the second time they appear.

So to check if two birthdays were the same, we simply use any and duplicated functions, like this.

## [1] TRUE

And in the case that we just generated, we actually do have this happening. We did have two people, at least two people, with the same birthday. We get true.

Now, to estimate the probability, we’re going to repeat this experiment. We’re going to run a Monte Carlo simulation over and over again. So what we do is we do it 10,000 times.

We use the replicate function like this.

B <- 10000
results <- replicate(B, {
  bdays <- sample(1:365, n, replace = TRUE)

## [1] 0.9689

And when we’re done, we get that the probability of having two people, at least two people, with the same birthday in a group of 50 people is about 96%. Were you expecting this to be so high? People tend to underestimate these probabilities, so it might be an opportunity for you to bet and make some money off of your friends. To get an intuition as to why this is so high, think of what happens as you get close to 365 people. At this stage, we run out of days, and the probability is 1. In the next video, we’re going to actually compute this probability for different group sizes and see how fast it gets close to 1.


Say you want to use what you’ve just learned about the birthday problem to bet with friends about two people having the same birthday in a group of people. When are the chances larger than 50%? Larger than 75%? Let’s create a lookup table. We can quickly create a function to compute this for any group. We write the function like this. We’ll call it compute_prob, and we’ll basically make the calculations for the probability of two people having the same birthday.

compute_prob <- function(n, B=10000){
  same_day <- replicate(B, {
    bdays <- sample(1:365, n, replace=TRUE)

Well, we use a small Monte Carlo simulation to do it. Now that we’ve done this, we want to compute this function, we want to apply this function to several values of n, let’s say from 1 to 60. Let’s define n as a sequence starting at 1 and ending at 60. Now, we can use a for loop to apply this function to each value in n, but it turns out that for loops are rarely the preferred approach in R.

In general, we try to perform operations on entire vectors. Arithmetic operations, for example, operate on vectors in an element wise fashion. We saw this when we learned about R. So if we type x equals 1 through 10, now X is the vector starting at 1 and ending at 10, and we compute the square root of x, it actually computes the square root for each element.

x <- 1:10
##  [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751
##  [8] 2.828427 3.000000 3.162278

Equally, if we define y to be 1 through 10, and then multiply x by y, it multiplies each element 1 by 1– 1 times 1, 2 times 2, et cetera.

y <- 1:10
x * y
##  [1]   1   4   9  16  25  36  49  64  81 100

So there’s really no need for for loops. But not all functions work this way. You can’t just send a vector to any function in R. For example, the function we just wrote does not work element-wise since it’s expecting a scalar, it’s expecting an end.

This piece of code does not do what we want. If we type compute prob and send it the vector n, we will not get what we want. We will get just one number.

n <- seq(1, 60)
## [1] 0

What we can do instead is use the function sapply. sapply permits us to perform element-wise operations on any function. Here’s how it works. We’ll define a simple example for the vector 1 through 10. If we want to apply the square roots of each element of x, we can simply type sapply x comma square root, and it’ll apply square root to each element of x.

x <- 1:10
sapply(x, sqrt)
##  [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751
##  [8] 2.828427 3.000000 3.162278

We don’t need to do this because square root already does that, but we are using it as a simple example. So for our case, we can simply type prob equals sapply n– n is our vector–

prob <- sapply(n, compute_prob)
plot(n, prob)

and then the function we define compute prob. And this will assign to each element of prob the probability of two people having the same birthday for that end. And now we can very quickly make a plot. We plot the probability of two people having the same birthday against the size of the group.

Now, let’s compute the exact probabilities rather than use Monte Carlo simulations. The function we just define uses a Monte Carlo simulation, but we can use what we’ve learned about probability theory to compute the exact value. Not only do we get the exact answer using math, but the computations are much faster since we don’t have to generate experiments.

We simply get a number. To make the math simpler for this particular problem, instead of computing the probability of it happening, we’ll compute the probability of it not happening, and then we can use a multiplication rule. Let’s start with the first person. The probability that person 1 has a unique birthday is 1, of course.

Now let’s move on to the second one. The probability that the second person has a unique birthday given that person 1 already took one other day is 364 divided by 365. Then for a person 3, given that the first two people already have unique birthdays, that leaves 363. So now that probability is 363 divided by 365.

If we continue this way and find the chances of all, say, 50 people having unique birthdays, we would multiply 1 times 364 divided by 365, times 363 divided by 365, dot dot dot, all the way to the 50th element.

Here’s the equation: \[1\times\frac{364}{365}\times\frac{363}{365}\cdots\frac{365-n+1}{365}\]

Now, we can easily write a function that does this. This time we’ll call it exact_prob. It takes n as a argument, and it computes this probability using the simple code. Now we can compute each probably for each n using sapply again, like this.

exact_prob <- function(n){
  prob_unique <- seq(365, 365-n+1) / 365
  1 - prod(prob_unique)
eprob <- sapply(n, exact_prob)

And if we plot it, we can see that the Monte Carlo simulations were almost exactly right.

plot(n, prob)
lines(n, eprob, col="red")

They were almost exact approximations of the actual value. Now, notice had it not been possible to compute the exact probabilities, something that sometimes happens, we would have still been able to accurately estimate the probabilities using Monte Carlo.

How many Monte Carlo experiments are enough?

In practice, we won’t know what the answer is, so we don’t know if our Monte Carlo estimate is accurate. We know that the larger the number of experiments, we’ve been using the letter B to represent that, the better the approximation. But how big do we need it to be? This is actually a challenging question, and answering it often requires advanced theoretical statistics training. One practical approach we will describe here is to check for the stability of the estimate.

Here’s an example with the birthday problem. We’re going to use n equals 22. There’s 22 people. So we’re going to run a simulation where we compute or estimate the probability of two people having a certain birthday using different sizes of the Monte Carlo simulations. So the value of b going to go from 10, to 20, to 40, to 100, et cetera. We compute the simulation, and now we look at the values that we get for each simulation.

B <- 10^seq(1, 5, len=100)
compute_prob <- function(B, n=22){
  same_day <- replicate(B, {
    bdays <- sample(1:365, n, replace = TRUE)

Remember, each simulation has a different B, a different number of experiments. When we see this graph, we can see that it’s wiggling up and down.

prob <- sapply(B, compute_prob)
plot(log10(B), prob, type = "l")

That’s because the estimate is not stable yet. It’s not such a great estimate. But as b gets bigger and bigger, eventually it starts to stabilize. And that’s when we start getting a feeling for the fact that now perhaps we have a large enough number of experiments.

The Addition Rule

The additional rule tells us that the probability of A or B, we’re going to have to make this calculation now, because you can get to 21 in two ways. You can get either a face card and then an ace. Or you can get an ace and then a face card. The additional rule tells us the probability of A or B is the probably of A plus the probably a B minus the probability of A and B.

To understand why this makes sense, think of a Venn diagram. If we’re computing the probability of this whole thing happening, A or B, we can add the blue circle plus the pink circle, and then subtract the middle because we have added it twice by adding the blue plus the pink.

venn.plot <- draw.pairwise.venn(area1      = 15,
                                area2      = 15,
                                cross.area = 5,
                                lty = "blank",
                                cex = 0,
                                fill = c("pink", "lightblue"),
                                category   = c("A", "B"))

\[\Pr(A\ or\ B)=\Pr(A)+\Pr(B)-\Pr(A\ and\ B)\] So now let’s apply it to the natural 21. In the case of natural 21, the intersection is empty. Since both hands can’t happen, you can’t have both an ace and then a face card, and then at the same time have a face card and then an ace. Those two things can’t happen at the same time. So this will be a very easy application of the addition rule. The probably of an ace followed by a face card we know is \(\frac{1}{13}\times\frac{16}{51}\). And the probability of a face card followed by an ace is \(\frac{16}{52}\times\frac{4}{51}\). These are actually the same, which makes sense to the symmetry. These two values are actually the same. In any case, we get the same result that we got before for the natural 21, which is about 0.05.

{r echo=FALSE} blogdown::shortcode('tweet', '852205086956818432')