# Random Variables, Sampling Models, and the Central Limit Theorem

## Random Variables

• Random variables are numeric outcomes resulting from a random process.

We can easily generate random variables using some of the simple examples we have shown, such as the urn with red and blue beads. For example, define X to be 1 if a bead is blue and 0 otherwise. Here's the R code you can write to generate that random variable. X is a random variable.

beads <- rep(c("red", "blue"), c(2,3))
X <- ifelse(sample(beads, 1) == "blue", 1, 0)

Every time we select a new bead, the outcome changes randomly. Sometimes it's 1, sometimes it's 0. Here are some examples of how that random variable changes. We are going to do it three times.

replicate(3, {
  ifelse(sample(beads, 1) == "blue", 1, 0)
})
## [1] 1 1 1

In data science, we often deal with data that is affected by chance in some way: the data comes from a random sample, the data is affected by measurement error, or the data measures some outcome that is random in nature. Being able to quantify the uncertainty introduced by randomness is one of the most important jobs of a data scientist. Statistical inference offers a framework for doing this, as well as several practical tools. The first step is to learn how to mathematically describe random variables. We start with games of chance as an illustrative example. Randomized experiments can also be modeled by draws from an urn: because individuals are assigned to groups at random, the group assignments can be treated as random draws.

## Sampling Models

In epidemiological studies, we often assume that the subjects in our study are a random sample from the population of interest. The data related to a specific outcome can be modeled as a random sample from an urn containing the values for those outcomes for the entire population of interest. Similarly, in experimental research, we often assume that the individual organisms we are studying– for example, worms, flies, or mice– are a random sample from a larger population.

We're going to start by assuming we have been hired by a small casino to determine whether they can make money by installing a roulette wheel. We're going to define a random variable, capital S, that will represent the casino's total winnings. Let's start by constructing the urn we use for our sampling model. A roulette wheel has 18 red pockets, 18 black pockets, and 2 green ones. So betting on a color in one game of roulette is equivalent to drawing from this urn.

Let’s write some code. There’s 18 black, 18 red, and 2 green.

color <- rep(c('Red', 'Black', 'Green'), c(18, 18, 2))

The 1,000 outcomes from 1,000 people playing are independent draws from this urn. If red comes up, the gambler wins, and the casino loses 1 dollar, so we draw a negative 1. Otherwise, the casino wins 1 dollar, and we draw a 1. We can code 1,000 independent draws using the following code.

n <- 1000
X <- sample(ifelse(color == 'Red', -1, 1), n, replace = TRUE)

Here are the first 10 outcomes of these 1,000 draws.

X[1:10]
##  [1] -1  1  1 -1  1  1  1 -1 -1  1
sum(X)
## [1] 52

Because we know the proportions of 1's and -1's inside the urn, we can generate the draws with one line of code, without defining color. Here's that line of code. We call this approach a sampling model, since we are modeling the random behavior of roulette with the sampling of draws from an urn.

# sampling model
X <- sample(c(-1, 1), n, replace = TRUE, prob = c(9/19, 10/19))

The total winnings, capital S, is simply the sum of these 1,000 independent draws. Here's the code that generates an example of S.

(S <- sum(X))
## [1] 58

If you run that code over and over again, you see that S changes every time. This is, of course, because S is a random variable.

n <- 1000
replicate(5, {
  X <- sample(c(-1, 1), n, replace = TRUE, prob = c(9/19, 10/19))
  sum(X)
})
## [1] 78 78 88 28 68

A very important and useful concept is the probability distribution of a random variable. The probability distribution of a random variable tells us the probability of the observed value falling in any given interval. So, for example, if we want to know the probability that we lose money, we're asking, what is the probability that S is in the interval $$S\leq 0$$?

Note that if we can define a cumulative distribution function– let's call it $$F(a)=Pr(S\leq a)$$ – then we'll be able to answer any question related to the probability of events defined by the random variable S, including the event $$S\leq 0$$. We call $$F$$ the random variable's distribution function. We can estimate the distribution function for the random variable S by using a Monte Carlo simulation to generate many, many realizations of the random variable. With the code we're about to write, we run the experiment of having 1,000 people play roulette over and over. Specifically, we're going to repeat this experiment 10,000 times. Here's the code.

n <- 1000
B <- 10000
S <- replicate(B, {
  X <- sample(c(-1, 1), n, replace = TRUE, prob = c(9/19, 10/19))
  sum(X)
})

So now we're going to ask, in our simulation, how often did we get sums smaller than or equal to $$a$$? We can get the answer by using this simple code: mean(S <= a).

This will be a very good approximation of $$F(a)$$, our distribution function. In fact, we can visualize the distribution by creating a histogram showing the probability $$F(b)-F(a)$$ for several intervals $$(a, b]$$. Here it is.

library(dplyr)
library(ggplot2)
ggplot(as.data.frame(S), aes(S)) +
  geom_histogram(aes(y = ..density..),
                 color = "black",
                 binwidth = 10) +
  ylab("Probability")

Now we can easily answer the casino's question: how likely is it that we lose money? We simply ask, how often, out of the 10,000 simulations, was S smaller than or equal to 0? And the answer is, it was only about 5% of the time.

mean(S <= 0)
## [1] 0.0529

So it's quite low. From the histogram, we also see that the distribution appears to be approximately normal. If you make a Q-Q plot, you'll confirm that the normal approximation is close to perfect.

ggplot(as.data.frame(S), aes(sample = S)) +
  stat_qq() +
  stat_qq_line()

If, in fact, the distribution is normal, then all we need to define is the distribution’s average and standard deviation.

Because we have the original values from which the distribution is created, we can easily compute these. The average is 52.7186, and the standard deviation is 31.5898. If we add a normal density with this average and standard deviation to the histogram we created earlier, we see that it matches very well.

s <- seq(min(S), max(S), length.out = 100)
normal_density <- data.frame(s = s, f = dnorm(s, mean(S), sd(S)))

ggplot(data = normal_density, aes(s, f)) +
  geom_line(color = "blue")

data.frame(S = S) %>% ggplot(aes(x = S, y = ..density..)) +
  geom_histogram(color = "black", binwidth = 10) +
  geom_line(data = normal_density, aes(s, f), color = "blue") +
  ylab("Probability")

This average and this standard deviation have special names. They are referred to as the expected value and the standard error of the random variable S.

We will say more about this in the next section. It actually turns out that statistical theory provides a way to derive the distribution of a random variable defined as independent draws from an urn. Specifically, in our example, we can show that (S + n) / 2 follows what is known as a binomial distribution. We therefore do not need to run Monte Carlo simulations, nor use the normal approximation, to know the probability distribution of S.

We did this for illustrative purposes. For the details of the binomial distribution, you can consult any probability book, or even Wikipedia. However, we will discuss an incredibly useful approximation provided by mathematical theory that applies generally to sums and averages of draws from any urn: the central limit theorem.
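As a quick check (my own addition, not in the original), the binomial distribution gives the exact answer: S is less than or equal to 0 exactly when the number of casino wins, $$(S+n)/2$$, is at most $$n/2$$.

```r
# Exact probability that the casino loses money, using the binomial distribution.
# With n = 1000 bets, each won by the casino with probability 10/19,
# S <= 0 is equivalent to (S + n) / 2 <= n / 2 wins.
n <- 1000
pbinom(n / 2, size = n, prob = 10 / 19)  # about 0.05, matching the simulation
```

No Monte Carlo, no normal approximation: this is the exact probability.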

## Central Limit Theorem

The Central Limit Theorem, or CLT for short, tells us that when the number of independent draws– also called the sample size– is large, the probability distribution of the sum of these draws is approximately normal. Because sampling models are used for so many data generation processes, the CLT is considered one of the most important mathematical insights in history.

If we know that the distribution of a list of numbers is approximated by the normal distribution, all we need to describe the list are the average and the standard deviation. The same applies to probability distributions: if a random variable has a probability distribution that is approximated by the normal distribution, then all we need to describe it are the average and the standard deviation, referred to as the expected value and the standard error.
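To make the theorem concrete, here is a quick illustration of my own (not from the original text): even for a lopsided urn, the sum of many draws is approximately normal, with an average and a standard deviation we can predict.

```r
# A sketch illustrating the CLT with a lopsided urn: 90% zeros, 10% ones.
set.seed(1)
urn <- c(rep(0, 9), 1)
S <- replicate(10000, sum(sample(urn, 1000, replace = TRUE)))
mean(S)  # close to the expected value: 1000 * 0.1 = 100
sd(S)    # close to the standard error: sqrt(1000) * 0.3, about 9.5
```

A histogram or Q-Q plot of this S, like the ones above, would show the same bell shape, even though the urn contains mostly zeros.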

We have described sampling models for draws. We will now go over the mathematical theory that lets us approximate the probability distribution for the sum of draws. Once we do this, we will be able to help the casino predict how much money they will make. The same approach we use for the sum of draws will be useful for describing the distribution of averages and proportions, which we will need to understand, for example, how polls work. The first important concept to learn is the expected value. In statistics books, it is common to use the letter capital E, like this, $$E[X]=\mu$$, to denote that the expected value of the random variable $$X$$ is $$\mu$$. $$\mu$$ is the Greek letter for m, the first letter in the word mean, which is a synonym for average.

A random variable will vary around its expected value in such a way that if you take the average of many, many draws, the average of the draws will approximate the expected value, getting closer the more draws you take. A useful formula is that the expected value of the random variable defined by one draw is the average of the numbers in the urn. For example, in the urn we used to model betting on red in roulette, we have twenty 1's and eighteen -1's, so the expected value is $$E[X]=(20-18)/38=1/19$$, which is about $0.05. It is a bit counterintuitive to say that X varies around 0.05 when the only values it takes are 1 and -1. An intuitive way to think about the expected value is that if we play the game over and over, the casino wins, on average, $0.05 per game. Our Monte Carlo simulation confirms this.

B <- 10^6
X <- sample(c(-1, 1), B, replace = TRUE, prob = c(9/19, 10/19))
mean(X)
## [1] 0.05249

Here we run a million games and see that the mean of X, which is a bunch of -1's and 1's, is about $0.0525. In general, if the urn has just two possible outcomes, say $$a$$ and $$b$$, with proportions $$p$$ and $$1-p$$ respectively, the average is:

$ap + b(1 - p)$

The reason we define the expected value is that this mathematical definition turns out to be useful for approximating the probability distribution of sums, which in turn is useful for describing the distribution of averages and proportions. The first useful fact is that the expected value of the sum of draws is:

$\text{number of draws} \times \text{the average of the numbers in the urn}$

So if 1,000 people play roulette, the casino expects to win, on average, about 1,000 times $0.05, which is about $50. But this is an expected value. How different can one observation be from the expected value? The casino really needs to know this. What is the range of possibilities? If negative numbers are too likely, they may not install the roulette wheels. Statistical theory once again answers this question. The standard error, or SE for short, gives us an idea of the size of the variation around the expected value. In statistics books, it is common to use $$SE[X]$$ to denote the standard error of the random variable $$X$$. If our draws are independent– an important assumption– then the standard error of the sum is given by the equation:

$\sqrt{\text{number of draws}}\times \text{the standard deviation of the numbers in the urn}$

Using the definition of standard deviation, we can derive, with a bit of math, that if an urn contains two values, $$a$$ and $$b$$, with proportions $$p$$ and $$1-p$$ respectively, the standard deviation is:

$\text{standard deviation}=|b-a|\sqrt{p(1-p)}$

So in our roulette example, the standard deviation of the values inside the urn is $|1-(-1)|\sqrt{10/19 \times 9/19}$:

abs(1 - -1) * sqrt((10/19) * (9/19))
## [1] 0.9986

practically 1. The standard error tells us the typical difference between a random variable and its expectation. Because one draw is trivially the sum of one draw, we can use the formulas above to see that the random variable defined by one draw has an expected value of $0.05 and a standard error of about 1. This makes sense since we either get a 1 or a -1, with 1 slightly favored over -1. Using the formula, the sum of 1,000 plays has a standard error of about $32. So when 1,000 people bet on red, the casino is expected to win $50 with a standard error of $32. It therefore seems like a safe bet. But we still can't really answer the question: how likely is the casino to lose money? Here the Central Limit Theorem will help. The CLT tells us that the distribution of the sum S is approximated by a normal distribution.

n * (10 - 9) / 19
## [1] 52.63
sqrt(n) * 2 * sqrt(90) / 19
## [1] 31.58

Using the formulas, we know that the expected value and standard error are about $53 and $32, respectively. Note that the theoretical values match those obtained with the Monte Carlo simulation we ran earlier. Using the Central Limit Theorem, we can skip the Monte Carlo simulation and instead compute the probability of the casino losing money using the normal approximation. We write simple code using the pnorm function and get the answer.

mu <- n * (20 - 18) / 38
se <- sqrt(n) * 2 * sqrt(90) / 19
pnorm(0, mu, se)
## [1] 0.04779

It's about 5%, which is in very good agreement with the Monte Carlo simulation we ran.

## Averages and Proportions

### Mathematical properties

1. The expected value of the sum of random variables is the sum of the expected values of the individual random variables. If the draws all come from the same urn with average $$\mu$$, then: \begin{align*} E[X_1+X_2+\cdots +X_n]&=E[X_1]+E[X_2]+\cdots+E[X_n] \\ &=n\mu \end{align*} This is another way of writing the expected value of the sum of draws.

1. The expected value of a random variable times a non-random constant is the expected value times that non-random constant. $E[aX]=a\times E[X]$ A consequence of these two facts is that the expected value of the average of draws from the same urn is the expected value of the urn, call it $$\mu$$. \begin{align*}E[(X_1+X_2+\cdots +X_n)/n]&=E[X_1+X_2+\cdots+X_n]/n \\ &=n\mu/n \\ &=\mu\end{align*}

1. The square of the standard error of the sum of independent random variables is the sum of the squares of the standard errors of each random variable. $SE[X_1+X_2+\cdots +X_n]=\sqrt{SE[X_1]^2+SE[X_2]^2+\cdots+SE[X_n]^2}$ The standard error squared is known as the variance.

1. The standard error of a random variable times a non-random constant is the standard error times the absolute value of that non-random constant. $SE[aX]=|a|\times SE[X]$ A consequence of the previous two properties is that the standard error of the average of independent draws from the same urn is the standard deviation of the urn, $$\sigma$$, divided by the square root of $$n$$. \begin{align*} SE[(X_1+X_2+\cdots +X_n)/n]&=\sqrt{SE[X_1]^2+SE[X_2]^2+\cdots+SE[X_n]^2}/n \\ &=\sqrt{\sigma^2+\sigma^2+\cdots+\sigma^2}/n \\ &=\sqrt{n\sigma^2}/n \\ &=\sigma/\sqrt{n} \end{align*}

1. If $$X$$ is a normally distributed random variable and $$a$$ and $$b$$ are non-random constants, then $$aX + b$$ is also a normally distributed random variable.

#### Exercise 1. American Roulette probabilities

An American roulette wheel has 18 red, 18 black, and 2 green pockets. Each red and black pocket is associated with a number from 1 to 36. The two remaining green slots feature "0" and "00". Players place bets on which pocket they think a ball will land in after the wheel is spun. Players can bet on a specific number (0, 00, 1-36) or color (red, black, or green).

What are the chances that the ball lands in a green pocket?

# The variables green, black, and red contain the number of pockets for each color
green <- 2
black <- 18
red <- 18

# Assign a variable p_green as the probability of the ball landing in a green pocket
p_green <- green / (green + black + red)

# Print the variable p_green to the console
p_green
## [1] 0.05263

#### Exercise 2. American Roulette payout

In American roulette, the payout for winning on green is $17. This means that if you bet $1 and it lands on green, you get $17 as a prize.

Create a model to predict your winnings from betting on green.

# Use the set.seed function to make sure your answer matches the expected result after random sampling.
set.seed(1)

# Assign a variable p_not_green as the probability of the ball not landing in a green pocket
p_not_green <- 1 - p_green

#Create a model to predict the random variable X, your winnings from betting on green.
X <- sample(c(17, -1), 1, prob = c(p_green, p_not_green))

# Print the value of X to the console
X
## [1] -1

#### Exercise 3. American Roulette expected value

In American roulette, the payout for winning on green is $17. This means that if you bet $1 and it lands on green, you get $17 as a prize. In the previous exercise, you created a model to predict your winnings from betting on green.

Now, compute the expected value of X, the random variable you generated previously.

$ap + b(1 - p)$

Here $$a=17$$ and $$b=-1$$, with proportions $$p$$ equal to p_green and $$1-p$$ equal to p_not_green, respectively.

# Calculate the expected outcome if you win $17 if the ball lands on green and you lose $1 if the ball doesn't land on green
p_green * 17 + -1 * p_not_green
## [1] -0.05263

#### Exercise 4. American Roulette standard error

The standard error of a random variable X tells us the typical difference between a random variable and its expected value. You calculated a random variable X in exercise 2 and the expected value of that random variable in exercise 3.

Now, compute the standard error of that random variable, which represents a single outcome after one spin of the roulette wheel.

# Compute the standard error of the random variable
abs(17 - -1) * sqrt(p_green * (1 - p_green))
## [1] 4.019

#### Exercise 5. American Roulette sum of winnings

You modeled the outcome of a single spin of the roulette wheel, X, in exercise 2.

Now create a random variable S that sums your winnings after betting on green 1,000 times.

# Use the set.seed function to make sure your answer matches the expected result after random sampling
set.seed(1)

# Define the number of bets using the variable 'n'
n <- 1000

# Create a vector called 'X' that contains the outcomes of 1000 samples
X <- sample(c(17, -1), size = n, replace = TRUE, prob = c(p_green, p_not_green))

# Assign the sum of all 1000 outcomes to the variable 'S'
S <- sum(X)

# Print the value of 'S' to the console
S
## [1] -10

#### Exercise 6. American Roulette winnings expected value

In the previous exercise, you generated a vector of random outcomes, S, after betting on green 1,000 times.

What is the expected value of S?

# The variables 'green', 'black', and 'red' contain the number of pockets for each color
green <- 2
black <- 18
red <- 18

# Assign a variable p_green as the probability of the ball landing in a green pocket
p_green <- green / (green + black + red)

# Assign a variable p_not_green as the probability of the ball not landing in a green pocket
p_not_green <- 1 - p_green

# Define the number of bets using the variable 'n'
n <- 1000

# Calculate the expected outcome of 1,000 spins if you win $17 when the ball lands on green and you lose $1 when the ball doesn't land on green
n * (p_green * 17 + p_not_green * -1)
## [1] -52.63

#### Exercise 7. American Roulette winnings standard error

You generated the expected value of S, the outcomes of 1,000 bets that the ball lands in the green pocket, in the previous exercise.

What is the standard error of S?

# The variables 'green', 'black', and 'red' contain the number of pockets for each color
green <- 2
black <- 18
red <- 18

# Assign a variable p_green as the probability of the ball landing in a green pocket
p_green <- green / (green + black + red)

# Assign a variable p_not_green as the probability of the ball not landing in a green pocket
p_not_green <- 1 - p_green

# Define the number of bets using the variable 'n'
n <- 1000

# Compute the standard error of the sum of 1,000 outcomes
sqrt(n) * abs(17 - -1) * sqrt(p_green * (1 - p_green))
## [1] 127.1

### The Central Limit Theorem

#### Exercise 1. American Roulette probability of winning money

The exercises in the previous chapter explored winnings in American roulette. In this chapter of exercises, we will continue with the roulette example and add in the Central Limit Theorem.

In the previous chapter of exercises, you created a random variable S that is the sum of your winnings after betting on green a number of times in American Roulette.

What is the probability that you end up winning money if you bet on green 100 times?

# Assign a variable p_green as the probability of the ball landing in a green pocket
p_green <- 2 / 38

# Assign a variable p_not_green as the probability of the ball not landing in a green pocket
p_not_green <- 1 - p_green

# Define the number of bets using the variable 'n'
n <- 100

# Calculate 'avg', the expected outcome of 100 spins if you win $17 when the ball lands on green and you lose $1 when the ball doesn't land on green
(avg <- n * (17 * p_green + -1 * p_not_green))
## [1] -5.263

# Compute 'se', the standard error of the sum of 100 outcomes
(se <- sqrt(n) * (17 - -1) * sqrt(p_green * p_not_green))
## [1] 40.19

# Using the expected value 'avg' and standard error 'se', compute the probability that you win money betting on green 100 times
1 - pnorm(0, avg, se)
## [1] 0.4479

#### Exercise 2. American Roulette Monte Carlo simulation

Create a Monte Carlo simulation that generates 10,000 outcomes of S, the sum of 100 bets. Compute the average and standard deviation of the resulting list and compare them to the expected value -5.2632 and standard error 40.1934 for S that you calculated previously.

# The variable B specifies the number of times we want the simulation to run. Let's run the Monte Carlo simulation 10,000 times.
B <- 10000

# Use the set.seed function to make sure your answer matches the expected result after random sampling.
set.seed(1)

# Create an object called S that replicates the sample code for B iterations and sums the outcomes.
S <- replicate(B, {
  sum(sample(c(-1, 17), n, replace = TRUE, prob = c(p_not_green, p_green)))
})

# Compute the average value for 'S'
mean(S)
## [1] -5.909

# Calculate the standard deviation of 'S'
sd(S)
## [1] 40.31

#### Exercise 3. American Roulette Monte Carlo vs CLT

In this chapter, you calculated the probability of winning money in American roulette using the CLT. Now, calculate the probability of winning money from the Monte Carlo simulation.

The Monte Carlo simulation from the previous exercise has already been pre-run for you, resulting in the variable S that contains a list of 10,000 simulated outcomes.

# Calculate the proportion of outcomes in the vector S that exceed $0
mean(S > 0)
## [1] 0.4232

#### Exercise 4. American Roulette Monte Carlo vs CLT comparison

The Monte Carlo result and the CLT approximation for the probability of losing money after 100 bets are close, but not that close. What could account for this?

The CLT does not work as well when the probability of success is small.

#### Exercise 5. American Roulette average winnings per bet

Now create a random variable Y that contains your average winnings per bet after betting on green 10,000 times.

# Use the set.seed function to make sure your answer matches the expected result after random sampling.
set.seed(1)

# Define the number of bets using the variable 'n'
n <- 10000

# Assign a variable p_green as the probability of the ball landing in a green pocket
p_green <- 2 / 38

# Assign a variable p_not_green as the probability of the ball not landing in a green pocket
p_not_green <- 1 - p_green

# Create a vector called X that contains the outcomes of n bets
X <- sample(c(-1, 17), n, replace=TRUE, prob=c(p_not_green, p_green) )
# Define a variable Y that contains the mean outcome per bet. Print this mean to the console.
(Y <- mean(X))
## [1] 0.008

#### Exercise 6. American Roulette per bet expected value

What is the expected value of Y, the average outcome per bet after betting on green 10,000 times?

# Calculate the expected outcome of Y, the mean outcome per bet in 10,000 bets
17 * p_green + -1 * p_not_green
## [1] -0.05263

#### Exercise 7. American Roulette per bet standard error

What is the standard error of Y, the average result of 10,000 spins?

# Define the number of bets using the variable 'n'
n <- 10000

# Assign a variable p_green as the probability of the ball landing in a green pocket
p_green <- 2 / 38

# Assign a variable p_not_green as the probability of the ball not landing in a green pocket
p_not_green <- 1 - p_green

# Compute the standard error of 'Y', the mean outcome per bet from 10,000 bets.
(abs(-1 - 17) * sqrt(p_green*p_not_green))/sqrt(n)
## [1] 0.04019

#### Exercise 8. American Roulette winnings per game are positive

What is the probability that your winnings are positive after betting on green 10,000 times?

# We defined the average using the following code
avg <- 17*p_green + -1*p_not_green

# We defined standard error using this equation
se <- 1/sqrt(n) * (17 - -1)*sqrt(p_green*p_not_green)

# Given this average and standard error, determine the probability of winning more than $0. Print the result to the console.
1 - pnorm(0, avg, se)
## [1] 0.09519

#### Exercise 9. American Roulette Monte Carlo again

Create a Monte Carlo simulation that generates 10,000 outcomes of S, the average outcome from 10,000 bets on green.

Compute the average and standard deviation of the resulting list to confirm the results from previous exercises using the Central Limit Theorem.

# The variable n specifies the number of independent bets on green
n <- 10000

# The variable B specifies the number of times we want the simulation to run
B <- 10000

# Use the set.seed function to make sure your answer matches the expected result after random number generation
set.seed(1)

# Generate a vector S that contains the average outcomes of 10,000 bets modeled 10,000 times
S <- replicate(B, {
  X <- sample(c(17, -1), n, replace = TRUE, prob = c(p_green, p_not_green))
  mean(X)
})

# Compute the average of S
mean(S)
## [1] -0.05223

# Compute the standard deviation of S
sd(S)
## [1] 0.03996

#### Exercise 10. American Roulette comparison

In a previous exercise, you found the probability of winning more than $0 after betting on green 10,000 times using the Central Limit Theorem. Then, you used a Monte Carlo simulation to model the average result of betting on green 10,000 times over 10,000 simulated series of bets.

What is the probability of winning more than $0 as estimated by your Monte Carlo simulation? The code to generate the vector S that contains the average outcomes of 10,000 bets modeled 10,000 times has already been run for you.

# Compute the proportion of outcomes in the vector 'S' where you won more than $0
mean(S > 0)
## [1] 0.0977
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.1 LTS
##
## Matrix products: default
## BLAS: /home/michael/anaconda3/lib/R/lib/libRblas.so
## LAPACK: /home/michael/anaconda3/lib/R/lib/libRlapack.so
##
## locale:
##  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
##  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
##  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
## [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] ggplot2_3.0.0        dplyr_0.7.6          RevoUtils_11.0.1
## [4] RevoUtilsMath_11.0.0
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.18     pillar_1.3.0     compiler_3.5.1   plyr_1.8.4
##  [5] bindr_0.1.1      tools_3.5.1      digest_0.6.15    evaluate_0.11
##  [9] tibble_1.4.2     gtable_0.2.0     pkgconfig_2.0.1  rlang_0.2.1
## [13] rstudioapi_0.7   yaml_2.2.0       blogdown_0.9.8   xfun_0.4.11
## [17] bindrcpp_0.2.2   withr_2.1.2      stringr_1.3.1    knitr_1.20
## [21] rprojroot_1.3-2  grid_3.5.1       tidyselect_0.2.4 glue_1.3.0
## [25] R6_2.2.2         rmarkdown_1.10   bookdown_0.7     purrr_0.2.5
## [29] magrittr_1.5     backports_1.1.2  scales_0.5.0     codetools_0.2-15
## [33] htmltools_0.3.6  assertthat_0.2.0 colorspace_1.3-2 stringi_1.2.4
## [37] lazyeval_0.2.1   munsell_0.5.0    crayon_1.3.4

# References

R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, and Kara Woo. 2018. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2018. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.