Cumulative distribution function

Michael Taylor


# father.son is a dataset in the UsingR package
data(father.son, package="UsingR")
x <- father.son$fheight
round(sample(x, 10), 1)
##  [1] 63.3 70.2 64.3 71.5 68.0 67.1 64.7 68.9 65.9 69.7

To define a distribution, we compute, for every possible value of \(a\), the proportion of numbers in our list that are less than or equal to \(a\). We use the following notation:

\[F(a)\equiv Pr(x\leq a)\]
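
For a list of \(n\) numbers \(x_1,\dots,x_n\), this proportion can be written out explicitly. The indicator notation \(\mathbf{1}(\cdot)\), equal to 1 when its condition holds and 0 otherwise, is introduced here only for illustration:

\[F(a) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(x_i \leq a)\]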

This is called the cumulative distribution function (CDF). When the CDF is derived from data rather than from theory, we also call it the empirical CDF (ECDF). We can plot \(F(a)\) versus \(a\) like this:

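# Grid of 300 values of a spanning the observed heights
smallest <- floor( min(x) )
largest <- ceiling( max(x) )
values <- seq(smallest, largest, length.out=300)
# ecdf() returns a function that evaluates F(a)
heightecdf <- ecdf(x)
plot(values, heightecdf(values), type="l",
     xlab="a (Height in inches)", ylab="Pr(x <= a)")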
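
To make the definition concrete, the sketch below computes \(F(a)\) by hand as a proportion and compares it with what `ecdf()` returns. It uses only the ten sampled heights printed earlier as toy data, not the full list, and `prop_at_most` is a hypothetical helper name chosen for this illustration.

```r
# Toy data: the ten sampled heights shown earlier (inches)
heights <- c(63.3, 70.2, 64.3, 71.5, 68.0, 67.1, 64.7, 68.9, 65.9, 69.7)

# F(a) computed directly: the proportion of values <= a
# (prop_at_most is a hypothetical helper, not part of base R)
prop_at_most <- function(v, a) mean(v <= a)

# The same quantity from ecdf(), which returns a function of a
toy_ecdf <- ecdf(heights)

a <- 68
by_hand <- prop_at_most(heights, a)
from_ecdf <- toy_ecdf(a)
print(c(by_hand = by_hand, from_ecdf = from_ecdf))
```

Six of the ten values are at most 68 inches, so both entries are 0.6. Because the comparison is `<=`, the observation equal to 68.0 is counted, which matches the \(\leq\) in the definition of \(F(a)\).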