4 Continuous Probability
In Section 1.3, we discussed why it is not practical to assign a probability to every possible numeric outcome, such as an exact height, since there are infinitely many possible values. The same idea extends to outcomes that take values on a continuous scale: each individual value has probability zero. Instead, we describe their behavior through probability density functions, which let us compute probabilities for intervals of values rather than single points.
In this chapter, we introduce the mathematical framework for continuous probability distributions and present several useful approximations that frequently appear in data analysis.
4.1 Cumulative distribution functions
We return to our example using the heights of adult male students, which we store in the vector x.
We previously defined the empirical cumulative distribution function (eCDF) as
F <- function(a) mean(x <= a)
which, for any value a, gives the proportion of values in the list x that are less than or equal to a.
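As a small illustration, the sketch below applies the same definition to a made-up vector and checks it against base R's ecdf function, which builds the same step function. The names toy_heights, F_toy, and F_builtin are hypothetical and the values are not the student data.

```r
# Hypothetical toy data, in inches; not the student heights used in the text
toy_heights <- c(64, 66, 67, 68, 69, 70, 70, 71, 72, 75)

# eCDF written as in the definition above
F_toy <- function(a) mean(toy_heights <= a)

F_toy(70)      # proportion of values less than or equal to 70
1 - F_toy(70)  # proportion of values greater than 70

# Base R's ecdf() returns a function that computes the same proportions
F_builtin <- ecdf(toy_heights)
all.equal(F_toy(70), F_builtin(70))
```

Seven of the ten toy values are at most 70, so both versions return 0.7.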
To connect the eCDF to probability, imagine randomly selecting one of the male students. What is the chance that he is taller than 70.5 inches? Because each student is equally likely to be chosen, this probability is simply the proportion of students taller than 70.5 inches. Using the eCDF, we can compute it as:
1 - F(70.5)
#> [1] 0.363
The cumulative distribution function (CDF) is the theoretical counterpart of the eCDF. Rather than relying on observed data, it assigns probabilities to ranges of values for a random outcome \(X\). Specifically, the CDF gives, for any number \(a\), the probability that \(X\) is less than or equal to \(a\):
\[ F(a) = \Pr(X \leq a) \]
Once the CDF is defined, we can compute the probability that \(X\) falls within any interval. For example, the probability that a student’s height is between \(a\) and \(b\) is:
\[ \Pr(a < X \leq b) = F(b) - F(a) \]
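The same identity holds for the eCDF, which gives a quick way to check it numerically. The sketch below uses simulated values (sim_x is a hypothetical stand-in for real data, and the mean and standard deviation are assumed) and compares the difference of two eCDF values with a direct count of the values in the interval.

```r
set.seed(2024)

# Hypothetical stand-in data; the mean and sd are assumed for illustration
sim_x <- rnorm(1000, mean = 69.3, sd = 3.6)

# eCDF of the simulated values
F_sim <- function(a) mean(sim_x <= a)

a <- 68
b <- 72

via_cdf <- F_sim(b) - F_sim(a)          # F(b) - F(a)
direct  <- mean(sim_x > a & sim_x <= b) # proportion with a < x <= b

all.equal(via_cdf, direct)
```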
Because we can determine the probability of any event from the CDF, it fully defines the probability distribution of a continuous outcome.
4.2 Probability density function
For most continuous distributions, we can describe the cumulative distribution function (CDF) in terms of another function, \(f(x)\), such that
\[ F(b) - F(a) = \int_a^b f(x)\,dx \]
This function \(f(x)\) is called the probability density function (PDF).
The PDF plays a role similar to the relative frequency distribution for discrete data. Instead of assigning probabilities to individual outcomes, which would all be zero for a continuous variable, the PDF describes how probability is distributed across values of \(x\). We can think of it as defining the shape of the distribution: ranges of values with more area under the curve are more likely.
To build intuition, imagine dividing the range of possible outcomes into many tiny intervals. The width of each interval forms the base of a rectangle, and \(f(x)\) determines its height. The total area of the rectangles between \(a\) and \(b\) approximates the probability of observing a value between \(a\) and \(b\). In the limit, as the intervals shrink, this approximation becomes the integral \(\int_a^b f(x)\,dx\).
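The rectangle argument can be tried numerically. The sketch below uses the standard normal density dnorm as an assumed example of \(f(x)\) (the normal distribution is introduced next), sums the areas of 1,000 thin rectangles between two arbitrary endpoints, and compares the result with the difference of CDF values and with R's numerical integrator.

```r
a <- 0
b <- 1

# Split [a, b] into many thin rectangles of equal width
n_rect    <- 1000
width     <- (b - a) / n_rect
midpoints <- a + (seq_len(n_rect) - 0.5) * width

# Height of each rectangle is the density at its midpoint
riemann <- sum(dnorm(midpoints) * width)

# Two reference values for the same probability
exact   <- pnorm(b) - pnorm(a)            # F(b) - F(a)
numeric <- integrate(dnorm, a, b)$value   # numerical integral of f

c(riemann = riemann, exact = exact, numeric = numeric)
```

Increasing n_rect makes the rectangles thinner and brings the sum closer to the integral.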

An important example is the normal distribution, whose probability density function is
\[ f(x) = \frac{1}{\sqrt{2\pi}\,s} \exp\left(-\frac{1}{2}\left(\frac{x - m}{s}\right)^2\right) \]
Integrating this function gives the CDF of the normal distribution. In R, the corresponding function is pnorm. A random outcome is said to be normally distributed with mean m and standard deviation s if its CDF is given by F(a) = pnorm(a, m, s).
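As a rough check of the relationship between the density and pnorm, the sketch below writes the normal density by hand, compares it with R's dnorm, and integrates it numerically up to a cutoff. The function name normal_pdf and the values of m and s are assumptions made for illustration.

```r
# Normal density written directly from the formula
normal_pdf <- function(x, m, s) {
  exp(-0.5 * ((x - m) / s)^2) / (sqrt(2 * pi) * s)
}

# Assumed illustrative values, in inches
m <- 69.3
s <- 3.6

# The hand-written density agrees with dnorm on a grid of heights
grid <- seq(55, 85, by = 0.5)
all.equal(normal_pdf(grid, m, s), dnorm(grid, m, s))

# Integrating the density from -Inf to 70 approximates the CDF at 70
cdf_by_integration <- integrate(normal_pdf, lower = -Inf, upper = 70,
                                m = m, s = s)$value
c(integrated = cdf_by_integration, pnorm = pnorm(70, m, s))
```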
This is particularly useful in practice. If we are willing to assume that a variable such as height follows a normal distribution, we can answer probability questions without needing the full dataset. For example, to find the probability that a randomly selected student is taller than 70 inches, we only need the sample mean and standard deviation:
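A minimal sketch of this calculation follows. The values of m and s are assumed placeholders. With the data at hand they would be computed as mean(x) and sd(x).

```r
# Assumed summary statistics, in inches (placeholders for mean(x) and sd(x))
m <- 69.3
s <- 3.6

# Probability of being taller than 70 inches under the normal approximation
p_taller <- 1 - pnorm(70, mean = m, sd = s)
p_taller

# Equivalent call that asks pnorm for the upper tail directly
pnorm(70, mean = m, sd = s, lower.tail = FALSE)
```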
4.3 Theoretical distributions as approximations
The normal distribution is defined mathematically, without relying on data. In practice, however, almost all the quantities we analyze come from discrete observations. For instance, our height data can be viewed as categorical, with each unique height representing a category and its probability given by its relative frequency.

While these reported values appear discrete, this discreteness arises from rounding. A few students reported exact metric conversions, such as 177 cm = 69.685 inches, while most rounded to the nearest inch. It is therefore more useful to treat height as a continuous variable, recognizing that no one is exactly 70 inches tall; the higher frequency at 70 simply reflects rounding.
In continuous distributions, individual points have probability zero. Instead, we work with intervals, asking questions such as: what is the probability that a height falls between 69.5 and 70.5 inches? For rounded data, this matches the natural interval corresponding to a single reported inch.
The normal distribution provides a convenient way to approximate these probabilities. For example:
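A minimal sketch of such a comparison is shown here, using simulated heights rounded to the nearest inch as a stand-in for the reported data. The values of m and s and the name reported are assumptions. Unlike the real data, every simulated value is rounded.

```r
set.seed(1)

# Assumed mean and sd, in inches; simulated heights rounded to the nearest inch
m <- 69.3
s <- 3.6
reported <- round(rnorm(10000, m, s))

# Interval aligned with the rounding: everyone who reports 70 inches
emp_aligned    <- mean(reported > 69.5 & reported <= 70.5)
approx_aligned <- pnorm(70.5, m, s) - pnorm(69.5, m, s)

# Narrower interval that contains no whole number of inches
emp_narrow    <- mean(reported > 70.1 & reported <= 70.9)
approx_narrow <- pnorm(70.9, m, s) - pnorm(70.1, m, s)

c(emp_aligned = emp_aligned, approx_aligned = approx_aligned)
c(emp_narrow = emp_narrow, approx_narrow = approx_narrow)
```

In this simulation no rounded value can fall strictly between 70.1 and 70.9, so the observed proportion for the narrow interval is zero while the normal model still assigns it positive probability.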
The approximation is close for intervals aligned with the rounding, but it deteriorates for narrower intervals that do not line up with the reported values. This discrepancy reflects discretization rather than a flaw in the normal model itself. As long as we are aware of this limitation, treating rounded data as continuous and using normal approximations remains an effective and practical approach.
4.4 Monte Carlo
R provides functions to generate normally distributed outcomes. Specifically, the rnorm function takes three arguments: the number of values to generate (n), the mean (mean, which defaults to 0), and the standard deviation (sd, which defaults to 1), and returns n random numbers. Here is an example of how we could generate data that looks like our reported heights:
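A minimal sketch, with assumed values for the sample size, mean, and standard deviation. With the data at hand these would be length(x), mean(x), and sd(x).

```r
set.seed(1)

# Assumed placeholders for length(x), mean(x), and sd(x)
n <- 800
m <- 69.3
s <- 3.6

# Generate n normally distributed heights
simulated_heights <- rnorm(n, mean = m, sd = s)

# The simulated sample has roughly the requested mean and sd
mean(simulated_heights)
sd(simulated_heights)
```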
Not surprisingly, the distribution looks normal:

This is one of the most useful functions in R because it lets us generate data that mimics natural variation and explore what outcomes might occur by chance through Monte Carlo simulations.
For instance, suppose we repeatedly sample 800 men at random and record the tallest person in each group. What does the distribution of these tallest heights look like? How rare is it to find a seven-footer among 800 men? The following Monte Carlo simulation helps us find out:
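One way to set up this simulation is sketched below. The number of repetitions B and the values of m and s are assumptions chosen for illustration, so results will vary with those choices and with the random seed.

```r
set.seed(1)

# Assumed settings: number of simulated groups and the normal parameters
B <- 10000
m <- 69.3
s <- 3.6

# For each repetition, simulate 800 heights and keep the tallest
tallest <- replicate(B, {
  simulated_data <- rnorm(800, m, s)
  max(simulated_data)
})
```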
Having a seven-footer is quite rare:
mean(tallest >= 7*12)
#> [1] 0.0188
Here is the resulting distribution:

Note that although the derivation is not straightforward, the distribution of the maximum can be computed analytically. Once derived, it provides a much faster and more efficient way to evaluate probabilities than relying on simulation. However, in cases where the derivation is too complex or not possible, either due to the form of the distribution or the nature of the problem, Monte Carlo simulation offers a practical alternative. By repeatedly generating random samples, we can approximate the distribution of the maximum, or any other statistic, and obtain reliable estimates even in analytically intractable situations.
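To make the comparison concrete, consider the special case of independent draws from the same distribution. The maximum of \(n\) values is at most \(a\) exactly when every value is at most \(a\), so \(\Pr(\max \leq a) = F(a)^n\). The sketch below assumes independent normal heights with placeholder values for m and s and compares this formula with a Monte Carlo estimate.

```r
# Assumed settings, in inches
m <- 69.3
s <- 3.6
n <- 800
a <- 7 * 12  # seven feet

# Analytic probability that the tallest of n independent draws is at least a
analytic <- 1 - pnorm(a, m, s)^n

# Monte Carlo estimate of the same probability
set.seed(1)
B <- 10000
tallest <- replicate(B, max(rnorm(n, m, s)))
monte_carlo <- mean(tallest >= a)

c(analytic = analytic, monte_carlo = monte_carlo)
```

The analytic value is computed instantly, while the simulated value changes slightly from run to run and approaches the analytic one as B grows.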
The normal distribution is not the only useful theoretical model. Other continuous distributions that often appear in data analysis include the Student’s t, chi-square, exponential, gamma, and beta distributions. Their corresponding shorthand names in R are t, chisq, exp, gamma, and beta.
R follows a simple and consistent naming convention for functions associated with these distributions. Each distribution has four related functions, which begin with the letters d, p, q, and r, indicating density, cumulative probability, quantile, and random generation, respectively. For example, for the Student’s t distribution (discussed later in Section 10.2.3), we use dt for the density, pt for the cumulative distribution function, qt for quantiles, and rt to generate random samples for Monte Carlo simulations.
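The sketch below exercises the four prefixes for the Student's t distribution. The degrees of freedom value is an arbitrary assumption, and deg_free, q975, and draws are hypothetical names.

```r
deg_free <- 10  # assumed degrees of freedom, chosen for illustration

dt(0, deg_free)              # d: density at 0
pt(2, deg_free)              # p: Pr(T <= 2)
q975 <- qt(0.975, deg_free)  # q: 97.5th percentile
pt(q975, deg_free)           # p undoes q, returning 0.975

# r: random draws, here used to approximate Pr(T <= 2) by simulation
set.seed(1)
draws <- rt(10000, deg_free)
mean(draws <= 2)

# The same pattern applies to the normal distribution
qnorm(pnorm(1.5))            # returns 1.5
```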
4.5 Exercises
1. Assume the distribution of female heights is approximated by a normal distribution with a mean of 64 inches and a standard deviation of 3 inches. If we pick a female at random, what is the probability that she is 5 feet or shorter?
2. Assume the distribution of female heights is approximated by a normal distribution with a mean of 64 inches and a standard deviation of 3 inches. If we pick a female at random, what is the probability that she is 6 feet or taller?
3. Assume the distribution of female heights is approximated by a normal distribution with a mean of 64 inches and a standard deviation of 3 inches. If we pick a female at random, what is the probability that she is between 61 and 67 inches?
4. Repeat the exercise above, but convert everything to centimeters. That is, multiply every height, including the standard deviation, by 2.54. What is the answer now?
5. Notice that the answer to the question does not change when you change units. This makes sense since the number of standard deviations that an entry in a list is away from the average is not affected by the units we use. In fact, if you look closely, you notice that 61 and 67 are both 1 SD away from the average. Compute the probability that a randomly picked, normally distributed random variable is within 1 SD from the average.
6. To understand the mathematical rationale that explains why the answers to exercises 3, 4, and 5 are the same, suppose we have a random variable \(X\) with average \(\mu\) and standard deviation \(\sigma\). Suppose we ask for the probability of \(X\) being smaller than or equal to \(a\). Remember that, by definition, \(a\) is \((a - \mu)/\sigma\) standard deviations away from the average \(\mu\). The probability is:
\[ \mathrm{Pr}(X \leq a) \]
Now we subtract \(\mu\) from both sides and then divide both sides by \(\sigma\):
\[ \mathrm{Pr}\left(\frac{X-\mu}{\sigma} \leq \frac{a-\mu}{\sigma} \right) \]
The quantity on the left is a standard normal random variable. It has an average of 0 and a standard deviation of 1. We will call it \(Z\):
\[ \mathrm{Pr}\left(Z \leq \frac{a-\mu}{\sigma} \right) \]
So, no matter the units, the probability of \(X\leq a\) is the same as the probability of a standard normal variable being less than or equal to \((a - \mu)/\sigma\). If m holds the average \(\mu\) and s the standard deviation \(\sigma\), which of the following R expressions would give us the right answer in every situation?
a. mean(X <= a)
b. pnorm((a - m)/s)
c. pnorm((a - m)/s, m, s)
d. pnorm(a)
7. Imagine the distribution of adult male heights is approximately normal with an expected value of 69 inches and a standard deviation of 3 inches. How tall is the male in the 99th percentile? Hint: use qnorm.
8. The distribution of IQ scores is approximately normal, with an average of 100 and a standard deviation of 15. Suppose you want to know the distribution of the highest IQ across all graduating classes if 10,000 people are born each year in your school district. Run a Monte Carlo simulation with B=1000, each time generating 10,000 IQ scores and keeping the highest. Make a histogram.