Summary statistics

We start by describing a simple yet powerful data analysis technique: constructing data summaries. Although the approach does not require mathematical models or probability, the motivation for the summaries we describe will later help us understand both of these topics.

You have likely noticed that numerical data is often summarized with the average value. For example, the quality of a high school is sometimes summarized with one number: the average score on a standardized test. Occasionally, a second number is reported: the standard deviation. For example, you might read a report stating that scores were 680 plus or minus 50, where 50 is the standard deviation. The report has summarized the entire list of scores with just two numbers. Is this appropriate? Are we missing any important information by looking only at this summary rather than at the entire list? In this section, we answer these questions and motivate several useful summary statistics and plots, including the average, standard deviation, median, quartiles, histograms, and density plots.
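To make the question concrete, here is a minimal sketch in Python using only the standard library. The two score lists are invented for illustration and are not real test data. They were chosen so that both match the "680 plus or minus 50" report, assuming the report uses the population standard deviation.

```python
from statistics import mean, median, pstdev

# Hypothetical scores, made up for this illustration.
# List A: half the students score 630 and half score 730.
scores_a = [630, 730] * 50

# List B: four out of every five students score 655, and one scores 780.
scores_b = [655, 655, 655, 655, 780] * 20

for name, scores in [("A", scores_a), ("B", scores_b)]:
    # pstdev is the population standard deviation (divides by n, not n - 1).
    print(
        f"List {name}: average = {mean(scores)}, "
        f"standard deviation = {pstdev(scores)}, "
        f"median = {median(scores)}"
    )
```

Both lists have an average of 680 and a standard deviation of 50, yet their medians are 680 and 655. The two-number summary cannot tell them apart, which is one reason to consider additional summaries such as the median and quartiles, as well as plots of the full list.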