Summary Statistics
We start by describing a simple yet powerful data analysis technique: constructing data summaries. Although the approach does not require mathematical models or probability, the motivation for the summaries we describe will later help us understand both these topics.
It is common to summarize numerical data using an average. For instance, the quality of a high school may be conveyed by a single figure: the average standardized test score attained by its students. Occasionally, a second number is reported: the standard deviation. For example, you might read a report stating that scores were 680 plus or minus 50, where 50 is the standard deviation. The report has summarized the entire list of scores with just two numbers. Is this appropriate? Are we missing any important information by looking only at this summary rather than at the entire list? In this part of the book, we answer these questions and motivate several useful summary statistics and plots, including the average, standard deviation, median, quartiles, histograms, density plots, boxplots, and quantile-quantile plots.
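To make the question concrete, here is a minimal sketch in Python of how a two-number summary can hide differences between lists. The two score lists and the helper `summarize` are hypothetical, invented for illustration only, and the sketch assumes the population form of the standard deviation (dividing by the number of scores).

```python
import statistics

# Hypothetical test scores, invented for illustration only.
school_a = [630, 730, 630, 730]
school_b = [655, 655, 655, 655, 780]


def summarize(scores):
    """Return (average, population standard deviation, median)."""
    return (
        statistics.mean(scores),
        statistics.pstdev(scores),  # divides by n, not n - 1
        statistics.median(scores),
    )


# Print the three summaries for each hypothetical school.
for name, scores in [("A", school_a), ("B", school_b)]:
    avg, sd, med = summarize(scores)
    print(f"School {name}: average={avg}, SD={sd}, median={med}")
```

By construction, both lists have an average of 680 and a standard deviation of 50, so a report of "680 plus or minus 50" describes either one. Their medians differ (680 versus 655) because the second list owes its average to a single high score, which is the kind of information the two-number summary leaves out.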