Summary Statistics

We begin with one of the simplest and most powerful tools in data analysis: summarizing data. This part of the book introduces techniques that help us describe and understand datasets without relying on probability models. These summaries will later provide the intuition needed to understand statistical modeling and inference.

In the first chapter, we focus on distributions and on visual representations, such as histograms and density plots, that reveal patterns of variation, symmetry, and outliers. In the second chapter, we move from pictures to numbers, introducing numerical summaries such as the average and standard deviation, which quantify the center and spread of a distribution. To motivate these summaries, we introduce the normal distribution.

However, these summaries are not ideal for all datasets. Outliers and skewed distributions can make certain summaries, like the average and standard deviation, less appropriate, or cause them to stop representing what we think they represent. For this reason, we also introduce rank-based summaries such as the median and interquartile range, which are more robust to outliers and provide complementary perspectives on data. Together, these ideas form the foundation of data analysis: describing what we see before we attempt to model why it happens.
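
As a small illustration of the robustness point, the sketch below compares the two kinds of summaries on a made-up list of numbers, before and after appending one extreme value. The data, the helper name `summarize`, and the choice of Python's standard `statistics` module are assumptions made for this example only; they are not taken from the chapters that follow.

```python
import statistics


def summarize(values):
    """Return moment-based and rank-based summaries of a list of numbers.

    `summarize` is a hypothetical helper written for this illustration.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample standard deviation
        "median": statistics.median(values),
        "iqr": q3 - q1,                      # interquartile range
    }


# Made-up measurements, then the same measurements plus one outlier.
data = [10, 12, 11, 13, 12, 11, 10, 12]
with_outlier = data + [100]

clean = summarize(data)
dirty = summarize(with_outlier)

for name in ("mean", "sd", "median", "iqr"):
    print(f"{name:>6}: {clean[name]:8.2f} -> {dirty[name]:8.2f}")
```

In this toy case, the single added value moves the average and standard deviation substantially, while the median and interquartile range change only slightly. The exact quartile values depend on the interpolation method, so other software may report a slightly different interquartile range for the same data.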