Chapter 5

Statistics

Descriptive and inferential statistics — mean, median, mode, spread, the normal distribution, and the margin of error behind every poll headline.

This chapter covers statistics: the math we use to summarize the data we have and to make claims about the data we don't. By the end you'll be fluent in the five descriptive measures (mean, median, mode, range, standard deviation), comfortable with the bell-shaped normal distribution, and able to read a poll headline (“45% support, ±3%”) the way a statistician would.

Chapters 5 and 6 are the data half of the course. Chapter 5 covers descriptive statistics, the normal distribution, and the margin of error; Chapter 6 picks up probability and expected value. Together they give you the tools to read the news, interpret a study, or evaluate a survey without taking the headline at face value.

By the end of this chapter, you'll be able to…

5.1 Differentiate between descriptive and inferential statistics.
5.2 Compute the mean, median, mode, range, and standard deviation of a data set.
5.3 Identify the four standard data displays (bar chart, histogram, box plot, pie chart) and choose the right one for a given question.
5.4 Apply the Empirical Rule (68-95-99.7) to read a normal distribution.
5.5 Compute the margin of error for a sample proportion at 95% confidence and read a poll headline as a confidence interval.

Chapter glossary

All key terms introduced across this chapter, in the order they appear in the reading.

Data set: The collection of numbers actually in hand. 10 weights, 12 class sizes, 100 test scores — each is a data set.
Population: The full group whose properties we would ideally like to know about. Usually too large to measure directly.
Sample: A subset of the population that we can measure. A poll of 1,000 voters is a sample from the population of all voters.
Descriptive statistic: A numerical summary of a data set itself; exact, no margin of error.
Inferential statistic: A claim about a population based on a sample, always carrying a margin of error.
Mean (x̄): Arithmetic average: (x₁ + ... + xₙ) / n. Sensitive to outliers. The balance point of the data.
Median: Middle value of the sorted data. For even n, the average of the two middle values. Robust to outliers.
Mode: Most frequent value(s). May be bimodal, multimodal, or undefined (no mode) if all values appear once.
Range: Max − min. The simplest measure of spread; very sensitive to outliers.
Standard deviation (s, σ): The typical distance of a value from the mean, in the same units as the data. The sample formula (s) divides by n−1; the population formula (σ) divides by n.
Variance: Square of the standard deviation. Units are the data's units squared.
Bar chart: Comparison display for categorical values. Gaps between bars signal a non-numeric axis.
Histogram: Distribution display for a numeric variable. No gaps between bars; each bar spans one equal-width bin.
Box plot: Compact display of the five-number summary: min, Q1, median, Q3, max.
Pie chart: Parts-of-a-whole display; slices must sum to 100%. Use only when categories partition the whole with no overlap.
Normal distribution: The bell-shaped probability distribution characterized by mean μ and standard deviation σ. Symmetric, with a single peak at μ.
Empirical Rule: For any normal distribution: about 68% of values fall within ±σ of μ, about 95% within ±2σ, and about 99.7% within ±3σ.
Sample proportion (p̂): Pronounced “p-hat.” The proportion of the sample with a given property. Our best estimate of the population proportion p.
Margin of error (MOE): The ± attached to every inferential claim. For a sample proportion at 95% confidence, MOE ≈ 1/√n, where n is the sample size. A poll of 1,000 people gives 1/√1000 ≈ 0.03, or about ±3%.
Confidence interval: The interval p̂ ± MOE. We are “95% confident” the true population proportion lies inside, meaning that intervals built this way capture the true proportion in about 95% of samples over the long run.
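
To see several of these glossary terms in action, here is a minimal Python sketch using only the standard library. Treat it as an illustration rather than the chapter's official method: the scores list and the sample size of 1,000 are invented for the example, the helper names margin_of_error and confidence_interval are hypothetical, and the margin of error uses the MOE ≈ 1/√n approximation from the glossary.

```python
import math
import statistics

# Hypothetical data set: 10 quiz scores invented for this illustration.
scores = [70, 75, 75, 80, 82, 85, 88, 90, 95, 100]

# Descriptive statistics: exact summaries of the data in hand.
mean = statistics.mean(scores)                 # (x1 + ... + xn) / n
median = statistics.median(scores)             # n is even, so the two middle values are averaged
modes = statistics.multimode(scores)           # list of the most frequent value(s)
data_range = max(scores) - min(scores)         # max - min
sample_sd = statistics.stdev(scores)           # sample formula, divides by n - 1
population_sd = statistics.pstdev(scores)      # population formula, divides by n
sample_variance = statistics.variance(scores)  # square of the sample standard deviation

# Empirical Rule check on the standard normal distribution (mu = 0, sigma = 1).
z = statistics.NormalDist(mu=0, sigma=1)
within_1_sd = z.cdf(1) - z.cdf(-1)   # about 0.68
within_2_sd = z.cdf(2) - z.cdf(-2)   # about 0.95
within_3_sd = z.cdf(3) - z.cdf(-3)   # about 0.997


def margin_of_error(n):
    """Approximate 95% margin of error for a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def confidence_interval(p_hat, n):
    """Return (low, high) for p_hat +/- MOE. Hypothetical helper for this sketch."""
    moe = margin_of_error(n)
    return p_hat - moe, p_hat + moe


# Poll headline: 45% support from a hypothetical sample of 1,000 people.
low, high = confidence_interval(0.45, 1000)

print(mean, median, modes, data_range)
print(round(sample_sd, 2), round(population_sd, 2))
print(round(margin_of_error(1000), 3), round(low, 3), round(high, 3))
```

With these numbers the mean is 84 and the median is 83.5, and the poll interval runs from roughly 42% to 48%. Note that the sample standard deviation comes out slightly larger than the population version, because dividing by n−1 gives a bigger result than dividing by n.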