Chapter 5

Statistics

Descriptive and inferential statistics — mean, median, mode, spread, the normal distribution, and the margin of error behind every poll headline.

This chapter covers statistics: the math we use to summarize the data we have and to make claims about the data we don't. By the end you'll be fluent in the five descriptive measures (mean, median, mode, range, standard deviation), comfortable with the bell-shaped normal distribution, and able to read a poll headline (“45% support, ±3%”) the way a statistician would.

Chapters 5 and 6 are the data half of the course. Chapter 5 covers descriptive statistics, the normal distribution, and the margin of error; Chapter 6 picks up probability and expected value. Together they give you the tools to read the news, interpret a study, or evaluate a survey without taking the headline at face value.

By the end of this chapter, you'll be able to…

  • 5.1 Differentiate between descriptive and inferential statistics.
  • 5.2 Compute the mean, median, mode, range, and standard deviation of a data set.
  • 5.3 Identify the four standard data displays (bar chart, histogram, box plot, pie chart) and choose the right one for a given question.
  • 5.4 Apply the Empirical Rule (68-95-99.7) to read a normal distribution.
  • 5.5 Compute the margin of error for a sample proportion at 95% confidence and read a poll headline as a confidence interval.

Sections

  1. 5.1 What statistics is, really
     Two kinds of statistics. Descriptive describes the data we have; inferential makes claims about a population from a sample. Naming the difference is the move that makes the rest of the chapter make sense.
  2. 5.2 Mean, median, mode
     Three measures of center on the same data set. The mean is the arithmetic average; the median is the middle value of the sorted data; the mode is the most frequent value. Each answers a slightly different version of the same question.
  3. 5.3 Range and standard deviation
     Where the measures of center ask “where,” the measures of spread ask “how wide.” The range is max minus min; the standard deviation is the typical distance from the mean.
  4. 5.4 Reading data displays
     Four standard ways to make a data set visible at a glance: bar charts for categorical comparisons, histograms for numeric distributions, box plots for five-number summaries, pie charts for parts of a whole.
  5. 5.5 Normal distribution and the Empirical Rule
     The bell-shaped distribution that powers most inferential statistics. The Empirical Rule fixes three numerical landmarks: 68% within one standard deviation of the mean, 95% within two, and 99.7% within three.
  6. 5.6 Surveys and the margin of error
     The chapter closer. At 95% confidence, a sample of size n estimates a population proportion with a margin of error of roughly 1/√n. Halving the margin requires quadrupling the sample.
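
The 1/√n rule from section 5.6 can be checked numerically. What follows is a minimal Python sketch, not part of the chapter's own material. The function names are hypothetical, the "exact" version assumes the standard 95% formula 1.96·√(p̂(1−p̂)/n), and the sample size of 1,000 paired with the 45% headline is an assumption chosen only for illustration.

```python
import math


def margin_of_error_rule(n: int) -> float:
    """Rule-of-thumb 95% margin of error for a sample proportion: 1/sqrt(n)."""
    return 1 / math.sqrt(n)


def margin_of_error_exact(n: int, p_hat: float = 0.5, z: float = 1.96) -> float:
    """Standard 95% margin of error, z * sqrt(p_hat * (1 - p_hat) / n).

    p_hat = 0.5 is the worst case (largest margin) and is assumed by default.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def confidence_interval(p_hat: float, n: int) -> tuple[float, float]:
    """Interval p_hat ± MOE, using the 1/sqrt(n) rule of thumb."""
    moe = margin_of_error_rule(n)
    return (p_hat - moe, p_hat + moe)


# Quadrupling the sample (1,000 -> 4,000) should halve the margin.
for n in (1000, 4000):
    print(n, round(margin_of_error_rule(n), 4), round(margin_of_error_exact(n), 4))

# A poll reporting 45% support, with an assumed sample of 1,000 respondents.
low, high = confidence_interval(0.45, 1000)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

With n = 1,000 the rule of thumb gives a margin of about 3.2%, slightly larger than the 1.96-based value, which is why 1/√n works as a safe approximation.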

Chapter glossary

All key terms introduced across this chapter, in the order they appear in the reading.

Data set
The collection of numbers actually in hand. 10 weights, 12 class sizes, 100 test scores — each is a data set.
Population
The full group whose properties we would ideally like to know about. Usually too large to measure directly.
Sample
A subset of the population that we can measure. A poll of 1,000 voters is a sample from the population of all voters.
Descriptive statistic
A numerical summary of a data set itself; exact, no margin of error.
Inferential statistic
A claim about a population based on a sample, always carrying a margin of error.
Mean (x̄)
Arithmetic average: (x₁ + ... + xₙ) / n. Sensitive to outliers. The balance point of the data.
Median
Middle value of the sorted data. For even n, the average of the two middle values. Robust to outliers.
Mode
Most frequent value(s). A data set may be bimodal or multimodal, or may have no mode at all if every value appears exactly once.
Range
Max − min. The simplest measure of spread; very sensitive to outliers.
Standard deviation (s, σ)
The typical distance of a value from the mean. Same units as the data. Sample formula divides by n−1; population formula divides by n.
Variance
Square of the standard deviation. Units are the data's units squared.
Bar chart
Comparison display for categorical values. Gaps between bars signal a non-numeric axis.
Histogram
Distribution display for a numeric variable. No gaps between bars; bars are equal-width bins.
Box plot
Compact display of the five-number summary: min, Q1, median, Q3, max.
Pie chart
Parts-of-a-whole display; slices must sum to 100%. Use only when categories partition the whole with no overlap.
Normal distribution
The bell-shaped probability distribution characterized by mean μ and standard deviation σ. Symmetric, with a single peak at μ.
Empirical Rule
For any normal distribution: about 68% of values fall within ±σ of μ, 95% within ±2σ, and 99.7% within ±3σ.
Sample proportion (p̂)
Pronounced “p-hat.” The proportion of the sample with a given property. Our best estimate of the population proportion p.
Margin of error (MOE)
The ± on every inferential claim. For a sample proportion at 95% confidence, MOE ≈ 1/√n.
Confidence interval
The interval p̂ ± MOE. We are “95% confident” the true population proportion lies inside, meaning that about 95% of intervals built this way from repeated samples would contain it.
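
Several of the glossary formulas can be tried directly. The following is a minimal Python sketch using only the standard library's statistics module. The data set is made up for illustration and does not come from the chapter, and the helper share_within is a hypothetical name.

```python
import statistics
from statistics import NormalDist

# Hypothetical data set, for illustration only.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)            # arithmetic average
median = statistics.median(data)        # middle value of the sorted data
modes = statistics.multimode(data)      # list of the most frequent value(s)
data_range = max(data) - min(data)      # max minus min
s = statistics.stdev(data)              # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(data)         # population standard deviation (divides by n)
variance = statistics.variance(data)    # sample variance, the square of s

print(mean, median, modes, data_range)
print(round(s, 3), round(sigma, 3), round(variance, 3))


def share_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Empirical Rule landmarks: one, two, and three standard deviations.
for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

Here the sample standard deviation comes out slightly larger than the population version, as expected when dividing by n−1 instead of n, and the three shares land near 68%, 95%, and 99.7%.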