Lesson 1: Decision Making Under Uncertainty

Lesson 1 Summary

  • The choice of the right sample and sample size is the most important factor in statistical analysis. A manager should always ask about the sample size and sample characteristics. The sample must be a true representation of the population in question. Selecting an unrepresentative subset of the population, whether intentionally or unintentionally, leads to less reliable results and is often the reason people are skeptical about “statistical predictions.”
  • The mean, median, and mode are measures of the centrality of the data. A few extreme data points (outliers) can skew the data. The mean is sensitive to outliers, while the median is not.
  • The 5-number summary of a data set consists of the minimum, the 25th percentile, the median, the 75th percentile, and the maximum.
    • The 5-number summary can be displayed graphically with a box plot (also known as a box-and-whisker plot).
  • Standard deviation measures how spread out the numbers are from the mean.
    • Standard deviation may serve as a measure of uncertainty or risk.
    • Quality control efforts are often aimed at reducing the standard deviation (i.e., increasing the predictability of a process).
  • Which measure of variation to use?
    • There are three main measures of variability: variance, standard deviation, and range.
      • Standard deviation is used more often than variance because it is directly interpretable: it has the same units as the data.
      • Range tells us the difference between the minimum and maximum data points, but says little about how the data is dispersed between them.
      • The coefficient of variation (standard deviation divided by the mean) is very useful for comparing disparate sets of data.
  • Histograms and frequency distributions are common ways of visualizing the data.
    • A histogram is a bar chart where the data is grouped in ranges.
    • A frequency distribution shows the number of times a particular value occurs in the data.
    • Relative frequency distributions show the percentage of times a particular value occurs in the data.
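
The measures of centrality and variation summarized above can be sketched in a few lines of Python using only the standard library. The data values below are hypothetical, chosen so that a single outlier (300) shows how the mean moves while the median barely changes. The quartile method ("inclusive") is one of several common conventions, so other tools may report slightly different 25th and 75th percentiles.

```python
import statistics

# Hypothetical sample: nine typical values plus one outlier (300).
data = [42, 45, 47, 48, 50, 51, 53, 55, 58, 300]
without_outlier = data[:-1]

# Measures of centrality: the mean is pulled toward the outlier,
# the median stays close to the bulk of the data.
mean = statistics.mean(data)
median = statistics.median(data)
mean_no_outlier = statistics.mean(without_outlier)
median_no_outlier = statistics.median(without_outlier)

# 5-number summary: minimum, 25th percentile, median, 75th percentile, maximum.
# quantiles(n=4) returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, q2, q3, max(data))

# Measures of variation.
variance = statistics.variance(data)   # sample variance (squared units)
stdev = statistics.stdev(data)         # sample standard deviation (same units as data)
data_range = max(data) - min(data)     # maximum minus minimum
cv = stdev / mean                      # coefficient of variation (unitless)

print("mean:", mean, "median:", median)
print("mean without outlier:", round(mean_no_outlier, 2),
      "median without outlier:", median_no_outlier)
print("5-number summary:", five_number)
print("stdev:", round(stdev, 2), "range:", data_range, "CV:", round(cv, 2))
```

Because the coefficient of variation is unitless, it can be compared across data sets measured on different scales, which is not true of the standard deviation itself.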

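The frequency distribution, relative frequency distribution, and histogram described in the summary can be illustrated with a small sketch. The defect counts and the bin width of 2 are hypothetical, picked only to keep the example short; in practice the bin width is a judgment call that depends on the data.

```python
from collections import Counter

# Hypothetical data: number of defects found in each of 20 inspected batches.
defects = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1, 4, 1, 0, 2, 1, 0, 1, 3, 2, 1]

# Frequency distribution: the number of times each value occurs.
frequency = Counter(defects)

# Relative frequency distribution: the share of observations for each value.
n = len(defects)
relative_frequency = {
    value: count / n for value, count in sorted(frequency.items())
}

# Histogram: group the values into ranges (bins) and count each bin.
# Each bin is keyed by its lower bound, e.g. 0 covers the values 0 and 1.
bin_width = 2
histogram = Counter((value // bin_width) * bin_width for value in defects)

for value, share in relative_frequency.items():
    print(f"value {value}: {frequency[value]} times ({share:.0%})")

# Text rendering of the histogram, one '#' per observation in the bin.
for start in sorted(histogram):
    print(f"{start}-{start + bin_width - 1}: {'#' * histogram[start]}")
```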