## Data Sets

These data sets are accessible through the Data Analysis App or the Spreadsheet App.

## Univariate > Categorical

1. 2000 State Motorcycle Statistics

Description: Number of motorcycle registrations, helmet requirements, and numbers of fatalities in 2000 by state.

Uses: Compare the fatality rates of states with helmet requirements to those of states without requirements; and construct a bar graph comparing the states.

2. Chicago White Sox

Description: 1919 season batting averages and 1919 World Series batting averages for Chicago White Sox players who had 10 or more at bats in the World Series, and whether each player was accused of throwing the Series.

Source: www.baseball-reference.com/postseason/1919_WS.shtml

Uses: Compute the change in batting average and compare it to whether or not the player was accused of throwing the Series; and construct a bar graph comparing the change in average and whether or not the player was accused.

## Univariate > Quantitative

1. 2000 State Motorcycle Statistics

Description: Number of motorcycle registrations, helmet requirements, and number of fatalities in 2000.

Uses: Find a fatality rate; construct a box plot or histogram; identify outliers; estimate centers and spread; and identify skewed distributions.
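As a rough sketch of the first two tasks, a fatality rate can be expressed as fatalities per 10,000 registered motorcycles, and outliers can then be flagged among the rates. The state labels and counts below are made up for illustration and are not the real 2000 figures. The 1.5 × IQR rule and the per-10,000 scaling are assumptions about which conventions a class might use, not something this page specifies.

```python
from statistics import quantiles

def fatality_rate(fatalities, registrations, per=10_000):
    """Fatalities per `per` registered motorcycles."""
    return fatalities / registrations * per

def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical (registrations, fatalities) pairs, NOT the real data set.
states = {"A": (50_000, 30), "B": (120_000, 60), "C": (20_000, 40),
          "D": (80_000, 44), "E": (200_000, 110)}
rates = {name: fatality_rate(f, r) for name, (r, f) in states.items()}
print(rates)
print(iqr_outliers(list(rates.values())))
```

With these made-up numbers, only state "C" has a rate far enough from the rest to be flagged.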

2. Achievement Test Scores

Description: Achievement test scores for all ninth graders in one high school.

Uses: Construct a histogram or box plot; identify a mound shaped distribution; estimate centers and spread; and investigate the percent of data within 1, 2, or 3 standard deviations from the mean.
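The last task, checking how much of the data falls within 1, 2, or 3 standard deviations of the mean, can be sketched as follows. The scores are hypothetical, and the use of the sample standard deviation (rather than the population version) is an assumption.

```python
from statistics import mean, stdev

def percent_within(data, k):
    """Percent of observations within k sample standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    inside = sum(1 for x in data if abs(x - m) <= k * s)
    return 100 * inside / len(data)

# Hypothetical test scores, for illustration only.
scores = [52, 58, 61, 63, 65, 66, 68, 70, 70, 71, 73, 75, 77, 80, 86]
for k in (1, 2, 3):
    print(k, round(percent_within(scores, k), 1))
```

For a mound-shaped distribution, the three percentages can be compared against the familiar 68, 95, and 99.7 percent benchmarks.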

3. Apartment Temperatures

Description: Variation in an apartment temperature (in degrees F) with its thermostat set to 70 degrees Fahrenheit each day at noon.

Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; identify skewed distributions; and compare sample mean to the hypothesized mean of 70.

4. Battery Life

Description: Life in hours of two brands of batteries.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Construct comparative box plots and histograms, and use plots to determine if battery life is different between the two brands.

5. Best Actress

Description: Birthdays and ages of actresses whose performances won in the Best Leading Actress category at the annual Academy Awards (Oscars) and the year the award was given.

Sources: www.oscars.com and www.imdb.com

Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of removing 80 and 74); estimate centers and spread; identify skewed distributions; and look for trends over time (e.g., Are there predictable patterns in the age of the winners?).

6. Certificate Perimeters

Description: Perimeter measurements (mm) of the border of a certificate measured by 131 students of a Wisconsin high school.

Uses: Construct a box plot or histogram; identify outliers; and estimate centers and spread.

7. Concord & Portland Monthly Precipitation

Description: Normal monthly precipitation (rain and snow) in inches for Concord, New Hampshire and for Portland, Oregon.

Source: National Climate Data Center, 2005

Uses: Construct comparative box plots; investigate measures of variability (IQR and standard deviation); estimate centers and spread; and investigate if there is a statistical difference between the mean precipitation for the two cities.

8. Dissolution Times

Description: Variation in times (in seconds) for a solute to dissolve.

Source: This is student-collected data from a chemistry experiment.

Uses: Construct a histogram or box plot; identify a mound shaped distribution; estimate centers and spread; and use the histogram to investigate the percent of data within 1, 2, or 3 standard deviations from the mean.

9. Fastest Growing Franchises

Description: Rank, franchise name, type of service, minimum and maximum start-up costs (in \$1,000s) for the 100 fastest growing franchises in the U.S. The ranking is based on the number of new franchise units added from 2005 to 2007.

Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; identify skewed distributions; and investigate which type of franchises have the largest difference between minimum and maximum start-up costs.

10. Gas Mileage

Description: Variation in gas mileage of a car over a 25-week span.

Uses: Construct a histogram or box plot; identify a mound shaped distribution; estimate centers and spread; use the histogram to investigate percent of data within 1, 2, or 3 standard deviations from the mean; and examine the trend in gas mileage over time.

11. Heights of Students and Basketball Players (Univariate Quantitative)

Description: Heights (cm) for a group of middle school students and heights of 25 professional basketball players.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Construct and analyze comparative box plots; construct histograms; identify mound shaped distributions; and estimate centers and spread.

Description: Heights (in inches) of 1,000 males and 1,000 females. The heights have been rounded to the nearest inch.

Uses: Construct and analyze comparative box plots; identify outliers; construct histograms; identify mound shaped distributions; and estimate centers and spread.

13. January Sunshine

Description: Average percent of sunshine for the month of January (up to 2002). The percent of sunshine is the percentage of time that sunshine reaches the surface of the Earth at 174 different major weather-observing stations in all 50 states, Puerto Rico, and the Pacific Islands. The two stations with the highest percentages are Tucson and Yuma, Arizona. The station with the lowest percentage is Quillayute, Washington.

Uses: Construct a histogram or box plot; identify outliers; identify a mound shaped distribution; estimate centers and spread; and use the histogram to investigate the percent of data within 1, 2, or 3 standard deviations from the mean.

14. Land Use

Description: Number of acres (in 1,000s) of urban areas in 1960 and 2002 and number of acres of forest in 1959 and 2002 for each of the 48 continental U.S. states and the District of Columbia (excludes Alaska and Hawaii).

Uses: Construct comparative box plots (e.g., compare acres of urban area in 1960 and 2002, compare acres of forest in 1959 and 2002); create and examine histograms of skewed and mound shaped distributions; examine the effect of outliers on measures of center and spread; and investigate if there is a statistical difference between the mean acreages for the two years.

15. Los Angeles Rainfall

Description: Rainfall (in inches) in Los Angeles for the 129 years from 1878 through 2006.

Source: National Weather Service

Uses: Construct a box plot or histogram; identify outliers (1884 rainfall amount effect on measures of center and spread); estimate centers and spread; and identify skewed distributions.

16. Manufactured Nails

Description: Nail length (in inches) for 10 nails made by a machine that is set to have a mean length of 2 inches and a standard deviation of 0.03 inches.

Uses: Make a comparison of this sample mean to the hypothesized mean of 2.
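One way to make this comparison is a two-sided z test, sketched below. It assumes the machine's stated standard deviation of 0.03 inches can be treated as the known population standard deviation. That is an assumption, and a t test based on the sample standard deviation is an equally reasonable choice. The ten nail lengths are hypothetical and are not the values in the data set.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma):
    """Two-sided z test of the sample mean against mu0, with sigma assumed known."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical nail lengths in inches, NOT the actual data set.
nails = [2.01, 1.98, 2.03, 2.00, 1.97, 2.02, 2.04, 1.99, 2.01, 2.05]
z, p = z_test(nails, mu0=2.0, sigma=0.03)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A large p-value would suggest the sample mean is consistent with the machine's 2-inch setting.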

17. Mean Hourly Earnings

Description: Mean hourly earnings (in dollars) for 70 different occupations in the United States. Earnings are for all full-time, nonmilitary workers and do not include benefits, overtime, vacation pay, nonproduction bonuses, or tips.

Source: U.S. Department of Labor, National Compensation Survey: Occupational Earnings in the United States, Table 3.

Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of CEO and physician earnings on measures of center and spread); estimate centers and spread; and identify skewed distributions.

18. Meaningful Words

Description: Two lists of 20 three-letter “words.” One list contained meaningful words (e.g., CAT, DOG), whereas the other list contained nonsense words (e.g., ATC, ODG). A ninth-grade class of thirty students was randomly divided into two groups of fifteen students. One group was asked to memorize the list of meaningful words; the other group was asked to memorize the list of nonsense words. The number of words correctly recalled by each student was tabulated, and the resulting data are as follows:

Source: Focus in High School Mathematics Reasoning and Sense Making (NCTM, 2009)

Uses: Construct comparative box plots; and introduce the randomization test.
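A randomization test for this kind of two-group experiment can be sketched as below: pool the recall counts, reshuffle them into two groups of the original sizes many times, and see how often the reshuffled difference in means is at least as large as the observed one. The recall counts, the number of repetitions, and the one-sided comparison are all illustrative assumptions, and the data are not the values from the source.

```python
import random

def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def randomization_test(group_a, group_b, reps=10_000, seed=0):
    """Return the observed mean difference and the share of random
    regroupings whose difference is at least as large (one-sided)."""
    rng = random.Random(seed)
    observed = avg(group_a) - avg(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        if avg(pooled[:n_a]) - avg(pooled[n_a:]) >= observed:
            count += 1
    return observed, count / reps

# Hypothetical recall counts for 15 students per group, for illustration only.
meaningful = [11, 14, 9, 12, 10, 8, 13, 7, 10, 12, 9, 11, 6, 10, 13]
nonsense = [5, 7, 4, 6, 8, 3, 6, 5, 9, 4, 7, 6, 2, 5, 8]
obs, p_value = randomization_test(meaningful, nonsense)
print(f"observed difference = {obs:.2f}, estimated p-value = {p_value:.4f}")
```

A small estimated p-value indicates that a difference this large rarely arises from random assignment alone.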

19. Migraines

Description: Time passed to get relief from a migraine headache for two different medications.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Construct comparative box plots and histograms; refer to the comparative plots and discuss if there is a difference between the two medications; and introduce the randomization test.

20. Min and Max Temperatures

Description: Maximum and minimum temperatures (in degrees F) on record at 289 major U.S. weather-observing stations in all 50 states, Puerto Rico, and the Pacific Islands.

Uses: Construct a histogram or box plot; identify outliers; and determine the minimum, the maximum, and the difference between the min and max temperatures.

21. Nickel Weights

Description: Weight, to the nearest hundredth of a gram, of a sample of 100 new nickels.

Uses: Construct a histogram or box plot; identify a mound shaped distribution; and estimate centers and spread.

22. Non-Normal Distribution

Description: Random set of numbers generated from a non-normal distribution.

Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; and identify skewed distributions.

23. Number of Marriages

Description: Marriage rate per 1,000 people for 50 U.S. states in 2004. The District of Columbia had a rate of 4.5.

Source: Division of Vital Statistics, National Center of Health Statistics www.cdc.gov/nchs/data/nvss/marriage90_04.pdf

Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of Nevada); estimate centers and spread; and identify skewed distributions.

24. Number of Video Games

Description: Number of video games available on 43 selected platforms.

Source: www.mobygames.com/moby_stats

Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of Windows and DOS); estimate centers and spread; and identify skewed distributions.

25. Old Faithful

Description: Each column is a set of consecutive eruption wait times (in minutes) for the Old Faithful geyser in Yellowstone National Park, collected in 1985.

Source: Focus in High School Mathematics Reasoning and Sense Making in Statistics and Probability (NCTM, 2009)

Uses: Construct histograms, box plots, and a time series plot (observation number on the x-axis and duration time on the y-axis).

26. PSU Women Heights

Description: Heights (in inches) of 123 women in a statistics class at Penn State University in the 1970s.

Source: Joiner, Brian L. “Living Histograms.” International Statistical Review 3 (1975): 339–340.

Uses: Construct a histogram or box plot; identify a mound shaped distribution; estimate centers and spread; and use the histogram to investigate percent of data within 1, 2, or 3 standard deviations from the mean.

27. Random Rectangles

Description: Area of 100 rectangles placed randomly on a sheet of paper.

Source: Navigating Through Data Analysis in Grades 9–12 (NCTM, 2003)

Uses: Construct histograms; estimate center and spread; and conduct trials with the Distribution of Sample Custom App.

28. Ratings of Movie Showings

Description: How a student at the University of Alabama, Huntsville, rated the projection quality of nearby movie theaters. For each showing, a point was deducted for such things as misalignment, misframing, or an audio problem. He visited one theater in Huntsville 92 times in the first five-and-a-half years it was open. The number of points deducted per showing is given below.

Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of the two 12s); estimate centers and spread; and identify skewed distributions.

29. Roller Coasters

Description: Greatest drop in feet of 55 major roller coasters in the U.S.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; determine effect of outliers on the mean; and identify skewed distributions.

30. Study Time

Description: Number of hours 36 members of a high school softball team reported that they studied in a typical week.

Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; and investigate the effect an outlier has on measures of center and spread.
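The effect of a single outlier on center and spread can be seen by computing the summaries with and without it, as in this minimal sketch. The study hours are hypothetical, with 40 standing in for an unusually large reported value.

```python
from statistics import mean, median, stdev

def summary(data):
    """Return (mean, median, sample standard deviation)."""
    return mean(data), median(data), stdev(data)

# Hypothetical weekly study hours; 40 plays the role of the outlier.
hours = [4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 40]
with_outlier = summary(hours)
without_outlier = summary([h for h in hours if h != 40])
print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

In this made-up example the mean and standard deviation shift noticeably when the outlier is removed, while the median barely moves.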

31. Sunshine for All Months

Description: Average sunshine as a percent of the maximum possible (through 2002) for 174 selected cities around the United States.

Uses: Construct stacked box plots or histograms to compare the shape, center, and spread of the distributions by month.

32. TV Watching

Description: Number of hours watching TV for one week as reported by a group of seventh graders.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Construct a box plot or histogram; identify outliers; and estimate centers and spread.

33. U.S. Census 2000 & 2010

Description: The spreadsheet contains data about apportionment populations and political representation in the U.S. House of Representatives for the 50 states from the censuses conducted in 2000 and 2010.

Uses: Make comparisons across population and political representation data for 2000 and 2010; and investigate change over time (e.g., Which states gained members in the House of Representatives and which states lost members? Are there any regional patterns of change? Is there consistent representation across the states’ varied populations?).

34. Vertical Jumps

Description: Vertical jump height (in inches) of 27 basketball players in an NBA draft.

Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; and use the histogram to investigate the percent of data within 1, 2, or 3 standard deviations from the mean.

35. Walking Speeds

Description: These data show the mean times (in seconds) to walk 60 feet in various cities of the world.

Source: www.britishcouncil.org/paceoflife.pdf

Uses: Construct a box plot or histogram; identify outliers; and estimate centers and spread.
