
Data Sets

Univariate > Categorical    Bivariate > Categorical   
Univariate > Quantitative  Bivariate > Quantitative  Multivariate > Quantitative  

Data sets are accessible through the Data Analysis App or the Spreadsheet App.

Univariate > Categorical

1. 2000 State Motorcycle Statistics 

Description: Number of motorcycle registrations, helmet requirements, and numbers of fatalities in 2000 by state.

Source: U.S. Federal Highway Administration

Uses: Compare the fatality rate in states with helmet requirements to the rate in states without requirements; and construct a bar graph comparing the states.
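States differ greatly in how many motorcycles are registered, so the two groups have to be compared on a common scale. The sketch below shows one way to do that. It assumes a yes/no helmet-law field and a rate per 10,000 registrations. The record layout and the names `StateRecord`, `fatality_rate`, and `pooled_rates_by_law` are hypothetical and are not part of the data set or the app.

```python
from dataclasses import dataclass


@dataclass
class StateRecord:
    # Hypothetical record layout; the actual column names in the data set may differ.
    state: str
    registrations: int
    fatalities: int
    helmet_law: bool  # assumed yes/no encoding of the helmet requirement


def fatality_rate(fatalities: int, registrations: int, per: int = 10_000) -> float:
    """Fatalities per `per` registered motorcycles."""
    return fatalities / registrations * per


def pooled_rates_by_law(records: list[StateRecord], per: int = 10_000) -> dict[bool, float]:
    """Pool fatalities and registrations within each helmet-law group, then compute one rate per group."""
    fatalities = {True: 0, False: 0}
    registrations = {True: 0, False: 0}
    for r in records:
        fatalities[r.helmet_law] += r.fatalities
        registrations[r.helmet_law] += r.registrations
    # Skip a group with no registrations to avoid dividing by zero.
    return {
        law: fatality_rate(fatalities[law], registrations[law], per)
        for law in (True, False)
        if registrations[law] > 0
    }
```

Pooling gives large states more weight. Averaging the individual state rates instead treats every state equally, and the two approaches can disagree. If the data distinguish partial helmet laws, a single yes/no field would be too coarse.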

 

2. Chicago White Sox 

Description: 1919 regular-season batting averages and 1919 World Series batting averages for Chicago White Sox players who had 10 or more at bats in the World Series, and whether each player was accused of throwing the Series.

Source: www.baseball-reference.com/postseason/1919_WS.shtml

Uses: Compute each player's change in batting average and compare it with whether or not the player was accused of throwing the Series; and construct a bar graph comparing the change in average for accused and non-accused players.
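Here is a minimal sketch of the computation. It assumes each player is stored as a (season average, World Series average, accused) tuple. That layout and the name `mean_change_by_group` are hypothetical.

```python
from statistics import mean


def mean_change_by_group(players):
    """Mean change in batting average (World Series minus season), split by accusation.

    `players` is an iterable of (season_avg, series_avg, accused) tuples,
    a hypothetical layout chosen for illustration.
    """
    changes = {True: [], False: []}
    for season_avg, series_avg, accused in players:
        changes[accused].append(series_avg - season_avg)
    # Only report groups that actually contain players.
    return {accused: mean(vals) for accused, vals in changes.items() if vals}
```

A negative mean change means that the group hit worse in the Series than during the season.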

 

Univariate > Quantitative

1. 2000 State Motorcycle Statistics 

Description: Number of motorcycle registrations, helmet requirements, and number of fatalities in 2000.

Source: U.S. Federal Highway Administration

Uses: Find a fatality rate; construct a box plot or histogram; identify outliers; estimate centers and spread; and identify skewed distributions.
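Outliers are usually identified with the 1.5 × IQR rule, which flags values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. The sketch below computes quartiles as medians of the lower and upper halves of the sorted data, leaving out the median when n is odd. This is one common convention. The apps or a graphing calculator may use a different quartile method and so produce slightly different fences. The function names are illustrative.

```python
from statistics import median


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.

    When n is odd, the overall median is left out of both halves
    (one common textbook convention; other methods give slightly different values).
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(upper)


def iqr_outliers(data, k=1.5):
    """Values more than k * IQR below Q1 or above Q3 (Tukey's fences)."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]
```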

 

2. Achievement Test Scores 

Description: Achievement test scores for all ninth graders in one high school.

Uses: Construct a histogram or box plot; identify a mound-shaped distribution; estimate centers and spread; and investigate the percent of data within 1, 2, or 3 standard deviations from the mean.
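For a mound-shaped, roughly normal distribution, about 68%, 95%, and 99.7% of the values fall within 1, 2, and 3 standard deviations of the mean. The short sketch below checks this against a data set. It uses the sample standard deviation (`statistics.stdev`). That choice is an assumption, because some treatments use the population version (`statistics.pstdev`) instead.

```python
from statistics import mean, stdev


def share_within_k_sd(data, k):
    """Fraction of values within k sample standard deviations of the mean."""
    m = mean(data)
    s = stdev(data)
    return sum(1 for x in data if abs(x - m) <= k * s) / len(data)


def empirical_rule_check(data):
    """Observed shares within 1, 2, and 3 standard deviations."""
    return {k: share_within_k_sd(data, k) for k in (1, 2, 3)}
```

You can then compare the three observed shares with 0.68, 0.95, and 0.997 to judge how closely the data follow the empirical rule.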

 

3. Apartment Temperatures 

Description: Variation in the noon temperature (°F) of an apartment whose thermostat is set to 70°F, recorded each day.

Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; identify skewed distributions; and compare the sample mean to the hypothesized mean of 70°F.
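One way to make the comparison with 70°F concrete is the one-sample t statistic, t = (x̄ − 70)/(s/√n). It measures how many standard errors the sample mean lies from the thermostat setting. This is a sketch that assumes the daily readings can be treated as independent. The function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """t statistic for comparing the sample mean with a hypothesized mean mu0."""
    n = len(data)
    standard_error = stdev(data) / sqrt(n)  # sample SD divided by sqrt(n)
    return (mean(data) - mu0) / standard_error
```

To get a p-value, SciPy's `scipy.stats.ttest_1samp` performs the same test and reports the p-value from the t distribution with n − 1 degrees of freedom.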

 

4. Battery Life 

Description: Life (in hours) of two brands of batteries.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Construct comparative box plots and histograms, and use plots to determine if battery life is different between the two brands.

 

5. Best Actress 

Description: Birthdays and ages of the actresses whose performances won in the Best Leading Actress category at the annual Academy Awards (Oscars), and the year each award was given.

Sources: www.oscars.com and www.imdb.com 

Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of removing 80 and 74); estimate centers and spread; identify skewed distributions; and look for trends over time (e.g., Are there predictable patterns in the age of the winners?).

 

6. Certificate Perimeters 

Description: Perimeter measurements (mm) of a certificate's border, made by 131 students at a Wisconsin high school.

Uses: Construct a box plot or histogram; identify outliers; and estimate centers and spread.

 

7. Concord & Portland Monthly Precipitation 

Description: Normal monthly precipitation (rain and snow), in inches, for Concord, New Hampshire, and Portland, Oregon.

Source: National Climatic Data Center, 2005

Uses: Construct comparative box plots; investigate measures of variability (IQR and standard deviation); estimate centers and spread; and investigate whether there is a statistically significant difference between the mean monthly precipitation amounts for the two cities.

 

8. Dissolution Times 

Description: Variation in times (in seconds) for a solute to dissolve.

Source: Student-collected data from a chemistry experiment.

Uses: Construct a histogram or box plot; identify a mound-shaped distribution; estimate centers and spread; and use the histogram to investigate the percent of data within 1, 2, or 3 standard deviations from the mean.

 

9. Fastest Growing Franchises 

Description: Rank, franchise name, type of service, minimum and maximum start-up costs (in $1,000s) for the 100 fastest growing franchises in the U.S. The ranking is based on the number of new franchise units added from 2005 to 2007.

Source: www.entrepreneur.com/franzone/rank/o,6584,12-12-F5-2006-7-0.html 

Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; identify skewed distributions; and investigate which types of franchises have the largest difference between minimum and maximum start-up costs.

 

10. Gas Mileage 

Description: Variation in gas mileage of a car over a 25-week span.

Uses: Construct a histogram or box plot; identify a mound-shaped distribution; estimate centers and spread; use the histogram to investigate the percent of data within 1, 2, or 3 standard deviations from the mean; and examine the trend in gas mileage over time.

 

11. Heights of Students and Basketball Players (Univariate Quantitative)

Description: Heights (cm) for a group of middle school students and heights of 25 professional basketball players.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Construct and analyze comparative box plots; construct histograms; identify mound-shaped distributions; and estimate centers and spread.

 

12. Heights of Young Adults 

Description: Heights (in inches) of 1,000 males and 1,000 females. The heights have been rounded to the nearest inch.

Uses: Construct and analyze comparative box plots; identify outliers; construct histograms; identify mound-shaped distributions; and estimate centers and spread.

 

13. January Sunshine 

Description: Average percent of sunshine for the month of January (up to 2002). The percent of sunshine is the percentage of time that sunshine reaches the surface of the Earth at 174 different major weather-observing stations in all 50 states, Puerto Rico, and the Pacific Islands. The two stations with the highest percentages are Tucson and Yuma, Arizona. The station with the lowest percentage is Quillayute, Washington.

Source: www.ncdc.gov/oa/climate/online/ccd/avgsun.html 

Uses: Construct a histogram or box plot; identify outliers; identify a mound-shaped distribution; estimate centers and spread; and use the histogram to investigate the percent of data within 1, 2, or 3 standard deviations from the mean.

14. Land Use  

Description: Number of acres (in thousands) of urban area in 1960 and 2002, and number of acres of forest in 1959 and 2002, for each of the 48 contiguous U.S. states and the District of Columbia (Alaska and Hawaii are excluded).

Source: www.ers.usda.gov/Data/MajorLandUses/ 

Uses: Construct comparative box plots (e.g., compare urban acreage in 1960 and 2002, and compare forest acreage in 1959 and 2002); create and examine histograms of skewed and mound-shaped distributions; and examine the effect of outliers on measures of center and spread.

 

15. Los Angeles Rainfall 

Description: Rainfall (in inches) in Los Angeles for the 129 years from 1878 through 2006.

Source: National Weather Service

Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of the 1884 rainfall amount on measures of center and spread); estimate centers and spread; and identify skewed distributions.

 

16. Manufactured Nails 

Description: Nail length (in inches) for 10 nails made by a machine that is set to have a mean length of 2 inches and a standard deviation of 0.03 inches.

Uses: Compare the sample mean to the hypothesized mean of 2 inches.
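The machine's standard deviation is given (σ = 0.03 inches) and the sample has n = 10 nails, so the distance of the sample mean from 2 inches can be expressed in standard-error units. This is a sketch that assumes the machine is operating at its stated settings and that nail lengths are approximately normal:

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
  = \frac{\bar{x} - 2}{0.03 / \sqrt{10}}
  \approx \frac{\bar{x} - 2}{0.0095}
```

A value of |z| well beyond about 2 would suggest that the sample mean is unusually far from 2 inches for a machine working as set.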

 

17. Mean Hourly Earnings 

Description: Mean hourly earnings (in dollars) for 70 different occupations in the United States. Earnings are for all full-time, nonmilitary workers and do not include benefits, overtime, vacation pay, nonproduction bonuses, or tips.

Source: U.S. Department of Labor, National Compensation Survey: Occupational Earnings in the United States, Table 3.

Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of CEO and physician earnings on measures of center and spread); estimate centers and spread; and identify skewed distributions.

 

18. Meaningful Words 

Description: Two lists of 20 three-letter “words.” One list contained meaningful words (e.g., CAT, DOG), whereas the other contained nonsense words (e.g., ATC, ODG). A ninth-grade class of thirty students was randomly divided into two groups of fifteen. One group was asked to memorize the list of meaningful words; the other group was asked to memorize the list of nonsense words. The data give the number of words each student correctly recalled.

Source: Focus in High School Mathematics: Reasoning and Sense Making (NCTM, 2009)

Uses: Construct comparative box plots; and introduce the randomization test.
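The randomization test asks how often a random re-split of the 30 students' scores into two groups of 15 produces a difference in mean recall at least as large as the observed difference. A minimal sketch follows. The data themselves are not reproduced here, and the function name, the number of re-splits, and the fixed seed are illustrative choices. The sketch estimates a two-sided p-value by simulation instead of enumerating every possible split.

```python
import random


def randomization_test(group_a, group_b, reps=10_000, seed=0):
    """Two-sided randomization test for a difference in means.

    Returns (observed difference, estimated p-value), where the p-value is the
    fraction of random re-splits whose |difference in means| is at least as
    large as the observed one.
    """
    def avg(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = avg(group_a) - avg(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = avg(pooled[:n_a]) - avg(pooled[n_a:])
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for float round-off
            extreme += 1
    return observed, extreme / reps
```

A small p-value suggests that the difference between the meaningful-word and nonsense-word groups is unlikely to come from the random assignment alone. The same function applies to the two medications in the Migraines data set.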

 

19. Migraines 

Description: Time elapsed before relief from a migraine headache, for two different medications.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Construct comparative box plots and histograms; use the comparative plots to discuss whether there is a difference between the two medications; and introduce the randomization test.

 

20. Min and Max Temperatures 

Description: Maximum and minimum temperatures (°F) on record at 289 major U.S. weather-observing stations in all 50 states, Puerto Rico, and the Pacific Islands.

Source: www.ncdc.noaa.gov/oa/climate/online/ccd/ 

Uses: Construct a histogram or box plot; identify outliers; and determine the minimum, the maximum, and the difference between the minimum and maximum temperatures.

 

21. Nickel Weights 

Description: Weight, to the nearest hundredth of a gram, of each nickel in a sample of 100 new nickels.

Uses: Construct a histogram or box plot; identify a mound-shaped distribution; and estimate centers and spread.

 

22. Non-Normal Distribution 

Description: Random set of numbers generated from a non-normal distribution.

Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; and identify skewed distributions.

 

23. Number of Marriages 

Description: Marriage rate per 1,000 people for the 50 U.S. states in 2004. The District of Columbia had a rate of 4.5.

Source: Division of Vital Statistics, National Center for Health Statistics www.cdc.gov/nchs/data/nvss/marriage90_04.pdf 

Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of Nevada); estimate centers and spread; and identify skewed distributions.

 

24. Number of Video Games 

Description: Number of video games available on 43 selected platforms.

Source: www.mobygames.com/moby_stats 

Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of Windows and DOS); estimate centers and spread; and identify skewed distributions.

 

25. Old Faithful 

Description: Wait times (in minutes) between consecutive eruptions of the Old Faithful geyser in Yellowstone National Park, collected in 1985. Each column is one set of consecutive eruptions.

Source: Focus in High School Mathematics: Reasoning and Sense Making in Statistics and Probability (NCTM, 2009)

Uses: Construct histograms, box plots, and a time series plot (observation number on the x-axis and wait time on the y-axis).

 

26. PSU Women Heights 

Description: Heights (in inches) of 123 women in a statistics class at Penn State University in the 1970s.

Source: Joiner, Brian L. “Living Histograms.” International Statistical Review 3 (1975): 339–340.

Uses: Construct a histogram or box plot; identify a mound-shaped distribution; estimate centers and spread; and use the histogram to investigate the percent of data within 1, 2, or 3 standard deviations from the mean.

 

27. Random Rectangles 

Description: Area of 100 rectangles placed randomly on a sheet of paper.

Source: Navigating Through Data Analysis in Grades 9–12 (NCTM, 2003)

Uses: Construct a histogram; estimate center and spread; and conduct trials with the Distribution of Sample Custom App.

 

28. Ratings of Movie Showings 

Description: How a student at the University of Alabama in Huntsville rated the projection quality of nearby movie theaters. For each showing, he deducted a point for problems such as misalignment, misframing, or an audio problem. He visited one theater in Huntsville 92 times in the first five and a half years it was open. The data give the number of points deducted per showing.

Source: home.hiwaay.net/~criswell/theatre/generated_subpages/ratings_table/ratings_table.html 

Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of the two 12s); estimate centers and spread; and identify skewed distributions.

 

29. Roller Coasters 

Description: Greatest drop (in feet) of 55 major roller coasters in the U.S.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; determine effect of outliers on the mean; and identify skewed distributions.

 

30. Study Time 

Description: Number of hours 36 members of a high school softball team reported that they studied in a typical week.

Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; and investigate the effect an outlier has on measures of center and spread.

 

31. Sunshine for All Months 

Description: Average percent of the maximum possible sunshine, by month, through 2002 for 174 selected cities around the United States.

Source: www.ncdc.noaa.gov/oa/climate/online/ccd/avgsum.html 

Uses: Construct stacked box plots or histograms to compare the shape, center, and spread of the distributions by month.

 

32. TV Watching 

Description: Number of hours watching TV for one week as reported by a group of seventh graders.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Construct a box plot or histogram; identify outliers; and estimate centers and spread.

 

33. U.S. Census 2000 & 2010 

Description: The spreadsheet contains data about apportionment populations and political representation in the U.S. House of Representatives for the 50 states from the censuses conducted in 2000 and 2010.

Sources: http://2010.census.gov/2010census/data/ and www.census.gov/population/apportionment/data/2010_apportionment_results.html 

Uses: Make comparisons across population and political representation data for 2000 and 2010; and investigate change over time (e.g., Which states gained members in the House of Representatives and which states lost members? Are there any regional patterns of change? Is there consistent representation across the states’ varied populations?).

 

34. Vertical Jumps 

Description: Vertical jump height (in inches) of 27 basketball players in an NBA draft.

Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; and use the histogram to investigate the percent of data within 1, 2, or 3 standard deviations from the mean.

 

35. Walking Speeds 

Description: Mean times (in seconds) to walk 60 feet in various cities of the world.

Source: www.britishcouncil.org/paceoflife.pdf

Uses: Construct a box plot or histogram; identify outliers; and estimate centers and spread.

 


Your feedback is important! Comments or concerns regarding the content of this page may be sent to nctm@nctm.org. Thank you.