Data Sets


    These data sets are accessible through the Data Analysis App or the Spreadsheet App.

    Univariate > Categorical

    1. 2000 State Motorcycle Statistics

    Description: Number of motorcycle registrations, helmet requirements, and numbers of fatalities in 2000 by state.

    Source: U.S. Federal Highway Administration

    Uses: Compare the fatality rates of states with helmet requirements to those of states without requirements; and construct a bar graph comparing the states.

     

    2. Chicago White Sox

    Description: 1919 season batting averages and 1919 World Series batting averages for Chicago White Sox players who had 10 or more at bats in the World Series, along with whether each player was accused of throwing the Series.

    Source: www.baseball-reference.com/postseason/1919_WS.shtml

    Uses: Compute the change in batting average and compare it to whether or not the player was accused of throwing the Series; and construct a bar graph comparing the change in average with whether or not the player was accused.

     

    Univariate > Quantitative

    1. 2000 State Motorcycle Statistics

    Description: Number of motorcycle registrations, helmet requirements, and number of fatalities in 2000.

    Source: U.S. Federal Highway Administration

    Uses: Find a fatality rate; construct a box plot or histogram; identify outliers; estimate centers and spread; and identify skewed distributions.
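
    The rate and outlier steps listed above can be sketched in Python. This is only an illustration: the per-10,000-registrations scale, the function names, and the sample values are assumptions made here, not part of the data set, and the 1.5 × IQR rule is one common convention for flagging outliers rather than the method the apps necessarily use.

```python
from statistics import quantiles


def fatality_rate(fatalities, registrations, per=10_000):
    """Fatalities per `per` registered motorcycles (the scale is an assumption)."""
    return fatalities / registrations * per


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Made-up (fatalities, registrations) pairs, NOT values from the data set.
hypothetical_states = {
    "State A": (30, 60_000),
    "State B": (45, 50_000),
    "State C": (12, 40_000),
}
rates = {name: fatality_rate(f, r) for name, (f, r) in hypothetical_states.items()}
```

    Expressing fatalities as a rate rather than a raw count keeps states with many registrations from dominating the comparison.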

     

    2. Achievement Test Scores

    Description: Achievement test scores for all ninth graders in one high school.

    Uses: Construct a histogram or box plot; identify a mound shaped distribution; estimate centers and spread; and investigate the percent of data within 1, 2, or 3 standard deviations from the mean.
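
    The "percent of data within 1, 2, or 3 standard deviations" investigation can be sketched as follows. The function name and the scores are hypothetical, and the sketch uses the sample standard deviation; a class that uses the population standard deviation would swap in `pstdev`.

```python
from statistics import mean, stdev


def percent_within(values, k):
    """Percent of values within k sample standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    inside = [v for v in values if abs(v - m) <= k * s]
    return 100 * len(inside) / len(values)


# Made-up scores, NOT values from the data set.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
for k in (1, 2, 3):
    print(k, percent_within(scores, k))
```

    For a mound shaped distribution, the results are often compared with the roughly 68%, 95%, and 99.7% expected of a normal distribution.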

     

    3. Apartment Temperatures

    Description: Variation in an apartment temperature (in degrees F) with its thermostat set to 70 degrees Fahrenheit each day at noon.

    Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; identify skewed distributions; and compare sample mean to the hypothesized mean of 70.

     

    4. Battery Life

    Description: Life in hours of two brands of batteries.

    Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

    Uses: Construct comparative box plots and histograms, and use plots to determine if battery life is different between the two brands.

     

    5. Best Actress

    Description: Birthdays and ages of actresses whose performances won in the Best Leading Actress category at the annual Academy Awards (Oscars), and the year the award was given.

    Sources: www.oscars.com and www.imdb.com

    Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of removing 80 and 74); estimate centers and spread; identify skewed distributions; and look for trends over time (e.g., Are there predictable patterns in the age of the winners?).

     

    6. Certificate Perimeters

    Description: Perimeter measurements (mm) of the border of a certificate measured by 131 students of a Wisconsin high school.

    Uses: Construct a box plot or histogram; identify outliers; and estimate centers and spread.

     

    7. Concord & Portland Monthly Precipitation

    Description: Normal monthly precipitation (rain and snow) in inches for Concord, New Hampshire and for Portland, Oregon.

    Source: National Climatic Data Center, 2005

    Uses: Construct comparative box plots; investigate measures of variability (IQR and standard deviation); estimate centers and spread; and investigate if there is a statistical difference between the mean monthly precipitation for the two cities.

     

    8. Dissolution Times

    Description: Variation in times (in seconds) for a solute to dissolve.

    Source: This is student-collected data from a chemistry experiment.

    Uses: Construct a histogram or box plot; identify a mound shaped distribution; estimate centers and spread; and use the histogram to investigate the percent of data within 1, 2, or 3 standard deviations from the mean.

     

    9. Fastest Growing Franchises

    Description: Rank, franchise name, type of service, minimum and maximum start-up costs (in $1,000s) for the 100 fastest growing franchises in the U.S. The ranking is based on the number of new franchise units added from 2005 to 2007.

    Source: www.entrepreneur.com/franzone/rank/o,6584,12-12-F5-2006-7-0.html

    Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; identify skewed distributions; and investigate which type of franchises have the largest difference between minimum and maximum start-up costs.

     

    10. Gas Mileage

    Description: Variation in gas mileage of a car over a 25-week span.

    Uses: Construct a histogram or box plot; identify a mound shaped distribution; estimate centers and spread; use the histogram to investigate percent of data within 1, 2, or 3 standard deviations from the mean; and examine the trend in gas mileage over time.

     

    11. Heights of Students and Basketball Players (Univariate Quantitative)

    Description: Heights (cm) for a group of middle school students and heights of 25 professional basketball players.

    Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

    Uses: Construct and analyze comparative box plots; construct histograms; identify mound shaped distributions; and estimate centers and spread.

     

    12. Heights of Young Adults

    Description: Heights (in inches) of 1,000 males and 1,000 females. The heights have been rounded to the nearest inch.

    Uses: Construct and analyze comparative box plots; identify outliers; construct histograms; identify mound shaped distributions; and estimate centers and spread.

     

    13. January Sunshine

    Description: Average percent of sunshine for the month of January (up to 2002). The percent of sunshine is the percentage of time that sunshine reaches the surface of the Earth at 174 different major weather-observing stations in all 50 states, Puerto Rico, and the Pacific Islands. The two stations with the highest percentages are Tucson and Yuma, Arizona. The station with the lowest percentage is Quillayute, Washington.

    Source: www.ncdc.gov/oa/climate/online/ccd/avgsun.html

    Uses: Construct a histogram or box plot; identify outliers; identify a mound shaped distribution; estimate centers and spread; and use the histogram to investigate the percent of data within 1, 2, or 3 standard deviations from the mean.

    14. Land Use

    Description: Number of acres (in 1,000s) of urban areas in 1960 and 2002 and number of acres of forest in 1959 and 2002 for each of the 48 continental U.S. states and the District of Columbia (excludes Alaska and Hawaii).

    Source: www.ers.usda.gov/Data/MajorLandUses/

    Uses: Construct comparative box plots (e.g., compare acres of urban areas in 1960 and 2002, compare acres of forest in 1959 and 2002); create and examine histograms of skewed and mound shaped distributions; examine the effect of outliers on measures of center and spread; and investigate if there is a statistical difference between the mean acreages for the two years.

     

    15. Los Angeles Rainfall

    Description: Rainfall (in inches) in Los Angeles for the 129 years from 1878 through 2006.

    Source: National Weather Service

    Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of the 1884 rainfall amount on measures of center and spread); estimate centers and spread; and identify skewed distributions.

     

    16. Manufactured Nails

    Description: Nail length (in inches) for 10 nails made by a machine that is set to have a mean length of 2 inches and a standard deviation of 0.03 inches.

    Uses: Make a comparison of this sample mean to the hypothesized mean of 2.
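
    One way to make this comparison, since the description states the machine's standard deviation (0.03 inches), is a one-sample z statistic. The sketch below assumes that approach; the function name and the sample lengths are hypothetical, not values from the data set.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma):
    """Two-sided z test of the sample mean against mu0, with sigma known."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up lengths (inches) for 10 nails, NOT values from the data set.
nails = [2.01, 1.98, 2.03, 2.00, 1.97, 2.02, 2.04, 1.99, 2.01, 2.00]
z, p_value = z_test(nails, mu0=2.0, sigma=0.03)
```

    A small p-value would suggest the machine's mean has drifted from 2 inches; with only 10 nails, a sample mean close to 2 is not strong evidence either way.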

     

    17. Mean Hourly Earnings

    Description: Mean hourly earnings (in dollars) for 70 different occupations in the United States. Earnings are for all full-time, nonmilitary workers and do not include benefits, overtime, vacation pay, nonproduction bonuses, or tips.

    Source: U.S. Department of Labor, National Compensation Survey: Occupational Earnings in the United States, Table 3.

    Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of CEO and physician earnings on measures of center and spread); estimate centers and spread; and identify skewed distributions.

     

    18. Meaningful Words

    Description: Two lists of 20 three-letter “words.” One list contained meaningful words (e.g., CAT, DOG), whereas the other list contained nonsense words (e.g., ATC, ODG). A ninth-grade class of thirty students was randomly divided into two groups of fifteen students. One group was asked to memorize the list of meaningful words; the other group was asked to memorize the list of nonsense words. The data set gives the number of words correctly recalled by each student.

    Source: Focus in High School Mathematics: Reasoning and Sense Making (NCTM, 2009)

    Uses: Construct comparative box plots; and introduce the randomization test.
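
    A randomization test for this kind of two-group experiment can be sketched as below. The function name, the default trial count, the one-sided comparison, and the recall counts are all assumptions made for illustration; they are not taken from the data set or the NCTM source.

```python
import random
from statistics import mean


def randomization_test(group_a, group_b, trials=10_000, seed=0):
    """Estimate how often a random regrouping gives a difference in means
    (A minus B) at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # reassign students to groups at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / trials


# Made-up recall counts, NOT values from the data set.
meaningful = [12, 15, 14, 11, 16]
nonsense = [8, 10, 7, 12, 9]
observed, p_value = randomization_test(meaningful, nonsense, trials=2_000)
```

    The idea is that if word type made no difference, the group labels would be arbitrary, so the observed difference should look typical among the reshuffled differences.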

     

    19. Migraines

    Description: Time passed to get relief from a migraine headache for two different medications.

    Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

    Uses: Construct comparative box plots and histograms; refer to the comparative plots and discuss if there is a difference between the two medications; and introduce the randomization test.

     

    20. Min and Max Temperatures

    Description: Maximum and minimum temperatures (in degrees F) on record at 289 major U.S. weather-observing stations in all 50 states, Puerto Rico, and the Pacific Islands.

    Source: www.ncdc.noaa.gov/oa/climate/online/ccd/

    Uses: Construct a histogram or box plot; identify outliers; and determine the minimum, the maximum, and the difference between the minimum and maximum temperatures.

     

    21. Nickel Weights

    Description: Weight, to the nearest hundredth of a gram, of a sample of 100 new nickels.

    Uses: Construct a histogram or box plot; identify a mound shaped distribution; and estimate centers and spread.

     

    22. Non-Normal Distribution

    Description: Random set of numbers generated from a non-normal distribution.

    Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; and identify skewed distributions.

     

    23. Number of Marriages

    Description: Marriage rate per 1,000 people for 50 U.S. states in 2004. The District of Columbia had a rate of 4.5.

    Source: Division of Vital Statistics, National Center for Health Statistics, www.cdc.gov/nchs/data/nvss/marriage90_04.pdf

    Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of Nevada); estimate centers and spread; and identify skewed distributions.

     

    24. Number of Video Games

    Description: Number of video games available on 43 selected platforms.

    Source: www.mobygames.com/moby_stats

    Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of Windows and DOS); estimate centers and spread; and identify skewed distributions.

     

    25. Old Faithful

    Description: Each column is a set of consecutive eruption wait times, in minutes, for the Old Faithful geyser in Yellowstone National Park, collected in 1985.

    Source: Focus in High School Mathematics Reasoning and Sense Making in Statistics and Probability (NCTM, 2009)

    Uses: Construct histograms, box plots, and a time series plot (with observation number on the x-axis and wait time on the y-axis).

     

    26. PSU Women Heights

    Description: Heights (in inches) of 123 women in a statistics class at Penn State University in the 1970s.

    Source: Joiner, Brian L. “Living Histograms.” International Statistical Review 3 (1975): 339–340.

    Uses: Construct a histogram or box plot; identify a mound shaped distribution; estimate centers and spread; and use the histogram to investigate percent of data within 1, 2, or 3 standard deviations from the mean.

     

    27. Random Rectangles

    Description: Area of 100 rectangles placed randomly on a sheet of paper.

    Source: Navigating Through Data Analysis in Grades 9–12 (NCTM, 2003)

    Uses: Construct histograms; estimate center and spread; and conduct trials with the Distribution of Sample Custom App.

     

    28. Ratings of Movie Showings

    Description: How a student at the University of Alabama, Huntsville, rated the projection quality of nearby movie theaters. For each showing, a point was deducted for such things as misalignment, misframing, or an audio problem. He visited one theater in Huntsville 92 times in the first five-and-a-half years it was open. The data set gives the number of points deducted per showing.

    Source: home.hiwaay.net/~criswell/theatre/generated_subpages/ratings_table/ratings_table.html

    Uses: Construct a box plot or histogram; identify outliers (e.g., examine the effect of the two 12s); estimate centers and spread; and identify skewed distributions.

     

    29. Roller Coasters

    Description: Greatest drop in feet of 55 major roller coasters in the U.S.

    Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

    Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; determine effect of outliers on the mean; and identify skewed distributions.

     

    30. Study Time

    Description: Number of hours 36 members of a high school softball team reported that they studied in a typical week.

    Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; and investigate the effect an outlier has on measures of center and spread.

     

    31. Sunshine for All Months

    Description: Average percent of the maximum possible sunshine, by month, through 2002 for 174 selected cities around the United States.

    Source: www.ncdc.noaa.gov/oa/climate/online/ccd/avgsum.html

    Uses: Construct stacked box plots or histograms to compare the shape, center, and spread of the distributions by month.

     

    32. TV Watching

    Description: Number of hours watching TV for one week as reported by a group of seventh graders.

    Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

    Uses: Construct a box plot or histogram; identify outliers; and estimate centers and spread.

     

    33. U.S. Census 2000 & 2010

    Description: The spreadsheet contains data about apportionment populations and political representation in the U.S. House of Representatives for the 50 states from the censuses conducted in 2000 and 2010.

    Sources: http://2010.census.gov/2010census/data/ and www.census.gov/population/apportionment/data/2010_apportionment_results.html

    Uses: Make comparisons across population and political representation data for 2000 and 2010; and investigate change over time (e.g., Which states gained members in the House of Representatives and which states lost members? Are there any regional patterns of change? Is there consistent representation across the states’ varied populations?).

     

    34. Vertical Jumps

    Description: Vertical jump height (in inches) of 27 basketball players in an NBA draft.

    Uses: Construct a box plot or histogram; identify outliers; estimate centers and spread; and use the histogram to investigate the percent of data within 1, 2, or 3 standard deviations from the mean.

     

    35. Walking Speeds

    Description: These data show the mean times (in seconds) to walk 60 feet in various cities of the world.

    Source: www.britishcouncil.org/paceoflife.pdf

    Uses: Construct a box plot or histogram; identify outliers; and estimate centers and spread.

     
