Bivariate Quantitative Data Sets

1. 100-meter Freestyle 

Description: Winning times (in seconds) for women and men in the Olympic 100-meter freestyle swim for games since 1912.

Source: The World Almanac and Book of Facts 2003. Mahwah, N.J.: World Almanac Education Group, Inc., 2003; www.olympics.com

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the slope and y-intercept of the regression line; identify a potential outlier (12, 82.2); draw a movable line; and introduce the concepts of residuals and residual plots.
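
As a rough illustration of the regression and residual tasks listed above, the following Python sketch fits a least-squares line and computes its residuals. The x and y values are made up for illustration and are not taken from this data set; `fit_line` and `residuals` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Return observed y minus predicted y for every point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Made-up illustrative values (NOT the Olympic data).
xs = [0, 1, 2, 3, 4]
ys = [10.0, 8.1, 5.9, 4.2, 1.8]

slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)
print(slope, intercept)
print(res)
```

A residual plot is a scatterplot of the x values against these residuals; a potential outlier would typically appear as a point with an unusually large residual.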

 

2. AIDS Fatalities in the U.S. 

Description: These data give the number of deaths from AIDS in the United States for the years 1981 through 1994.

Source: United States Centers for Disease Control and Prevention, HIV/AIDS Surveillance Report, Year-End Edition 1995

Uses: Compare linear model with other models using residual plots and correlation coefficient.

 

3. All Manatee Mortalities 

Description: Number of manatee mortalities from various causes from 1974 through 2004.

Source: www.savethemanatee.org/mortalitychart.htm 

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the slope and y-intercept of the regression line; draw a movable line; and introduce the concepts of residuals and residual plots.

 

4. Altitude/Atmospheric Pressure 

Description: These data relate atmospheric pressure (in pounds per square inch, psi) to airplane altitude (in miles).

Uses: Introduce nonlinear models (e.g., exponential); explore the concept of residuals and residual plots; and compute the correlation coefficient.

  

5. Animal Brain and Body Weights 

Description: These data give brain weights (in grams) and body weights (in kilograms) from samples of a variety of species of mammals.

Uses: Introduce nonlinear models (e.g., exponential).

 

6. Animal Gestation Time and Longevity 

Description: These data show the average gestation time in days and average longevity (life span) in years of animals.

Source: World Almanac and Book of Facts 2012. Mahwah, N.J.: World Almanac, 2012.

Uses: Construct a scatterplot; estimate and then compute the correlation to assess the strength of association; find the least-squares regression equation; and interpret the meaning of the slope and y-intercept of the regression equation in terms of the context.

 

7. Area Codes 

Description: Population and number of area codes for each state in 2000.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Plot a scatterplot; determine a least squares regression line for predicting number of area codes from population; interpret the meaning of the slope and y-intercept of the regression equation; compute and plot residuals and determine points with the largest/smallest residuals; and estimate and compute the correlation.

 

8. Baby Boys Walking 

Description: To see whether a program of special stepping and foot-placing exercises for 12 minutes each day could speed up the process of babies learning to walk, 12 baby boys were randomly assigned to the special exercise group or to the exercise control group. For the control group, parents were told to make sure their infant sons exercised at least 12 minutes per day, but they were given no special exercises to use and no other instructions about exercise. The data give the age, in months, at which each baby first walked without help.

Source: Zelazo, Phillip R., Nancy Ann Zelazo, and Sarah Kolb. “Walking in the Newborn,” Science 176 (1972): 314–5

Uses: Construct comparative box plots; and introduce the randomization test.
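
The randomization test mentioned above can be sketched in Python as follows. This is a minimal illustration, not the procedure prescribed for this data set: the function name `randomization_test`, the choice of the absolute difference in means as the test statistic, and the sample values are all assumptions made for the example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def randomization_test(group_a, group_b, num_shuffles=10000, seed=0):
    """Estimate the proportion of random regroupings whose absolute
    difference in means is at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(num_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            count += 1
    return count / num_shuffles


# Made-up values (NOT the walking-age data).
p_far = randomization_test([1, 2, 3, 4, 5, 6], [101, 102, 103, 104, 105, 106])
p_same = randomization_test([5, 5, 5], [5, 5, 5])
print(p_far, p_same)
```

A small proportion suggests that a difference as large as the observed one would rarely arise from the random assignment alone.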

 

9. Bacteria Growth I 

Description: Bacteria count over a period of 20 hours.

Uses: Find the exponential growth model.
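
One common way to find an exponential model of the form y = a·b^t is to fit a straight line to the points (t, ln y) and then transform the coefficients back. The Python sketch below illustrates that idea; the counts are generated from an assumed model with a = 50 and b = 1.5 and are not the values in this data set, and `fit_line` and `fit_exponential` are hypothetical helper names.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_exponential(ts, ys):
    """Fit y = a * b**t by regressing ln(y) on t. Requires all y > 0."""
    slope, intercept = fit_line(ts, [math.log(y) for y in ys])
    a = math.exp(intercept)  # initial amount
    b = math.exp(slope)      # growth factor per unit of time
    return a, b


# Generated from an assumed model (NOT the bacteria counts).
hours = [0, 1, 2, 3, 4, 5]
counts = [50 * 1.5 ** t for t in hours]

a, b = fit_exponential(hours, counts)
print(a, b)
```

Because the sample values follow the assumed model exactly, the fit recovers a and b up to rounding; real counts would scatter around the fitted curve.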

 

10. Bacteria Growth II 

Description: Bacteria count over a period of 20 hours.

Uses: Find the exponential growth model.

 

11. Baseball Averages 

Description: Season batting averages and the batting average in the World Series for the same year of a selection of players.

Uses: Draw the y=x line and look at the number of ordered pairs above and below this line; and find the change in batting averages and analyze the change using histograms and boxplots.

 

12. Blood Lead Levels 

Description: Blood lead levels for children exposed in a lead-related industry and for comparison students.

Source: Morton, David E., et al. “Lead Absorption in Children of Employees in a Lead-related Industry.” American Journal of Epidemiology 115 (1982): 549–555.

Uses: Construct comparative box plots; and conduct a randomization test to decide if there is a difference in blood lead levels between the two groups.

 

13. Braking Road Test 

Description: Road tests under various conditions are likely to produce data like these that show speed (in mph), distance until braking (in feet), and distance until stopping (in feet).

Uses: Find a regression line to summarize the linear relationship between two variables (speed and distance until braking); interpret the slope and y-intercept of the regression line; investigate nonlinear models for (speed, distance until stopping); and investigate the composite of the two functions.

14. Canines 

Description: These data show the typical adult weight (in kg) and the maximum longevity (in years) for all 28 species of canines (dogs) listed in a large database of animals.

Source: AnAge, Animal Ageing and Longevity Database, genomics.senescence.info/species

Uses: Construct a scatterplot; and determine if there is an association between adult weight and longevity in canines. If so, find the best fitting model for the relationship.

 

15. Car Skid Marks and Speeds 

Description: When police investigate the scene of an automobile accident, they look for skid marks and use the length of those marks to estimate the speed at which the car was traveling. The results of experiments with a test car, giving skid mark length (in feet) and speed (in miles per hour), are shown here.

Uses: Introduce the power regression model.

 

16. Chicago White Sox 

Description: 1919 season batting averages and 1919 World Series batting averages for Chicago White Sox players who had 10 or more at bats in the World Series, along with whether each player was accused of throwing the Series.

Source: www.baseball-reference.com/postseason/1919_WS.shtml 

Uses: Compute change in batting average and compare to whether or not player was accused of throwing Series; determine and interpret the regression line equation to predict the World Series batting average from the season batting average; look for relationships among the accused and non-accused players in relation to the predicted values; and compare the influence of excluding a player’s data from the data set.

 

17. Cholesterol Levels 

Description: Cholesterol levels both before and after a dietary change from a “standard” American diet to a vegetarian diet.

Source: Navigating Through Data Analysis in Grades 9–12 (NCTM, 2003)

Uses: Compute the change in cholesterol levels; construct a histogram and box plot of the change in cholesterol levels; and determine if the overall change is greater than zero.

 

18. Classroom Temperatures 

Description: Temperatures of four different classrooms, collected at different time intervals over five school days.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Construct a time series plot for each room and compare the plots to determine if the rooms vary in temperature.

 

19. Cold Surgery 

Description: Cold surgery data, where body temperature is in degrees F, brain activity is a percent of normal, and safe operating time is in minutes.

Source: Vergano, Dan. “Surgery’s Chilling Future Will Put Fragile Lives on Ice.” USA Today, August 1, 2001, p. 8D. His data source was McCullough, Jock, et al. Annals of Thoracic Surgery.

Uses: Plot a scatterplot and fit an exponential regression equation to model the relationship between (Body Temperature, Brain Activity) and (Body Temperature, Safe Operating Time); and use the regression equations or a table of predicted values to make predictions.

 

20. Compact Cars 

Description: Curb weight (in 100 lbs) and Highway mpg for a list of compact cars.

Source: www.edmunds.com 

Uses: Compare a linear model to nonlinear models; estimate and compute the correlation; and introduce the concepts of residuals and residual plots.

 

21. Crawling Age 

Description: These data give the results of a study of 414 babies that attempted to determine whether babies bundled in warm clothing learn to crawl later than babies dressed more lightly. The average daily outside temperature when the babies were six months old and the average age in weeks at which those babies began to crawl are reported.

Source: Benson, Janette. “Season of Birth and Onset of Locomotion.” Infant Behavior and Development 16 (1993): 69–81.

Uses: Plot a scatterplot; interpret possible trends in crawling age for warmer/colder months; determine a least squares regression line for predicting age from temperature; interpret the meaning of the slope and y-intercept of the regression equation; compute and plot residuals and determine points with the largest/smallest residuals; and estimate and compute the correlation.

 

22. Cricket Chirps 

Description: Temperatures at various chirp rates of the snowy tree cricket. The snowy tree cricket is known as the “thermometer cricket” because it is possible to count its chirping rate and to estimate the temperature.

Uses: Find a regression line to summarize the linear relationship between two variables; and interpret the slope and y-intercept of the regression line.

 

23. Cumulative AIDS Deaths 

Description: Total number of deaths from AIDS in United States from 1981–1994.

Source: United States Centers for Disease Control and Prevention, HIV/AIDS Surveillance Report, Year-End Edition 1995.

Uses: Find a nonlinear model using logs; estimate and compute the correlation; examine residual plots; and use a log transformation to determine whether the number of deaths from AIDS appears to be increasing exponentially from 1981 through 1989 and from 1981 through 1994.

 

24. Darwin and Genetics 

Description: Height in inches of plants grown in 15 pairs for a fixed period of time. One member of each pair was cross-fertilized and the other was self-fertilized.

Source: Navigating Through Data Analysis in Grades 9–12 (NCTM, 2003)

Uses: Compute the differences in plant heights; and construct a histogram and box plot of the differences to determine how different they are from zero.

 

25. Dow Jones Averages 

Description: Dow Jones Industrial Average low every 5 years since 1965.

Source: www.analyzeindices.com/dow-jones-history.shtml 

Uses: Compare linear and exponential (or other nonlinear) models and their fit with these data; use an exponential model to make predictions; and compute correlation and residual plots.

 

26. DSM Monthly Temps 

Description: Mean monthly temperatures (in degrees Fahrenheit) in Des Moines, Iowa, for 2010.

Source: www.crh.noaa.gov/images/dmx/2010%20DSM%20MonthlyTables.pdf 

Uses: Construct a time series plot (change in temperature over one year); and construct a box plot or histogram of temperatures.

 

27. Federal Minimum Wage 

Description: Federal minimum wage from 1955 to 2007.

Source: U.S. Department of Labor

Uses: Construct a time series plot (change in minimum wage over time).

 

28. Fertilizing Cost 

Description: These data give the cost (in dollars) for fertilizing circular fields with a variety of different radii (in meters).

Uses: Compare linear to nonlinear models; and compute the correlation and residual plots.

 

29. Flights to/from Chicago 

Description: Distance (in miles) and time (in minutes) for a sample of United Airlines nonstop flights to and from Chicago, Illinois.

Source: www.uatimetable.com 

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the slope and y-intercept of the regression line; and compare east and westbound times.

 

30. Free Fall Speed 

Description: The New River Gorge Bridge in Fayetteville, West Virginia, is 876 feet above the water, and the most daring jumpers at the annual Bridge Day BASE jumping event fall for about 7 seconds and 650 feet until opening their parachutes. A news story about the event included these data relating time in free fall (in seconds) and approximate speed (in miles per hour) of typical divers.

Source: Saslow, Eli. “A Heightened Chance of Death.” The Washington Post, Sunday, November 4, 2007, p. A15.

Uses: Use a power model to show the relationship between speed and time.

31. Hamburger Nutrition I 

Description: Total calories, fat (in grams), and carbohydrates (in grams) for selected hamburgers with cheese.

Sources: www.mcdonalds.com/us/en/food/food_quality/nutrition_choices.html; nutrition.mcdonalds.com/getnutrition/nutritionfacts.pdf (11-14-2011); www.bk.com/en/us/menu-nutrition/full-menu.html; www.wendys.com/food/NutritionLanding.jsp; www.wendys.com/food/pdf/us/nutrition.pdf; http://sonicwww.s3.amazonaws.com/Content/pdfs/SonicNutritionGuide.pdf; www.jackinthebox.com/pdf/NutritionalBrochure2011.pdf

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; interpret the correlation coefficient; and construct a matrix plot.

 

32. Hamburger Nutrition II 

Description: Calories, fat (in grams) and protein (in grams) for hamburgers.

Sources: www.wendys.com, www.mcdonalds.com, www.burgerking.com, www.hardees.com, www.carlsjr.com (December 2006).

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; interpret the correlation coefficient; construct residual plots; compare correlation and residual plots of linear versus nonlinear models; and construct a matrix plot.

 

33. Hamburger Nutrition III 

Description: These data relate fat (in grams) to calories and sodium (in mg) in hamburgers sold by a variety of national fast-food chains.

Sources: www.wendys.com, www.mcdonalds.com, www.burgerking.com, www.hardees.com, www.carlsjr.com (December 2006).

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; interpret the correlation coefficient; and construct a matrix plot.

 

34. Health and Nutrition 

Description: Average daily food supply (in calories), life expectancy, and infant mortality rates (in deaths per 1,000 births) from a sample of Western Hemisphere countries.

Source: World Health Organization Global Health Observatory Data Repository; www.populstat.info/Americas 

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; interpret the correlation coefficient; construct residual plots; and compare correlation and residual plots of linear model versus nonlinear model.

 

35. Hippopotamus Population Sizes 

Description: Hippopotamus population size for selected years from 1970–1983

Source: Hamilton, Lawrence C. Regression with Graphics, p. 179. Pacific Grove, Calif.: Duxbury, 1991.

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; interpret the correlation coefficient; and identify a possible outlier (1975, 2342) and test whether it strongly influences the equation of the regression line or the correlation.

 

36. Horse Stride 

Description: Height (in hands) and hip angle (in degrees).

Source: AP Statistics discussion list, posted on February 3, 2006, www.mathforum.org/kb/forum.jspa?forumID=67 

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; and identify an outlier (e.g., consider Gaspe) and test whether it is an influential point (i.e., strongly influences the equation of the regression line or the correlation).

 

37. Hybrid Electric Vehicles 

Description: Since 1999, there has been a growing trend in the sales of hybrid electric vehicles. These data show the number of hybrid electric vehicles sold in each of the first eight years after 1999.

Source: www.afdc.energy.gov/afdc/data/vehicles.html#afv_hev 

Uses: Construct a scatterplot or time series plot; and use log transformations and residual plots to reason about an appropriate regression model.

 

38. Instructor Attributes  

Description: A group of 49 volunteer college students were randomly assigned to two treatments. Twenty-five students were told that they would view a videotape of a teacher who other students thought was “charismatic”: lively, stimulating, and encouraging. The remaining 24 students were told that the instructor they would view was thought to be “punitive”: not helpful or interested in students, and a hard grader. Then all students watched the same 20-minute lecture given by the same instructor. Following the lecture, subjects rated the lecturer. The students’ summary ratings are given here. Higher ratings are better.

Source: www.ruf.rice.edu/%7Elane/case_studies/instructor_reputation/index.html; their source: Annette Towler and R.L. Dipboye. “The effect of instructor reputation and need for cognition on student behavior”—poster presented at American Psychological Society conference, May 1998.

Uses: Construct comparative box plots; investigate measures of variability (IQR and standard deviation); estimate centers and spread; and conduct a randomization test.

 

39. Leg and Stride Length 

Description: These data relate stride length (in cm) to leg length (in cm) for a sample of 20 people.

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; interpret the correlation coefficient; and analyze the residual plot.

 

40. Light Intensity 

Description: These data compare the distance (in meters) from a light source and illuminance (light per unit area, in lux).

Uses: Find a nonlinear model (exponential and/or power) to summarize the relationship between two variables; interpret the meaning of the coefficients and exponents of the model; interpret the correlation coefficient; and analyze the residual plot.

 

41. Los Angeles Flight Altitude Data 

Description: Airplane altitude (in 1,000s ft) and temperature (in degrees F) data above Los Angeles.

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; interpret the correlation coefficient; analyze the residual plot; and compare a linear model with a nonlinear model.

 

42. Major U.S. Cities Populations 

Description: Major U.S. Cities 2000 and 2010 population (in 1,000s).

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; and use the regression model to make predictions.

 

43. Gestation and Life Span of Some Mammals 

Description: Gestation time (in days) and average longevity (in years) for some mammals.

Source: World Almanac and Book of Facts 2001. Mahwah, N.J.: World Almanac, 2001.

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; and identify an influential point (660, 35) and test how it influences the equation of the regression line or the correlation.

 

44. Manatee Watercraft Mortalities 

Description: Number of manatees killed in watercraft collisions near the Gulf Coast of Florida every year from 1985 through 2004.

Source: www.savethemanatee.org/mortalitychart.htm 

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; interpret the correlation coefficient; and identify possible outliers.

 

45. Marriage/Divorce Rates 

Description: These data give the marriage rates and divorce rates for the countries listed in the Statistical Abstract of the United States. Marriage and divorce rates are the number per 1,000 people aged 15–64.

Source: Statistical Abstract of the United States, 2006, Table 1320.

Uses: Find the correlation coefficient; and identify any influential points and test how they influence the correlation coefficient.

 

46. Median Income  

Description: Median incomes (in dollars) for men and women employed full-time outside the home from 1970 to 2009.

Source: www.census.gov/hhes/www/income/data/historical/people/index.html 

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; make predictions; and find points of intersection of two linear equations.

 

47. Men’s 100-meter Run 

Description: Olympic Winning times (in seconds) for Men’s 100-meter run for the years 1890 to 2008.

Source: The World Almanac and Book of Facts 2001. Mahwah, N.J.: World Almanac Education Group, Inc., 2001; www.olympics.com 

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; and identify an outlier or influential point (6, 12) and test how it influences the equation of the regression line and the correlation.

 

48. Movie Running Times 

Description: Running times and gross receipts for thirty of the top movies of 1997.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Plot a scatterplot; compute the correlation; and determine if there is a relationship between gross receipts and running times of movies.

 

49. Mozart/Silence 

Description: In a science project, a student wanted to determine whether sixth graders did better when they took a math test in silence or when Mozart was being played. Twenty-six students were randomly divided into the two treatment groups. Part of the students’ results are in the table provided.

Uses: Construct comparative box plots; investigate measures of variability (IQR and standard deviation); estimate centers and spread; determine possible outliers; and conduct a randomization test.

50. Nonlinear Values I 

Description: Example of a nonlinear data set.

Use: Use a log-log transformation of the (x, y) data to achieve a linear pattern.

 

51. Nonlinear Values II 

Description: Example of a nonlinear data set.

Use: Perform log-log transformations of data and use residual plots to decide whether the (x,y) patterns were generated by power functions, exponential functions, or a different model.

 

52. Nonlinear Values III 

Description: Example of a nonlinear data set.

Use: Perform log-log transformations of data and use residual plots to decide whether the (x,y) patterns were generated by power functions, exponential functions, or a different model.

 

53. Olympic 200 Meter Dash 

Description: Men’s and women’s Olympic 200-meter dash times (in seconds) by year.

Source: Focus in High School Mathematics: Reasoning and Sense Making in Statistics and Probability (NCTM, 2009)

Uses: Find a regression line to summarize the linear relationship between two variables; and interpret the meaning of the slope and y-intercept of the regression line.

 

54. Peak Cherry Tree Blooming 

Description: These data show the days after March 1 when the cherry trees hit peak bloom for the years beginning in 1980.

Source: National Park Service. www.nps.gov/cherry/

Uses: Construct a time series plot (or scatterplot with Year-1980 on the x-axis); recognize a cycle pattern in the data over time; construct a box plot or histogram of the Peak Day; identify outliers; estimate centers and spread; and identify skewed distributions.

 

55. Penny Stacking 

Description: Results from one class of students counting the number of pennies each can stack with their dominant hand and non-dominant hand.

Uses: Construct comparative box plots; investigate measures of variability (IQR and standard deviation); estimate centers and spread; identify possible outliers; and conduct a randomization test.

 

56. People, Congress, and Pizza 

Description: Population in 2000, number of representatives in Congress, and number of pizza restaurants in 40 selected U.S. states.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; interpret the correlation coefficient; and identify a possible outlier and test whether it strongly influences the equation of the regression line or the correlation.

 

57. Planet Orbits (km) 

Description: Distance from the Sun (in millions of kilometers) and orbit time (in Earth years) for the major planets of our solar system.

Source: http://en.wikipedia.org/wiki/Distance_of_planets_to_the_Sun

Uses: Find a nonlinear model to summarize the relationship between two variables; draw residuals and residual plots; and compute the correlation.

 

58. Planet Orbits (mi) 

Description: Distance from the Sun (in millions of miles) and orbit time (in Earth years) for the major planets of our solar system.

Source: http://en.wikipedia.org/wiki/Distance_of_planets_to_the_Sun 

Uses: Find a nonlinear model to summarize the relationship between two variables; draw residuals and residual plots; and compute the correlation.

 

59. Plant Growth 

Description: Chrysanthemums with long stems are likely to have smaller flowers than chrysanthemums with shorter stems. An experiment was conducted at the University of Florida to compare growth inhibitors designed to reduce the length of the stems, and so increase the size of the flowers. Growth inhibitor A was given to 10 randomly selected plants and growth inhibitor B was given to the remaining 10 plants. The plants were grown under nearly identical conditions, except for the growth inhibitor used. The table gives the amount of growth during the subsequent 10 weeks.

Source: Watkins, Ann E., Richard L. Scheaffer, and George W. Cobb. Statistics in Action. Emeryville, Calif.: Key Curriculum Press, 2004, p. 681.

Uses: Construct comparative box plots; investigate measures of variability (IQR and standard deviation); estimate center and spread; identify possible outliers; and conduct a randomization test.

 

60. Population by Decade 

Description: Population of U.S. in each census year (1790 to 2000).

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Plot a scatterplot and determine a nonlinear model to show the relationship between population and time.

 

61. Radioactive Isotope 

Description: These data show experimental measurements of decay over time for a radioactive chemical, given by amount left (in grams) at various times (in days).

Uses: Find a nonlinear model to summarize the relationship between two variables; draw residuals and residual plots; and compute the correlation.

 

62. Radioactive Waste Exposure 

Description: Data from a study of nine Oregon communities in the 1960s when nuclear power was relatively new. The study compared exposure to radioactive waste from a nuclear reactor in Hanford, Washington and the rate of deaths due to cancer in these communities. The cancer deaths are per 100,000 residents.

Source: Journal of Environmental Health, May-June 1965.

Uses: Find a nonlinear model to summarize the relationship between two variables; draw residuals and residual plots; compute the correlation; and investigate the impact of outliers (3.4, 130) and (2.6, 130) on the model and correlation.

 

63. Ramp Height and Time 

Description: These data give the run times (in seconds) for a ball rolling down an adjustable ramp at various heights (in feet).

Uses: Find a nonlinear model to summarize the relationship between two variables; draw residuals and residual plots; and compute the correlation.

 

64. Ramp Time and Distance 

Description: These data give the time (in seconds) and distance (in meters) travelled for a ball rolling down a stationary ramp.

Uses: Find a nonlinear model to summarize the relationship between two variables; draw residuals and residual plots; and compute the correlation.

 

65. Riverdale Adventure Club Survey 

Description: Poll results from the Riverdale Adventure Club members when asked whether or not they would purchase a video of their jump for various prices.

Uses: Compare linear model and nonlinear model; draw residual plots; and compute the correlation.

66. Satellite Radio 

Description: These data show the number of Sirius satellite radio customers (in millions) at the end of each quarter year from the beginning of 2004 to the end of 2006.

Source: en.wikipedia.org/wiki/Sirius_Satellite_Radio

Uses: Find a nonlinear model to summarize the relationship between two variables; draw residuals and residual plots; and compute the correlation.

 

67. Scan Quality and File Size 

Description: These data show file sizes (in MB) for a document scanned at different resolutions (in dots per inch, DPI).

Uses: Introduce the concepts of residuals and residual plots. Perform log transformations, create a scatterplot, semi-log plots, and log-log plots of the data and then use residual plots to determine the best fitting model.

 

68. Seal Sizes 

Description: These data give the average length (in feet) and weight (in pounds) of various types of seals.

Source: Grzimek’s Animal Life Encyclopedia, Mammals, vol. 4. New York: McGraw-Hill, 1990.

Uses: Find a nonlinear model to summarize the relationship between two variables; draw residuals and residual plots; and compute the correlation.

 

69. Selected Fast Food 

Description: Grams of fat and number of calories in selected fast food items.

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; and identify a potential outlier (7,360) and test how it influences the equation of the regression line and the correlation.

 

70. Smell Test 

Description: Researchers at the Smell & Taste Foundation randomly assigned volunteers to wear an unscented mask or to wear a floral-scented mask. The subjects then completed two pencil-and-paper mazes. The time (in seconds) to complete the two mazes was recorded. Data were recorded separately for smokers and nonsmokers because smoking affects the sense of smell. The results for 13 nonsmokers on their first attempt are given.

Uses: Construct comparative box plots; investigate measures of variability (IQR and standard deviation); estimate center and spread; identify possible outliers; and conduct a randomization test.

 

71. Song Length and File Size 

Description: These data show file sizes (in MB) for songs of different lengths (in seconds).

Uses: Construct a scatterplot; find a linear regression equation to model the data; and introduce the idea of residuals and residual plots.

 

72. Sphere Radius and Surface Area 

Description: These data give the radius (in centimeters) and surface area (in square centimeters) for a variety of spheres.

Uses: Find a nonlinear model to summarize the relationship between two variables; draw residuals and residual plots; and compute the correlation.
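
For a power relationship such as the sphere formula A = 4πr², a log-log transformation turns the curve into a straight line whose slope is the exponent. The Python sketch below illustrates this under the assumption that the surface areas follow the formula exactly; the radii are chosen for illustration rather than taken from this data set, and `fit_line` and `fit_power` are hypothetical helper names.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_power(xs, ys):
    """Fit y = c * x**k by regressing ln(y) on ln(x). Requires x, y > 0."""
    k, log_c = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(log_c), k


# Illustrative radii (in cm) with areas computed from A = 4*pi*r**2.
radii = [1.0, 2.0, 3.0, 4.0, 5.0]
areas = [4 * math.pi * r ** 2 for r in radii]

coefficient, exponent = fit_power(radii, areas)
print(coefficient, exponent)
```

With exact values the fitted exponent is 2 and the coefficient is 4π, up to rounding; measured data would give values close to these with nonzero residuals.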

 

73. Sphere Surface Area and Volume 

Description: These data give the surface area (in square centimeters) and volume (in cubic centimeters) for a variety of spheres.

Uses: Find a nonlinear model to summarize the relationship between two variables; draw residuals and residual plots; and compute the correlation.
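The sketch below illustrates why the correlation alone is not enough to justify a linear model. For a sphere, V = S^1.5 / (6√π), so volume is a curved function of surface area, yet a straight line still gives a correlation close to 1. The radii and the helper name linear_fit are assumptions for the example; the values are computed from the geometric formulas rather than taken from the data set.

```python
import math


def linear_fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy / math.sqrt(sxx * syy)


# Illustrative spheres with radii 1 cm to 10 cm (arbitrary choices).
radii = range(1, 11)
areas = [4 * math.pi * r ** 2 for r in radii]          # square centimeters
volumes = [4 / 3 * math.pi * r ** 3 for r in radii]    # cubic centimeters

slope, intercept, r = linear_fit(areas, volumes)
res = [v - (slope * s + intercept) for s, v in zip(areas, volumes)]

print(f"correlation r = {r:.4f}")
# The sign pattern of the residuals exposes the curvature a line misses.
print("residual signs:", "".join("+" if e > 0 else "-" for e in res))
```

The residuals are positive at both ends and negative in the middle, a systematic pattern that signals a nonlinear relationship even though the correlation is high.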

 

74. Stopping Distances 

Description: Braking distances at 30 mph and 60 mph for 10 cars of two different models.

Source: Navigating Through Data Analysis in Grades 6–8 (NCTM, 2003)

Uses: Construct comparative box plots and introduce the randomization test.

 

75. Super Bowl Ad 

Description: Cost of a 30-second Super Bowl ad (in thousands of dollars) and winning team players’ share.

Source: Associated Press 2/2/2012

Uses: Construct a scatterplot or time series plot of cost; find a regression equation to summarize the relationship between the two variables; and make predictions.

 

76. Surgery Time and Cost 

Description: Surgery time (in minutes) and cost (in $).

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; and make predictions.

 

77. Taking Chances 

Description: In one popular game, a fair die is rolled to find out whether you win a prize. Rules of the game are:

• You win a $4 prize if the top face of the die is a 4.

• You donate $1 to the school special project fund if the top face of the die is 1, 2, 3, 5, or 6.

The cumulative profit (in dollars) for a number of trials is given.

Uses: Compare a linear model and a nonlinear model; draw residual plots; and compute the correlation.
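The rules above fix the expected profit per roll, which can be worked out exactly and compared with a simulation. The sketch below takes the player's point of view (an assumption, since the description does not say whose profit is recorded); the function name simulate_cumulative_profit, the seed, and the number of trials are illustrative choices.

```python
import random
from fractions import Fraction

PRIZE = 4          # dollars won when the die shows a 4
DONATION = 1       # dollars donated on any other face
WINNING_FACE = 4

# Exact expected profit per roll for the player: (1/6)(4) + (5/6)(-1).
expected_per_roll = Fraction(1, 6) * PRIZE - Fraction(5, 6) * DONATION


def simulate_cumulative_profit(num_trials, seed=0):
    """Return the player's running profit after each simulated roll of a fair die."""
    rng = random.Random(seed)
    total = 0
    history = []
    for _ in range(num_trials):
        roll = rng.randint(1, 6)
        total += PRIZE if roll == WINNING_FACE else -DONATION
        history.append(total)
    return history


history = simulate_cumulative_profit(600)
print(f"expected profit per roll: {expected_per_roll} dollars")  # prints -1/6
print(f"simulated average per roll: {history[-1] / len(history):.3f} dollars")
```

Over many trials the cumulative profit tends to drift along a line with slope near -1/6 dollar per roll, which gives a natural benchmark when comparing a linear model with a nonlinear one.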

 

78. Tree Age 

Description: Diameter (cm) and age (years) of trees.

Source: Samuels, Myra L., and Jeffrey A. Witmer. Statistics for the Life Sciences, 3rd ed., 2003, pp. 575–576. Their source: Chambers, Jeffrey Q., Niro Higuchi, and Joshua P. Schimel. “Ancient Trees in Amazonia,” Nature 391 (1998): 135–136.

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the meaning of the slope and y-intercept of the regression line; compute the correlation; draw residual plots; and determine the effect of outliers and influential points on the regression line and the correlation.

 

79. U.S. Census 

Description: These data show the United States population (in millions) at various times (in years since 1900).

Uses: Compare a linear model and a nonlinear model; draw residual plots; compute correlation; and make predictions using different models.

 

80. U.S. Public Debt 

Description: These data give the cumulative public debt (in billions of dollars) of the United States government at five-year intervals since 1970.

Source: U.S. Department of the Treasury, The Public Debt Online

Uses: Find a nonlinear model to summarize the relationship between two variables; draw residuals and residual plots; compute the correlation; and make predictions.

 

81. Voters in U.S. Elections 

Description: Number of votes cast in a sample of U.S. Presidential elections between 1840 and 2008.

Source: en.wikipedia.org

Uses: Find a nonlinear model to summarize the relationship between two variables; draw residuals and residual plots; compute the correlation; and make predictions.

 

82. Women’s 100-meter Run 

Description: Women began running 100-meter Olympic races in 1928. These data contain the winning times (in seconds) for each of the years through 2008.

Source: The World Almanac and Book of Facts 2001. Mahwah, N.J.: World Almanac Education Group, Inc., 2001; www.olympics.com

Uses: Find a regression line to summarize the linear relationship between two variables; interpret the slope and y-intercept of the regression line; draw a movable line; introduce the concepts of residuals and residual plots; and examine trends over time.

 

83. World Population 

Description: These data give the world population (in millions) for various years since 1650.

Uses: Find a nonlinear model to summarize the relationship between two variables; draw residuals and residual plots; compute the correlation; and make predictions.

 


Your feedback is important! Comments or concerns regarding the content of this page may be sent to nctm@nctm.org. Thank you.