Until recently, much of the research on students’ understanding of statistical ideas has focused on their conceptions of center (e.g., Mokros and Russell 1995) or on what constitutes a good sample (e.g., Jacobs 1999). However, in the past decade research on students’ statistical thinking has broadened, and increasing attention has been paid to how students think about variability in data or variability in distributions of data. Within the field of statistics, variability arises everywhere. Data vary, samples vary, and distributions vary. Furthermore, variation occurs both within samples and distributions and across them. A large part of statistical analysis involves parsing out the relative contributions and locations of sources of variation in data, in samples, and in distributions.
Among the questions that researchers have investigated in considering student thinking about variability are:
- Do students acknowledge variability? If so, how do they describe it or talk about it?
- Do they recognize potential sources of variability?
- Are students’ conceptions of variability influenced by context? What are the different ways that students conceptualize variability?
- Will they attempt to control aspects of an experiment to minimize variability?
- Are developmental trajectories found in students’ understanding of variability?
A number of studies have begun to provide evidence for developmental trajectories in students’ thinking about variability. Reading and Shaughnessy (2004) used the results from students’ work on several statistical tasks to hypothesize both a description hierarchy and a causation hierarchy for students’ thinking about variability. In these hierarchies lower level student responses might be concerned only with outliers or only with middles, whereas higher level responses might begin to mention both middles and extremes in data. At an even higher level a student response might discuss the connections between middles in data and the variability of data dispersed around a middle, or might even point out deviations of data from some fixed value, such as the mean or median. A closer look at some students’ responses on several research tasks will help further clarify the development of students’ thinking about variability.
Variability in Data Collected over Time
Data sets tell stories, and the heart of any statistical story is usually contained in the variability in the data. When analyzing data, the role of a student or a statistician is to be a “data detective,” to uncover the stories that are hidden in the data. A data detective looks for important signals in the variability. Such signals are particularly evident in data that are collected over time, such as the data on annual consumption of fruit juice collected by the U.S. Department of Agriculture, depicted in figure 1. When asked what the overall pattern in the data was, and why the data might be varying, middle and secondary students gave responses that ranged from having no idea: “This is just crazy, I have no idea why it would do that, it looks totally random!” to reasons that had some substance but did not apply to per capita data: “It’s going up because the population is growing, and so we’re drinking more,” to intricate analyses and detail: “The general trend is a line upward. People are drinking more fruit juice per person. Maybe they are drinking less milk, so more fruit juice. Also, there are years when things go way up or down—maybe there were new products introduced in the early 1980s to make it go up, or a bad frost around 1990 to make it go down.” These excerpts show a spectrum of student reasoning on this task that ranges from no recognition of potential sources of variation in the data (treating it as merely random variation), to a naïve conception of a cause for an increasing trend (population growth), to a rather deep analysis of why the per capita increase may be occurring, with some potentially viable reasons for the ups and downs in consumption over time (Shaughnessy 2007). We need to provide more opportunities for students to be “data detectives” to give them a chance to develop and share their reasoning about sources of variation in data over time (Shaughnessy and Pfannkuch 2002).
Variability in Data Obtained from Repeated Sampling
Researchers have begun to build a conceptual model to describe the progression of students’ reasoning about variation on repeated sampling tasks, similar to the candy-sampling task below. In such sampling tasks, researchers have been interested in what students will predict for the outcomes of an empirical sampling distribution when repeated samples are drawn from either a known or unknown “population.”
Candy Sampling Task. Custom Candies assembles Ten Sweets Bags for the holiday season. Their machines fill bags with 10 candies by scooping up exactly 10 each time from a giant mixture that is constantly adjusted to have 7000 red (70% red) and 3000 green (30% green). What would you predict for the number of reds that would occur in 1 bag of 10 candies? What would you predict for the numbers of reds that would occur in a collection of 50 bags of 10 candies each?
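To make the second question in the task concrete, the repeated sampling can be simulated. The following is a minimal sketch in Python and is not part of the original study materials. It assumes that each scoop is a simple random sample of 10 candies drawn without replacement from the full mixture of 7000 red and 3000 green; the function names `reds_in_one_bag` and `simulate_bags` are hypothetical.

```python
import random
from collections import Counter

RED, GREEN = 7000, 3000   # mixture stated in the task
BAG_SIZE = 10             # candies per bag
NUM_BAGS = 50             # bags in the collection


def reds_in_one_bag(rng):
    """Scoop BAG_SIZE candies without replacement and count the reds."""
    # Assumption: candies numbered 0..RED-1 are red, the rest are green.
    scoop = rng.sample(range(RED + GREEN), BAG_SIZE)
    return sum(1 for candy in scoop if candy < RED)


def simulate_bags(num_bags=NUM_BAGS, seed=None):
    """Return the number of reds in each of num_bags bags."""
    rng = random.Random(seed)
    # The mixture is "constantly adjusted", so every bag is drawn
    # from the same 7000/3000 mixture.
    return [reds_in_one_bag(rng) for _ in range(num_bags)]


if __name__ == "__main__":
    counts = simulate_bags(seed=1)
    tally = Counter(counts)
    # Text dot plot: one star per bag with that many reds.
    for reds in range(BAG_SIZE + 1):
        print(f"{reds:2d} reds: {'*' * tally[reds]}")
```

Running the sketch with different seeds gives different collections of 50 bags, which is itself an instance of the sample-to-sample variability that the task asks students to anticipate.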
Students’ reasoning on such repeated sampling tasks indicates a progression from ikonic, to additive, to proportional, and finally to distributional types of reasoning. According to Kelly and Watson (2002), some students, particularly younger students, reason ikonically: they refer to physical circumstances or personal stories when they predict sample outcomes. Ikonic reasoners say such things as, “They might get more reds because their hand could find them,” or “Maybe they are lucky and will get all reds,” without any reference to the actual contents or the proportion of colors in the mixture. Both Watson and her colleagues and Shaughnessy and his colleagues have found numerous examples of the other types of student reasoning about variability in repeated sampling contexts (Watson and Kelly 2002; Shaughnessy, Ciancetta, and Canada 2004). In research with grade 6–12 students, Shaughnessy and his colleagues (2004) found that additive reasoners predicted that a lot of red candies would be in the bags of 10 candies “because there are more reds in the mixture.” Additive reasoners never referred to the actual percent or proportion of reds, only to the fact that “there are more”; theirs is thus a frequency-based argument. In contrast, proportional reasoners tend to predict “around 7 reds in 10 candies” for this sampling problem, defending their predictions with such statements as, “There are 70% red,” or “I’d expect 7 red out of the 10 candies, because that is the ratio in the mixture.” Proportional reasoners explicitly discuss the connections between sample proportions and population proportions. Distributional reasoners go one step further and combine both centers and spreads in their reasoning about such sampling problems. They make comparisons between the sample proportions and the population proportion, and they also explicitly mention variation about the expected value.
In the candy sampling problem above, distributional reasoners say such things as, “I’d expect a range of numbers of reds in the bags, clustering around 7 reds. Some bags will have a little less, say 5 or 6 reds; others will have 8 or 9 or even 10.”
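The distributional prediction can be compared with a simple probability model. The sketch below is an illustration added here, not an analysis from the cited studies. It assumes each bag is a simple random sample drawn without replacement from the mixture of 7000 red and 3000 green (a hypergeometric model); the name `prob_reds` is hypothetical.

```python
from math import comb

RED, GREEN, BAG_SIZE = 7000, 3000, 10  # values stated in the task


def prob_reds(k):
    """Probability that a bag of BAG_SIZE candies holds exactly k reds."""
    # Ways to pick k reds and (BAG_SIZE - k) greens, divided by
    # the number of ways to pick any BAG_SIZE candies from the mixture.
    total = comb(RED + GREEN, BAG_SIZE)
    return comb(RED, k) * comb(GREEN, BAG_SIZE - k) / total


# Model distribution of the number of reds in one bag.
distribution = {k: prob_reds(k) for k in range(BAG_SIZE + 1)}

# Center of the distribution: the expected number of reds per bag.
expected_reds = sum(k * p for k, p in distribution.items())

if __name__ == "__main__":
    for k, p in distribution.items():
        print(f"{k:2d} reds: {p:.3f}")
    print(f"expected reds per bag: {expected_reds:.2f}")
```

Under this assumed model the expected number of reds is 7, 7 is the single most likely count, and bags with 5 through 9 reds together account for roughly nine bags in ten. That is consistent with a prediction that mentions both a center and a spread around it.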
Variability (variation in data, in samples, and in distributions of data) is a fundamental idea in statistics and in decision making from data. When students start to attend to variability, they may also begin to notice shapes of various distributions (e.g., flat, humped, double humped, and so forth). However, the research shows that attention to just one of the aspects of a distribution (e.g., center, shape, or spread) does not necessarily guarantee that students will pay attention to the other aspects as they investigate data. Students need some time to be able to integrate the various aspects of a distribution (center, shape, and spread) into a coherent whole as they grow as “data detectives.” The research suggests that students’ understanding of variability can develop over time if they are given consistent opportunities to explore data and if they are explicitly asked to attend to variability. We hear a very clear message from the research on students’ understanding of variability: In the world of statistics and data analysis, there is more to life than centers!
This research brief is based on J. Michael Shaughnessy’s chapter “Research on Statistics Learning and Reasoning” in Second Handbook of Research on Mathematics Teaching and Learning, edited by Frank K. Lester Jr.
By Michael Shaughnessy
Judith Reed, Series Editor
References
Jacobs, Victoria R. “How Do Students Think about Statistical Sampling before Instruction?” Mathematics Teaching in the Middle School 5 (December 1999): 240–46, 263.
Kelly, Ben A., and Jane M. Watson. “Variation in a Chance Sampling Setting: The Lollies Task.” In Mathematics Education in the South Pacific: Proceedings of the 25th Annual Conference of the Mathematics Education Research Group of Australasia, Auckland, Vol. 2, edited by Bill Barton, Kathryn C. Irvin, Maxine Pfannkuch, and Michael O. J. Thomas, pp. 366–73. Sydney, Australia: MERGA, 2002.
Mokros, Jan, and Susan Jo Russell. “Children’s Concepts of Average and Representativeness.” Journal for Research in Mathematics Education 26 (January 1995): 20–39.
Reading, Chris, and J. Michael Shaughnessy. “Reasoning about Variation.” In The Challenge of Developing Statistical Literacy, Reasoning and Thinking, edited by Dani Ben-Zvi and Joan Garfield, pp. 201–26. Dordrecht, The Netherlands: Kluwer Academic Publishers, 2004.
Shaughnessy, J. Michael. “Research on Statistics Learning and Reasoning.” In Second Handbook of Research on Mathematics Teaching and Learning, edited by Frank K. Lester Jr., pp. 957–1009. Reston, Va.: National Council of Teachers of Mathematics, 2007.
Shaughnessy, J. Michael, Matt Ciancetta, and Dan Canada. “Types of Student Reasoning on Sampling Tasks.” In Proceedings of the 28th Meeting of the International Group for Psychology and Mathematics Education, Vol. 4, edited by Marit Johnsen Høines and Anne Berit Fuglestad, pp. 177–84. Bergen, Norway: Bergen University College Press, 2004.
Shaughnessy, J. Michael, and Maxine Pfannkuch. “How Faithful Is Old Faithful? Statistical Thinking: A Story of Variation and Prediction.” Mathematics Teacher 95 (April 2002): 252–59.
Watson, Jane M., and Ben A. Kelly. “Can Grade 3 Students Learn about Variation?” In Proceedings of the Sixth International Conference on Teaching Statistics: Developing a Statistically Literate Society, Cape Town, South Africa, edited by Brian Phillips. CD-ROM. Voorburg, The Netherlands: International Statistical Institute, 2002.
Watson, Jane M., and Jonathan B. Moritz. “Developing Concepts of Sampling.” Journal for Research in Mathematics Education 31 (January 2000): 44–70.