2016
One Hundred Years of Research:
Prudent Aspirations
Gene V Glass
The Oxford English Dictionary (Note 1), whose editors assiduously research the first usage of its entries, lists the origin of the statistical term meta-analysis as Educational Researcher (Vol. 5, No. 10), the house organ of the American Educational Research Association (Figure 1). (Note 2) The method was introduced in the presidential address titled “Primary, Secondary and Meta-Analysis of Research” (Glass, 1976). In 1979, the first meta-analysis of an education issue—the effects of class size on achievement—appeared in the first issue of the new AERA journal, Educational Evaluation and Policy Analysis (Glass & Smith, 1979). Details of the method were fleshed out in a chapter in Volume 5 (1978) of Review of Research in Education (Glass, 1978). Meta-analysis was further developed and illustrated in the 1981 publication of Meta-Analysis in Social Research (Glass, McGaw, & Smith, 1981), and reached its present-day, fully articulated form with the publication of Hedges and Olkin’s (1985) Statistical Methods for Meta-Analysis.
The development of the method was greatly influenced by Meehl’s (1978) powerful critiques of statistical significance testing and weak theorizing in the “soft sciences” (Glass, 2015). For those who do not know the late Professor Meehl, he is said to be one of the very small number of psychologists whom philosophers of science actually read (Scriven, 1980). Meehl went on to write thus about meta-analysis many years later: “I think that meta-analysis is one of the most important methodological contributions of this generation of psychologists, arguably the most important, and have so stated to Professor Glass in our correspondence” (Meehl, 1990, p. 242).
Meta-analysis was not at first received with grand hosannas in all quarters. Some called it “mega-silliness” (Eysenck, 1978). It was criticized for “mixing apples and oranges” (Gallo, 1978) and for being “too lumpy” (Presby, 1978). These criticisms have been addressed almost countless times as they have reappeared over the past 40 years (Glass, 2015). Suffice it to say here that the importance of controlling exogenous variables by randomization in comparative experiments remains an a posteriori question. Andrews, Guitar, and Howie (1980) produced credible and useful findings from a meta-analysis of stuttering treatments with pretest-posttest data only, because an extensive history of research on stuttering ruled out the effects of exogenous influences. The a priori exclusion of data from a research synthesis risks discarding important findings. However, when findings from well-controlled studies differ appreciably from those of poorly controlled studies, one is always advised to heed the former—as when randomized experiments on class size showed effects importantly different from those estimated from naturally occurring differences in class size (Glass, Cahen, Smith, & Filby, 1982).
Meta-analysis has come to be the only widely used empirical research method developed almost exclusively by education researchers. A search with Google Scholar on the term meta-analysis produces more than 2.75 million hits. By contrast, a technique that might arguably be claimed to be an invention of education researchers, criterion-referenced test, produces about 63,000 hits. The search for citations to the presidential address in which meta-analysis was introduced produces 4,400 references to the original Educational Researcher article. Shadish and Lecy (2015) applied bibliometric methods combined with interviews of the principal actors in the creation of meta-analysis to establish its roots in education and the social sciences. In spite of the prevalence of meta-analysis in education research, it is in the area of biomedical research that the method has found its greatest application. The proliferation of empirical research in biomedicine and pharmacology makes the corpus of education research look tiny by comparison. Scientists estimate that more than 1.5 million studies are published annually in more than 23,000 journals. Approximately 7% of these studies, or about 100,000, are human clinical trials addressing questions of evidence-based treatments and outcomes (personal communication, Z. Tran, October 18, 2015). The prevalence of meta-analysis in medicine today has led some to believe that the method was developed originally by epidemiologists or biostatisticians. But the investigations of science writer Morton Hunt (1997) place the genesis of meta-analysis squarely in the education research community.
In the 40 years since the introduction of meta-analysis, not only have research studies in education continued to appear apace, but meta-analyses themselves have continued to proliferate. In 2009, John A. C. Hattie published a widely cited book titled Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. True to its title, the book includes the results of the integration of more than 800 meta-analyses, which themselves involve more than 50,000 studies addressing techniques for improving the achievement of school students. By 2015, Visible Learning had been cited nearly 4,000 times. The contribution of meta-analysis to the furthering of education research can reasonably be assessed by addressing the import of Hattie’s (2009) findings in Visible Learning. Perhaps more importantly, Hattie’s summary carries with it an important lesson more generally about the progress of empirical research in education.
Figure 2 is adapted (Note 3) from Hattie (2009, Figure 2.2, p. 16).
The figure illustrates the average “effect size” (Note 4) (ES) across all meta-analyses for a named intervention. (Commas replace periods to indicate decimals, as is established practice outside the United States.) An example of the interpretation of an effect size will clarify the information reported in Figure 2. Hattie reports that the average effect size for “preschool programs” across all the available meta-analyses is 0.45 standard deviations. From this we conclude that the average child who participated in a preschool program scored about one-half standard deviation above the average non-participating child on an achievement test. This 0.45 standard deviation average benefit places the average preschool student at roughly the 67th percentile of the non-participating group, assuming a normal distribution in each group and a hypothetical test of achievement.
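To make the arithmetic behind that percentile concrete, here is a minimal sketch in Python (the article itself contains no code); the only input is the 0.45 effect size quoted above, and the normality assumption is the one already stated.

```python
from statistics import NormalDist

es = 0.45  # average effect size Hattie reports for preschool programs

# Percentile rank of the average treated child within the control
# distribution, assuming both groups are normally distributed.
percentile = NormalDist().cdf(es) * 100
print(f"average preschool student at about the {percentile:.0f}th percentile")  # ~67th
```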
Criticisms are frequently leveled against meta-analysis that, continuing with the present example, (a) we don’t know what the non-participating students in the control group were actually exposed to, (b) one cannot average across different types of tests, (c) one cannot assume normal distributions in either group, let alone both, and so forth. These criticisms have been addressed many times and found to be of little merit (Cooper, Hedges, & Valentine, 2009; Glass et al., 1981; Hedges & Olkin, 1985). Suffice it to say that the conditions behind these criticisms are inherent in all empirical education research. “Control” or “traditional” comparison groups are usually only vaguely described. The interventions being investigated are frequently specified only verbally and quite inadequately for anyone who would choose to follow suit. Averaging over different measures of achievement is only one order of magnitude more general than averaging over separate items on a test, where those items can measure different things for different examinees. So, I am arguing that Hattie’s (2009) summary of meta-analyses should be taken at face value for what general leads it can provide. It would be hard to argue, based on Hattie’s summary, that summer school programs could on average be expected to outperform Reciprocal Teaching programs on measures of student achievement. However, Hattie’s findings and those of many other meta-analyses provide a deeper lesson about how education research might guide practice.
Average effect sizes, in one meta-analysis or across many, are part of the story. But the variability of effect sizes from study to study tells a different story. It is common—indeed, almost the rule—that a collection of studies will show wide variability in effect sizes and a modest average effect. In other words, 25 comparative experiments of computer-assisted instruction (CAI) versus teacher-led instruction (TLI) might show an average effect size of 0.37, but those effect sizes are likely to range from modestly negative to moderately positive. The only interpretation allowed is that CAI might confer a modest benefit over TLI in general; in fact, however, the comparison can vary from TLI being somewhat superior to CAI to CAI being rather impressively superior to TLI. What one can expect from the comparison of CAI and TLI depends heavily on circumstances not specified, even in a meta-analysis of dozens of experiments. Rolstad, Mahoney, and Glass (2008) calculated 30 effect sizes comparing developmental bilingual education (DBE) with monolingual English education (MLE). The 30 values of ES showed an average of 0.18 and a standard deviation of 0.86. Note that this standard deviation is not the standard deviation among students taught in a particular way; it is the standard deviation among effect sizes resulting from 30 different experiments. Assuming normality in the distribution of effect sizes across experiments, one would estimate that 58% of the comparisons of DBE with MLE would show DBE superior and 42% would show MLE superior. In this author’s experience, such findings are not infrequent.
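The same normal model turns the mean and standard deviation of the 30 effect sizes into the expected share of experiments favoring each treatment; a brief Python sketch using only the figures quoted above:

```python
from statistics import NormalDist

# Distribution of the 30 effect sizes, DBE vs. MLE (Rolstad, Mahoney, & Glass, 2008)
effect_dist = NormalDist(mu=0.18, sigma=0.86)

# Share of experiments expected to favor each treatment under normality
p_dbe = 1 - effect_dist.cdf(0.0)   # effect sizes above zero favor DBE
p_mle = effect_dist.cdf(0.0)       # effect sizes below zero favor MLE
print(f"DBE superior in about {p_dbe:.0%} of experiments")  # ~58%
print(f"MLE superior in about {p_mle:.0%} of experiments")  # ~42%
```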
One of the most important lessons that meta-analysis has taught us is that the average impact of an intervention is typically small relative to the variability of its impacts across studies. Chemists and engineers use a measure of the repeatability of assays, for quality assurance, defined as the coefficient of variation: CV = Standard Deviation/Mean. A high value of CV indicates that the phenomenon being assessed cannot be relied on to yield consistent results. Such is the case with most education interventions. This fact stands in contrast to the findings of most meta-analyses in medicine, for example, where multiple clinical trials tend to produce consistent findings (Higgins & Thompson, 2002; for a dissenting view, see Ioannidis, 2010). Medical and pharmacological research enjoys many advantages over education research, not the least of which is greater control through random assignment to treatment groups.
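A short sketch of that calculation; the bilingual-education figures are the ones quoted above, while the “clinical trial” figures are hypothetical, included only to show what a consistent literature would look like:

```python
# Coefficient of variation of effect sizes: SD divided by mean.
# Large values mean the findings of a literature vary greatly
# relative to their average effect.
def coefficient_of_variation(mean: float, sd: float) -> float:
    return sd / mean

# Bilingual-education literature quoted above: highly inconsistent
print(coefficient_of_variation(0.18, 0.86))  # ~4.8

# Hypothetical, tightly agreeing clinical-trial literature, for contrast
print(coefficient_of_variation(0.40, 0.10))  # 0.25
```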
Michael Scriven argued in his AERA presidential address in 1980 that education research by its nature will not establish reliable generalizations:

“Now take the problem of controlling undisciplined behavior in a classroom, or vandalism in a school. Here the solutions are time-bound; they’ll work this month, maybe, but not necessarily next year when the street-smarts catch up, the population changes, the present media role-models are replaced by others. And even this month, they quite possibly won’t work in a neighboring district, even in another school. . . . We can’t specify—even in black box or macro-language—what makes the difference: no statistical generalizations at all hold up across situations. This is a strongly stochastic system. In a weakly stochastic system, some do. To modify an example of Gene Glass’, from his perspective of a dozen meta-analyses of complete research literatures—we may find that 2/3 of the variance is unexplained, which means that categorical predictions in a new situation are mostly guesswork. In a strongly stochastic system, they are entirely guesswork.” (p. 17)
Meta-analysis has not lived up to its promise to produce incontrovertible facts that would guide education policy. What it has done is demonstrate that the average impacts of interventions are relatively small and the variability of impacts is great. The attempts to explain the large variability in study findings are generally post hoc and idiographic, and as such, they do not contribute to anything approaching a science of education. The features of context that mediate success are yet to be understood.
What 40 years of meta-analysis have taught us may have been foreshadowed by one-time AERA president Lee J. Cronbach (1975) in his revisiting of his classic American Psychological Association presidential address—excerpted here from the heavily gendered original:

“Too narrow an identification with science, however, has fixed our eyes upon an inappropriate goal. The goal of our work, I have argued here, is not to amass generalizations atop which a theoretical tower can someday be erected. . . . The special task of the social scientist in each generation is to pin down the contemporary facts. Beyond that, he shares with the humanistic scholar and the artist in the effort to gain insight into contemporary relationships, and to realign the culture’s view of man with present realities. To know man as he is is no mean aspiration.” (p. 126)
Here stands education research, then, on the 100th anniversary of its creation: disabused of its early aspirations to become a science, chastened by decades of unfulfilled promises, willing, if not eager, to entertain a broad spectrum of thought and experience toward the end of improving the education of children.
Notes
With respect to Lee J. Cronbach (1982), and with thanks to David C. Berliner for help with an early draft.
1. www.oed.com. The statistical method “meta-analysis” is perhaps unique as a contribution to empirical inquiry of many types because it arose entirely within the practice of education research. In spite of its origins, meta-analysis has found its widest application and most important contributions in the field of medicine. Contrasting the success of meta-analysis in medicine and education reveals interesting lessons for the future of education research. The findings of research studies in education are highly variable. Context matters in very significant ways when studying education phenomena, and decades of research have produced little understanding of these contextual influences. The findings of education studies remain varied in unaccountable ways.
2. The Oxford English Dictionary lists a first meaning of meta-analysis as “Analysis of the grounds and assumptions on which a theory, explanation, or account is based.” The term is infrequently used in philosophy.
3. Hattie’s (2009) Figure 2.2 displays the average effect sizes for meta-analyses of 138 influences and interventions. A group of only 19 interventions is shown here for illustrative purposes. Adapted from the visible learning diagram at www.visiblelearning.org, based on Hattie (2009).
4. An “effect size” (ES) for a comparative experiment that contrasts Intervention A against Control Group B takes the form ES = (Mean-A – Mean-B)/Standard Deviation-B, where Mean-A is the average score of students treated by Intervention A, Mean-B is the corresponding average for the control group, and Standard Deviation-B is the standard deviation among the students in the control group. Precisely what the “control group” is, and the alternatives for measuring variability among students, are the subject of many methodological treatments. A typical interpretation of an effect size goes like so: ES = 1.0 indicates that the average student under the intervention scores above 84% of the students in the control condition on the outcome test.
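A minimal sketch of the computation this note describes; the group means and standard deviation below are invented for illustration only:

```python
from statistics import NormalDist

# Invented summary statistics for one comparative experiment
mean_a = 78.0  # average outcome score under Intervention A
mean_b = 70.0  # average outcome score in Control Group B
sd_b = 8.0     # standard deviation of scores in the control group

es = (mean_a - mean_b) / sd_b  # effect size as defined in this note: 1.0

# Under a normal model, the average treated student scores above this
# fraction of the control-group students.
print(NormalDist().cdf(es))  # ~0.84, i.e., above 84% of controls
```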
References
Andrews, G., Guitar, B., & Howie, P. (1980). Meta-analysis of the effects of stuttering treatment. Journal of Speech and Hearing Disorders, 45(2), 287–307.
Cooper, H., Hedges, L. V., & Valentine, J. C. (Eds.). (2009). The handbook of research synthesis and meta-analysis. New York, NY: Russell Sage Foundation.
Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology. American Psychologist, 30(2), 116–127.
Cronbach, L. J. (1982). Prudent aspirations for social inquiry. In W. Kruskal (Ed.), The state of the social sciences: Fifty years at Chicago (pp. 61–81). Chicago, IL: University of Chicago Press.
Eysenck, H. J. (1978). An exercise in mega-silliness. American Psychologist, 33(5), 517.
Gallo, P. S. (1978). Meta-analysis—A mixed metaphor. American Psychologist, 33(5), 515–517.
Glass, G. V (1976). Primary, secondary and meta-analysis of research. Educational Researcher, 5(10), 3–8.
Glass, G. V (1978). Integrating findings: The meta-analysis of research. Review of Research in Education, 5, 351–379.
Glass, G. V (2015). Meta-analysis at middle age: A personal history. Research Synthesis Methods, 6(3), 221–231.
Glass, G. V, Cahen, L. S., Smith, M. L., & Filby, N. N. (1982). School class size: Research and policy. Beverly Hills, CA: SAGE Publications.
Glass, G. V, McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills, CA: SAGE.
Glass, G. V, & Smith, M. L. (1979). Meta-analysis of research on the relationship of class-size and achievement. Educational Evaluation and Policy Analysis, 1, 2–16.
Hattie, J. A. C. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. London, UK: Routledge.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21, 1539–1558.
Hunt, M. (1997). How science takes stock: The story of meta-analysis. New York, NY: Russell Sage Foundation.
Ioannidis, J. P. A. (2010). Meta-research: The art of getting it wrong. Research Synthesis Methods, 1, 169–184.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834.
Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66(1), 195–244.
Presby, S. (1978). Overly broad categories obscure important differences between therapies. American Psychologist, 33(5), 514–515.
Rolstad, K., Mahoney, K., & Glass, G. V (2008). The big picture in bilingual education: A meta-analysis corrected for Gersten’s coding error. Journal of Educational Research & Policy Studies, 8(2), 1–15.
Scriven, M. (1980). Self-referent research. Educational Researcher, 9(6), 7–30.
Shadish, W. R., & Lecy, J. D. (2015). The meta-analytic big-bang. Research Synthesis Methods, 6(3), 246–264.