Gene V Glass Archives: Meta-Analysis of Bilingual Education Research

Research on Effectiveness of Bilingual Education (in the U

2007

The Big Picture in Bilingual Education:

A Meta-analysis Corrected for Gersten’s Coding Error

Kellie Rolstad, Arizona State University

Kate Mahoney, State University of New York at Fredonia

Gene V Glass, Arizona State University

Language minority education is at a peculiar point in its history. Within the last few years, clarity and consensus regarding the effectiveness of bilingual instruction have emerged in the scientific literature, while the political environment has become more hostile than at any time since the passage of the Bilingual Education Act forty years ago.

In Rolstad, Mahoney and Glass (2005) (hereafter, RMG), we reviewed narrative summaries and meta-analyses examining the effectiveness of bilingual education. The meta-analyses and more recent narrative summaries favored the conclusion that bilingual education is an effective approach to raising academic achievement for English Language Learners (ELLs), a conclusion also consistent with the work by the National Literacy Panel (August & Shanahan, 2006) completed after RMG. A puzzling source of data for us was Gersten (1985), which was an outlier in our analysis. In the present paper, we note recent revelations in Rossell and Kuder (2005) that Gersten miscoded program descriptions in his study, and we produce a new meta-analysis corrected for the coding error.¹

Rolstad, Mahoney and Glass’s (2005) Meta-analysis

RMG used a corpus of 17 studies which were conducted in the years following Willig’s (1985) meta-analysis. Unlike previous studies, RMG provided comparisons not only for Transitional Bilingual Education (TBE) and English-only approaches, programs in which English acquisition is the primary goal, but also for Developmental Bilingual Education (DBE), programs that promote the development and maintenance of the first language as well as English. Furthermore, RMG included as many studies as possible in the meta-analysis, without applying selection criteria bearing on study quality, as intended by the original developers of the method (Glass, 1976; Glass, McGaw, & Smith, 1981).

As an additional methodological contribution, RMG coded program models according to the descriptions provided in the studies rather than the labels themselves, because many studies used program labels that schools had adopted but that did not fit conventional definitions. RMG coded programs whose descriptions were more aligned with the conventional definition of TBE as TBE, those more aligned with the conventional definition of DBE as DBE, and those more aligned with the conventional definition of an English-only program as EO. See Crawford (2004) for conventional definitions and discussion of program models.

RMG showed that TBE was consistently superior to all-English approaches, and that DBE programs were superior to TBE programs. In an analysis controlling for ELL status, RMG found a positive effect for bilingual education of 0.23 standard deviations, with outcome measures in the native language showing a positive effect of 0.86 standard deviations. Note that in Table 1 (originally published in RMG) Gersten’s three average effect sizes contributed negatively to the meta-analysis. More specifically, counting individual effect sizes, Gersten (1985) contributed 3 negative effect sizes; Gersten, Woodward, and Schneider (1992) contributed 10 negative and 2 positive effect sizes; and Gersten and Woodward (1995) contributed 11 negative effect sizes. For further details, please see RMG.
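For readers less familiar with the metric, an effect size expressed in standard deviations is a standardized mean difference. The sketch below is illustrative only. It assumes Glass's Δ (treatment mean minus comparison mean, divided by the comparison group's standard deviation); the function name and the scores are hypothetical, and it is not a reproduction of the computations in RMG.

```python
from statistics import mean, stdev

def glass_delta(treatment_scores, comparison_scores):
    """Standardized mean difference scaled by the comparison group's SD."""
    difference = mean(treatment_scores) - mean(comparison_scores)
    return difference / stdev(comparison_scores)

# Hypothetical test scores, NOT data from any study in the corpus.
bilingual = [52, 55, 61, 58, 64]
english_only = [50, 54, 57, 53, 61]

delta = glass_delta(bilingual, english_only)
print(f"effect size = {delta:.2f} SD")
```

A positive value favors the group entered as the treatment (here, the bilingual group); a negative value favors the comparison group.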

[Insert Table 1 about here]

Gersten’s Coding Error

Gersten (1985) had reported that a larger percentage of children enrolled in a structured immersion program (75%) scored at or above grade level on standardized tests than children in a bilingual program (19%) at the end of second grade. Gersten (1985) does not describe his comparison group apart from labeling it “the district’s bilingual program.” However, because Gersten has written extensively on bilingual education, consistently expressing a preference for direct instruction in Structured Immersion (SI) over bilingual methods, we included the 1985 study in our analysis along with two other Gersten contributions, even though it lacked an actual definition or description of the bilingual education program. In RMG, we coded Gersten’s SI program as EO and the program Gersten called TBE as TBE.
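Percentages such as these can be placed on the same standard-deviation scale as other outcomes. The sketch below shows one common approach, a probit-style transformation that takes the difference between the normal deviates of the two proportions. It is an assumption for illustration and is not presented as the procedure used in RMG; the function name is hypothetical.

```python
from statistics import NormalDist

def probit_effect_size(p_treatment, p_comparison):
    """Difference between the normal deviates (z-scores) of two proportions."""
    z = NormalDist().inv_cdf
    return z(p_treatment) - z(p_comparison)

# Proportions at or above grade level reported by Gersten (1985).
p_si = 0.75         # structured immersion
p_bilingual = 0.19  # "the district's bilingual program"

# Under the original labels: bilingual program as treatment, SI as comparison.
es_original_labels = probit_effect_size(p_bilingual, p_si)
print(f"{es_original_labels:.2f}")
```

Under the original labels the difference is negative, that is, it favors the program labeled SI.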

However, Rossell and Kuder (2005) recently reported a personal communication with Gersten revealing that Gersten “now agrees that the district undoubtedly mislabeled their ESL program as a bilingual program” (footnote 7, page 18), and that the comparison was not between EO and TBE, as Gersten originally stated, but rather between SI and ESL Pullout.

Gersten’s three articles contributed 26 of the 67 individual effect sizes (39% of the sample) and therefore had a substantial influence on the mean effect size. We now have a better understanding of why one of the studies, Gersten (1985), differed so dramatically from the others in the meta-analysis: rather than comparing TBE with SI, it compared two varieties of English-only programs, namely, ESL Pullout and SI. Gersten’s (1985) description of the SI program in his study relies on the general characteristics of SI outlined in Baker and de Kanter (1983).
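The share attributed to the Gersten studies follows directly from the counts of individual effect sizes reported earlier. A minimal check of that arithmetic (the variable names are ours, chosen for illustration):

```python
# Individual effect sizes contributed by each Gersten study, as reported in the text.
gersten_counts = {
    "Gersten (1985)": 3,
    "Gersten, Woodward, & Schneider (1992)": 10 + 2,  # 10 negative, 2 positive
    "Gersten & Woodward (1995)": 11,
}
total_effect_sizes = 67  # all individual effect sizes in the RMG sample

gersten_total = sum(gersten_counts.values())
share = gersten_total / total_effect_sizes
print(f"{gersten_total} of {total_effect_sizes} = {share:.0%}")
```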

The key to a structured immersion is that all academic instruction takes place in English, but at a level understood by the students (Baker & de Kanter, 1983). At the same time, there are always bilingual instructors in the class who understand the children's native language and translate problematic words into the native language, answer questions phrased in the native language, help the children understand classroom routines, show them the bathrooms, lunchroom, and playground, and so forth (p. 189).

Gersten (1985) appears to define SI as used in the study, then, as involving bilingual teachers who provide minimal help in the native language. While no details are provided regarding the ESL Pullout program, such programs generally do not provide native language support of any kind (Crawford, 2004). Therefore, following the coding convention established in RMG, we regard Gersten’s SI as more aligned with TBE, since it appears to have provided a modicum of native language support, and we take what Gersten has now revealed to have been ESL Pullout as a variety of EO.

A Recalculated Meta-analysis Corrected for Gersten’s Coding Error

Recalculating the meta-analysis with the corrected coding for Gersten (1985), following these conventions, yields the results in Table 2. The mean effect size for all outcome measures increases from 0.08 to 0.19, Reading (in English) increases from -0.06 to 0.14, Math (in English) increases from 0.08 to 0.17, and the mean for all TBE studies increases from -0.01 to 0.10. The revelation of a coding error in Gersten (1985), together with the inconsistency of all three of Gersten’s studies with the rest of the work we reviewed, increases our confidence that the “investigator effect” noted in RMG may justify removing all three of the Gersten studies. As shown in Table 2, removing Gersten’s studies yields an effect size for TBE of 0.17, nearly as high as that for DBE. Because numerous factors other than language proficiency are known to contribute to lower academic achievement among ELLs (August & Hakuta, 1996; August & Shanahan, 2006), we argued in RMG that the most informative result is the effect size reported for studies involving ELLs in both treatment and control groups; as shown in Table 2, the average effect size for TBE in these studies is 0.23, favoring bilingual approaches.
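The mechanics of such a correction can be sketched as follows. The records and effect sizes below are invented for illustration and do not come from Table 1 or Table 2. The sketch also assumes that swapping the treatment and comparison labels changes only the sign of an effect size, which holds when the difference is scaled by a pooled standard deviation.

```python
from statistics import mean

# Hypothetical records; these effect sizes are invented for illustration only.
effect_sizes = [
    {"study": "Study A", "es": 0.30},
    {"study": "Study B", "es": 0.10},
    {"study": "Miscoded study", "es": -0.40},
    {"study": "Miscoded study", "es": -0.20},
]

def recode(records, miscoded_study):
    """Flip the sign of every effect size from a study whose groups were swapped."""
    return [
        {**r, "es": -r["es"]} if r["study"] == miscoded_study else r
        for r in records
    ]

def mean_es(records):
    """Unweighted mean of the individual effect sizes."""
    return mean(r["es"] for r in records)

corrected = recode(effect_sizes, "Miscoded study")
without_study = [r for r in effect_sizes if r["study"] != "Miscoded study"]

print(f"original mean:  {mean_es(effect_sizes):.2f}")
print(f"corrected mean: {mean_es(corrected):.2f}")
print(f"study removed:  {mean_es(without_study):.2f}")
```

The three printed means correspond to the three kinds of figures discussed above: the original coding, the corrected coding, and the analysis with the study removed.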

[Insert Table 2 about here]

Conclusions

Meta-analysis is a useful tool for clarifying variation among studies reporting divergent findings. The original RMG analysis discovered curious effects associated with the Gersten studies, which behaved as outliers in the analysis. The coding error recently reported by Rossell and Kuder (2005) confirmed our suspicion, at least for Gersten (1985), that the results were incorrect. The new analysis reported in Table 2 strengthens the conclusions previously reached in RMG supporting TBE over English-only approaches, and DBE over TBE.

References

August, D. & Hakuta, K. (1998). Improving Schooling for Language-Minority Children: A Research Agenda. Washington, DC: National Academy Press.

August, D., & Shanahan, T. (Eds.) (2006). Developing literacy in second-language learners: Report of the National Literacy Panel on Language-Minority Children and Youth. Mahwah, NJ: Lawrence Erlbaum.

Baker, K., & de Kanter, A. A. (1981). Effectiveness of Bilingual Education: A Review of the Literature. Final Draft Report. Washington, DC: Department of Education Office of Planning, Budget, and Evaluation.

Burnham-Massey, M. (1990). Effects of bilingual instruction on English academic achievement of LEP students. Reading Improvement, 27, 129-32.

Carlisle, R. S. (1989). The writing of Anglo and Hispanic elementary school students in bilingual, submersion, and regular programs. Studies in Second Language Acquisition, 11, 257-280.

Carter, T. P., & Chatfield, M. L. (1986). Effective bilingual schools: Implications for policy and practice. American Journal of Education, 95(1), 200-32.

Crawford, J. (2004). Educating English Learners: Language Diversity in the Classroom. Los Angeles, CA: Bilingual Educational Services, Inc.

de la Garza, J. & Medina, M. (1985). Academic achievement as influenced by bilingual instruction for Spanish-dominant Mexican American children. Hispanic Journal of Behavioral Sciences, 7(3), 247-59.

Gersten, R. (1985). Structured immersion for language minority students: Results of a longitudinal evaluation. Educational Evaluation and Policy Analysis, 7, 187-196.

Gersten, R., & Woodward, J. (1995). A longitudinal study of transitional and immersion bilingual education programs in one district. Elementary School Journal, 95(3), 223-239.

Gersten, R., Woodward, J., & Schneider, S. (1992). Bilingual immersion: A longitudinal evaluation of the El Paso program. Washington, DC: READ Institute. (ERIC Document Reproduction Service No. ED389162)

Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3-8.

Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills, Calif.: SAGE Publications.

Krashen, S. (2005). A correction -- 20 years later. Stephen Krashen’s Mailing List. Retrieved December 3, 2008 from http://sdkrashen.com/pipermail/krashen_sdkrashen.com/2005-November/000332.html.

Lindholm, K.J. (1991). Theoretical assumptions and empirical evidence for academic achievement in two languages. Hispanic Journal of Behavioral Sciences, 13(1), 3-17.

Medina, M., Jr., & Escamilla, K. (1992). Evaluation of transitional and maintenance bilingual programs. Urban Education, 27(3), 263-90.

Medina, M., Jr., Saldate, M., IV, & Mishra, S. (1985). The sustaining effects of bilingual instruction: A follow-up study. Journal of Instructional Psychology, 12(3), 132-39.

Medrano, M.F. (1986). Evaluating the long-term effects of a bilingual education program: A study of Mexican students. Journal of Educational Equity and Leadership, 6, 129-138.

Medrano, M.F. (1988). The effects of bilingual education on reading and mathematics achievement: A longitudinal case study. Equity and Excellence, 23(4), 17-19.

Ramirez, J.D., Yuen, S.D., Ramey, D.R., Pasta, D.J. & Billings, D. (1990). Final report: Longitudinal study of immersion strategy, early-exit and late-exit transitional bilingual education programs for language-minority children. San Mateo, CA: Aguirre International. (ERIC Document Reproduction Service No. ED330216)

Rolstad, K., Mahoney, K. & Glass, G.V. (2005). The Big Picture: A Meta-Analysis of Program Effectiveness Research on English Language Learners. Educational Policy, 19(4), 572-594.

Rossell, C. (1990). The effectiveness of educational alternatives for limited-English proficient children. In G. Imhoff (Ed.), Learning in Two Languages (pp. 71-121). New Brunswick, NJ: Transatlantic Publishers.

Rotharb, S. and others (1987). Evaluation of the bilingual curriculum content (BCC) pilot project: A three year study. Final report. Miami, FL: Dade County Public Schools. Office of Educational Accountability. (ERIC Document Reproduction Service No. ED300382)

Saldate, M., IV, Mishra, S., & Medina, M., Jr. (1985). Bilingual instruction and academic achievement: A longitudinal study. Journal of Instructional Psychology, 12(1), 24-30.

Slavin, R. E., & Cheung, A. (2003). Effective Reading Programs for English Language Learners: A Best-Evidence Synthesis. Baltimore, MD: Johns Hopkins University Center for Research on the Education of Students Placed At Risk (CRESPAR).

Texas Education Agency. (1988). Bilingual/ESL Education: Program evaluation report. Austin, TX: Texas Education Agency. (ERIC Document Reproduction Service No. ED305821)

Thompson, M. S., DiCerbo, K., Mahoney, K. S., & MacSwan, J. (2002). ¿Éxito en California? A validity critique of language program evaluations and analysis of English learner test scores. Education Policy Analysis Archives, 10(7), entire issue. Available at http://epaa.asu.edu/epaa/v10n7/.

Willig, A. C. (1985). A Meta-analysis of selected studies on the effectiveness of bilingual education. Review of Educational Research, 55(3), 269-318.

Endnote

1. We are indebted to Stephen Krashen for bringing this important fact to our attention (Krashen, 2005).

Tables

Table 1

Comparisons of Effect Size by Study as They Appeared in Rolstad, Mahoney & Glass (2005)

Study	N of ES	Mean ES	SD of ES¹

Burnham-Massey, 1990
Grades 7-8
Range of n's for TBE: 36-115
Range of n's for EO²: 36-115

TBE vs EO²
Reading	3	-0.04	0.07
Mathematics	3	0.24	0.14
Language	3	0.16	0.25


Carlisle, 1989
Grades 4, 6
Range of n's for TBE: 23
Range of n's for EO¹: 19
Range of n's for EO²: 22

TBE vs EO¹
Writing-Rhetorical Effectiveness	1	0.82
Writing-Overall Quality	1	1.38
Writing-Productivity	1	0.60
Writing-Syntactic Maturity	1	1.06
Writing-Error Frequency	1	0.50

TBE vs EO²
Writing-Rhetorical Effectiveness	1	-2.45
Writing-Overall Quality	1	-8.25
Writing-Productivity	1	0.18
Writing-Syntactic Maturity	1	0.24
Writing-Error Frequency	1	1.01


Carter and Chatfield, 1986
Grades 4-6
Range of n's for DBE: 26-33
Range of n's for EO²: 14-47

DBE vs EO²
Reading	3	0.32	0.24
Mathematics	3	-0.27	1.06
Language	3	-0.60	1.54

¹“SD of ES” is the standard deviation of the effect sizes.
