This debate over class size raises some important questions. For instance, what is the impact of class size on the educational outcomes of Australian children? How do class size reductions compare with other educational reforms in terms of cost effectiveness? It turns out we know very little about how class size influences student learning in Australia, so in this post I’ll take a look at what some of the better research from the US has to say about the effectiveness of reductions in class size.
The main challenge researchers face in trying to figure out whether small class sizes have a positive impact on educational outcomes is that the children we observe in small classes may not be representative of all children. Recent work by Louise Watson and Chris Ryan suggests that the student-teacher ratio (which, as it turns out, is a different thing to class size) in Independent schools has fallen relative to Government schools over the last 20 years in Australia. They also show that the socio-economic status of Government school students has declined over this period relative to children in Independent schools. If children enrolled at well-funded Independent schools have parents with greater means and/or a greater willingness to invest in their children’s education at home, we may find these children have better educational outcomes for reasons that have nothing to do with class size. The same might be true if the higher socio-economic status of a child’s classmates conferred some positive peer effect on their learning, or if Independent schools had other characteristics (higher-quality teachers, better facilities) that improved child outcomes independent of class size.
This wouldn’t necessarily be a problem if educational researchers had data on all of these influences and were able to use statistical methods to control for them. The trouble is that this usually isn’t the case. The only way we can untangle the causal effect of class size is to rely on policy experiments such as those conducted in the US. To date no such experimental evidence exists for Australia.
How can policy makers learn about the impact of class size on student learning?
One of the most cited studies on the effectiveness of class size is Krueger (1999). Krueger used data from a policy experiment branded Project STAR (the Student/Teacher Achievement Ratio experiment), conducted between 1985 and 1989 in the US state of Tennessee. Project STAR involved the random assignment of 11,600 students in 80 public schools to three different class types.
| Class type | Composition |
|---|---|
| Small classes | 13–17 students per teacher |
| Regular classes | 22–25 students per teacher |
| Regular/aide classes | 22–25 students per teacher and a full-time teacher’s aide |
The experiment did not cover all public schools. A school had to be large enough to accommodate one of each of these class types but eligible schools were spread across urban and regional areas. Children were randomly assigned in Kindergarten (the year before year one in Australia) with the experiment continuing until third grade. Most of the students entered the study in 1985 when they started Kindergarten. However, an additional 2,200 students joined in the 1st grade, since Kindergarten was not mandatory in Tennessee. Students who repeated a grade left the study. Teachers in participating schools were randomly assigned a class type.
STAR wasn’t a perfect experiment. Obviously, running an experiment in the real world is going to be a touch more difficult than running one in a lab. Parents who complained could get their children re-randomised between the two types of regular classes (p. 520), and some children with behavioural problems were re-assigned to small classes. There is also the possibility that students who left the study were not representative of the student population. We might also be concerned that some parents moved their children to private schools upon learning of their child’s initial assignment to a regular class.
That being said, the number of students who transitioned between class types was not overly large (p. 507), and it doesn’t appear that many parents moved their children out of the experiment in response to their child’s initial class assignment. Krueger was able to obtain enrollment forms for 18 of the 80 schools, from which we learn that 10.4% of children assigned to small classes left the experiment, compared to 14.3% of those assigned to regular classes and 12.2% of those assigned to regular classes with an aide (p. 516). The percentage of children in small classes who were held back at some point between Kindergarten and 3rd grade was somewhat lower, at 19.8% compared to 27.4% of children in regular classes (p. 505). Overall, different types of students were evenly spread across the three class types (p. 504).
Student learning was measured using the Stanford Achievement Test (SAT), a test of reading, word recognition and math. To aid comparisons between children in small and regular classes, children in regular classes were assigned their SAT score percentile, while children in small classes were assigned the regular-class percentile that matched their raw SAT score. A summary measure was constructed by taking the average of the reading, word recognition and math percentiles.
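This percentile mapping can be sketched in a few lines. The scores below are made up for illustration; only the method mirrors the paper:

```python
import numpy as np

# Hypothetical raw test scores (illustrative only, not STAR data)
regular_scores = np.array([40, 55, 60, 62, 70, 75, 80, 85, 90, 95])
small_scores = np.array([58, 72, 88])

def percentile_in_regular(raw_score):
    """Percentile a raw score would earn within the regular-class
    score distribution -- the common benchmark for all children."""
    return 100.0 * np.mean(regular_scores <= raw_score)

# Children in regular classes get their own percentile; children in
# small classes get the regular-class percentile matching their score.
print([float(percentile_in_regular(s)) for s in small_scores])  # → [20.0, 50.0, 80.0]
```

Using the regular-class distribution as the yardstick for everyone means a small-class child’s percentile can exceed 100 only in principle, and any shift in small-class percentiles above 50 reflects genuinely higher raw scores.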
Is there a causal effect of class size (in Tennessee)?
Focusing on Krueger’s results that control for student, teacher and school characteristics, the average impacts of small classes and of regular classes with an aide on average SAT percentile scores, relative to regular classes without an aide, in each grade are:
| | Kindergarten | 1st grade | 2nd grade | 3rd grade |
|---|---|---|---|---|
| Regular with aide | 0.31 | 1.78 | 1.58 | −0.75 |
Source: Krueger 1999 – Column 5 of Table V, pp. 512-13.
The above results use observed class type rather than assigned class type, but this appears to make little difference (compare columns 1–4 with columns 5–8). If class type were approximately random, then including student, teacher and class characteristics in the modelling should make little difference to the estimated class-size effect (compare columns 2 and 4, and 5 and 8), and this is what Krueger finds.
As Krueger shows in Table III, there was a little overlap in actual class size among the three class types. The average small class had 15.7 students, compared to 22.7 for regular non-aide classes and 23.4 for aide classes. For this reason he also presents results that model the exact class size each student found themselves in, using initial assignment as an instrument. After adjusting in this way the results are very similar to the above (Table VIII, p. 518).
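The logic of that adjustment treats random assignment as an instrument for actual class size. A minimal Wald-style sketch with simulated data (none of these numbers come from STAR):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# z = 1 if a student was (randomly) assigned to a small class
z = rng.integers(0, 2, n)

# Actual class size: assignment shifts it, but the two groups overlap,
# just as the STAR class types did in practice
size = np.where(z == 1, 15.0, 23.0) + rng.normal(0, 2, n)

# Simulated scores: each extra classmate lowers the percentile by 0.5
score = 50 - 0.5 * size + rng.normal(0, 10, n)

# Wald/IV estimate: ratio of the assignment effect on scores to the
# assignment effect on class size
beta_iv = (score[z == 1].mean() - score[z == 0].mean()) / (
    size[z == 1].mean() - size[z == 0].mean()
)
print(round(beta_iv, 2))  # recovers roughly -0.5
```

Dividing the two assignment contrasts converts the effect of being *assigned* a small class into an effect *per student* of class size, which is the form of adjustment behind Krueger’s Table VIII results.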
Krueger also examines whether the benefits that accrue to smaller class size accumulate over the course of the experiment or are confined to the initial impact of finding oneself in a small class. This is interesting as many students who were allocated to small classes entered school in 1st grade rather than Kindergarten so not all students experienced the same number of years in the class type that they were assigned.
His preferred estimates (Table IX, column 3) put the initial impact of small class assignment at just under a 3-percentile increase relative to a regular class, with an additional 0.65-percentile increase for each year spent in a small class. These results are not, however, statistically significant. For non-stats-geeks, in Krueger’s own words:
“When the same students are tracked over time…students in small classes [gain] about one percentile rank per year relative to students in regular classes…students appear to benefit particularly from attending a small class the first year they attend one, whether that is Kindergarten, first, second, or third grade…” pp. 523-4
Are some students impacted differently?
The numbers above are Krueger’s best attempts to estimate the impact of small class size on the average student. Obviously not all students are average, and it may be that the consequences of larger class sizes are more adverse for certain groups of students. Krueger finds that boys, students from low-income families (as measured by eligibility for free school lunches), students of African American background and students from the inner city all receive larger test-score gains. There is also some evidence that small class sizes matter more for reading than for math (Table XI, p. 528).
| Free lunch | No free lunch |
|---|---|
Source: Krueger 1999 – Table X, p. 525.
Is the effect of class size large or small?
This is a difficult question to answer. While there is a vast literature suggesting that performance on cognitive tests predicts later-life outcomes, it is difficult to know how a 3–6 point increase in SAT percentile, on average, translates into outcomes that matter, such as school completion, university participation and labour productivity. Cognitive tests are not an end in themselves; they are merely one indicator of the private and social returns to investments in education. In Krueger’s words…
“Is the impact of attending a small class big or small? Unfortunately, it is unclear how percentile scores on these tests map into tangible outcomes. Nevertheless, a couple of comparisons are informative…one could compare the estimated class-size effects with the effects of other student characteristics. For example, in kindergarten the impact of being assigned to a small class is about 64 percent as large as the white-black test score gap, and in third grade it is 82 percent as large. By both metrics, the magnitudes are sizable.” p. 514
Recent work by Chetty and a cast of co-authors has sought to shed some light on the private returns to small class size. Chetty et al. (2010) (NBER working paper version) match STAR students to their tax records between 1999 and 2007, when the students were aged between 19 and 27. From these tax records they calculate average earnings between 2005 and 2007; they are able to match 95% of STAR students. Where parents listed STAR children as dependents on their tax returns, the authors can also obtain average adjusted gross household income for 1996–98. This provides a measure of the financial resources of the household in which the child lived when aged 16–18, an important variable missing from the original STAR data (and surely a better one than free-lunch eligibility). The authors are able to match 86% of children to their parents. Somewhat surprisingly, they can also infer college attendance from the tax data: so that tax credits for fee relief can be claimed, “Title IV” tertiary education institutions in the US have to report tuition payments and scholarships received by all students.
As in the Krueger paper, the authors are careful to check that pre-labour-market student characteristics, and those of their parents, are not associated with assignment to a small class (p. 11; see also column 2 of Table II). This matters because the original STAR data didn’t contain any information on parental characteristics, so the check adds to the credibility of STAR as a good approximation to a true experiment.
They also check that students who were matched are no different from those who were not, at least on what we observe of the students. After controlling for which school students attended, there is very little difference in match rates between small and regular classes. While schools in disadvantaged areas might produce children who are less likely to work and/or file tax returns, it doesn’t appear that these children were any less likely to be allocated to a small class within schools (p. 12; columns 1 and 2 of Table III).
What were the “real-world” outcomes of small class size for students in the STAR project?
Chetty et al. use the same statistical method Krueger used to form the estimates in my earlier table. They find a similar impact of small class size on test scores to that found by Krueger (for the initial year of assignment to a small class), even after controlling for parental characteristics that Krueger could not. While Chetty et al.’s results suggest that students assigned to small classes have a 2 percentage point greater probability of entering university by age 20, this result is not statistically significant at the 5% level. This is stats-geek talk for the data being unable to distinguish that estimate from no effect at all. The results for wages are even less certain. In the authors’ own words:
“the average student assigned to a small class spent 2.27 years in a small class, while those assigned to a large class spent 0.13 years in a small class. On average, large classes had 22.6 students while small classes had 15.1 students. Hence, the impacts on adult outcomes below should be interpreted as effects of attending a class that is 33% smaller for 2.14 years [ p. 16 of Working Paper version]…With controls for demographic characteristics, the point estimate of the earnings impact becomes -$124 [per year] (with a standard error of $336). Though the point estimate is negative, the upper bound of the 95% confidence interval is an earnings gain of $535 (3.4%) gain per year.” p. 18 of Working Paper version.
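The upper bound quoted there is just the usual normal-approximation arithmetic applied to the reported point estimate and standard error:

```python
# Point estimate and standard error for the annual earnings impact,
# as reported in the Chetty et al. working paper (p. 18)
point_estimate = -124.0  # dollars per year
standard_error = 336.0

z = 1.96  # two-sided 95% critical value under a normal approximation
lower = point_estimate - z * standard_error
upper = point_estimate + z * standard_error

print(round(lower), round(upper))  # → -783 535
```

An interval running from a loss of about $783 a year to a gain of about $535 comfortably spans zero, which is why no earnings effect can be claimed in either direction.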
Put simply, they are unable to find any convincing evidence of an impact of small class sizes on wage income later in life.
What are the policy implications of the findings from the STAR project for Australia?
As I suggested earlier, we don’t know for certain that these findings would be replicated if we had run a STAR-like randomised policy trial in Australia.
It might be that class size has an impact on outcomes unrelated to cognitive test scores, such as self-confidence, interpersonal skills and other personality traits (non-cognitive skills) that the Nobel Laureate James Heckman and his co-authors find are important. While these may have their own benefits, it doesn’t appear that they are reflected in early labour-market earnings. This is not to say there are no social returns to small class sizes, but it does cast some doubt on whether smaller classes would pay for themselves in taxes collected later in life. The case for smaller class sizes would be strengthened if research could show a causal link between class size and costly adverse outcomes such as crime, or risky behaviours such as drug use.
The main implication of the US research for Australia is that we need a randomised policy experiment similar to STAR. In addition to randomly assigning children to different class sizes, such a trial could also randomly assign students to teachers shown to differ in their ability to increase NAPLAN scores, one potential measure of teacher quality. The trial would give State and Commonwealth Governments a better idea of which of class size and teacher quality (albeit narrowly defined) yields greater returns for a given level of the other. This is the sort of information required to ensure the education dollar is spent in the most efficient way for the greatest effect. It would also provide an evidence base to strengthen the political position of a Government attempting to implement such reforms in the future.
This policy trial should be configured in such a way as to provide precise estimates of the impact on children from disadvantaged backgrounds (Indigenous students, students with disabilities and students from low socio-economic backgrounds). As Krueger’s results indicate it may be that children who enter school with lower levels of parental investment will require a different combination of teacher quality and class size to students from more advantaged backgrounds.
Given the long lead time between implementing the trial and being able to assess later-life outcomes such as university participation and earnings, some may find my view that Governments should fund such policy experiments politically naive. I would argue that this highly politicised debate has gone on in this country for decades, and has occurred in an evidence vacuum. Most State and Federal Governments get at least two terms in office. Six years is the time it takes for a child to go from Preparatory/Kindergarten/Reception/Pre-primary/Transition to Year 5, or from Year 6 to school completion. Once the data is in the public domain it’s there forever. Were an incoming government to terminate the experiment, there would still be the potential to link the data to tax records and learn about the long-term outcomes of the programme in the same way that Chetty et al. did with STAR. Building consensus for controversial reforms might take more time than most Governments have, but the benefits of good policy, once implemented, are reaped in perpetuity.
To conclude, it is widely accepted that class size is not the be-all and end-all of increasing student performance. The political appeal of class size is that it is a quantifiable and tangible policy lever that avoids the complex questions a policy maker confronts in defining and measuring something like teacher quality. In my opinion, Australian students and the Australian economy would be best served by teachers, parents, politicians and, dare I say, students coming to a consensus about what teacher quality means and how best to measure it. Such a consensus, combined with a randomised policy trial that sheds light on the optimal combination of resources for different schools, could unlock the private and social benefits of expenditure on education in Australia.
Chetty, R., Friedman, J. N., Hilger, N., Saez, E., Schanzenbach, D. W. and Yagan, D. (2011). How Does Your Kindergarten Classroom Affect Your Earnings? Evidence from Project STAR. The Quarterly Journal of Economics, Vol. 126, No. 4, pp. 1593–1660.
Krueger, A. B. (1999). Experimental Estimates of Education Production Functions. The Quarterly Journal of Economics, Vol. 114, No. 2, pp. 497–532.
Watson, L. and Ryan, C. (2010). Choosers and Losers: The Impact of Government Subsidies on Australian Secondary Schools. Australian Journal of Education, Vol. 54, No. 1, pp. 86–107.
Angrist, J. D. and Lavy, V. (1999). Using Maimonides’ Rule to Estimate the Effect of Class Size on Scholastic Achievement. The Quarterly Journal of Economics, Vol. 114, No. 2, pp. 533–575.