A very influential paper on improving math outcomes was published in 2008. The authors refused to divulge their data, claiming that agreements with the schools and Family Educational Rights and Privacy Act (FERPA) rules prevented it.
When we found the identities of the schools by other means, serious problems with the conclusions of the article were quickly revealed.
In this article we describe how we were able to find the missing data for the 2008 paper. We discuss the serious difficulties that data revealed, and point out the legal constraints that should make it very difficult for authors of such papers to legally withhold their data in the future.
Stanford Professor of Education, Jo Boaler, and her student, Megan Staples, published a very influential paper, [BS], on improving math outcomes for high school students in 2008. The paper had so many policy implications that it was critically important for researchers to be able to check the results. But the authors refused to divulge their data, claiming agreements with the participating schools and FERPA rules prevented it.
So it is vitally important to analyze the legal foundations on which these authors base their refusal to share crucial data with other researchers.
It seems to be settled law that public schools, the lead researchers, and even individual teachers directly involved in conducting the research have no privacy protections and are subject to the Freedom of Information Act (FOIA) requirements.^{†}
It turns out that FERPA only applies to student names and educational records, but the correct citation, [HHS], while providing strong protections for human research subjects, has critical exemptions for exactly the types of research done in [BS], [BR], and [RN]. As a result, the claimed right to privacy does not hold for the data in these three papers, nor for papers like them, and their data is subject to FOIA.
However, 7 years ago, the authors of [BCM] were unaware of these exemptions, so it was necessary for us to try to find the names of the schools studied in [BS] by other means. We were lucky enough that a close examination of the data recorded in [BS] allowed us to do just this.
Serious - perhaps even fatal - problems with the conclusions in [BS] were quickly revealed once we had these school names.
Jo Boaler recently wrote a very pointed criticism of Prof. Wayne Bishop and me, [B2]. It referred to a paper jointly authored by Wayne Bishop, Paul Clopton, and me, [BCM], that she seemed to be unaware had been accepted for publication in the peer reviewed education journal, Education Next, on 3/22/2006, and so she claimed it "has never been peer reviewed." What actually happened was that, for various reasons, it was held back and simply made available via the Internet for archival reasons.
Our paper, [BCM], studies a published article of J. Boaler and M. Staples, [BS], a study focused on three California high schools, of which the most important for the study was called "Railside." If [BS] is correct, the paper is extremely important. Since [BS] is potentially so important and has so many implications about the best ways to teach our students, it needs to be independently verified. Indeed, a high ranking official from the U.S. Department of Education asked me to evaluate the claims of [BS] in early 2005 because she was concerned that if those claims were correct U.S. ED should begin to reconsider much if not all of what they were doing in mathematics education. This was the original reason we initiated the study, not some need to persecute Jo Boaler as she claims ([B2], paragraph 3).
In any case, the conclusion of [BCM] was that no change in funding policies was required, as we had identified three critical areas where much more information was absolutely required before the results of [BS] could be justified. The three areas will be described and discussed later in this note, and we have serious doubts about the possibility of fixing the issues that we've identified.
One of the reasons I held [BCM] back was that some of our math educators felt that when Boaler left Stanford, there was no real need for this paper to appear. This was not, of course, my focus, but it gave me concerns that if we published [BCM], it would be impossible for me to work with the community of math educators in this country, and there were still things that I felt a research mathematician could do to help improve the current mess in our K-12 mathematics outcomes. Indeed, at that time
Though the article was placed on my ftp site, very few people knew about it, so I was able to continue to work on the problems with our K-12 math outcomes.
Boaler claimed [BCM] "contravenes federal law that protects human subjects" (see [B2], Bullet 10), so it is worth noting first that [BCM] had been submitted to the Stanford IRB and was approved for publication after minor changes, again something that Boaler appears not to have known.
Since these exclusions^{[1]} are the topics in [BS], there is no expectation of privacy, except for the identities and school records of the individual students. [Subpart A -- Basic HHS Policy for Protection of Human Research Subjects, 45 C.F.R. § 46.101 (1991), paragraph (b), www.hhs.gov].
Also, Stanford's stated expectations for openness in research by Stanford faculty members are unambiguous, [StanHandbook]. The summary of the [StanHandbook] requirements starts as follows:
[RPH 2-6] ... Expresses Stanford's commitment to openness in research; defines and prohibits secrecy, including limitations on publishability of results; specifies certain circumstances which are acceptable under this policy....
and the exact resolution is
That the principle of openness in research - the principle of freedom of access by all interested persons to the underlying data, to the processes, and to the final results of research - is one of overriding importance. Accordingly, it is the decision of the Senate that that principle be implemented to the fullest extent practicable, and that no program of research that requires secrecy (as hereafter defined) be conducted at Stanford University, subject to the exceptions set forth in Paragraph 4 of this Resolution.
The exceptions are not applicable to either [BS] or our paper, [BCM]. Accordingly, our data -- including the names of the three schools -- is entirely available to researchers on request.
It is possible that Boaler was trying to circumvent the requirements of this Stanford policy by quoting the FERPA constraints. But as we have seen, they do not apply to [BS], and in any case, they are the wrong rules. She should have referenced [HHS], but that would not have helped either. Moreover, if her research grant had contained restrictions on access to the data, then Stanford would not have allowed it, [Checklist], [NOTE]. It seems evident that [BS] is subject to Stanford's openness of research requirements, and that she was obliged to release its data, redacting only student names or identifiers.
There was an article written about Boaler's critique [B2], followed by a large number of comments.^{[2]} Perhaps motivated by this article, a number of people read [BCM] and noted that most of it was indisputable (where we dissect the actual mathematics involved in [BS]) but it was necessary to have the real names of the schools in [BS] to check the remaining details. However, the actual names were not included in either [BS] or [BCM]. In any case Prof. Boaler promised a direct rebuttal to our paper prompted by these comments. She wrote
As I write this, nearly two months have passed since Boaler's rebuttal was promised, but it has not appeared. Nor is it likely to. The basic reason is that there is every reason to believe [BCM] is not only accurate but, in fact, understates the situation at "Railside" from 2000 - 2005.
Indeed, a high official in the district where Railside is located called and updated me on the situation there in May, 2010. One of that person's remarks is especially relevant. It was stated that as bad as [BCM] indicated the situation was at Railside, the school district's internal data actually showed it was even worse. Consequently, they had to step in and change the math curriculum at Railside to a more traditional approach.
Changing the curriculum seems to have had some effect. This year (2012) there was a very large (27 point) increase in Railside's API score and an even larger (28 point) increase for socioeconomically disadvantaged students, where the target had been 7 points in each case.
This reminds me very much of a promise some years ago when I wrote an article on a similar widely heralded experiment at Andover High School in Michigan, [MA]. I showed that the end result was a disastrous experience for a large majority of the students in their first year college mathematics courses the year after graduating from the high school.
Then, too, a rebuttal was promised but it never appeared, and as time went on, further material was published that made it clear that our analysis was both correct and justified, [HP].
In the remainder of this note we focus on the content of [BS], and try to show what our difficulties with it were. We then explain what we did in [BCM], and why we needed to do it.
Our commentary in [BCM] refers to [BS], published in the journal Teachers College Record in 2008, but the version we worked from was the preprint that appeared on the Boaler web-site for about 1 year as a PDF file dated 3/2/2005. The title in both cases was
"Transforming Students’ Lives through an Equitable Mathematics Approach: The Case of Railside School," and the authors were listed as Jo Boaler, then at Stanford University, and Megan Staples, then at Purdue University.
The article [BS] studies the outcomes for the ninth grade students who entered three California high schools in 2000.
The cohorts that [BS] studied at the first two were selected from the ninth graders who took the standard Algebra I course in 2000-2001, while the cohort at the third was selected from the ninth graders (almost the entire class) who started with the ninth grade math course that year as the program at the third school was non-standard. The students were followed till they left high school, and detailed records were kept for them as they progressed through the mathematical programs at their respective schools for the three years from 2000-2001 through 2002-2003.
The schools are identified using the pseudonyms Greendale, Hilltop, and Railside in [BS], and were described as follows on pages 5 and 6 of the preprint:
"Both Greendale and Hilltop schools offered students (and parents) a choice between a traditional sequence of courses, taught using conventional methods of demonstration and practice, and an integrated sequence of courses in which students worked on a more open, applied curriculum called the Interactive Mathematics Program (Fendel, Fraser, Alper, & Resek, 2003), or `IMP.' Students in IMP classes worked in groups and spent much more time discussing mathematics problems than those in the traditional classes. Railside school did not offer a choice and the approach they used was `reform’ oriented. The teachers worked collaboratively and they had designed the curriculum themselves, drawing from different `reform’ curriculum such as the College Preparatory Mathematics Curriculum (Sallee, Kysh, Kasimatis, & Hoey, 2000), or `CPM' and `IMP.' "
With a number of changes, this preprint is the paper that later appeared in print with the same title (See [BS]). Since there were changes between the two versions, we will send the original preprint on request if possible.
It is also worth noting that Prof. Wayne Bishop had requested the identities of the three schools from Prof. Boaler shortly after the preprint had appeared, but she refused, saying that it was against the law, the requirements of her NSF grant, and her agreements with the three schools.^{[3]}
It is worth noting again that her refusal is contrary to federal FOIA requirements (see Appendix for the specific section of the federal code that makes studies like [BS] subject to FOIA), and to Stanford's openness of research requirements.
The point of the Boaler-Staples paper seemed to be that the standard measures of student achievement -- STAR exams, SAT, AP exams etc. -- were not valid measures of what the students understood and could do. So the authors, together with the involved teachers at the three high schools, created 4 tests, a ninth grade pre-test and ninth, tenth and eleventh grade post-tests, with the ninth grade post-test given as a pre-test at the start of tenth grade. They were administered to the treatment groups as the students advanced from 9th through 11th grade.
The paper posits that the pre- and post-tests were a more valid indication of what the students actually knew and understood. Assuming this, the article goes on to say that the students at Railside started out at a much lower level than those at the other two schools, but as they advanced, this difference quickly evened out on the tests, and by the end of the study the Railside students significantly outperformed the others.
The authors claimed, but did not conclusively demonstrate, that the three cohorts were roughly equivalent. They included a table, (Table 5 on page 12 of the preprint) [this table, with the numbers rounded appears as Table 6 in the published version] that showed students at Railside outperforming the students at the other two schools in Algebra in 2003, the final year of the study. Also, they asserted that on many, if not most, standard measures, the Railside students did not do well when compared to the students at the other two schools.
So, to validate the claims in the Boaler/Staples article, one has to do three things.
It does not appear that any one of these three items is addressed in any detail in the Boaler/Staples article. There are some general assertions that each of items 1 and 2 was done but no details are included in [BS]. Moreover, there are no indications of a detailed evaluation of the tests in [BS] at all.
We believe that in a paper having the potential importance of this one -- implying the need for major changes in instruction and even curricula at the high school level -- the authors must give details for all three.
In [BCM], we address each of these three items.
We do not claim the first two items were not addressed in [BS], only that they had not been addressed nearly adequately. As regards the tests -- undoubtedly the most important part of the Boaler/Staples study -- we are very sure that the study fails. The four tests cannot measure what they must, unless mathematical imprecisions, errors, and low level mathematical content knowledge are what is required for success in college and the workforce. The effect of low level content knowledge is especially severe. Students who come to college in this situation must start with a remedial math course, and their chances of being able to major in any high tech area become extremely poor.
The details of our analysis of the three items above are the focus of [BCM].
Now we explain how the Boaler/Staples preprint revealed the real identities of the three schools. Here is the key table 5 that appears on page 12 of the preprint, and with the numbers rounded in the published version as Table 6:
| | Greendale | Hilltop | Railside |
|---|---|---|---|
| n | 125 | 224 | 188 |
| Advanced | 0 | 0 | 1 |
| Proficient | 6 | 13 | 15 |
| Basic | 27 | 28 | 33 |
| Below Basic | 55 | 43 | 36 |
| Far Below Basic | 12 | 15 | 15 |
This table turns out to uniquely identify the schools. There are two things to note.
We now give two examples from the 2003 STAR data-set. They show the form of the report that appears on the net for individual schools. Surprisingly, in the cases below the exact columns that appear in the Boaler/Staples Table 5 are seen as representing the performance of the 2003 ninth graders at the two respective schools,^{[4]} and the same is true of the STAR results for the third school.
In detail, here is the method we used. We took the data above from Table 5, and one of us (P. Clopton, Director of the Veterans Medical Research Foundation VetStats Core) checked the entire publicly available 2003 California STAR data-base, [CalData], looking for schools for which any column was identical to one of the columns in Table 5. For each column we found one and only one school that had that data. But the matching STAR data was for ninth graders, while the students in the cohorts Boaler was studying should have been in 11th grade, not ninth, in 2003!^{[4]} So Table 5 is not data for the population studied in [BS].
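The column-matching search just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure Clopton actually ran: the record layout `StarRow`, the function name `find_matches`, and the three sample rows are all hypothetical. The sample rows are placeholders invented for the example (the third simply reuses the Railside column of Table 5 so that the search has something to find); the real search ran over the full public 2003 STAR data.

```python
from typing import NamedTuple


class StarRow(NamedTuple):
    """One school/grade row of an Algebra I report (hypothetical layout)."""
    school: str
    grade: int
    n: int
    # Percent Advanced, Proficient, Basic, Below Basic, Far Below Basic
    levels: tuple


# Columns of Table 5 in the preprint, keyed by pseudonym: (n, percentages).
TABLE_5 = {
    "Greendale": (125, (0, 6, 27, 55, 12)),
    "Hilltop": (224, (0, 13, 28, 43, 15)),
    "Railside": (188, (1, 15, 33, 36, 15)),
}


def find_matches(rows, table):
    """Map each pseudonym to the rows whose n and percentages match exactly."""
    return {
        name: [r for r in rows if (r.n, r.levels) == column]
        for name, column in table.items()
    }


# Placeholder rows for illustration only; these are NOT real STAR records.
sample_rows = [
    StarRow("Placeholder School A", 9, 300, (2, 20, 30, 35, 13)),
    StarRow("Placeholder School B", 11, 188, (1, 15, 33, 35, 16)),
    StarRow("Placeholder School C", 9, 188, (1, 15, 33, 36, 15)),
]

matches = find_matches(sample_rows, TABLE_5)
for name, hits in matches.items():
    print(name, [(r.school, r.grade) for r in hits])
```

A pseudonym with exactly one hit is identified, and the `grade` field of the matching row shows which grade the published column actually describes.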
Looking more closely at the data for these schools from 2000-2004 we saw that what is remarkable about the supplied data in their Table 5 is that this 2003 ninth grade algebra data is the only time where the Railside students clearly outperformed the students at the other two schools during this period. For example here is the data for ninth graders in 2004:
| | Greendale | Hilltop | Railside |
|---|---|---|---|
| n | 108 | 250 | 229 |
| Advanced | 1 | 0 | 0 |
| Proficient | 22 | 14 | 10 |
| Basic | 34 | 38 | 33 |
| Below Basic | 39 | 42 | 48 |
| Far Below Basic | 5 | 6 | 8 |
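The contrast between the 2003 and 2004 ninth grade figures can be tallied directly from the two tables. The short sketch below is only a convenience for the reader; the helper names are assumptions, and the numbers are copied from the tables as given.

```python
# Figures copied from the 2003 (Table 5) and 2004 tables, in the order
# (Advanced, Proficient, Basic, Below Basic, Far Below Basic).
LEVELS_2003 = {
    "Greendale": (0, 6, 27, 55, 12),
    "Hilltop": (0, 13, 28, 43, 15),
    "Railside": (1, 15, 33, 36, 15),
}
LEVELS_2004 = {
    "Greendale": (1, 22, 34, 39, 5),
    "Hilltop": (0, 14, 38, 42, 6),
    "Railside": (0, 10, 33, 48, 8),
}


def proficient_or_above(levels):
    """Advanced plus Proficient."""
    return levels[0] + levels[1]


def below_basic_or_lower(levels):
    """Below Basic plus Far Below Basic."""
    return levels[3] + levels[4]


def summarize(table):
    """Map each school to (proficient or above, below basic or lower)."""
    return {
        school: (proficient_or_above(lv), below_basic_or_lower(lv))
        for school, lv in table.items()
    }


summary_2003 = summarize(LEVELS_2003)
summary_2004 = summarize(LEVELS_2004)
for year, summary in (("2003", summary_2003), ("2004", summary_2004)):
    print(year, summary)
```

On these figures Railside has 16 at Proficient or above in 2003, against 6 at Greendale and 13 at Hilltop, but only 10 in 2004, against 23 and 14.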
So we can say that in reality, Boaler and Staples had already publicly identified the schools and then misidentified the data in their Table 5. Moreover, there is a possibility that they picked the unique data that might strengthen their assertions, rather than make use of the data relevant to their treatment groups.
We double checked the identifications in various ways, for example checking whether Boaler had ever worked with faculty members from any of the three schools. We found that this was the case for Greendale and Railside.
Additionally, we were told by parents at the school we believed to be Greendale that Boaler had been studying the students there, and also at the school we believed was Hilltop.
Finally, we found an article of hers entitled Stanford University Mathematics Teaching and Learning Study: Initial report -- A comparison of IMP 1 and Algebra 1 at Greendale School prominently posted on the web-site of the school we had identified as Greendale, [IMP].
(A number of the parents at Greendale were not happy about that article. About 1/5 of the families with students at Greendale had made a huge effort to get a traditional mathematics track reinstated there, but students had to actively select it. The parents' perspective seemed to be that Boaler had interfered with this selection process. Not only had an article extolling the virtues of the IMP track been posted on the school's web-site, but I was told that Boaler had attended private meetings between the individual parents and the lead math teacher at Greendale.)
Once we had the identities of the three schools we could fill in the missing data on student outcomes. This is fully reported in [BCM] so we don't repeat the details here. It suffices to say that every standard measure that had a strong correlation with the likelihood of student success after high school was much stronger for the Greendale and Hilltop students.
In particular, as indicated, Railside's 61% mathematics remediation rate in the California State Universities was much higher than the 37% state average, while the rates for Greendale and Hilltop (35% and 29% respectively) were at or significantly below the state average.
Boaler and Staples were obligated to discuss this and explain why it does not contradict their results. But there is no indication of this in [BS].
As to the possibility that the treatment groups at the three schools were equivalent, we only had the state data to work with, so our analysis had to be somewhat indirect and, as a result, a bit subtle. What we noticed was that
The difference in approach between Railside and Greendale, Hilltop meant that in 2000-2001 we would only expect to see ninth graders taking the advanced tests at the end of academic year 2000-2001 if they were among the strongest math students in their classes, and they were at Greendale or Hilltop.
In fact, this was exactly the case. Only 4 students took one of the advanced tests at Railside at the end of 2000-2001, while the share of ninth graders doing so was 31% at Greendale and 18% at Hilltop. As a result, we can be reasonably sure that the top 30% of the ninth graders at Greendale and the top 20% of them at Hilltop were not taking Algebra I in 2000-2001. But, according to [BS] and the information on the Railside web-site, all ninth graders at Railside were required to take the first year math course.
[BS] tells us that about half of each cohort continued with the standard sequence at each of the high schools, and this half is the group that [BS] evaluates to draw their conclusions.
So what do we have? We can assume that the surviving half of each treatment group was roughly made up of the strongest students in each original group. This leads to the following conclusions.
These are hardly comparable populations if we make the standard assumption that abilities are uniformly distributed among populations and ethnic groups.
Boaler/Staples needs to come to grips with this issue. If the final treatment group at Railside contained a significant number of stronger students than the groups at the other two schools, that alone could be enough to explain why the Railside students did better on the Boaler/Staples exams than the groups at the other two schools.
Finally, we come to the third issue -- how closely the [BS] tests measure the mathematics students need to know. For this we had to do a close analysis of the four tests (actually, we only looked closely at two of the three post-tests), and fully 2/3 of [BCM] is devoted to this analysis.
There were many serious mathematical errors on the tests, and even more uses of language so imprecise that students who were never privy to the shorthand being used had little chance of finding the "correct" answers.
Here are just two of the problems we found with the tests. One of the errors was especially amusing, [BCM], p. 17. The response marked as correct the first time the question appeared on one of the exams was wrong, so it was changed when the identical question appeared on a later test. However, the new answer, while not technically wrong, was so peculiar as to be unbelievable. Another question, this time on the final post-test, is
4. A triangle has an area of 62 sq units. If one side is 10 units, and one angle measures 40 degrees find possible measurements for the other sides and angles. Draw the triangle and label sides and angles.
The question implies there is only one answer up to congruence. This is not true. There would have been only one if the area had been larger than 68.687 square units, but for 62, there are two. One, with the 40 degree angle adjacent to the 10-unit side, is very easy to analyze since it depends on a general argument. The other, with the 40 degree angle opposite the 10-unit side, is quite tricky since it depends on a very special argument, as this second triangle only exists for certain areas.
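The count of two triangles, and the 68.687 threshold, can be checked numerically. The sketch below is only an illustration (the function names are assumptions and nothing here is taken from [BS] or [BCM]). It treats the two possible configurations separately: the 40 degree angle adjacent to the 10-unit side, and the 40 degree angle opposite it.

```python
import math


def max_area_angle_opposite(side, angle_deg):
    """Largest area of a triangle whose given angle is opposite the given side.

    The maximum is attained by the isosceles triangle, whose height on the
    given side is (side / 2) / tan(angle / 2).
    """
    half = math.radians(angle_deg) / 2
    return 0.5 * side * (side / 2) / math.tan(half)


def triangles(area, side, angle_deg):
    """Return the side lengths (side, b, c) for each configuration that works."""
    theta = math.radians(angle_deg)
    found = []

    # Case 1: the angle lies between the given side and a second side b.
    # area = (1/2) * side * b * sin(theta) always has exactly one solution b.
    b = 2 * area / (side * math.sin(theta))
    c = math.sqrt(side**2 + b**2 - 2 * side * b * math.cos(theta))
    found.append((side, b, c))

    # Case 2: the angle is opposite the given side, between sides b and c.
    # Then b*c = 2*area/sin(theta) and, by the law of cosines,
    # (b - c)^2 = side^2 - 2*b*c*(1 - cos(theta)), which must be >= 0.
    product = 2 * area / math.sin(theta)
    diff_sq = side**2 - 2 * product * (1 - math.cos(theta))
    if diff_sq >= 0:
        sum_bc = math.sqrt(diff_sq + 4 * product)  # (b + c)^2 = (b - c)^2 + 4bc
        diff_bc = math.sqrt(diff_sq)
        found.append((side, (sum_bc + diff_bc) / 2, (sum_bc - diff_bc) / 2))
    return found


for sides in triangles(62, 10, 40):
    print([round(x, 3) for x in sides])
print(round(max_area_angle_opposite(10, 40), 3))
```

Case 1 exists for every area, which is the general argument; case 2 exists only while the area does not exceed the isosceles maximum, which for a 10-unit side and a 40 degree angle is about 68.687 square units.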
All of this is discussed in full detail in [BCM], where we also show that the first and second post-tests were at least three years below California's expectations. (See p. 16 of [BCM].)
Taking all this into consideration, we had no hesitation in asserting that the Boaler tests could not have been an accurate measure of the mathematics the students knew. Similarly, there was no evidence that they measured the mathematics students needed to know for the workplace or for success in college.
It remains to describe our professional qualifications. Two of us are practicing Ph.D. mathematicians and the third, P. Clopton, is a well-respected statistician. All three of us were involved in the creation of the 1998 California Mathematics Standards and Framework, and I have held a number of national positions overseeing research in mathematics education, as well as overseeing the creation of the recent Common Core Mathematics Standards.
§46.101 To what does this policy apply?
(a) Except as provided in paragraph (b) [www.hhs.gov] of this section, this policy applies to all research involving human subjects conducted, supported or otherwise subject to regulation by any federal department or agency which takes appropriate administrative action to make the policy applicable to such research. This includes research conducted by federal civilian employees or military personnel, except that each department or agency head may adopt such procedural modifications as may be appropriate from an administrative standpoint. It also includes research conducted, supported, or otherwise subject to regulation by the federal government outside the United States.
(b) Unless otherwise required by department or agency heads, research activities in which the only involvement of human subjects will be in one or more of the following categories are exempt from this policy:
[BCM] Wayne Bishop, Paul Clopton, R. J. Milgram, A close examination of Jo Boaler's Railside Report, ftp://math.stanford.edu/pub/papers/milgram/combined-evaluations-version3.pdf
[B] Wayne Bishop, "A response to some points of: when academic disagreement becomes harassment and persecution," http://math.stanford.edu/~milgram/Jo-Boaler-reveals-attacks-AccusationsResponse-trans.html
[B2] Jo Boaler, "When Academic Disagreement Becomes Harassment and Persecution," www.stanford.edu.~joboaler
[BS] Jo Boaler, Megan Staples, "Transforming Students’ Lives through an Equitable Mathematics Approach: The Case of Railside School," preprint 3/2/2005, and Teachers College Record, 110(3) (2008), 608-645.
[BR] D. Briars, L. Resnick, Standards, assessments -- and what else? The essential elements of Standards-based school improvement. CRESST Technical Report 528, (2000)
[RN] J. Riordan, P. Noyce, "The impact of two standards-based mathematics curricula on student achievement in Massachusetts," J. Research in Mathematics Education 32 (2001), 368-398
[HHS] Subpart A -- Basic HHS Policy for Protection of Human Research Subjects 45 C.F.R. § 46.101 (1991), paragraph (b). www.hhs.gov.ohrp/humansubjects/guidance/45cfr46.html
[MA] http://math.stanford.edu/~milgram/andover-report.html
[HP] R. Hill, T. Parker, A study of Core-Plus students attending Michigan State University, www.math.msu.edu/~parker/monthly905-921.pdf
[CalData] http://star.cde.ca.gov/star2003/
[IMP] J. Boaler, Mathematics Teaching and Learning Study: Initial Report -- A Comparison of IMP 1 and Algebra 1 at Greendale School, www.gphillymath.org/StudentAchievement/Reports/Initial_report_Greendale.pdf
[StanHandbook] rph.stanford.edu/2-6.html
[Checklist] dot.stanford.edu/C-Res/ITARlist.html
[NOTE] When we first heard Boaler's claim about restrictions in her NSF grant making it impossible for her to release data for [BS], Stanford's IRB checked with NSF and was assured that this was not the case.
^{†} Special thanks to Veronica Norris, J.D., RN, for her legal analysis
^{[1]} The original purpose of these exemptions seems to have been to enable researchers to work on these topics with students as subjects without notifying and informing their parents. But an exemption is an exemption.
^{[2]} insidehighered.com/news/2012/10/15/stanford-professor-goes-public-attacks-over-her-math-education-research
^{[3]} You may see the dramatic language she used in her refusal, which is quoted in a recent article by Prof. Bishop, [B]. It starts as follows: "I have documented the different lies and insults you have written about me and I have decided not to engage you in discussions of my study..."
^{[4]} In an e-mail from Boaler dated 10/23/2006, she asserts that "There is no way of knowing the grade levels of the students [taking the STAR Algebra I exam] (that I know of) and your depiction of the data as that of `9th grade students' is incorrect."