Students in their senior years of high school are subjected to swathes of testing and assessment. Hot on the heels of the final NAPLAN tests in Year 9 comes a steady succession of assessment exercises. Once students reach the ‘pointy end’ of their school careers they face a barrage of PATs, GATs, SATs and SACs, as well as whatever other internal checks their schools choose to enact. On the face of it, then, schools get great benefit from the varying insights that these tests offer:
NAPLAN is maybe three hours of your life in Year Nine. But if you think about all the other assessments you do in Year Nine, there’s probably 100 hours… So that 100 hours tells us a great deal more about a student than what three hours of NAPLAN does. The advantage on NAPLAN is that it matches to other schools and other kids. So it’s a big picture … it’s a fuzzy picture but it’s big. And what we know about students [from internal tests] is more specific, but it tells us only about the student – it doesn’t link into other schools [Stephen, senior maths teacher at Northland]
Despite this wealth of data, it became notable during our fieldwork how schools tend to privilege specific bits of data in their bouts of number-crunching and data analysis. In this sense, it is clear that different data sources and data points are understood (and valued) in very different ways.
Take, for instance, the data that Northland College chooses to use when gauging students’ academic ‘performance’. In assembling an accurate prediction of how students are likely to perform in their end-of-school university admissions ranking, Northland has a wealth of assessment data from the previous two years of each student’s schooling at its disposal. Despite this, Northland’s modelling is based on data from a single assessment source: students’ ‘SAC’ results – data derived from various ‘School Assessed Coursework’ components of students’ Year 11 and Year 12 ‘VCE’ courses (advanced school-leaving examinations equivalent to ‘A-Levels’ in the UK).
SACs are internal assessment tasks that the school writes and administers itself in the form of essays, reports and tests. As such, Northland already knows the SAC scores that its students have attained. It has also collated considerable ‘historical’ data in the form of SAC scores of previous students stretching back five years or so. These data are therefore considered to be the best basis from which to make predictions of students’ likely future performance.
It is interesting to reflect on why SAC scores were perceived by Northland to be the most useful data for making these predictions, and why this data was privileged over all the other assessment data that might have been used. Clearly, one advantage is accessibility – this is data that the school has ready access to. Another advantage is perceived ‘authenticity’ – as Northland authored and administered these tests itself, the data is trusted in a way that might not apply to externally-run assessments. A less tangible criterion is the perceived validity of this data – as Stephen put it, “when it comes to the SAC, every [student] knows this number means something”.
These might appear arbitrary distinctions to make given that all school assessments are ostensibly measuring academic competency. However, in the eyes of Northland’s senior staff, SACs were the only assessment data at their disposal where it could be assumed that students were genuinely trying to perform to the best of their ability. As Stephen reasoned: “We figure that a kid puts in a lot of effort for a SAC … but who knows what they do for the other assessments!”.
As the school’s ‘Data Lead’, Stephen justified this presumption in statistical terms from having ‘played around’ with all of the school’s assessment data over the past few years – “we do tend to see that there’s more variation in results [from other assessments] than what you get in the SAC results”. He personally attributed this to the varying ways that students choose to approach different assessment tasks. With so much assessment coming their way, Stephen reasoned, students tend to adopt different strategies for different forms of assessment. In particular, he reckoned that some students chose to use lower-stakes ‘diagnostic’ tests to gauge their minimum (rather than maximum) capabilities. This was certainly reflected in the advice that Stephen was giving his own classes …
I say to my kids, ‘Look … this is a diagnostic test only, so you could treat it two ways. You can either give it everything or you can give it almost nothing and see if it tells you what you know. And then you know what to focus on. Or you can really study like hell and then it tells you what you’re capable of if you – so it’s up to you what you want to do. As long as you’re prepared to stand behind the number’.
Yet, even within Northland’s preference for SACs, it transpired that not all SAC scores were valued equally. In particular, for its performance ‘prediction’ analysis the school prioritised SAC scores from the first semester of the school year, when it was assumed that students were fully focused. Moreover, each of Stephen’s spreadsheets contained a notable number of cells that were colour-coded in yellow and orange. Stephen explained that this denoted subject areas where “the correlation turns into garbage”. This might be because of a small sample size, or simply Stephen’s previous experience that this was a subject where there was rarely a good correlation between SACs and eventual end-of-school performance measures:
The colour coding’s telling me where historically the data is no good. Mandarin this year had three kids. So, doing this is statistically invalid … Something like Literature or Music performance, the SACs are *always* so different to what the end result is.
In these ‘garbage’ cases, class teachers were asked to provide rough ‘ballpark’ estimates of how well the student performed in the actual classwork. One instance where the test data was not considered trustworthy was Music – SAC data that in Stephen’s experience was “historically no good”. Here, Stephen explained that these classes involve complex composition and creative tasks – aspects of learning that are not captured in the SAC exercises. As such, he would spend time walking around the school asking Northland’s music teachers to retrospectively concoct hypothetical exam grades which could then be fed into the final ‘predictive’ spreadsheet:
Something like Music performance, the teacher knows so much more. So, I’ll say to the teacher, ‘What’s your best feel for what that kid’s going to get?’. And if they say, ‘I think they’re a 29’, I put in a 29 in a different column and it throws out a 29 as a predicted score. And then it scales it based on last year’s scaling.
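To give a sense of the kind of spreadsheet logic being described here, the sketch below reconstructs the workflow in deliberately simplified form. It is our own illustration rather than Northland’s actual model: the cohort-size cut-off, correlation threshold and linear prediction step are all assumptions, intended only to show how a historical SAC-to-score relationship, a ‘garbage’ flag and a teacher’s ballpark override might sit alongside one another in the same calculation.

```python
# Illustrative sketch only: not Northland's actual spreadsheet. The cohort-size
# cut-off, correlation threshold and scaling step are assumed values.
from statistics import correlation, linear_regression

MIN_COHORT = 10        # assumed minimum historical cohort before the history is trusted
MIN_CORRELATION = 0.5  # assumed cut-off below which the correlation is treated as 'garbage'


def predict_score(historical_sacs: list[float], historical_finals: list[float],
                  sac_score: float | None, teacher_estimate: float | None,
                  last_year_scaling: float = 1.0) -> float:
    """Predict one student's end-of-school score in one subject."""
    history_usable = (
        len(historical_sacs) >= MIN_COHORT
        and abs(correlation(historical_sacs, historical_finals)) >= MIN_CORRELATION
    )
    if history_usable and sac_score is not None:
        # Project this student's SAC score through the historical SAC-to-final relationship.
        fit = linear_regression(historical_sacs, historical_finals)
        raw = fit.slope * sac_score + fit.intercept
    else:
        # 'Garbage' correlation (e.g. three students of Mandarin): fall back on the
        # teacher's ballpark estimate, which passes through largely unchanged.
        raw = teacher_estimate if teacher_estimate is not None else float("nan")
    # Adjust by last year's scaling, as described in the quote above.
    return raw * last_year_scaling
```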
Tellingly, despite the very different origins of this data, each spreadsheet cell gives the appearance of uniform test data and a statistically valid calculation. Only Stephen knows where the ‘ballpark’ figures sit:
Researcher: ... and no student or parent is going to know that that came from a guesstimate?
Stephen: Nope! We only tell them the final score. We don’t tell them any of the numbers that were filtered in.
These ad hoc judgements over the provenance and quality of school data recurred throughout our interviews with the school staff tasked with ‘number crunching’ and conducting data analyses. Another common point of contention when using teacher-led assessment data was the suspicion that some colleagues were ‘marking harder’ and being less ‘fair’ in their grading than others. In this sense, data that are generated ‘in-house’ are not always seen as good!
All of these judgements constitute minor adjustments, omissions and substitutions by the staff tasked with running the data analyses and doing the schools’ ‘spreadsheet work’. Yet, while these decisions and distinctions might appear to be minor technical choices, they reflect the contingent and highly contextualised nature of school data. Crucially, then, these are highly localised understandings and rules-of-thumb. Stephen’s preference for certain SAC scores over others was based on his own personal teaching approaches, as well as a ‘feel for the numbers’ gained from working with the school’s datasets in Excel for the past five years or so. Faced with the same mass of data, it is unlikely that two teachers (let alone two schools) would wholly agree over ‘what counts’ and what does not. It seems that school data is not as objective and neutral as some people would have us believe.