Most schools have a few ‘big ticket’ data points – a particular indicator, score or ranking that looms large in the life of the school. These are data points that most staff and students are aware of, keep in the back of their minds, and that might occasionally influence their decisions and actions. In a previous blog post we gave an example of such a data point in the form of ‘course evaluation’ surveys generated on Google Forms. As that blog post explored, the resulting colour code can come to underpin significant forms of meaning-making and shared understandings within schools in a number of subtle ways.
We are interested in these ‘big ticket’ forms of school data for many reasons – especially what they can tell us about the realities of how data is ‘done’ within the milieu of contemporary schooling. In this sense, one fascinating characteristic of these data practices is how messy, loose and improvised they often turn out to be. As with all forms of digital data, it is important not to presume these to be precisely calibrated, objective processes. Instead, the ‘data science’ behind any production of such data within a school is usually a convoluted and compromised process. If we are interested in the realities of the datafication of schooling, then it is worthwhile paying close attention to these messy processes.
Let’s return to the common practice of collating student course evaluation survey data into implied measures of course ‘quality’. As we have detailed in the previous post, this might involve student responses from a handful of Likert-scale survey items (rated from 1 [not very good] to 5 [very good]) being aggregated and then averaged into a composite ‘score’. Rather than use the actual number, many schools will then assign each course a colour-code according to a set of pre-defined thresholds. For example, in one research school any score of 4.1 or above will merit the top colour grade (in this instance a ‘purple’ as opposed to ‘green’, ‘yellow’ or ‘red’).
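To make this concrete, here is a minimal sketch of the kind of aggregation involved. The survey items, responses and the lower colour thresholds are invented for illustration – only the 4.1 ‘purple’ cut-off comes from the example above:

```python
# Illustrative sketch of the course 'quality' colour-coding described above.
# The items, responses and lower thresholds are invented for demonstration.

likert_responses = {
    "The teaching was engaging":     [5, 4, 4, 3, 5, 4],
    "The course was well organised": [4, 4, 3, 4, 5, 5],
    "Feedback helped me improve":    [3, 4, 4, 4, 4, 5],
}

# Pool every response across items and average into a single composite 'score'
all_ratings = [r for item in likert_responses.values() for r in item]
composite = sum(all_ratings) / len(all_ratings)

# Map the composite score onto the school's pre-defined colour thresholds
def colour_band(score: float) -> str:
    if score >= 4.1:          # the top threshold reported by the research school
        return "purple"
    elif score >= 3.5:        # assumed threshold
        return "green"
    elif score >= 2.8:        # assumed threshold
        return "yellow"
    return "red"

print(f"Composite score: {composite:.1f} -> {colour_band(composite)}")  # 4.1 -> purple
```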
In terms of good data practice, there are a number of problems here. ‘Ranked’ Likert-style categories do not strictly meet the assumptions for measuring central tendency using an arithmetic mean. Instead, this type of ordinal (non-parametric) data usually merits the calculation of a median score. Despite the ways that this colour-scale is understood and talked about throughout the school, the survey items have little (if any) validity as composite elements of an overall notion of teaching ‘quality’. Moreover, the thresholds chosen for each colour category are highly arbitrary – usually reverse-engineered to ensure that there is not a preponderance of courses in the top or bottom categories (something the teachers responsible for this process will sometimes justify as ensuring a ‘normal distribution’ or ‘bell curve’).
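A small invented example shows how much can hang on the choice of mean over median here – a couple of outlying low ratings can pull a course’s composite ‘score’ below a colour threshold even though the typical (median) response is unchanged:

```python
import statistics

# Hypothetical ratings: most students rate the course a 4 or 5, two rate it a 1
ratings = [5, 5, 4, 4, 4, 5, 4, 1, 1]

print(round(statistics.mean(ratings), 2))   # 3.67 – drops below the 4.1 'purple' threshold
print(statistics.median(ratings))           # 4    – the 'typical' response is still a 4
```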
Many of the ‘big ticket’ data points that we come across in schools are ‘calculated’ in a similarly ad hoc manner. This is the reality of the datafied school. What might appear at first glance to be sophisticated ‘indicators’ are actually the result of a few well-intentioned teachers with an interest in statistics constructing protocols and methods of calculation in an idiosyncratic and slightly compromised manner.
Let’s take another example, this time from a second research school. Here, a school ‘data lead’ describes a ‘data-crunching’ exercise that he undertakes each school year to produce subject-specific ‘predictions’ for the new incoming cohort of final-year students. This involves collating each individual student’s test data from previous years and running a series of analyses to produce ‘z scores’ (standard scores), which are then fed into a simple regression technique. At the final stage of each ‘prediction’, the teacher visually estimates a rough line of best fit and a final predicted score is calculated. This is then converted into an ATAR score (the Australian Tertiary Admission Rank – a common university entrance metric used in Australia that maps a student’s best five aggregate subject scores to a set of published national averages).
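For readers curious about what such a pipeline might look like, here is a heavily simplified sketch. All of the figures are invented, and a fitted regression line stands in for the teacher’s visually estimated line of best fit:

```python
import numpy as np

# Invented historical data: mean z-score across earlier students' tests,
# paired with the ATAR each of those students eventually achieved
past_z_scores = np.array([-1.2, -0.5, 0.0, 0.4, 0.9, 1.5])
past_atars    = np.array([61.0, 72.5, 80.1, 84.3, 90.2, 96.7])

# Standardise an incoming student's raw test scores against their year group
raw_scores = np.array([71.0, 78.0, 64.0])
cohort_mean, cohort_sd = 68.0, 9.0                 # assumed year-group statistics
z_scores = (raw_scores - cohort_mean) / cohort_sd

# Fit a simple line of best fit (the step the teacher does by eye)
slope, intercept = np.polyfit(past_z_scores, past_atars, 1)

# Feed the student's average standard score through the fitted line
predicted_atar = slope * z_scores.mean() + intercept
print(f"Predicted ATAR: {predicted_atar:.1f}")
```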
This is a laborious and often subjective process, with the teacher taking great care to make sure that the numerical data reflects what he knows personally about each individual student and the actual classes that the test scores relate to. This therefore involves a considerable amount of modification to the datasets. For example, the teacher goes through each student’s individual records, substituting missing test scores with averages from the year group. Similarly, for specific subjects where he reckons that the test scores do not give a full indication of the work involved in those particular classes, he goes to the relevant class teachers and asks them to provide a rough ‘ballpark’ estimate of how well the student performed in the actual classwork, in lieu of the test score. One example of not trusting the test data is English literature classes in the later years of schooling. Here, the teacher explained that these classes engage students in quite complex textual critique and creative writing tasks – all ‘learning’ that is not captured in the standard end-of-year tests. As such, it makes more sense for students’ actual class teachers to retrospectively judge their performances in the form of a hypothetical exam grade, which can then be fed into the final ‘predictive’ score.
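Again purely as an illustration of the kind of ‘massaging’ being described, a few lines of pandas can reproduce the two substitutions mentioned above (the records, column names and values are all hypothetical):

```python
import pandas as pd

# Hypothetical student records: NaN marks a missing test score, while
# 'teacher_estimate' holds a class teacher's 'ballpark' grade for subjects
# where the test is not trusted (e.g. English literature)
df = pd.DataFrame({
    "student":          ["A", "B", "C", "D"],
    "test_score":       [72.0, None, 81.0, 64.0],
    "teacher_estimate": [None, None, 85.0, None],
})

# Substitute missing test scores with the year-group average
df["test_score"] = df["test_score"].fillna(df["test_score"].mean())

# Where a teacher's 'ballpark' estimate exists, let it stand in for the test score
df["working_score"] = df["teacher_estimate"].fillna(df["test_score"])

print(df)
```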
Despite the precarious nature of these ‘calculations’, the eventual predictive data points form a significant rite of passage for each student entering their final examination year. These predictions are intended to form the basis of a planning meeting between the student, their parents and new class teacher at the beginning of the final school year. The data point therefore acts as a necessary catalyst for a conversation about what needs to happen next. The students and parents only get to see the final predicted score (actually calculated to one decimal place) in the form of a 10-point band. The predicted score is couched in terms of ‘Students who have performed like you up until now have gone on to get an ATAR between these scores’.
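The final step of turning a score calculated to one decimal place into the 10-point band that students and parents actually see might look something like the following (the band boundaries are our assumption; only the 99.95 cap on the ATAR itself is a published fact):

```python
# Illustrative conversion of a one-decimal-place prediction into a 10-point band.
# The band boundaries are assumed for demonstration purposes.
def atar_band(predicted: float, width: int = 10) -> str:
    lower = int(predicted // width) * width
    upper = min(lower + width, 99.95)   # the ATAR itself is capped at 99.95
    return f"{lower}–{upper}"

print(atar_band(83.4))   # '80–90'
```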
The lead teacher justified the considerable effort and time that it took him to produce such figures with the reckoning that his predictions turned out to be ‘correct 90% of the time’. However, he was also at pains to qualify each stage of the analysis as ‘fuzzy’, keen to stress that ‘the numbers are rubbery’ and to point to the substantial contextualisation work involved in ‘massaging the data’.
Of course, such an approach is a common feature of most aspects of how things actually get done in schools (and most other aspects of everyday life). Indeed, we have a rich language to describe such practices. For example, Britain has a long history of ‘bodging’, ‘fudging’ and ‘make do and mend’. The Australian notion of ‘bush mechanics’ refers to the need to make best use of available resources and unorthodox approaches to engineer working solutions. In the US, the television character MacGyver has become a dictionary verb, defined as doing something “in an improvised or inventive way, making use of whatever items are at hand”. Indeed, when we asked one school’s team of senior teachers with an involvement in data use and IT how easily the new ‘learning management system’ would be integrated with the school’s existing data systems, their answer was simple:
“It will be the six of us with hammer and nails … MacGyvering it”
***
On the face of it, the examples described above might be criticised as unsound and perhaps even sloppy data practice. Yet they are entirely justifiable and totally commonplace across most (if not all) institutional settings. In this sense, we need to explore the realities of how data is being ‘done’ in these schools in practice, and pursue the basic sociological question of why things are the way that they are. Thus, in the examples just outlined it is interesting to consider (to paraphrase Harold Garfinkel’s study of clinical records) why there might well be ‘good’ organisational reasons for ‘bad’ school data.
As these descriptions illustrate, school staff are often mindful of the ‘shortcomings’ of the data that is being used in their school (even if this is not something that can be explicitly acknowledged when the data is being used). Nevertheless, there are often understandable reasons for carrying on using this data regardless of its flaws. Here there are clear resonances between school use of data at the beginning of the 2020s and Garfinkel’s work in 1960s clinical settings. Indeed, over 50 years ago, Garfinkel highlighted the shared understandings amongst clinical staff over the necessary limitations of their record-keeping. For example, it might well be counter-productive to record comprehensive background information about patients for fear of causing suspicion. Similarly, it might also be considered counter-productive to present patients with data points that are too specific for fear of causing excessive unease or disquiet.
Similar rules of thumb are clearly shaping how the school staff just described engage in their ‘big ticket’ data processes. Garfinkel points to the need for record-keeping to fit with the ‘prevailing rules of practice’ of the institution. Any data collection and data analysis is therefore conducted in compliance with a school’s tacit rules – the “operating procedures that from [the teacher’s] point of view are more or less taken for granted as right ways of doing things” (Garfinkel 1967). Thus school staff can make good working use of apparently ‘bad’ data (for example, incomplete, inaccurate or ill-calculated variables), and can develop workarounds to deal with any particularly flawed records. What might strike an outsider as ‘bad’ data can be incorporated unobtrusively into the day-to-day running of the school and what Garfinkel identified as its ‘normal, natural troubles’ (n.b. this sits in contrast with ‘big ticket’ data points that are externally created about the school, such as NAPLAN results or MySchool rankings).
So while the examples that we are finding in our research schools might not be technically ‘good’ analyses of data, they are clearly ‘good enough’ for the schools’ purposes. Predictions are made, targets are set, and potentially awkward conversations can be started between staff, students and parents over a student needing to work harder, or reaching the limits of their potential.
Indeed, the example of the ‘final year prediction’ exercise highlights that the school’s (or at least the data teacher’s) underpinning concerns do not lie with statistical accuracy or precision per se. Instead, the main purpose of the exercise is to produce analyses that will exhort and encourage (rather than discourage) students at a vulnerable point in their school careers. This is a fine balancing act – reflected in the ‘massaging’ of analyses in order to have this desired impact on each individual student’s performance over the final few months of their schooling.
***
Interestingly, in terms of the ‘digital’ focus of our research, this functional use of what might appear to be technically dysfunctional data is aided and abetted by software tools such as the Excel spreadsheet and Google Forms. One might like to imagine that the schools’ data systems and the software tools used to analyse this data are all configured in ways that force the analyst to conform to statistical rules and procedures. Yet, rather than acting as a standardised corrective to any attempted ‘sloppy’ use of data, statistical software packages and spreadsheets are notorious for allowing all sorts of spurious and statistically inappropriate ‘calculations’ to be run regardless of their robustness – adhering to the old computer science maxim of ‘garbage in, garbage out’.
Indeed, all of the software-based analytical procedures described above are well able to incorporate teachers’ ‘guess-timates’ and creative quantifications into scores that are ostensibly accurate to one decimal place. If anything, the implied precision of the computer-generated number in a CSV file full of 250 student records quickly obscures the contingent and improvised nature of the source data and its subsequent ‘massaging’. It is therefore fully understandable that these calculations quickly take on an exalted status within day-to-day school life – what some computer scientists have begun to refer to as ‘garbage in, gospel out’. In this sense, the datafied school is perhaps characterised best as a mess – rather than a mass – of data.
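By way of a final illustration of this ‘gospel out’ effect, note how readily a spreadsheet-style calculation folds genuine results, imputed values and ballpark guesses into a single figure reported to one decimal place (all values invented):

```python
# A spreadsheet (or pandas) will happily average a real test result, an imputed
# year-group mean and a couple of teachers' 'guess-timates' into one tidy figure
scores = [72.0, 72.3, 85.0, 60.0]
composite = sum(scores) / len(scores)
print(f"{composite:.1f}")   # 72.3 – reported as if precise: 'garbage in, gospel out'
```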
REFERENCE
Bittner, E. & Garfinkel, H. (1967). ‘Good’ organizational reasons for ‘bad’ clinical records. In Studies in Ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.