‘Garbage In, Garbage Out’ is a long-standing adage amongst computer programmers. In a world now increasingly shaped by the generation and processing of digital data, the underlying intent of this phrase remains as relevant as ever. For example, Safiya Noble’s recent book ‘Algorithms of Oppression’ develops a similar argument of ‘Bias In, Bias Out’. Thus, while remaining mindful of the need not to ‘black box’ the technologies that are doing the data processing, our research into the datafication of schools needs to think carefully about issues such as ‘data provenance’ and ‘data quality’. In particular, how do claims about data-based education stand up in light of the origins of its source material?
The data consequences of ‘dirty policing’
As always, it is helpful to consider how these issues are being discussed in other areas of society. Of particular interest is a recent paper by Richardson, Schultz and Crawford examining the use of data in policing. In short, these authors raise concerns that much of the data used by law enforcement agencies is “produced within the context of flawed, racially fraught and sometimes unlawful practices” (p.1) – i.e. what they term ‘dirty policing’. As the title of their paper suggests, the data that arises from such acts of dirty policing leads inevitably to ‘bad’ analyses and predictions.
This argument is worth unpacking in a little more detail. First, the authors outline a spectrum of dirty policing practices ranging from everyday sloppy oversights and omissions through to occasional outright illegal action. On one hand, then, are everyday instances of under-reporting particular incidents while over-reporting others. Similarly, police personnel might also be involved in ‘fudging’ records, and other minor acts of ‘massaging the figures’. Such practices might seem harmless enough to those individuals doing the fudging and tweaking, but Richardson and colleagues show how, when these practices become an acceptable part of work-culture across a police precinct, they constitute systemic data manipulation. Thus, even the most minor acts of distortion and exaggeration soon mount up to widespread contamination of data. Then there are more overtly illicit practices – such as fabricating reports, planting evidence, false arrests and convictions. As well as leading to clear miscarriages of justice, such actions further skew the data that results.
Richardson and colleagues certainly cast the provenance of data in law enforcement in a very poor light. As the authors put it, the spectrum of dirty policing practices combines to “shape the environment and the methodology by which data is created, which leads to inaccuracies, skews, and forms of systemic bias embedded in the data” (p.1). Further down the line, then, these compromised data can distort calculations and lead to confirmatory feedback loops that misdirect police attention and efforts. Richardson and colleagues refer to this as ‘dirty data’. This draws upon the common data science use of the term to refer to data that is missing, incomplete, entered incorrectly or in a non-standard format. Here, then, Richardson and colleagues seek to add the following variation of ‘dirty data’, i.e.:
“data that is derived from or influenced by corrupt, biased and unlawful practices, including data that has been intentionally manipulated or ‘juked’, as well as data that is distorted by individual and societal biases” (p.4).
In this sense, any talk of ‘data-driven policing’ or ‘predictive policing’ is substantially compromised by “the legacy of unlawful or biased policing practices that they are built on” (p.1). Law enforcement systems relying on data generated from these flawed policing practices cannot make any convincing claims for greater objectivity, transparency, or accountability. Instead, this data is much more likely to perpetuate the biases, omissions and injustices that underpinned the initial flawed practices.
Equivalent examples of ‘dirty teaching’
So how might these criticisms relate to schools and schooling? Equivalent descriptions of ‘dirty teaching’ might appear harsh, but schools are certainly party to a number of similar strategic, tactical and/or performative practices. Indeed, the highly audited and accountable nature of contemporary schooling means that school staff (and the organisations they work for) are under increasing pressure not to be seen to fail. As such, schooling might well be seen as another institutional culture where data-related practices might not be as ‘pure’ as one might like to assume.
For example, news reports over the past few years have regularly highlighted cases of schools feeling the need to ‘cook the books’ in order to appear to be performing well. Some reported practices involve the strategic removal of low-performing students from examination classes (so-called ‘off-rolling’), or a purge of school exclusions just prior to periods of ‘high stakes’ assessment. Elsewhere, schools have also been caught engaging in various forms of malpractice relating to the conduct of tests and examinations.
In addition are more mundane acts of data transgression by individual classroom teachers who are perhaps just trying to be helpful. For example, a teacher looking to see the best in a student might reasonably justify her decision to round up a grade, tip a ‘Fail’ mark over into a ‘Pass’, or turn a blind eye to an absence. Teaching is a profession where people are generally minded to encourage and exhort – even if this requires being a little lenient in one’s gradings, decisions and reports.
Students are also not immune to such practices. On one hand, most students will be familiar with the conditions under which some teachers coerce ‘student satisfaction’ feedback. On the other hand, students themselves might also engage in episodes of what might be termed ‘dirty learning’ in order to look better on paper. At one extreme are instances of students breaking into school online databases to alter test scores and grades. Far more commonplace practices might include cynical engagement with online learning systems – going into a ‘click frenzy’ to fool the system analytics into recording high levels of engagement. In addition, students have always cheated at tests, copied work, and engaged in other well-worn forms of academic malpractice. All told, data is being produced about teaching and learning for varying reasons and with varying levels of validity and rigour.
Data and teaching – the need for suspicion?
This is not to lay the blame on individual teachers, students or schools, all of whom are placed under considerable pressure by the data that they produce. In many ways, all these practices are wholly defensible from the point of view of the individual student, teacher or school. It is important to remember that these sorts of action have existed for many decades before this current era of ‘Learning Analytics’ and ‘Educational Data-Mining’. Moreover, the idea of manipulating data is now part of our wider digital culture – for example, people have become well-versed in favourably manipulating their FitBit data or social media ratings. In one sense, then, school-related ‘dirty data’ practices are an inevitable part of the context within which educational data science exists.
However, these practices certainly have understandable cumulative consequences. Mirroring Richardson’s concerns over dirty policing, any use of large-scale datasets for modelling and predicting what goes on within schools will be significantly compromised by systematic gaps and biases in what has been reported. This is not just a technical irritation. Instead, this is likely to lead to the mis-allocation of resources, mis-direction of teacher attention, and general skewing of how teachers are nudged by their data-driven systems. For example, if the educational experiences and outcomes of particular (marginalised) students are being omitted from data collections, then it is likely that dedicated support and extra attention will be denied to those who need it most. If data is to be a useful element of any school community, then there are clear benefits from recording failures, instances where teachers are struggling, or where students are at risk.
Unfortunately, this is a problem with no easy solutions. As Richardson et al. remind us, despite what vendors of data systems might claim, there is no technical fix or statistical adjustment for such distortions. The only way to overcome the contamination of data in this way is a root-and-branch reform of organisational cultures so that they value and reward accurate and honest data practices. Otherwise, as they argue, “it is extremely difficult, if not impossible, for systems trained on this data to detect and separate ‘good’ data from ‘bad’ data” (p.5).
Of course, the ‘data culture’ of policing or teaching is not going to be reformed overnight. As such, Richardson’s paper highlights the need for heightened scrutiny and suspicion over claims that educational data are accurate, valid and precise. When presented with any argument for ‘data-driven school reform’ or ‘data-based insights’ into teaching and learning, it is important to remember that contexts such as classrooms and staffrooms are not hermetically sealed, clean laboratory conditions from which clean datasets can be extracted. Instead, these are messy social settings that are shaped by messy social dynamics. In this sense, any data that are produced within schools need to come with a considerable ‘health warning’ and be used only in a circumspect (if not suspicious) manner.
Given all of this, one of the initial key questions that we now need to explore in our research is straightforward enough:
Q: Just how much of the data being generated in schools is being generated in good faith … and can be used accordingly?
Notes on: Rashida Richardson, Jason Schultz & Kate Crawford (2019). Dirty data, bad predictions: how civil rights violations impact police data, predictive policing systems, and justice. New York University Law Review Online [forthcoming]