Hit 4.1 and you’re flagging ‘Purple’

[note: this is a ‘composite account’ of similar practices encountered across three different educational institutions. By combining the stories, this mode of writing allows us to highlight and explore salient data practices without identifying a specific school. See Markham (2012) for a discussion of this writing approach]

 

PART ONE

At the end of every school term, teachers at Bayside High are required to gather student feedback on how their courses have gone. This involves getting each class to complete a brief survey on Google Forms, comprising six standard questions that need to be rated from 1 (not very good) to 5 (very good). Sometimes teachers choose the option to also include an open-ended textbox at the end of these six closed questions, which allows students to type in their own comments. That said, the fact that every response is traceable back to the relevant student’s personal ID means that very few comments are usually proffered.

These ‘Triple S’ scores (i.e. ‘Student Satisfaction Survey’) are an ordinary instance of how data is produced using digital technology within Bayside. This is also precisely the kind of everyday data-driven process that we are interested in exploring and addressing in our research.

Indeed, the brief description just provided touches on a number of interesting issues. Firstly, people’s recollections about the reasons for introducing this feedback process are usually vague. Nevertheless, long-standing members of the school’s management team recall being motivated by messages coming from the state government Department of Education (aka ‘The Department’) at the beginning of the 2010s exhorting schools to make visible use of ‘evidence’. The idea of gathering student feedback as a basis for gauging the quality of courses was raised at a few regional leadership meetings, and quickly picked up by a number of schools in the area.

Secondly, it is interesting to reflect on the school’s choice of online platform for this exercise. The use of Google Forms stems from the ‘locked-in’ nature of IT provision at Bayside, rather than from any judgement that it is necessarily the best tool for the job. Bayside is a self-styled ‘Google school’ – making use of Chromebooks and the ‘G Suite for Education’ package of applications (of which Google Forms is part). Staff are well versed in using Google Forms, and students are often tasked with setting up short surveys as part of their assignments. As such, Google Forms is a familiar and relatively straightforward technology for everyone in the school to engage with.

The design of the ‘Triple S’ template that all teachers are expected to use is also interesting. The small group of lead teachers initially tasked with planning the survey were mindful that Bayside students quickly get bored filling in forms. Nevertheless, the ‘clear steer’ from school management was for a process capable of producing ‘meaningful’ data. In an attempt to balance these concerns, six different aspects of student ‘satisfaction’ were chosen to allow the survey to cover a decent amount of ground. The six areas of questioning were taken from a set of standards originally developed by AITSL (the Australian teaching standards agency). This resulted in items that probe whether lessons had ‘included a range of teaching strategies’ or featured ‘challenging learning goals’. These were introduced to the ‘Triple S’ planning meetings as covering important aspects of teaching as well as perhaps lending the survey a degree of credibility. Six years later, it was still reckoned that this appropriation of AITSL criteria was a key aspect of ensuring the ‘validity’ of the survey results in the minds of many teachers.

While the wording of the six items remained fairly uncontroversial, the choice of categories attached to the 1-to-5 rating scale was continuously tinkered with over the first couple of years that Triple S was being put in place. Initially, the Assistant Principal charged with getting the system up and running opted for a four-point scale spanning ‘Very satisfactory’, ‘Satisfactory’, ‘OK’ and ‘Not satisfactory’. However, some of the initial test-runs of the process found younger students unsure of what ‘satisfactory’ meant. Similarly, a few vocal teachers questioned the ambiguity of the term – surely their students could be learning well while not being especially satisfied. After this, the Assistant Principal settled on the more nebulous label of ‘Good’, while also switching to a five-point scale to force more variation in students’ responses; for the first couple of years it had proved very difficult to persuade students to click anything other than ‘OK’.

These preliminary efforts saw the ‘Triple S’ process quickly become embedded as a regular (and largely unremarked upon) part of Bayside’s academic calendar. Around Week 8 of every ten-week term, teachers would endeavour to chivvy students into completing the online forms. With students taking up to ten different classes, this saw many students making around 60 separate judgements each term about the quality of their classes. The Google Form for each course was configured so that no question could be skipped. Moreover, any student not completing a form was quickly identifiable and chased up (first with an automated email reminder, and then a face-to-face ‘please explain’). All told, the vast majority of Bayside students were dutifully clicking through 240 judgements a year … a tangible example of ‘student voice’ and ‘evidence-led practice’.

 

PART TWO

In one sense, this example of data gathering is a completely normal aspect of contemporary schooling. Indeed, the online production of student satisfaction data is a commonplace procedure that takes place in one form or another in many schools around the world. Of particular interest to us, however, is how this routine data-generation practice was playing out within the everyday life of Bayside High. In this sense, the Triple S process is a fascinating example of how data-driven microworlds can become established within schools such as Bayside – i.e. specific aspects of the school culture that are generated by data while also involving the generation of data. It is illuminating, therefore, to reflect on how Triple S was influencing the conditions and character of Bayside.

For example, it was notable how the Triple S ‘results’ infused the language and shared understandings within the school. It was common to hear particular classes or subject areas being talked about by managers and administrators as ‘Purple’ (very good) or as ‘Red’ (dangerously bad). These colour-related connotations were dropped into day-to-day conversations across the school, although few people were able to specify exactly what they referred to. After spending some time talking with school staff, it transpired that these colours related to the commonplace in-house method that Bayside’s administration deployed when working with spreadsheet data in Excel. This institutional habit had developed as a way of delineating the key school data that management and teachers had to understand, using Excel’s ‘conditional formatting’ option. This feature allows cells in a spreadsheet to be colour-coded according to their value. While the cut-off points varied according to the specific data-set, Bayside’s colour-coding always ran along the lines of ‘Red’ (requiring immediate action), ‘Yellow’ (requiring improvement), ‘Green’ (good) and ‘Purple’ (excellent). These codes were now applied routinely to a variety of key indicators across the school, such as student attendance, student behaviour … and the ‘Triple S’ rankings.

As such, it is understandable that the Triple S scores were subject to this standard ‘Bayside way’ of processing data. As far as most teaching staff got to see, then, Triple S analysis took the form of the six Likert items being averaged for each class, with the resulting mean scores categorised into the familiar vernacular of how data was presented and consumed within the school. Specifically, it was deemed that an average ‘Triple S’ score of 4.1 or above (on a scale of 1 to 5) merited the top ‘Purple’ ranking. This threshold had initially been calculated by the Assistant Principal as resulting in roughly 10 percent of classes being designated ‘Purple’. However, the intervening years had seen notable rating-creep, meaning that around one-third of courses were now getting average scores that merited ‘Purple’, with the majority of the rest getting ‘Green’. The AP reckoned that increasing the threshold to 4.5 might get things back to a more balanced level – although this would certainly spark consternation amongst teachers who found their scores suddenly down-graded. For the time being, then, adjusting the thresholds was not considered to be worth the hassle.
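
To make the arithmetic behind these rankings concrete, the sketch below shows one way this kind of banding might be reproduced in code. It is a minimal illustration only: the 4.1 ‘Purple’ cut-off comes from the account above, while the other thresholds, the column names and the sample responses are all assumptions for the sake of the example.

```python
# A minimal sketch (not Bayside's actual system) of how per-class 'Triple S'
# averages might be computed and colour-banded. Only the 4.1 'Purple' cut-off
# is taken from the account above; the other thresholds, column names and
# sample data are invented for illustration.
import pandas as pd

# Hypothetical export of one term's responses: one row per student per course,
# with the six Likert items scored 1-5.
responses = pd.DataFrame({
    "course": ["ENG7A", "ENG7A", "MAT9B", "MAT9B"],
    "q1": [5, 4, 3, 2], "q2": [4, 4, 3, 3], "q3": [5, 5, 2, 3],
    "q4": [4, 3, 3, 2], "q5": [5, 4, 4, 3], "q6": [4, 4, 3, 3],
})
items = ["q1", "q2", "q3", "q4", "q5", "q6"]

def colour_band(mean_score: float) -> str:
    """Map an average 1-5 score to a Bayside-style colour band.
    4.1 is from the account; the 3.5 and 2.5 cut-offs are invented."""
    if mean_score >= 4.1:
        return "Purple"
    if mean_score >= 3.5:
        return "Green"
    if mean_score >= 2.5:
        return "Yellow"
    return "Red"

# Average the six items for each course, then attach the colour label.
course_means = responses.groupby("course")[items].mean().mean(axis=1)
summary = pd.DataFrame({"mean": course_means.round(2),
                        "band": course_means.apply(colour_band)})
print(summary)
```

In practice, of course, the same banding was applied inside a spreadsheet via conditional formatting rather than in code; the sketch is only meant to show how few moving parts sit behind a class ‘flagging Purple’.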

While many aspects of Bayside were being talked about in these colourful terms, very few non-administrative staff were aware of the colours’ origins in the world of Microsoft spreadsheets. As noted above, the colour-coding could be traced back to the three-colour ‘conditional formatting’ option in Microsoft Excel, which automatically colours all cells either Green, Yellow or Red. This default option had been used for years as part of Bayside’s administration procedures, and had been carried over when the school officially switched to Google Sheets (the outputs of which were usually converted into .CSV files and imported back into Excel by school administrators). The fourth ‘Purple’ colour had been introduced four years ago, after concerns that too many courses were getting ‘Green’ flags. As Excel does not have a default four-colour scheme, ‘Purple’ was suggested by another Assistant Principal because it seemed distinct from the other colours, and also might be seen as having what she jokingly described as ‘high status’ associations with royalty and a general ‘sign of class’. She also mentioned (again half-jokingly) that ‘Gold’ was felt to be a little too divisive.

These colour connotations were certainly prevalent across various areas of school life. For example, a few teachers admitted that their initial requests to be considered for promotion largely hinged on the prevalence of ‘Purple’ scores on their profile (or, conversely, a tell-tale scattering of Red and Yellow). Teachers would self-regulate their expectations based on their own readings of these colours – for example, reasoning that they were ‘not hitting enough purples’ to be deemed ready for promotion. Conversely, talk of students ‘flagging Red’ would often be heard in staffroom conversations as short-hand for a struggling and/or troublesome student. In many of these conversations there were distinct slippages in what was being indicated. Despite the very specific nature of the six Likert-scale items that students were being asked to click on each Google Form, the resulting ‘Triple S’ colours were generally conflated into overall indicators of the quality of the provision (‘how good the course is’), as well as indicators of teachers’ own teaching quality (‘how good my teaching is’).

Thus, this seemingly low-key process of students completing Google surveys belied the relatively ‘high stakes’ nature of how the data was subsequently used. That said, there were a few signs that teachers and students acknowledged the significance of Triple S around the time of data collection. Indeed, there was a distinct performativity to how the data was generated. Toward the end of each term, teachers and their classes would enter into back-and-forth negotiations over getting the ‘Triple S’ done. Most teachers would temporarily halt their classes during Week 8, and devote ten minutes in which students had to log on and ‘complete the sheet’. After this, teachers would chase up individual students in subsequent classes, or send begging emails. Most teachers were fairly low-key in directing their students’ responses – although some would remind students of the ‘importance’ of these scores to the school, and stress the need ‘to not just pick the middle’. During these ten-minute truces, teachers might make semi-ironic reminders about how much fun the classes had been, or how great the students had been. A few students might make jokey comments about giving low scores. Generally, the process was entered into in good humour, or at least a slightly stilted indifference.

It is also interesting to consider the different ways that staff and students related to the ‘Triple S’ scores, and how these numbers were brought to life by different people in different contexts. For example, beyond their mild ribbing of teachers during the ‘complete the sheet’ sessions, a few diffident students would talk of ‘trashing’ courses (or teachers) that they did not like by giving excessively low scores. Some teachers would get quite anxious about how their scores might pan out each term – this was especially notable amongst younger teachers on probation, and a few older female teachers who would pay particular attention to any negative comments. Conversely, other teachers (often – but not exclusively – male) would defiantly talk of how the Triple S scores were ‘meaningless’, with a few priding themselves on getting lower scores as a badge of honour for teaching particularly ‘difficult’ or ‘challenging’ topics.

Elsewhere, however, school management teams and administrators in non-teaching positions would generally talk about the Triple S data in more aggregated terms. One Assistant Principal referred to getting a ‘heat map’ of all the school’s courses, and then being able to ‘drill down’ into any ‘hot spots’ (i.e. clusters of courses that were getting Red or Purple scores). This exercise was described as giving school managers and administrators a ‘birds-eye view’ of teaching throughout the school. Yet the spreadsheet presentation of these average scores was always arranged with rows aggregated by ‘Year Group’ and columns arranged by ‘Subject Area’. This default arrangement therefore narrowed the scope of any such meta-analysis. As such, these heat maps only tended to draw the AP’s attention toward ‘problems’ relating to specific subject teaching teams and/or particular year groups. Any other possible factors that might be impacting students’ ‘happiness’ with each course (e.g. when lessons were timetabled, or whether classes were located in newly refurbished classrooms or temporary portable classrooms) were not foregrounded in the visualisations.
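
As a rough illustration of how this default arrangement narrows what can be seen, the sketch below pivots some hypothetical per-course averages into the Year Group by Subject Area grid described above. The column names and values are invented; the point is simply that anything not chosen as a row or column field (such as the timetable slot) drops out of the summary altogether.

```python
# A rough sketch of the 'birds-eye view' described above: hypothetical per-course
# means pivoted into a Year Group x Subject Area grid. All names and values are
# invented; note how any field not chosen as a row or column (e.g. the timetable
# slot) disappears from the summary.
import pandas as pd

course_means = pd.DataFrame({
    "year_group":     [7, 7, 8, 8],
    "subject_area":   ["English", "Maths", "English", "Maths"],
    "timetable_slot": ["Period 1", "Period 5", "Period 2", "Period 6"],
    "triple_s_mean":  [4.3, 3.1, 4.0, 2.8],
})

heat_map = course_means.pivot_table(
    index="year_group", columns="subject_area",
    values="triple_s_mean", aggfunc="mean",
)
print(heat_map)  # rows: Year Group, columns: Subject Area; timetable_slot is gone
```

Exactly which fields end up as rows and columns is, of course, a design choice; the sketch simply makes visible how the default arrangement constrains what the ‘heat map’ can show.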

It is also interesting to consider the computational work behind the production of these indicators. Of course, the mathematical basis for calculating mean scores from ordinal ratings is not especially robust. Similarly, questions can be asked regarding the ‘construct validity’ of the six Triple S items as indicators of satisfaction, quality or effectiveness. Yet, while a few teachers and students would occasionally challenge the provenance of the whole exercise, more often than not people were happy to go along with what Lippert and Verran (2018) characterise as the alluring ‘smell of numbers’. In the absence of any other immediate evidence, most staff were content to accept the collective understanding of what 4.1 and ‘flagging Purple’ meant.
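
A toy example illustrates the point about ordinal ratings: two hypothetical response patterns can produce exactly the same mean (and therefore the same colour flag) despite describing very different classes. The numbers below are invented purely for illustration.

```python
# Two invented classes with identical means but very different response patterns;
# mean-based colour banding cannot tell them apart.
uniform   = [3, 3, 3, 3, 3, 3]   # everyone clicks the middle option
polarised = [1, 1, 1, 5, 5, 5]   # the class is evenly split between extremes
print(sum(uniform) / len(uniform))      # 3.0
print(sum(polarised) / len(polarised))  # 3.0
```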

Nevertheless, there were notable instances where this commensurability was being tested and worked around. For example, school managers described making allowances for ‘tricky’ subject areas that were understood as inevitably less popular – such as the received wisdom that subjects like Maths would always score a grade or two lower. As such, a ‘Yellow’ or a ‘Green’ for these subjects was considered a success, whereas for most other subjects only ‘Green’ or ‘Purple’ was seen as acceptable. A few teachers suggested cynically that this extra leeway would ‘keep Maths onside’ – i.e. placate the most numerate teachers, who might otherwise challenge the underlying validity of the whole exercise. Either way, the consequences of the production, calculation and subsequent analysis of the Triple S data were certainly not set in stone.

Notwithstanding this maintenance work, it was notable how the ‘Triple S’ numbers (or, more frequently, the attributed colour) had become an accepted shorthand within Bayside for teaching ‘quality’ and even teachers’ own classroom competence. In terms of how Bayside externally presented itself, Triple S was often used as a tangible totem of the school operating along consultative lines, paying attention to ‘student voice’ and making ‘evidence-led’ changes to its teaching provision.

In these ways, then, the ‘Triple S’ scores provide a good example of what Ingmar Lippert (2018) describes as the ‘world-making’ of individual data-points and indicators. This data was certainly part and parcel of constructing an institutional identity of Bayside as an evidence-driven school – lending a veneer of precision and objectivity to otherwise woolly concepts (such as student happiness, or teaching quality), and to what could otherwise appear to be subjective decisions (such as whether a teacher is promoted, or whether a class continues to be taught). The Triple S data smoothed the way for tricky instances of institutional decision-making, and acted as a lodestar for individual teachers to judge their own professional performance (as well as to infer the performance of others). All told, these seemingly simple sets of scores from 1 to 5 were established within the everyday life of Bayside as a significant source of comfort for some, and a significant source of anxiety for others.

 

REFERENCES

Lippert, I. (2018). On Not Muddling Lunches and Flights: Narrating a Number, Qualculation, and Ontologising Troubles. Science & Technology Studies, 31(4): 52-74.

Lippert, I. and Verran, H. (2018). After Numbers? Innovations in Science and Technology Studies’ Analytics of Numbers and Numbering. Science & Technology Studies, 31(4): 2-12.

Markham, A. (2012). Fabrication as ethical practice: qualitative inquiry in ambiguous Internet contexts. Information, Communication & Society, 15(3): 334-353.