M/F/X … why converting gender into data is more complex than often presumed

We have written previously about the exclusionary nature of school data, using the example of non-binary students who continue to be restricted to being categorised as either ‘M’ or ‘F’ in the majority of data sets and system interfaces.

In critical terms, such mismanagement can be seen as an egregious example of what Dean Spade terms ‘administrative violence’ on the part of educational institutions, software companies and other authorities. On a personal level, this is also a telling example of what Sasha Costanza-Chock labels a ‘dysaffordance’ – a term derived from the idea of gender dysphoria, describing situations where a system forces individuals to misidentify themselves in order to continue the interaction.

Yet even in simple data science terms, the continued use of M/F binaries is increasingly out of step with current best practice. In terms of basic statistical validity, growing numbers of data scientists now recognise that restricting gender to a small number of closed categories (usually ‘Male’ or ‘Female’) flies in the face of the widely accepted understanding that gender is fluid rather than confined to a small set of fixed categories.

Instead, data science conventions are changing to reflect the social fact that growing numbers of people now identify with a gender that is not exclusively male or female, or with a gender that differs from the sex they were assigned at birth or during infancy.

As such, many statisticians and official agencies now follow the requirement to include a ‘three-option’ indication of ‘gender’, which allows individuals to self-identify and self-describe their gender. At the very least, this results in a ‘ternary’ set of categories along the lines of M/F/X, where X stands for ‘Non-binary sex’, ‘Genderqueer’, ‘Gender-fluid’ or simply ‘Another term’.

Statisticians are also careful to distinguish between ‘sex-assigned-at-birth’ (taken to mean ‘biological sex’) and current gender status. This conceptualisation of biological sex now rightly recognises that some people are born with variations in genetic, hormonal or physical sex characteristics, prompting a category of ‘Intersex’ or ‘Differences of Sex Development’ (DSD).
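To make these distinctions concrete, here is a minimal sketch in Python of how a data record might keep them apart. Current gender is held as an M/F/X choice with room for a self-described term, and sex assigned at birth is held in a separate field that includes an Intersex/DSD category. All names (`Gender`, `SexAssignedAtBirth`, `PersonRecord`, `gender_label`) are hypothetical and chosen for illustration only. This is not any official agency's standard. Making the sex-at-birth field optional is likewise our own assumption, on the basis that not every system needs to ask.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Gender(Enum):
    """Current gender as a 'ternary' along the lines of M/F/X (hypothetical codes)."""
    MALE = "M"
    FEMALE = "F"
    X = "X"  # e.g. non-binary, genderqueer, gender-fluid, or another term


class SexAssignedAtBirth(Enum):
    """Recorded separately from gender; hypothetical category labels."""
    MALE = "Male"
    FEMALE = "Female"
    INTERSEX_DSD = "Intersex / Differences of Sex Development"


@dataclass
class PersonRecord:
    # Hypothetical field names, not drawn from any official standard.
    gender: Gender
    # Optional: an assumption that many systems need not collect this at all.
    sex_assigned_at_birth: Optional[SexAssignedAtBirth] = None
    # Free-text self-description, intended for use alongside the X option.
    gender_self_description: Optional[str] = None

    def gender_label(self) -> str:
        """Return the person's own term where one is given, else the category code."""
        if self.gender is Gender.X and self.gender_self_description:
            return self.gender_self_description
        return self.gender.value


# Usage: gender and sex assigned at birth are stored independently,
# so a record can hold a gender that differs from the sex assigned at birth.
alex = PersonRecord(gender=Gender.X, gender_self_description="Genderqueer")
sam = PersonRecord(gender=Gender.FEMALE, sex_assigned_at_birth=SexAssignedAtBirth.MALE)
print(alex.gender_label())  # Genderqueer
print(sam.gender_label())   # F
```

The key design choice is simply that the two concepts live in two fields rather than one, so neither has to stand in for the other.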

These distinctions are not simply fringe identity politics, but are now enshrined in how governments and official agencies around the world are making sense of gender and sex. The Australian Bureau of Statistics – for example – stresses the need to …

“… ensure that appropriate options are provided to individuals who may identify and be recognised within the community as a gender other than the sex they were assigned at birth or during infancy, or as a gender which is not exclusively male or female”.

It is telling, however, how infrequently these distinctions and subtleties are reflected in the everyday data generated through our uses of digital technologies: in the profiles we are asked to create, in the interface designs we click through, in the feedback forms we fill in.

The first issue here, from a ‘critical data literacies’ perspective, is why data about gender or sex are being generated during the course of our everyday technology use in the first place. Is this data that really needs to be generated? Why exactly do system developers and software designers need to know this information, and what is likely to be done with this data?

The second issue is the politics of how any querying of gender and sex is approached by technology developers and designers. Above all, the decision to reduce gender to a binary of Male/Female is a subjective programming choice on the part of tech developers: a political decision rather than an objective measure.

As Luciano Floridi observes, power in the digital age is increasingly derived from being able to shape the nature of what is asked, because “those who control the questions shape the answers”.