Although digital data is an inherently interdisciplinary topic, discussions of it remain beset by disciplinary misapprehensions and suspicions. On one hand, it is reasonable to contend that many social scientists writing about data-driven issues lack an adequate understanding (let alone direct experience) of the technical work and underpinning computational theory involved. There is certainly room for academics in the humanities, arts and social sciences to take time to learn how to run a simple linear regression, and to appreciate the differences between unsupervised and supervised learning.
At the same time, there is continued frustration over the socially unaware and politically uninterested disposition that seems to pervade the computational and data sciences. It is argued, for example, that data scientists show little regard for the complex social contexts within which their work is implemented. Social commentators are infuriated by attitudes within the data sciences that technology is neutral and data is objective, as well as by the all-absolving claim of ‘I am just an engineer’.
Thus, while the humanities, arts and social sciences certainly need to up their game, there is mounting pressure for data science to become much more politically oriented. This contention is worked through in admirable detail by the Harvard data scientist Ben Green in his call to approach ‘data science as political action’. In particular, Green develops two aspects of this thesis: firstly, why data scientists should recognize themselves as political actors, and secondly, how data scientists might ground their practice in politics. In so doing, he raises the following points.
#1. Data scientists are not apolitical/ data is not neutral
Green starts by reminding us that attempting to present oneself (or one’s actions) as apolitical is itself a political stance. More specifically, attempting to claim neutrality is a ‘fundamentally conservative’ position that signals implicit support for maintaining the status quo and, therefore, the interests of dominant social groups and hegemonic political values.
Accordingly, Green has little time for data scientists who claim neutrality or who claim to be somehow operating ‘outside of politics’. No knowledge or action is purely objective, and no data science can be carried out in the expectation of simply discovering ‘knowledge for knowledge’s sake’. Here Green draws on Donna Haraway to remind us that no branch of science is able to lay claim to providing a completely detached, objective “conquering gaze from nowhere”. Knowledge cannot be value-free; instead, any knowledge is aligned with the social contexts that generate it. As such, data science is not a matter of developing neutral technologies that are capable of being used for good or bad ends.
To reiterate this point, Green draws on Langdon Winner (1980) to argue that every artifact arising from data science ‘has politics’. All data systems, processes and procedures rest on design decisions that have determinative impacts on society. This is not to blame data scientists solely for the consequences of their design and development work. Yet it is important for the field to acknowledge some degree of responsibility, especially for how data scientists choose to engage with the aspects of society that their products affect.
#2. The trap of turning to data ‘ethics’
Of course, we have recently seen well-publicized efforts to imbue data science with an awareness of ethics, and to foster professional understandings of fairness, accountability and transparency. Yet Green remains skeptical of these recent shifts in emphasis. In this sense, Green follows on from the burgeoning criticism of high-profile episodes of ‘ethics-washing’ amongst Big Tech actors, in which ethics frameworks and ethics boards are established to little discernible effect (other than as an attempt to avoid regulation). While attuning data science toward issues of ethics is a welcome ‘first step’, it remains an insufficient response by itself.
For example, Green argues that recent calls for computational and technical professions to adopt a form of Hippocratic oath overlook the fact that professional codes rarely (if ever) result in social justice. These codes seldom contain clear normative directions about what data scientists should be doing (i.e. beyond vague allusions to ‘being aware’ of the social, cultural and political impact of their work). Moreover, these codes are rarely reinforced by mechanisms to ensure that programmers and engineers follow the stated principles or are held accountable for any violation. It can also be argued that the idea of easily achievable ‘ethical’ action simply propagates the false dichotomy that technology and society are somehow distinct from each other rather than inherently entwined.
As such, there remain good grounds to be doubtful of calls for ethics training to be established within data science education. Indeed, there is growing pushback from within the data science community against the tokenism of ethics training. As Ellen Broad argues, the idea of training a new generation of data scientists who are highly technically skilled and thoroughly drilled in issues of fairness, accountability and transparency replicates the well-worn trope of the ‘unicorn data scientist’. These are characteristics that no one individual can be expected to possess as a matter of course. Instead, these qualities will usually result only from a mix of people working together in collaborative teams.
#3. The trap of turning to ‘data science for social good’
Alongside data ethics, Green also problematizes the seemingly progressive position of pursuing data science for social good. This is the logic that, while never capable of providing perfect solutions, data science can be used to improve current circumstances. On one hand, Green commends data scientists for being willing to think in more nuanced ways about social issues and to engage with them. After all, an ambition to ‘do good’ brings a human focus to what might otherwise be largely technical concerns. Nevertheless, Green laments how these efforts are ultimately hampered by their reliance on vague and unarticulated political assumptions about what might constitute ‘social good’ (let alone the question of whether an unproblematic ‘good’ might be achievable at all).
Green argues that this form of data science tends to fall back on a non-politicized “know it when you see it” approach to deciding what constitutes social good. This leads quickly to crude equivalencies such as ‘poverty=bad’ or ‘staying enrolled on a university course=good’. Couching one’s actions in such broad-brush presumptions is a convenient way of glossing over the fact that deciding what constitutes ‘good’ involves normative judgement, which ideally should be driven by an underpinning political philosophy.
This lack of grounding principles means that the ‘social goods’ being pursued by data scientists can cover a wide (and sometimes conflicting) range of political characteristics. This can result in dangerous oversimplifications of issues that are actually politically complex and may lack clear consensus over what is desirable. As such, data scientists run the risk of blithely “wading into hotly contested political territory” and acting in a contestable (perhaps regressive) manner. As Green concludes:
“By framing their notions of ‘good’ in such vague and undefined terms, data scientists get to have their cake and eat it too: they can receive praise and publications based on broad claims about solving social challenges while avoiding any actual engagement with social and political impacts”
#4. Meaningful social change can only result from direct engagement with the politics of data
Green therefore reasons that, at best, talk of ‘data ethics’ or data science for ‘social good’ can only serve as the opening point of complex conversations about what might be the most desirable applications of data science in any social context. Crucially, these conversations need to be framed by explicit sets of values, and ready to embrace the politics of negotiating between competing perspectives, goals, and agendas. As such, no application of data science is so clearly ‘right’ or ‘wrong’ that it does not merit scrutiny. Instead, Green pushes for a cultural shift throughout the field to encourage a collective understanding amongst data scientists that they are co-engaged in political action that has varying impacts on different groups of people over time.
Pursuing data science along these lines clearly requires additional time and effort. Indeed, if Green’s arguments are taken to their logical conclusion, any decision regarding how to apply data science to a social setting can only be taken after a considerable amount of deliberation, debate, dialogue and consensus building. These deliberations need to be especially mindful of the complexities of the social contexts in which any data system, tool or application is to be implemented. This is not to say that data scientists need to compromise their technical interests, expert knowledge, or passion for problem-solving and innovation. Yet, as Green reasons, these computational skills and passions need to be accompanied by a concurrent acknowledgement that:
“… data science is a form of political action. Data scientists must recognize themselves as political actors engaged in normative constructions of society and, as befits political work, evaluate their efforts according to the material downstream impacts on people’s lives”
#5. So what might a political data science look like?
So, what might a politically engaged data science look like, and how might its practitioners think and act differently? Of course, answering this question first requires clarity on precisely what political agendas are to be pursued. Taking his own political agenda as a starting point, Green extends the idea of ‘social good’ by considering how the field might evolve toward a more deliberative and rigorous grounding in a politics of ‘social justice’. Along these lines, Green outlines four phases that can guide individual change and institutional reform across the data sciences. In brief, this involves:
- becoming interested in directly addressing social issues;
- recognizing the politics underlying these issues;
- redirecting existing methods toward new applications;
- developing new practices and methods that orient data science around a mission of social justice.
These suggestions raise a number of interesting new directions that data science might wish to pursue. First, there is a need for data scientists to pursue clearly articulated visions of social benefit. This might require developing better understandings of the social contexts in which data science work will be implemented. For example, educational data scientists might well benefit from secondments to teach in secondary schools, administer humanities classes in a university, or work in the Global South.
Green’s suggestions also raise the prospect of consulting and/or conducting social research on the issues that data science is attempting to address. Here, Green recommends engaging with academic literature rooted in the science and technology studies (STS) tradition. He also suggests conducting studies that adopt ‘critical design’, ‘anti-oppressive design’ and other participatory approaches to perceiving data science problems and then developing data-led ways to address them. To these ends, Green invokes the South African disability rights mantra of “Nothing About Us Without Us”. In other words, participatory approaches to data science cannot merely rely on tokenistic ‘end user’ consultation, but must instead genuinely commit to placing the voices of those directly affected by the data science at the heart of the work.
As a data scientist himself, Green is certainly not presenting an anti-data science rant. If anything, Green is simply attempting to imbue his chosen field with a heightened sense of perspective. It is important to recognize that data science does not sit outside of society; nor does it have the power (and/or responsibility) to take on the task of completely changing and/or ‘saving’ the world. Yet, by the same token, neither is data science wholly culpable for the detrimental impacts of its work. Data scientists are simply part of the same social milieu as non-data scientists.
In this sense, Green’s arguments culminate in a call-to-arms along similar lines to a few other recent commentaries in this area. In short, are data scientists seeking to work with the system or against the system? Here Green looks back to André Gorz’s distinction between “reformist reforms” (actions that limit their objectives to what is rationally achievable within a given system) and “non-reformist reforms” (actions that are driven by an interest in what should be made possible in terms of human needs and demands). Put simply, reformist reforms start from existing systems and strive to improve them, while non-reformist reforms start from a set of desired social conditions and seek ways to attain them (regardless of system constraints).
Green reasons that, to date, data science has almost always been focused on ‘reformist reforms’. After all, this is a field founded upon a standard logic of accuracy, efficiency and improving the performance of systems rather than substantively altering them. As such, conventional data science is an inevitably conservative pursuit. Nevertheless, Green raises the prospect that we perhaps need to set about developing more revisionist forms of data science (see also previous posts on Os Keyes and Roderic Crooks). For me, this is perhaps the most interesting conclusion for education data scientists to begin seriously considering. How might a substantially different form of data-based education be established that undermines, usurps and utterly blows away the current conditions of the datafied classroom? This is not simply a case of appropriating data science ‘for good’, but a far more radical proposition of harnessing data science for revolt!