“We can’t predict how a child’s life will turn out – even with a ton of data”

Despite the promises of the ‘data gaze’ and ‘dataism’ that have accompanied the growth of digital data over the past decade or so, data science is always at a distinct disadvantage when trying to model any ‘real world’ issues involving social contexts. This is illustrated by a recent study from Princeton University, which challenged hundreds of statisticians, data scientists, and AI and machine learning researchers to predict six life outcomes for children, parents, and households. These outcomes included common educational data points such as the children’s grade point average at school and their self-reported perseverance (‘grit’) in school. Even though the competing computational systems were given nearly 13,000 data points on over 4,000 families stretching back over 15 years, the study concluded that “despite using a rich dataset and applying machine-learning methods optimized for prediction, the best predictions were not very accurate and were only slightly better than those from a simple benchmark model” (Salganik et al. 2020, p.1). In short, as news media reporting of the study concluded: “AI can’t predict how a child’s life will turn out even with a ton of data” (Hao 2020).

This is perhaps unsurprising to any social scientist. Attempting to statistically account for the contextual layers implicit in any life event or personal biography will always be compromised by the sheer breadth of the social world which these calculations attempt to capture. This highlights the inherent problems of representativeness, reductiveness, and explainability of data-driven interventions in education. For example, in terms of the ‘representativeness’ of educational data, even the most sophisticated computational processes are only as good as the ‘input’ data they are given. There are plenty of instances where educational data might be inaccurate, incomplete, poorly chosen, or simply a poor indicator of what it supposedly represents. These gaps and omissions are especially important when modelling what human teachers and students do. Even the most complex models of ‘teaching’ and ‘learning’ contain significant grey areas. In contrast, as Louise Amoore (2019, p.151) reasons, algorithms do not accommodate doubt, uncertainty, or the inability to calculate: “algorithm[s] must reduce the vast multiplicity of possible pathways to a single output. At the instant of the actualization of an output signal, the multiplicity of potentials is rendered as one, that moment of decision is placed beyond doubt”.



Amoore, L. (2019). Doubt and the algorithm: On the partial accounts of machine learning. Theory, Culture & Society, 36(6), 147-169.

Hao, K. (2020). AI can’t predict how a child’s life will turn out even with a ton of data. MIT Technology Review, April 2nd.

Salganik, M., Lundberg, I., Kindel, A., Ahearn, C., Al-Ghoneim, K., Almaatouq, A., Altschul, D. et al. (2020). Measuring the predictability of life outcomes with a scientific mass collaboration. Proceedings of the National Academy of Sciences. www.pnas.org/content/117/15/8398