Elizabeth Spelke | Harvard

Artificial Intelligence and Human Minds: Perspectives From Young Children

Our species has a talent for technology: for imagining, crafting and using objects that extend our abilities and improve our lives. Although there is now much discussion, and some disquiet, over the prospect that future generations will live with autonomous machines that are more capable than we are, our talent for building such tools is not new. Even the earliest tools from human prehistory outperform us at their dedicated functions: arrowheads pierce animal skins better than fingernails, and bowls hold water better than cupped hands. From the beginning, moreover, our tools have functioned not only with us, like the stone flakes that early humans used to cut food, but autonomously and for our benefit, like the roofs that shelter us. Our ability to develop such objects testifies to our singular ability, as adults, to foresee how currently nonexistent objects, functions, and activities can transform our lives and experiences. It also testifies to the power of children to develop adaptively within the highly variable environments that human ingenuity creates, learning culture-specific skills that have come to include agriculture, reading, mathematics, and modern engineering (Dehaene, 2009, 2011).

Despite the ancient origins of our talent for technology, the emergence of machines that reason and learn prompts many questions, two of which pertain directly to the focus of my research, on the cognitive capacities of human infants and children. First, can the development of such machines shed light on the workings of young human minds and on the sources of our species’ cognitive talents: insights that could deepen our understanding of human nature and improve children’s education and welfare (Battro et al., 2011)? Second, will the presence of intelligent machines that interact with humans alter the ways in which children think and learn? If so, how can those machines best be structured to enhance children’s development? To approach these questions, I begin by reviewing some pertinent findings from research on early human cognitive development.

Cognition in infancy

From birth, human infants perceive, act on, and make sense of their surroundings, anticipating their future states. Research provides evidence that infants both perceive inanimate objects when they are visible and track such objects when they are hidden, extrapolating object motions and mechanical interactions (Baillargeon, 1998; Stahl & Feigenson, 2015). Infants also perceive and reason about people and animals, predicting their future actions from their past behavior together with their powers to perceive accessible aspects of the environment (Gergely & Csibra, 2003; Luo & Johnson, 2009). And from the beginning, infants focus on people’s social communications, using their speech, gaze, and coordinated actions to infer their engagement with the infant (Meltzoff & Moore, 1977; Kinzler et al., 2007) and with one another (Hamlin et al., 2007; Powell & Spelke, 2013).

Inanimate objects, agents, and social beings behave in fundamentally different ways: objects are governed only by the laws of physics, whereas agents plan their actions to achieve valued goal states while minimizing costs, and social beings engage with one another so as to share information and experiences. Research provides evidence that infants are sensitive to these differences (Spelke & Kinzler, 2007). They perceive and interpret the behavior of inanimate objects primarily by analyzing objects’ positions and motions, in accord with basic constraints that objects move as connected wholes on continuous paths and interact with one another on contact (Spelke et al., 1995). Infants perceive and reason about the object-directed actions of people and animals by analyzing aspects of their shapes and motions (Bertenthal & Pinto, 1994), on the assumption that agents perceive the world at a distance and act efficiently to transform it in pursuit of their goals (Gergely et al., 1995; Woodward, 1998; Liu & Spelke, 2017). Finally, infants perceive and interpret people’s social motives and relationships by analyzing their interactions with the infant and with one another. Recent research suggests that infants are especially sensitive to the asymmetrical relations that connect caregiving adults to their children (Johnson et al., 2007; Spokes & Spelke, 2017), dominant individuals to their subordinates (Thomsen et al., 2011), and socially responsive imitators to the targets of their imitation (Powell & Spelke, 2013, in review).
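The continuity constraint has a simple computational consequence that can be illustrated with a toy one-dimensional sketch. It is not a model from the cited work: it assumes that screens are intervals on a line, that an object can move unseen only while it is behind a screen, and that nothing is ever observed in the visible spaces between screens. The function names `merge_occluders` and `min_objects` and all coordinates are hypothetical.

```python
# Toy 1-D illustration of the continuity constraint (hypothetical names and setup):
# an object can travel unseen only while it is behind a screen.

def merge_occluders(occluders):
    """Merge overlapping or touching screens (lo, hi) into maximal hidden regions."""
    regions = []
    for lo, hi in sorted(occluders):
        if regions and lo <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], hi)
        else:
            regions.append([lo, hi])
    return regions


def min_objects(glimpses, occluders):
    """Smallest number of objects consistent with continuous, unseen motion.

    Assumes each glimpse is a brief appearance at a screen edge and that
    nothing was seen anywhere else. Travel between two separate hidden
    regions would have to cross visible space, so glimpses at different
    regions are attributed to different objects.
    """
    regions = merge_occluders(occluders)
    used = set()
    for x in glimpses:
        for i, (lo, hi) in enumerate(regions):
            if lo <= x <= hi:
                used.add(i)
                break
        else:
            raise ValueError(f"glimpse at {x} is not at any screen")
    return len(used)


# One wide screen: alternating glimpses at its two edges fit a single object.
print(min_objects([0, 10, 0, 10], [(0, 10)]))            # 1
# Two screens with a visible gap (4 to 6) that stays empty: two objects are needed.
print(min_objects([0, 10, 0, 10], [(0, 4), (6, 10)]))    # 2
```

The sketch counts only hidden regions, not timing or speed; a fuller model would also check whether the object could cover the hidden distance in the time available.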

These findings and others suggest that infants are endowed with core cognitive systems that form the foundation for the development of our commonsense reasoning about the physical, living, and social worlds. These systems likely are connected, because agents’ actions are constrained by physics and people’s social bonds are conveyed by their actions. Nevertheless, each core cognitive system functions in accord with a distinct set of principles and operates with a high degree of independence from the other systems, especially in infancy. For example, infants likely can view their pet cat as a social being (a member of their family, with distinctive relations to other family members), an agent (that chases after butterflies and chews on house plants), and an object (that is heavy to lift), but they do not readily construe the cat in these three different ways at once. Young infants also do not appear to recognize a central property of tools and other artifacts: that they are objects, designed to foster the instrumental goals of agents, for use within a community of social beings.

Toward the end of the first year, infants’ understanding of objects, agents, and social beings comes together: infants begin to conceive of objects as members of one or another kind – a body whose form affords dedicated functions for itself (if it is a person or animal) or for members of the infant’s social world (if it is inanimate; Xu & Carey, 1996). This conception emerges as infants engage with others and thereby learn one of the earliest emerging and universal features of human language: noun phrases whose head nouns refer to kinds of animals (“dog”), natural objects (“stone”, “tree”), or artifacts (“cup”). By nine months of age, infants expect each distinct noun to refer to a distinct kind of object with a characteristic form and function (Xu, 2007). Soon thereafter, infants begin to seek information about object kinds, asking of each thing that they encounter, “what is this?” and (if it is an artifact) “what is it for?” (Keil, 1989). There is wide consensus among psychologists that the capacity to view novel objects as individual members of novel artifact kinds is central to the child’s developing mastery of culture in general and technology in particular. Moreover, this capacity is widely thought to depend on infants’ predisposition to attend to their social partners, learning from their speech and actions (Tomasello, 2008; Csibra & Gergely, 2009). Because adults are apt to talk about things that matter to them, their language directs children to concepts that are socially useful. Because adults’ actions on objects, such as drinking from a cup or turning the pages of a book, both exhibit the objects’ functions and reveal aspects of their structure, those actions inform infants about the key properties of the things used in their culture.

The artifact concepts that infants master at the end of the first year therefore serve as a basis for the prodigious cultural learning that distinguishes our species from others, and that sets humans on a path that leads toward the world we now are considering, in which humans interact with autonomous machines whose intelligence and action capacities, in some domains, equal or exceed our own.

Reverse engineering infant minds

Although research on human cognitive development has shed light both on what young infants know and on the fundamental changes in their knowledge that occur when one-year-old children begin to master artifacts, the psychological and brain sciences have not yet achieved a deep understanding of the mechanisms and processes that give rise to this knowledge. The content of infants’ knowledge can be revealed by simple behavioral experiments, yet the most advanced investigations in experimental psychology and neuroscience have not yet revealed the basic computations of the human mind.

With the emergence of machine learning and artificial intelligence comes the promise of this deeper understanding. From the earliest days of computing, researchers have aimed to build machines that learn as children do, the most capable learners on earth (Turing, 1950). Moreover, the most conspicuous recent successes in the field of artificial intelligence have centered on machines that are structured similarly to the brain’s perceptual systems and that are built to learn (LeCun et al., 2015). Symmetrically, cognitive and developmental psychologists have looked to research in computer science and mathematics for guidance in studying the basic computations of mature and developing human minds (Tenenbaum et al., 2011). Coordinated research across these fields, developing and testing computational models of human cognition and learning, could deepen understanding of human minds in general, and the minds of infants and young children in particular, while guiding the development of ever more intelligent machines.

For example, recent thinking about infants’ “intuitive physics” – their grasp of the mechanical principles governing object motions and interactions – has benefited from the development, in computer science, of physics engines that simulate these motions and interactions (Battaglia et al., 2013). Physics engines are used in animated films and interactive video games to depict events in which objects collide, topple, or collapse on contact with other objects, surfaces, substances, and agents. The computational challenges solved by the designers of physics engines suggest insights into both the capacities of young infants and key limits to those capacities (Ullman et al., in review). Infants track moving objects over occlusion by taking account of their positions, motions, and approximate sizes, but not their detailed shapes or surface textures (Baillargeon, 1998; Spelke et al., 1995). Thus, when young infants see a cup appear alternately on opposite sides of a single screen, they represent one persisting object in motion, but when they see a cup and a shoe appear in alternation on the screen’s two sides, they fail to represent two distinct objects (Xu & Carey, 1996). Physics engines might behave similarly, because they use coarse representations of an object’s position, mass, and motion to extrapolate its motion forward, and then call on stored, detailed representations of the object’s appearance so that graphics programs can render it at places where it is visible. Computing the object’s changing position and motion from a coarse representation is accurate enough to appear natural to adults, while sparing the computations that would be required if every detailed feature of the object were extrapolated forward.

Infants’ failure to track the detailed shapes of occluded objects may reflect a similarly efficient process for representing hidden object motion, and a division of labor between basic processes for representing objects’ dynamic properties and their visual appearance.
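This division of labor can be sketched in a few lines. The sketch assumes one-dimensional, constant-velocity motion and a single tolerance for position and size; `CoarseState`, `extrapolate`, `is_surprising`, and `render` are hypothetical names, not the API of any real physics engine and not the model of Ullman et al. The point it illustrates is only structural: prediction runs on coarse dynamic properties, while appearance is stored separately and consulted only for display.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CoarseState:
    """What the sketch extrapolates: position, velocity, rough size; no shape or texture."""
    x: float
    v: float
    size: float


def step(state: CoarseState, dt: float = 1.0) -> CoarseState:
    """Advance the coarse state at constant velocity (no collisions or forces)."""
    return CoarseState(state.x + state.v * dt, state.v, state.size)


def extrapolate(state: CoarseState, hidden_steps: int) -> CoarseState:
    """Predict where an occluded object will be after `hidden_steps` steps."""
    for _ in range(hidden_steps):
        state = step(state)
    return state


def is_surprising(pred: CoarseState, seen_x: float, seen_size: float,
                  tol: float = 0.5) -> bool:
    """Flag a violation only when coarse properties disagree with the prediction.

    Appearance is deliberately not an input: in this sketch, a swapped object
    of similar size at the predicted place raises no flag.
    """
    return abs(pred.x - seen_x) > tol or abs(pred.size - seen_size) > tol


def render(obj_id: str, visible: bool, appearance: Dict[str, str]) -> Optional[str]:
    """Look up the stored appearance only when the object is visible."""
    return appearance[obj_id] if visible else None


cup = CoarseState(x=0.0, v=1.0, size=1.0)
pred = extrapolate(cup, hidden_steps=5)
print(is_surprising(pred, seen_x=5.0, seen_size=1.0))  # False: a similar-sized shoe passes
print(is_surprising(pred, seen_x=2.0, seen_size=1.0))  # True: wrong place
print(render("obj1", visible=True, appearance={"obj1": "cup"}))  # cup
```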

Recent thinking about young children’s psychological and social reasoning has benefited in similar ways from computational models of action understanding (e.g., Baker et al., 2009, 2017) based on the assumption that agents plan actions that maximize their rewards while minimizing their costs (Gergely et al., 1995), and that social beings also act to maximize the rewards of their valued social partners (Jara-Ettinger et al., 2016). Recent experiments provide evidence that representations of action plans guide young children’s interpretation and evaluation of other agents’ actions, motives, and mental states. When three-year-old children see one character refuse to help another, they judge the refuser more harshly if the requested helping action was easy to perform (Jara-Ettinger et al., 2015), and 10-month-old infants infer that an agent values one goal object more than another if he is willing to take a higher-cost action to obtain it, even if his behavior toward the two objects is otherwise the same (Liu et al., 2017).
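The cost-benefit logic behind these models can be illustrated with a small Bayesian inverse-planning sketch. None of its specifics come from the cited papers: it assumes an agent choosing between two goal objects, a softmax choice rule over utility = reward minus cost, a uniform prior over a hypothetical grid of reward values, and costs that the observer can see. `REWARD_GRID`, `BETA`, `choice_prob`, and `p_values_a_more` are illustrative names.

```python
import itertools
import math

# Hypothetical settings for the sketch, not fitted values from any study.
REWARD_GRID = [1.0, 2.0, 3.0, 4.0, 5.0]  # possible values of each goal object
BETA = 2.0                                # how reliably the agent maximizes utility


def choice_prob(chosen, rewards, costs, beta=BETA):
    """P(agent picks option `chosen`), softmax over utility = reward - cost."""
    utils = [r - c for r, c in zip(rewards, costs)]
    z = sum(math.exp(beta * u) for u in utils)
    return math.exp(beta * utils[chosen]) / z


def p_values_a_more(costs, chosen=0):
    """Posterior P(reward_A > reward_B) after one observed choice, uniform prior."""
    num = den = 0.0
    for r_a, r_b in itertools.product(REWARD_GRID, repeat=2):
        lik = choice_prob(chosen, (r_a, r_b), costs)
        den += lik
        if r_a > r_b:
            num += lik
    return num / den


same_cost = p_values_a_more(costs=(1.0, 1.0))  # agent picks A; both options cost the same
costly_a = p_values_a_more(costs=(3.0, 1.0))   # agent picks A despite its higher cost
print(round(same_cost, 2), round(costly_a, 2))  # costly_a > same_cost
```

Seeing the agent pay a higher cost for A raises the inferred probability that A is valued more, which matches the direction of the inference described above; the exact numbers depend entirely on the assumed grid and `BETA`.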

Computational modeling of early cognitive development is still in its infancy, but these and other studies suggest that a deeper understanding of young human minds, and of our species’ prodigious learning capacities, can emerge from coordinated research in machine learning, artificial intelligence, and human cognitive and brain development. Such an understanding may be critical to addressing key challenges posed by our rapidly changing technological landscape.

Protecting and enhancing children’s development

As research on the nature of intelligence progresses, how will the development of increasingly intelligent machines affect the minds of those who use them, especially the children who learn with and from them? If artificial intelligence is to bring us new technologies that enhance our reasoning and benefit our lives, then this question looms large. Intelligent systems might extend our capacities by making useful information more accessible: for example, GPS-based navigation systems that display our current position in relation to our surroundings at multiple scales, and that bring us information about otherwise inaccessible events such as traffic accidents or roadblocks, have the potential to extend and enrich our representations of the environment. These same systems, however, could diminish our spatial cognitive capacities, if we use them to navigate for us, rather than to enrich and strengthen our spatial knowledge. Research in cognitive neuroscience reveals that the basic cognitive systems by which humans navigate are fundamental to human spatial reasoning and memory, and they are strengthened by exercise (Burgess, et al., 2002). Like visual and motor systems, these systems likely are weakened by disuse: thus, a person who moves solely at the direction of a GPS navigator may both fail to develop a spatial representation of her surroundings, and diminish her memory capacities more generally. These two contrasting uses of contemporary technology suggest a question and a challenge: How should navigation aids be crafted so as to enrich, rather than diminish, our ancient, autonomous capacities for spatial reasoning? Similar questions, calling for research, are raised by intelligent systems that help us plan our days, remember friends’ birthdays, or select our music.

The advent of intelligent machines raises especially pointed questions concerning children’s learning, including the learning that propels our talent for developing novel technology. Throughout history, children have learned both by acting and by observing the actions of their elders, who manipulated artifacts with perceptible structures and functions. In the second year, children become highly attentive to the manner in which adults act on objects, and highly predisposed to reproduce those actions (Tomasello et al., 2005; Lyons et al., 2007). Young children also begin to attend to adults who copy their own detailed actions on objects (Agnetta & Rochat, 2004), and to the structural properties of the objects that adults manipulate (Booth & Waxman, 2002). These developments recruit infants’ earlier-developing sensitivity to object shapes and motions, to agents’ detailed, multi-step actions, and to social beings’ shared experience to propel a key feature of human cognition: the rapid development, in childhood, of encyclopedic knowledge of object kinds.

How will this development proceed for the current generation of infants, born into families using the tablets and smart phones that are now ubiquitous in many societies and constant companions to many parents? In contrast to the artifacts that smart phones replace, such as telephones, cameras, and books, smart phones have multiple functions. Neither the structures that permit their functions, nor the actions of their users, are perceptually accessible to the child (or, in most cases, to other adults): when a parent looks at and taps on a cell phone, he could be engaged in any of a multitude of diverse actions, undertaken to realize an even larger potential set of goals. His observable behavior does not reveal his action plans.

If multipurpose machines take on more and more of the functions that previously were performed by perceptually distinct objects, whose structure afforded specific actions that were diagnostic of their function, how will children develop the encyclopedic knowledge of object kinds that has long served as a foundation for cognitive development? Will future generations of children learn directly from smart machines, whose functioning has made the actions of other people less informative? Because the structures that support the behavior of these machines cannot be seen, and the behavior of adults who use them is only minimally informative about their goals, plans, and social relations, will children be less inclined to explore objects, or to use the object-directed actions of other people, so as to learn about the structure and functioning of the physical, living, and social world? If so, what will children learn in a world of smart, interactive machines, and how will their learning impact their social and cognitive growth? Because humans invent technologies for human benefit, we can combine and invert these questions: What kinds of intelligent machines should computer scientists aim to create, in order to promote and support young children’s cognitive development and well-being?

Past research on cognitive development in infancy and early childhood does not answer this question. Although that research has taught us a great deal about what infants and young children know at different ages, it does not support strong predictions concerning children’s learning in radically new or hypothetical environments. To make such predictions, the brain and cognitive sciences must achieve a deeper understanding of how infants and children reason and learn.

Fortunately, collaborative research in cognitive science, neuroscience, and computer science promises to deepen our understanding, providing insights that can inform the development of new technologies to enhance children’s lives. Side by side with our talents and propensities for transforming the world in ways that create both new opportunities and new problems, our species has a striking capacity for foreseeing the potential problems and addressing them. For example, the development of physics and the atmospheric sciences has allowed their practitioners to anticipate, and devise ways to counter, the catastrophic consequences of massive climate change or global nuclear warfare: two challenges posed by human technological progress for which nothing in our history provides a precedent. Similarly, the development of computational cognitive science promises to bring knowledge that can support the design of thinking machines that act for the benefit of all people, and perhaps especially for the benefit of children, the most vulnerable and gifted human learners. I believe it will best do so if computer scientists and cognitive psychologists work together to achieve a better understanding of developing human minds.

References

Agnetta, B., & Rochat, P. (2004). Imitative games by 9-, 14-, and 18-month-old infants. Infancy, 6(1), 1-36.

Baillargeon, R. (1998). Infants’ understanding of the physical world. In M. Sabourin, F. Craik, & M. Robert (Eds.), Advances in psychological science, Vol. 2 (pp. 503-529). London: Psychology Press.

Baker, C.L., Saxe, R., & Tenenbaum, J.B. (2009). Action understanding as inverse planning. Cognition, 113(3), 329-349.

Baker, C.L., Jara-Ettinger, J., Saxe, R., & Tenenbaum, J.B. (2017). Rational quantitative attribution of beliefs, desires, and percepts in human mentalizing. Nature Human Behaviour, 1, 0064.

Battaglia, P.W., Hamrick, J.B., & Tenenbaum, J.B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences USA, 110, 18327-18332.

Battro, A., Dehaene, S. & Singer, W. (2011). Human neuroplasticity and education: Scripta Varia, 117. Pontifical Academy of Sciences.

Bertenthal, B.I., & Pinto, J. (1994). Global processing of biological motions. Psychological Science, 5(4), 221-225.

Booth, A.E. & Waxman, S. (2002). Object names and object functions serve as cues to categories for infants. Developmental Psychology, 38(6), 948-957.

Burgess, N., Maguire, E.A., & O’Keefe, J. (2002). The human hippocampus and spatial and episodic memory. Neuron, 35(4), 625-641.

Csibra, G., & Gergely, G. (2009). Natural pedagogy. Trends in Cognitive Sciences, 13(4), 148-153.

Dehaene, S. (2009). Reading in the brain. Penguin.

Dehaene, S. (2011). The number sense (second ed.). Oxford.

Gergely, G. & Csibra, G. (2003). Teleological reasoning in infancy: The naive theory of rational action. Trends in Cognitive Sciences, 7(7), 287-292.

Gergely, G., Nádasdy, Z., Csibra, G., & Bíró, S. (1995). Taking the intentional stance at 12 months of age. Cognition, 56(2), 165-193.

Hamlin, J.K., Wynn, K., & Bloom, P. (2007). Social evaluation in preverbal infants. Nature, 450(7169), 557-559.

Jara-Ettinger, J., Gweon, H., Schulz, L.E., & Tenenbaum, J.B. (2016). The naïve utility calculus: Computational principles underlying commonsense psychology. Trends in Cognitive Sciences.

Jara-Ettinger, J., Tenenbaum, J.B., & Schulz, L.E. (2015). Not so innocent: Toddlers’ reasoning about costs, competence, and culpability. Psychological Science.

Johnson, S.C., Dweck, C.S. & Chen, F.S. (2007). Evidence for infants’ internal working models of attachment. Psychological Science, 18(6), 501-502.

Keil, F.C. (1989). Concepts, kinds, and cognitive development. Cambridge, MA: MIT Press.

Kinzler, K.D., Dupoux, E., & Spelke, E.S. (2007). The native language of social cognition. Proceedings of the National Academy of Sciences USA, 104(30), 12577-12580.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.

Liu, S. & Spelke, E.S. (2017). Six-month-old infants expect agents to minimize the cost of their actions. Cognition, 160, 35-42.

Liu, S., Ullman, T., Tenenbaum, J., & Spelke, E.S. (2017). Origins of a naïve utility calculus: Infants infer the value of goals from the costs of actions. Unpublished manuscript, Harvard University.

Luo, Y., & Johnson, S.C. (2009). Recognizing the role of perception in action at 6 months. Developmental Science, 12(1), 142-149.

Lyons, D.E., Young, A.G., & Keil, F.C. (2007). The hidden structure of overimitation. Proceedings of the National Academy of Sciences USA, 104(50), 19751-19756.

Meltzoff, A.N., & Moore, M.K. (1977). Imitation of facial and manual gestures by human neonates. Science, 198(4312), 75-78.

Powell, L.J., & Spelke, E.S. (2013). Preverbal infants expect members of social groups to act alike. Proceedings of the National Academy of Sciences USA, 110, E3965-E3972.

Powell, L.J. & Spelke, E.S. (in review). Infants’ understanding of social imitation: Inferences of affiliation from third-party observation. Manuscript submitted for publication.

Spelke, E.S., & Kinzler, K.D. (2007). Core knowledge. Developmental Science, 10, 89-96.

Spelke, E.S., Vishton, P., & von Hofsten, C. (1995). Object perception, object-directed action, and physical knowledge in infancy. In M. Gazzaniga (Ed.), The Cognitive Neurosciences. Cambridge, MA: MIT Press.

Spokes, A.C., & Spelke, E.S. (2017). The cradle of social knowledge: Infants’ reasoning about caregiving and affiliation. Cognition, 159, 102-116.

Stahl, A.E., & Feigenson, L. (2015). Observing the unexpected enhances infants’ learning and exploration. Science, 348(6230), 91-94.

Tenenbaum, J.B., Kemp, C., Griffiths, T.L., & Goodman, N.D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022), 1279-1285.

Thomsen, L., Frankenhuis, W., Ingold-Smith, M., & Carey, S. (2011). The big and the mighty: Preverbal infants represent social dominance. Science, 331(6016), 477-480.

Tomasello, M. (2008). Origins of human communication. Cambridge, MA: MIT Press.

Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 28, 675-735.

Turing, A.M. (1950). Computing machinery and intelligence. Mind, 59, 433-460.

Ullman, T., Spelke, E., Battaglia, P. & Tenenbaum, J. (in review). Mind games: Game engines as an architecture for intuitive physics. Manuscript submitted for publication.

Woodward, A.L. (1998). Infants selectively encode the goal object of an actor’s reach. Cognition, 69(1), 1-34. 

Xu, F. (2007). Sortal concepts, object individuation, and language. Trends in Cognitive Sciences, 11(9), 400-406.

Xu, F., & Carey, S. (1996). Infants’ metaphysics: The case of numerical identity. Cognitive Psychology, 30(2), 111-153.