Optimal Strategies for Decision-Making and their Neural Basis
Understanding how animals and humans make decisions is one of the key questions in neuroscience, economics and artificial intelligence. Decisions come in all sorts of flavors but, in neuroscience, most of the work so far has focused on two types known as perceptual decision-making and value-based decision-making. In perceptual decision-making, subjects must decide on the state of a stimulus based on sensory evidence. For instance, subjects might have to determine whether a set of dots is moving rightward or leftward based on a short movie. In value-based decision-making, subjects must choose between items with subjective values, such as choosing between two types of dessert. In this case, and contrary to perceptual decision-making, there is no objectively correct answer since the value of an item is necessarily specific to the taste and preferences of each subject.
The theory as well as the neural basis of binary perceptual decision-making are reasonably well understood. A class of models known as drift diffusion models, or DDMs for short, has been shown to predict remarkably well both the percentage of correct responses and the reaction times as a function of task difficulty. DDMs are based on the assumption that, at every time step, subjects receive scalar samples from their perceptual system, which serve as evidence for or against the two possible choices [4, 5]. For instance, in the case of leftward versus rightward motion, positive samples can be assigned to leftward motion, in which case negative samples count as evidence for rightward motion. More specifically, the samples are assumed to be drawn from a Gaussian distribution whose mean is proportional to the strength of the visual motion and whose sign indicates the direction (positive for leftward motion in our example). The DDM simply sums the samples over time and stops whenever the accumulated evidence reaches one of two symmetric bounds. If the positive bound is hit first, the model ‘chooses’ left; if the negative bound is hit first, it chooses right. Critically, this simple strategy, and variations thereof, has been shown to maximize the number of correct answers per unit of time. Moreover, the responses of neurons in several cortical areas suggest that they sum their momentary evidence and stop integrating when their activity reaches a specific level, just as in a DDM. Therefore, it appears that, to a first approximation, neural circuits implement DDMs for binary perceptual decision-making.
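The accumulation-to-bound process described above can be sketched in a few lines of code. This is a minimal illustration, not a fitted model; the drift, bound, and noise values are arbitrary:

```python
import random

def ddm_trial(drift, bound, dt=0.001, noise=1.0, rng=None):
    """Simulate one drift diffusion trial.

    drift: mean of the momentary evidence (positive favors 'left').
    bound: symmetric decision threshold (+bound -> left, -bound -> right).
    Returns (choice, decision_time).
    """
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    while abs(x) < bound:
        # Gaussian momentary evidence: mean scales with motion strength,
        # diffusion noise scales with the square root of the time step.
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    return ("left" if x > 0 else "right"), t
```

Increasing the drift magnitude (i.e., the motion strength) makes the bound crossing both faster and more reliable, which is how the DDM jointly accounts for accuracy and reaction times as a function of difficulty.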
Behavioral studies also suggest that humans and animals use a similar strategy for binary value-based decision-making. In this case, it is assumed that the brain generates two samples at each time step, drawn from two Gaussian distributions whose means equal the subjective values of the two items under consideration. Specialized circuits compute the difference between the two samples at each time step and sum this difference over time until an upper or lower bound is hit, with each bound associated with one particular choice. This strategy is appealing from a neural point of view since it requires the same circuits as perceptual decision-making. However, unlike in the case of perceptual decision-making, it is unclear whether this strategy is optimal, i.e., whether it maximizes the number of rewards (or value) per unit of time across multiple trials. In fact, there are reasons to believe that it is not.
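As a sketch, the value-based variant differs from the perceptual DDM only in how the momentary evidence is generated: one noisy sample per item, with their difference accumulated. The parameter values below are again purely illustrative:

```python
import random

def value_ddm_trial(value_a, value_b, bound, noise=1.0, rng=None):
    """Accumulate the difference between two noisy value samples until
    an upper (+bound -> choose A) or lower (-bound -> choose B)
    threshold is crossed. Returns (choice, number_of_steps)."""
    rng = rng or random.Random()
    x, steps = 0.0, 0
    while abs(x) < bound:
        # One Gaussian sample per item, centered on its subjective value.
        sample_a = value_a + noise * rng.gauss(0.0, 1.0)
        sample_b = value_b + noise * rng.gauss(0.0, 1.0)
        x += sample_a - sample_b
        steps += 1
    return ("A" if x > 0 else "B"), steps
```

Note that only the value difference drives the accumulator: when the two values are equal, the drift is zero regardless of whether both items are excellent or both are mediocre, which is the source of the puzzle discussed next.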
Consider a choice between two items with nearly equal, high values. This would be like choosing between your favorite ice cream and your favorite cake. In this case, the difference between the value samples at each time step will be very small on average, so the accumulation process will take a long time to hit either bound. This model therefore predicts that, when confronted with two equally good options, subjects should take a particularly long time to decide, even though they are guaranteed to end up with a good choice either way. This is strange: it would make far more sense in this case to decide quickly rather than procrastinate. A different class of models, known as race models, seems better suited to this type of situation. A race model uses two accumulators, one per choice, each summing the samples for one choice exclusively. The process stops whenever one of the accumulators reaches a preset bound. If both choices are highly valued, both accumulators grow quickly and hit their bounds in a short time.
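A minimal sketch of a race model, for contrast with the difference-based accumulator (parameter values illustrative):

```python
import random

def race_trial(value_a, value_b, bound, noise=1.0, rng=None):
    """Race model: one accumulator per option, each summing only its own
    noisy value samples. The first accumulator to reach the bound wins.
    Returns (choice, number_of_steps)."""
    rng = rng or random.Random()
    acc_a, acc_b, steps = 0.0, 0.0, 0
    while acc_a < bound and acc_b < bound:
        acc_a += value_a + noise * rng.gauss(0.0, 1.0)
        acc_b += value_b + noise * rng.gauss(0.0, 1.0)
        steps += 1
    return ("A" if acc_a >= bound else "B"), steps
```

With two equally high values, both accumulators climb at the same rate and the race terminates quickly, whereas the difference-based accumulator would wander near zero for a long time.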
Curiously, however, it is well known that subjects do take a very long time to decide between two items they like, a result consistent with the DDM, not the race model. In fact, we have all experienced this problem: if a restaurant menu contains two items you really like, you know you will agonize over the options for a long time. Other behaviors in value-based decision-making are just as puzzling. For instance, subjects have a particularly hard time deciding between two items they really like when a third, low-value option is offered, even if they never select this third option. These strange interactions between options, and other results, have often been used to argue that humans rely on a suboptimal strategy for value-based decision-making. The problem with this conclusion is that, until recently, the optimal strategy for value-based decision-making was unknown, making it difficult to determine whether any particular strategy is optimal. We have recently revisited this issue and used the theory of dynamic programming to derive the optimal strategy. In the case of binary decision-making, the answer was counterintuitive: DDMs do provide the optimal strategy in the sense that they maximize the reward rate. Although it seems strange that the optimal decision policy involves waiting a long time when deciding between two good options, this strategy has the advantage of producing very fast responses when the difference in value between the two options is large, i.e., when the choice is easy, which increases the reward rate. As a result, DDMs outperform race models when the difficulty of the choices varies across trials.
For choices involving N options, where N is greater than 2, the optimal solution requires N coupled accumulators, with the coupling arising from the fact that the mean across all the accumulators must be subtracted from each accumulator. As a result, the choice between two high-value items can be influenced by a third, low-value item, because this item contributes to the common mean term. The optimal strategy thus exhibits the same behavior as humans: it becomes hard to choose between two high-value items in the presence of a third item, even if that item is never chosen.
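The coupling through the common mean can be illustrated with the following sketch. Note that this shows only the normalization step with fixed bounds, not the full optimal policy derived in [7], which also involves time-varying decision bounds; all parameter values are illustrative:

```python
import random

def normalized_race(values, bound, noise=1.0, rng=None):
    """N coupled accumulators: at each step, draw one noisy sample per
    option, subtract the mean across options (normalization), and add
    the result to each accumulator. The first accumulator to reach the
    bound determines the choice. Returns (choice_index, number_of_steps)."""
    rng = rng or random.Random()
    n = len(values)
    accs = [0.0] * n
    steps = 0
    while max(accs) < bound:
        samples = [v + noise * rng.gauss(0.0, 1.0) for v in values]
        mean = sum(samples) / n
        for i in range(n):
            accs[i] += samples[i] - mean
        steps += 1
    return max(range(n), key=lambda i: accs[i]), steps
```

Because every accumulator shares the same mean term, adding or changing a third option shifts the effective drift of the other two, which is how a never-chosen, low-value item can nonetheless alter the dynamics of a choice between two high-value items.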
Our work also shows that the neural implementation of the optimal strategy requires a very specific operation known as normalization, which corresponds to the subtraction of the mean of the momentary evidence. Normalization has been reported in neural circuits involved in value-based decision-making, but its role had remained obscure. Our analysis suggests that it is in fact a key operation that allows neural circuits to make near-optimal decisions.
While it represents a significant step forward, this work only considers the simplest forms of value-based decisions, such as choosing between two desserts. Much more complex cases, such as deciding on a major in college, involve forms of reasoning that cannot be captured by the simple DDM we have explored here. Complex reasoning is believed to rely on probabilistic inference over rich data structures, such as trees or graphs, driven by a temporal stream of evidence. It remains to be seen how neural circuits could represent data structures of this type and implement efficient inference over such neural representations.
It is quite likely that this research could greatly benefit from recent work in artificial intelligence on neural networks with long-term memory [8, 9]. Until recently, learning algorithms were designed exclusively for networks with short-term memory, but these algorithms have now been generalized to train networks composed of two sub-networks: one dedicated to long-term storage and one more specialized for online computation. A similar dichotomy appears to exist, to a first approximation, in the mammalian brain, where the hippocampus is specialized for long-term storage while the cortex is more specifically focused on online processing. One can imagine storing knowledge about a particular domain in the long-term memory of the system and using the other network to integrate over time the information extracted from long-term memory. The current architectures used in AI are not fully biologically plausible, but they provide an extremely promising starting point. Such a project would illustrate once again the extraordinary potential of artificial intelligence as a source of inspiration for research in neuroscience.
1. Newsome, W.T., K.H. Britten, and J.A. Movshon, Neuronal correlates of a perceptual decision. Nature, 1989. 341(6237): p. 52-4.
2. Rangel, A., C. Camerer, and P.R. Montague, A framework for studying the neurobiology of value-based decision making. Nat Rev Neurosci, 2008. 9(7): p. 545-56.
3. Shadlen, M.N. and R. Kiani, Decision making as a window on cognition. Neuron, 2013. 80(3): p. 791-806.
4. Kiani, R. and M.N. Shadlen, Representation of confidence associated with a decision by neurons in the parietal cortex. Science, 2009. 324(5928): p. 759-64.
5. Gold, J.I. and M.N. Shadlen, The neural basis of decision making. Annu Rev Neurosci, 2007. 30: p. 535-74.
6. Krajbich, I., C. Armel, and A. Rangel, Visual fixations and the computation and comparison of value in simple choice. Nature Neuroscience, 2010. 13(10): p. 1292-8.
7. Tajima, S., J. Drugowitsch, and A. Pouget, Optimal policy for value-based decision-making. Nat Commun, 2016. 7: p. 12400.
8. Graves, A., et al., Hybrid computing using a neural network with dynamic external memory. Nature, 2016. 538(7626): p. 471-476.
9. Henaff, M., et al., Tracking the World State with Recurrent Entity Networks. arXiv preprint, 2016.