Learning & Updating
Andreas Voss
Steven Miletić
Behavioral adaptation in probabilistic environments requires learning through trial and error. While reinforcement learning (RL) models describe the temporal development of preferences through error-driven learning, they lack a mechanistic account of single-trial decision-making. Sequential sampling models such as the diffusion decision model (DDM), on the other hand, map state preferences onto single-trial choices and response times. We present a Bayesian hierarchical RL-DDM that integrates temporal-difference (TD) learning to bridge these perspectives. Our implementation incorporates several variants of TD learning, including SARSA, Q-learning, and Actor-Critic models. We tested the model with data from N = 58 participants in a two-stage decision-making task. Participants learned over time, becoming both more accurate and faster in their choices. They also showed a difficulty effect, responding faster and more accurately on easier choices, that is, those with a greater subjective value difference between the available options. Model comparison using predictive information criteria and posterior predictive checks suggested that, overall, participants employed on-policy learning. Furthermore, the RL-DDM captured both the temporal dynamics of learning and the difficulty effect in decision-making. Our work represents an important extension of the RL-DDM to temporal-difference learning.
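To make the general model structure concrete, here is a minimal, illustrative sketch of an RL-DDM trial loop in Python: a delta-rule (TD) update supplies trial-wise values, and their scaled difference sets the drift rate of a simulated diffusion process. The two-armed toy task, Euler simulation, and all parameter values are placeholder assumptions, not the authors' implementation; note that in a one-state bandit the SARSA and Q-learning updates coincide, whereas they differ in the two-stage task used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ddm(drift, boundary=1.0, ndt=0.3, dt=1e-3, noise=1.0):
    """Euler simulation of a two-boundary diffusion process; returns (hit_upper, rt)."""
    x, t = 0.0, 0.0
    while abs(x) < boundary:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return x > 0, t + ndt

# Toy two-armed example: the trial-wise drift rate is the scaled value
# difference, so easier trials (larger value difference) end up faster and
# more accurate, qualitatively reproducing the difficulty effect.
q = np.zeros(2)                      # learned values of the two options
alpha, v_scale = 0.1, 3.0            # placeholder learning rate / drift scaling
reward_prob = np.array([0.8, 0.2])   # option 0 is objectively better

for trial in range(100):
    drift = v_scale * (q[0] - q[1])            # value difference drives the drift
    hit_upper, rt = simulate_ddm(drift)
    chosen = 0 if hit_upper else 1             # upper boundary coded as option 0
    reward = float(rng.random() < reward_prob[chosen])
    q[chosen] += alpha * (reward - q[chosen])  # delta-rule (TD) update
```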
Alexander Fengler
Dr. Michael Frank
In cognitive neuroscience, there has been growing interest in adopting sequential sampling models (SSMs) as the choice function for reinforcement learning (RLSSM), opening up new avenues for exploring generative processes that can jointly account for decision dynamics within and across trials. To date, such approaches have been limited by computational tractability, due to the lack of closed-form likelihoods for the decision process and the expensive trial-by-trial evaluation of complex reinforcement learning (RL) processes. By combining differentiable RL likelihoods with Likelihood Approximation Networks (LANs), and leveraging gradient-based inference methods such as Hamiltonian Monte Carlo and variational inference (VI), we enable fast and efficient hierarchical Bayesian estimation for a broad class of RLSSM models. Exploiting the differentiability of the RL likelihoods improves scalability and speeds convergence of gradient-based optimizers and MCMC samplers for complex RL processes. To showcase the combination of these approaches, we consider the Reinforcement Learning - Working Memory (RLWM) task and its model with multiple interacting generative learning processes, which we combine with decision-process modules via LANs. We show that this approach can be combined with hierarchical variational inference to accurately recover posterior parameter distributions in arbitrarily complex RLSSM paradigms, whereas fitting a choice-only model yields biased estimates of the true generative process. Our method allows us to uncover a hitherto undescribed cognitive process in the RLWM task, whereby participants proactively adjust the boundary threshold of the choice process as a function of working memory load.
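As an illustration of the likelihood structure only (not the authors' code), the sketch below composes an RL trial loop with a decision-model density: a delta-rule update produces trial-wise drift rates, and a closed-form Wiener first-passage density (small-time series of Navarro & Fuss, 2009) stands in for a trained LAN. In the actual approach the decision module is a LAN, the loop is written in an autodiff framework so gradients flow through the RL updates, and inference is hierarchical Bayesian via HMC or VI; the parameter names and two-alternative coding here are assumptions.

```python
import numpy as np

def wiener_lpdf(t, v, a, w=0.5, n_terms=5):
    """Log density of first passage through the lower boundary of a Wiener
    process (small-time series; Navarro & Fuss, 2009). For the upper boundary,
    call with v -> -v and w -> 1 - w."""
    if t <= 0:
        return -np.inf
    tau = t / a**2
    k = np.arange(-n_terms, n_terms + 1)
    series = np.sum((w + 2 * k) * np.exp(-((w + 2 * k) ** 2) / (2 * tau)))
    return np.log(series / np.sqrt(2 * np.pi * tau**3) / a**2) - v * a * w - v**2 * t / 2

def rlssm_loglik(params, stimuli, choices, rts, rewards):
    """Trial-loop RLSSM log-likelihood for a two-alternative task: a delta-rule
    RL process supplies trial-wise drift rates, and a decision-model density
    (here a Wiener density standing in for a trained LAN) scores each observed
    (choice, RT) pair."""
    alpha, v_scale, a, t0 = params
    q = np.full((np.max(stimuli) + 1, 2), 0.5)   # values per stimulus/action
    ll = 0.0
    for s, c, rt, r in zip(stimuli, choices, rts, rewards):
        v = v_scale * (q[s, 1] - q[s, 0])        # value difference -> drift
        # Choice 1 is coded at the upper boundary (mirror the drift for its density).
        ll += wiener_lpdf(rt - t0, -v if c == 1 else v, a)
        q[s, c] += alpha * (r - q[s, c])         # delta-rule value update
    return ll

# Example call with made-up data (4 trials, one stimulus).
print(rlssm_loglik((0.1, 2.0, 1.5, 0.3),
                   stimuli=np.array([0, 0, 0, 0]),
                   choices=np.array([1, 0, 1, 1]),
                   rts=np.array([0.9, 1.1, 0.7, 0.8]),
                   rewards=np.array([1, 0, 1, 1])))
```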
Anne Collins
Prof. Hamidreza Pouretemad
Dr. Jamal Amani Rad
Human learning is driven by multiple interacting cognitive processes that operate in parallel to shape decision-making in dynamic environments. Previous studies investigating the interaction between reinforcement learning (RL) and working memory (WM) suggest a dynamic interplay between the two: WM supports rapid early learning but is capacity-limited, leading to interference under high load, whereas RL gradually updates action values and stabilizes decision-making under high WM load, ultimately improving policy optimization over time. Here, we extended prior work by employing the RLWM task with a continuous response space featuring three discrete action targets to manipulate WM load, an aspect that has remained unexplored. We collected behavioral data from 85 participants who completed the RLWM task, including a surprise testing phase. Our findings reveal that under high WM load, RL plays a greater role, gradually strengthening stimulus-response associations that are later recalled more reliably during the test phase. While WM facilitates rapid early learning, this speed comes at a cost: associations learned quickly are more prone to forgetting, whereas those acquired more slowly, under greater reliance on RL, show enhanced retention. These results highlight the interaction between WM and RL in shaping learning and memory retention. Future studies should further investigate this interaction in a continuous response space, which may better capture decision-making dynamics.
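For readers unfamiliar with the modeling framework, the sketch below shows one trial of the standard discrete RLWM mixture model (Collins & Frank, 2012) that this line of work builds on: a capacity-limited, fast-decaying WM module is mixed with a slow incremental RL module, and reliance on WM shrinks as set size grows. This is a generic sketch with placeholder parameter values and deterministic feedback, not the continuous-response variant studied here.

```python
import numpy as np

def softmax(x, beta=8.0):
    e = np.exp(beta * (x - np.max(x)))
    return e / e.sum()

def rlwm_trial(q_rl, w_wm, s, correct_a, set_size, rng,
               alpha=0.1, capacity=3.0, rho=0.9, phi=0.05):
    """One trial of the standard discrete RLWM mixture model: choose from a
    mixture of a capacity-limited WM policy and an incremental RL policy,
    observe binary feedback, then update both modules."""
    mix = rho * min(1.0, capacity / set_size)               # WM reliance falls with load
    policy = mix * softmax(w_wm[s]) + (1 - mix) * softmax(q_rl[s])
    a = rng.choice(len(policy), p=policy)
    r = float(a == correct_a[s])                            # deterministic feedback
    q_rl[s, a] += alpha * (r - q_rl[s, a])                  # slow, incremental RL update
    w_wm[s, a] = r                                          # one-shot WM storage
    w_wm += phi * (1.0 / w_wm.shape[1] - w_wm)              # WM decay toward uniform
    return a, r

# Example: set size of 6 stimuli, 3 response options, 60 trials.
rng = np.random.default_rng(1)
n_stim, n_act = 6, 3
q_rl = np.full((n_stim, n_act), 1.0 / n_act)
w_wm = np.full((n_stim, n_act), 1.0 / n_act)
correct_a = rng.integers(n_act, size=n_stim)
for s in rng.integers(n_stim, size=60):
    rlwm_trial(q_rl, w_wm, s, correct_a, set_size=n_stim, rng=rng)
```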
Michael Lee
Two major open questions in the study of reasoning are (1) what functions people compute to draw conclusions from given pieces of information, or premises; and (2) how people interpret the meanings of the premises they draw conclusions from. For example, how justified is it to conclude "they travelled by train" on the basis that "If they went to Ohio, then they travelled by train" and "they went to Ohio", and why? Although these questions have been debated for thousands of years, it is typically difficult to distinguish competing theories empirically because they tend to be defined only verbally, not computationally; and because they usually overlap in the predictions they make. This talk presents the current state of an ongoing project in which we translate verbal theories of how people reason with and interpret conditional premises like "If they went to Ohio, then they travelled by train" into computational form. Building on the hypothesis that people try not to contradict themselves when reasoning, we derive sets of internally consistent conclusions for a range of inferences and premise interpretations, and formalize them as components of a Bayesian latent-mixture model. Applying the model to simulated and existing reasoning datasets, we illustrate how different combinations of inferences provide more or less information for distinguishing between competing theories based on the specificity and degree of overlap in their predictions.
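As a concrete, purely illustrative example of the latent-mixture formalization, the sketch below marginalizes one participant's endorsement pattern over latent components, each of which predicts a set of conclusions executed with some response-error probability. The prediction table, component labels, and inference forms are placeholders, not the internally consistent conclusion sets derived in the project.

```python
import numpy as np

# Illustrative (placeholder) prediction table: rows are latent components
# (interpretation x reasoning-strategy combinations), columns are inference
# forms (e.g., MP, MT, AC, DA); entries mark whether the conclusion is endorsed.
PREDICTIONS = np.array([
    [1, 1, 0, 0],   # e.g., conditional interpretation, deductive strategy
    [1, 1, 1, 1],   # e.g., biconditional interpretation
    [1, 0, 0, 0],   # e.g., a more conservative strategy
])

def loglik_participant(responses, mix_weights, eps):
    """Log-likelihood of one participant's endorsement pattern under a latent
    mixture: each component predicts a response pattern, executed with
    response-error probability eps; the latent component is marginalized out."""
    match = PREDICTIONS == responses               # broadcast over components
    p_resp = np.where(match, 1 - eps, eps)         # eps-contaminated predictions
    comp_ll = np.log(p_resp).sum(axis=1)           # per-component log-likelihood
    return np.log(np.exp(comp_ll) @ mix_weights)   # marginalize the latent component

# Example: a participant who endorses MP and MT only.
print(loglik_participant(np.array([1, 1, 0, 0]),
                         mix_weights=np.array([0.5, 0.3, 0.2]), eps=0.05))
```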