Society for Mathematical Psychology

SMP 2026 LaMartine

Human-AI Interaction

ZhaoBin Li
Prof. Mark Steyvers

Productive human-AI collaboration requires appropriate reliance, yet contemporary AI systems are often miscalibrated, exhibiting systematic overconfidence or underconfidence. We investigate whether humans can learn to mentally recalibrate AI confidence signals through repeated experience. In a behavioral experiment (N = 200), participants predicted the AI's correctness across four AI calibration conditions: standard, overconfidence, underconfidence, and a counterintuitive "reverse confidence" mapping. We develop a computational model utilizing a linear-in-log-odds (LLO) transformation and a Rescorla-Wagner learning rule to explain participants' trial-by-trial adaptation, and estimate the model using Bayesian multilevel inference to capture group-level trends and individual variability. Results demonstrate robust learning across all conditions, with participants significantly improving their accuracy, discrimination, and calibration alignment over 50 trials. The model reveals that humans adapt by updating their baseline trust and confidence sensitivity, using asymmetric learning rates to prioritize the most informative errors. While humans can compensate for monotonic miscalibration, we identify a significant boundary in the reverse confidence scenario, where a substantial proportion of participants struggled to override initial inductive biases. These findings provide a mechanistic account of how humans adapt their trust in AI confidence signals through experience.
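As a rough illustration of the two model components named above, the following Python sketch combines a linear-in-log-odds (LLO) transform of the AI's stated confidence with a Rescorla-Wagner (delta-rule) update of the transform's intercept ("baseline trust") and slope ("confidence sensitivity"). The abstract does not give the model's exact parameterization, so everything here is an assumption: the function names, the learning-rate values, the gradient-style slope update, and the choice to key the asymmetry to the sign of the prediction error are all hypothetical.

```python
import math

EPS = 1e-6  # keeps log-odds finite when a confidence value is exactly 0 or 1


def logit(p):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, EPS), 1.0 - EPS)
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def llo(ai_conf, intercept, slope):
    """LLO transform: the believed log-odds are linear in the AI's stated log-odds."""
    return sigmoid(intercept + slope * logit(ai_conf))


def rw_update(intercept, slope, ai_conf, correct, lr_pos=0.3, lr_neg=0.1):
    """One delta-rule step on both LLO parameters (hypothetical parameterization).

    `correct` is 1 if the AI was right on this trial and 0 otherwise.
    Assumption: the learning rate depends on the sign of the prediction error.
    Returns the updated intercept, the updated slope, and the pre-update prediction.
    """
    pred = llo(ai_conf, intercept, slope)
    error = correct - pred
    lr = lr_pos if error > 0 else lr_neg
    new_intercept = intercept + lr * error
    new_slope = slope + lr * error * logit(ai_conf)
    return new_intercept, new_slope, pred


if __name__ == "__main__":
    # Start by taking the AI's confidence at face value (identity mapping).
    intercept, slope = 0.0, 1.0
    # Made-up outcome sequence for an AI that always reports 90% confidence.
    outcomes = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
    for trial, correct in enumerate(outcomes, start=1):
        intercept, slope, pred = rw_update(intercept, slope, 0.9, correct)
        print(f"trial {trial}: predicted {pred:.3f}, outcome {correct}")
```

In this parameterization, an intercept of 0 and a slope of 1 reproduce the AI's stated confidence unchanged, a slope below 1 discounts it, and a negative slope inverts it, which is the kind of mapping a learner would need in the reverse confidence condition.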

This is an in-person presentation on July 18, 2026 (10:40 ~ 11:00 EDT).

Mr. Lukas Mayer

Integrating AI assistance into decision-support systems promises to improve outcomes via human-AI complementarity effects. In practice, however, people routinely turn AI advice off, particularly when it is unsolicited. To begin addressing this bottleneck in human-AI collaboration, we conducted an experiment in which people made decisions of variable difficulty with an AI assistant that automatically intervened under one of three policies: offering advice on all trials, on a random subset of trials, or on a complementary subset of trials. Since unprompted AI advice can be disruptive, participants were given the ability to turn AI advice off and back on at any time, with these on/off decisions serving as our primary behavioral measure. Our results show that advice timing is critical for maintaining human-AI collaboration, as advice perceived as irrelevant can prompt people to turn AI assistance off. This presents a critical risk, as participants were slow to re-enable AI advice even under conditions where advice was clearly beneficial. We construct a cognitive model to describe people's decisions to disable and re-enable assistance and present a POMDP framework that integrates the cognitive model to estimate optimal, human-aware advice policies. This hybrid model is notable because the policy estimation algorithm can use the cognitive model to evaluate the counterfactual outcomes of offering AI advice on any given trial, and can thereby pursue human-AI complementarity without prompting people to turn AI assistance off.

This is an in-person presentation on July 18, 2026 (11:00 ~ 11:20 EDT).

Konstantina Sokratous
Joachim Vandekerckhove
Prof. Clintin Davis-Stober

As artificial intelligence systems increasingly function as decision-making agents, their evaluation remains largely performance-based, emphasizing benchmark accuracy rather than measurement of underlying cognitive properties. In this talk, I argue for the development of an AI psychometrics: a formal measurement framework for artificial agents grounded in principles from representational measurement theory and computational cognitive modeling. Unlike human respondents, modern generative AI models offer multiple affordances: they 1) directly output probability distributions over response alternatives, 2) can be retested without contamination from learning or fatigue, and 3) permit structural intervention. I show how these affordances enable the use of computational cognitive models to quantify latent properties. I outline the theoretical foundations and present empirical applications demonstrating how this approach reveals properties invisible to traditional benchmark metrics, and thus serves as a stepping stone toward a principled measurement science for artificial cognition.

This is an in-person presentation on July 18, 2026 (11:20 ~ 11:40 EDT).

Hinn Zhang
Prof. Mark Steyvers

Prior work on cognitive offloading to AI has largely focused on performance, often comparing accuracy when delegating to AI versus completing a task manually. Less is known about how people weigh completion time and effort when deciding whether to perform tasks themselves or delegate them to an AI, and the potential biases that shape these offloading choices. To formalize and measure these decisions, we consider human–AI delegation as a choice process under uncertainty and ask whether offloading tracks objective trade-offs in time and effort, or reflects a systematic preference for automation. We conducted behavioral experiments in which participants chose between delegating to an AI robot or completing the task themselves, with the two options differing in time and effort costs. Across two task blocks, the relative advantage alternated between completing the task themselves and delegating to an AI to test for adaptation to changing trade-offs. Model-based analyses showed that participants were sensitive to time and effort differences. However, their choices also demonstrate a systematic bias toward delegation that objective cost differences cannot entirely explain. These findings provide a basis for measuring delegation bias and individual differences, enabling generalizable predictions about offloading behavior and informing how AI-assisted workflows could optimize delegation rather than default to it.
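One way to picture a "bias toward delegation that objective cost differences cannot entirely explain" is a logistic choice rule with a constant bias term, sketched below in Python. The abstract does not state the functional form of the model-based analyses, so the logistic rule, the parameter names (`w_time`, `w_effort`, `bias`), and the example values are all assumptions made for illustration.

```python
import math


def p_delegate(time_saved, effort_saved, w_time, w_effort, bias):
    """Probability of delegating to the AI under an assumed logistic choice rule.

    time_saved and effort_saved are the objective advantages of delegating
    (negative values mean that doing the task oneself is cheaper).
    bias > 0 encodes a preference for delegation independent of those costs.
    """
    utility = bias + w_time * time_saved + w_effort * effort_saved
    return 1.0 / (1.0 + math.exp(-utility))


def indifference_time_saved(effort_saved, w_time, w_effort, bias):
    """Time advantage at which the chooser is indifferent (probability 0.5)."""
    return -(bias + w_effort * effort_saved) / w_time


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to illustrate the idea.
    params = dict(w_time=1.0, w_effort=0.5, bias=0.8)
    for time_saved in (-2.0, -1.0, 0.0, 1.0, 2.0):
        p = p_delegate(time_saved, effort_saved=0.0, **params)
        print(f"time saved {time_saved:+.1f}: P(delegate) = {p:.3f}")
    print("indifference point:", indifference_time_saved(0.0, **params))
```

Under this sketch, a chooser with no bias is indifferent when the two options cost the same, whereas a positive bias moves the indifference point to where delegation is objectively the more costly option.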

This is an in-person presentation on July 18, 2026 (11:40 ~ 12:00 EDT).
