Statistics & Identifiability
Joachim Vandekerckhove
Model discovery techniques can help identify plausible candidate models from data and can aid theory-building in areas of the behavioral and social sciences where empirical work has not yet led to formal specification. In this project we introduce a data-driven framework for discovering plausible nomothetic mean structures for a population, where a mean structure is a set of functions that indicates how variables and parameters are associated with one another. Discovery of such mean structures can then aid in the construction of theoretical accounts of regularities in the data. We expand these methods to accommodate the wide (rather than long) data formats common in the behavioral and social sciences. We capitalize on the hierarchical nature of such data and assume that there exists a nomothetic mean structure that describes the population. Differences among individuals are exhibited through the observed (measured) variables and the unobserved (latent) parameters. The mean structure is nomothetic in the sense that the variables and parameters are assumed to be mathematically arranged in the same way across individuals. This contrasts with the idiographic perspective, under which each individual may have a unique arrangement of variables and parameters that need not conform to a common structure shared across persons. We discuss the computational setup of the approach and establish a proof of concept by demonstrating it on numerical examples.
This is an in-person presentation on July 20, 2026 (10:40 ~ 11:00 EDT).
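To make the nomothetic assumption in the abstract above concrete, the following minimal sketch generates wide-format data in which every person shares one functional form but has their own latent parameters. The exponential-decay form, the parameter ranges, and all function names are assumptions chosen for illustration; none of them come from the talk.

```python
import math
import random

def mean_structure(x, a, b):
    """Hypothetical shared functional form: a * exp(-b * x)."""
    return a * math.exp(-b * x)

def simulate_wide_data(n_persons=5, occasions=(0.0, 1.0, 2.0, 3.0),
                       noise_sd=0.05, seed=1):
    """Return one row per person and one column per measurement occasion."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_persons):
        # Person-specific latent parameters (illustrative ranges).
        a = rng.uniform(0.8, 1.2)
        b = rng.uniform(0.2, 0.6)
        # Same arrangement of variables and parameters for every person.
        row = [mean_structure(x, a, b) + rng.gauss(0.0, noise_sd)
               for x in occasions]
        rows.append(row)
    return rows

data = simulate_wide_data()
for row in data:
    print([round(value, 3) for value in row])
```

Under an idiographic view, by contrast, `mean_structure` itself could differ from person to person rather than only its parameters.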
Dr. Jeffrey Rouder
Oblique latent-variable models are ubiquitous in studies of individual differences, with latent correlations often serving as the primary quantities of interest. These correlations are typically interpreted as reflecting the shared variance between latent constructs. Yet, it is common to observe that manifest correlations are low or modest, while the corresponding latent correlations are much higher. This pattern can be explained by a geometric property of oblique measurement models, under which latent correlations can be highly sensitive to even small deviations from the intended measurement structure. When the measurement structure departs from what is intended, even modestly, the estimated latent correlations can become inflated. This sensitivity raises concerns about the validity of the latent correlations that are often reported. To address this issue, we introduce a diagnostic that evaluates whether an oblique structural representation is justified by the observed covariance structure, rather than relying solely on measures of global fit. We also discuss alternative approaches for decomposing individual differences that do not depend as heavily on restrictive measurement assumptions. These alternatives offer greater flexibility while still providing interpretable decompositions.
This is an in-person presentation on July 20, 2026 (11:00 ~ 11:20 EDT).
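The sensitivity described in the abstract above can be illustrated with a small back-of-the-envelope calculation. The sketch assumes standardized variables, a single indicator per construct with known loadings, and a hypothetical unmodeled source of covariance (`shared_nuisance`). It is not the authors' geometric argument or their diagnostic, only the familiar disattenuation relation, manifest correlation = loading_a × loading_b × latent correlation.

```python
def between_construct_correlation(latent_r, loading_a, loading_b,
                                  shared_nuisance=0.0):
    """Manifest correlation between one indicator of each construct.

    shared_nuisance is a hypothetical extra covariance that the intended
    measurement structure does not account for.
    """
    return loading_a * loading_b * latent_r + shared_nuisance

def implied_latent_correlation(manifest_r, loading_a, loading_b):
    """Latent correlation implied if the intended structure is taken as exact."""
    return manifest_r / (loading_a * loading_b)

true_r = 0.30
loading = 0.30

clean = between_construct_correlation(true_r, loading, loading)
contaminated = between_construct_correlation(true_r, loading, loading,
                                             shared_nuisance=0.03)

print(implied_latent_correlation(clean, loading, loading))
print(implied_latent_correlation(contaminated, loading, loading))
```

With loadings of 0.30, an unmodeled covariance of only 0.03 moves the implied latent correlation from 0.30 to roughly 0.63, because the deviation is divided by the product of the loadings.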
A fundamental goal of cognitive science is to determine which theory best accounts for data. Researchers often implement competing theories as computational models and use model comparison techniques to evaluate the merits of the models as proxies for the theories. In the Bayesian framework, two dominant approaches for model comparison are Bayes Factors (BF) and Cross-Validation (CV). Both methods show important limitations when all candidate models are misspecified, a situation that is virtually guaranteed in real-world applications but is often downplayed or ignored. I present a simulation-based investigation using reaction-time and binary-choice data. A Linear Ballistic Accumulator (LBA) with Gamma-distributed drift rates serves as the known true generative process, producing realistic patterns including positively skewed response times and speed-accuracy tradeoffs. I compare three classes of misspecified models: (1) Log-Normal Race models (structurally similar to the LBA), (2) Ollman's Fast Guess model (a qualitatively different mechanism), and (3) theory-agnostic statistical models. Results reveal three key findings. First, both BF and CV may select models demonstrably far from the true generative process: flexible, theory-agnostic models can outperform cognitive models even when the latter better approximate the true mechanism. Second, when comparing models with qualitatively different theoretical commitments, BF is driven more by priors than structural adequacy. Third, CV shows a complementary limitation: conservatism that provides little guidance when theoretically incompatible models predict comparably well. These findings underscore that model comparison optimizes predictive accuracy, not proximity to truth, with important implications for cognitive modeling practice.
This is an in-person presentation on July 20, 2026 (11:20 ~ 11:40 EDT).
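For readers unfamiliar with the generative process named in the abstract above, here is a minimal sketch of a two-accumulator LBA with Gamma-distributed drift rates. All parameter values and names are assumptions for illustration, not the settings used in the talk, and the sketch covers only data generation, not the Bayes factor or cross-validation analyses.

```python
import random
import statistics

def simulate_lba_trial(rng, mean_drifts=(1.2, 0.8), shape=4.0, threshold=1.0,
                       max_start=0.5, non_decision=0.2):
    """Simulate one trial; accumulator 0 is treated as the correct response."""
    finish_times = []
    for mean_drift in mean_drifts:
        start = rng.uniform(0.0, max_start)          # uniform start point
        # gammavariate(shape, scale) has mean shape * scale.
        drift = rng.gammavariate(shape, mean_drift / shape)
        # Linear, noise-free rise from start point to threshold.
        finish_times.append((threshold - start) / drift)
    choice = min(range(len(finish_times)), key=finish_times.__getitem__)
    return choice, non_decision + finish_times[choice]

rng = random.Random(2026)
trials = [simulate_lba_trial(rng) for _ in range(20000)]
rts = [rt for _, rt in trials]
accuracy = sum(1 for choice, _ in trials if choice == 0) / len(trials)

print("mean RT:", statistics.mean(rts))
print("median RT:", statistics.median(rts))
print("accuracy:", accuracy)
```

Because Gamma-distributed drifts are strictly positive, every accumulator eventually reaches threshold, and the mean response time exceeding the median reflects the positive skew mentioned in the abstract.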
Dr. Constantin Meyer-Grant
Evidence accumulation models play a pivotal role in the field of decision-making research, offering a comprehensive framework for understanding choice behavior and response times, including speed–accuracy tradeoffs (SATs). Within this framework, the dominant explanation of SATs has long been a reduction in boundary separation under speed pressure, implying that less evidence is required before committing to a decision. Recently, however, Wang et al. (2025) advocated for an alternative account. They proposed that speed emphasis induces a confirmation bias toward the currently favored option, which can be formally represented by a self-exciting Ornstein–Uhlenbeck model (OUM). In this talk, we critically evaluate whether this model can actually support inferences about which process drives SATs. We focus on parameter recoverability, which is a prerequisite for attributing experimental manipulations to specific model parameters. Using analytical arguments and extensive simulation studies, we show that in certain situations, key parameters of the model are not jointly identifiable. In particular, the boundary separation cannot always be reliably recovered because of an inherent tradeoff between boundary separation and non-decision time. We demonstrate why this identifiability problem arises, discuss its practical impact on inference about SATs, and outline implications for interpreting parameter changes in the self-exciting OUM.
This is an in-person presentation on July 20, 2026 (11:40 ~ 12:00 EDT).
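As a rough illustration of the kind of model discussed in the abstract above, the sketch below simulates first-passage times from a generic Ornstein–Uhlenbeck process with positive (self-exciting) feedback, using Euler–Maruyama steps. The parameterization (start at zero, symmetric boundaries at plus and minus half the boundary separation), the parameter values, and all names are assumptions; this is not the specification from Wang et al. (2025).

```python
import math
import random

def simulate_ou_trial(rng, drift=1.0, feedback=2.0, boundary_separation=1.5,
                      non_decision=0.3, noise_sd=1.0, dt=0.001, max_time=10.0):
    """Simulate dx = (drift + feedback * x) dt + noise_sd dW until absorption."""
    half = boundary_separation / 2.0
    x, t = 0.0, 0.0
    sqrt_dt = math.sqrt(dt)
    while abs(x) < half and t < max_time:
        # feedback > 0 pushes the state further toward the favored option.
        x += (drift + feedback * x) * dt + noise_sd * sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
    if abs(x) < half:
        choice = None                      # no boundary reached before max_time
    else:
        choice = "upper" if x > 0 else "lower"
    return choice, non_decision + t

rng = random.Random(7)
trials = [simulate_ou_trial(rng) for _ in range(500)]
upper_rate = sum(1 for choice, _ in trials if choice == "upper") / len(trials)
print("proportion upper:", upper_rate)
```

One way to probe the tradeoff the abstract describes would be to compare response-time distributions simulated with a larger `boundary_separation` and shorter `non_decision` against the reverse combination.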