Is there an R implementation of the linear ballistic accumulator model or Ratcliff's diffusion model for measuring response time and accuracy?

I am looking for an implementation of the linear ballistic accumulator model or Ratcliff's diffusion model (e.g. in R, MATLAB, or Python).

Here are a few options. I have not personally tried them all.

As mentioned below, the rtdists package in R is able to fit both LBA and diffusion models.

Other Options


Scott Brown has a copy of Donkin et al. (2009) on his web page, along with some code in R, Excel, and WinBUGS for fitting the LBA model.
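For orientation, the LBA's core mechanics (uniform start points, trial-to-trial normal variation in drift rate, deterministic linear rise to a threshold) can be sketched in a few lines. This is an illustrative simulation with made-up parameter values, not the fitting code from Donkin et al.:

```python
import numpy as np

def simulate_lba(n_trials, b=1.0, A=0.5, drifts=(1.2, 0.8), s=0.3, t0=0.2, rng=None):
    """Simulate choices and RTs from a two-accumulator LBA.

    b: response threshold, A: start-point range, drifts: mean drift rate
    per accumulator, s: drift-rate SD, t0: non-decision time.
    All parameter values here are arbitrary illustrative choices.
    """
    rng = np.random.default_rng(rng)
    n_acc = len(drifts)
    k = rng.uniform(0, A, size=(n_trials, n_acc))      # start points
    d = rng.normal(drifts, s, size=(n_trials, n_acc))  # trial-wise drift rates
    # Accumulators rise linearly; negative drifts never reach threshold
    t = np.where(d > 0, (b - k) / d, np.inf)
    choice = np.argmin(t, axis=1)                      # first to reach b wins the race
    rt = np.min(t, axis=1) + t0
    return choice, rt

choice, rt = simulate_lba(5000, rng=1)
```

Fitting the model then amounts to maximizing the likelihood of the observed (choice, RT) pairs under the LBA's closed-form density, which is what the packages above implement.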

There's also the glba package on CRAN by Ingmar Visser.

Diffusion model

The diffusion model is available as a MATLAB toolbox, the Diffusion Model Analysis Toolbox (DMAT).


  • Donkin, C., Averell, L., Brown, S. D., & Heathcote, A. (2009). Getting more from accuracy and response time data: Methods for fitting the Linear Ballistic Accumulator model. Behavior Research Methods, 41, 1095-1110.
  • Vandekerckhove, J., & Tuerlinckx, F. (2008). Diffusion model analysis with MATLAB: A DMAT primer. Behavior Research Methods, 40, 61-72. doi:10.3758/BRM.40.1.61

For the diffusion model, there is also Eric-Jan Wagenmakers' "EZ-diffusion model", available from his website.
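The EZ-diffusion approach is simple enough to sketch directly: it inverts three closed-form equations to turn proportion correct, RT variance, and mean RT into a drift rate, boundary separation, and non-decision time. The following is a minimal Python transcription of those equations (assuming the conventional scaling s = 0.1 and no contaminant trials); check it against the published worked example before relying on it:

```python
import numpy as np

def ez_diffusion(pc, vrt, mrt, s=0.1):
    """EZ-diffusion point estimates (after Wagenmakers et al., 2007).

    pc:  proportion correct (must not be exactly 0, 0.5, or 1)
    vrt: variance of correct RTs in seconds^2
    mrt: mean of correct RTs in seconds
    s:   scaling parameter (0.1 by convention)
    Returns (drift rate v, boundary separation a, non-decision time Ter).
    """
    L = np.log(pc / (1 - pc))                       # logit of accuracy
    x = L * (L * pc**2 - L * pc + pc - 0.5) / vrt
    v = np.sign(pc - 0.5) * s * x**0.25             # drift rate
    a = s**2 * L / v                                # boundary separation
    y = -v * a / s**2
    mdt = (a / (2 * v)) * (1 - np.exp(y)) / (1 + np.exp(y))  # mean decision time
    ter = mrt - mdt                                 # non-decision time
    return v, a, ter

v, a, ter = ez_diffusion(pc=0.802, vrt=0.112, mrt=0.723)  # v ≈ 0.1, a ≈ 0.14, Ter ≈ 0.3
```

Because the transform is closed-form, EZ is essentially instantaneous, which is why it is the method of choice when speed matters more than modeling across-trial variability parameters.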

This paper compares three different pieces of software for estimation of diffusion model parameters:

van Ravenzwaaij, D., & Oberauer, K. (2009). How to use the diffusion model: Parameter recovery of three methods: EZ, fast-dm, and DMAT. Journal of Mathematical Psychology, 53(6), 463-473.

The R package diffIRT estimates both the Q-diffusion and D-diffusion models (see the van der Maas et al. paper on the author's website for the differences between these models). R code for the EZ2 approach, which is much faster if that matters for your application, is also available.

In the subheading you also mention that you're interested in MATLAB/Python implementations:

I've personally used DMAT in MATLAB and that's a nice package. However, the Python-based HDDM package may be one of the best around at the moment (in my opinion), and it has a good user guide.

The paper associated with the package is Wiecki et al. (2013):


  • Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the Drift-Diffusion Model in Python. Frontiers in Neuroinformatics, 7, 14.

The R package rtdists is another great option:

Provides response time distributions (density/PDF, distribution function/CDF, quantile function, and random generation): (a) Ratcliff diffusion model based on C code by Andreas and Jochen Voss and (b) linear ballistic accumulator (LBA) with different distributions underlying the drift rate.


Sequential sampling models (SSMs) ([24]) have established themselves as the de facto standard for modeling reaction-time data from simple two-alternative forced choice decision making tasks ([22]). Each decision is modeled as an accumulation of noisy information indicative of one choice or the other, with sequential evaluation of the accumulated evidence at each time step. Once this evidence crosses a threshold, the corresponding response is executed. This simple assumption about the underlying psychological process has the appealing property of reproducing not only choice probabilities, but the full distribution of response times for each of the two choices. Models of this class have been used successfully in mathematical psychology since the 1960s and more recently adopted in cognitive neuroscience investigations. These studies are typically interested in neural mechanisms associated with the accumulation process or with regulation of the decision threshold (see e.g. [7], [3], [18]). One issue in such model-based cognitive neuroscience approaches is that the trial numbers in each condition are often low, making it difficult to estimate model parameters. For example, studies with patient populations, especially if combined with intraoperative recordings, typically have substantial constraints on the duration of the task. Similarly, model-based fMRI or EEG studies are often interested not in static model parameters, but in how these dynamically vary with trial-by-trial variations in recorded brain activity. Efficient and reliable estimation methods that take advantage of the full statistical structure available in the data across subjects and conditions are critical to the success of these endeavors.

Bayesian data analytic methods are quickly gaining popularity in the cognitive sciences because of their many desirable properties ([12], [10]). First, Bayesian methods allow inference of the full posterior distribution of each parameter, thus quantifying uncertainty in their estimation, rather than simply providing their most likely values. Second, hierarchical modeling is naturally formulated in a Bayesian framework. Traditionally, psychological models either assume subjects are completely independent of each other, fitting models separately to each individual, or that all subjects are the same, fitting models to the group as if they are all copies of some “average subject”. Both approaches are sub-optimal in that the former fails to capitalize on statistical strength offered by the degree to which subjects are similar in one or more model parameters, whereas the latter approach fails to account for the differences among subjects, and hence could lead to a situation where the estimated model cannot fit any individual subject. The same limitations apply to current DDM software packages such as DMAT [25] and fast-dm [28]. Hierarchical Bayesian methods provide a remedy for this problem by allowing group and subject parameters to be estimated simultaneously at different hierarchical levels ([12], [10], [26]). Subject parameters are assumed to be drawn from a group distribution, and to the degree that subjects are similar to each other, the variance in the group distribution will be estimated to be small, which reciprocally has a greater influence on constraining parameter estimates of any individual. Even in this scenario, the method still allows the posterior for any given individual subject to differ substantially from that of the rest of the group given sufficient data to overwhelm the group prior.
Thus the method capitalizes on statistical strength shared across the individuals, and can do so to different degrees even within the same sample and model, depending on the extent to which subjects are similar to each other in one parameter vs. another. In the DDM for example, it may be the case that there is relatively little variability across subjects in the perceptual time for stimulus encoding, quantified by the “non-decision time”, but more variability in their degree of response caution, quantified by the “decision threshold”. The estimation should be able to capitalize on this structure so that the non-decision time in any given subject is anchored by that of the group, potentially allowing for more efficient estimation of that subject's decision threshold. This approach may be particularly helpful when relatively few trials per condition are available for each subject, and when incorporating noisy trial-by-trial neural data into the estimation of DDM parameters.
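The partial-pooling intuition described above can be illustrated outside any DDM with a toy normal-normal example: each subject's parameter is estimated from a handful of noisy trials, and an empirical-Bayes precision weighting pulls each estimate toward the group mean. This is only a sketch of the shrinkage principle, with variances assumed known (a real hierarchical model would estimate them):

```python
import numpy as np

rng = np.random.default_rng(0)
n_subj, n_trials = 20, 15
group_mu, group_sd, noise_sd = 0.5, 0.1, 0.4     # illustrative values only

true_mu = rng.normal(group_mu, group_sd, n_subj)               # subject-level parameters
data = rng.normal(true_mu[:, None], noise_sd, (n_subj, n_trials))

subj_mean = data.mean(axis=1)                                  # no-pooling estimates
# Precision-weighted compromise between each subject's mean and the grand mean
prec_subj = n_trials / noise_sd**2
prec_group = 1 / group_sd**2
shrunk = (prec_subj * subj_mean + prec_group * subj_mean.mean()) / (prec_subj + prec_group)

mse_raw = np.mean((subj_mean - true_mu) ** 2)
mse_shrunk = np.mean((shrunk - true_mu) ** 2)
```

In a full hierarchical Bayesian treatment (as in HDDM), the group mean and variance are themselves unknowns with posteriors, so the amount of shrinkage is learned from the data rather than fixed in advance.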

HDDM is an open-source software package written in Python which allows (i) the flexible construction of hierarchical Bayesian drift diffusion models and (ii) the estimation of their posterior parameter distributions via PyMC ([16]). User-defined models can be created via a simple Python script or used interactively via, for example, the IPython interpreter shell (Pérez & Granger, 2007). All run-time critical functions are coded in Cython ([1]) and compiled natively for speed, which allows estimation of complex models in minutes. HDDM includes many commonly used statistics and plotting functionality generally used to assess model fit. The code is released under the permissive BSD 3-clause license, test-covered to ensure correct behavior, and well documented. Finally, HDDM allows flexible estimation of trial-by-trial regressions where an external measurement (e.g. brain activity as measured by fMRI) is correlated with one or more decision making parameters.

With HDDM we aim to provide a user-friendly but powerful tool that can be used by experimentalists to construct and fit complex, user-specified models using state-of-the-art estimation methods to test their hypotheses. The purpose of this report is to introduce the toolbox and provide a tutorial on how to employ it; subsequent reports will quantitatively characterize its success in recovering model parameters and its advantages relative to non-hierarchical or non-Bayesian methods as a function of the number of subjects and trials (Sofer, Wiecki, & Frank).


We have presented a design principle for how decision-making should be implemented in the brain, and briefly summarised supporting evidence. Specifically, we propose that decision-making in threshold-based systems should compromise between speed and accuracy of decision-making by manipulating baseline activation in decision-making neural populations, rather than by manipulating thresholds, in order to implement stereotypical decisions under varying speed-accuracy tradeoffs. This could be formalised as an optimality argument: that decision-making systems should minimise variability in decision implementation across decision scenarios. Such optimality arguments are commonplace in behavioural disciplines such as behavioural ecology, where their predictions are tested against empirical data, and any disagreement is used to refine the theory [28]. In applying this normative approach from evolutionary biology to models of neuroscience, we hope to make a modest contribution to the programme of reconciling functional and mechanistic explanations of behaviour [29]. One potential limitation of our analysis is that the equivalence of changes in threshold and changes in baseline activation has only formally been demonstrated for linear models. Real neural systems are typically non-linear, but we argue that even though this equivalence does not hold for certain important non-linear models, the principle of maintaining a consistent threshold and varying baseline activation, even if the decision dynamics are changed as a result and this needs to be compensated for by the neural mechanisms, remains an important one that we should expect to see realised; the neurophysiological evidence reviewed above supports this view.

We suggest that our principle is not specific to one system but should be applicable to any response system. Decision-making takes place at many different levels of brain processing, and while more complex decision-related motor sequences undoubtedly can be affected by decision-task difficulty, we believe our principle should also hold at the most fundamental levels of action selection in the brain. Even the conceptually simplest decision-making mechanisms, such as the race model [14], can be expressed as accumulator models. Accumulators are also likely to be involved in more complex decision-making processes: the basal ganglia have been demonstrated to be involved in action selection, mediating access to motor control by different competing brain regions. A biologically plausible mathematical model of the basal ganglia has been proposed that is able to implement statistically optimal decision-making over multiple alternatives [30]. As with the accumulator models outlined above, this model is based on decision-making populations that must reach a threshold in order to precipitate the corresponding action, and this threshold may be adjusted to compromise between the speed and accuracy of decision-making. There is an interesting difference, however: in this model, the output nuclei of the basal ganglia must fall below an activation threshold in order for the corresponding action to be taken. The principle is nevertheless the same: for consistency of decision implementation we would expect this threshold to remain constant. We would therefore predict that accurate decisions should be implemented by raised baseline activation levels of decision-making neural populations in the basal ganglia, while fast decisions should be implemented by lower baseline activation. Bogacz et al.
[10] review four main theories of how speed-accuracy trade-offs can be managed in the cortico-basal ganglia circuit, and note that three involve a change in activation of some part of the circuit, whether striatum [12], cortical integrators [31]–[32], or subthalamic nucleus [33], while none modifies the threshold of the output nuclei. We suggest that it could be of interest to interpret not only models but also other data, already extant or yet to be generated, in terms of the proposal we have made here for how consistent decision implementation should be achieved.


Understanding the neuro-cognitive mechanisms underlying decision-making and reinforcement learning[1–3] has potential implications for many neurological and psychiatric disorders associated with maladaptive choice behavior[4–6]. Modeling work in value-based decision-making and reinforcement learning often relies on simple logistic (softmax) functions[7,8] to link model-based decision values to observed choices. In contrast, in perceptual decision-making, sequential sampling models such as the drift diffusion model (DDM) that not only account for the observed choices but also for the full response time (RT) distributions have a long tradition[9–11]. Recent work in reinforcement learning[12–15], inter-temporal choice[16,17] and value-based choice[18–21] has shown that sequential sampling models can be successfully applied in these domains.

In the DDM, decisions arise from a noisy evidence accumulation process that terminates as the accumulated evidence reaches one of two response boundaries[9]. In its simplest form, the DDM has four free parameters: the boundary separation parameter α governs how much evidence is required before committing to a decision. The upper boundary corresponds to the case in which the accumulated evidence reaches α, whereas the lower boundary corresponds to the case in which the accumulated evidence falls to zero. The drift rate parameter v determines the mean rate of evidence accumulation. A greater drift rate reflects a greater rate of evidence accumulation and thus faster and more accurate responding. In contrast, a drift rate of zero would indicate chance level performance, as the evidence accumulation process would have an equal likelihood of terminating at the upper or lower boundary (for a neutral bias). The starting point or bias parameter z determines the starting point of the evidence accumulation process in units of the boundary separation, and the non-decision time τ reflects components of the RT related to stimulus encoding and/or response preparation that are unrelated to the evidence accumulation process. The DDM can account for a wide range of experimental effects on RT distributions during two-alternative forced choice tasks[9].
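With those four parameters in hand, the accumulation process itself is straightforward to simulate as an Euler-Maruyama random walk. The sketch below uses illustrative parameter values and a fixed step size; real applications fit the model's likelihood rather than simulate, but the simulation makes the mechanics concrete:

```python
import numpy as np

def simulate_ddm(n_trials, a=1.0, v=0.3, z=0.5, tau=0.3,
                 dt=0.001, s=1.0, max_t=3.0, rng=None):
    """Simulate the four-parameter DDM (a, v, z, tau) by Euler-Maruyama.

    z is the relative starting point (fraction of a); s is the diffusion
    coefficient. Trials not absorbed within max_t seconds are dropped.
    Returns choices (1 = upper boundary, 0 = lower) and RTs in seconds.
    """
    rng = np.random.default_rng(rng)
    n_steps = int(max_t / dt)
    steps = v * dt + s * np.sqrt(dt) * rng.standard_normal((n_trials, n_steps))
    path = z * a + np.cumsum(steps, axis=1)         # evidence trajectory per trial
    hit_up, hit_lo = path >= a, path <= 0.0
    t_up = np.where(hit_up.any(axis=1), hit_up.argmax(axis=1), n_steps)
    t_lo = np.where(hit_lo.any(axis=1), hit_lo.argmax(axis=1), n_steps)
    first = np.minimum(t_up, t_lo)
    done = first < n_steps                          # keep absorbed trials only
    choices = (t_up < t_lo)[done].astype(int)
    rts = (first[done] + 1) * dt + tau
    return choices, rts

choices, rts = simulate_ddm(2000, rng=0)
```

With a positive drift and an unbiased start (z = 0.5), upper-boundary responses dominate, and the RT distributions at both boundaries show the characteristic right skew.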

The application of sequential sampling models such as the DDM has several potential advantages over traditional softmax[7] choice rules. First, including RT data during model estimation may improve both the reliability of the estimated parameters[12] and parameter recovery[13], thereby leading to more robust estimates. Second, taking into account the full RT distributions can reveal additional information regarding the dynamics of decision processes[14,15]. This is of potential interest, in particular in the context of maladaptive behaviors in clinical populations[14,22–25] but also when the goal is to more fully account for how decisions arise on a neural level[10].

In the present case study, we focus on a brain region that has long been implicated in decision-making, reward-based learning and impulse regulation[26,27], the ventromedial prefrontal / medial orbitofrontal cortex (vmPFC/mOFC). Performance impairments on the Iowa Gambling Task are well replicated in vmPFC/mOFC patients[26,28,29]. Damage to vmPFC/mOFC also increases temporal discounting[30,31] (but see[32]) and risk-taking[33–35], impairs reward-based learning[36–38] and has been linked to inconsistent choice behavior[39–41]. Meta-analyses of functional neuroimaging studies strongly implicate this region in reward valuation[42,43]. Based on these observations, we reasoned that vmPFC/mOFC damage might also render RTs during decision-making less dependent on value. In the context of the DDM, this could be reflected in changes in the value-dependency of the drift rate v. In contrast, more general impairments in the processing of decision options, response execution and/or preparation would be reflected in changes in the non-decision time. Interestingly, however, one previous model-free analysis in vmPFC/mOFC patients revealed a similar modulation of RTs by value in patients and controls[40].

The present study therefore had the following aims. The first aim was a validation of the applicability of the DDM as a choice rule in the context of inter-temporal and risky choice. To this end, we first performed a model comparison of variants of the DDM in a data set of nine vmPFC/mOFC lesion patients and nineteen controls. Since recent work on reinforcement learning suggested that the mapping from value differences to trial-wise drift rates might be non-linear[15] rather than linear[14], we compared these different variants of the DDM in our data and ran posterior predictive checks on the winning DDM models to explore the degree to which the different models could account for RT distributions and the relationship between RTs and subjective value. Second, we re-analyzed previously published temporal discounting data in controls and vmPFC/mOFC lesion patients to examine the degree to which our previously reported model-free analyses[30] could be reproduced using a hierarchical Bayesian model-based analysis with the DDM as the choice rule. Third, we used the same modeling framework to analyze previously unpublished data from a risky decision-making task in the same lesion patients and controls to examine whether risk taking in the absence of a learning requirement is increased following vmPFC/mOFC damage. Finally, we explored changes in choice dynamics as revealed by DDM parameters as a result of vmPFC/mOFC lesions, and investigated whether lesions to vmPFC/mPFC impacted the degree to which RTs were sensitive to subjective value differences, both by examining DDM parameters and via DDM mixture models.


RT Distributions from One and Many Accumulators.

We began by identifying the conditions under which an individual accumulator model (n = 1) and a large-ensemble accumulator model (n = 1,000) predict RT distributions with similar shapes, defined as overlapping 95% confidence intervals over all five RT quantiles (0.1, 0.3, 0.5, 0.7, 0.9). We observed that an individual accumulator model and a large ensemble accumulator model predict RT distributions with virtually indistinguishable shapes if accumulation rates are at least moderately correlated (rv ≥ 0.6) with intermediate termination rules. Much higher rate correlations (rv ≥ 0.9) are necessary under extreme termination rules (Fig. 2). Similar results were obtained under a pooling mechanism (Fig. 2, rightmost column). Thus, RT distributions can be explained both by an individual model accumulator and by accumulating activity of large neuronal ensembles only if their activation dynamics are moderately correlated and RT is not governed by extremely fast or slow accumulators.

Predicted RT distributions as a function of ensemble size (N), termination rule (pN), and accumulation rate correlation (rv). Each panel shows the 0.1, 0.3, 0.5, 0.7, and 0.9 RT quantiles on a log-log scale (the x axis ranges from 10^0 to 10^3; the y axis ranges from 10^2 to 10^3) as N, pN, and rv vary across columns and rows, respectively. We identified conditions (pN and rv) under which RT distributions were (i) invariant over the entire interval of N (i.e., [1, 1,000]; white panels), (ii) invariant with N over the interval [10, 1,000] (light gray panels), (iii) invariant with N over the interval [100, 1,000] (medium gray panels), and (iv) not invariant with N (dark gray panels).

RT Distributions Over a Range of Accumulator Ensemble Sizes.

We also investigated the invariance of RT distributions over a range of ensemble sizes to determine whether RTs may be invariant once some critical ensemble size is reached. Knowing that the same RT distributions are predicted whether an ensemble has 10 accumulators or 1,000 accumulators or more provides important insights into the properties of ensemble dynamics. It may be that ensembles need to be “large enough” but that the precise size of the ensemble has little effect on the RT that the ensemble generates.

Extending the analysis above, we investigated how RT distributions scale with accumulator ensemble size. We identified conditions under which a small-ensemble model (n = 10) and an intermediate-ensemble model (n = 100) predict RT distributions with similar shapes as a large-ensemble model (n = 1,000). RT distributions were invariant across ensembles with at least 10 accumulators if accumulation rates were at least modestly correlated (rv ≥ 0.3) and termination rules avoided the extremes (10% ≤ pN ≤ 90%). RT distributions were invariant across larger ensembles (n ≥ 100) with even lower rate correlations (rv ≥ 0.1). Only if accumulation rates were uncorrelated (rv = 0.0) or termination rules were extreme (pN = first and pN = last) did RT distributions vary dramatically in scale and shape with ensemble size (Fig. 2). Similar findings were observed when RT was determined by the pooling termination mechanism (Fig. 2, rightmost column) and with other accumulator model variants we investigated (SI Text, Robustness of Findings).

Variability in RT remains remarkably constant across different ensemble sizes over a large proportion of parameter space. Only for uncorrelated accumulators and extreme termination rules (pN = first or pN = last) does ensemble size affect RT variability, a lack of invariance anticipated by extreme value statistics. By analogy to the central limit theorem, we can perhaps anticipate why median RT remains invariant with ensemble size. However, there is no single mathematical property that might allow us to anticipate why variability in RT is invariant with ensemble size across correlated samples and intermediate termination rules, so we need to rely on simulation. To begin with, we know that for pN = first, variability decreases with ensemble size, and for pN = last, variability increases with ensemble size. So at some point in the range of termination rules we might expect an invariance of variability with ensemble size. What is striking is that this invariance is observed across all of the intermediate termination rules we investigated, not just a single value of termination rule. Also, for small ensemble sizes, variability is largely dominated by sampling variability across those few accumulators, and low correlations between accumulator rates may have only a small influence on the predicted variability from trial to trial. By contrast, for large ensemble sizes, variability is largely dominated by the between-trial variability introduced by the correlation between accumulator rates. These counteracting effects of ensemble size and correlation largely cancel each other out, producing invariance in RT distributions over a range of model parameters and model architectures (SI Text, Robustness of Findings) that we did not anticipate.
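To make the simulation logic concrete, here is a stripped-down sketch of an ensemble race under a fractional termination rule: drift rates are equicorrelated across accumulators within a trial, each accumulator rises deterministically at its rate (within-trajectory noise is omitted for brevity), and the ensemble responds when the fraction pN of accumulators has crossed threshold. All parameter values are illustrative, and this is not the authors' exact model:

```python
import numpy as np

def ensemble_rt(n_trials, n_acc, rv, p_n, theta=1.0, mu=1.0, sigma=0.3, rng=None):
    """RT of an N-accumulator ensemble under a fractional termination rule.

    Rates are equicorrelated (pairwise correlation rv) via a shared
    trial-level component; the ensemble RT is the time at which a
    fraction p_n of accumulators has crossed threshold theta.
    """
    rng = np.random.default_rng(rng)
    shared = rng.normal(size=(n_trials, 1))         # trial-level common component
    private = rng.normal(size=(n_trials, n_acc))    # accumulator-specific component
    rates = mu + sigma * (np.sqrt(rv) * shared + np.sqrt(1 - rv) * private)
    rates = np.clip(rates, 1e-6, None)              # keep rates positive
    t_cross = theta / rates                         # crossing time per accumulator
    k = max(1, int(np.ceil(p_n * n_acc)))           # k-th crossing triggers response
    return np.sort(t_cross, axis=1)[:, k - 1]

rt_small = ensemble_rt(3000, n_acc=10, rv=0.6, p_n=0.5, rng=0)
rt_large = ensemble_rt(3000, n_acc=1000, rv=0.6, p_n=0.5, rng=1)
```

With correlated rates and an intermediate rule (here pN = 0.5), the RT distribution barely moves as the ensemble grows from 10 to 1,000 accumulators, which is the invariance described above; setting rv = 0 or pushing p_n toward 0 or 1 breaks it.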

Invariance of ART with RT.

We then investigated how the trial-averaged ART from an individual accumulator can be invariant with RT even though RT is produced by a large ensemble of accumulators. Most accumulator models are based on thresholds that are invariant across RT (40–42), and multiple laboratories have observed invariant thresholds of neural discharge rate (6–17). However, the ART of an individual accumulator participating in the ensemble is not guaranteed to reach the same value on each trial because of the stochastic nature of its accumulation process—on some trials it has reached θ and contributes to the measured RT, but on other trials it has not yet reached θ and so does not contribute (Fig. 1C). Though it is trivially true for a single accumulator that ART will be invariant with RT, it is unknown whether large ensembles of accumulators with intermediate termination rules and accumulation rate correlations reproduce the invariance of ART with RT that is regularly measured in neurophysiology.

Just like a neurophysiology experiment would randomly sample one neuron in some brain region, we randomly selected one accumulator in the ensemble and measured ART for that accumulator on each simulated trial. We then quantified how the slope of the linear regression of ART over RT varied for ensembles of 10, 100, and 1,000 accumulators (Fig. 3), mimicking the approach used in neurophysiological analyses. For small ensembles (n = 10), ART was invariant over RT under intermediate termination rules (10% ≤ pN ≤ 90%) and moderate rate correlations (rv ≥ 0.4). With many accumulators (n = 1,000), the invariance of ART with RT was only violated for the earliest termination rule (pN = first) and low accumulation rate correlations (rv ≤ 0.3). Under a pooling mechanism, the invariance of ART with RT was never violated. Thus, the invariance of ART with RT emerges from the dynamics of individual accumulators operating in large ensembles, even though the dynamics of no single accumulator uniquely determine RT.

Relationship between ART and RT as a function of ensemble size (N), termination rule (pN), and accumulation rate correlation (rv). Each panel shows the linear regression slope of ART on RT, expressed as colored pixels, for three ensemble sizes (Left, n = 10; Center, n = 100; Right, n = 1,000) and all combinations of termination rules and accumulation rate correlations. Hatched pixels indicate parameter combinations for which ART varied systematically with RT. Thus, beige, nonhatched pixels represent parameter combinations for which the slope of the linear relationship between ART and RT was zero and nonsignificant.

Relationship Between ART and θ.

Finally, we explored how the ART measured from an individual accumulator relates to the actual threshold of that accumulator (θ). In the neurophysiology literature, it is commonly assumed that the ART of an individual neuron represents a threshold like that in stochastic accumulator models. However, because ART is a trial-averaged measure and the true threshold of a neuron (θ) is unknown, we do not know how closely the value of ART approximates the value of θ.

As expected, with n = 1, ART was constant with RT and identical across trials, and the measured ART equaled the model parameter θ. However, in ensembles operating under intermediate termination rules (10% < pN < 90%), ART varied significantly between trials (Fig. 4). Thus, individual accumulators acting in ensembles do not reach the same activation level at RT on each trial, meaning that measured ART is not necessarily equivalent to the threshold specified by the model (θ) for any given accumulator. Analogous nonequivalence was observed for pooling mechanisms. We further observed that the termination rule determined how closely ART approximated θ. Under early termination rules (pN < 50%), average ART was less than θ. Under late termination rules (pN > 50%), average ART was greater than θ. Under the median termination rule (pN = 50%), average ART equaled θ; this entails that the relationship between ART and θ cannot be determined without knowledge of the termination rule. The accumulation rate correlation determined the magnitude of variability in ART. The more homogeneous the accumulators, the smaller the variability in ART and the closer the agreement with θ, which implies that the degree of stochastic variation in ART is indicative of the homogeneity of the accumulation process in the ensemble. Together, though these findings demonstrate unanticipated complexity in the relationship between ART measured in an individual accumulator and the true θ that defines its dynamics, in conditions under which one accumulator resembles many, the average ART measured from neurons is a fair proxy of the relation of θ to RT.

Distribution of measured activation level around RT (ART) between trials in a randomly selected accumulator as a function of ensemble size (N), termination rule (pN), and accumulation rate correlation (rv). The x axis ranges from 10^0 to 10^3, and the y axis ranges from 10^2 to 10^3. Other conventions as in Fig. 2. Individual threshold (θ, red line) was identical across accumulators. Thus, correspondence between ART and θ is indicated by overlap of distributions (black lines) and threshold (red line).


Inhibitory control adapts to contextual statistics

Subjects performed an anticipatory version of the stop signal task (Fig. 2A; see Adaptive stop signal task) similar to that reported previously (Dunovan et al., 2015), with the exception of how contextual information was conveyed to the subject. Rather than explicitly cuing the subject as to the probability of seeing a stop signal on each trial, as was used in our previous study, subjects in the current experiment had to rely on performance feedback to learn the temporal distribution of stop signals in one of three contexts.

Adaptive stop signal task and contextual SSD statistics. A, Anticipatory stop signal task. Top, On Go trials, subjects were instructed to press a key when the ascending bar crossed a target line, always occurring 520 ms after trial onset. Feedback was given informing the subject whether their response was earlier or later than the Go target (maximum 100 points). Bottom, On Stop trials, the bar stopped and turned red before reaching the target line. If no response was made (correct), the subject received a bonus of 200 points. Failure to inhibit the key press resulted in a −100 point penalty. B, Stop signal statistics across contexts. Distributions show the sampling distributions for SSDs on context trials in the Early (blue), Uniform (gray), and Late (purple) groups. Early and Late SSDs were normally distributed (parameters stated as in-figure text, N(μ, σ)). Below the distributions, each row of tick marks indicates the context SSDs for a single example subject in each group. Bottom row of red tick marks indicates the five probe SSDs included for all subjects regardless of context.

To assess behavioral differences across contexts, we compared accuracy on stop signal trials at each Probe SSD across groups as well as the mean RTs on correct (response on Go trial) and error (i.e., response on Stop trial) responses. Separate one-way ANOVAs revealed a significant main effect of context across groups on both correct RTs, F(2,72) = 10.07, p < 0.001, and error RT (responses on stop signal trials), F(2,72) = 21.72, p < 0.00001. Consistent with our hypothesis, we found a significant interaction between context condition and Probe SSD, F(2.23,80.15) = 3.60, p = 0.027 (Fig. 3A). Shifting the mean of the context SSD distribution later into the trial led to delayed responding on Go trials (Fig. 3A, middle, right) as well as greater stopping accuracy on probe trials (Fig. 3A, left) in the Uniform and Late groups relative to the Early group. Thus, as predicted, participants could reliably learn to modulate their inhibitory control efficiency based on the probabilistic structure of prior stop signal timing (Shenoy and Yu, 2011).

Effects of context on stop accuracy and RTs. A, Subject-averaged stop accuracy (left) and cumulative RT distributions for correct (Go trials; middle) and error (Stop trials; right) responses in the Early (blue), Uniform (gray), and Late (purple) contexts. B, Posterror slowing following failed Stop trials in each context and subsequent decay over five trials. C, The posterror slowing observed immediately after a failed stop (terr + 1) in each context (i.e., first data point in B). Error bars and shaded area represent the 95% CI calculated across subjects.

We next examined whether failed Stop trials elicited any systematic changes in RT on subsequent trials. Figure 3B shows the immediate slowing and subsequent decay in RTs following a stop error (probe trials only), calculated with respect to the average RT on the five trials that preceded the error. A one-way ANOVA revealed a significant effect of context on the degree to which subjects slowed responses immediately following stop errors, F(2,72) = 4.27, p = 0.018. Unlike the observed effects on RT and accuracy, which scaled with differences in the mean SSD in each context, group differences in posterror slowing appeared to be driven by the variance of SSDs, with stop errors eliciting greater slowing in the Uniform context than in the Early and Late contexts (Fig. 3C). Collectively, these findings suggest that adaptive control is sensitive to multiple task dimensions and that these dimensions manifest in dissociable behavioral profiles.

Static DPM parameter identifiability

The DPM (Dunovan et al., 2015) assumes that an execution decision is made when an accumulating execution process, with onset time tr and drift rate ve, crosses an upper decision boundary a (see Computational models). On Stop trials, a nested braking process, with negative drift rate vb, is initiated at the current state of the execution process at the time of the SSD and accumulates back toward the lower boundary (always set equal to 0; Fig. 1A). The model successfully cancels an action when the braking process reaches the lower bound before the execution process terminates at the upper execution threshold.
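To make the race between the two processes concrete, a single trial can be sketched as a stochastic simulation. This is a minimal illustration of the dependency structure described above, not the authors' implementation; the function name, noise level, time step, and default deadline are our assumptions.

```python
import numpy as np

def simulate_dpm_trial(a, ve, vb, tr, ssd=None, dt=1e-3, si=0.1, tmax=1.2, rng=None):
    """One hypothetical DPM trial. Returns (choice, rt): choice is "go" if
    the execution process crossed the boundary a, "stop" if the braking
    process reached 0 first (Stop trials only); rt is the crossing time in
    seconds (None for successful stops)."""
    rng = np.random.default_rng() if rng is None else rng
    x = 0.0            # execution accumulator state
    t = tr             # accumulation begins after the onset delay tr
    braking, b = False, None
    while t < tmax:
        t += dt
        x += ve * dt + si * np.sqrt(dt) * rng.standard_normal()
        if ssd is not None and not braking and t >= ssd:
            braking, b = True, x   # braking starts at the execution state at SSD
        if braking:
            b += vb * dt + si * np.sqrt(dt) * rng.standard_normal()
            if b <= 0:
                return "stop", None  # braking reached lower bound: inhibited
        if x >= a:
            return "go", t           # execution crossed the upper boundary
    return "stop", None              # no response before the deadline
```

Note that the braking process inherits the execution process's current state at the SSD, which is what makes the two processes "dependent" rather than an independent race.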

For a cognitive model to be informative, it is important to verify that its parameters are identifiable, or able to be reliably estimated from observable measures of the target behavior. The issue of model identifiability is particularly relevant to novel variants of sequential sampling models, as several recently proposed models within this class have been found to exhibit poor identifiability despite providing convincing fits to experimental data (Miletić et al., 2017; White et al., 2018). More common variants, however, such as the drift-diffusion model and linear ballistic accumulator, are reasonably identifiable with sufficient trial counts and the application of appropriate optimization procedures (Ratcliff and Tuerlinckx, 2002; van Ravenzwaaij and Oberauer, 2009; Visser and Poessé, 2017).

In practice, the identifiability of a model can be assessed by performing fits to simulated data, for which the true parameters are known, and comparing the recovered estimates. To evaluate the identifiability of parameters in the DPM, we adopted the following procedure. First, we identified three generative parameter sets that approximated the average stopping accuracy curve and RT distributions observed in each context condition, ensuring that the generative parameter sets yielded plausible behavioral patterns. Each of these three generative parameter sets served as hyperparameters describing the mean of a normally distributed population from which 25 “subject-level” parameter sets were sampled and used to simulate 1000 trials (for sampling details, see Computational models). This produced a simulated dataset similar in size and dimension to that of the empirical data while capturing the assumption that subject parameter values in each context vary around a shared mean. Each of the three group-level parameter sets was used to generate 20 simulated datasets (each comprised of 25 randomly sampled subjects with 1000 trials per subject). The DPM was then fit to the subject-averaged stop accuracy and RT quantiles for each of the simulated datasets following the optimization routine outlined in Materials and Methods (i.e., here, “subject-averaged” data were calculated by first estimating the stop accuracy curve over probe SSDs, correct RT quantiles, and error RT quantiles for each subject then calculating the mean for each of these values across subjects).
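The hierarchical sampling step of this procedure can be sketched as follows. The group-level means, the subject-level SD, and all numeric values below are illustrative assumptions; the actual simulation and fitting stages are those summarized in Materials and Methods.

```python
import numpy as np

# Hypothetical group-level means for the four DPM parameters (illustrative only)
GROUP_MEANS = {"a": 0.45, "ve": 1.2, "vb": -1.0, "tr": 0.2}
GROUP_SD = 0.05  # assumed spread of subject parameters around the group mean

def sample_subject_sets(n_subjects=25, rng=None):
    """Draw subject-level parameter sets around the group-level means."""
    rng = np.random.default_rng() if rng is None else rng
    return [{name: rng.normal(mu, GROUP_SD) for name, mu in GROUP_MEANS.items()}
            for _ in range(n_subjects)]

# Each subject set would then drive 1000 simulated trials; the DPM is refit to
# the subject-averaged statistics, and the recovered estimates are compared
# against GROUP_MEANS to assess identifiability.
subjects = sample_subject_sets(rng=np.random.default_rng(42))
recovered = {k: float(np.mean([s[k] for s in subjects])) for k in GROUP_MEANS}
```

With 25 sampled subjects, the sample means should sit close to the generative group means, which is the baseline against which recovered fits are judged.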

Parameter estimates recovered from the fits are summarized in Figure 4A–D, with the recovered values for each parameter plotted against the respective generative value for each of the three sets. All parameters were recovered with a high degree of accuracy. In addition to accurately recovering generative parameter values, the DPM provided high-quality fits to the datasets generated from all three parameter sets, as shown by the positively skewed distribution of χ 2 values in Figure 4E. The results of this simulation and recovery analysis suggest that parameters of the DPM are identifiable when fitting group-level data and are robust to variability in the parameter values of individual subjects.

Simulation and parameter recovery analysis of DPM. A, True and estimated boundary height (a; blue), (B) braking drift rate (vb; red), (C) onset time (tr; purple), and (D) execution drift rate (ve; green) for three generative parameter sets. Lines indicate true generative parameter means. Lighter colors represent the range of sampled subject-level estimates. Squares represent estimated parameter means. Error bars represent ±1 SD. E, Distribution of χ2 values for fits to all 60 simulated datasets (gray).

Individual subject DPM fits

To better understand the cognitive mechanisms underlying the observed effects of feedback on timing and control behavior across contexts, we fit RT and stop accuracy data to the DPM. To isolate the parameters that were influenced by the experimental manipulations, the model fits were performed in multiple consecutive stages. To reduce the combinatorial space of possible model configurations, we adopted a forward stepwise model selection approach, beginning by comparing models in which a single parameter was free to vary across conditions (Table 3): execution drift (ve), braking drift (vb), onset delay (tr), or boundary height (a).

Static fit statistics for early and late contexts

Because the behavioral effects of interest were driven by trial-by-trial feedback (i.e., at the subject level) as well as differences in the sampling distributions of SSDs across contexts (i.e., at the group level), single parameter models were fit to behavioral data for individual subjects, allowing select parameters to vary between the first and second half of trials in the experiment, and to data at the group level, allowing parameters to vary across contexts.

Consistent with our previous study, in which subjects proactively modulated the drift rate of the execution process (ve) in a probabilistic cueing paradigm (Dunovan et al., 2015), we found that allowing ve to vary between the first and second half of trials provided the best average fit across subjects in the current experiment (AICv = −206.5, BICv = −202.2, SD = 23.47; Fig. 5A). Comparable IC scores were provided by alternative models (e.g., AICtr = −201.70, BICtr = −199.34, SD = 21.86). Thus, we also inspected the number of subjects for which each model outperformed the rest and found that the ve model was the best fitting model for more subjects (N = 33) than any alternative model (Fig. 5B). Parameter estimates (Fig. 5C) showed that ve values tended to increase over the course of the experiment, being higher in the second than in the first half of trials. Notably, this effect was most pronounced in the Early context, followed by the Uniform and Late contexts, respectively, suggesting that subjects in the Early context were more sensitive to timing errors than those in the Uniform and Late contexts.
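The AIC and BIC scores used for these comparisons follow the standard definitions. A short sketch (the variable names are ours; `log_lik` is the maximized log-likelihood, `k` the number of free parameters, and `n` the number of fitted data points):

```python
import math

def aic_bic(log_lik, k, n):
    """Standard information criteria; lower scores indicate a better fit
    after penalizing model flexibility."""
    aic = 2 * k - 2 * log_lik            # penalty grows linearly with k
    bic = k * math.log(n) - 2 * log_lik  # penalty also grows with sample size
    return aic, bic
```

Because the BIC penalty scales with log(n), it favors simpler models more strongly than the AIC as the number of fitted observations grows, which is why both scores are reported for each model variant.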

Single-subject DPM fits and model comparison. Variants of the DPM were fit to all individual subject datasets with different parameters left free to vary between the first and second half of trials. A, Mean subject BIC and AIC scores (dark circles) for boundary height (a; blue), execution drift rate (ve; green), braking drift rate (vb; red), and onset delay (tr; yellow). Lighter circles represent individual subjects. Error bars represent 95% CI. B, Same mean values as in A, but with the size of the dots scaled to reflect the number of subjects for which each model had the lowest AIC/BIC score. White text indicates the number of subjects best described by each model (i.e., the factor used to scale the size of the dots). C, Observed increase in ve values estimated for the first (v1) and second half (v2) of trials in each context. Error bars represent 95% CI. Schematic represents the relative increase in ve in Early (cyan), Uniform (gray), and Late (purple) contexts, compared with a shared initial drift rate (i.e., before learning; black dotted line).

Contextual modulation of DPM parameters

Next, we investigated whether the same mechanism was able to account for the observed differences in RT and stop accuracy at the group level, by first optimizing parameters to the average data in the Uniform context, where the timing of the stop signal is unpredictable, and then to the average data in the Early and Late contexts, holding all parameters constant at the best fitting Uniform values, except for one or two parameters of interest. The model that best accounted for differences in the stop accuracy and RT quantiles across the three context conditions was selected for further investigation of feedback-dependent learning mechanisms. The fitting routine (for details, see Materials and Methods) was repeated a total of 20 times using different initialization values for all parameters at the start of each run to avoid biases in the optimization process. The summary of fits to the Uniform context data is provided in Table 1. In line with our previous findings (Dunovan et al., 2015), as well as the outcome of single-subject fits in the current study, leaving the execution drift rate free provided a better account of context-dependent changes in behavior compared with alternative single-parameter models (best-fit AICve = −363.02; Fig. 6A).

Group-level model comparison and best fit predictions across contexts. A, AIC (dark) and BIC (light) scores for all single-parameter models, allowing execution boundary height (a; blue), execution drift rate (ve; green), braking drift rate (vb; red), or onset delay (tr; yellow) to vary across contexts. Three dual-parameter models were also included to test for possible benefits of allowing ve (the best fitting single-parameter model) to vary along with a (teal), vb (purple), or tr (dark green). Error bars indicate the 95% CI. B, Qualitative effects of context on a (top) and ve (bottom) parameter estimates in the Early and Late contexts. C, Model-predicted data (lines and larger transparent circles) simulated with best fit parameters from the ve, a model (corresponding to the dotted circle in A), overlaid on the average empirical data for Early (cyan), Uniform (gray), and Late (purple) contexts. Error bars represent 95% CI.

To further test the relationship between execution drift rate and context, we performed another round of fits to test for possible interactions between the execution drift rate and a second free parameter: boundary height (a), braking drift rate (vb), or onset delay (tr). The AIC and BIC scores from these fits showed that a combination of boundary height and execution drift rate (ve and a) provided the best overall fit to the data (best fit AICa,ve = −372.26), exceeding that of the drift-only model by a wide enough margin (|AICve − AICa,ve| = 9.24) to justify the added complexity of the dual-parameter model. Figure 6C shows a qualitative assessment of the ve, a model's goodness of fit, revealing a high degree of overlap between the simulated and observed stop accuracy and RT data in both the Early and Late conditions. These results suggest that there may be two targets of learning in the decision process: a strong modulation of the execution drift rate and a subtler modulation of the boundary height.

Adaptive DPM with dual-learning mechanisms

It is not clear from the preceding analysis whether error-driven changes in the drift rate and boundary height are able to capture trial-to-trial adjustments of response speed and stop accuracy as statistics of the environment are learned experientially. Here we explore how drift rate and boundary height mechanisms adapt on a trialwise basis to different sources of feedback to drive context-dependent control and decision-making.

We implemented two forms of corrective learning (Fig. 7A): one targeting the execution drift rate v and another targeting the height of the execution boundary a. We hereafter denote the execution drift rate as v rather than ve to avoid multiple subscripts in the adaptive model equations. On correct Go trials (Fig. 7A, left, middle), the current drift rate (vt) was updated (vt+1; Eq. 6) to reflect the signed difference between the model's RT on the current trial and the target time (TG = 520 ms), increasing the drift rate following "slow" responses (i.e., RTt > TG) and decreasing it following "fast" responses (i.e., RTt < TG). On failed Stop trials, vt was updated according to the same equation but with the error term reflecting the difference between RTt and the trial response deadline (TS = 680 ms), thus slowing the drift rate to reduce the probability of failed stops in the future. This form of RT-dependent modulation of the drift rate is motivated by recent findings demonstrating adaptation of action velocity by dopaminergic prediction error signaling in the striatum (Yttri and Dudman, 2016). In the context of the "believer-skeptic" framework (Dunovan and Verstynen, 2016), fast RT errors could reinforce the "skeptic" (i.e., indirect pathway) and suppress the "believer" (i.e., direct pathway) by decreasing dopaminergic tone in the striatum.

In addition to receiving feedback about errors in action timing, subjects also received penalties for failing to suppress responses on Stop trials. In the adaptive DPM, failed stops (Fig. 7A, right) caused an increase in the boundary height according to a δ function with height βt, which decayed exponentially on each subsequent trial until the boundary reached its baseline value a0 or another stop error occurred (Eq. 7). This form of adaptation in boundary height is motivated by physiological evidence that the STN plays a critical role in setting the threshold for action execution and that this relationship is modulated by error commissions (Cavanagh et al., 2014). On all correct Go trials and the first failed Stop trial, the timing errors were scaled by the same learning rate (α0). An additional parameter was included to modulate the sensitivity (π) to stop errors over time (Eq. 8), allowing the model to capture an observed decrease in stop accuracy over time in each of the context groups (Fig. 8C). According to Equation 8, π dropped exponentially over time at a rate p, acting as a scalar on αt (Eq. 9) and βt (Eq. 10) before updating the drift rate (Eq. 6) and boundary height (Eq. 7) after a failed stop. Higher values of p led to more rapid decay of π toward zero and, thus, more rapid desensitization to Stop trial errors.
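Because the display equations (Eqs. 6-10) are referenced but not reproduced in this text, the update scheme can be sketched in code. The functional forms below (a linear timing-error update for the drift rate, a δ-function boundary bump with exponential decay, and a stop-error sensitivity that decays with the number of stop errors) are our reading of the description, not a verbatim transcription of the original equations.

```python
import math

# Go target and Stop-trial response deadline from the task (seconds)
T_GO, T_STOP = 0.520, 0.680

class AdaptiveDPM:
    """Hypothetical sketch of the dual-learning updates described above."""

    def __init__(self, v0, a0, alpha0, beta0, p, decay=0.9):
        self.v, self.a = v0, a0          # current drift rate and boundary
        self.a0 = a0                     # baseline boundary height
        self.alpha0, self.beta0 = alpha0, beta0  # learning rates (Eqs. 9, 10)
        self.p = p                       # decay rate of stop-error sensitivity
        self.decay = decay               # assumed trialwise decay of the bump
        self.n_stop_err = 0              # failed stops so far

    def update(self, rt=None, failed_stop=False):
        pi = math.exp(-self.p * self.n_stop_err)  # Eq. 8 analogue: sensitivity
        if failed_stop:
            self.n_stop_err += 1
            # Eq. 6 analogue: timing error relative to the Stop deadline
            self.v += pi * self.alpha0 * (rt - T_STOP)
            # Eq. 7 analogue: delta bump in boundary height, scaled by pi
            self.a = self.a0 + pi * self.beta0
        elif rt is not None:
            # correct Go trial: timing error relative to the Go target
            self.v += self.alpha0 * (rt - T_GO)
            # boundary decays exponentially back toward its baseline
            self.a = self.a0 + self.decay * (self.a - self.a0)
```

Under this sketch, a slow Go response (RT > 520 ms) speeds the drift rate, a failed stop (RT < 680 ms) slows it and raises the boundary, and repeated stop errors progressively blunt both adjustments.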

Adaptive DPM parameter recovery and learning predictions in Uniform context. A, Schematic showing how the execution drift rate is modulated following timing errors on Go trials (left) and how the boundary height is modulated following failed inhibitions on Stop trials. B, Parameter recovery results for α (left, teal), β (middle, purple), and p (right, orange) parameters in the primary version of the adaptive DPM. Horizontal lines indicate true generative parameter means. Light colors represent the range of sampled subject-level estimates. Squares represent estimated parameter means. Error bars represent ±1 SD. Subject-averaged timeseries (dark line) and 95% CI (gray area) showing the (C) RT on Go trials and (D) accuracy on Stop trials. Each point in the timeseries (n = 30) represents the windowed average RT/accuracy over ∼30 trials. The corresponding adaptive model predictions are overlaid (dotted line), averaged over simulations to each individual subject's data.

Adaptive DPM modulates behavior according to context-specific control demands. A, Average stop accuracy curves (left) and correct (middle) and error (right) RT distributions predicted by adaptive model simulations in the Early (blue) and Late (purple) contexts (initialized with the optimal parameters of the Uniform context). B, Empirical timeseries of go RTs with model predictions overlaid for Early (left), Uniform (middle), and Late (right) contexts. C, Empirical and model predicted timeseries of stop accuracy for the same conditions as in B.

Adaptive DPM parameter identifiability

Before fitting the adaptive DPM to observed subject data, we first performed a parameter recovery analysis, similar to that conducted for the static DPM, to ensure that the learning rate and decay parameters introduced in the adaptive model could be reliably identified (for procedural details, see Materials and Methods). The parameter recovery results are displayed in Figure 7B, showing the recovered estimates for α, β, and p overlaid on the true values. For all three generative parameter sets, the optimization procedure for fitting the adaptive DPM accurately recovered the true values of α, β, and p. In one case (recovery estimates of α for parameter set 2), the 95% CI of recovered parameter estimates failed to overlap with the range of generative values; however, the trend of recovered α estimates followed the trend of the true values across parameter sets 1 (highest α), 2 (medium α), and 3 (lowest α).

Adaptive fits in the uniform context

After confirming the identifiability of learning parameters in the adaptive DPM, we next sought to confirm that the trial-averaged behavior of the adaptive model was preserved after fitting the learning rates (e.g., the stop accuracy curve on probe trials and RT quantiles on correct and error trials). The adaptive DPM's predictions are indeed closely aligned with the empirical statistics used to fit the static model (adaptive DPM χ2static = 0.005, static DPM χ2static = 0.011; Table 4). Although this is not necessarily surprising, it is promising to confirm that introducing feedback-dependent adaptation in the drift rate and boundary height parameters does not compromise the model's fit to trial-averaged statistics. Next, we inspected the degree to which this model captured changes in Go trial RT and Stop trial accuracy in the Uniform context. Indeed, the predicted time course of both behavioral measures showed a high degree of correspondence with the observed behavioral patterns (Fig. 7C,D). These qualitative fits show that it is indeed possible to capture feedback-dependent changes in RT and stop accuracy with the specific types of error learning in the ve (Eq. 6) and a (Eq. 7) parameters. Without an alternative model with which to compare, however, it is impossible to conclude anything about the specificity of these particular learning rules (e.g., the hypothesized dependencies of ve and a on timing and control errors, respectively). Therefore, we compared the fits afforded by the primary version of the adaptive DPM with an alternative version in which a was modulated by timing errors and ve was modulated by failed stops. In this alternative version, Equation 6 updates the boundary height at rather than the drift rate, increasing and decreasing a following fast (RTt < 520 ms) and slow (RTt > 520 ms) responses on Go trials, respectively, and increasing a on failed Stop trials in proportion to response speed.
Additionally, Equation 7 applies the error-driven decrement to the drift rate (υterr = υ0 − βt eterr), slowing ve by a magnitude of βt. Indeed, this alternative version of the adaptive model afforded an improvement over the static model fits to trial-averaged statistics (adaptive DPMalt χ2static = 0.007, static model χ2static = 0.011; Table 4); however, compared with the adaptive DPM in which ve was modulated by timing errors and a was increased following failed stops (χ2adapt = 0.235, AIC = −326.4, BIC = −320.1), fits of the alternative adaptive model (χ2adapt = 0.861, AIC = −248.6, BIC = −242.3) provided a worse fit to feedback-dependent changes in RT and stop accuracy over time in the Uniform context (Table 4).

Adaptive dependent process model fit statistics

Adaptive predictions in early and late contexts

Consistent with our original hypotheses, fits of the primary version of the adaptive DPM to behavior in the Uniform context highlight two possible mechanisms for acquiring the prior on the SSD: adaptive modulation of response speed by the drift rate and cautionary increases in boundary height following control errors. To confirm that these mechanisms work together to adaptively learn based only on the statistics of previous input signals, we took the average parameter scheme from the Uniform context fits and simulated each subject in the Early and Late contexts. If the context-dependent changes in the RT distributions and stop accuracy are indeed a reflection of the proposed learning mechanisms, then the model simulations should reveal similar RT and accuracy time courses as in the observed behavior.

Figure 8A shows the simulated stop-curve and RT distributions generated by the adaptive model based on feedback in the Early and Late conditions. As in the observed data (Fig. 3A), adaptation to Early SSDs led to impaired stopping accuracy but faster RTs relative to simulated predictions in the Late condition. In Figure 8B, C, the middle panels show the same trial-binned RT and stop accuracy means as in Figure 7C, D (Uniform condition), flanked by the corresponding time courses from simulations in the Early (left) and Late (right) conditions. The adaptive model predictions show a high degree of flexibility, conforming to idiosyncratic changes in the trialwise behavioral dynamics within each context SSD condition. For instance, the RTs in the Early condition exhibit a relatively minor and gradual decay over the course of the experiment (Fig. 8B, left), contrasting markedly with the early increase and general volatility of RTs in the Late condition (Fig. 8B, right). The adaptive DPM largely captures both patterns, underscoring feedback-driven adaptation in the drift rate as a powerful and flexible tool for commanding inhibitory control across a variety of settings. In addition to predicting group differences in the time course of RTs, the simulations in Figure 8C show a striking degree of precision in the model-estimated changes in stop accuracy, both over time and between groups.

Because the static model fits revealed marginal evidence for the drift-only model (Fig. 6A), we next asked whether this simpler model was able to account for the learning-related behavioral changes with the same precision as the dual-learning (i.e., drift and boundary) model. To test this hypothesis, we ran simulations in which the boundary learning rate was set to zero, thereby leaving only the drift rate free to vary in response to feedback. Figure 9A shows the error between observed and model-predicted estimates for each of the behavioral measures in Figure 3 (i.e., RT, stop accuracy, and posterror slowing) based on 20 simulations of the drift-only and dual-learning models. Compared with the drift-only model, the dual-learning model showed no significant benefits in terms of fit to the trialwise RT (t(24) = 1.09, p = 0.28) or accuracy (t(24) = 0.23, p = 0.82) but showed a marked improvement in the fit to posterror slowing (t(24) = −6.91, p < 0.00001) (Fig. 9A). Importantly, the interaction of drift rate and boundary adaptation in the dual-learning model not only reduced the error in the model fit, but recovered the same qualitative pattern of posterror slowing across contexts observed in the data (Fig. 9B). In contrast, the drift-only model predicted the largest posterror slowing effect in the Early condition (Fig. 9B, left). This is particularly revealing because no information about the observed posterror slowing was included in the adaptive cost function when fitting the learning-rate parameters. Collectively, these results suggest that goal-directed tuning of movement timing (i.e., RT) and control (i.e., stop accuracy) is best described by feedback-driven changes in the drift rate and boundary-height parameters of accumulation-to-bound decisions.

Utility of including boundary adaptation compared with drift-only model. A, Relative error of simulated compared with observed RT, accuracy, and posterror slowing on probe trial measures based on 20 simulated datasets for the drift-only and drift and bound adaptive models. Posterror slowing in each context condition as predicted by the (B) drift-only and (C) drift and bound models. For comparison with patterns observed in empirical data, see Figure 3C. Error bars indicate the 95% CI around the mean.


Psychon Bull Rev (2017) 24:547–556 DOI 10.3758/s13423-016-1081-y

The EZ diffusion model provides a powerful test of simple empirical effects Don van Ravenzwaaij1 · Chris Donkin2 · Joachim Vandekerckhove3

Published online: 28 June 2016 © The Author(s) 2016. This article is published with open access at

Abstract Over the last four decades, sequential accumulation models for choice response times have spread through cognitive psychology like wildfire. The most popular style of accumulator model is the diffusion model (Ratcliff, Psychological Review, 85, 59-108, 1978), which has been shown to account for data from a wide range of paradigms, including perceptual discrimination, letter identification, lexical decision, recognition memory, and signal detection. Since its original inception, the model has become increasingly complex in order to account for subtle, but reliable, data patterns. The additional complexity of the diffusion model renders it a tool that is only for experts. In response, Wagenmakers et al. (Psychonomic Bulletin & Review, 14, 3-22, 2007) proposed that researchers could use a more basic version of the diffusion model, the EZ diffusion. Here, we simulate experimental effects on data generated from the full diffusion model and compare the power of the full diffusion model and EZ diffusion to detect those effects. We show that the EZ diffusion model, by virtue of its relative simplicity, will sometimes be better able to detect experimental effects than the data-generating full diffusion model.

Keywords Sequential accumulator models · Response time analysis · Power · Model complexity

Electronic supplementary material The online version of this article (doi:10.3758/s13423-016-1081-y) contains supplementary material, which is available to authorized users.

Don van Ravenzwaaij, University of Groningen, Department of Psychology, Grote Kruisstraat 2/1, Heymans Building, room 169, 9712 TS Groningen, The Netherlands

University of New South Wales, Sydney, Australia

University of California, Irvine, CA, USA

In everyday life, we are constantly confronted with situations that require a quick and accurate action or decision. Examples include mundane tasks such as doing the dishes (we do not want to break china, but we also do not want to spend the next hour scrubbing) or vacuum cleaning (we like to get as many nooks and corners as possible, but also want to get back to finishing that paper), but also more serious activities, such as typing a letter or performing a placement test. For all these actions, there exists a trade-off, such that greater speed comes at the expense of more errors. This phenomenon is called the speed-accuracy trade-off (Schouten & Bekker, 1967; Wickelgren, 1977).

In experimental psychology it is common practice to study this speed-accuracy trade-off with relatively simple tasks. More often than not, the task requires participants to decide between one of two alternatives as quickly and accurately as possible. Notable examples include the lexical decision paradigm (Rubenstein et al., 1970), in which the participant is asked to classify letter strings as English words (e.g., LEMON) or non-words (e.g., LOMNE), and the moving dots task (Ball & Sekuler, 1982), in which participants have to determine whether a cloud of partly coherently moving dots appears to move to the left or to the right. Typically, the observed variables from these and other two-alternative forced choice tasks are distributions of response times (RTs) for correct and incorrect answers. One way to analyze the data from these kinds of tasks is to draw inferences based on one of, or both, the mean of the correct RTs or the percentage of correct responses. These measures, however, do not speak directly to underlying psychological processes, such as the rate of information processing, response caution, and the time needed for stimulus encoding and non-decision processes (i.e., response execution). They also do not address the speed-accuracy trade-off.

The motivation among cognitive psychologists to be able to draw conclusions about these unobserved psychological processes has led to the advent of sequential accumulator models. A prominent example of such a model is the diffusion model (Ratcliff, 1978). The model assumes that an observer accumulates evidence for responses until a threshold level of evidence for one of the responses is reached. The time taken to accumulate this evidence, plus a non-decision time, gives the observed response time, and the choice is governed by which particular threshold is reached.

Over the last four decades, as increasingly complex data patterns were observed, the diffusion model grew in order to account for these data. Ratcliff (1978) added the assumption that accumulation rate varied from trial to trial in order to account for the observation that incorrect responses were slower than correct responses. Ratcliff and Rouder (1998) assumed that the starting point of evidence could vary from trial to trial (following Laming, 1968), allowing them to account for incorrect responses that were faster than correct responses. Finally, Ratcliff and Tuerlinckx (2002) also proposed that non-decision time would vary across trials, an assumption that allowed the model to account for patterns in the speed with which the fastest responses were made. The version of the diffusion model that includes all components of between-trial variability is known henceforth as the 'full' diffusion model. As a theoretical model of decision-making, the full diffusion model is impressive: it accounts for a wide range of reliable empirical phenomena.
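The three between-trial variability components just described can be made concrete with a minimal single-trial simulation. Parameter names follow common diffusion-model notation, but the default values, noise scaling (s = 1), and time step are our assumptions rather than anything specified in the text.

```python
import numpy as np

def ddm_trial(v=1.0, a=1.0, z=0.5, ter=0.3, eta=0.1, sz=0.05, st=0.05,
              s=1.0, dt=1e-3, rng=None):
    """One full-diffusion-model trial with between-trial variability in
    drift (eta), starting point (sz), and non-decision time (st).
    Returns (boundary, rt)."""
    rng = np.random.default_rng() if rng is None else rng
    drift = rng.normal(v, eta)                    # trial-to-trial drift rate
    x = a * rng.uniform(z - sz, z + sz)           # trial-to-trial start point
    t0 = rng.uniform(ter - st / 2, ter + st / 2)  # trial-to-trial non-decision
    t = 0.0
    while 0.0 < x < a:                            # accumulate until a boundary
        x += drift * dt + s * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return ("upper" if x >= a else "lower"), t0 + t
```

Drift variability produces slow errors, starting-point variability produces fast errors, and non-decision-time variability shifts the leading edge of the RT distribution, which is exactly the division of labor described above.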
Among others, the diffusion model has been successfully applied to experiments on perceptual discrimination, letter identification, lexical decision, categorization, recognition memory, and signal detection (e.g., Ratclif,f 1978 Ratcliff et al., 2004, 2006 Klauer et al., 2007 Wagenmakers et al., 2008 van Ravenwaaij et al., 2011 Ratcliff et al., 2010). Using the diffusion model, researchers have examined the effects on decision making of alcohol intake (van Ravenzwaaij et al. 2012), playing video games (van Ravenzwaaij et al. 2014), sleep deprivation (Ratcliff & van Dongen, 2009), anxiety (White et al., 2010), and hypoglycemia (Geddes et al. 2010). The model has also been applied extensively in the neurosciences (Ratcliff et al., 2007 Philiastides et al., 2006 Mulder et al., 2012). In recent years, researchers have begun to use the full diffusion model as a measurement model. A diffusion model analysis takes as input the entire RT distribution for both correct and incorrect responses. The model maps the

observed RTs and error rates into a space of psychological parameters, such as processing speed and response caution. Such an analysis has clear benefits over traditional analyses, which make no attempt to explain observed data in terms of psychologically meaningful processes. A full diffusion model analysis is complicated, for two reasons. First, the model is complicated to use. Parameters for the model are estimated using optimization on functions that involve numerical integration and infinite sums. While there have been valiant efforts to make such models easier to use (Donkin et al., 2009, 2011; Vandekerckhove & Tuerlinckx, 2007, 2008; Voss & Voss, 2007), the application of a full diffusion model remains an approach most suited for experts. Second, the model itself may be more complex than is required by the data it is being used to fit, at least when the model is being used as a measurement model. When the data do not provide enough constraint on the estimation of model parameters, the more complex model will overfit the data, which leads to increased variability in parameter estimates.¹ In response to the complexity of the full diffusion model, Wagenmakers et al. (2007) advocated for the use of the “EZ diffusion model”. The EZ diffusion model forgoes between–trial variability in accumulation rate, starting point, and non–decision time, as well as a–priori response bias (but see Grasman et al., 2009). By removing all of these additional model components, no fitting routine is required to estimate the parameters of the EZ diffusion model. Instead, the EZ diffusion model takes RT mean, RT variance, and percentage correct, and transforms them into a mean rate of information accumulation, response caution, and a non–decision time. The EZ model has been heralded for the ease with which it can be applied to data. However, critics have claimed that it is “too EZ” (Ratcliff, 2008; but see Wagenmakers et al., 2008).
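The EZ transformation has a closed form. Below is a minimal Python sketch following the moment equations of Wagenmakers et al. (2007), with the conventional scaling s = 0.1; the function name is ours, and the edge-correction needed for accuracy of exactly 0, 0.5, or 1 is noted but not implemented:

```python
import math

def ez_diffusion(prop_correct, rt_var, rt_mean, s=0.1):
    """Map accuracy, RT variance, and RT mean of correct responses onto
    drift rate (v), boundary separation (a), and non-decision time (Ter).
    Assumes 0.5 < prop_correct < 1; in practice, accuracies of exactly
    0, 0.5, or 1 require an edge correction (Wagenmakers et al., 2007)."""
    L = math.log(prop_correct / (1 - prop_correct))  # logit of accuracy
    x = L * (L * prop_correct**2 - L * prop_correct + prop_correct - 0.5) / rt_var
    v = math.copysign(1, prop_correct - 0.5) * s * x**0.25   # drift rate
    a = s**2 * L / v                                         # boundary separation
    y = -v * a / s**2
    mdt = (a / (2 * v)) * (1 - math.exp(y)) / (1 + math.exp(y))  # mean decision time
    ter = rt_mean - mdt                                      # non-decision time
    return v, a, ter

# Worked example from Wagenmakers et al. (2007):
v, a, ter = ez_diffusion(0.802, 0.112, 0.723)
# v ≈ 0.100, a ≈ 0.140, Ter ≈ 0.300
```

Note that no iterative fitting is involved: the three observed statistics determine the three parameters exactly, which is why the error rate is always described perfectly.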
It is true that the EZ diffusion model cannot account for the very broad range of data patterns for which the full diffusion model was developed. However, the patterns of fast and slow errors, and shifting leading edges, that warrant the full complexity of the diffusion model are typically observed in experiments that are specifically designed to elicit such patterns, usually involving many thousands of trials. It is unclear whether such complex patterns can be detected in data coming from simpler experiments, at least to the point that they constrain the estimation of additional model parameters.

¹ To see this, imagine an experimental design in which the between–trial variability in accumulation rate parameter of the diffusion model, η, is unidentifiable (i.e., every value of η̂ can yield the same likelihood value). If we were to fit a model to data that includes η, the maximum likelihood values of the other parameters in the model, θ, will be estimated conditional on η̂. Because the parameters in the diffusion model are correlated, the value of θ̂ depends on η̂. As such, estimating η artificially increases the variability in estimates of θ.

Psychon Bull Rev (2017) 24:547–556

Fig. 1 The diffusion model and its parameters as applied to a moving dots task (Ball & Sekuler, 1982). Evidence accumulation begins at starting point z, proceeds over time guided by mean drift rate ν, and stops whenever the upper or the lower boundary is reached. Boundary separation a quantifies response caution. Observed RT is an additive combination of the time during which evidence is accumulated and non–decision time Ter

Van Ravenzwaaij and Oberauer (2009) examined the ability of both the full and the EZ model to recover the mean structure and individual differences of parameter values used to generate synthetic data. The authors concluded that EZ was well able to recover individual differences in the parameter structure, but exhibited a bias in the recovery of the mean structure. Interestingly, the full diffusion model was unable to recover individual differences in the across–trial variability parameters, casting doubt on the added value of these extra parameters in more “typical” data sets. Recovery of the mean structure depended very much on the specific implementation. Here, we show that the additional complexity of the full diffusion model has adverse consequences when one aims to use the model to detect the existence of an empirical effect. Simplifying the parametric assumptions of the diffusion model leads to increased precision in parameter estimation at the cost of possible bias due to model mis–specification (the bias–variance trade–off; Geman et al., 1992). However, for the purposes of decision–making, bias is not necessarily detrimental (Gigerenzer & Brighton, 2009), while higher precision leads to stronger and more accurate inference (Hastie et al., 2005). One of the aims of this manuscript is to help non–experts decide when to use the EZ model over the full diffusion model. We simulate data in which we systematically vary the three main diffusion model parameters between two conditions: drift rate, boundary separation, and non–decision time. The data are simulated from a full diffusion model. We then show that, compared to the full diffusion model, the EZ diffusion model is the more powerful tool for identifying differences between two conditions in speed of information accumulation or response caution. We show that this holds across simulations that differ in the number of trials per

participant, the number of participants per group, and the size of the effect between groups. We compare the proportion of times that the EZ and the full diffusion model detected a between–group effect on either the mean speed of information accumulation, response caution, or non–decision time parameters (in terms of the result of an independent–samples t–test). The remainder of this paper is organized as follows: in the next section, we discuss the diffusion model in detail. We examine the simple diffusion model, the full diffusion model, and EZ. In the section after that, we discuss the specific parameter settings for our simulation study. Then, we present the results of our simulations. We conclude the paper with a discussion of our findings and the implications for cognitive psychologists looking to analyze their data with the diffusion model.

The diffusion model

In the diffusion model for speeded two–choice tasks (Ratcliff, 1978; Vandekerckhove & Tuerlinckx, 2007; van Ravenzwaaij et al., 2012), stimulus processing is conceptualized as the accumulation of noisy evidence over time. A response is initiated when the accumulated evidence reaches a predefined threshold (Fig. 1). The decision process begins at starting point z, after which information is accumulated with a signal–to–noise ratio that is governed by mean drift rate ν and within–trial standard deviation s.² Mean drift rate ν values near zero produce long RTs and near–chance performance. Boundary separation a determines the speed–accuracy trade–off: lowering boundary separation a leads to faster RTs at the cost of more errors. Together, these parameters generate a distribution of decision times DT. The observed RT, however, also consists of stimulus–nonspecific components such as response preparation and motor execution, which together make up non–decision time Ter. The model assumes that Ter simply shifts the distribution of DT, such that RT = DT + Ter (Luce, 1986). Hence, the four core components of the diffusion model are (1) the speed of information processing, quantified by mean drift rate ν; (2) response caution, quantified by boundary separation a; (3) a–priori response bias, quantified by starting point z; and (4) mean non–decision time, quantified by Ter.

² Mathematically, the change in evidence X(t) is described by a stochastic differential equation dX(t) = ν · dt + s · dW(t), where s · dW(t) represents the Wiener noise process with mean 0 and variance s² · dt. The standard deviation parameter s is often called the “diffusion coefficient” and serves as a scaling parameter that is often set to 0.1 or to 1.
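The accumulation process just described can be illustrated with a naive Euler–Maruyama simulation of the stochastic differential equation above. This is a didactic sketch with illustrative parameter values, not an estimation routine:

```python
import math
import random

def simulate_trial(v=0.223, a=0.125, z=None, ter=0.435, s=0.1, dt=0.001):
    """Simulate one diffusion trial. Evidence starts at z (a/2 if None,
    i.e., unbiased), drifts at rate v with Wiener noise of standard
    deviation s per unit time, and stops at boundary 0 or a.
    Returns (observed RT in seconds, whether the upper boundary was hit)."""
    x = a / 2 if z is None else z
    t = 0.0
    while 0 < x < a:
        x += v * dt + s * math.sqrt(dt) * random.gauss(0, 1)
        t += dt
    return t + ter, x >= a  # RT = decision time + non-decision time

random.seed(1)
trials = [simulate_trial() for _ in range(200)]
accuracy = sum(hit for _, hit in trials) / len(trials)
```

With a positive drift rate the upper boundary plays the role of the correct response, so `accuracy` comes out well above chance, and every simulated RT exceeds the non-decision time Ter.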

Full diffusion

The simple diffusion model can account for most data patterns typically found in RT experiments, but has difficulty accounting for error response times that have a different mean from correct response times (Ratcliff, 1978). One way for the model to produce slow errors is with the inclusion of across–trial variability in drift rate. Such variability will lead to high drifts that produce fast correct responses and low drifts that produce slow error responses. One way for the model to produce fast errors is with the inclusion of across–trial variability in starting point (Laming, 1968; Link, 1975; Ratcliff & Rouder, 1998). Such variability will cause most errors to happen because of an accumulator starting relatively close to the error boundary, whereas correct responses are still relatively likely to happen regardless of the starting point. As a consequence, the accumulated evidence will be lower on average for error responses than for correct responses, resulting in fast errors. For a more elaborate explanation of both of these phenomena, the reader is referred to Ratcliff and Rouder (1998, Fig. 2). Thus, the full diffusion model includes parameters that specify across–trial variability in drift rate, η, and in starting point, sz. Furthermore, the model includes an across–trial variability parameter for non–decision time, st, to better account for the leading edge of response time distributions (e.g., Ratcliff & Tuerlinckx, 2002).

EZ diffusion

The EZ diffusion model presents the cognitive researcher with an alternative that does not require familiarity with complex fitting routines, nor does it require waiting a potentially long time for the model to estimate parameters from the data (Wagenmakers et al., 2007). All the researcher needs is to execute a few lines of code, and the EZ parameters will be calculated instantaneously. The closed–form solutions for the EZ diffusion model require the assumption that there is no between–trial variability in drift rate, η, starting point, sz, or non–decision time, st. Further, the model assumes that responses are unbiased (i.e., z is fixed at half of a). The EZ diffusion model converts RT mean, RT variance, and percentage correct into the three key diffusion model parameters: mean drift rate ν, boundary separation a, and non–decision time Ter. The EZ diffusion model parameters are computed such that the error rate is described perfectly. EZ calculates diffusion model parameters for each participant and each condition separately. For applications of the EZ diffusion model, see, e.g., Schmiedek et al. (2007, 2009), Kamienkowski et al. (2011), and van Ravenzwaaij et al. (2012).

Power simulations

We conducted four sets of simulations. For every set, we generated 4,500 data sets from the full diffusion model. All of the data sets were intended to mimic a two–condition between–subjects experiment. In the first three sets of simulations, we varied one of the three main diffusion model parameters systematically between the two groups. The fourth set of simulations was identical to the first, except that we varied the mean starting point parameter. The range of parameters we used was based on the distribution of observed diffusion model parameters, as reported in Matzke and Wagenmakers (2009). For Group 1, we sampled individual participant diffusion parameters from the following group distributions:

ν ∼ N(0.223, 0.08) T(0, 0.586)
a ∼ N(0.125, 0.025) T(0.056, 0.393)
Ter ∼ N(0.435, 0.09) T(0.206, 0.942)
z = bias × a
η ∼ N(0.133, 0.06) T(0.05, 0.329)
sz ∼ N(0.3 × a, 0.09 × a) T(0.05 × a, 0.9 × a)
st ∼ N(0.183, 0.037) T(0, 0.95 × Ter)

The notation ∼ N(·, ·) indicates that values were drawn from a normal distribution with mean and standard deviation given by the first and second number in parentheses, respectively. The notation T(·, ·) indicates that the values sampled from the normal distribution were truncated between the first and second numbers in parentheses. Note that in the first three sets of simulations we fixed bias = 1/2 in both the simulations and the model fits, reflecting an unbiased process such as might be expected if the different


boundaries indicate correct vs. incorrect responding. In the fourth set of simulations, we relaxed this assumption and varied bias according to bias ∼ N(0.5, 0.04) T(0.25, 0.75).
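The truncated-normal sampling scheme above can be sketched with simple rejection sampling (the helper name `rtnorm` is ours; `scipy.stats.truncnorm` would serve equally well):

```python
import random

def rtnorm(mean, sd, lo, hi):
    """Draw one value from a normal(mean, sd) truncated to (lo, hi),
    via rejection sampling (fine here because truncation is mild)."""
    while True:
        x = random.gauss(mean, sd)
        if lo < x < hi:
            return x

random.seed(0)
# Group 1 parameters for one simulated participant (values from the text):
v   = rtnorm(0.223, 0.08,  0.0,   0.586)   # mean drift rate
a   = rtnorm(0.125, 0.025, 0.056, 0.393)   # boundary separation
ter = rtnorm(0.435, 0.09,  0.206, 0.942)   # non-decision time
z   = 0.5 * a                               # unbiased starting point (bias = 1/2)
```

Repeating these draws per participant, with the Group 2 mean of the focal parameter shifted by the chosen effect size, yields one simulated experiment.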

For each data set, we fit both the full and the EZ diffusion models. We recorded whether the obtained p–value was smaller than the traditional α of .05. Our analysis centers on the proportion of the 100 simulations for which the p–value was less than α.

Results

For Group 2, all individual participant diffusion parameters were sampled from the same group distributions, except for either drift rate ν (sets 1 and 4), boundary separation a (set 2), or non–decision time Ter (set 3).³ For each parameter, we ran three different kinds of effect size simulations: a small, a medium, and a large effect size. Depending on the effect size, the Group 2 mean of the parameter of interest was larger than the Group 1 mean by 0.5, 0.8, or 1.2 within–group standard deviations for the small, the medium, and the large effect size, respectively. To illustrate for drift rate ν, depending on the simulation, we sampled individual participant diffusion parameters for Group 2 from the following group distributions:

The results of the drift rate ν, boundary separation a, and non–decision time Ter simulations (sets 1 to 3) are shown in Figs. 2, 3, and 4, respectively. In all plots, the y–axis shows the proportion of 100 simulations for which a t–test on the focal parameter of the two groups yielded a p–value below .05.
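The power computation itself reduces to counting significant t-tests across replications. The following self-contained sketch uses stand-in normally distributed per-participant drift-rate estimates rather than actual model fits; all names and values are illustrative:

```python
import random
from statistics import mean, stdev

def t_stat(x, y):
    """Independent-samples t statistic (equal-variance form)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

random.seed(2)
crit = 2.024     # two-sided critical t for alpha = .05, df = 38
n_sim = 200
hits = 0
for _ in range(n_sim):
    # Stand-in per-participant drift-rate estimates, 20 per group,
    # with a medium (0.8 SD) between-group effect on the mean:
    g1 = [random.gauss(0.223, 0.08) for _ in range(20)]
    g2 = [random.gauss(0.223 + 0.8 * 0.08, 0.08) for _ in range(20)]
    hits += abs(t_stat(g1, g2)) > crit
power = hits / n_sim   # proportion of simulations detecting the effect
```

With these settings the expected power is roughly 0.7; in the actual study, the group values entering the t-test are the parameter estimates recovered by each model, so any extra estimation noise directly lowers this proportion.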

Author Summary

In models of decision making, evidence is accumulated until it crosses a threshold. The amount of evidence is directly related to the strength of the sensory input for the decision alternatives. Such one-stage models predict that if two stimulus alternatives are presented in succession, the stimulus alternative presented first dominates the decision, as the accumulated evidence will reach the threshold for this alternative first. Here, we show that for short stimulus durations decision making is not dominated by the first, but by the second stimulus. This result cannot be explained by classical one-stage decision models. We present a two-stage model where sensory input is first integrated before its outcome is fed into a classical decision process.

Citation: Rüter J, Marcille N, Sprekeler H, Gerstner W, Herzog MH (2012) Paradoxical Evidence Integration in Rapid Decision Processes. PLoS Comput Biol 8(2): e1002382.

Editor: Olaf Sporns, Indiana University, United States of America

Received: May 4, 2011 Accepted: December 23, 2011 Published: February 16, 2012

Copyright: © 2012 Rüter et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: JR is funded by the Swiss National Science Foundation (SNF) project No. 320030-119736 “The dynamics of feature fusion”. MHH and WG receive support from the SNF Sinergia project CRSIKO-122697 “State representation in reward based learning - from spiking neuron models to psychophysics”. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.


The authors would like to thank Isabel Gauthier for advice on using the NOMT.


This work was supported by a Clinical and Translational Research Enhancement Award from the Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center. JST and WRH were supported by National Science Foundation grant SES-1556415.

Availability of data and materials

All de-identified data are available on the Open Science Framework at



Thirty healthy subjects with normal or corrected-to-normal vision were recruited and completed one session of testing. Ten subjects were not invited for the rest of the experiment and were excluded from further analyses because they had difficulties in generating five distinct levels of speed stress. None of the analyses reported in the paper were used as exclusion criteria, and no analyses were performed on data from the first session alone. The remaining twenty subjects completed all five sessions of the experiment (13 females, age range: 18–28). All subjects signed informed consent and were compensated for their participation. The protocol was approved by the Georgia Institute of Technology Institutional Review Board. All methods were carried out in accordance with relevant guidelines and regulations.

Subjects performed an orientation discrimination task where they indicated whether a Gabor patch (radius = 4°) embedded in random pixel noise was tilted counterclockwise (“left”) or clockwise (“right”) from vertical. The orientation of the Gabor patch was ± 45° away from the vertical and was randomly determined on each trial. Immediately after providing their response, subjects were given detailed feedback that indicated (i) whether their response was correct or wrong, (ii) the exact response time on the trial, (iii) the points gained or lost on the trial, and (iv) the total accumulated points up to that point of the experiment (Fig. 2).

Each trial started with subjects fixating on a small white dot at the center of the screen for 1000 ms, followed by a presentation of the Gabor patch for 33 ms. There was no deadline on the response, but long RTs were penalized in all conditions except the “extremely slow” condition. Feedback remained on the screen until the subject pressed a button on the keyboard to advance to the next trial.

The experiment included five different speed-accuracy tradeoff conditions—“extremely fast,” “fast,” “medium,” “slow,” and “extremely slow.” Each condition featured a different penalty on response time (RT) starting from no penalty at all in the “extremely slow” condition and increasing the size of the penalty for the other four conditions. Based on the accuracy and RT on each trial, subjects gained points based on the following formula:

where Acc is the accuracy on the current trial (1 for correct responses, 0 for wrong responses), and penalty and adjustment are the penalty magnitude on RT and the adjustment for the particular condition. The parameter penalty was set to 4, 2, 1, 0.5, and 0, respectively, for the five conditions with decreasing speed stress. To ensure that the five conditions resulted in relatively similar earnings for the subjects, we added the parameter adjustment, which added or subtracted a fixed number of points on each trial such that conditions with greater speed stress received the largest positive adjustments. Specifically, the value of adjustment was set to +1, +0.3, 0, −0.2, and −0.5, respectively, for the five conditions with decreasing speed stress. Finally, we wanted to prevent a strategy where a subject simply held the keyboard button pressed even before the stimulus appeared. Therefore, a response time of less than 34 ms (roughly corresponding to the duration of the stimulus presentation itself) was penalized by subtracting 3 points; such trials were accompanied by a feedback screen that read “Looks like you held the button before the stimulus appeared,” coupled with information about the 3 points lost and the total number of points so far. At the end of the experiment, every point was rewarded with 1 cent of bonus payment.


Subjects came for five different sessions, each consisting of 1000 experimental trials. Before the start of the first session, subjects were given detailed instructions about the different conditions and their associated formulas for gaining points. It was specifically emphasized that the best strategy in the “extremely fast” condition was to respond as quickly as possible irrespective of accuracy. This instruction was given explicitly because the majority of subjects appeared unwilling to guess randomly for all trials of a given condition without being explicitly told that such a strategy was allowed. Before the start of sessions 2–5, subjects were briefly reminded of these instructions. Each session started with a short training which included one 25-trial block of each of the five conditions.

In each session, subjects completed 4 runs, each consisting of 5 blocks of 50 trials each. Each block consisted of a single condition, and each run included blocks from all five different conditions in a randomized order. At the beginning of each block, subjects were given the name of the condition for that block (“extremely fast,” “fast,” “medium,” “slow,” or “extremely slow”). In each block, we pseudo-randomly interleaved trials with five different Gabor patch contrasts such that each contrast was presented exactly 10 times. The contrasts were the same for each subject and were chosen based on pilot testing to produce a range of performance levels. The exact contrasts used were 4.98%, 6.39%, 8.21%, 10.54%, and 13.53% (the contrasts were chosen to be equally spaced in log space: they are equal to e^(−3), e^(−2.75), e^(−2.5), e^(−2.25), and e^(−2), respectively).
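The log-spacing of the contrasts is easy to verify:

```python
import math

# Five contrast levels equally spaced in log space, from e^-3 to e^-2
log_contrasts = [-3 + 0.25 * i for i in range(5)]
contrasts = [math.exp(c) for c in log_contrasts]
percentages = [round(100 * c, 2) for c in contrasts]
# percentages matches the values reported above:
# [4.98, 6.39, 8.21, 10.54, 13.53]
```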


The experiment was designed in the MATLAB environment using Psychtoolbox 3 [52]. The stimuli were presented on a 21.5-inch iMac monitor (1920 × 1080 pixel resolution, 60 Hz refresh rate) in a dark room. Subjects were seated 60 cm away from the screen and provided their responses using a computer keyboard.

Behavioral analyses

We removed all trials with RTs shorter than 150 ms or longer than 1500 ms. This step resulted in removing an average of 2.3% of total trials (range 0.3–4.7% per subject). Similar results were obtained if RT outliers were instead determined separately for each condition using Tukey’s interquartile criterion. We then computed, for each combination of SAT condition and Gabor patch contrast, the average difference between error and correct RTs (RT_error − RT_correct), the median RT for each condition, the ratio of the standard deviation to the mean of the RTs (sd(RT)/mean(RT)), and the skewness of the RT distribution. The skewness was computed as Pearson’s moment coefficient of skewness, which is equal to E[((X − μ)/σ)³], where μ and σ are the mean and standard deviation of the distribution, respectively. In addition, we computed the signal detection theory parameter d′ that quantifies stimulus sensitivity in signal-to-noise units [53]. Specifically, d′ was computed by treating correctly judged clockwise orientations as hits and applying the formula d′ = Φ⁻¹(HR) − Φ⁻¹(FAR),

where Φ⁻¹ is the inverse of the cumulative standard normal distribution that transforms HR and FAR into z-scores.
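The descriptive measures above can be sketched with a few helper functions (hypothetical helpers written by us, not the authors' code; the skewness helper uses the sample standard deviation for simplicity):

```python
from statistics import mean, stdev, NormalDist

def cv(rts):
    """Coefficient of variation of RT: sd(RT) / mean(RT)."""
    return stdev(rts) / mean(rts)

def skewness(rts):
    """Pearson's moment coefficient of skewness: E[((X - mu)/sigma)^3]."""
    mu, sigma = mean(rts), stdev(rts)
    return mean(((x - mu) / sigma) ** 3 for x in rts)

def d_prime(hit_rate, fa_rate):
    """Signal detection sensitivity: z(HR) - z(FAR)."""
    z = NormalDist().inv_cdf   # inverse cumulative standard normal
    return z(hit_rate) - z(fa_rate)

# Example: 80% hits and 20% false alarms give d' of about 1.68
sensitivity = d_prime(0.8, 0.2)
```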

We quantified the quadratic trends of the curves produced by the five different SAT conditions by fitting a quadratic model y = ax² + bx + c. The model was fit separately to each contrast of each subject. Group-level t-tests were then performed on the quadratic coefficients obtained for each subject. Statistical tests were based on two-sided one-sample t-tests, paired t-tests, and repeated measures ANOVAs.
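A minimal version of the quadratic-trend fit (ordinary least squares via numpy; the function and variable names are ours):

```python
import numpy as np

def quadratic_coef(x, y):
    """Fit y = a*x^2 + b*x + c by least squares and return the
    quadratic coefficient a, which indexes the curvature of the trend."""
    a, b, c = np.polyfit(x, y, deg=2)
    return a

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # e.g., the five SAT conditions
y = 0.5 * x**2 - x + 2.0                   # synthetic curve with known curvature
curvature = quadratic_coef(x, y)           # recovers 0.5
```

In the study, one such coefficient per subject feeds the group-level t-test on the quadratic trend.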

Diffusion model analyses

We fit the diffusion model [10] to the data using both the hierarchical drift diffusion model (HDDM) Python package [22] and the diffusion model analysis toolbox (DMAT) in MATLAB [24]. For both packages, all fits were done on the data for each subject separately (the hierarchical option of HDDM was not used). We performed two different sets of model fitting. In the first set, we let the drift rate parameter vary across contrasts and the boundary parameter vary across SAT conditions (fixing all other parameters across contrasts and conditions). In the second set, we let all diffusion model parameters other than the starting point (which was always fixed halfway between the two boundaries) vary with both Gabor patch contrast and SAT condition.

The first set of model fits—which we refer to as the “constrained” fits—was designed to test whether the diffusion model can account for the patterns in the data under the selective influence assumption that stimulus difficulty should only affect the drift rate and that the SAT setting should only affect the boundary [10]. For these fits, we fit a different drift rate parameter for each of the five Gabor patch contrasts, and a different boundary parameter for each of the five SAT conditions. We fixed all other parameters to be the same across contrasts and SAT conditions. The non-decision time and three variability parameters were estimated as part of the model fitting, whereas the starting point was not treated as a free parameter but was fixed to half the value of the boundary. This parameterization resulted in 14 free parameters (5 drift rates, 5 boundary parameters, and 4 additional parameters that were the same across contrasts and conditions). To evaluate whether these “constrained” fits could account for the patterns observed in the data, once parameter fits were obtained, we generated simulated data based on the recovered parameters. For each subject, we simulated 1000 data points for each combination of contrast and SAT condition (for a total of 25,000 simulated data points per subject). We then analyzed these simulated data in the same way as the original subject data.

It should be noted here that HDDM and DMAT treat the parameters for the starting point of the accumulation (z) and the starting point variability (s_z) differently from each other. HDDM defines both of these parameters as a proportion of the value of the boundary parameter a. Therefore, when using HDDM, it is only possible to fix the ratios z/a and s_z/a across conditions, but not the actual values of z and s_z, as was originally proposed by Ratcliff and Rouder [15] when introducing the parameter s_z. The issue of how parameters are fixed between conditions was moot for the starting point of the accumulation (parameter z), since it was fixed halfway between the two boundaries, but was critical for the variability of the starting point (parameter s_z), as discussed in the Results section. DMAT, on the other hand, fixes the raw values of the parameters z and s_z across conditions. However, one problem that arises is that the parameter s_z is constrained so that z + s_z < a. This is done to ensure that the accumulation always starts between the two boundaries and, indeed, it is arguably nonsensical to assume that the starting point of the accumulation can be beyond either one of the boundaries. However, the “extremely fast” condition in our experiment is naturally fit by a very small value of the boundary a. Because the parameter s_z is constrained to be the same in all conditions (including the “extremely fast” condition), this led to s_z always taking very small values, thus minimizing the size of the starting point variability in all conditions. Since this resulted in bad model fits, in one set of simulations we removed the constraint that z + s_z < a. We note that while this decision allowed for better model fits (specifically for the difference between error and correct RTs), it creates conceptual difficulties for the diffusion model regarding the plausibility of the accumulation starting beyond the two boundaries.

The second set of model fits allowed all diffusion model parameters to vary freely between conditions, and we therefore refer to them as “free” fits. We fit each combination of Gabor patch contrast and SAT setting independently from all other data in the experiment using six free parameters (drift rate, boundary, non-decision time, drift rate variability, non-decision time variability, and starting point variability); the starting point was again set halfway between the two boundaries and was not fit as a free parameter. Overall, across the 25 combinations of contrast × SAT setting, we estimated a total of 150 parameters for each subject. However, because performance in the “extremely fast” condition was at chance for many subjects, we only analyzed the parameter fits for the four SAT conditions where performance was above chance for all subjects. We employed repeated measures ANOVAs to test which diffusion model parameters varied as a function of contrast and SAT setting.

All fits were performed using the default features of both HDDM and DMAT. Specifically, the HDDM fits were performed using Bayesian inference, drawing 4000 samples from the posterior using Markov chain Monte Carlo (MCMC) and discarding the first 50 samples. To increase the robustness of the estimation, 5% of the data were considered outliers and modeled using a uniform distribution not generated by a diffusion process [22]; choosing different values here did not substantially change the results. DMAT has a slight discrepancy between the parameter ranges allowed for the fitting procedure and those allowed when simulating data for a given set of parameters. Specifically, the allowable ranges are larger for two of the parameters during fitting, thus creating situations where DMAT cannot simulate data based on the recovered parameters. To avoid this issue, we altered the allowable ranges during fitting to be the same as the allowable ranges for data simulation. Specifically, we set the absolute value of the estimated drift rate to be lower than 0.5 and its variability to be lower than 0.3. The parameters obtained with these restrictions were typically identical to the ones obtained using the default ranges.

An Introduction to Bayesian Data Analysis for Cognitive Science

It has long been noticed that when we are faced with multiple choices that require an immediate decision, we can speed up the decision at the expense of accuracy, and become more accurate at the expense of speed; this is the so-called speed-accuracy trade-off (Wickelgren 1977). The most popular class of models that can incorporate both response times and accuracy and give an account of the speed-accuracy trade-off is the class of sequential sampling models, which includes the drift diffusion model (Ratcliff 1978), the linear ballistic accumulator (Brown and Heathcote 2008), and others; for a review, see Ratcliff et al. (2016).

However, an alternative model that has been proposed in the past is Ollman’s simple fast guess model (Ollman 1966). Although it has mostly fallen out of favor (but see Dutilh et al. 2011 for a more modern variant of this model), it presents a very simple framework, using finite mixture modeling, that can also account for the speed-accuracy trade-off. In the next section, we’ll use this model to exemplify the use of finite mixtures to represent different cognitive processes.

20.1.1 A fast guess model account of the global motion detection task

One way to examine the behavior of human and primate subjects when faced with two-alternative forced choices is the detection of the global motion of a random dot kinematogram (Britten et al. 1993). In this task, a participant sees a number of random dots on the screen, of which a proportion move in a single direction (e.g., up) and the rest move in random directions. The participant’s goal is to estimate the overall direction of the movement. One of the reasons for the popularity of this task is that it permits fine-tuning the difficulty of trials (Dutilh et al. 2019): the task is harder when the proportion of dots that move coherently (the level of coherence) is lower; see Fig. 20.1.

FIGURE 20.1: Three levels of difficulty of the global motion detection task. The figures show a consistent upward movement with three levels of coherence (10%, 50%, and 100%). The participants see the dots moving in the direction indicated by the arrows. The participants do not see the arrows and all the dots look identical in the actual task. Adapted from Han et al. (2018) licensed under CC BY 4.0.

Ollman’s (1966) fast guess model assumes that the behavior in this task (and in any other choice task) is governed by two distinct cognitive processes: (i) a guess mode, and (ii) a task-engaged mode. In the guess mode, responses are fast and accuracy is at chance level. In the task-engaged mode, responses are slower and accuracy approaches 100%. This means that intermediate values of response times and accuracy can only be achieved by mixing responses from the two modes. Further assumptions of this model are that response times depend on the difficulty of the choice, and that the probability of being in one of the two modes depends on the speed incentives given in the instructions. (To simplify matters, we’ll ignore the possibility that the accuracy of the choice is also affected by its difficulty.)

Dataset

We implement the assumptions behind Ollman’s fast guess model and examine its fit to data from a global motion detection task reported by Dutilh et al. (2019).

The dataset from Dutilh et al. (2019) contains 2800 trials for each of the 20 subjects participating in a global motion detection task. There were two levels of coherence, yielding hard and easy trials ( diff ), and the trials were done under instructions that emphasized either accuracy or speed ( emphasis ).

We could imagine that if the fast guess model were true, we would see a bimodal distribution when we plot a histogram of the data. Unfortunately, when two similar distributions are mixed, we won’t see any apparent bimodality:

However, another plot reveals that incorrect responses are generally faster, and this is especially true when the instructions emphasized accuracy:

A very simple implementation of the fast guess model

The description of the model makes clear that an ideal participant that never guesses has a response time that depends on the difficulty of the trial. As we did in previous chapters, we assume that response times are log-normally distributed, and for simplicity we start by modeling the behavior of a single subject:

\[
rt_n \sim \mathit{LogNormal}(\alpha + \beta \cdot x_n, \sigma)
\]

In the previous equation, \(x\) is larger for difficult trials. If we center \(x\), \(\alpha\) represents the average log-transformed response time for a participant engaged in the task, and \(\beta\) is the effect of trial difficulty on log-response time. We assume a non-deterministic process, with a noise parameter \(\sigma\). See also Box 4.3 for more information about log-normally distributed response times.

Alternatively, a participant that guesses in every trial would show a response time that is independent of the difficulty of the trial:

\[
rt_n \sim \mathit{LogNormal}(\gamma, \sigma_2)
\]

Here \(\gamma\) represents the average log-transformed response time when a participant only guesses. We assume that responses from the guess mode might have a different noise component, \(\sigma_2\), than those from the task-engaged mode.

The fast guess model makes the assumption that during a task, a single participant would behave in these two ways: They would be engaged in the task on a proportion of the trials and would guess on the rest. This means that for a single participant, there is an underlying probability of being engaged in the task, \(p_{task}\), that determines whether they are actually choosing (\(z=1\)) or guessing (\(z=0\)):

\[
z_n \sim \mathit{Bernoulli}(p_{task})
\]

The value of the parameter (z) in every trial determines the behavior of the participant. This means that the distribution that we observe is a mixture of the two distributions presented before:

\[
rt_n \sim
\begin{cases}
\mathit{LogNormal}(\alpha + \beta \cdot x_n, \sigma), & \text{if } z_n = 1 \\
\mathit{LogNormal}(\gamma, \sigma_2), & \text{if } z_n = 0
\end{cases}
\tag{20.1}
\]

In order to have a Bayesian implementation, we also need to define some priors. We use priors that encode what we know about reaction time experiments; see also section 4.2.

\[
\begin{aligned}
\alpha &\sim \mathit{Normal}(6, 1) \\
\beta &\sim \mathit{Normal}(0, 0.1) \\
\sigma &\sim \mathit{Normal}_+(0.5, 0.2)
\end{aligned}
\]

For now, we don’t commit to any value for the probability of a task-engaged response, and set the following prior on \(p_{task}\):

\[
p_{task} \sim \mathit{Beta}(1, 1)
\]

This represents a flat prior over probabilities: \(p_{task}\) is equally likely to be any number between 0 and 1.

Before we fit our model to the real data, we generate synthetic data to make sure that the model is working as expected. We follow Cook, Gelman, and Rubin (2006), and for now we verify only that the model is roughly correct (a more thorough approach is presented in Talts et al. 2018b and Schad, Betancourt, and Vasishth 2020). We generate 1000 observations for which we know the true values of the parameters.

We first define the number of observations, predictors, and true values. We assume 1000 observations and two levels of difficulty, coded as -.5 and .5. The values of the parameters are relatively realistic (based on our previous experience with reaction time experiments). Although the priors try to encode the range of possible values for the parameters, in this simulation we assume only one instance from this possible range:
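A minimal sketch of such a simulation in R; all the true values and object names below are hypothetical placeholders, not the values used in the text:

```r
# Simulate 1000 trials from the fast guess mixture.
# All true values below are hypothetical illustrations.
N <- 1000
x <- rep(c(-.5, .5), N / 2)  # centered difficulty predictor
alpha <- 5.8                 # task-engaged intercept (log scale)
beta <- 0.05                 # effect of difficulty
sigma <- 0.3                 # task-engaged noise
gamma <- 5.2                 # guessing mean (log scale)
sigma2 <- 0.5                # guessing noise
p_task <- 0.8                # probability of being engaged
z <- rbinom(N, size = 1, prob = p_task)  # 1 = engaged, 0 = guess
rt <- ifelse(z == 1,
             rlnorm(N, meanlog = alpha + beta * x, sdlog = sigma),
             rlnorm(N, meanlog = gamma, sdlog = sigma2))
df_dots_simdata <- data.frame(trial = 1:N, x = x, rt = rt)
```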

We verify that our simulated data are realistic, that is, in the same range as the original data:

To implement the mixture model defined in Eq. (20.1) in Stan, the discrete parameter \(z\) needs to be marginalized out:

\[
\begin{aligned}
p(rt_n \mid \Theta) &= p_{task} \cdot \mathit{LogNormal}(rt_n \mid \alpha + \beta \cdot x_n, \sigma) \; + \\
&\quad (1 - p_{task}) \cdot \mathit{LogNormal}(rt_n \mid \gamma, \sigma_2)
\end{aligned}
\]

In addition, we need to work in log-space, taking into account that Stan works with log-PDFs rather than PDFs:

\[
\begin{aligned}
\log(p(rt_n \mid \Theta)) &= \log\big(p_{task} \cdot \mathit{LogNormal}(rt_n \mid \alpha + \beta \cdot x_n, \sigma) \; + \\
&\qquad (1 - p_{task}) \cdot \mathit{LogNormal}(rt_n \mid \gamma, \sigma_2)\big) \\
&= \log\Big( \exp\big( \log(p_{task}) + \log(\mathit{LogNormal}(rt_n \mid \alpha + \beta \cdot x_n, \sigma))\big) \; + \\
&\qquad \exp\big( \log(1 - p_{task}) + \log(\mathit{LogNormal}(rt_n \mid \gamma, \sigma_2))\big)\Big)
\end{aligned}
\]

In Stan this translates into:
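A sketch of how this could look in the model block; the parameter and data names are assumptions based on the equations above, and the full program also needs the declarations and priors:

```stan
// Marginalized likelihood of the two-component mixture:
// each term is a log mixing weight plus a log density.
for (n in 1:N)
  target += log_sum_exp(
    log(p_task) +
      lognormal_lpdf(rt[n] | alpha + beta * x[n], sigma),
    log1m(p_task) +
      lognormal_lpdf(rt[n] | gamma, sigma2));
```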

In the previous code, we use log_sum_exp(x, y) and log1m(x) because they are more computationally stable than log(exp(x) + exp(y)) and log(1 - x), respectively. That is, they are less prone to numerical over- and underflow.

Call the Stan model mixture1.stan , and fit it to the simulated data:

There are a lot of warnings: the Rhat values are too large and the number of effective samples is too low:

A traceplot clearly shows that the chains aren’t mixing.

The problem with this model is that the mixture components are exchangeable and thus not identifiable: Each chain doesn’t know how the components were labeled by the rest of the chains. A major problem is that even though the theoretical model assumes that guesses are faster than engaged responses, this is not explicit in our computational model. That is, our model lacks some of the theoretical information that we have, namely that guess response times are faster than task-engaged response times. This can be encoded with a strong prior for \(\gamma\), where we assume that its prior distribution is truncated at an upper bound given by the value of \(\alpha\):

\[
\gamma \sim \mathit{Normal}(6, 1), \quad \gamma < \alpha
\]

Another, softer constraint that we could add to our implementation is the assumption that participants are generally more likely to be engaged in the task than to be guessing. The following prior has more probability mass closer to 1 than to 0:

Once we change the upper bound of gamma, we also need to truncate the distribution in Stan by correcting the PDF with its CDF (rather than with the complement of the CDF, as we do for a lower truncation); see also Box 4.1.
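In Stan this correction could be sketched as follows, assuming the declaration real&lt;upper = alpha&gt; gamma; in the parameters block:

```stan
// Upper truncation at alpha: subtract the log-CDF evaluated at
// the truncation point to renormalize the density.
target += normal_lpdf(gamma | 6, 1) -
          normal_lcdf(alpha | 6, 1);
```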

Fit it to the same dataset:

Now summaries and plots look fine.

A multivariate implementation of the model

A problem with the previous implementation of the fast guess model is that we don’t use the accuracy information. We can implement a version closer to the verbal description of the model: In particular, we also want to model that in the guess mode accuracy is at chance level, and that in the task-engaged mode accuracy approaches 100%.

This means that the mixture affects two pairs of distributions:

A response time distribution

\[
rt_n \sim
\begin{cases}
\mathit{LogNormal}(\alpha + \beta \cdot x_n, \sigma), & \text{if } z_n = 1 \\
\mathit{LogNormal}(\gamma, \sigma_2), & \text{if } z_n = 0
\end{cases}
\tag{20.2}
\]

and an accuracy distribution:

\[
acc_n \sim
\begin{cases}
\mathit{Bernoulli}(p_{correct}), & \text{if } z_n = 1 \\
\mathit{Bernoulli}(0.5), & \text{if } z_n = 0
\end{cases}
\]

We have a new parameter, \(p_{correct}\), which represents the probability of giving a correct answer in the engaged mode. The verbal description says that it’s close to 100%, and here we have freedom to choose whatever prior represents “close to 100%” for us. We interpret this as follows, but this is not a hard constraint: if a participant consistently shows lower (or higher) accuracy, the posterior for \(p_{correct}\) will change accordingly:

In our simulated data, we assume that the global motion detection task is done by a very accurate participant, with an accuracy of 99.9%.

We plot our simulated data again, and this time we can see the effect of task difficulty on the simulated response times and accuracy:

We now need to marginalize the discrete parameter \(z\) out of both the response time and the accuracy distributions.

\[
\begin{aligned}
p(rt_n, acc_n \mid \Theta) = \; & p_{task} \cdot \mathit{LogNormal}(rt_n \mid \alpha + \beta \cdot x_n, \sigma) \cdot \mathit{Bernoulli}(acc_n \mid p_{correct}) \; + \\
& (1 - p_{task}) \cdot \mathit{LogNormal}(rt_n \mid \gamma, \sigma_2) \cdot \mathit{Bernoulli}(acc_n \mid 0.5)
\end{aligned}
\]

\[
\begin{aligned}
\log(p(rt_n, acc_n \mid \Theta)) = \log\Big(&\exp\big(\log(p_{task}) \; + \\
&\quad \log(\mathit{LogNormal}(rt_n \mid \alpha + \beta \cdot x_n, \sigma)) \; + \\
&\quad \log(\mathit{Bernoulli}(acc_n \mid p_{correct}))\big) \; + \\
&\exp\big(\log(1 - p_{task}) \; + \\
&\quad \log(\mathit{LogNormal}(rt_n \mid \gamma, \sigma_2)) \; + \\
&\quad \log(\mathit{Bernoulli}(acc_n \mid 0.5))\big)\Big)
\end{aligned}
\]

Our model translates into the following Stan code:
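A sketch of the corresponding likelihood in the model block, with the accuracy terms added (names are assumptions as before):

```stan
// Each mixture component now contributes the log density of the
// response time plus the log probability mass of the accuracy.
for (n in 1:N)
  target += log_sum_exp(
    log(p_task) +
      lognormal_lpdf(rt[n] | alpha + beta * x[n], sigma) +
      bernoulli_lpmf(acc[n] | p_correct),
    log1m(p_task) +
      lognormal_lpdf(rt[n] | gamma, sigma2) +
      bernoulli_lpmf(acc[n] | 0.5));
```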

Save it as mixture3.stan and fit it to both response times and accuracy:

We see that our model can now be fit to both response times and accuracy, and its parameter estimates have sensible values (given our simulated data).

Before we extend this model hierarchically, we account for the instructions given to the participant in the next section.

An implementation of the model that accounts for instructions

The actual global motion detection experiment that we started from has another manipulation that can help us to evaluate the fast guess model better. In some trials, the instructions emphasized accuracy (e.g., “Be as accurate as possible.”) and in others speed (e.g., “Be as fast as possible.”). The fast guess model also assumes that the probability of being in one of the two modes depends on the speed incentives given in the instructions. This entails that \(p_{task}\) now depends on the instructions \(x_2\), where we encode a speed incentive with \(-.5\) and an accuracy incentive with \(.5\). Essentially, we need to fit the following regression:

As before, we need to bound the previous regression between 0 and 1; we achieve this using the logistic or inverse logit function:

\[
p_{task_n} = logit^{-1}(\alpha_{task} + x_{2_n} \cdot \beta_{task})
\]

This means that we need to interpret \(\alpha_{task} + x_2 \cdot \beta_{task}\) as log-odds, bounded by \((-\infty, \infty)\), rather than as a probability; see also the previous chapter.

The likelihood defined before remains the same, and the only further change in our model is that rather than a prior on \(p_{task}\), we now need priors for \(\alpha_{task}\) and \(\beta_{task}\).

For \(\beta_{task}\), we assume an effect that can be rather large, and we won’t assume a direction a priori (for now):

This means that the participant could be affected by the instructions in the expected way, with better accuracy in the task when the instructions emphasize accuracy (\(\beta_{task} > 0\)), or the participant might behave in an unexpected way, with accuracy degrading when accuracy is emphasized (\(\beta_{task} < 0\)); \(\beta_{task} < 0\) could represent a participant who misunderstands the instructions. It’s certainly possible to include priors that encode the expected direction of the effect instead.

How can we choose a prior for \(\alpha_{task}\) that encodes the same information that we previously had in \(p_{task}\)? One possibility is to create an auxiliary parameter, \(p_{btask}\), that represents the baseline probability of being engaged in the task, with the same prior that we used in the previous section, and then transform it to an unconstrained space for our regression with the logit function:

\[
\alpha_{task} = logit(p_{btask})
\]

To verify that our priors make sense, we plot the difference in prior predicted probability of being engaged in the task under the two emphasis conditions:
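A sketch of such a prior predictive computation in R; the priors Beta(8, 2) for the baseline engagement probability and Normal(0, 1) for the instruction effect are hypothetical placeholders, not necessarily the priors used in the text:

```r
# Prior predictive difference in p_task between emphasis conditions.
# Both priors below are hypothetical illustrations.
N_samples <- 10000
p_btask <- rbeta(N_samples, 8, 2)        # baseline engagement
alpha_task <- qlogis(p_btask)            # logit transform
beta_task <- rnorm(N_samples, 0, 1)      # instruction effect
p_acc <- plogis(alpha_task + .5 * beta_task)    # accuracy emphasis
p_speed <- plogis(alpha_task - .5 * beta_task)  # speed emphasis
hist(p_acc - p_speed,
     main = "Prior predicted difference in p_task")
```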

The previous plot shows that we are predicting a priori that the difference in \(p_{task}\) will be mostly smaller than \(0.3\), which seems to make sense.

We are ready to generate a new dataset, by deciding on true values for \(\beta_{task}\) and \(p_{btask}\).

We can now generate a plot where both the difficulty of the task and the instructions are manipulated:

For the Stan implementation, we add a generated quantities block that can be used for further (prior or posterior) predictive checks. We use the dummy variable onlyprior to indicate whether we use the data or only sample from the priors. One can always do the predictive checks in R, transforming the code that we wrote for the simulation into a function, and writing the priors in R. However, it can be simpler to take advantage of Stan’s output format and rewrite the code in Stan. One downside of this is that the stanfit object that stores the model output can become too large for the memory of the computer.

In the Stan code shown above, log_inv_logit(x) applies the logistic function to x to transform it into a probability and then applies the logarithm; log1m_inv_logit(x) applies the logistic function to x and then applies the logarithm to its complement, \((1 - p)\).

We save the code as mixture4.stan, and before fitting it to the simulated data, we perform prior predictive checks.

Prior predictive checks of the fast guess model

We generate prior predictive distributions by setting onlyprior to 1.

We plot prior predictive distributions of response times as follows. We plot them against our simulated data by setting y = rt in ppc_dens_overlay; this distribution can be thought of as a hand-picked instance of the prior predictive distribution.

We see that we tend to generate some responses that are too large, but the general shape of the predictive distribution of the response times is fine.

If we want to plot the prior predicted distribution of differences in response time conditional on task difficulty, we need to define a new function. Then we use the bayesplot function ppc_stat(), which takes any summary function as its stat argument.
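For instance, a sketch of such a summary function and its use; the data frame df_dots_simdata with columns x and rt, and the matrix of predicted response times rt_pred, are assumptions:

```r
library(bayesplot)
# Mean difference in RT between hard (x = .5) and easy (x = -.5)
# trials; ppc_stat() applies this to y and to each row of yrep.
rt_diff <- function(rt) {
  mean(rt[df_dots_simdata$x == .5]) -
    mean(rt[df_dots_simdata$x == -.5])
}
ppc_stat(y = df_dots_simdata$rt, yrep = rt_pred, stat = rt_diff)
```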

We find that the range of response times looks reasonable. There are, however, always more checks that can be done; for example, plotting other summary statistics, or predictions conditioned on other aspects of the data.

Fit to simulated data

Fit the model to the data by setting onlyprior = 0:

We see that we can fit the model without problems. Before we evaluate the recovery of the parameters more carefully, we implement a hierarchical version of the fast guess model.

A hierarchical implementation of the fast guess model

So far we have evaluated the behavior of one simulated participant. As we discussed in 5.1.6 in the context of distributional regression models, every parameter in a model can be made hierarchical in a straightforward way. This doesn’t mean, however, that we will be able to estimate those parameters or that our model will converge. The best advice here is to start simple with simulated data. Convergence with simulated data does not guarantee convergence of the same model with real data, but the reverse is in general true: a model that fails to converge on simulated data will rarely converge on real data.

For our hierarchical version, we assume that both response times in general and the effect of task difficulty vary by participant, and that different participants have different guess times. This entails the following change to the response time distribution:

\[
rt_n \sim
\begin{cases}
\mathit{LogNormal}(\alpha + u_{subj[n],1} + x_n \cdot (\beta + u_{subj[n],2}), \sigma), & \text{if } z_n = 1 \\
\mathit{LogNormal}(\gamma + u_{subj[n],3}, \sigma_2), & \text{if } z_n = 0
\end{cases}
\]

We assume that the three vectors of adjustments \(u\) (the adjustments to the intercept and slope of the task-engaged distribution, and the adjustment to the guess time distribution) follow a multivariate normal distribution centered at zero. For simplicity and lack of more information, we assume the same prior distribution for the three variance components and the same prior for the correlations between the adjustments (\(\rho_{u_{1,2}}, \rho_{u_{1,3}}, \rho_{u_{2,3}}\)):

Before we fit the model to the real dataset, we simulate data again; this time we simulate 100 trials for each of 20 subjects.

We verify that the distribution of the simulated response times, conditional on the simulated accuracy and the experimental manipulations, makes sense with the following plot:

We implement the model in Stan as follows. The hierarchical extension uses the Cholesky factorization for the group-level effects (as we did in 11.1.3).
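The group-level part of the program could be sketched as follows; the names and the prior scales are assumptions, not the exact values used in the text:

```stan
parameters {
  vector<lower = 0>[3] tau_u;   // SDs of the by-subject adjustments
  cholesky_factor_corr[3] L_u;  // Cholesky factor of the correlations
  matrix[3, N_subj] z_u;        // standardized adjustments
}
transformed parameters {
  // N_subj x 3 matrix of correlated adjustments u:
  matrix[N_subj, 3] u = (diag_pre_multiply(tau_u, L_u) * z_u)';
}
model {
  tau_u ~ normal(0, .5);        // hypothetical prior scale
  L_u ~ lkj_corr_cholesky(2);   // hypothetical prior
  to_vector(z_u) ~ std_normal();
}
```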

Save it as mixtureh.stan and fit it to the simulated data:

We see that we can fit the hierarchical extension of our model to simulated data. Next, we’ll evaluate whether we can recover the true values of the parameters.

Recovery of the parameters

By “recovering” the true values of the parameters, we mean that the true values are somewhere inside the bulk of the posterior distribution of the model.

We use mcmc_recover_hist to compare the posterior distributions of the relevant parameters of the model with their true values.

The model seems to be underestimating the participants’ probability of being correct ( p_correct ) and overestimating the probability of being engaged in the task ( p_btask ). However, the numerical differences are very small. We can be relatively certain that the model is not seriously mis-specified, but a more principled approach using simulation-based calibration is presented in Talts et al. (2018b) and Schad, Betancourt, and Vasishth (2020).

Fitting the model to real data

After verifying that our model works as expected, we are ready to fit it to real data. We code the predictors \(x\) and \(x_2\) as we did for the simulated data:

The main obstacle now is that fitting the entire dataset takes around 12 hours! If you want to get a taste of the fit of the model, you can sample 150 observations (from the 2800) of each subject as follows:
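A sketch of the subsampling, assuming a tidyverse workflow and that the data frame df_dots has a subject identifier column subj:

```r
library(dplyr)
# Keep a random sample of 150 trials from each subject.
df_dots_smaller <- df_dots %>%
  group_by(subj) %>%
  slice_sample(n = 150) %>%
  ungroup()
```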

The complete model is fit as follows.

What can we say about the fit of the model? Our success fitting the fast guess model to real data doesn’t imply that the model is a good account of the data. It just means that it’s flexible enough. Under the assumption that this model is true, we can look at the parameters and conclude the following:

  • Participants seemed to have a very high accuracy once they were engaged in the task. ( p_correct is very high).
  • The instructions seemed to have a very strong effect on the mode of the participants ( beta_task is very high).
  • The guess mode seemed to be much noisier than the task-engaged mode (compare sigma with sigma2 ).
  • Differences between participants ( tau parameters) seem modest in comparison with the effect of the experimental manipulation ( beta ).
  • Slow participants seemed to show a stronger effect of the experimental manipulation ( rho_u1[1,2] is mostly positive).

If we want to know whether our model achieves descriptive adequacy, we need to look at the posterior predictive distributions of the model. However, posterior predictive checks won’t allow us to conclude that our model is not overfitting.

Posterior predictive checks

For the posterior predictive checks, we can write the generated quantities block in a new file. The advantage is that we can generate as many observations as needed after estimating the parameters. There is no model block in the following Stan program.
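A sketch of such a generated-quantities-only program; the data and parameter names are assumptions matching the hierarchical model above, with the parameters read in from the fitted object:

```stan
generated quantities {
  vector[N] rt_pred;
  for (n in 1:N) {
    // Draw the latent mode, then the response time from the
    // corresponding component of the mixture.
    int z = bernoulli_rng(inv_logit(alpha_task + x2[n] * beta_task));
    if (z == 1)
      rt_pred[n] = lognormal_rng(alpha + u[subj[n], 1]
                                 + x[n] * (beta + u[subj[n], 2]), sigma);
    else
      rt_pred[n] = lognormal_rng(gamma + u[subj[n], 3], sigma2);
  }
}
```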

Save the file as mixtureh_gen.stan , and generate 500 simulated datasets as follows:

We first take a look at the general distribution of response times generated by the posterior predictive model and by our real data in Figure 20.2.

FIGURE 20.2: Posterior predictive distribution of the hierarchical fast guess model in comparison with the observed response times.

We see that the distribution of the observed response times has heavier tails than the predictive distribution. This means that somewhere in our model we are failing to account for response time variability.

Next we examine the effect of the experimental manipulation in Figure 20.3: The model is underestimating the effect of the experimental manipulation and the observed difference between response times is well outside the bulk of the predictive distribution.

FIGURE 20.3: Posterior predictive distribution of the difference in response time due to the experimental manipulation.

We also look at some instances of the predictive distribution. Figures 20.4 and 20.5 show two simulated datasets in red overlaid on the real observations in black. As we noticed in Figure 20.2, the model predicts less variability than what we find in the data.

FIGURE 20.4: Two simulated datasets in red overlaid on the observations in black.

FIGURE 20.5: Two simulated datasets in red overlaid on the observations in black.


Britten, Kenneth H., Michael N. Shadlen, William T. Newsome, and J. Anthony Movshon. 1993. “Responses of Neurons in Macaque Mt to Stochastic Motion Signals.” Visual Neuroscience 10 (6). Cambridge University Press: 1157–69.

Brown, Scott D., and Andrew Heathcote. 2008. “The Simplest Complete Model of Choice Response Time: Linear Ballistic Accumulation.” Cognitive Psychology 57 (3): 153–78.

Cook, Samantha R, Andrew Gelman, and Donald B Rubin. 2006. “Validation of Software for Bayesian Models Using Posterior Quantiles.” Journal of Computational and Graphical Statistics 15 (3). Taylor & Francis: 675–92.

Dutilh, Gilles, Jeffrey Annis, Scott D Brown, Peter Cassey, Nathan J Evans, Raoul PPP Grasman, Guy E Hawkins, et al. 2019. “The Quality of Response Time Data Inference: A Blinded, Collaborative Assessment of the Validity of Cognitive Models.” Psychonomic Bulletin & Review 26 (4). Springer: 1051–69.

Dutilh, Gilles, Eric-Jan Wagenmakers, Ingmar Visser, and Han L. J. van der Maas. 2011. “A Phase Transition Model for the Speed-Accuracy Trade-Off in Response Time Experiments.” Cognitive Science 35 (2): 211–50.

Han, Ding, Jana Wegrzyn, Hua Bi, Ruihua Wei, Bin Zhang, and Xiaorong Li. 2018. “Practice Makes the Deficiency of Global Motion Detection in People with Pattern-Related Visual Stress More Apparent.” PLOS ONE 13 (2). Public Library of Science: 1–13.

Ollman, Robert. 1966. “Fast Guesses in Choice Reaction Time.” Psychonomic Science 6 (4). Springer: 155–56.

Ratcliff, Roger. 1978. “A Theory of Memory Retrieval.” Psychological Review 85 (2). American Psychological Association: 59.

Ratcliff, Roger, Philip L. Smith, Scott D. Brown, and Gail McKoon. 2016. “Diffusion Decision Model: Current Issues and History.” Trends in Cognitive Sciences 20 (4): 260–81.

Schad, Daniel J, Michael Betancourt, and Shravan Vasishth. 2020. “Toward a Principled Bayesian Workflow in Cognitive Science.” Psychological Methods 26 (1). American Psychological Association: 103–26.

Talts, Sean, Michael Betancourt, Daniel Simpson, Aki Vehtari, and Andrew Gelman. 2018b. “Validating Bayesian Inference Algorithms with Simulation-Based Calibration.” arXiv Preprint arXiv:1804.06788.

Wickelgren, Wayne A. 1977. “Speed-Accuracy Tradeoff and Information Processing Dynamics.” Acta Psychologica 41 (1): 67–85.