I am looking for an implementation of the linear ballistic accumulator model or Ratcliff's diffusion model (e.g. in R, MATLAB, or Python).
Here are a few options. I have not tried them yet personally.
As mentioned below, the rtdists package in R is able to fit both LBA and diffusion models.
Scott Brown has a copy of Donkin et al (2009) on his web page with some code in R, Excel, and WinBUGS for fitting the LBA model:
There's also the glba package on CRAN by Ingmar Visser.
The diffusion model is available as a MATLAB toolbox called DMAT.
- Donkin, C., Averell, L., Brown, S. D., & Heathcote, A. (2009). Getting more from accuracy and response time data: Methods for fitting the Linear Ballistic Accumulator model. Behavior Research Methods, 41, 1095-1110.
- Vandekerckhove, J., & Tuerlinckx, F. (2008). Diffusion model analysis with MATLAB: A DMAT primer. Behavior Research Methods, 40, 61-72. doi:10.3758/BRM.40.1.61
For the diffusion model, there is also Eric-Jan Wagenmakers' "EZ-diffusion model", which you can find here.
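The EZ-diffusion approach is simple enough to implement directly: it maps three summary statistics (proportion correct, variance of correct RTs, and mean correct RT) to the drift rate, boundary separation, and non-decision time in closed form. Here is a minimal Python sketch of those equations; the function name and the example values in the comments are mine, not part of any package:

```python
import math

def ez_diffusion(pc, vrt, mrt, s=0.1):
    """EZ-diffusion: map proportion correct (pc), variance (vrt) and mean
    (mrt) of correct RTs (in seconds) to drift rate v, boundary
    separation a, and non-decision time Ter (Wagenmakers et al., 2007)."""
    if pc <= 0.0 or pc >= 1.0 or pc == 0.5:
        raise ValueError("pc must lie in (0, 1) and differ from 0.5")
    y = math.log(pc / (1.0 - pc))                 # logit of accuracy
    v = math.copysign(1.0, pc - 0.5) * s * (
        y * (pc ** 2 * y - pc * y + pc - 0.5) / vrt) ** 0.25
    a = s ** 2 * y / v                            # boundary separation
    # mean decision time, subtracted from mean RT to give Ter
    mdt = (a / (2.0 * v)) * (1.0 - math.exp(-v * a / s ** 2)) \
        / (1.0 + math.exp(-v * a / s ** 2))
    return v, a, mrt - mdt
```

Note the edge-case guard: accuracies of 0, 0.5, or 1 need the edge corrections described in the EZ paper before these formulas apply.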
This paper compares three different pieces of software for estimation of diffusion model parameters:
van Ravenzwaaij, D., & Oberauer, K. (2009). How to use the diffusion model: Parameter recovery of three methods: EZ, fast-dm, and DMAT. Journal of Mathematical Psychology, 53(6), 463-473.
The R package diffIRT (http://www.dylanmolenaar.nl/jss1265.pdf) estimates both the Q-diffusion and D-diffusion models (see his website for the van der Maas et al. paper discussing the differences between these models). R code for the EZ2 approach, which is much faster if speed is important for your applications, is available at http://raoul.socsci.uva.nl/EZ2/.
In the subheading you also mention that you're interested in MATLAB/Python implementations:
I've personally used DMAT in MATLAB and it's a nice package. However, the Python-based HDDM package may be one of the best around at the moment (in my opinion), and it has a good user guide.
and the paper associated with the package:
- Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7, 14. http://journal.frontiersin.org/article/10.3389/fninf.2013.00014/full
The R package rtdists is another great option:
Provides response time distributions (density/PDF, distribution function/CDF, quantile function, and random generation): (a) Ratcliff diffusion model based on C code by Andreas and Jochen Voss and (b) linear ballistic accumulator (LBA) with different distributions underlying the drift rate.
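rtdists itself is R-only, but the LBA's generative process is simple enough to sketch directly if you want random samples in Python. The sketch below is mine, not the rtdists API, and the default parameter values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def rlba(n, b=1.0, A=0.5, drifts=(1.0, 0.7), s=0.25, t0=0.2):
    """Random RTs/choices from a two-accumulator LBA: start points are
    uniform on [0, A], trial-wise drifts are Normal(v_i, s), accumulators
    rise linearly to threshold b, and the fastest one determines the
    response; t0 is the non-decision time."""
    v = np.asarray(drifts, dtype=float)
    rts = np.empty(n)
    choices = np.empty(n, dtype=int)
    for i in range(n):
        while True:
            d = rng.normal(v, s)          # one sampled drift per accumulator
            if (d > 0).any():             # at least one must reach threshold
                break
        k = rng.uniform(0.0, A, size=v.size)      # start points
        t = np.where(d > 0, (b - k) / d, np.inf)  # finishing times
        choices[i] = int(np.argmin(t))
        rts[i] = t[choices[i]] + t0
    return rts, choices
```

Accumulators with negative sampled drifts never finish, hence the `np.inf` mask; redrawing until at least one drift is positive mirrors the usual convention of conditioning on a response occurring.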
Sequential sampling models (SSMs) have established themselves as the de facto standard for modeling reaction-time data from simple two-alternative forced-choice decision-making tasks. Each decision is modeled as an accumulation of noisy information indicative of one choice or the other, with sequential evaluation of the accumulated evidence at each time step. Once this evidence crosses a threshold, the corresponding response is executed. This simple assumption about the underlying psychological process has the appealing property of reproducing not only choice probabilities, but the full distribution of response times for each of the two choices. Models of this class have been used successfully in mathematical psychology since the 1960s and have more recently been adopted in cognitive neuroscience investigations. These studies are typically interested in neural mechanisms associated with the accumulation process or with regulating the decision threshold. One issue in such model-based cognitive neuroscience approaches is that the trial numbers in each condition are often low, making it difficult to estimate model parameters. For example, studies with patient populations, especially if combined with intraoperative recordings, typically have substantial constraints on the duration of the task. Similarly, model-based fMRI or EEG studies are often interested not in static model parameters, but in how these dynamically vary with trial-by-trial variations in recorded brain activity. Efficient and reliable estimation methods that take advantage of the full statistical structure available in the data across subjects and conditions are critical to the success of these endeavors.
Bayesian data analytic methods are quickly gaining popularity in the cognitive sciences because of their many desirable properties. First, Bayesian methods allow inference of the full posterior distribution of each parameter, thus quantifying uncertainty in their estimation, rather than simply providing their most likely value. Second, hierarchical modeling is naturally formulated in a Bayesian framework. Traditionally, psychological models either assume subjects are completely independent of each other, fitting models separately to each individual, or that all subjects are the same, fitting models to the group as if they were all copies of some “average subject”. Both approaches are sub-optimal: the former fails to capitalize on the statistical strength offered by the degree to which subjects are similar in one or more model parameters, whereas the latter fails to account for the differences among subjects, and hence could lead to a situation where the estimated model cannot fit any individual subject. The same limitations apply to current DDM software packages such as DMAT and fast-dm. Hierarchical Bayesian methods provide a remedy for this problem by allowing group and subject parameters to be estimated simultaneously at different hierarchical levels. Subject parameters are assumed to be drawn from a group distribution, and to the degree that subjects are similar to each other, the variance in the group distribution will be estimated to be small, which reciprocally has a greater influence on constraining parameter estimates of any individual. Even in this scenario, the method still allows the posterior for any given individual subject to differ substantially from that of the rest of the group given sufficient data to overwhelm the group prior.
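The partial-pooling idea can be illustrated with the closed-form normal-normal case. This toy sketch is mine (all names and numbers are hypothetical, and real hierarchical DDM estimation works on full posteriors rather than point estimates); it shows how a subject's estimate is pulled toward the group mean in proportion to the relative precisions:

```python
def shrink(subject_mean, n_trials, trial_sd, group_mean, group_sd):
    """Posterior mean for one subject's parameter under a normal-normal
    hierarchy: a precision-weighted compromise between the subject's own
    data and the group distribution (partial pooling)."""
    prec_data = n_trials / trial_sd ** 2   # precision of the subject's estimate
    prec_group = 1.0 / group_sd ** 2       # precision of the group prior
    w = prec_data / (prec_data + prec_group)
    return w * subject_mean + (1.0 - w) * group_mean

# With few trials the estimate is pulled strongly toward the group mean;
# with many trials the subject's own data dominate.
few_trials = shrink(2.0, 5, 1.0, group_mean=1.0, group_sd=0.2)
many_trials = shrink(2.0, 500, 1.0, group_mean=1.0, group_sd=0.2)
```

This is exactly the "data overwhelm the group prior" behavior described above: as `n_trials` grows, the weight on the subject's own mean approaches 1.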
Thus the method capitalizes on statistical strength shared across individuals, and can do so to different degrees even within the same sample and model, depending on the extent to which subjects are similar to each other in one parameter vs. another. In the DDM, for example, it may be the case that there is relatively little variability across subjects in the perceptual time for stimulus encoding, quantified by the “non-decision time”, but more variability in their degree of response caution, quantified by the “decision threshold”. The estimation should be able to capitalize on this structure so that the non-decision time in any given subject is anchored by that of the group, potentially allowing for more efficient estimation of that subject's decision threshold. This approach may be particularly helpful when relatively few trials per condition are available for each subject, and when incorporating noisy trial-by-trial neural data into the estimation of DDM parameters.
HDDM is an open-source software package written in Python that allows (i) the flexible construction of hierarchical Bayesian drift diffusion models and (ii) the estimation of their posterior parameter distributions via PyMC. User-defined models can be created via a simple Python script or used interactively via, for example, the IPython interpreter shell (Pérez and Granger, 2007). All run-time-critical functions are coded in Cython and compiled natively for speed, which allows estimation of complex models in minutes. HDDM includes many commonly used statistics and plotting functionality generally used to assess model fit. The code is released under the permissive BSD 3-clause license, test-covered to ensure correct behavior, and well documented. Finally, HDDM allows flexible estimation of trial-by-trial regressions where an external measurement (e.g., brain activity as measured by fMRI) is correlated with one or more decision-making parameters.
With HDDM we aim to provide a user-friendly but powerful tool that can be used by experimentalists to construct and fit complex, user-specified models using state-of-the-art estimation methods to test their hypotheses. The purpose of this report is to introduce the toolbox and provide a tutorial for how to employ it; subsequent reports will quantitatively characterize its success in recovering model parameters, and its advantages relative to non-hierarchical or non-Bayesian methods, as a function of the number of subjects and trials.
We have presented a design principle for how decision-making should be implemented in the brain, and briefly summarised supporting evidence. Specifically, we propose that decision-making in threshold-based systems should compromise between speed and accuracy of decision-making by manipulation of baseline activation in decision-making neural populations, rather than by manipulation of thresholds, in order to implement stereotypical decisions under varying speed-accuracy trade-offs. This could be formalised as an optimality argument: that decision-making systems should minimise variability in decision implementation across decision scenarios. Such optimality arguments are commonplace in behavioural disciplines such as behavioural ecology, where their predictions are tested against empirical data, and any disagreement is used to refine the theory. In applying this normative approach from evolutionary biology to models of neuroscience, we hope to make a modest contribution to the programme of reconciling functional and mechanistic explanations of behaviour. One potential limitation of our analysis is that the equivalence of changes in threshold and changes in baseline activation has only formally been demonstrated for linear models. Real neural systems are typically non-linear, but we argue that even though the aforementioned equivalence does not hold for certain important non-linear models, the principle of maintaining a consistent threshold and varying baseline activation remains an important one that we should expect to see realised, even if the decision dynamics are changed as a result and this needs to be compensated for by the neural mechanisms. The neurophysiological evidence reviewed above supports this view.
We suggest that our principle is not specific to one system but should be applicable to any response system. Decision-making takes place at many different levels of brain processing, and while more complex decision-related motor sequences undoubtedly can be affected by decision-task difficulty, we believe our principle should also hold at the most fundamental levels of action selection in the brain. Even the conceptually simplest decision-making mechanisms, such as the race model, can be expressed as accumulator models. Accumulators are also likely to be involved in more complex decision-making processes: the basal ganglia have been demonstrated to be involved in action selection, mediating access to motor control by different competing brain regions. A biologically plausible mathematical model of the basal ganglia has been proposed that is able to implement statistically optimal decision-making over multiple alternatives. As with the accumulator models outlined above, this model is based on decision-making populations that must reach a threshold in order to precipitate the corresponding action, and this threshold may be adjusted to compromise between the speed and accuracy of decision-making. There is an interesting difference, however: in this model, the output nuclei of the basal ganglia must fall below an activation threshold in order for the corresponding action to be taken. Nevertheless the principle is the same: for consistency of decision implementation we would expect this threshold to remain constant. We would therefore predict that accurate decisions should be implemented by raised baseline activation levels of decision-making neural populations in the basal ganglia, while fast decisions should be implemented by lower baseline activation. Bogacz et al.
review four main theories of how speed-accuracy trade-offs can be managed in the cortico-basal ganglia circuit, and note that three involve a change in activation of some part of the circuit, whether striatum, cortical integrators, or subthalamic nucleus, while none modifies the threshold of the output nuclei. We suggest that it could be of interest to interpret not only models but also other data, already extant or yet to be generated, in terms of the proposal we have made here for how consistent decision implementation should be achieved.
Understanding the neuro-cognitive mechanisms underlying decision-making and reinforcement learning[1–3] has potential implications for many neurological and psychiatric disorders associated with maladaptive choice behavior[4–6]. Modeling work in value-based decision-making and reinforcement learning often relies on simple logistic (softmax) functions[7,8] to link model-based decision values to observed choices. In contrast, in perceptual decision-making, sequential sampling models such as the drift diffusion model (DDM) that not only account for the observed choices but also for the full response time (RT) distributions have a long tradition[9–11]. Recent work in reinforcement learning[12–15], inter-temporal choice[16,17] and value-based choice[18–21] has shown that sequential sampling models can be successfully applied in these domains.
In the DDM, decisions arise from a noisy evidence accumulation process that terminates when the accumulated evidence reaches one of two response boundaries. In its simplest form, the DDM has four free parameters: the boundary separation parameter α governs how much evidence is required before committing to a decision. The upper boundary corresponds to the case when the accumulated evidence exceeds α, whereas the lower boundary corresponds to the case when the accumulated evidence falls to zero. The drift rate parameter v determines the mean rate of evidence accumulation. A greater drift rate reflects a greater rate of evidence accumulation and thus faster and more accurate responding. In contrast, a drift rate of zero would indicate chance-level performance, as the evidence accumulation process would have an equal likelihood of terminating at the upper or lower boundary (for a neutral bias). The starting point or bias parameter z determines the starting point of the evidence accumulation process in units of the boundary separation, and the non-decision time τ reflects components of the RT related to stimulus encoding and/or response preparation that are unrelated to the evidence accumulation process. The DDM can account for a wide range of experimental effects on RT distributions during two-alternative forced choice tasks.
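The four-parameter process just described can be sketched as a simple Euler-Maruyama simulation; the function name and parameter values below are illustrative, not taken from the study:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ddm(n, v=0.3, a=1.0, z=0.5, tau=0.3, s=1.0, dt=1e-3):
    """Euler-Maruyama simulation of the four-parameter DDM: evidence starts
    at z * a, drifts at rate v with diffusion noise s, and terminates at
    boundary a (choice 1) or 0 (choice 0); tau is added as non-decision
    time. Returns arrays of RTs and choices."""
    rts = np.empty(n)
    choices = np.empty(n, dtype=int)
    for i in range(n):
        x, t = z * a, 0.0
        while 0.0 < x < a:
            x += v * dt + s * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts[i] = t + tau
        choices[i] = int(x >= a)
    return rts, choices
```

With a positive drift and a neutral starting point (z = 0.5), upper-boundary responses are more frequent and faster on average, matching the verbal description of v above.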
The application of sequential sampling models such as the DDM has several potential advantages over traditional softmax choice rules. First, including RT data during model estimation may improve both the reliability of the estimated parameters and parameter recovery, thereby leading to more robust estimates. Second, taking into account the full RT distributions can reveal additional information regarding the dynamics of decision processes[14,15]. This is of potential interest, in particular in the context of maladaptive behaviors in clinical populations[14,22–25] but also when the goal is to more fully account for how decisions arise on a neural level.
In the present case study, we focus on a brain region that has long been implicated in decision-making, reward-based learning and impulse regulation[26,27], the ventromedial prefrontal / medial orbitofrontal cortex (vmPFC/mOFC). Performance impairments on the Iowa Gambling Task are well replicated in vmPFC/mOFC patients[26,28,29]. Damage to vmPFC/mOFC also increases temporal discounting[30,31] (but see) and risk-taking[33–35], impairs reward-based learning[36–38] and has been linked to inconsistent choice behavior[39–41]. Meta-analyses of functional neuroimaging studies strongly implicate this region in reward valuation[42,43]. Based on these observations, we reasoned that vmPFC/mOFC damage might also render RTs during decision-making less dependent on value. In the context of the DDM, this could be reflected in changes in the value-dependency of the drift rate v. In contrast, more general impairments in the processing of decision options, response execution and/or preparation would be reflected in changes in the non-decision time. Interestingly, however, one previous model-free analysis in vmPFC/mOFC patients revealed a similar modulation of RTs by value in patients and controls.
The present study therefore had the following aims. The first aim was to validate the applicability of the DDM as a choice rule in the context of inter-temporal and risky choice. To this end, we first performed a model comparison of variants of the DDM in a data set of nine vmPFC/mOFC lesion patients and nineteen controls. Since recent work on reinforcement learning suggested that the mapping from value differences to trial-wise drift rates might be non-linear rather than linear, we compared these different variants of the DDM in our data and ran posterior predictive checks on the winning DDM models to explore the degree to which the different models could account for RT distributions and the relationship between RTs and subjective value. Second, we re-analyzed previously published temporal discounting data in controls and vmPFC/mOFC lesion patients to examine the degree to which our previously reported model-free analyses could be reproduced using a hierarchical Bayesian model-based analysis with the DDM as the choice rule. Third, we used the same modeling framework to analyze previously unpublished data from a risky decision-making task in the same lesion patients and controls to examine whether risk taking in the absence of a learning requirement is increased following vmPFC/mOFC damage. Finally, we explored changes in choice dynamics as revealed by DDM parameters as a result of vmPFC/mOFC lesions, and investigated whether lesions to vmPFC/mOFC impacted the degree to which RTs were sensitive to subjective value differences, both by examining DDM parameters and via DDM mixture models.
RT Distributions from One and Many Accumulators.
We began by identifying the conditions under which an individual accumulator model (n = 1) and a large-ensemble accumulator model (n = 1,000) predict RT distributions with similar shapes, defined as overlapping 95% confidence intervals over all five RT quantiles (0.1, 0.3, 0.5, 0.7, 0.9). We observed that an individual accumulator model and a large ensemble accumulator model predict RT distributions with virtually indistinguishable shapes if accumulation rates are at least moderately correlated (rv ≥ 0.6) with intermediate termination rules. Much higher rate correlations (rv ≥ 0.9) are necessary under extreme termination rules (Fig. 2). Similar results were obtained under a pooling mechanism (Fig. 2, rightmost column). Thus, RT distributions can be explained both by an individual model accumulator and by accumulating activity of large neuronal ensembles only if their activation dynamics are moderately correlated and RT is not governed by extremely fast or slow accumulators.
Predicted RT distributions as a function of ensemble size (N), termination rule (pN), and accumulation rate correlation (rv). Each panel shows the 0.1, 0.3, 0.5, 0.7, and 0.9 RT quantiles on a log-log scale (the x axis ranges from 10^0 to 10^3; the y axis ranges from 10^2 to 10^3) as a function of N; pN and rv vary across columns and rows, respectively. We identified conditions (pN and rv) under which RT distributions were (i) invariant over the entire interval of N, i.e., [1, 1,000] (white panels), (ii) invariant with N over the interval [10, 1,000] (light gray panels), (iii) invariant with N over the interval [100, 1,000] (medium gray panels), and (iv) not invariant with N (dark gray panels).
RT Distributions Over a Range of Accumulator Ensemble Sizes.
We also investigated the invariance of RT distributions over a range of ensemble sizes to determine whether RTs may be invariant once some critical ensemble size is reached. Knowing that the same RT distributions are predicted whether an ensemble has 10 accumulators or 1,000 accumulators or more provides important insights into the properties of ensemble dynamics. It may be that ensembles need to be “large enough” but that the precise size of the ensemble has little effect on the RT that the ensemble generates.
Extending the analysis above, we investigated how RT distributions scale with accumulator ensemble size. We identified conditions under which a small-ensemble model (n = 10) and an intermediate-ensemble model (n = 100) predict RT distributions with similar shapes as a large-ensemble model (n = 1,000). RT distributions were invariant across ensembles with at least 10 accumulators if accumulation rates were at least modestly correlated (rv ≥ 0.3) and termination rules avoided the extremes (10% ≤ pN ≤ 90%). RT distributions were invariant across larger ensembles (n ≥ 100) with even lower rate correlations (rv ≥ 0.1). Only if accumulation rates were uncorrelated (rv = 0.0) or termination rules were extreme (pN = first and pN = last) did RT distributions vary dramatically in scale and shape with ensemble size (Fig. 2). Similar findings were observed when RT was determined by the pooling termination mechanism (Fig. 2, rightmost column) and with other accumulator model variants we investigated (SI Text, Robustness of Findings).
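The ensemble logic above can be sketched as a simulation. The shared-plus-independent decomposition used here to induce the rate correlation, and all parameter values, are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def ensemble_rt(n_acc=100, p_n=0.5, r_v=0.6, mean_rate=1.0, sd_rate=0.3,
                theta=1.0, noise=0.1, dt=1e-3, t_max=10.0):
    """One trial of an ensemble accumulator model: n_acc accumulators with
    pairwise rate correlation r_v (induced by a shared component) integrate
    noisily toward threshold theta; the trial ends when a fraction p_n of
    accumulators has crossed (the termination rule)."""
    shared = rng.standard_normal()               # component common to all
    indep = rng.standard_normal(n_acc)           # accumulator-specific part
    rates = mean_rate + sd_rate * (np.sqrt(r_v) * shared
                                   + np.sqrt(1.0 - r_v) * indep)
    x = np.zeros(n_acc)
    t = 0.0
    while t < t_max:
        x += rates * dt + noise * np.sqrt(dt) * rng.standard_normal(n_acc)
        t += dt
        if (x >= theta).mean() >= p_n:           # p_n-th accumulator crossed
            return t
    return t_max
```

Setting `p_n` near 0 or 1 corresponds to the extreme "first"/"last" termination rules, and `r_v` controls the rate correlation: the decomposition gives each rate variance `sd_rate**2` and pairwise correlation exactly `r_v`.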
Variability in RT remains remarkably constant across different ensemble sizes over a large proportion of parameter space. Only for uncorrelated accumulators and extreme termination rules (pN = first or pN = last) does ensemble size affect RT variability, a lack of invariance anticipated by extreme value statistics. By analogy to the central limit theorem, we can perhaps anticipate why median RT remains invariant with ensemble size. However, there is no single mathematical property that might allow us to anticipate why variability in RT is invariant with ensemble size across correlated samples and intermediate termination rules, so we need to rely on simulation. To begin with, we know that for pN = first, variability decreases with ensemble size, and for pN = last, variability increases with ensemble size. So at some point in the range of termination rules we might expect an invariance of variability with ensemble size. What is striking is that this invariance is observed across all of the intermediate termination rules we investigated, not just a single value of termination rule. Also, for small ensemble sizes, variability is largely dominated by sampling variability across those few accumulators, and low correlations between accumulator rates may have only a small influence on the predicted variability from trial to trial. By contrast, for large ensemble sizes, variability is largely dominated by the between-trial variability introduced by the correlation between accumulator rates. These counteracting effects of ensemble size and correlation largely cancel each other out, producing invariance in RT distributions over a range of model parameters and model architectures (SI Text, Robustness of Findings) that we did not anticipate.
Invariance of ART with RT.
We then investigated how the trial-averaged ART from an individual accumulator can be invariant with RT even though RT is produced by a large ensemble of accumulators. Most accumulator models are based on thresholds that are invariant across RT (40–42), and multiple laboratories have observed invariant thresholds of neural discharge rate (6–17). However, the ART of an individual accumulator participating in the ensemble is not guaranteed to reach the same value on each trial because of the stochastic nature of its accumulation process: on some trials it has reached θ and contributes to the measured RT, but on other trials it has not yet reached θ and so does not contribute (Fig. 1C). Though it is trivially true for a single accumulator that ART will be invariant with RT, it is unknown whether large ensembles of accumulators with intermediate termination rules and accumulation rate correlations reproduce the invariance of ART with RT that is regularly measured in neurophysiology.
Just as a neurophysiology experiment randomly samples one neuron in some brain region, we randomly selected one accumulator in the ensemble and measured ART for that accumulator on each simulated trial. We then quantified how the slope of the linear regression of ART over RT varied for ensembles of 10, 100, and 1,000 accumulators (Fig. 3), mimicking the approach used in neurophysiological analyses. For small ensembles (n = 10), ART was invariant over RT under intermediate termination rules (10% ≤ pN ≤ 90%) and moderate rate correlations (rv ≥ 0.4). With many accumulators (n = 1,000), the invariance of ART with RT was only violated for the earliest termination rule (pN = first) and low accumulation rate correlations (rv ≤ 0.3). Under a pooling mechanism, the invariance of ART with RT was never violated. Thus, the invariance of ART with RT emerges from the dynamics of individual accumulators operating in large ensembles, even though the dynamics of no single accumulator uniquely determine RT.
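The measurement procedure just described can be approximated in a few lines: simulate trials, record the ensemble RT and the activation of one fixed "recorded" accumulator at that moment, and regress one on the other. Everything below (model details, parameter values) is a simplified stand-in for the actual simulations:

```python
import numpy as np

rng = np.random.default_rng(3)

def trial_art_rt(n_acc=100, p_n=0.5, theta=1.0, dt=1e-3, t_max=5.0):
    """Simulate one trial of n_acc correlated noisy accumulators and return
    (RT, ART): the time at which a fraction p_n has crossed theta, and the
    activation at that moment of one fixed 'recorded' accumulator."""
    rates = rng.normal(1.0, 0.2) + 0.1 * rng.standard_normal(n_acc)
    x = np.zeros(n_acc)
    t = 0.0
    while t < t_max and (x >= theta).mean() < p_n:
        x += rates * dt + 0.1 * np.sqrt(dt) * rng.standard_normal(n_acc)
        t += dt
    return t, x[0]

# Regress the recorded accumulator's ART on the ensemble RT across trials,
# mimicking the neurophysiology-style analysis.
pairs = np.array([trial_art_rt() for _ in range(200)])
slope = np.polyfit(pairs[:, 0], pairs[:, 1], 1)[0]
```

Under the intermediate (median) termination rule and a strong rate correlation, the fitted slope should sit near zero, i.e., ART approximately invariant with RT.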
Relationship between ART and RT as a function of ensemble size (N), termination rule (pN), and accumulation rate correlation (rv). Each panel shows the linear regression slope of ART on RT, expressed as colored pixels, for three ensemble sizes (Left, n = 10; Center, n = 100; Right, n = 1,000) and all combinations of termination rules and accumulation rate correlations. Hatched pixels indicate parameter combinations for which ART varied systematically with RT. Thus, beige, nonhatched pixels represent parameter combinations for which the slope of the linear relationship between ART and RT was zero and nonsignificant.
Relationship Between ART and θ.
Finally, we explored how the ART measured from an individual accumulator relates to the actual threshold of that accumulator (θ). In the neurophysiology literature, it is commonly assumed that the ART of an individual neuron represents a threshold like that in stochastic accumulator models. However, because ART is a trial-averaged measure and the true threshold of a neuron (θ) is unknown, we do not know how closely the value of ART approximates the value of θ.
As expected, with n = 1, ART was constant with RT and identical across trials, and the measured ART equaled the model parameter θ. However, in ensembles operating under intermediate termination rules (10% < pN < 90%), ART varied significantly between trials (Fig. 4). Thus, individual accumulators acting in ensembles do not reach the same activation level at RT on each trial, meaning that measured ART is not necessarily equivalent to the threshold specified by the model (θ) for any given accumulator. Analogous nonequivalence was observed for pooling mechanisms. We further observed that the termination rule determined how closely ART approximated θ. Under early termination rules (pN < 50%), average ART was less than θ. Under late termination rules (pN > 50%), average ART was greater than θ. Under the median termination rule (pN = 50%), average ART equaled θ; this entails that the relationship between ART and θ cannot be determined without knowledge of the termination rule. The accumulation rate correlation determined the magnitude of variability in ART. The more homogeneous the accumulators, the smaller the variability in ART and the closer the agreement with θ, which implies that the degree of stochastic variation in ART is indicative of the homogeneity of the accumulation process in the ensemble. Together, though these findings demonstrate unanticipated complexity in the relationship between ART measured in an individual accumulator and the true θ that defines its dynamics, in conditions under which one accumulator resembles many, the average ART measured from neurons is a fair proxy for the relation of θ to RT.
Distribution of measured activation level around RT (ART) between trials in a randomly selected accumulator as a function of ensemble size (N), termination rule (pN), and accumulation rate correlation (rv). The x axis ranges from 10^0 to 10^3, and the y axis ranges from 10^2 to 10^3. Other conventions as in Fig. 2. Individual threshold (θ, red line) was identical across accumulators. Thus, correspondence between ART and θ is indicated by overlap of distributions (black lines) and threshold (red line).
Inhibitory control adapts to contextual statistics
Subjects performed an anticipatory version of the stop signal task (Fig. 2A; see Adaptive stop signal task) similar to that reported previously (Dunovan et al., 2015), with the exception of how contextual information was conveyed to the subject. Rather than explicitly cuing the subject as to the probability of seeing a stop signal on each trial, as was used in our previous study, subjects in the current experiment had to rely on performance feedback to learn the temporal distribution of stop signals in one of three contexts.
Adaptive stop signal task and contextual SSD statistics. A, Anticipatory stop signal task. Top, On Go trials, subjects were instructed to press a key when the ascending bar crossed a target line, always occurring 520 ms after trial onset. Feedback was given informing the subject if their response was earlier or later than the Go target (maximum 100 points). Bottom, On Stop trials, the bar stopped and turned red before reaching the target line. If no response was made (correct), the subject received a bonus of 200 points. Failure to inhibit the key press resulted in a −100 point penalty. B, Stop signal statistics across contexts. Distributions show the sampling distributions for SSDs on context trials in the Early (blue), Uniform (gray), and Late (purple) groups. Early and Late SSDs were normally distributed (parameters stated as in-figure text, N(μ, σ)). Below the distributions, each row of tick marks indicates the context SSDs for a single example subject in each group. Bottom row of red tick marks indicates the five probe SSDs included for all subjects regardless of context.
To assess behavioral differences across contexts, we compared accuracy on stop signal trials at each Probe SSD across groups as well as the mean RTs on correct (response on Go trial) and error (i.e., response on Stop trial) responses. Separate one-way ANOVAs revealed a significant main effect of context across groups on both correct RTs, F(2,72) = 10.07, p < 0.001, and error RT (responses on stop signal trials), F(2,72) = 21.72, p < 0.00001. Consistent with our hypothesis, we found a significant interaction between context condition and Probe SSD, F(2.23,80.15) = 3.60, p = 0.027 (Fig. 3A). Shifting the mean of the context SSD distribution later into the trial led to delayed responding on Go trials (Fig. 3A, middle, right) as well as greater stopping accuracy on probe trials (Fig. 3A, left) in the Uniform and Late groups relative to the Early group. Thus, as predicted, participants could reliably learn to modulate their inhibitory control efficiency based on the probabilistic structure of prior stop signal timing (Shenoy and Yu, 2011).
Effects of context on stop accuracy and RTs. A, Subject-averaged stop accuracy (left) and cumulative RT distributions for correct (Go trials, middle) and error (Stop trials, right) responses in the Early (blue), Uniform (gray), and Late (purple) contexts. B, Posterror slowing following failed Stop trials in each context and subsequent decay over five trials. C, The posterror slowing observed immediately after a failed stop (terr + 1) in each context (e.g., first data point in B). Error bars and shaded area represent the 95% CI calculated across subjects.
We next examined whether failed Stop trials elicited any systematic changes in RT on subsequent trials. Figure 3B shows the immediate slowing and subsequent decay in RTs following a stop error (probe trials only), calculated with respect to the average RT on the five trials that preceded the error. A one-way ANOVA revealed a significant effect of context on the degree to which subjects slowed responses immediately following stop errors, F(2,72) = 4.27, p = 0.018. Unlike the observed effects on RT and accuracy, which scaled with differences in the mean SSD in each context, group differences in posterror slowing appeared to be driven by the variance of SSDs, with stop errors eliciting greater slowing in the Uniform context than in the Early and Late contexts (Fig. 3C). Collectively, these findings suggest that adaptive control is sensitive to multiple task dimensions and that these dimensions manifest in dissociable behavioral profiles.
Static DPM parameter identifiability
The DPM (Dunovan et al., 2015) assumes that an execution decision is made when an accumulating execution process, with onset time tr and drift rate ve, crosses an upper decision boundary a (see Computational models). On Stop trials, a nested braking process, with negative drift rate vb, is initiated at the current state of the execution process at the time of the SSD and accumulates back toward the lower boundary (always set equal to 0; Fig. 1A). The model successfully cancels an action when the braking process reaches the lower bound before the execution process terminates at the upper execution threshold.
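As a sketch of these dynamics, the race between the execution and braking processes can be simulated with a simple Euler scheme. The parameter values, the noise scale si, and the step size dt below are illustrative assumptions, not fitted values from this study:

```python
import numpy as np

def simulate_dpm_trial(a=0.4, ve=1.0, vb=-1.5, tr=0.2, ssd=None,
                       dt=0.001, si=0.1, max_t=1.2, rng=None):
    """One trial of the dependent process model (illustrative sketch).

    The execution process starts at 0 at onset time tr and drifts at rate
    ve toward the upper bound a. On Stop trials, a braking process is
    launched at the execution process's current state at time ssd and
    drifts (vb < 0) back toward the lower bound at 0.
    """
    rng = rng or np.random.default_rng()
    noise = si * np.sqrt(dt)
    t, x = tr, 0.0
    b = None  # braking process state (inactive until ssd)
    while t < max_t:
        t += dt
        x += ve * dt + noise * rng.standard_normal()
        if ssd is not None and b is None and t >= ssd:
            b = x                      # braking starts where execution is now
        if b is not None:
            b += vb * dt + noise * rng.standard_normal()
            if b <= 0.0:
                return ("stop", t)     # action successfully cancelled
        if x >= a:
            return ("go", t)           # key press (a failed stop if ssd given)
    return ("no_response", max_t)
```

On Go trials (ssd=None) only the execution process runs; passing an SSD launches the braking process from the execution process's current state, mirroring the nested race in Figure 1A.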
For a cognitive model to be informative, it is important to verify that its parameters are identifiable, or able to be reliably estimated from observable measures of the target behavior. The issue of model identifiability is particularly relevant to novel variants of sequential sampling models, as several recently proposed models within this class have been found to exhibit poor identifiability despite providing convincing fits to experimental data (Miletić et al., 2017; White et al., 2018). More common variants, however, such as the drift-diffusion model and linear ballistic accumulator, are reasonably identifiable with sufficient trial counts and the application of appropriate optimization procedures (Ratcliff and Tuerlinckx, 2002; van Ravenzwaaij and Oberauer, 2009; Visser and Poessé, 2017).
In practice, the identifiability of a model can be assessed by performing fits to simulated data, for which the true parameters are known, and comparing the recovered estimates. To evaluate the identifiability of parameters in the DPM, we adopted the following procedure. First, we identified three generative parameter sets that approximated the average stopping accuracy curve and RT distributions observed in each context condition, ensuring that the generative parameter sets yielded plausible behavioral patterns. Each of these three generative parameter sets served as hyperparameters describing the mean of a normally distributed population from which 25 “subject-level” parameter sets were sampled and used to simulate 1000 trials (for sampling details, see Computational models). This produced a simulated dataset similar in size and dimension to that of the empirical data while capturing the assumption that subject parameter values in each context vary around a shared mean. Each of the three group-level parameter sets was used to generate 20 simulated datasets (each comprised of 25 randomly sampled subjects with 1000 trials per subject). The DPM was then fit to the subject-averaged stop accuracy and RT quantiles for each of the simulated datasets following the optimization routine outlined in Materials and Methods (i.e., here, “subject-averaged” data were calculated by first estimating the stop accuracy curve over probe SSDs, correct RT quantiles, and error RT quantiles for each subject then calculating the mean for each of these values across subjects).
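The subject-sampling step of this procedure can be sketched as follows. The 5% between-subject SD and the parameter dictionary keys are hypothetical placeholders; the paper's actual sampling scheme is described in its Computational models section:

```python
import numpy as np

def sample_subject_params(group_means, sd_frac=0.05, n_subjects=25, rng=None):
    """Draw subject-level parameter sets around group-level means (sketch).

    group_means: dict of DPM parameter means, e.g. {"a": .., "ve": .., ...}.
    sd_frac is a hypothetical between-subject SD expressed as a fraction
    of each parameter's mean.
    """
    rng = rng or np.random.default_rng()
    return [
        {k: rng.normal(m, abs(m) * sd_frac) for k, m in group_means.items()}
        for _ in range(n_subjects)
    ]
```

Each sampled dictionary would then be passed to the trial simulator (1000 trials per subject), and the 25 simulated subjects averaged before fitting, as described above.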
Parameter estimates recovered from the fits are summarized in Figure 4A–D, with the recovered values for each parameter plotted against the respective generative value for each of the three sets. All parameters were recovered with a high degree of accuracy. In addition to accurately recovering generative parameter values, the DPM provided high-quality fits to the datasets generated from all three parameter sets, as shown by the positively skewed distribution of χ² values in Figure 4E. The results of this simulation and recovery analysis suggest that parameters of the DPM are identifiable when fitting group-level data and are robust to variability in the parameter values of individual subjects.
Simulation and parameter recovery analysis of DPM. A, True and estimated boundary height (a, blue), (B) braking drift rate (vb, red), (C) onset time (tr, purple), and (D) execution drift rate (ve, green) for three generative parameter sets. Lines indicate true generative parameter means. Lighter colors represent the range of sampled subject-level estimates. Squares represent estimated parameter means. Error bars represent ±1 SD. E, Distribution of χ² values for fits to all 60 simulated datasets (gray).
Individual subject DPM fits
To better understand the cognitive mechanisms underlying the observed effects of feedback on timing and control behavior across contexts, we fit RT and stop accuracy data to the DPM. To isolate the parameters that were influenced by the experimental manipulations, the model fits were performed in multiple consecutive stages. To reduce the combinatorial space of possible model configurations, we adopted a forward stepwise model selection approach in which we began by comparing models where a single parameter was free to vary across conditions (Table 3): execution drift (ve), braking drift (vb), onset delay (tr), or boundary height (a).
Static fit statistics for early and late contexts
Because the behavioral effects of interest were driven by trial-by-trial feedback (i.e., at the subject level) as well as by differences in the sampling distributions of SSDs across contexts (i.e., at the group level), single-parameter models were fit both to behavioral data for individual subjects, allowing select parameters to vary between the first and second half of trials in the experiment, and to data at the group level, allowing parameters to vary across contexts.
Consistent with our previous study, in which subjects proactively modulated the drift rate of the execution process (ve) in a probabilistic cueing paradigm (Dunovan et al., 2015), we found that allowing ve to vary between the first and second half of trials provided the best average fit across subjects in the current experiment (AICv = −206.5, BICv = −202.2, SD = 23.47; Fig. 5A). Comparable IC scores were provided by alternative models (e.g., AICtr = −201.70, BICtr = −199.34, SD = 21.86). Thus, we also inspected the number of subjects for which each model outperformed the rest and found that the ve model was the best fitting model for more subjects (N = 33) than any alternative model (see Fig. 5B). Parameter estimates (Fig. 5C) showed that ve values tended to increase over the course of the experiment, being higher in the second than in the first half of trials. Notably, this effect was most pronounced in the Early context, followed by the Uniform and Late contexts, respectively, suggesting that subjects in the Early context were more sensitive to timing errors than those in the Uniform and Late contexts.
Single-subject DPM fits and model comparison. Variants of the DPM were fit to all individual subject datasets with different parameters left free to vary between the first and second half of trials. A, Mean subject BIC and AIC scores for boundary height (a, blue), execution drift rate (ve, green), braking drift rate (vb, red), and onset delay (tr, yellow) (dark circles). Lighter circles represent individual subjects. Error bars represent 95% CI. B, Same mean values as in A, but with the size of the dots scaled to reflect the number of subjects for which each model had the lowest AIC/BIC score. White text indicates the number of subjects best described by each model (i.e., the factor used to scale the size of the dots). C, Observed increase in ve values estimated for the first (v1) and second half (v2) of trials in each context. Error bars represent 95% CI. Schematic represents the relative increase in ve in Early (cyan), Uniform (gray), and Late (purple) contexts, compared with a shared initial drift rate (i.e., before learning; black dotted line).
Contextual modulation of DPM parameters
Next, we investigated whether the same mechanism could account for the observed differences in RT and stop accuracy at the group level. We first optimized parameters to the average data in the Uniform context, where the timing of the stop signal is unpredictable, and then to the average data in the Early and Late contexts, holding all parameters constant at the best fitting Uniform values except for one or two parameters of interest. The model that best accounted for differences in the stop accuracy and RT quantiles across the three context conditions was selected for further investigation of feedback-dependent learning mechanisms. The fitting routine (for details, see Materials and Methods) was repeated a total of 20 times using different initialization values for all parameters at the start of each run to avoid biases in the optimization process. A summary of fits to the Uniform context data is provided in Table 1. In line with our previous findings (Dunovan et al., 2015), as well as the outcome of single-subject fits in the current study, leaving the execution drift rate free provided a better account of context-dependent changes in behavior than alternative single-parameter models (best-fit AICve = −363.02; Fig. 6A).
Group-level model comparison and best fit predictions across context. A, AIC (dark) and BIC (light) scores for all single-parameter models, allowing execution boundary height (a, blue), execution drift rate (ve, green), braking drift rate (vb, red), or onset delay (tr, yellow) to vary across contexts. Three dual-parameter models were also included to test for possible benefits of allowing ve (best fitting single-parameter model) to vary along with a (teal), vb (purple), or tr (dark green). Error bars indicate the 95% CI. B, Qualitative effects of context on a (top) and ve parameter estimates (bottom) in the Early and Late contexts. C, Model predicted data (lines and larger transparent circles) simulated with best fit parameters from the ve, a model (corresponding to the dotted circle in A), overlaid on the average empirical data for Early (cyan), Uniform (gray), and Late (purple) contexts. Error bars represent 95% CI.
To further test the relationship between execution drift rate and context, we performed another round of fits to test for possible interactions between the execution drift rate and a second free parameter: boundary height (a), braking drift rate (vb), or onset delay (tr). The AIC and BIC scores from these fits showed that the combination of boundary height and execution drift rate (ve and a) provided the best overall fit to the data (best-fit AICa,ve = −372.26), exceeding the drift-only model by enough (|AICve − AICa,ve| = 9.24) to justify the added complexity of the dual-parameter model. Figure 6C shows a qualitative assessment of the a and ve model's goodness of fit, revealing a high degree of overlap between the simulated and observed stop accuracy and RT data in both the Early and Late conditions. These results suggest that there may be two targets of learning in the decision process: a strong modulation of the execution drift rate and a subtler modulation of the boundary height.
Adaptive DPM with dual-learning mechanisms
It is not clear from the preceding analysis whether error-driven changes in the drift rate and boundary height are able to capture trial-to-trial adjustments of response speed and stop accuracy as statistics of the environment are learned experientially. Here we explore how drift rate and boundary height mechanisms adapt on a trialwise basis to different sources of feedback to drive context-dependent control and decision-making.
We implemented two forms of corrective learning (Fig. 7A): one targeting the execution drift rate v and another targeting the height of the execution boundary a. We hereafter denote the execution drift rate as v rather than ve to avoid multiple subscripts in the adaptive model equations. On correct Go trials (Fig. 7A, left, middle), the current drift rate (vt) was updated (vt+1; Eq. 6) to reflect the signed difference between the model's RT on the current trial and the target time (TG = 520 ms), increasing the drift rate following "slow" responses (i.e., RTt > TG) and decreasing the drift rate following "fast" responses (i.e., RTt < TG). On failed Stop trials, vt was updated according to the same equation but with the error term reflecting the difference between RTt and the trial response deadline (TS = 680 ms), thus slowing the drift rate to reduce the probability of failed stops in the future. This form of RT-dependent modulation in the drift rate is motivated by recent findings demonstrating adaptation of action velocity by dopaminergic prediction error signaling in the striatum (Yttri and Dudman, 2016). In the context of the "believer-skeptic" framework (Dunovan and Verstynen, 2016), fast RT errors could reinforce the "skeptic" (i.e., indirect pathway) and suppress the "believer" (i.e., direct pathway) by decreasing dopaminergic tone in the striatum as follows:

vt+1 = vt + αt(RTt − TG)     (6)

In addition to receiving feedback about errors in action timing, subjects also received penalties for failing to suppress responses on Stop trials. In the adaptive DPM, failed stops (Fig. 7A, right) caused an increase in the boundary height (a0) according to a δ function with height βt, which decayed exponentially on each subsequent trial (aterr) until reaching its baseline value a0 or until another stop error occurred (Eq. 7) as follows:

aterr = a0 + βt·exp(−terr)     (7)

This form of adaptation in boundary height is motivated by physiological evidence that the STN plays a critical role in setting the threshold for action execution and that this relationship is modulated by error commissions (Cavanagh et al., 2014). On all correct Go trials and the first failed Stop trial, the timing errors were scaled by the same learning rate (α0). An additional parameter was included to modulate the sensitivity (π) to stop errors over time (Eq. 8), allowing the model to capture an observed decrease in stop accuracy over time in each of the context groups (Fig. 8C). According to Equation 8, π dropped exponentially over time at a rate p, acting as a scalar on αt (Eq. 9) and βt (Eq. 10) before updating values of drift rate (Eq. 6) and boundary height (Eq. 7) after a failed stop. Higher values of p led to more rapid decay of π toward zero and, thus, a more rapid desensitization to Stop trial errors as follows:

πt = exp(−p·t)     (8)
αt = πt·α0     (9)
βt = πt·β0     (10)
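Collecting the update rules described above into code, a minimal sketch might look like the following. The exact functional forms and the starting values are one plausible reading of Eqs. 6–10, not the paper's verbatim implementation, and the class tracks parameter values only (it does not simulate the accumulation processes themselves):

```python
import math

class AdaptiveDPM:
    """Trialwise updates of the adaptive DPM's drift rate (v) and
    boundary height (a), sketched from the verbal description of
    Eqs. 6-10 with illustrative starting values."""

    def __init__(self, v0=1.0, a0=0.4, alpha0=0.1, beta0=0.1, p=0.01,
                 t_go=0.520, t_stop=0.680):
        self.v = v0
        self.a0, self.alpha0, self.beta0, self.p = a0, alpha0, beta0, p
        self.t_go, self.t_stop = t_go, t_stop
        self.trial = 0
        self._bump = 0.0    # height of the last boundary increase (beta_t)
        self._t_err = None  # trials elapsed since the last failed stop

    @property
    def a(self):
        """Boundary decays exponentially back toward its baseline a0."""
        if self._t_err is None:
            return self.a0
        return self.a0 + self._bump * math.exp(-self._t_err)

    def update(self, rt=None, stop_trial=False, responded=False):
        """Apply feedback from one trial (RT in seconds)."""
        self.trial += 1
        pi = math.exp(-self.p * self.trial)  # desensitization to stop errors
        if stop_trial and responded:         # failed stop: deadline error + bump
            self.v += pi * self.alpha0 * (rt - self.t_stop)
            self._bump = pi * self.beta0
            self._t_err = 0
            return
        if not stop_trial and rt is not None:  # correct Go trial: timing error
            self.v += self.alpha0 * (rt - self.t_go)
        if self._t_err is not None:
            self._t_err += 1                   # boundary relaxes toward a0
```

A slow Go response (RT above 520 ms) increases v, a response on a Stop trial decreases v and bumps a, and the bump then decays back toward baseline on subsequent trials.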
Adaptive DPM parameter recovery and learning predictions in Uniform context. A, Schematic showing how the execution drift rate is modulated following timing errors on Go trials (left) and how the boundary height is modulated following failed inhibitions on Stop trials. B, Parameter recovery results for α (left, teal), β (middle, purple), and p (right, orange) parameters in the primary version of the adaptive DPM. Horizontal lines indicate true generative parameter means. Light colors represent the range of sampled subject-level estimates. Squares represent estimated parameter means. Error bars represent ±1 SD. Subject-averaged timeseries (dark line) and 95% CI (gray area) showing the (C) RT on Go trials and (D) accuracy on Stop trials. Each point in the timeseries (n = 30) represents the windowed average RT/accuracy over ∼30 trials. The corresponding adaptive model predictions are overlaid (dotted line), averaged over simulations to each individual subject's data.
Adaptive DPM modulates behavior according to context-specific control demands. A, Average stop accuracy curves (left) and correct (middle) and error (right) RT distributions predicted by adaptive model simulations in the Early (blue) and Late (purple) contexts (initialized with the optimal parameters of the Uniform context). B, Empirical timeseries of go RTs with model predictions overlaid for Early (left), Uniform (middle), and Late (right) contexts. C, Empirical and model predicted timeseries of stop accuracy for the same conditions as in B.
Adaptive DPM parameter identifiability
Before fitting the adaptive DPM to observed subject data, we first performed a parameter recovery analysis, similar to that conducted for the static DPM, to ensure that the learning rate and decay parameters introduced in the adaptive model could be reliably identified (for procedural details, see Materials and Methods). The parameter recovery results are displayed in Figure 7B, showing the recovered estimates for α, β, and p overlaid on the true values. For all three generative parameter sets, the optimization procedure for fitting the adaptive DPM accurately recovered the true values of α, β, and p. In one case (recovery estimates of α for parameter set 2), the 95% CI of recovered parameter estimates failed to overlap with the range of generative values; however, the trend of recovered α estimates followed the trend of the true values across parameter sets 1 (highest α), 2 (medium α), and 3 (lowest α).
Adaptive fits in the uniform context
After confirming the identifiability of the learning parameters in the adaptive DPM, we next sought to confirm that the trial-averaged behavior of the adaptive model was preserved after fitting the learning rates (e.g., the stop accuracy curve on probe trials and RT quantiles on correct and error trials). The adaptive DPM's predictions are indeed closely aligned with the empirical statistics used to fit the static model (adaptive DPM χ²static = 0.005, static DPM χ²static = 0.011; Table 4). Although this is not necessarily surprising, it is promising to confirm that introducing feedback-dependent adaptation in the drift rate and boundary height parameters does not compromise the model's fit to trial-averaged statistics. Next, we inspected the degree to which this model captured changes in Go trial RT and Stop trial accuracy in the Uniform context. Indeed, the predicted time course of both behavioral measures showed a high degree of correspondence with the observed behavioral patterns (Fig. 7C,D). These qualitative fits show that it is indeed possible to capture feedback-dependent changes in RT and stop accuracy with the specific types of error learning in the ve (Eq. 6) and a (Eq. 7) parameters. Without an alternative model for comparison, however, it is impossible to conclude anything about the specificity of these particular learning rules (e.g., the hypothesized dependencies of ve and a on timing and control errors, respectively). Therefore, we compared the fits afforded by the primary version of the adaptive DPM with an alternative version in which a was modulated by timing errors and ve was modulated by failed stops. In the alternative version of the model, Equation 6 becomes at+1 = at · …, increasing and decreasing a following fast (RTt < 520 ms) and slow (RTt > 520 ms) responses on Go trials, and increasing a on failed Stop trials in proportion to the speed of the response.
Additionally, Equation 7 becomes vterr = v0 − βt·exp(−terr), slowing ve by a magnitude of βt. Indeed, this alternative version of the adaptive model afforded an improvement over the static model fits to trial-averaged statistics (adaptive DPMalt χ²static = 0.007, static model χ²static = 0.011; Table 4); however, compared with the adaptive DPM in which ve was modulated by timing errors and a was increased following failed stops (χ²adapt = 0.235, AIC = −326.4, BIC = −320.1), the alternative adaptive model (χ²adapt = 0.861, AIC = −248.6, BIC = −242.3) provided a worse fit to feedback-dependent changes in RT and stop accuracy over time in the Uniform context (Table 4).
Adaptive dependent process model fit statistics
Adaptive predictions in early and late contexts
Consistent with our original hypotheses, fits of the primary version of the adaptive DPM to behavior in the Uniform context highlight two possible mechanisms for acquiring the prior on the SSD: adaptive modulation of response speed by the drift rate and cautionary increases in boundary height following control errors. To confirm that these mechanisms work together to adaptively learn based only on the statistics of previous input signals, we took the average parameter scheme from the Uniform context fits and simulated each subject in the Early and Late contexts. If the context-dependent changes in the RT distributions and stop accuracy are indeed a reflection of the proposed learning mechanisms, then the model simulations should reveal similar RT and accuracy time courses as in the observed behavior.
Figure 8A shows the simulated stop-curve and RT distributions generated by the adaptive model based on feedback in the Early and Late conditions. As in the observed data (Fig. 3A), adaptation to Early SSDs led to impaired stopping accuracy but faster RTs relative to simulated predictions in the Late condition. In Figure 8B, C, the middle panels show the same trial-binned RT and stop accuracy means as in Figure 7C, D (Uniform condition), flanked by the corresponding time courses from simulations in the Early (left) and Late (right) conditions. The adaptive model predictions show a high degree of flexibility, conforming to idiosyncratic changes in the trialwise behavioral dynamics within each context SSD condition. For instance, the RTs in the Early condition exhibit a relatively minor and gradual decay over the course of the experiment (Fig. 8B, left), contrasting markedly with the early increase and general volatility of RTs in the Late condition (Fig. 8B, right). The adaptive DPM largely captures both patterns, underscoring feedback-driven adaptation in the drift rate as a powerful and flexible tool for commanding inhibitory control across a variety of settings. In addition to predicting group differences in the time course of RTs, the simulations in Figure 8C show a striking degree of precision in the model-estimated changes in stop accuracy, both over time and between groups.
Because the static model fits revealed marginal evidence for the drift-only model (Fig. 6A), we next asked whether this simpler model was able to account for the learning-related behavioral changes with the same precision as the dual-learning (i.e., drift and boundary) model. To test this hypothesis, we ran simulations in which the boundary learning rate was set to zero, thereby leaving only the drift rate free to vary in response to feedback. Figure 9A shows the error between observed and model-predicted estimates for each of the behavioral measures in Figure 3 (e.g., RT, stop accuracy, and posterror slowing) based on 20 simulations of the drift-only and dual-learning models. Compared with the drift-only model, the dual-learning model showed no significant benefits in terms of fit to the trialwise RT (t(24) = 1.09, p = 0.28) or accuracy (t(24) = 0.23, p = 0.82) but showed a marked improvement in the fit to posterror slowing (t(24) = −6.91, p < 0.00001) (Fig. 9A). Importantly, the interaction of drift rate and boundary adaptation in the dual-learning model not only reduced the error in the model fit, but recovered the same qualitative pattern of posterror slowing across contexts observed in the data (Fig. 9B). In contrast, the drift-only model predicted the largest posterror slowing effect in the Early condition (Fig. 9B, left). This is particularly revealing because no information about the observed posterror slowing was included in the adaptive cost function when fitting the learning-rate parameters. Collectively, these results suggest that goal-directed tuning of movement timing (i.e., RT) and control (i.e., stop accuracy) is best described by feedback-driven changes in the drift rate and boundary-height parameters of accumulation-to-bound decisions.
Utility of including boundary adaptation compared with drift-only model. A, Relative error of simulated compared with observed RT, accuracy, and posterror slowing on probe trial measures based on 20 simulated datasets for the drift-only and drift and bound adaptive models. Posterror slowing in each context condition as predicted by the (B) drift-only and (C) drift and bound models. For comparison with patterns observed in empirical data, see Figure 3C. Error bars indicate the 95% CI around the mean.
Psychon Bull Rev (2017) 24:547–556 DOI 10.3758/s13423-016-1081-y
The EZ diffusion model provides a powerful test of simple empirical effects Don van Ravenzwaaij1 · Chris Donkin2 · Joachim Vandekerckhove3
Published online: 28 June 2016 © The Author(s) 2016. This article is published with open access at Springerlink.com
Abstract Over the last four decades, sequential accumulation models for choice response times have spread through cognitive psychology like wildfire. The most popular style of accumulator model is the diffusion model (Ratcliff, Psychological Review, 85, 59–108, 1978), which has been shown to account for data from a wide range of paradigms, including perceptual discrimination, letter identification, lexical decision, recognition memory, and signal detection. Since its original inception, the model has become increasingly complex in order to account for subtle, but reliable, data patterns. The additional complexity of the diffusion model renders it a tool that is only for experts. In response, Wagenmakers et al. (Psychonomic Bulletin & Review, 14, 3–22, 2007) proposed that researchers could use a more basic version of the diffusion model, the EZ diffusion. Here, we simulate experimental effects on data generated from the full diffusion model and compare the power of the full diffusion model and EZ diffusion to detect those effects. We show that the EZ diffusion model, by virtue of its relative simplicity, will sometimes be better able to detect experimental effects than the data-generating full diffusion model.

Keywords Sequential accumulator models · Response time analysis · Power · Model complexity

Electronic supplementary material The online version of this article (doi:10.3758/s13423-016-1081-y) contains supplementary material, which is available to authorized users.

Don van Ravenzwaaij, University of Groningen, Department of Psychology, Grote Kruisstraat 2/1, Heymans Building, room 169, 9712 TS Groningen, The Netherlands; Chris Donkin, University of New South Wales, Sydney, Australia; Joachim Vandekerckhove, University of California, Irvine, CA, USA
In everyday life, we are constantly confronted with situations that require a quick and accurate action or decision. Examples include mundane tasks such as doing the dishes (we do not want to break china, but we also do not want to spend the next hour scrubbing) or vacuum cleaning (we like to get as many nooks and corners as possible, but also want to get back to finishing that paper), but also more serious activities, such as typing a letter or performing a placement test. For all these actions, there exists a trade-off, such that greater speed comes at the expense of more errors. This phenomenon is called the speed-accuracy trade-off (Schouten & Bekker, 1967; Wickelgren, 1977).

In experimental psychology it is common practice to study this speed-accuracy trade-off with relatively simple tasks. More often than not, the task requires participants to decide between one of two alternatives as quickly and accurately as possible. Notable examples include the lexical decision paradigm (Rubenstein et al., 1970), in which the participant is asked to classify letter strings as English words (e.g., LEMON) or non-words (e.g., LOMNE), and the moving dots task (Ball & Sekuler, 1982), in which participants have to determine whether a cloud of partly coherently moving dots appears to move to the left or to the right. Typically, the observed variables from these and other two-alternative forced choice tasks are distributions of response times (RTs) for correct and incorrect answers. One way to analyze the data from these kinds of tasks is to draw inferences based on one of, or both, the mean of the correct RTs,
or the percentage of correct responses. These measures, however, do not speak directly to underlying psychological processes, such as the rate of information processing, response caution, and the time needed for stimulus encoding and non-decision processes (i.e., response execution). They also do not address the speed-accuracy trade-off. The motivation among cognitive psychologists to draw conclusions about these unobserved psychological processes has led to the advent of sequential accumulator models.

A prominent example of such a model is the diffusion model (Ratcliff, 1978). The model assumes that an observer accumulates evidence for responses until a threshold level of evidence for one of the responses is reached. The time taken to accumulate this evidence, plus a non-decision time, gives the observed response time, and the choice is governed by which particular threshold is reached.

Over the last four decades, as increasingly complex data patterns were observed, the diffusion model grew in order to account for these data. Ratcliff (1978) added the assumption that accumulation rate varied from trial to trial in order to account for the observation that incorrect responses were slower than correct responses. Ratcliff and Rouder (1998) assumed that the starting point of evidence could vary from trial to trial (following Laming, 1968), allowing them to account for incorrect responses that were faster than correct responses. Finally, Ratcliff and Tuerlinckx (2002) also proposed that non-decision time would vary across trials, an assumption that allowed the model to account for patterns in the speed with which the fastest responses were made. The version of the diffusion model that includes all components of between-trial variability is known henceforth as the 'full' diffusion model. As a theoretical model of decision-making, the full diffusion model is impressive: it accounts for a wide range of reliable empirical phenomena.
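The three between-trial variability components just described can be illustrated with a short simulation. This is a sketch with illustrative parameter values and the conventional scaling s = 0.1; within-trial accumulation uses a simple Euler scheme:

```python
import numpy as np

def simulate_full_diffusion(n_trials=500, v=0.25, eta=0.1, a=0.125, sz=0.02,
                            ter=0.3, st=0.1, s=0.1, dt=0.001, rng=None):
    """Simulate the 'full' diffusion model, in which drift rate, starting
    point, and non-decision time all vary from trial to trial.

    Returns arrays of RTs (seconds) and correctness (upper-boundary hits).
    """
    rng = rng or np.random.default_rng()
    z = a / 2.0                                      # unbiased mean start point
    noise = s * np.sqrt(dt)
    rts, correct = [], []
    for _ in range(n_trials):
        drift = rng.normal(v, eta)                   # between-trial drift variability
        x = rng.uniform(z - sz / 2, z + sz / 2)      # start-point variability
        t = rng.uniform(ter - st / 2, ter + st / 2)  # non-decision variability
        while 0.0 < x < a:                           # accumulate to either bound
            x += drift * dt + noise * rng.standard_normal()
            t += dt
        rts.append(t)
        correct.append(x >= a)
    return np.array(rts), np.array(correct)
```

Setting eta, sz, and st to zero recovers the simple diffusion process; the EZ model discussed below makes exactly that simplification.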
Among others, the diffusion model has been successfully applied to experiments on perceptual discrimination, letter identification, lexical decision, categorization, recognition memory, and signal detection (e.g., Ratcliff, 1978; Ratcliff et al., 2004, 2006; Klauer et al., 2007; Wagenmakers et al., 2008; van Ravenzwaaij et al., 2011; Ratcliff et al., 2010). Using the diffusion model, researchers have examined the effects on decision making of alcohol intake (van Ravenzwaaij et al., 2012), playing video games (van Ravenzwaaij et al., 2014), sleep deprivation (Ratcliff & van Dongen, 2009), anxiety (White et al., 2010), and hypoglycemia (Geddes et al., 2010). The model has also been applied extensively in the neurosciences (Ratcliff et al., 2007; Philiastides et al., 2006; Mulder et al., 2012).

In recent years, researchers have begun to use the full diffusion model as a measurement model. A diffusion model analysis takes as input the entire RT distribution for both correct and incorrect responses. The model maps the
observed RTs and error rates into a space of psychological parameters, such as processing speed and response caution. Such an analysis has clear benefits over traditional analyses, which make no attempt to explain observed data in terms of psychologically meaningful processes.

A full diffusion model analysis is complicated, for two reasons. First, the model is complicated to use. Parameters for the model are estimated using optimization on functions that involve numerical integration and infinite sums. While there have been valiant efforts to make such models easier to use (Donkin et al., 2009, 2011; Vandekerckhove & Tuerlinckx, 2007, 2008; Voss & Voss, 2007), the application of a full diffusion model remains an approach most suited for experts. Second, the model itself may be more complex than is required by the data it is being used to fit, at least when the model is being used as a measurement model. When the data do not provide enough constraint on the estimation of model parameters, the more complex model will overfit the data, which leads to increased variability in parameter estimates.1

In response to the complexity of the full diffusion model, Wagenmakers et al. (2007) advocated for the use of the "EZ diffusion model". The EZ diffusion model forgoes between-trial variability in accumulation rate, starting point, and non-decision time, as well as a priori response bias (but see Grasman et al., 2009). By removing all of these additional model components, no fitting routine is required to estimate the parameters of the EZ diffusion model. Instead, the EZ diffusion takes the RT mean, RT variance, and percentage correct, and transforms them into a mean rate of information accumulation, response caution, and a non-decision time. The EZ model has been heralded for the ease with which it can be applied to data. However, critics have claimed that it is "too EZ" (Ratcliff, 2008, but see Wagenmakers et al., 2008).
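Because the EZ transformation is closed-form, it fits in a few lines. The sketch below follows the expressions published in Wagenmakers et al. (2007); the edge correction needed when accuracy is exactly 0, 0.5, or 1 is omitted:

```python
import math

def ez_diffusion(prop_correct, rt_var, rt_mean, s=0.1):
    """EZ-diffusion transform: map proportion correct, correct-RT
    variance, and correct-RT mean onto drift rate v, boundary
    separation a, and non-decision time Ter (Wagenmakers et al., 2007).

    Assumes an unbiased start point and no between-trial variability;
    prop_correct must not be exactly 0, 0.5, or 1.
    """
    s2 = s ** 2
    logit = math.log(prop_correct / (1.0 - prop_correct))
    x = logit * (logit * prop_correct ** 2
                 - logit * prop_correct
                 + prop_correct - 0.5) / rt_var
    v = math.copysign(s * x ** 0.25, prop_correct - 0.5)  # drift rate
    a = s2 * logit / v                                    # boundary separation
    y = -v * a / s2
    mdt = (a / (2.0 * v)) * (1.0 - math.exp(y)) / (1.0 + math.exp(y))
    ter = rt_mean - mdt                                   # non-decision time
    return v, a, ter
```

For instance, an accuracy of .802, a correct-RT variance of 0.112, and a correct-RT mean of 0.723 s yield v ≈ 0.1, a ≈ 0.14, and Ter ≈ 0.3 s.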
It is true that the EZ diffusion model cannot account for the very broad range of data patterns for which the full diffusion model was developed. However, the patterns of fast and slow errors, and shifting leading edges, that warrant the full complexity of the diffusion model are often observed in experiments that are specifically designed to elicit such patterns, usually involving many thousands of trials. It is unclear whether such complex patterns can be detected in data coming from simpler experiments, at least to the point that they constrain the estimation of additional model parameters.
To see this, imagine an experimental design in which the between-trial variability in accumulation rate parameter of the diffusion model, η, is unidentifiable (i.e., every value of η̂ yields the same likelihood). If we were to fit a model that includes η to such data, the maximum likelihood estimates of the other parameters in the model, θ, will be conditional on η̂. Because the parameters in the diffusion model are correlated, the value of θ̂ depends on η̂. As such, estimating η artificially increases the variability in estimates of θ.
Psychon Bull Rev (2017) 24:547–556
Fig. 1 The diffusion model and its parameters as applied to a moving dots task (Ball & Sekuler, 1982). Evidence accumulation begins at starting point z, proceeds over time guided by mean drift rate ν, and stops whenever the upper or the lower boundary is reached. Boundary separation a quantifies response caution. Observed RT is an additive combination of the time during which evidence is accumulated and non–decision time Ter
Van Ravenzwaaij and Oberauer (2009) examined the ability of both the full and EZ model to recover the mean structure and individual differences of parameter values used to generate synthetic data. The authors concluded that EZ was well able to recover individual differences in the parameter structure, but exhibited a bias in the recovery of the mean structure. Interestingly, the full diffusion model was unable to recover individual differences in the across-trial variability parameters, casting doubt on the added value of these extra parameters in more “typical” data sets. Recovery of the mean structure depended very much on the specific implementation. Here, we show that the additional complexity of the full diffusion model has adverse consequences when one aims to use the model to detect the existence of an empirical effect. Simplifying the parametric assumptions of the diffusion model leads to increased precision in parameter estimation at the cost of possible bias due to model mis-specification (the bias–variance trade-off; Geman et al., 1992). However, for the purposes of decision making, bias is not necessarily detrimental (Gigerenzer & Brighton, 2009), while higher precision leads to stronger and more accurate inference (Hastie et al., 2005). One of the aims of this manuscript is to help non-experts decide when to use the EZ model over the full diffusion model. We simulate data in which we systematically vary the three main diffusion model parameters between two conditions: drift rate, boundary separation, and non-decision time. The data are simulated from a full diffusion model. We then show that, compared to the full diffusion model, the EZ diffusion model is the more powerful tool for identifying differences between two conditions on speed of information accumulation or response caution. We show that this holds across simulations that differ in the number of trials per
participant, the number of participants per group, and the size of the effect between groups. We compare the proportion of times that the EZ and the full diffusion model detected a between-group effect on either mean speed of information accumulation, response caution, or non-decision time parameters (in terms of the result of an independent-samples t-test). The remainder of this paper is organized as follows: in the next section, we discuss the diffusion model in detail. We examine the simple diffusion model, the full diffusion model, and EZ. In the section after that, we discuss the specific parameter settings for our simulation study. Then, we present the results of our simulations. We conclude the paper with a discussion of our findings and the implications for cognitive psychologists looking to analyze their data with the diffusion model.
The diffusion model

In the diffusion model for speeded two-choice tasks (Ratcliff, 1978; Vandekerckhove & Tuerlinckx, 2007; van Ravenzwaaij et al., 2012), stimulus processing is conceptualized as the accumulation of noisy evidence over time. A response is initiated when the accumulated evidence reaches a predefined threshold (Fig. 1). The decision process begins at starting point z, after which information is accumulated with a signal-to-noise ratio that is governed by mean drift rate ν and within-trial standard deviation s.2

2 Mathematically, the change in evidence X is described by a stochastic differential equation dX(t) = ν · dt + s · dW(t), where s · dW(t) represents the Wiener noise process with mean 0 and variance s² · dt. The standard deviation parameter s is often called the “diffusion coefficient” and serves as a scaling parameter that is often set to 0.1 or to 1.

Mean
drift rate ν values near zero produce long RTs and near-chance performance. Boundary separation a determines the speed–accuracy trade-off: lowering boundary separation a leads to faster RTs at the cost of more errors. Together, these parameters generate a distribution of decision times DT. The observed RT, however, also consists of stimulus-nonspecific components such as response preparation and motor execution, which together make up non-decision time Ter. The model assumes that Ter simply shifts the distribution of DT, such that RT = DT + Ter (Luce, 1986). Hence, the four core components of the diffusion model are (1) the speed of information processing, quantified by mean drift rate ν; (2) response caution, quantified by boundary separation a; (3) a-priori response bias, quantified by starting point z; and (4) mean non-decision time, quantified by Ter.
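The accumulation process described above can be sketched with a simple Euler–Maruyama simulation. The function below is an illustration, not the estimation machinery discussed in this paper; the parameter names follow the text (ν, a, z, Ter, with the s = 0.1 scaling), and the step size dt is an arbitrary choice.

```python
import numpy as np

def simulate_trial(v, a, z, ter, s=0.1, dt=0.001, rng=None, max_t=10.0):
    """Simulate one diffusion-model trial with Euler-Maruyama.

    Returns (rt, correct): the response time in seconds and whether
    the upper (correct) boundary was reached.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, t = z, 0.0
    sqrt_dt = np.sqrt(dt)
    while 0.0 < x < a and t < max_t:
        # dX = v*dt + s*dW, where dW ~ N(0, dt)
        x += v * dt + s * sqrt_dt * rng.standard_normal()
        t += dt
    return t + ter, x >= a
```

With the Group 1 means used later in this paper (ν = 0.223, a = 0.125, z = a/2, Ter = 0.435), the unbiased process should land on the correct boundary on roughly 94% of trials.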
Full diffusion

The simple diffusion model can account for most data patterns typically found in RT experiments, but has difficulty accounting for error response times that have a different mean from correct response times (Ratcliff, 1978). One way for the model to produce slow errors is with the inclusion of across-trial variability in drift rate. Such variability will lead to high drifts that produce fast correct responses and low drifts that produce slow error responses. One way for the model to produce fast errors is with the inclusion of across-trial variability in starting point (Laming, 1968; Link, 1975; Ratcliff & Rouder, 1998). Such variability will cause most errors to happen because of an accumulator starting relatively close to the error boundary, whereas correct responses are still relatively likely to happen regardless of the starting point. As a consequence, the accumulated evidence will be lower on average for error responses than for correct responses, resulting in fast errors. For a more elaborate explanation of both of these phenomena, the reader is referred to Ratcliff & Rouder (1998, Fig. 2). Thus, the full diffusion model includes parameters that specify across-trial variability in drift rate, η, and in starting point, sz. Furthermore, the model includes an across-trial variability parameter for non-decision time, st, to better account for the leading edge of response time distributions (e.g., Ratcliff & Tuerlinckx, 2002).

EZ diffusion

The EZ diffusion model presents the cognitive researcher with an alternative that does not require familiarity with complex fitting routines, nor does it require waiting a potentially long time for the model to estimate parameters from the data (Wagenmakers et al., 2007). All the researcher needs to do is execute a few lines of code, and the EZ parameters are calculated instantaneously. The closed-form solutions for the EZ diffusion model require the assumption that there is no between-trial variability in drift rate, η, starting point, sz, or non-decision time, st. Further, the model assumes that responses are unbiased (i.e., z is fixed at half of a). The EZ diffusion model converts RT mean, RT variance, and percentage correct into the three key diffusion model parameters: mean drift rate ν, boundary separation a, and non-decision time Ter. The EZ diffusion model parameters are computed such that the error rate is described perfectly. EZ calculates diffusion model parameters for each participant and each condition separately. For applications of the EZ diffusion model, see e.g., Schmiedek et al. (2007), Schmiedek et al. (2009), Kamienkowski et al. (2011), and van Ravenzwaaij et al. (2012).
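The closed-form EZ transform (Wagenmakers et al., 2007) is short enough to write out. The sketch below assumes RTs measured in seconds and the s = 0.1 scaling; the edge correction needed when accuracy is exactly 0, .5, or 1 is deliberately left out.

```python
import numpy as np

def ez_diffusion(pc, vrt, mrt, s=0.1):
    """EZ-diffusion closed-form transform (Wagenmakers et al., 2007).

    pc:  proportion correct (must not be exactly 0, 0.5, or 1)
    vrt: variance of correct RTs (s^2)
    mrt: mean of correct RTs (s)
    Returns (v, a, ter): drift rate, boundary separation, non-decision time.
    """
    if pc in (0.0, 0.5, 1.0):
        raise ValueError("apply an edge correction to pc first")
    L = np.log(pc / (1.0 - pc))                    # logit of accuracy
    x = L * (L * pc**2 - L * pc + pc - 0.5) / vrt
    v = np.sign(pc - 0.5) * s * x**0.25            # drift rate
    a = s**2 * L / v                               # boundary separation
    y = -v * a / s**2
    mdt = (a / (2.0 * v)) * (1.0 - np.exp(y)) / (1.0 + np.exp(y))
    return v, a, mrt - mdt                         # ter = mrt - mean decision time
```

For example, a proportion correct of .802 with a correct-RT mean of .723 s and variance of .112 s² maps to ν ≈ 0.1, a ≈ 0.14, and Ter ≈ 0.3.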
Power simulations

We conducted four sets of simulations. For every set, we generated 4,500 data sets from the full diffusion model. All of the data sets were intended to mimic a two-condition between-subjects experiment. In the first three sets of simulations, we varied one of the three main diffusion model parameters systematically between the two groups. The fourth set of simulations was identical to the first, except we varied the mean starting point parameter. The range of parameters we used was based on the distribution of observed diffusion model parameters, as reported in Matzke and Wagenmakers (2009). For Group 1, we sampled individual participant diffusion parameters from the following group distributions:

ν ∼ N(0.223, 0.08) T(0, 0.586)
a ∼ N(0.125, 0.025) T(0.056, 0.393)
Ter ∼ N(0.435, 0.09) T(0.206, 0.942)
z = bias × a
η ∼ N(0.133, 0.06) T(0.05, 0.329)
sz ∼ N(0.3 × a, 0.09 × a) T(0.05 × a, 0.9 × a)
st ∼ N(0.183, 0.037) T(0, 0.95 × Ter)

The notation ∼ N(·, ·) indicates that values were drawn from a normal distribution with mean and standard deviation given by the first and second number between parentheses, respectively. The notation T(·, ·) indicates that the values sampled from the normal distribution were truncated between the first and second numbers in parentheses. Note that in the first three sets of simulations we fixed bias = 1/2 in both the simulations and the model fits, reflecting an unbiased process such as might be expected if the different
boundaries indicate correct vs. incorrect responding. In the fourth set of simulations, we relaxed this assumption and varied bias according to

bias ∼ N(0.5, 0.04) T(0.25, 0.75)
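The Group 1 sampling scheme above can be sketched as follows. This is an illustrative reimplementation, not the authors' code; it uses simple rejection sampling for the truncated normals and, for brevity, draws only ν, a, and Ter (the bounds of sz and st depend on each participant's a and Ter).

```python
import numpy as np

def sample_truncnorm(mean, sd, lo, hi, size, rng):
    """Rejection-sample `size` values from N(mean, sd) truncated to (lo, hi)."""
    out = np.empty(0)
    while out.size < size:
        draw = rng.normal(mean, sd, size=2 * size)
        out = np.concatenate([out, draw[(draw > lo) & (draw < hi)]])
    return out[:size]

def sample_group1(n, rng):
    """Draw n participants' parameters from the Group 1 distributions above."""
    v   = sample_truncnorm(0.223, 0.08,  0.0,   0.586, n, rng)  # drift rate
    a   = sample_truncnorm(0.125, 0.025, 0.056, 0.393, n, rng)  # boundary sep.
    ter = sample_truncnorm(0.435, 0.09,  0.206, 0.942, n, rng)  # non-decision time
    return v, a, ter
```

Rejection sampling is adequate here because the truncation bounds lie far in the tails, so almost every draw is accepted.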
For Group 2, all individual participant diffusion parameters were sampled from the same group distributions, except for either drift rate ν (sets 1 and 4), boundary separation a (set 2), or non-decision time Ter (set 3).3 For each parameter, we ran three different kinds of effect size simulations: a small, a medium, and a large effect size. Depending on the effect size, the Group 2 mean of the parameter of interest was larger than the Group 1 mean by 0.5, 0.8, or 1.2 within-group standard deviations for the small, the medium, and the large effect size, respectively. To illustrate for drift rate ν, depending on the simulation, we sampled individual participant diffusion parameters for Group 2 from the following group distributions:

the EZ diffusion models. We recorded whether the obtained p-value was smaller than the traditional α of .05. Our analysis centers around the proportion of the 100 simulations for which the p-value was less than α.

Results
The results of the drift rate ν, boundary separation a, and non-decision time Ter simulations (sets 1 to 3) are shown in Figs. 2, 3, and 4, respectively. In all plots, the y-axis plots the proportion of 100 simulations for which a t-test on the focal parameter of the two groups yielded a p < .05.
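The power computation described above can be sketched in an idealized form. The function below is a simplification: it runs the t-test on the generating parameter values themselves rather than on estimates recovered by fitting EZ or the full diffusion model, so it gives the upper bound on the power either model could achieve for a given group size and effect size.

```python
import numpy as np
from scipy import stats

def power_estimate(n_per_group, effect_size, n_experiments, rng,
                   mean=0.223, sd=0.08):
    """Estimate t-test power for a between-group shift of `effect_size`
    within-group standard deviations on one parameter (here drift rate).

    Idealized: compares the generating parameter values directly,
    skipping the model-fitting step of the full simulation pipeline.
    """
    hits = 0
    for _ in range(n_experiments):
        g1 = rng.normal(mean, sd, n_per_group)
        g2 = rng.normal(mean + effect_size * sd, sd, n_per_group)
        # count the simulated experiments where an independent-samples
        # t-test detects the between-group effect at alpha = .05
        if stats.ttest_ind(g1, g2).pvalue < 0.05:
            hits += 1
    return hits / n_experiments
```

With 50 participants per group and a large effect (1.2 SD, or even the medium 0.8 SD), this idealized power is near ceiling; with no effect, the hit rate falls to the nominal α of .05.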
In models of decision making, evidence is accumulated until it crosses a threshold. The amount of evidence is directly related to the strength of the sensory input for the decision alternatives. Such one-stage models predict that if two stimulus alternatives are presented in succession, the stimulus alternative presented first dominates the decision, as the accumulated evidence will reach the threshold for this alternative first. Here, we show that for short stimulus durations decision making is not dominated by the first, but by the second stimulus. This result cannot be explained by classical one-stage decision models. We present a two-stage model where sensory input is first integrated before its outcome is fed into a classical decision process.
Citation: Rüter J, Marcille N, Sprekeler H, Gerstner W, Herzog MH (2012) Paradoxical Evidence Integration in Rapid Decision Processes. PLoS Comput Biol 8(2): e1002382. https://doi.org/10.1371/journal.pcbi.1002382
Editor: Olaf Sporns, Indiana University, United States of America
Received: May 4, 2011 Accepted: December 23, 2011 Published: February 16, 2012
Copyright: © 2012 Rüter et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: JR is funded by the Swiss National Science Foundation (SNF) project No. 320030-119736 “The dynamics of feature fusion”. MHH and WG receive support from the SNF Sinergia project CRSIKO-122697 “State representation in reward based learning - from spiking neuron models to psychophysics”. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The authors would like to thank Isabel Gauthier for advice on using the NOMT.
This work was supported by a Clinical and Translational Research Enhancement Award from the Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center. JST and WRH were supported by National Science Foundation grant SES-1556415.
Availability of data and materials
All de-identified data are available on the Open Science Framework at https://osf.io/r3gzs/.
Thirty healthy subjects with normal or corrected-to-normal vision were recruited and completed one session of testing. Ten subjects were not invited for the rest of the experiment and were excluded from further analyses because they had difficulties in generating five distinct levels of speed stress. None of the analyses reported in the paper were used as exclusion criteria, and no analyses were performed on data from the first session alone. The remaining twenty subjects completed all five sessions of the experiment (13 females, age range: 18–28). All subjects signed informed consent and were compensated for their participation. The protocol was approved by the Georgia Institute of Technology Institutional Review Board. All methods were carried out in accordance with relevant guidelines and regulations.
Subjects performed an orientation discrimination task where they indicated whether a Gabor patch (radius = 4°) embedded in random pixel noise was tilted counterclockwise (“left”) or clockwise (“right”) from vertical. The orientation of the Gabor patch was ± 45° away from the vertical and was randomly determined on each trial. Immediately after providing their response, subjects were given detailed feedback that indicated (i) whether their response was correct or wrong, (ii) the exact response time on the trial, (iii) the points gained or lost on the trial, and (iv) the total accumulated points up to that point in the experiment (Fig. 2).
Each trial started with subjects fixating on a small white dot at the center of the screen for 1000 ms and was followed by a presentation of the Gabor patch for 33 ms. There was no deadline on the response, but long RTs were penalized in all conditions except for the “extremely slow” condition. Feedback remained on the screen until the subject pressed a button on the keyboard to advance to the next trial.
The experiment included five different speed-accuracy tradeoff conditions—“extremely fast,” “fast,” “medium,” “slow,” and “extremely slow.” Each condition featured a different penalty on response time (RT) starting from no penalty at all in the “extremely slow” condition and increasing the size of the penalty for the other four conditions. Based on the accuracy and RT on each trial, subjects gained points based on the following formula:
where (Acc) is the accuracy on the current trial (1 for correct responses, 0 for wrong responses), (penalty_
Subjects came for five different sessions each consisting of 1000 experimental trials. Before the start of the first session, subjects were given detailed instructions about the different conditions and their associated formulas for gaining points. It was specifically emphasized that the best strategy in the “extremely fast” condition was to respond as quickly as possible irrespective of accuracy. This instruction was given explicitly because the majority of subjects appeared unwilling to guess randomly for all trials of a given condition without being explicitly told that such a strategy was allowed. Before the start of sessions 2–5, subjects were briefly reminded of these instructions. Each session started with a short training which included one 25-trial block of each of the five conditions.
In each session, subjects completed 4 runs, each consisting of 5 blocks of 50 trials each. Each block consisted of a single condition, and each run included blocks from all five different conditions in a randomized order. At the beginning of each block, subjects were given the name of the condition for that block (“extremely fast,” “fast,” “medium,” “slow,” or “extremely slow”). In each block, we pseudo-randomly interleaved trials with five different Gabor patch contrasts such that each contrast was presented exactly 10 times. The contrasts were the same for each subject and were chosen based on pilot testing to produce a range of performance levels. The exact contrasts used were 4.98%, 6.39%, 8.21%, 10.54%, and 13.53% (the contrasts were chosen to be equally spaced in log space: they are equal to e^(-3), e^(-2.75), e^(-2.5), e^(-2.25), and e^(-2), respectively).
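The log-spacing of the contrasts can be checked in a couple of lines; rounding reproduces the percentages quoted above.

```python
import numpy as np

# Five contrasts equally spaced in log space, from e^(-3) to e^(-2)
contrasts = np.exp(np.linspace(-3.0, -2.0, 5))
percent = np.round(100 * contrasts, 2)   # expressed as percentages
```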
The experiment was designed in the MATLAB environment using Psychtoolbox 3. The stimuli were presented on a 21.5-inch iMac monitor (1920 × 1080 pixel resolution, 60 Hz refresh rate) in a dark room. Subjects were seated 60 cm away from the screen and provided their responses using a computer keyboard.
We removed all trials with RTs shorter than 150 ms or longer than 1500 ms. This step resulted in removing an average of 2.3% of total trials (range 0.3–4.7% for each subject). Similar results were obtained if RT outliers were instead determined separately for each condition using Tukey’s interquartile criterion. We then computed, for each combination of SAT condition and Gabor patch contrast, the average difference between error and correct RTs.
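The two RT-exclusion rules described above (a fixed 150–1500 ms window, and Tukey's interquartile criterion) can be sketched as boolean masks over an array of RTs in seconds; this is an illustration, not the authors' code.

```python
import numpy as np

def filter_rts(rt, lo=0.150, hi=1.500):
    """Keep trials with lo <= RT <= hi (seconds); returns a boolean mask."""
    rt = np.asarray(rt)
    return (rt >= lo) & (rt <= hi)

def tukey_mask(rt):
    """Alternative outlier rule: Tukey's interquartile criterion,
    keeping RTs within 1.5 IQRs of the first and third quartiles."""
    rt = np.asarray(rt)
    q1, q3 = np.percentile(rt, [25, 75])
    iqr = q3 - q1
    return (rt >= q1 - 1.5 * iqr) & (rt <= q3 + 1.5 * iqr)
```

In a per-condition analysis, `tukey_mask` would be applied separately to each condition's RTs rather than to the pooled data.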