# How to determine an unbiased estimate of threshold in a single-stimulus forced choice experiment?

# Objective

I need to estimate an unbiased threshold for the detectable speed difference in a car-following task. The details are as follows:

# Experiment Design

This experiment was performed in a driving simulator. In each trial, the participant drives a car in a single lane.

## Procedure

1. Initially, the participant's car and the lead vehicle (car or truck) moved at 100 km/h (27.78 m/s). In a given trial, the lead vehicle was at one of two distances from the participant's car: 50 m or 100 m.
2. After a few seconds, the 'Observe' message appeared on the participant's windscreen. At the onset of this message, the lead vehicle began to change its speed (increase or decrease) by 1 m/s or 3 m/s at a rate of 0.5 m/s² (e.g. it would take 6 seconds to decrease speed by 3 m/s).
3. Once the speed change was complete (e.g. the lead vehicle reached 27.78 - 3 = 24.78 m/s), the participant was given 4 seconds to observe this change in speed. Then the 'Change Speed' message appeared. At this point, participants were forced to press either the gas pedal or the brake pedal to indicate that they noticed a positive speed difference (e.g. +3 m/s) or a negative speed difference (e.g. -3 m/s). They had to respond even if they did not see any speed difference. Please note that the instructions were given before the experiment began.
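The timing and spacing implied by this procedure can be checked with a little arithmetic. The sketch below is only an illustration, not part of the original design: it assumes the participant's car holds exactly its initial speed until the 'Change Speed' message and that the lead vehicle's speed changes at a constant 0.5 m/s². The helper names are hypothetical.

```python
RATE = 0.5        # lead-vehicle speed change, m/s^2 (from the procedure)
OBSERVE_S = 4.0   # observation time after the change is complete, s

def ramp_duration(dv):
    """Seconds needed to change speed by |dv| m/s at RATE."""
    return abs(dv) / RATE

def gap_at_response(initial_gap, dv):
    """Headway (m) when 'Change Speed' appears.

    ASSUMPTION: the participant's car keeps its initial speed, so the gap
    changes only through the relative speed (lead minus follower).
    """
    t = ramp_duration(dv)
    during_ramp = 0.5 * dv * t        # relative speed ramps linearly from 0 to dv
    during_observation = dv * OBSERVE_S
    return initial_gap + during_ramp + during_observation

for gap in (50, 100):
    for dv in (-3, -1, 1, 3):
        print(gap, dv, gap_at_response(gap, dv))
```

Under these assumptions, a -3 m/s trial that starts at 50 m ends with a gap of about 29 m when the response is required. This fits the concern about braking bias in the shorter-distance condition.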

## Details/Limitations

1. There were no catch trials, i.e. there was no trial in which a speed difference was not created.
2. The trials were done with random combinations of spacing and speed difference. Sometimes negative speed difference trials were done consecutively; in other cases the trials were positive speed difference followed by a negative one. For example: 50 m and +1 m/s -> 50 m and +3 m/s -> 100 m and -3m/s -> 100 m and -1 m/s. These could be in any order.

## Possible sources of bias

A smaller distance of 50 m is relatively unsafe compared to 100 m, especially when the lead vehicle is a truck. So, with small distance, small speed difference and/or a truck, participants might be biased to apply brakes even if the lead vehicle accelerates.

# Results

I now have the proportion of correct responses for each combination of distance and speed difference.

# Question

How do I determine an unbiased measure of the speed-difference threshold in each case? Is a speed difference that is correctly detected in at least 75% of the trials an appropriate threshold?

Kindly share any beginner-friendly resources for estimating threshold in a single stimulus forced choice task. Thank you in advance.

Any estimate you make is going to be pretty bad. To answer this question requires so many assumptions about the data, experiment, and intended usage, it is just not possible. You would be better off running the experiment again after consulting someone who knows more about experimental design. Beyond that, you could hire a stats consultant.

Apart from the lack of catch trials, you only have 2 speed differences (1 and 3 m/s). Even if you get lucky and they bracket threshold (e.g., d' of 1 or 75% correct), without more theory (or data), you cannot interpolate between the points.
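A standard way to separate sensitivity from the braking bias described in the question is signal detection theory (SDT). The sketch below is a hypothetical illustration and not the answerer's method. It treats each distance × |speed difference| cell as a yes/no task in which "signal" is an accelerating lead vehicle and "yes" is a gas-pedal press. It then computes d' = z(H) − z(F) and the criterion c = −(z(H) + z(F))/2, where positive c means a bias toward the brake. A log-linear correction (add 0.5 to counts and 1 to totals, one common choice among several) keeps rates of 0 or 1 finite. This requires correct-response rates split by direction (+Δv and −Δv); pooled percent correct alone cannot separate d' from c. `threshold_linear` goes further and *assumes* d' grows in proportion to |Δv|, which is exactly the kind of extra theory the answer warns about. For an observer with c = 0 in this single-interval design, proportion correct is Φ(d'/2), so 75% correct corresponds to d' ≈ 1.35.

```python
from statistics import NormalDist

_STD = NormalDist()      # standard normal distribution
Z = _STD.inv_cdf         # z-transform: inverse of the normal CDF

def corrected_rate(count, n):
    """Log-linear correction: keeps z finite when count is 0 or n."""
    return (count + 0.5) / (n + 1)

def sdt(hits, n_accel, false_alarms, n_decel):
    """Return (d', c) for one distance x |speed difference| condition.

    hits         = gas presses on accelerating-lead trials
    false_alarms = gas presses on decelerating-lead trials
    """
    zh = Z(corrected_rate(hits, n_accel))
    zf = Z(corrected_rate(false_alarms, n_decel))
    d_prime = zh - zf
    c = -0.5 * (zh + zf)   # c > 0: bias toward the brake pedal
    return d_prime, c

def unbiased_percent_correct(d_prime):
    """Proportion correct for an observer with c = 0 in a yes/no task."""
    return _STD.cdf(d_prime / 2)

def threshold_linear(deltas, d_primes, target=1.0):
    """|dv| at which d' = target, ASSUMING d' = k * |dv| (line through origin).

    k is the least-squares slope through the origin over the tested |dv| values.
    """
    k = sum(d * s for d, s in zip(deltas, d_primes)) / sum(d * d for d in deltas)
    return target / k
```

For example, `sdt(18, 20, 8, 20)` would be called with hypothetical counts (18 gas presses on 20 accelerating trials, 8 on 20 decelerating trials), not with data from the question.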

## Results

Fig 4 shows the spread (σ) of the participants' psychometric functions as a function of age for the three experiments. Green dots show the spread values obtained from the fitting procedure. The mean spreads (σ, in log10 arcsec) for each experiment were: 2AFC_g = 1.005 ± 0.776 (mean ± SD); 4AFC_g = 1.186 ± 0.74; 4AFC_l = 1.306 ± 0.632. A spread of 1 log10 arcsec means that 95% of the range of the psychometric function occurs over a factor of 10 in disparity. For example, if performance is 2.5% above chance when disparity is 10 arcsec, then with σ = 1 log10 arcsec it will be 2.5% below maximum when disparity is 100 arcsec. Similarly, a spread of 1.3 log10 arcsec corresponds to a factor of about 20.
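Because σ is expressed in log10 units, the disparity ratio it spans is simply 10^σ. The minimal sketch below assumes only that definition, and its helper name is hypothetical. Applied to the reported mean spreads, it gives ratios of roughly 10, 15 and 20.

```python
def spread_factor(sigma_log10):
    """Ratio of disparities spanned by a spread of sigma_log10 log10 units."""
    return 10 ** sigma_log10

# Worked example from the text: 2.5% above chance at 10 arcsec with sigma = 1
upper_disparity = 10 * spread_factor(1.0)   # disparity where 2.5% below maximum

# Mean spreads reported for the three experiments (log10 arcsec)
for name, s in [("2AFC_g", 1.005), ("4AFC_g", 1.186), ("4AFC_l", 1.306)]:
    print(name, round(spread_factor(s), 1))
```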

Left panels show the spread (σ) (i.e. the inverse of the slope, in log10 arcsec) as a function of age. Right panels show the distribution of spreads for each experiment. A. Results from Experiment 1. B. Experiment 2. C. Experiment 3. Green dots are the individual spreads obtained from the fitting. The red dots in panels A and B correspond to the spread values of the participants described in Fig 3. Black lines are fitted regression lines (dashed lines: 95% regression confidence interval for the mean). A. σ_2AFC_g(age) = 0.632 + 0.308 × log10(age). B. σ_4AFC_g(age) = 2.097 − 0.924 × log10(age). C. σ_4AFC_l(age) = 2.295 − 0.996 × log10(age). The top left of each left panel shows the Pearson correlation between age and spread (log10(age) and σ).

To compare the spreads of the three experiments, we performed Levene's test for homogeneity of variances (F(2,212) = 1.872, p = 0.156) and the Shapiro-Wilk normality test, which showed that only the distribution of spreads in Experiment 3 was consistent with normality (p > 0.05). We therefore performed two tests, a parametric one-way analysis of variance (ANOVA) and a non-parametric Kruskal-Wallis test. Both tests lead to the same conclusions. The ANOVA shows significant differences between the σ estimates from the three experiments (F(2,212) = 3.267, p = 0.04). Post-hoc tests for multiple comparisons using Bonferroni correction show a significant difference only between Experiment 1 (2AFC_g) and Experiment 3 (4AFC_l) (p = 0.035). The Kruskal-Wallis test comparing the three distributions of spreads again shows significant differences (χ² = 8.462, p = 0.015, d.f. = 2), and pairwise comparisons again show a significant difference between Experiments 1 and 3 (p = 0.011).

Correlation analysis shows no significant correlation between spread (σ) and age (log10(age)) in Experiment 1 (r = 0.108, p = 0.370, N = 71) or in Experiment 2 (r = -0.153, p = 0.211, N = 68). Experiment 3 shows a significant negative correlation (r = -0.257, p = 0.025, N = 76). However, after computing Cook's distance to detect and remove highly influential observations (those with a Cook's distance more than three times the mean of all Cook's distances), no significant correlations remain (2AFC_g: r = 0.196, p = 0.1209, N = 64; 4AFC_g: r = -0.153, p = 0.235, N = 62; 4AFC_l: r = -0.159, p = 0.1887, N = 70). Our results therefore suggest that spread values are independent of age.
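The exclusion rule used here drops every observation whose Cook's distance exceeds three times the mean Cook's distance. It can be sketched for a simple linear regression such as σ on log10(age). This minimal illustration assumes an ordinary least-squares fit with an intercept and one predictor. It is not the authors' code. The test values are made up, not the study's data, and p-values are omitted because they need a t distribution.

```python
import math

def _means(x, y):
    return sum(x) / len(x), sum(y) / len(y)

def fit_line(x, y):
    """OLS intercept a and slope b for y = a + b * x."""
    mx, my = _means(x, y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def cooks_distance(x, y):
    """Cook's distance of every point for a one-predictor OLS fit."""
    n, p = len(x), 2                                   # p = intercept + slope
    a, b = fit_line(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    lev = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]  # hat-matrix diagonal
    return [e * e / (p * mse) * h / (1.0 - h) ** 2 for e, h in zip(resid, lev)]

def pearson_r(x, y):
    mx, my = _means(x, y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def drop_influential(x, y, factor=3.0):
    """Keep points whose Cook's distance is at most factor * mean Cook's distance."""
    d = cooks_distance(x, y)
    cutoff = factor * sum(d) / len(d)
    keep = [i for i, di in enumerate(d) if di <= cutoff]
    return [x[i] for i in keep], [y[i] for i in keep]
```

After filtering, the correlation would be recomputed on the remaining points with `pearson_r`, which mirrors the before/after comparison reported above.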

Fig 5 shows the disparity thresholds (θ, in log10 units) as a function of age for the three experiments. Green dots show the disparity thresholds obtained from the fitting procedure. The mean thresholds (in log10(arcsec)) for each experiment were: 2AFC_g = 1.528 ± 0.3 (mean ± SD) (33.73 arcsec, N = 71); 4AFC_g = 1.54 ± 0.203 (34.67 arcsec, N = 68); 4AFC_l = 1.568 ± 0.23 (36.98 arcsec, N = 76).
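The arcsec values in parentheses are back-transformed means of log thresholds, i.e. geometric means. They are smaller than the arithmetic mean of the raw arcsec thresholds would be. The short sketch below, with hypothetical helper names, reproduces the reported values and shows the equivalence.

```python
import math

def back_transform(mean_log10):
    """arcsec value corresponding to a mean of log10(arcsec) values."""
    return 10 ** mean_log10

def geometric_mean(values):
    """Same as back-transforming the arithmetic mean of log10(values)."""
    return 10 ** (sum(math.log10(v) for v in values) / len(values))

# Mean log10 thresholds reported in the text
for name, m in [("2AFC_g", 1.528), ("4AFC_g", 1.54), ("4AFC_l", 1.568)]:
    print(name, round(back_transform(m), 2))
```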

Left panels show the stereoacuity thresholds (log10(arcsec)) as a function of age. Right panels show the distribution of thresholds for each experiment. A. Results from Experiment 1. B. Experiment 2. C. Experiment 3. Green dots are the individual thresholds obtained from the fitting, corresponding to a probability of correct response of 0.75. The red dots in panels A and B correspond to the thresholds of the participants described in Fig 3. Black lines are fitted regression lines (dashed lines: 95% regression confidence interval for the mean). A. θ_2AFC_g(age) = 1.919 − 0.323 × log10(age). B. θ_4AFC_g(age) = 1.974 − 0.441 × log10(age). C. θ_4AFC_l(age) = 1.69 − 0.123 × log10(age). The top left of each left panel shows the Pearson correlation between age and thresholds (log10(age) and log10(arcsec)).

Multiple studies have found that stereoacuity improves (i.e. disparity thresholds decrease) with age until about 10 years [23–27], remains steady until 50–60 years, and then declines [27–29]. Given that our youngest participants are around 5 years old and very few are over 50 (see Fig 5), we might expect a negative correlation between age and disparity thresholds. Pearson's product-moment correlation between age (log10(age)) and thresholds (log10(arcsec)) did indeed show a significant negative correlation for Experiment 1 (2AFC_g: r = -0.293, p = 0.013, N = 71) and Experiment 2 (4AFC_g: r = -0.267, p = 0.027, N = 68). For Experiment 3, however, we found no correlation (4AFC_l: r = -0.087, p = 0.455, N = 76). We also computed Cook's distance to detect highly influential observations. Only Experiment 1 still shows a significant correlation after removing them (2AFC_g: r = -0.32, p = 0.008, N = 66; 4AFC_g: r = -0.146, p = 0.261, N = 61; 4AFC_l: r = -0.052, p = 0.668, N = 69). The absence of correlation in the 4AFC experiments is presumably related to the lower age range of their participants.

Fig 6 shows the spreads (σ) as a function of the thresholds (θ) for all participants. Correlation analysis shows no significant correlation for Experiment 1 (r = 0.137, p = 0.254, N = 71) and significant correlations for Experiment 2 (r = 0.357, p = 0.003, N = 68) and Experiment 3 (r = 0.341, p = 0.003, N = 76). After removing influential observations using Cook's distance, the correlations barely changed (2AFC_g: r = 0.1, p = 0.428, N = 64; 4AFC_g: r = 0.349, p = 0.0049, N = 63; 4AFC_l: r = 0.2906, p = 0.0133, N = 72). Therefore, although the correlations are not very strong, the 4AFC data show that the lower the threshold, the steeper the psychometric function (i.e. the smaller the spread).

Each panel shows the spread (σ) (i.e. the inverse of the slope, in log10 arcsec) as a function of the disparity threshold (θ, in log10(arcsec)). A. Results from Experiment 1. B. Experiment 2. C. Experiment 3. Green dots are the individual spreads and thresholds obtained from the fitting. The red dots in panels A and B correspond to the spread and threshold values of the participants described in Fig 3. Black lines are fitted regression lines (dashed lines: 95% regression confidence interval for the mean). A. σ_2AFC_g(θ) = 0.462 + 0.355 × θ. B. σ_4AFC_g(θ) = −0.823 + 1.305 × θ. C. σ_4AFC_l(θ) = −0.159 + 0.934 × θ. The top right of each panel shows the Pearson correlation between thresholds and spreads (θ and σ).

Finally, we analyzed the estimated lapse rate. We define the "lapse rate" parameter λ as the probability of responding incorrectly as the result of a lapse in attention or a similar error. The mean values fitted for the three experiments were 2AFC_g = 0.0103 ± 0.021 (mean ± SD) (N = 71); 4AFC_g = 0.0231 ± 0.0271 (N = 68); 4AFC_l = 0.0208 ± 0.027 (N = 76). Levene's test for homogeneity of variances shows heteroscedasticity (F(2,212) = 16.366, p < 0.001), and the Shapiro-Wilk normality test shows that none of the three distributions of lapse rates is normal. Although the distributions are heteroscedastic, we performed a non-parametric Kruskal-Wallis test, which shows significant differences between the distributions of lapse rates (χ² = 9.784, p = 0.008, d.f. = 2). Pairwise comparisons show a significant difference only between the lapse rates of Experiment 1 (2AFC_g) and Experiment 2 (4AFC_g), and an almost significant difference (p = 0.058) between Experiments 1 and 3 (4AFC_l). (Welch's ANOVA shows the same results, F(2,138.868) = 5.943, p = 0.003, with significant differences between Experiments 1 and 2 (p = 0.01) and between Experiments 1 and 3 (p = 0.04).) This is not a surprising result: with the same probability of making a lapse (λ*), the probability of responding incorrectly as a result of a lapse increases with the number of alternatives [30]: λ = λ*(1−γ), where γ is the guess rate (1/m for an m-alternative task). Our data indicate the following estimates for the probability of making a lapse: 2AFC_g = 0.0207 ± 0.042 (mean ± SD) (N = 71); 4AFC_g = 0.0308 ± 0.0361 (N = 68); 4AFC_l = 0.0276 ± 0.036 (N = 76). The Shapiro-Wilk test confirms that these distributions are not normal; however, Levene's test shows homogeneity of variance (F(2,212) = 0.072, p = 0.930). The Kruskal-Wallis test shows no significant differences between the distributions of lapse probabilities (χ² = 4.686, p = 0.096, d.f. = 2). Thus, the probability of making a lapse is similar in the three experiments (0.02–0.03, i.e. 2–3%).
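The conversion from λ to λ* is the inverse of λ = λ*(1−γ) with γ = 1/m. A minimal sketch (hypothetical helper name) applied to the reported mean λ values recovers the reported λ* means. Differences in the last digit come from rounding of the published means.

```python
def lapse_probability(lapse_rate, n_alternatives):
    """Invert lambda = lambda_star * (1 - gamma), with guess rate gamma = 1/m."""
    gamma = 1.0 / n_alternatives
    return lapse_rate / (1.0 - gamma)

# Mean fitted lapse rates (lambda) reported above
for name, lam, m in [("2AFC_g", 0.0103, 2), ("4AFC_g", 0.0231, 4), ("4AFC_l", 0.0208, 4)]:
    print(name, round(lapse_probability(lam, m), 4))
```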

## Detection Theory: A User's Guide

Detection Theory is an introduction to one of the most important tools for analysis of data where choices must be made and performance is not perfect. Originally developed for evaluation of electronic detection, detection theory was adopted by psychologists as a way to understand sensory decision making, then embraced by students of human memory. It has since been utilized in areas as diverse as animal behavior and X-ray diagnosis.

This book covers the basic principles of detection theory, with separate initial chapters on measuring detection and evaluating decision criteria. Some other features include:

* complete tools for application, including flowcharts, tables, pointers, and software
* student-friendly language
* complete coverage of content area, including both one-dimensional and multidimensional models
* separate, systematic coverage of sensitivity and response bias measurement
* integrated treatment of threshold and nonparametric approaches
* an organized, tutorial-level introduction to multidimensional detection theory
* popular discrimination paradigms presented as applications of multidimensional detection theory
* a new chapter on ideal observers and an updated chapter on adaptive threshold measurement.

This up-to-date summary of signal detection theory is both a self-contained reference work for users and a readable text for graduate students and other researchers learning the material either in courses or on their own.

## Participants and Methods

The experiments were approved by the Institutional Review Board of Baylor College of Medicine.

### Participants

Except for the first author, participants were all naïve to the purpose of the study. Participants provided informed consent and received compensation. Nineteen participants (8 males, 11 females; age 27 ± 7 years) took part in Experiment 1. Twenty-one participants (13 males, 8 females; age 29 ± 7 years) took part in Experiment 2. Twenty participants (6 males, 14 females; age 27 ± 6 years) took part in Experiment 3.

### Apparatus

Experiment stimuli were displayed on a CRT monitor (Viewsonic G225f) with a screen resolution of 1024 × 768 pixels and a refresh rate of 100 Hz, driven by a Dell Precision T3400 workstation running Windows XP. The monitor was the only light source in the experimental room. Participants sat approximately 60 cm from the display. Each participant wore earplugs with approximately 33 dB noise reduction to minimize distraction.

### Stimuli

Stimuli were presented using Psychtoolbox 3 (Brainard, 1997; Pelli, 1997; Kleiner et al., 2007) for Matlab. Stimuli consisted of one or two drifting Gabor patches with a spatial frequency of 0.28 cycles/degree (estimated at the 60 cm viewing distance). The standard deviation of the two-dimensional Gaussian envelope of each Gabor patch was 0.90°. The starting phase of each Gabor patch was independently sampled from a uniform distribution over 0–2π. The peak luminance of the Gabor patches was 36.0 cd/m². Stimuli were presented on a mid-luminance gray background. Each Gabor patch was centered 5.4° of visual angle from the fixation point. The fixation point was at the center of the screen, indicated by a white cross spanning 0.6° of visual angle. Throughout each stimulus, the sinusoidal component of each Gabor patch drifted in a direction independently sampled from a uniform distribution over 0–360°. The drift speed was such that the luminance of any pixel of the Gabor patch was modulated by a sinusoidal time signal of either 1 Hz (for the low temporal frequency stimulus) or 6 Hz (for the high temporal frequency stimulus). At the onset of each stimulus, the contrast of the Gabor patch ramped up linearly from zero to maximum in 40 ms; at the offset, it ramped down in 40 ms. This contrast ramping was intended to minimize potential arousal caused by abrupt stimulus onsets.
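The stimulus description maps onto a standard drifting Gabor: a cosine carrier multiplied by a circular Gaussian envelope and a trapezoidal contrast ramp. The sketch below evaluates that formula at one point in pure Python. It assumes this standard form because the authors' Psychtoolbox drawing code is not given. It returns signed modulation rather than luminance, since the background is given only as mid-gray. The function names are hypothetical.

```python
import math
import random

SF_CPD = 0.28      # spatial frequency, cycles/degree (from the text)
SIGMA_DEG = 0.90   # SD of the Gaussian envelope, degrees (from the text)
RAMP_S = 0.040     # 40 ms linear contrast ramp at onset and offset

def contrast_ramp(t, duration, ramp=RAMP_S):
    """Contrast multiplier in [0, 1] at time t (s) after stimulus onset."""
    if t < 0.0 or t > duration:
        return 0.0
    return min(1.0, t / ramp, (duration - t) / ramp)

def gabor_value(x, y, t, duration, tf_hz, direction, phase0):
    """Signed modulation in [-1, 1] at (x, y) deg from the patch centre.

    The carrier drifts along `direction` (radians); at any fixed pixel the
    value oscillates at tf_hz (1 Hz for L stimuli, 6 Hz for H stimuli).
    """
    u = x * math.cos(direction) + y * math.sin(direction)   # position along drift axis
    envelope = math.exp(-(x * x + y * y) / (2.0 * SIGMA_DEG ** 2))
    carrier = math.cos(2.0 * math.pi * (SF_CPD * u - tf_hz * t) + phase0)
    return contrast_ramp(t, duration) * envelope * carrier

def random_patch_params(rng):
    """Start phase and drift direction, each uniform over 0 to 2*pi."""
    return {"phase0": rng.uniform(0.0, 2.0 * math.pi),
            "direction": rng.uniform(0.0, 2.0 * math.pi)}
```

In a real display the same expression would be evaluated over a pixel grid each frame. The point-wise version only makes the roles of the parameters explicit.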

Whenever two Gabor patches were displayed simultaneously, their centers were on opposite sides of the fixation point, both on an invisible line passing through the fixation point. In each trial, the orientation of the invisible line through the fixation point and the Gabor patch(es) in the first epoch was randomly sampled from a uniform distribution over 0–2π. The corresponding invisible line in the second epoch was always orthogonal to the line in the first epoch. This design was intended to minimize adaptation effects caused by presenting consecutive stimuli at the same location (Johnston et al., 2006).
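The placement rule can be sketched as follows. The patch centres lie 5.4° from fixation on a randomly oriented line, and the second epoch's line is rotated by 90°. The helper names are hypothetical. For single-patch epochs the text does not say which of the two positions is used, so the sketch returns both.

```python
import math
import random

ECCENTRICITY_DEG = 5.4   # patch centre distance from fixation (from the text)

def patch_centres(line_orientation):
    """Centres (x, y), in degrees, of two patches on opposite sides of fixation."""
    dx = ECCENTRICITY_DEG * math.cos(line_orientation)
    dy = ECCENTRICITY_DEG * math.sin(line_orientation)
    return (dx, dy), (-dx, -dy)

def trial_layout(rng):
    """First-epoch line orientation drawn uniformly; second epoch is orthogonal."""
    theta1 = rng.uniform(0.0, 2.0 * math.pi)
    theta2 = theta1 + math.pi / 2.0
    return patch_centres(theta1), patch_centres(theta2)
```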

### Experiment Procedures

On each trial, a participant watched two groups of drifting Gabor patterns on the screen, one after another, and judged whether the duration of the second group was longer or shorter than that of the first group. Each group was composed of either a single Gabor patch drifting at 1 Hz (denoted L), a single Gabor patch drifting at 6 Hz (denoted H), or a pair of Gabor patches, one at 1 Hz and the other at 6 Hz (denoted HL). In an HL stimulus, the two Gabor patches had the same onset and offset times, and their drift directions were chosen randomly and independently of each other. If a participant asked which patch of an HL stimulus to judge, they were told that, because the patches appeared and disappeared synchronously, they should judge the duration for which both remained on the screen.

The structure of each trial was as follows. A trial began with a fixation cross appearing in the center of the screen. After a duration sampled from a uniform distribution over the range of 600� ms, the first group of Gabor patch(es) appeared. 500� ms after the offset of the first group, the second group appeared. 300� ms after the offset of the second group, the fixation cross disappeared and participants were allowed to respond. They indicated that the second group lasted longer by pressing the right arrow key, or shorter by pressing the left arrow key. No feedback was provided. The next trial started 1000� ms after the response.

On every trial of an experiment, one group of Gabor patches lasted 600 ms; we call this fixed-duration stimulus the reference stimulus. The other group, which we call the comparison stimulus, lasted one of 26 durations between 100 and 1100 ms, spaced in steps of 40 ms. The number of occurrences of each of these 26 durations was approximately proportional to the probability density of a Gaussian distribution with a mean of 600 ms and a standard deviation of 300 ms at that duration, rounded to the nearest integer. Thus, over the course of an experiment, the distribution of comparison durations approximates a truncated Gaussian distribution.

#### Experiment 1

There were two conditions in the experiment. In one condition, the reference stimulus was H and the comparison stimulus was L (denoted LvsH). In the other condition, the reference was L and the comparison was H (denoted HvsL). On half of the trials of each condition, the reference stimulus appeared before the comparison stimulus; on the other half, the comparison stimulus appeared first. Each condition had 180 trials, including both orders of display. For each order of display in each condition, the comparison durations of 100, 140, 180, …, and 1100 ms occurred 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 2, 2, 2, and 1 times, respectively. These numbers of occurrences were chosen to approximate the Gaussian distribution described above. Trials of different conditions, orders and comparison durations were randomly interleaved within a session. Participants received no cue indicating which condition a trial belonged to.
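The rounding rule from the general procedure can be checked against the Experiment 1 counts. In the sketch below, the scale factor 5.2 is an assumption chosen to match those counts; it is not a value given by the authors. With it, rounding a scaled Gaussian (mean 600 ms, SD 300 ms) at each comparison duration reproduces the list above, and the 90 trials per order give 180 per condition.

```python
import math

DURATIONS = list(range(100, 1101, 40))   # 26 comparison durations, ms
MEAN_MS, SD_MS = 600.0, 300.0            # Gaussian described in the text
SCALE = 5.2                              # ASSUMED scale factor (not from the paper)

def incidence_counts(scale=SCALE):
    """Rounded counts proportional to an (unnormalised) Gaussian density."""
    return [round(scale * math.exp(-0.5 * ((d - MEAN_MS) / SD_MS) ** 2))
            for d in DURATIONS]

EXPERIMENT_1 = [1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5,
                4, 4, 4, 3, 3, 2, 2, 2, 1]
```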

#### Experiment 2

On all trials, the reference stimulus was an HL stimulus. The comparison stimulus was an L, H, or HL stimulus, giving three conditions. The reference stimulus was always presented before the comparison stimulus. Each condition had 148 trials. In each condition, the comparison durations of 100, 140, 180, …, and 1100 ms occurred 2, 2, 4, 4, 4, 6, 6, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6, 6, 6, 4, 4, 4, 2, and 2 times, respectively. The trials of the three conditions were randomly interleaved.

#### Experiment 3

There were seven conditions in the experiment. In two conditions, the reference stimulus was H and the comparison stimulus was H or L, respectively. In two other conditions, the reference stimulus was L and the comparison stimulus was H or L, respectively. In the remaining three conditions, the reference stimulus was HL and the comparison stimulus was H, L, or HL, respectively. On half of the trials of each condition, the reference stimulus was presented before the comparison stimulus; on the other half, the comparison stimulus was presented first. Each condition had 228 trials. Each participant completed three sessions. For each order of display in each condition, the comparison durations of 100, 140, 180, …, and 1100 ms occurred 3, 3, 3, 3, 3, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 3, 3, 3, 3, 3, 3, and 3 times, respectively, in total over all sessions. Trials of different conditions, orders and comparison durations were randomly interleaved within a session. The number of trials for each combination of condition, order and comparison duration was equal across sessions.
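Because every total count (3 or 6) is divisible by three, each session can contain exactly one third of every condition × order × duration cell, shuffled within the session. The sketch below builds one session that way. The tuple encoding of conditions as (reference, comparison) and the order labels are hypothetical. It yields 532 trials per session (1596 = 7 × 228 over the three sessions).

```python
import random

DURATIONS = list(range(100, 1101, 40))          # 26 comparison durations, ms
TOTAL_COUNTS = [3] * 7 + [6] * 12 + [3] * 7     # per order, summed over sessions
CONDITIONS = [("H", "H"), ("H", "L"), ("L", "H"), ("L", "L"),
              ("HL", "H"), ("HL", "L"), ("HL", "HL")]   # (reference, comparison)
ORDERS = ("ref_first", "comp_first")
N_SESSIONS = 3

def session_trials(rng):
    """One session: each (condition, order, duration) cell gets count / 3 trials, shuffled."""
    trials = []
    for ref, comp in CONDITIONS:
        for order in ORDERS:
            for dur, total in zip(DURATIONS, TOTAL_COUNTS):
                trials += [(ref, comp, order, dur)] * (total // N_SESSIONS)
    rng.shuffle(trials)   # random interleaving within the session
    return trials
```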

## Psychophysically-anchored, Robust Thresholding in Studying Pain-related Lateralization of Oscillatory Prestimulus Activity

Psychophysical methods such as the QUEST estimation procedure can efficiently yield robust estimates of the stimulation intensity at which nonpainful sensations become painful. When stimulation is then repeated at this threshold intensity, the variability in rating responses can be attributed directly to perceptual classification in subsequent analyses.

### Abstract

In perceptual studies, it is often important to objectively assess the equality of delivered stimulation across participants or to quantify the intra-individual sensation magnitude evoked by stimulation over multiple trials. This requires a robust mapping of stimulus magnitude to perceived intensity and is commonly achieved by psychophysical estimation methods such as the staircase procedure. Newer, more efficient procedures like the QUEST algorithm fit a psychophysical function to the data in real time while maximizing the efficiency of data collection. A robust estimate of the threshold intensity between painful and nonpainful perceptions can then be used to reduce the influence of variations in sensory input in subsequent analyses of oscillatory brain activity. When stimulation is delivered at a constant threshold intensity determined by an adaptive estimation procedure, the variance in the ratings can be attributed directly to perceptual processes. Oscillatory activity can then be contrasted directly between "pain" and "no-pain" trials, yielding activity that closely relates to perceptual classification processes in nociception.

### Introduction

When conducting behavioral experiments with human participants, it is important to control the intensities of presented stimuli closely. Using stimuli of equal intensity for all participants, however, can in some settings introduce bias due to individual differences in subjective perception. For some perceptual qualities such as pain, there are large inter- and intra-individual variations in perceived intensity at a constant stimulus level [1,2]. For experiments that assume equal subjective percepts, it is therefore necessary to match the subjectively perceived intensity across participants. This is also important when examining perception at threshold level, e.g., between painful and nonpainful stimulation. Psychophysics research has addressed these kinds of problems for decades, and today there are sophisticated but easy-to-use methods available to achieve robust psychophysical anchoring.

A simple, classical method of mapping the intensity of a stimulus to an individual sensation magnitude is the staircase method [3]. Here, the intensity of successive stimuli is increased or decreased until the participant's response changes with respect to the desired threshold or position on the subjective sensation scale. Repeating this process a number of times yields a plausible estimate of the reversal point. Classical methods, however, do not use all the information contained in each rating trial, which leads to an unnecessarily high number of trials before convergence. Methods such as (linear) regression or function fitting might fail if the assumed relationship between stimulus intensity and sensation magnitude is wrong or does not hold for the tested stimulus range. Adaptive procedures not only yield a robust point estimate for a certain subjective intensity but also do so more efficiently. Especially for longer experiments that rely heavily on accurate estimation of a threshold or sensation magnitude, the psychophysical method must be both robust and efficient with respect to the number of required trials. This is particularly important in fields such as pain research, where the total exposure to painful stimulation should be kept as low as possible for the participants' benefit.
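The classical method just described (not QUEST) can be illustrated with a minimal 1-up/1-down staircase. Intensity goes down after a "painful" response and up otherwise, and the estimate is the mean intensity at the response reversals. The noiseless simulated observer and all numbers are hypothetical. With real, noisy ratings, a 1-up/1-down rule converges on the intensity judged painful on about 50% of trials.

```python
def staircase(respond, start, step, n_reversals=8, max_trials=200):
    """Simple 1-up/1-down staircase.

    respond(intensity) -> True if the stimulus is judged painful.
    Returns the mean intensity at the response reversals.
    """
    intensity = start
    previous = None
    reversals = []
    for _ in range(max_trials):
        painful = respond(intensity)
        if previous is not None and painful != previous:
            reversals.append(intensity)          # response changed: a reversal
        previous = painful
        if len(reversals) == n_reversals:
            break
        intensity = intensity - step if painful else intensity + step
    if not reversals:
        raise RuntimeError("no reversals; adjust start or step")
    return sum(reversals) / len(reversals)

# Hypothetical noiseless observer whose threshold is 2.3 (arbitrary units)
estimate = staircase(lambda x: x >= 2.3, start=1.0, step=0.5)
```

The estimate lands between the two intensities that straddle the true threshold, so its precision is limited by the step size. This is one reason the text favours model-based adaptive procedures.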

Although the classical staircase methods are still widely used, e.g. in quantitative sensory testing, more advanced estimation methods that make better use of the information acquired across trials are increasingly common. In the case of the maximum likelihood estimation method QUEST [4,5] used here, this is probably due to the readily available implementation in the popular Matlab PsychToolbox [6] suite. The modern, revised version of this procedure is superior to classical estimation methods both in robustness and in the low number of trials required to reach a sufficient estimate, if used with the right settings [7].

The rationale behind the QUEST procedure is to fit a Weibull function to the incoming data to model the psychophysical transformation between stimulus intensity and sensation magnitude. Some parameters of the psychophysical Weibull function are set by the experimenter, e.g. the steepness of the function or the offset due to the false positive rate and responder inconsistency. The location of the parameter of interest along the intensity dimension is estimated by the procedure using Bayesian maximum likelihood estimation. Here, a probability distribution is assumed over the location of the target parameter, i.e. the threshold intensity. Given a sensible prior assumption for this distribution, the algorithm determines the most informative intensity at which to test the participant next. For the current implementation of the procedure, this is the mean of the prior probability distribution [8]. On each successive trial, the prior probability distribution is essentially multiplied by the likelihood of the participant's response at the tested stimulation level, as characterized by the Weibull function, so that every response continuously updates the probability distribution for the threshold parameter. This procedure is repeated until a satisfactory estimate is obtained. The procedure is more efficient than a simple regression because it makes immediate use of the collected responses to determine which stimulation intensity to test next. The procedure also specifically probes around the point of interest, e.g. a threshold or a certain sensation intensity. Using only data from such a limited range in a regression would lead to an unstable estimate, which makes adaptive procedures more robust in settings where only relatively few trials are feasible.
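The updating scheme described above can be sketched on a discrete grid of candidate thresholds. This is a stand-alone illustration of the Bayesian idea, not PsychToolbox's QuestCreate/QuestUpdate code. The Weibull uses the same functional form as QUEST in log10 intensity units, but without QUEST's offset ε. The parameter values (β = 3.5, γ = δ = 0.01, the grid, the Gaussian prior) and the noiseless simulated observer are hypothetical. Following the text, the next test intensity is the mean of the current distribution.

```python
import math

def weibull(x, t, beta=3.5, gamma=0.01, delta=0.01):
    """P('painful') at log10 intensity x for threshold t.

    gamma = false-positive rate, delta = lapse rate, beta = steepness.
    """
    return delta * gamma + (1 - delta) * (1 - (1 - gamma) * math.exp(-10 ** (beta * (x - t))))

class GridQuest:
    """Grid approximation of the distribution over the threshold t."""

    def __init__(self, prior_mean, prior_sd, lo=-2.0, hi=2.0, n=401):
        self.grid = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
        weights = [math.exp(-0.5 * ((t - prior_mean) / prior_sd) ** 2) for t in self.grid]
        self.post = self._normalise(weights)

    @staticmethod
    def _normalise(weights):
        total = sum(weights)
        return [w / total for w in weights]

    def mean(self):
        """Distribution mean: used as the next test intensity and the estimate."""
        return sum(t * p for t, p in zip(self.grid, self.post))

    def update(self, x, painful):
        """Multiply the current distribution by the likelihood of the response."""
        like = [weibull(x, t) if painful else 1.0 - weibull(x, t) for t in self.grid]
        self.post = self._normalise([p * l for p, l in zip(self.post, like)])

def simulate(true_threshold=0.5, n_trials=40):
    """Run against a hypothetical noiseless observer (painful iff x >= threshold)."""
    q = GridQuest(prior_mean=0.0, prior_sd=1.0)
    for _ in range(n_trials):
        x = q.mean()
        q.update(x, painful=(x >= true_threshold))
    return q.mean()
```

Because the likelihood of a "painful" response falls as the candidate threshold rises, each such response shifts the distribution toward lower thresholds, and each "not painful" response shifts it upward. Testing at the current mean keeps the trials concentrated around the threshold.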

Such robust psychophysical anchoring can be used to measure changes in pain sensitivity over time, modulatory effects in hyperalgesia/allodynia research, or analgesic effects of pharmacological interventions, among other applications. Another interesting prospect of anchoring stimuli at the threshold between two sensory continua is the examination of subjective perception across the transition from non-painful to painful sensation [9–11]. This scenario is particularly interesting because, if the pain threshold has been robustly estimated, pain and no-pain conditions can be contrasted in electroencephalographic (EEG) activity, for example, without changing the physical stimulus intensity [12]. This allows pain-specific perceptual processes to be observed under constant stimulus conditions by examining the difference in brain activity between trials rated as painful and non-painful.

We will demonstrate how to use the readily available implementation of adaptive estimation in PsychToolbox to robustly determine the individual pain threshold in an EEG experiment where the contrast between pain and no-pain activity is examined for lateralization effects, depending on the stimulation site. Since the stimulation intensity can be kept constant after the thresholding procedure, it is not necessary to account for EEG activity co-varying with stimulus intensity in the subsequent analysis.

### Protocol

The experiment has been approved by the ethics commission of the Hamburg medical association (PV4509).

1. Participant Selection

1. Beyond standard selection criteria, such as fitness for pain stimulation, head implants or pre-existing neurological conditions, make sure the participants are not suffering from acute or chronic pain, are not taking any pain medication, and have no known history of substance abuse. Participants should also not have taken part in any pharmacological studies during the 4 weeks prior to the experiment.
2. Include participants of any gender, but include only female participants who are using hormonal contraceptives [13,14] to minimize the effect of cyclic changes in pain perception.
3. Before administering any kind of stimulation, make sure participants have given informed consent in writing.

2. EEG Setup

1. Select an appropriate cap size and prepare the EEG electrode setup as per the system's instruction manual.
2. Set the sample rate and high/low cutoff as well as the impedance limits of the recording equipment (recommended: 500 Hz, 0.5 Hz high-pass filter, impedances ង kΩ).
3. Ensure that the stimulation device and the EEG system are not electrically coupled by running the EEG system on battery power.
4. Ensure that any link between the EEG system and the computer controlling the electrical stimulation device is potential-free.

3. Electrical Stimulation Setup

1. To best make use of the time resolution of the EEG recording, keep the electrical stimulation as short as possible. Set the stimulator to a single, monophasic stimulation pulse with 1 ms duration and 400 V maximum voltage. If a more intense pain level is needed, or the exact timing of the post-stimulus EEG recording does not take precedence, other stimulation protocols can be used.
2. Make sure the electrical stimulator is switched on but the output to the electrode is switched off. For the DS7A stimulator the switch labeled 'OUTPUT' to the right of the device should be in the down position.
3. Locate the landmark(s) that identify the chosen stimulation site. For stimulation at the hand, use the muscle between thumb and index finger (abductor/flexor pollicis brevis). Ask the participant to lay their hand on a flat surface with all digits stretched out and apposed. Identify the stimulation site by bisecting the distance between the first knuckles of the thumb and index finger.
4. Clean the skin by applying electrode preparation gel. Make sure not to use alcohol or disinfectant, which might leave residue on the skin that can lead to irritation or unreliable stimulation.
5. Attach the stimulation electrode and fasten it in place with textile tape.
6. Ask the participant to find a comfortable position for the hand and to try not to move the hand during the experiment, if possible. For the participant's convenience, place a soft tissue under the hand to absorb any humidity, depending on the surface permeability.
7. Enable the stimulator's output by switching the 'OUTPUT' switch to the upward position.

4. Determine Starting Points

1. Instruct the participant on how to operate the on-screen rating scale with the mouse. The scale is a horizontal line that provides a visual equivalent for a continuous range of sensation intensities: the left half represents non-painful sensations, while the right half corresponds to a standard pain visual analogue scale (VAS). Point out to the participant that the exact center point of the scale cannot be selected. Provide the participant with the standardized instructions about the anchor points 15 (Table 1).
2. Give the participant the opportunity to get comfortable with the rating process by applying stimuli of varying intensity and recording the responses. Use the information gathered during this phase to get an estimate for two intensities that consistently evoke strong but nonpainful sensations (low-point) and moderately painful sensations (high-point), respectively. Continue the stimulation for about 25 - 30 trials or until satisfied with having reached good estimates. During this time, it can be beneficial to query the participant for verbal feedback on the intensities and the subjective similarity of repeatedly presented stimulus intensities.
3. Try to pick the intensities randomly to evoke responses around the scale center. For best results, do not simply increase or decrease the intensities linearly, and also explore the more extreme ends of the painful side. This phase should also give the participant the opportunity to get accustomed to the potentially unfamiliar stimulation and establish some reference for a consistent rating range. Because of this, it is advisable to apply intensities from the whole range of possible stimulus intensities, while also repeating some intensities.
4. Once satisfied with having obtained estimates for both a high-point and low-point starting intensity, inform the participant that the first part of the experiment is about to start and (s)he should keep on rating as practiced while random stimulus intensities are presented.

NOTE: The QUEST algorithm requires some parameters to be specified before estimation starts. These include the steepness of the psychophysical function (beta, typically 3.5), the fraction of trials in which a random answer is expected (delta, typically 0.01), and the fraction of trials in which a positive response is expected even though no stimulation is given (gamma, no general recommendation). For the Bayesian estimation, the range (SD) of the expected ratings and the spacing of possible responses (grain) must also be specified. For a VAS, set grain to the resolution of the scale (typically 1), and set the SD large enough to include both the scale zero point and the maximum possible intensity plus some safety margin. The recommendations and "typical" values given here are explained in detail in the QuestCreate source code included with Psychtoolbox 6,16. For pain at threshold, a gamma value of at most 0.01 should be plausible. The estimation method is relatively robust to misspecified parameters; however, with only a few trials, poorly chosen parameters can increase the uncertainty of the final estimate. If the standard deviation is set too low, the procedure will have trouble converging on estimates that lie outside the region spanned by the standard deviation around the prior guess for the parameter. It is therefore better to err on the side of too large a standard deviation.
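To make the roles of beta, delta, and gamma concrete, the sketch below writes out a Weibull-shaped psychometric function modeled on the form used in the QuestCreate source, shifted so that it passes through a chosen response probability exactly at the threshold. The target probability `p_threshold = 0.5` is an assumption (the protocol does not state one), and `x` is the stimulus level on whatever axis the QUEST run uses, conventionally log intensity. This is an illustration, not the Psychtoolbox code.

```python
import math

def quest_weibull(x, threshold, beta=3.5, delta=0.01, gamma=0.01, p_threshold=0.5):
    """Probability of a 'painful' response at stimulus level x.

    Weibull form delta*gamma + (1-delta)*(1-(1-gamma)*exp(-10**z)), shifted so
    that the curve equals p_threshold exactly at x == threshold (assumed 0.5).
    """
    # Invert the curve once to find the offset that places p_threshold at the threshold.
    inner = (1 - (p_threshold - delta * gamma) / (1 - delta)) / (1 - gamma)
    x_offset = math.log10(-math.log(inner)) / beta
    # Clamp the exponent: 10**z overflows a float for very large z.
    z = min(beta * (x - threshold + x_offset), 300.0)
    return delta * gamma + (1 - delta) * (1 - (1 - gamma) * math.exp(-10 ** z))
```

With these defaults the curve approaches gamma far below threshold and 1 - delta*(1 - gamma) far above it, so gamma effectively caps the rate of "painful" responses to very weak stimuli, while beta sets how quickly the curve rises between those limits.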

1. Create two QUEST sessions with the parameters given above. Start one from the high-point intensity and one from the low-point. Information about the implementation logic of the estimation process can be found in the supplemental material (S1).
2. Randomly select one of the two runs and take its probe intensity from the respective QuestMean function.
3. Set the electrical stimulator to the probe intensity. If a different intensity than the one suggested by the algorithm needs to be applied or the suggested intensity is out of range, feed the presented intensity back into the QuestUpdate function in step 5.5.
4. Trigger the stimulus.
5. After the participant has rated the stimulus, run QuestUpdate for the respective estimation session and supply it with the actual stimulus intensity presented as well as the participant's rating.
6. Continue running rating trials until the estimates are stable or a predefined stopping criterion has been reached (e.g., a fixed number of trials; the representative results used 40 thresholding trials).
7. Record the threshold as the mean of the two QuestMean estimates from the runs started at the high-point and the low-point intensity.
8. Allow the participant to take a break at this stage, if desired.
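Steps 1 to 7 can be simulated end to end. The sketch below is a hypothetical, grid-based stand-in for QuestCreate/QuestMean/QuestUpdate and is not the Psychtoolbox implementation: each run keeps a Gaussian prior over the threshold on a grid, probes at its posterior mean, and is updated with the intensity actually presented. A simulated participant replaces the rating, with a rating above 50 counted as "painful" (an assumption matching the dichotomy in Table 1). The threshold, low-point, and high-point values in the usage line are made up for illustration.

```python
import math
import random

def p_painful(x, t, beta=3.5, delta=0.01, gamma=0.01, p_threshold=0.5):
    """QUEST-style Weibull, shifted so that p(t) == p_threshold (assumed 0.5)."""
    inner = (1 - (p_threshold - delta * gamma) / (1 - delta)) / (1 - gamma)
    z = min(beta * (x - t) + math.log10(-math.log(inner)), 300.0)
    return delta * gamma + (1 - delta) * (1 - (1 - gamma) * math.exp(-10 ** z))

class GridQuest:
    """Hypothetical minimal QUEST-like run with a gridded Gaussian prior on the threshold."""

    def __init__(self, t_guess, t_guess_sd, grain=0.01, span_sd=5.0):
        n = int(round(span_sd * t_guess_sd / grain))
        self.grid = [t_guess + i * grain for i in range(-n, n + 1)]
        # Log prior (unnormalized), so repeated updates do not underflow.
        self.log_post = [-0.5 * ((t - t_guess) / t_guess_sd) ** 2 for t in self.grid]

    def mean(self):
        """Posterior mean of the threshold (the role QuestMean plays)."""
        top = max(self.log_post)
        w = [math.exp(lp - top) for lp in self.log_post]
        return sum(t * wi for t, wi in zip(self.grid, w)) / sum(w)

    def update(self, intensity, painful):
        """Bayesian update with the intensity actually presented (cf. QuestUpdate)."""
        for i, t in enumerate(self.grid):
            p = p_painful(intensity, t)
            self.log_post[i] += math.log(p if painful else 1.0 - p)

def run_two_quests(true_threshold, low_point, high_point, sd=1.0, n_trials=40, seed=0):
    rng = random.Random(seed)
    runs = [GridQuest(low_point, sd), GridQuest(high_point, sd)]
    for _ in range(n_trials):
        q = rng.choice(runs)              # pick one of the two runs at random
        x = q.mean()                      # probe intensity from the posterior mean
        # Simulated participant (assumption): "painful" with the model probability.
        painful = rng.random() < p_painful(x, true_threshold)
        q.update(x, painful)              # feed back the presented intensity
    estimates = [q.mean() for q in runs]
    return sum(estimates) / len(estimates), estimates
```

For example, `run_two_quests(true_threshold=1.0, low_point=0.6, high_point=1.4)` returns the averaged estimate together with the two per-run estimates, mirroring the averaging of the high- and low-start runs described in step 7.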

6. Stimulate at Threshold Level

NOTE: The number of rating trials and blocks can be adjusted as needed, as long as the session remains tolerable for the participant.

1. Inform the participant that, for the remaining part of the experiment, more blocks with random stimulation will follow and that they should keep rating as before. If needed, refresh the instructions on the scale anchor points.
2. Start the EEG recording.
3. Set the electrical stimulator to the mean threshold estimate obtained in step 5.7 and keep the setting constant throughout the rest of the experiment.
4. Start a rating block (30 trials) and observe the data quality of the EEG recording. Depending on the EEG data quality, run 4 - 5 rating blocks and allow the participant to take short breaks in between the blocks.
NOTE: Try to keep social interaction with the participant to a minimum during these breaks or standardize the interaction as much as possible.
5. When finished, stop the EEG recording, switch the stimulator output to off, and remove the electrode. Debrief the participant after removing the EEG cap.
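The constant-stimulation phase reduces to a fixed intensity plus a jittered pause between trials. A minimal sketch, assuming a uniformly jittered 3 - 5 s ITI and 30 trials per block as stated in the Figure 1 caption; `stimulate`, `collect_rating`, and `wait` are hypothetical callbacks standing in for the stimulator, the rating screen, and a timer.

```python
import random

def make_iti_schedule(n_blocks=4, trials_per_block=30, iti_range_s=(3.0, 5.0), seed=None):
    """Pre-draw one jittered inter-trial interval (s) per trial, block by block."""
    rng = random.Random(seed)
    lo, hi = iti_range_s
    return [[rng.uniform(lo, hi) for _ in range(trials_per_block)] for _ in range(n_blocks)]

def run_constant_blocks(intensity, schedule, stimulate, collect_rating, wait):
    """Present the same intensity on every trial and collect one rating per trial."""
    ratings = []
    for block_no, itis in enumerate(schedule, start=1):
        for iti in itis:
            stimulate(intensity)                          # fixed at the threshold estimate
            ratings.append((block_no, collect_rating()))  # hypothetical rating callback
            wait(iti)                                     # jittered pause before next trial
        # A short, standardized break between blocks would go here.
    return ratings
```

Drawing the ITIs up front makes the jitter reproducible from a seed, while the intensity argument never changes inside the loop, matching the requirement to keep the stimulator setting constant.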

### Representative Results

Using a rating scale split into one half for nonpainful and one half for painful sensations (Figure 1a), constant stimulation can be applied over many trials while still yielding ratings across the scale midpoint (Figure 1b). This way, changes in sensory input can be avoided, and the rating outcome can be directly related to intrinsic perceptual classification processes related to pain.

Figure 1: Experimental Description. (a) The rating scale with the left side spanning non-painful sensation and the right side spanning painful sensation. (b) Procedure used for data collection. 40 thresholding trials followed by 4 - 6 blocks of constant stimulation (30 trials each). The blocks had a jittered 3 - 5 s intertrial interval (ITI). The rating scale appeared 0.25 s after stimulation.

The two estimation runs starting from the "nonpain" low-point and "pain" high-point converge on robust threshold estimates. Taking the mean of both estimates yields the final threshold estimate, while the bias induced by the starting intensity is reduced (Figure 2a). The subjective stimulation intensity evoked by repeated stimulation at the estimated threshold is stable across multiple blocks of 30 trials each within one experimental session (Figure 2b).

Figure 2: Stability of the Threshold Estimates. (a) Data for a single participant showing the algorithm converging on two estimates, one for a high intensity starting point, one for a low intensity starting point. To minimize the influence of the starting point, both threshold estimates were averaged (dashed line). (b) Stability of the rating medians over the course of the experiment under constant stimulation at the estimated threshold across all participants (n=25).

By splitting the concurrently recorded EEG data into trials that were rated as "painful" and "nonpainful", respectively, the oscillatory activity can be contrasted post-hoc. This yields activity differences which coincide with perceptual decisions about the same stimulus being categorized as strong sensation or as pain. Figures 3a and b show these differences for a time-window before the painful stimulus is presented (-0.8 s to 0 s before stimulus onset) and the theta-band frequency range (4 - 7 Hz), which have previously been shown to be connected to subsequent perceptual classification in pain 17 . The thresholding paradigm enables the examination of such prestimulus differences in oscillatory activity linked to subsequent perceptual classification of pain, independent of stimulus magnitude.
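The post-hoc split can be sketched as follows. Each trial contributes its rating and a prestimulus EEG segment from one channel; theta power (4 - 7 Hz) is computed per trial and averaged separately for trials rated painful (above 50) and nonpainful (below 50). For brevity, this uses a plain DFT periodogram instead of the multi-taper method used for Figure 3, with no baseline correction; the function names and the trial format are assumptions.

```python
import math
from statistics import mean

def theta_power(samples, fs, f_lo=4.0, f_hi=7.0):
    """Mean periodogram power of one prestimulus segment within [f_lo, f_hi] Hz."""
    n = len(samples)
    powers = []
    for k in range(1, n // 2 + 1):
        if f_lo <= k * fs / n <= f_hi:                   # keep only in-band DFT bins
            re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
            im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
            powers.append((re * re + im * im) / n)
    return mean(powers)                                  # raises if no bin falls in band

def pain_minus_nonpain(trials, fs):
    """trials: iterable of (rating, samples); returns mean theta power, pain minus nonpain."""
    pain, nonpain = [], []
    for rating, samples in trials:
        # 50 cannot be selected, so every rating falls on exactly one side.
        (pain if rating > 50 else nonpain).append(theta_power(samples, fs))
    return mean(pain) - mean(nonpain)
```

At 500 Hz, a 0.8 s window (-0.8 s to 0 s) has 400 samples and a 1.25 Hz bin spacing, so only the 5.0 Hz and 6.25 Hz bins fall inside the theta band; a multi-taper estimate trades this coarse resolution for lower variance.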

Figure 3: Power Differences between Nonpain and Pain. Data were transformed to the time-frequency domain using a multi-taper method. Depicted are theta frequencies between 4 - 7 Hz before stimulus onset (-0.8 s - 0 s). (a) Power difference specific to the subsequent classification of the stimulus to the left hand as painful. Data adapted from Taesler & Rose 17 (n = 15). (b) Power specific to the classification of a stimulus to the right hand as painful (n = 10). (c) Common theta activity between (a) and (b), independent of the stimulated side (n = 25). The topo-plot shows the sum of the lateralized differences between painful and non-painful stimulation. For individual pain/no-pain topographies (S2) and a comparison to pre-existing post-stimulus data 10 (S3), please refer to the supplemental materials.

By changing the side of stimulation between groups, these prestimulus effects can be further disentangled from any lateralization effects in stimulus expectation. Figure 3c shows the sum of activity across both groups (left hand/right hand), highlighting prestimulus theta activity that is common to the perceptual classification of non-pain versus pain, irrespective of the site of stimulation.

| Scale position | Instruction | Score |
|---|---|---|
| Leftmost | "No sensation at all" | 0 |
| Left to center | "The strongest sensation, that is not yet painful" | 49 |
| Center | "Pain threshold - this point cannot be selected" | 50 |
| Right to center | "Painful sensation" | 51 |
| Rightmost | "Maximum tolerable pain" | 100 |

Table 1: Definition of Rating Scale Anchor Points. Since the middle of the scale cannot be chosen, the ratings can also be dichotomized into a two-alternative forced-choice (2AFC) dataset distinguishing nonpain from pain.
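Because 50 cannot be selected, every rating maps to exactly one of the two 2AFC categories. A short sketch of that mapping, plus a per-block summary of the kind plotted in Figure 2b (median rating) together with the proportion of trials rated painful; the function names are hypothetical.

```python
from statistics import median

def dichotomize(rating):
    """Map a 0-100 split-scale rating (Table 1) to a 2AFC label."""
    if not 0 <= rating <= 100 or rating == 50:
        raise ValueError(f"invalid rating: {rating}")   # 50 is not selectable
    return "pain" if rating > 50 else "nonpain"

def block_summary(ratings):
    """Median rating and proportion of 'pain' responses within one block."""
    labels = [dichotomize(r) for r in ratings]
    return {"median_rating": median(ratings), "p_pain": labels.count("pain") / len(labels)}
```

For constant stimulation at a well-estimated threshold, one would expect `p_pain` to hover around one half and the block medians to stay near the scale center across blocks.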

### Discussion

Here we used the theoretically well-founded QUEST method to efficiently estimate a robust psychophysical threshold between non-pain and pain perception. Constant stimulation at this threshold enables an analysis of perceptual decisions independent of changes in stimulus magnitude. While we examined the threshold intensity at the transition between innocuous and noxious sensation, other points along the pain scale (e.g., … on a 100-point pain scale) can also be anchored with the estimation method presented here. In these cases, care has to be taken to account for habituation or sensitization effects over the course of the experiment, as such effects are more likely to occur at higher stimulation intensities.

One critical step in this procedure is to optimally adjust the parameters of the psychophysical function to be fitted by the adaptive procedure. Another important issue is the instruction given to the participant regarding the anchoring of the response scale. The participant should clearly understand where to place their subjective intensities on the scale. It is thus very important to standardize these instructions and to repeat them whenever necessary, to avoid introducing any bias into the ratings. A scale split into a nonpainful and a painful side might prove difficult for some participants, since the two sensory continua might differ in their respective sensitivity. In this case, when the information from the split scale is not needed for further analysis, the estimation procedure can also be carried out as a two-alternative forced choice paradigm, in which the participant only has to decide whether or not a stimulus was perceived as painful. Even if problems with the rating scale occur, the estimation will remain robust as long as the participant's responses about a stimulus being painful or nonpainful are veridical and false responses stay within the limits specified by the delta and gamma parameters.

In cases where the initial thresholding does not converge on a plausible estimate or rating irregularities become evident, the experiment should be interrupted and restarted. In such cases, it might help to ask the participant about their interpretation of the scale and their subjective perception of the stimulation. If technical errors such as a loose electrode or a faulty connection to the stimulator can be ruled out, it might be helpful to ask the participant about their strategies for dealing with pain. Participants who regularly deal with pain in martial arts or high-performance sports, for example, might exhibit irregular responses despite passing the initial screening. Additionally, social interaction with the participant after the beginning of the experiment and during the breaks should be standardized so as not to induce any effects of experimenter demeanor or induced compliance.

The method outlined here has been demonstrated to be very robust within one experimental session. However, there might be substantial differences in thresholds in the same participants when measured over multiple sessions and days. This might in part be due to circadian changes in pain susceptibility or to changes in arousal or motivation. For heat pain, this has also been shown to be an effect of differing interpretation of the scale range over multiple sessions 18 . These problems could be reduced by re-training participants on the scale anchoring in each session and averaging multiple stimulations into one aggregated rating per trial 19 . An additional concern is that reattaching the electrode in a different session might not yield the exact same stimulation intensity and thus may change the estimated threshold.

With an adaptive estimation procedure such as QUEST, the full information from all prior thresholding trials is used in each iteration to determine the optimal next test intensity. Compared with classical methods such as the staircase, this reduces the number of trials needed while increasing robustness against inconsistent ratings during thresholding. The thresholding process could be further optimized by independently gathering data in a pilot experiment to better estimate the slope of the psychophysical function for the desired modality or stimulus type 7 .

Even though the theoretical foundation of the algorithm presented here is sound and we have demonstrated that a robust estimate for exhaustive experiments can be obtained, improved techniques are already available that further reduce the number of trials needed to reach robust threshold estimates. These optimized Bayesian methods not only promise less biased results for low trial numbers but also try to fit the position as well as the slope of the psychophysical function in one iteration 20 .

Future research in areas that rely on the anchoring of subjective perception can benefit from such advanced estimation methods. For one, these algorithms reduce the strain on participants and thus help make the experimental setting more ecologically valid. Additionally, they improve accuracy, not only in threshold experiments but potentially in all self-report measures suited for psychophysical procedures, a property that is especially useful for research in the clinical setting.

## Is response time predictive of choice? An experimental study of threshold strategies

This paper investigates the usefulness of non-choice data, namely response times, as a predictor of threshold behavior in a simple global game experiment. Our results indicate that the signals associated with the highest or second-highest response time at the beginning of the experiment are both unbiased estimates of the threshold employed by subjects at the end of the experiment. This predictive ability is lost when we move to the third or higher response times. Moreover, the response time predictions are better than the equilibrium predictions of the game. They are also robust, in the sense that they characterize behavior in an "out-of-treatment" exercise where we use the strategy method to elicit thresholds.


## ORIGINAL RESEARCH article

• 1 Psychophysiology of Food Perception, German Institute of Human Nutrition Potsdam-Rehbruecke, Nuthetal, Germany
• 2 NutriAct – Competence Cluster Nutrition Research Berlin-Potsdam, Nuthetal, Germany

Adaptive methods provide quick and reliable estimates of sensory sensitivity. Yet, these procedures are typically developed for and applied to the non-chemical senses only, i.e., to vision, audition, and somatosensation. The relatively long inter-stimulus intervals in gustatory studies, which are required to minimize adaptation and habituation, call for time-efficient threshold estimations. We therefore tested the suitability of two adaptive yes-no methods based on SIAM and QUEST for rapid estimation of taste sensitivity by comparing test-retest reliability for sucrose, citric acid, sodium chloride, and quinine hydrochloride thresholds. We show that taste thresholds can be obtained in a time-efficient manner with both methods (within only 6.5 min on average using QUEST and ….5 min using SIAM). QUEST yielded higher test-retest correlations than SIAM for three of the four tastants. Either method allows taste threshold estimation with low strain on participants, rendering them particularly advantageous for subjects with limited attentional or mnemonic capacities and for time-constrained applications in cohort studies or in the testing of patients and children.

## Experiment 1

In Experiment 1 (Fig. 1a), subjects had to encode and maintain either one or two motion directions of random dot patterns (RDPs). We presented the two RDPs either simultaneously or sequentially at different locations. We varied the similarity of their directions on the basis of subjective perceptual thresholds that were independently determined for simultaneous and sequential encoding to increase comparability. After a maintenance phase, subjects were cued to report one of the two presented items by manually adjusting the motion direction of a probe RDP. This design allowed us to investigate distortions of direction representations by analyzing shifts of the error distribution of the target item relative to the non-target item in feature space. Crucially, in both conditions subjects had to maintain the items concurrently in memory for a 1-s period. Thus, both conditions included concurrent maintenance, but only in the simultaneous condition were both items concurrently present on-screen. Assuming a neural parallelism between perceptual and memory representations, we expected mutual repulsion between the presented motion directions under both simultaneous and sequential encoding.

Task display and results of Experiment 1. (a) Subjects viewed two random dot patterns (RDPs) and memorized their motion directions (indicated here by arrows for illustration only). The RDPs were presented on different retinal positions either simultaneously (top) or sequentially (bottom). After a short delay, subjects were cued to report one of the memorized directions by adjusting the direction of a probe RDP. (b) Distortion effects measured as a shift of the mean error are shown for each presentation condition (simultaneous, S1 sequential and S2 sequential) and similarity condition (2, 3, and 4 steps of JND), with positive values indicating direction repulsion and negative values indicating attraction. Error bars depict between-subject standard errors of the mean. (c) Sign-adjusted response distributions of the different presentation conditions of Experiment 1 (pooled across JNDs). Positive values indicate responses away from the non-target directions. The gray area represents the location of the non-target directions relative to the target (simultaneous condition: −6 to −49°, sequential conditions: −10° to −49°). The black vertical line is at 0° (i.e., at the expected center of an unbiased distribution)

### Participants

Twenty-five adults (14 females; age 19–30 years, M = 22.27, SD = 2.16) participated in the experiment after providing written informed consent. The study was approved by the ethics committee of the University of Frankfurt Medical Faculty. All participants reported normal or corrected-to-normal visual acuity. They were naive to the purpose of the experiment and were either paid (€10/h) or received course credit for their participation. The experiment comprised two sessions, held on different days, lasting about 90 min each. The data from four subjects were excluded: one subject dropped out between the first and the second session, two subjects were excluded from further data analysis because the standard deviation of their recall errors differed by more than 3 SD from the mean SD, and one subject was excluded due to an experimenter error (the experimental stimuli were accidentally determined by another subject’s discrimination thresholds).

### Discrimination threshold estimation

Prior to the main experiment, we determined each participant’s direction discrimination thresholds for eight different orientations (from 0° to 315° in 45° steps) using a two-alternative forced choice task in an adaptive procedure controlled by the Psi-Marginal algorithm (Prins & Kingdom, 2009). Thus, the presented item pairs were generated with one stimulus always chosen from the eight predefined orientations and the second stimulus adaptively created by the algorithm to determine the just noticeable difference (JND), set as the 75% discrimination threshold. We chose this procedure since simultaneous and sequential perception show different discrimination thresholds (Lakshminarayanan, Raghuram, & Khanna, 2005). By tuning our similarity steps to psychophysical thresholds we were able to keep similarity constant across the presentation modes. Since direction discrimination is known to be influenced by the proximity to the cardinal directions, with discrimination being most precise close to the cardinals and least sensitive at oblique directions (the so-called oblique effect; Appelle, 1972; Gros, Blake, & Hiris, 1998; Matthews & Qian, 1999), we determined discrimination thresholds for different directions (cardinals and oblique) to account for these differences in our similarity manipulation.

### Stimuli and apparatus

In each trial, two random dot patterns (RDPs) were presented at a distance of 10° of visual angle to the left and the right of a fixation square (edge length 0.3° of visual angle) that was located in the center of the screen. RDPs consisted of 400 white dots on a black background, with each dot covering 0.15°. Dots were displayed within an invisible circular aperture of 15° in diameter. Dots reaching the edge of the circular aperture were repositioned randomly on the opposite side of the aperture; therefore, dot density was kept constant throughout the presentation. Motion was 100% coherent and within a trial all dots moved with the same speed. Speed varied randomly between 6°/s and 14°/s between trials, with no speed being repeated in immediate succession. In the sequential presentation condition, spatial position and serial position were counterbalanced, i.e., the first presented stimulus appeared in half of the trials to the left of the fixation square. MATLAB R2010a and the Psychophysics Toolbox (Brainard, 1997) were used to generate and display the stimuli. The participants viewed stimuli on an LCD monitor (refresh rate 60 Hz) from a distance of 50 cm in a dimly lit room.

### Procedure

Each trial began with a 1-s fixation period. The two stimuli then appeared either simultaneously for 1 s or sequentially for 0.5 s each, without an inter-stimulus interval (ISI). Simultaneous and sequential trials were randomly interleaved. Subsequent to stimulus presentation, the participants judged which one of the two stimuli was more clockwise oriented by pressing the corresponding mouse button (left button for the left stimulus, right button for the right stimulus). There was no time limit on the response. After the response the participants received feedback. The fixation square turned red if the response was wrong and green if the response was correct. In total the threshold estimation procedure comprised 800 trials. Each of the eight directions was measured separately for simultaneous and sequential presentation, with 50 trials for each direction and presentation mode. Subjects were instructed to fixate the central square throughout a trial.

### Analysis

For simultaneous presentation the mean JNDs for the reference directions (0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°) were 6.36°, 8.90°, 5.94°, 9.42°, 5.56°, 9.56°, 4.93°, and 8.41°, respectively. For sequential presentation the respective mean JNDs were 6.80°, 10.41°, 5.44°, 9.75°, 6.11°, 9.68°, 4.58°, and 10.43°. The resulting individual discrimination thresholds for the eight different reference directions were subsequently used in a curve-fitting procedure to extrapolate the thresholds across the whole 360° range (i.e., to achieve threshold estimates for those directions we did not test directly). We computed separate fits for each quadrant (i.e., 0° - 90°, 90° - 180°, 180° - 270°, 270° - 360°) to account for possible direction-specific asymmetries in motion perception (e.g., motion to the right vs. motion to the left). The function used for fitting was: f(x) = a × sin(2 × x/180 × pi) + b + c × x, with a being the amplitude, b the base level, and c the skew (see Supplementary Fig. 1 for a graphic depiction of the measured and fitted data). Curve fitting was done with the EzyFit toolbox for Matlab (Moisy, 2011). As thresholds tend to differ between simultaneous and sequential presentations (Lakshminarayanan et al., 2005), we calculated threshold functions separately for both data sets of each subject.
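The per-quadrant fit is linear in a, b, and c, so ordinary least squares suffices. The sketch below solves the normal equations directly, assuming x is the absolute direction in degrees; the original analysis used the EzyFit toolbox for Matlab, so this is an illustrative re-implementation rather than the authors' code. With only the three measured directions of a quadrant (e.g., 0°, 45°, 90°), the fit passes exactly through the data points.

```python
import math

def fit_quadrant(directions_deg, jnds_deg):
    """Least-squares fit of f(x) = a*sin(2*pi*x/180) + b + c*x; returns (a, b, c)."""
    rows = [(math.sin(2 * math.pi * x / 180), 1.0, float(x)) for x in directions_deg]
    # Augmented normal equations [X^T X | X^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, jnds_deg))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(3):
            if row != col:
                f = m[row][col] / m[col][col]
                m[row] = [u - f * v for u, v in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def jnd_at(x, params):
    """Evaluate the fitted threshold function at direction x (degrees)."""
    a, b, c = params
    return a * math.sin(2 * math.pi * x / 180) + b + c * x
```

Using the mean simultaneous JNDs reported above for 0°, 45°, and 90° (6.36°, 8.90°, 5.94°) gives b = 6.36, c = -0.42/90 ≈ -0.0047 per degree, and a ≈ 2.75. Because sin(2πx/180) is zero at 0° and 90°, b and c are fixed by the cardinal directions and a absorbs the oblique deviation.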

### Stimuli and apparatus

The stimuli were the same as used for discrimination threshold estimation, except that a set size 1 condition was included in which only a single stimulus was presented. For the set size 1 condition, an item’s direction was randomly chosen from 360 different directions, spanning the whole circle in steps of 1°. In the set size 2 conditions, the presented item pairs were generated by drawing one item direction randomly from 360 directions and picking another direction to be 2, 3, or 4 JNDs more clockwise. Cumulated JNDs were calculated using the graphic method described by Luce and Edwards (1958). The three inter-item differences (2, 3, and 4 JNDs) appeared equally often within each presentation condition and were randomly interleaved. The average angular separations were 16.65° (inter-subject range 10.22–24.63), 24.90° (range 15.28–37.58), and 32.40° (range 20–48.92) for JND2, 3, and 4 in the sequential condition, and 15.56° (range 5.99–24.62), 23.06° (range 9.24–37.01), and 30.67° (range 12.18–48.97) for JND2, 3, and 4 in the simultaneous condition.
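One way to read "2, 3, or 4 JNDs more clockwise" is as repeated steps of one JND, with the JND re-evaluated at each new direction from the fitted threshold function. The sketch below implements that iterative summation; it is an assumption about what the graphic method of Luce and Edwards (1958) amounts to, not a reproduction of it, and it takes "clockwise" to mean increasing angle. `jnd_at` is any function returning the JND (in degrees) at a given direction.

```python
def cumulate_jnds(start_deg, n_jnds, jnd_at):
    """Step n_jnds JNDs in the clockwise (assumed increasing-angle) direction.

    Returns (end_direction_deg, total_separation_deg).
    """
    d = float(start_deg)
    for _ in range(n_jnds):
        d += jnd_at(d % 360)     # JND re-evaluated at the current direction
    return d % 360, d - start_deg
```

With a constant JND of 8°, three JNDs from 10° end at 34°, a separation of 24°; with a direction-dependent JND, the separation depends on the starting direction and on the individual threshold function, which is consistent with the wide inter-subject ranges reported above.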

MATLAB R2010a and the Psychophysics Toolbox (Brainard, 1997) were used to generate and display the stimuli. The participants viewed stimuli on an LCD monitor (refresh rate 60 Hz) from a distance of 50 cm in a dimly lit room.

### Procedure

Figure 1a depicts the trial structure for the simultaneous and sequential presentation conditions, respectively. Each trial began with a 1-s fixation period. The stimuli then appeared either simultaneously for 1 s or sequentially for 0.5 s each without an inter-stimulus interval (ISI) (in the set size 1 condition, only one stimulus appeared for 0.5 s; presentation durations were chosen to match the average time per item in the different presentation conditions). Stimulus presentation was followed by a 1-s delay. In the set size 1 condition, half of the trials had a 1-s delay and the other half a 1.5-s delay, to match the time between stimulus presentation and report for the first and second presented stimulus of the sequential condition, respectively. Following the delay, a cue appeared for 1 s, indicating the item to be recalled by pointing to the left or right. After cue offset, a randomly oriented RDP appeared at the position of the cued item. Participants adjusted the direction of the RDP by moving the mouse to the left or right. There was no upper time limit for the response. After entering their response, participants received feedback for 0.3 s: a colored dot appeared next to the fixation square on the side of the recalled item, and its color, drawn from a continuous scale ranging from green to red, indicated the precision of the response. Relative direction (clockwise, counter-clockwise), serial position (S1, S2), and spatial position (left, right) of the probed item were counterbalanced. The experiment consisted of 1296 experimental trials, 432 trials per condition (set size 1, simultaneous, sequential). Participants performed 15 practice trials at the beginning that were excluded from data analysis.

### Analysis

Response errors were calculated as the difference between the reported and the veridical direction of the cued item with a positive sign indicating responses away from the non-target item and a negative sign indicating responses towards the non-target. For this calculation we inverted the response errors for counter-clockwise directions. In consequence, general motor response biases and item-inherent biases averaged out in the analysis since they received opposite signs for clockwise and counter-clockwise directions (see Huang & Sekuler, 2010). The precision of memory representations is expressed as the circular standard deviation of the error distributions. The set size 1 condition served to identify a possible memory decay over time associated with the different maintenance durations between S1 (1.5 s) and S2 (1 s) (see Huang & Sekuler, 2010, for a similar procedure). For the set size 1 condition, we merely calculated the standard deviation of the error distribution. Note that no second item was present in set size 1 trials to which a response bias could be related. Hence, the set size 1 condition did not qualify for a bias analysis comparable to the analyses for the set size 2 conditions.
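The sign convention and the precision measure can be written out as follows. Angular differences are wrapped to [-180°, 180°) before the sign is adjusted according to the side on which the non-target lies, which is equivalent to inverting the errors for one of the two relative directions. The circular standard deviation uses the common definition sqrt(-2 ln R), where R is the mean resultant length. The function names are hypothetical, and the exact circular-statistics routine the authors used is not stated.

```python
import math

def wrap_deg(angle):
    """Wrap an angle in degrees to [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def signed_error(reported_deg, target_deg, nontarget_deg):
    """Recall error: positive = away from the non-target, negative = toward it."""
    err = wrap_deg(reported_deg - target_deg)
    side = wrap_deg(nontarget_deg - target_deg)   # > 0: non-target lies at larger angles
    return -err if side > 0 else err

def circular_sd_deg(errors_deg):
    """Circular SD, sqrt(-2 ln R), of a list of errors in degrees."""
    n = len(errors_deg)
    c = sum(math.cos(math.radians(e)) for e in errors_deg) / n
    s = sum(math.sin(math.radians(e)) for e in errors_deg) / n
    r = min(math.hypot(c, s), 1.0)                 # guard against rounding above 1
    return math.degrees(math.sqrt(-2.0 * math.log(r)))
```

Wrapping first matters near 0°/360°: a target at 350°, a non-target at 20°, and a response at 5° yield an error of 15° toward the non-target, i.e. -15 under this convention, rather than a spurious -345°.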

### Results

Figure 1b shows the mean errors for each presentation and similarity condition, with positive values indicating motion repulsion and negative values indicating attraction. Simultaneously presented items showed a repulsion effect, i.e., they were reproduced as shifted away from the non-target item direction. For sequential presentation, distortion directions differed between items: the first presented item (S1) was attracted towards the second item in the sequence (S2), while S2 was repulsed from S1. To test for systematic distortions in the different conditions, we separately tested the mean response error for each cell of our design (Simultaneous, S1, S2 × JND2, JND3, JND4) against the null hypothesis that there is no distortion in the mean of the error distributions. All cells showed significant deviations from zero (see Table 1).

To compare the distortions in the different conditions directly, we calculated a two-way analysis of variance (ANOVA) with the two within-subject factors presentation condition (Simultaneous vs. S1 vs. S2) and similarity (2 vs. 3 vs. 4 JNDs). We found that while there was no main effect of similarity, F(2, 40) = 2.40, p = .104, η² = 0.11, there was a significant main effect of presentation condition, F(1.38, 27.67) = 76.59, p < .001, η² = 0.79 (Greenhouse-Geisser corrected), and a significant interaction, F(4, 80) = 2.97, p = .024, η² = 0.13. To disentangle the results, we ran two separate ANOVAs. First, we compared Simultaneous vs. S2 to see if both conditions differed in distortion magnitude and if there was a modulation of the effect strength by similarity. The ANOVA yielded main effects of similarity, F(1.42, 28.37) = 6.03, p = .012, η² = 0.23, and presentation condition, F(1, 20) = 6.52, p = .019, η² = 0.25, but no interaction, F(2, 40) = 0.21, p = .815, η² = 0.01. To test if there was a similarity modulation for S1, we conducted a one-factorial ANOVA with all three JND steps. There was no significant effect of similarity, F(2, 40) = 1.15, p = .328, η² = 0.05.

However, because there was no ISI between S1 and S2 in the sequential condition, the sudden S2 onset might have interfered with S1 consolidation. Furthermore, since both items were presented successively, the intervals between item presentation and cue differed (1.5 s for S1, 1.0 s for S2), potentially introducing time-based differences in memory precision. To rule out the possibility that the observed distortion effects were confounded by these factors, we ran two control analyses. First, to test whether the consolidation of S1 may have been interrupted, resulting in decreased S1 performance relative to S2, we compared the memory precision of S1 and S2 (see Fig. 2a) with a two-way ANOVA with the within-subject factors serial position (S1 vs. S2) and similarity (2 vs. 3 vs. 4 JNDs). Memory precision of S1 did not differ from S2, F(1, 20) = 0.62, p = .441, η² = 0.03. Furthermore, there was no main effect of similarity, F(2, 40) = 0.992, p = .380, η² = 0.05, and no interaction, F(2, 40) = 1.55, p = .224, η² = 0.07. The 500-ms item exposure was therefore most likely sufficient to fully consolidate S1. Second, to test for a possible impact of different delay durations, we compared memory precision in the set-size-1 conditions, where a single item was tested after long (1.5-s) versus short (1-s) delays. Memory precision was comparable for both delay periods (short delay: M = 22.24°, SD = 10.71°; long delay: M = 23.32°, SD = 10.02°), t(20) = −1.44, p = .167, d = −0.31 (paired t-test).
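A minimal sketch of the delay control analysis, assuming precision is computed per subject as the SD of angular recall errors (as in the figure caption) after wrapping errors into [−180°, 180°), and that the paired effect size is d_z = mean difference / SD of differences. With the reported t(20) = −1.44 and n = 21, t/√21 ≈ −0.31, which matches the reported d, so the reported value appears to be d_z; that is an inference, not something the text states. The per-subject values below are hypothetical.

```python
import math
import statistics

def wrap_deg(err):
    """Wrap an angular error (degrees) into [-180, 180)."""
    return (err + 180.0) % 360.0 - 180.0

def precision_sd(reported, true):
    """Recall precision as the SD of wrapped recall errors (degrees)."""
    errors = [wrap_deg(r - t) for r, t in zip(reported, true)]
    return statistics.stdev(errors)

def paired_t(a, b):
    """Paired t-test on per-subject values. Returns (t, df, d_z), where
    d_z = mean(diff) / SD(diff), which equals t / sqrt(n)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d, sd_d = statistics.fmean(diffs), statistics.stdev(diffs)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1, mean_d / sd_d

# Hypothetical per-subject precision values (SD of errors, degrees).
short_delay = [20.1, 25.3, 18.7, 30.2]
long_delay = [21.0, 26.1, 19.5, 30.0]
t, df, d_z = paired_t(short_delay, long_delay)
print(f"t({df}) = {t:.2f}, d_z = {d_z:.2f}")
```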

Recall precision across all three experiments, expressed as the standard deviation (SD) of the recall errors. Recall precision was comparable across serial positions in Experiment 1 (a) and Experiment 2 (b). A recency benefit (higher precision for the second direction of the sequence, S2) emerged in Experiment 3 (c). Sim = simultaneous presentation condition, S1 = first presented item in the sequential presentation condition, S2 = second presented item in the sequential presentation condition. Error bars depict between-subject standard errors of the mean.

### Discussion

Experiment 1 showed that when two items were presented simultaneously in the periphery, subjects reported the directions as repelled from each other. This is in line with the Direction Illusion literature that reports repulsion between two concurrent motion directions in the high-similarity range (e.g., Kim & Wilson, 1997).

For sequential encoding, the distortion effects diverged: subjects reported S1 as attracted by S2, and S2 as repulsed by S1. The proactive S2 repulsion showed a profile highly similar to that in the simultaneous encoding condition, suggesting interference similar to that between two concurrently viewed stimuli. Yet the overall magnitude of the repulsion effect was slightly higher in the simultaneous condition, which may result from the difference in presentation duration between the conditions. This result hence demonstrates that motion repulsion is not restricted to simultaneous perceptual stimulation but also occurs between successive stimuli in a proactive manner. In contrast, S1 was not repulsed but attracted by S2. The opposite distortion directions of S1 and S2 suggest that the distortions were based on different mechanisms, or arose between different representational states of the items.

Regarding the S1 attraction effect, the immediate succession of S1 and S2 may offer an explanation as well. To detect motion, it is necessary to integrate sensory samples across a prolonged time frame, as motion perception implies detecting a change in an object’s spatial position. Motion integration has been found to occur across a period of about 100 ms of signal sampling (e.g., Alais, Apthorp, Karmann, & Cass, 2011; Burr, 1980; Snowden & Braddick, 1991). Since motion perception relies on signal integration across time, the S1 attraction effect might stem from an accidental integration of early S2 motion signals into a not-yet-fully consolidated S1 representation. Due to the short presentation time, participants might have used the S1 offset to signal the end of S1 information sampling. Since S1 offset and S2 onset coincided, a small proportion of the initial S2 motion signals might have blended into the S1 motion calculation before the integration window closed and mental processing shifted from S1 to S2. Note that this framework suggests that the initial S2 motion perception is shifted towards the S1 direction. However, such a proactive signal merging would probably be corrected in the course of ongoing S2 signal sampling.
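A toy illustration of the proposed merging (not a model from the paper): treat the S1 estimate as a duration-weighted vector average of the S1 direction and a short stretch of leaked S2 signal inside the integration window. The 100-ms window echoes the integration estimates cited above; the 20-ms leak and the 0° and 30° directions are hypothetical values chosen only to show the direction of the shift.

```python
import math

def circular_weighted_mean(dirs_deg, weights):
    """Weighted circular mean of motion directions (degrees) via vector averaging."""
    x = sum(w * math.cos(math.radians(d)) for d, w in zip(dirs_deg, weights))
    y = sum(w * math.sin(math.radians(d)) for d, w in zip(dirs_deg, weights))
    return math.degrees(math.atan2(y, x))

# Hypothetical toy values: S1 moves at 0 deg, S2 at 30 deg.
s1_dir, s2_dir = 0.0, 30.0
s1_ms, leaked_s2_ms = 100.0, 20.0  # hypothetical durations inside the window
estimate = circular_weighted_mean([s1_dir, s2_dir], [s1_ms, leaked_s2_ms])
print(f"S1 estimate: {estimate:.1f} deg (shifted toward S2)")
```

The estimate lands between the two directions and much closer to S1, i.e. a small attraction towards S2, which is the qualitative pattern the account predicts.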

With regard to the JND modulation, we found an increase of the repulsion effect with increasing inter-item differences. This similarity modulation was comparable for simultaneously presented items and for S2 during sequential presentation. This result might seem counter-intuitive at first, because increased motion repulsion has been found for similar as compared to dissimilar items (e.g., Braddick et al., 2002; Kang & Choi, 2015; Kim & Wilson, 1997), with a peak repulsion effect at about 45° angular separation between two directions. However, those studies presented items across a wide range of similarity steps. In contrast, we used individual discrimination thresholds as the unit of our similarity manipulation, with a maximum difference of four JNDs. In fact, the largest angular deviation presented in our experiment was 49° (for the subject with the largest threshold in our sample). Therefore, all presented angular deviations were equal to or below this value. In other words, for most subjects, all direction differences in our experiment were on the increasing side of the repulsion profile observed in other studies (i.e., below 45° angular separation). Hence, finding an increase of effect magnitude with increasing dissimilarity can be attributed to the limited similarity range tested here.
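As a worked illustration of the range argument, assuming each tested direction difference is exactly k times the subject's individual JND (k = 2, 3, 4) and that the 49° maximum is the 4-JND step of the subject with the largest threshold:

$$
\Delta\theta_k = k \cdot \mathrm{JND}, \qquad k \in \{2, 3, 4\}
$$

$$
\Delta\theta_4 = 49^\circ \;\Rightarrow\; \mathrm{JND} = \tfrac{49^\circ}{4} = 12.25^\circ, \qquad \Delta\theta_2 = 24.5^\circ, \qquad \Delta\theta_3 = 36.75^\circ
$$

Under these assumptions, even for this subject only the 4-JND step exceeds the roughly 45° repulsion peak; every subject with a smaller JND stays entirely below it.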

## Introduction

Psychophysics and recordings of neuronal activity have long been used to study vision in monkeys, cats and humans. More recently, it has been shown that rodent visual circuits bear many similarities to those in these species. For instance, neurons in the mouse primary visual cortex have highly tuned receptive fields [1, 2], and mice can discriminate simple [3, 4] and complex [5] shapes. Thus, the mouse is currently emerging as an important and practical model system for studying the neuronal circuitry underlying visual discrimination, perceptual learning and decision-making [6].

In the mouse, however, visual discrimination can only be studied through learning, and learning, in turn, improves discrimination performance [3, 4]. Little consideration has been given to the question of how the interplay between visual discrimination and learning influences the development of discrimination capacity and conditioned response in freely moving mice. Experience-dependent improvements in discrimination performance have been reported in most, if not all, auditory [7], visual [8, 9] and olfactory [10] tasks. When a conditioned stimulus (CS+) is held constant, the learning rate increases with the discriminative value of the CS+, thereby making stimulus discriminability a powerful determinant of how perceptual learning transfers between analogous visual stimuli [8, 11]. However, the relationship between varying stimulus discriminability and learning remains poorly understood. This is highly relevant because open environments vary continuously and allow locomotion, which modifies the structure of sensory arrays [12], and little is known about how learning and discrimination deal with such variability. We reasoned that, if the discriminability of a reinforced stimulus varies continuously during learning, then the sign and slope of such stimulus variations should determine the learning rate and shift the discrimination threshold.

To study the interplay between visual discrimination and learning, we adapted a two-alternative forced-choice visual discrimination task [3, 4] and examined how positive, negative and oscillating gradients of stimulus similarity correlated with conditioned response and discrimination performance as a function of learning. During training, we presented the mice with a fixed reinforced image (i.e. conditioned stimulus, CS+) and multiple non-reinforced images (CS−) with different degrees of structural similarity to the CS+, measured by using parametric descriptions derived from image quality metrics. This allowed us to arrange equiprobable CS− stimuli into different training configurations of variable similarity. Introducing novel measures that allow the detection of successful discrimination of complex images, we found that the difficulty of comparable training conditions shaped the development of a well-defined visual conditioned response in freely moving mice [3, 4]. Our results reveal the rules that govern the interplay between discrimination and learning.

## Introduction

Preventing oneself from performing an action is a fundamental part of normal response control, and there are countless situations in which the most reflexive or pre-potent response must be inhibited in order to successfully achieve a goal or behave appropriately in a given context [1, 2]. Typically, the speed with which a participant is able to inhibit a pre-potent response is taken as a measure of response inhibition (RI). However, this can only be measured indirectly, since successful RI effectively eliminates an observable response. The stop-signal task (SST), as described by Logan & Cowan [1], has become a popular method for measuring the speed of RI. A typical SST comprises a large number of “go” trials, on which participants are cued to give a prescribed response, and a smaller proportion of “stop” trials, on which the cue to respond (the go signal) is followed after a short interval by a stop signal instructing them to withhold that response. Response time (RT) on go trials and the probability of successfully inhibiting the response on stop trials are both critical for measuring RI in the SST. The interval between the go and stop signals, the stop-signal delay (SSD), is manipulated to find the critical delay at which the probability of successfully inhibiting the response equals 0.5. The difference between the average RT on go trials and this critical SSD reflects the average time that a participant takes to successfully withhold the response (stop-signal reaction time, or SSRT). SSRT has proven to be a useful measure of individual differences in RI, with Sergeant [3] describing the SST as ‘the most direct measure of the processes required in inhibiting a response’ (p. 9); consequently, it is now a widely used research and assessment tool, particularly in relation to developmental disorders such as Attention Deficit Hyperactivity Disorder (ADHD).
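The SSRT definition above reduces to a single subtraction. A minimal sketch, assuming go RTs and the critical SSD are both in milliseconds measured from go-signal onset; the RT values and the 220-ms critical SSD are made up for illustration.

```python
import statistics

def ssrt_mean_method(go_rts_ms, critical_ssd_ms):
    """SSRT = mean go RT minus the SSD at which p(inhibit) = 0.5 (ms)."""
    return statistics.fmean(go_rts_ms) - critical_ssd_ms

go_rts = [412, 455, 398, 530, 470, 441]  # hypothetical go-trial RTs (ms)
ssrt = ssrt_mean_method(go_rts, critical_ssd_ms=220)
print(f"SSRT = {ssrt:.0f} ms")
```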

Two conventional methods of presenting the go and stop trials have been widely used in the SST. The original SST used a method of constant stimuli, where stop signals are presented at set intervals after the go signal and an inhibition function is produced based on the probability of inhibition at each stop-signal interval [14]. The other is a nonparametric adaptive staircase which involves a simple method of adjustment whereby the SSD is increased or decreased after each stop trial in a stepwise fashion according to the success or failure to inhibit on the previous stop trial [15]. This homes in on the point at which the probability of inhibiting is 0.5, the point at which the best estimate of SSRT is obtained. Both of these methods require large numbers of trials, although adaptive stepwise adjustment is relatively rapid in comparison to the method of constant stimuli and hence has been the preferred method.
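The stepwise adjustment can be sketched as a 1-up/1-down staircase: a successful stop lengthens the next SSD, a failed stop shortens it, so the track hovers around the SSD where p(inhibit) = 0.5. This is an illustration under stated assumptions, not the exact procedure of [15]: the start value, the 25-ms step, and the simulated participant (Gaussian go RTs with mean 450 ms and SD 60 ms, true SSRT 200 ms, hence a true critical SSD of 250 ms) are all hypothetical.

```python
import random

def staircase_ssd(n_stop_trials, inhibits, start_ssd=100, step=25):
    """1-up/1-down SSD tracking: lengthen the SSD after a successful stop
    (harder next time), shorten it after a failed stop (easier next time).
    Returns the SSD presented on each stop trial (ms)."""
    ssd = start_ssd
    presented = []
    for _ in range(n_stop_trials):
        presented.append(ssd)
        if inhibits(ssd):
            ssd += step
        else:
            ssd = max(0, ssd - step)
    return presented

# Hypothetical simulated participant: Gaussian go RTs, fixed SSRT.
rng = random.Random(1)
TRUE_SSRT, GO_MEAN, GO_SD = 200, 450, 60

def simulated_participant(ssd):
    """Stop succeeds if the stop process (SSD + SSRT) beats the go RT."""
    return ssd + TRUE_SSRT < rng.gauss(GO_MEAN, GO_SD)

ssds = staircase_ssd(200, simulated_participant)
critical_ssd = sum(ssds[100:]) / 100   # average SSD once tracking has settled
ssrt_estimate = GO_MEAN - critical_ssd  # mean go RT minus critical SSD
print(f"critical SSD ~ {critical_ssd:.0f} ms, SSRT ~ {ssrt_estimate:.0f} ms")
```

With real data the observed mean go RT would replace `GO_MEAN`. Discarding the first half of the track is one simple way to drop the approach phase from the start value.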

### Rapid threshold estimation using the Bayesian adaptive estimates technique

In psychophysics, a similar problem is well known to researchers requiring reliable estimates of psychophysical thresholds in situations where running large numbers of trials is impractical. Researchers have used Bayes’ theorem to develop several techniques to try to overcome the practical limitations of lengthy psychophysical procedures, thereby expanding the repertoire of analytic tools available for rapid threshold estimation [16, 17, 18]. Kontsevich and Tyler [18] outlined an adaptive estimation technique (the Ψ method) that takes advantage of Bayes’ rule and chooses test variables on the principle of minimizing entropy, the amount of information required to have complete knowledge of a system [19, 20], to find reliable threshold estimates over a relatively small number of trials. The technique calculates the prior probabilities of a successful response for each of several possible stimulus values that could be chosen to present to the observer, assuming that those probabilities are governed by an underlying psychometric function with a range of different possible parameters. The goal is to find the most likely combination of parameters (i.e. the best fitting psychometric function) within a defined parameter space. On each trial, the amount of information that could be gained from testing each stimulus value is calculated, and the stimulus value that stands to yield the most information is selected for the next trial. On the basis of the observer’s response, the posterior probabilities of each combination of parameters are then updated, ready to calculate entropy for selecting the next stimulus value. In this fashion, the Ψ staircase quickly finds a stable estimate of the most likely psychometric parameters. 
In our application to the SST, the successful response is actual inhibition of a response on stop trials, and the parameters being estimated describe the slope and threshold of the function that relates probability of successfully inhibiting, p(i), to the SSD. Thus the aim of the technique is to use the Ψ method to quickly and accurately estimate the critical SSD for which RI success and failure are equally likely.
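A compact sketch of the Ψ logic applied to SSDs: a grid posterior over threshold (critical SSD) and slope, a predicted p(inhibit) for every candidate SSD, selection of the SSD that minimises expected posterior entropy, and a Bayes update after each outcome. It is written from the general description above and is not the authors' implementation. The decreasing logistic form, the fixed lower and upper asymptote parameters (0.02 each), the grid ranges, and the simulated observer's true parameters are all assumptions. Because the two asymptote parameters are equal, p(inhibit) = 0.5 exactly at SSD = alpha.

```python
import numpy as np

GAMMA, LAM = 0.02, 0.02  # assumed floor and lapse rate (hypothetical)

def p_inhibit(ssd, alpha, beta):
    """Decreasing logistic in SSD: about 1 - LAM at short SSDs, GAMMA at long ones."""
    return GAMMA + (1 - GAMMA - LAM) / (1 + np.exp(beta * (ssd - alpha)))

alphas = np.linspace(50, 450, 81)           # candidate critical SSDs (ms)
betas = np.array([0.01, 0.02, 0.04, 0.08])  # candidate slopes (1/ms)
A, B = np.meshgrid(alphas, betas, indexing="ij")
ssd_options = np.arange(0, 501, 10)         # SSDs the task may present (ms)

# p_table[k, i, j] = p(inhibit | ssd_options[k], alphas[i], betas[j])
p_table = p_inhibit(ssd_options[:, None, None], A[None, :, :], B[None, :, :])

def entropy(p):
    """Shannon entropy (nats) of a probability array, ignoring zeros."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def psi_next_index(posterior):
    """Index of the SSD that minimises the expected entropy of the next posterior."""
    p_success = np.sum(p_table * posterior, axis=(1, 2))
    expected_h = []
    for k, ps in enumerate(p_success):
        post_s = p_table[k] * posterior / ps
        post_f = (1 - p_table[k]) * posterior / (1 - ps)
        expected_h.append(ps * entropy(post_s) + (1 - ps) * entropy(post_f))
    return int(np.argmin(expected_h))

def psi_update(posterior, k, inhibited):
    """Bayes' rule: weight the posterior by the likelihood of the outcome."""
    like = p_table[k] if inhibited else 1 - p_table[k]
    post = posterior * like
    return post / post.sum()

# Simulated observer with hypothetical true parameters (alpha=250 ms, beta=0.04).
rng = np.random.default_rng(0)
posterior = np.full(A.shape, 1.0 / A.size)  # uniform prior over the grid
for _ in range(40):
    k = psi_next_index(posterior)
    inhibited = rng.random() < p_inhibit(ssd_options[k], 250.0, 0.04)
    posterior = psi_update(posterior, k, inhibited)

alpha_hat = float(np.sum(posterior * A))  # posterior-mean critical SSD (ms)
print(f"estimated critical SSD ~ {alpha_hat:.0f} ms")
```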

This study compared and validated the Ψ method of interval adjustment against methods previously used with adults and children. The stepwise adjustment and Ψ methods were compared with functions derived from trials using the method of constant stimuli, a lengthier test procedure that is not practicable in many testing situations, but is nevertheless useful for further validation of the shorter estimation procedures.

The use of an adaptive estimation method may permit reliable estimation of the SSRT over a relatively small number of trials. Kontsevich and Tyler [18] demonstrated that reliable threshold estimates could be calculated with the Ψ method using as few as 30 trials on tasks that estimate psychophysical thresholds. However, this method was designed for and validated using two-alternative forced-choice psychophysical judgments made under carefully controlled conditions. It remains to be seen whether the same method will be useful under the response requirements of the stop signal task.

### Applying the Ψ method to the SST

We first tested the feasibility of using the Ψ staircase to estimate SSRT using simulations of response inhibition based on a simple horse-race model of RI [21]. The simulations followed a rationale and method similar to those used by Band, van der Molen, and Logan [22]. Go RTs were assumed to follow the convolution of a Gaussian and an exponential distribution (an ex-Gaussian distribution). The ex-Gaussian distribution mimics the positive skew of most RT distributions from choice response tasks, and several researchers have found that it yields a good fit to empirical data [23, 24, 25, 26]. This distribution is assumed to be the same for go and stop trials. In other words, the presence of the stop signal on a stop trial is assumed to have no impact on the speed of executing the go response, if indeed it is executed. Under a simple race model of response execution, on each stop trial, if execution of the inhibitory process finishes before execution of the primary response, then the response is successfully withheld. If execution of the primary response finishes before execution of the inhibitory process, then the response is made (a failure to inhibit).
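The race described above can be simulated in a few lines. A minimal sketch, assuming ex-Gaussian go RTs generated as a Gaussian draw plus an independent exponential draw, a fixed SSRT, and the same go-RT distribution on stop trials. The parameter values (mu = 400 ms, sigma = 50 ms, tau = 100 ms, SSRT = 200 ms) are hypothetical and are not those of Band et al. [22].

```python
import numpy as np

def ex_gaussian(rng, mu, sigma, tau, size):
    """Ex-Gaussian RTs (ms): Gaussian(mu, sigma) plus an independent exponential with mean tau."""
    return rng.normal(mu, sigma, size) + rng.exponential(tau, size)

def p_inhibit_race(ssd, ssrt, rng, n=20000, mu=400.0, sigma=50.0, tau=100.0):
    """Independent horse race: the stop process finishes at ssd + ssrt, and the
    response is withheld if that happens before the go process finishes."""
    go_rt = ex_gaussian(rng, mu, sigma, tau, n)
    return float(np.mean(ssd + ssrt < go_rt))

rng = np.random.default_rng(42)
ssds = [100, 200, 300, 400]
probs = [p_inhibit_race(ssd, ssrt=200.0, rng=rng) for ssd in ssds]
for ssd, p in zip(ssds, probs):
    print(f"SSD = {ssd} ms: p(inhibit) = {p:.3f}")
```

The simulated inhibition probabilities fall as the SSD grows, tracing the survival-type function that the Ψ staircase is meant to estimate.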

A more comprehensive and detailed description of the Ψ method can be found in Kontsevich and Tyler [18]. We will focus here on the particular details of its implementation in the current paradigm. To begin with, the probability of successfully inhibiting a response on a stop trial must be assumed to be a function of the SSD. This is effectively a survival function, where the probability of successfully inhibiting decreases monotonically with the duration of the SSD. Alternatively, the function can be described just as validly as a cumulative function of the time remaining before the designated trial timeout at the onset of the stop signal (i.e., the timeout minus the SSD). As the SSD approaches the time when the trial times out (timeout − SSD = 0), the probability of successful inhibition should be very low, equal to a baseline error rate reflecting the proportion of go trials on which a go response is not performed in the allotted time. When the SSD equals zero (timeout − SSD equals the maximum time allowed to make the response), the probability of successful inhibition should be very high, equal to one minus the proportion of trials on which the stop signal is completely ignored regardless of the SSD. The following equations provide two methods for deriving the probability of successful inhibition, p(i), as a function of the time remaining between the SSD and the end of the trial:
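As an illustration only, and not necessarily either of the authors' two equations, one common way to build a function with exactly the endpoints described above is to scale a cumulative distribution of the time remaining (timeout − SSD) between the two rates. This sketch assumes a Weibull CDF, whose value is 0 at zero time remaining, so p(i) equals the baseline rate there and approaches one minus the ignore rate as time remaining grows. The parameter names and values (`alpha`, `beta`, a 1000-ms timeout) are hypothetical.

```python
import math

def p_inhibit_weibull(time_left, alpha, beta, gamma, lam):
    """p(i) = gamma + (1 - gamma - lam) * WeibullCDF(time_left; alpha, beta).
    time_left = trial timeout - SSD (ms); gamma = baseline rate of go trials
    with no response in time; lam = rate of ignoring the stop signal."""
    if time_left <= 0:
        return gamma
    f = 1.0 - math.exp(-((time_left / alpha) ** beta))
    return gamma + (1.0 - gamma - lam) * f

# Hypothetical parameters for a 1000-ms timeout.
params = dict(alpha=600.0, beta=3.0, gamma=0.03, lam=0.05)
curve = [p_inhibit_weibull(t, **params) for t in range(0, 1001, 100)]
print([round(p, 3) for p in curve])
```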
