# Fit a psychometric function when the maximum is not 100% (not because of lapse)


I found that the standard form of the psychometric function assumes that the proportion correct lies between 0.5 and 1 across the range of stimulus levels. How do you fit a psychometric function when the subject's performance never reaches 1, and not because of lapses?

An example of how this situation can arise: suppose I want to use a 2AFC task to study how stimulus presentation time affects the visibility of a small target at a given location in the peripheral visual field. The target's features are fixed throughout the experiment. I expect that with longer presentation times (e.g., 1 ms vs. 200 ms) the subject will make more correct responses. However, because of the limits of peripheral vision, the proportion correct never reaches 100%, even when the stimulus is presented for a very long time. How can I fit a psychometric function (proportion correct vs. presentation time) to data like this?

Yes. When you fit your psychometric function, you try to maximize $$p(X|\theta)$$, with $$X$$ your observations and $$\theta$$ your parameters. When you assume that performance reaches 100% correct, you would typically take your psychometric function to be $$\Phi(s|\theta)$$, with $$s$$ your stimulus value and $$\Phi$$ a cumulative normal. More specifically, $$\Phi(s|\theta)=\int_{-\infty}^{s}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt$$. The integrand is the normal density, and integrating it is what makes the function cumulative. Here $$\theta$$ consists of the mean $$\mu$$ and the standard deviation $$\sigma$$.

To get a psychometric function that does not reach 100% correct, you only need to add a third parameter to $$\theta$$. It does not matter whether the observer's performance is limited by lapses or by some other factor, because the two are mathematically equivalent. You then fit your data with $$\Phi'(s|\theta)=\frac{\lambda}{2}+(1-\lambda)\Phi(s|\theta)$$, where $$\lambda$$ is your lapse rate and the lapses are assumed to be symmetrical. For a 2AFC task like the one in the question, the lower asymptote is instead fixed at the 0.5 guess rate. The equivalent form is then $$\Phi'(s|\theta)=\frac{1}{2}+\left(\frac{1}{2}-\lambda\right)\Phi(s|\theta)$$, whose maximum proportion correct is $$1-\lambda$$.

If you have a good reason to assume that the lapse rate differs near 0 and near 1, you could add a fourth parameter, so that $$\theta=(\mu,\sigma,\lambda_{low},\lambda_{high})$$.

Be careful when fitting with $$\lambda$$ unconstrained, because the fit can return values above 1 or below 0. Two approaches help. You can use a constrained algorithm, such as a quasi-Newton method with bounds (in MATLAB, fmincon). Better still, you can transform your parameters through a bounded function such as the cumulative normal itself. In other words, use $$\Phi'(s|\theta)=\frac{\Phi(\lambda_{low})}{2}+\left(1-\frac{\Phi(\lambda_{low})}{2}-\frac{\Phi(\lambda_{high})}{2}\right)\Phi(s|\theta)$$. For any real $$\lambda_{low}$$ and $$\lambda_{high}$$, its lower asymptote $$\Phi(\lambda_{low})/2$$ lies in $$(0, 0.5)$$ and its upper asymptote $$1-\Phi(\lambda_{high})/2$$ lies in $$(0.5, 1)$$.
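To make the 2AFC case concrete for the presentation-time example in the question, here is a minimal Python sketch that uses only the standard library. The counts are hypothetical and the grid ranges are arbitrary choices. A brute-force grid search stands in for a proper optimizer such as fmincon, and restricting the grid to 0 ≤ λ < 0.5 plays the role of the parameter constraint discussed above.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def psi_2afc(t, mu, sigma, lam):
    """P(correct) in 2AFC: floor at the 0.5 guess rate, ceiling at 1 - lam."""
    return 0.5 + (0.5 - lam) * norm_cdf((t - mu) / sigma)

def neg_log_lik(mu, sigma, lam, levels, n_correct, n_trials):
    """Binomial negative log-likelihood of the observed counts."""
    nll = 0.0
    for t, k, n in zip(levels, n_correct, n_trials):
        p = psi_2afc(t, mu, sigma, lam)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll

def fit_grid(levels, n_correct, n_trials):
    """Brute-force maximum-likelihood fit over a coarse (hypothetical) grid."""
    best_nll, best = math.inf, None
    for mu in range(0, 205, 5):            # threshold, ms
        for sigma in range(5, 105, 5):     # spread, ms
            for i in range(0, 50):         # lapse 0.00 .. 0.49
                lam = i / 100
                nll = neg_log_lik(mu, sigma, lam, levels, n_correct, n_trials)
                if nll < best_nll:
                    best_nll, best = nll, (mu, sigma, lam)
    return best

# Hypothetical data: presentation time (ms) vs. correct responses out of 100 trials.
levels = [10, 25, 50, 75, 100, 150, 200]
n_correct = [52, 54, 63, 74, 82, 85, 85]
n_trials = [100] * len(levels)

mu_hat, sigma_hat, lam_hat = fit_grid(levels, n_correct, n_trials)
print(f"threshold={mu_hat} ms, sigma={sigma_hat} ms, ceiling={1 - lam_hat:.2f}")
```

The fitted ceiling of roughly 0.85 captures the plateau directly. For real data you would replace the grid with a bounded optimizer and add confidence intervals (e.g., by bootstrapping).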

In fact, it is recommended to always allow for a lapse rate, because otherwise you may misestimate the slope of your psychometric function.

## Introduction

Dyslexia is a learning disability that affects between 5% and 17% of the population and poses a substantial economic and psychological burden for those affected 1,2,3 . Despite decades of research, it remains unclear why so many children without obvious intellectual, sensory, or circumstantial challenges find written word recognition so difficult.

One popular and persistent theory is that dyslexia arises from an underlying auditory processing deficit 4,5,6,7,8,9 . According to this theory, a low-level auditory processing deficit disrupts the formation of a child’s internal model of speech sounds (phonemes) during early language learning. Later, when young learners try to associate written letters (graphemes) with phonemes, they struggle because their internal representation of phonemes is compromised 10 .

In line with this hypothesis, many studies report group differences between dyslexics and typical reading control participants in auditory psychophysical tasks including amplitude modulation detection 11,12,13,14,15 , frequency modulation detection 16,17,18,19,20 , rise time discrimination, and duration discrimination 21,22,23,24,25 . Moreover, attributing dyslexia to an auditory deficit is appealing because one of the most effective predictors of reading difficulty is poor phonemic awareness—the ability to identify, segment and manipulate the phonemes within a spoken word 26,27 . If a child’s phoneme representation were abnormal due to auditory processing deficits, it could be a common factor underlying both poor performance in auditory phoneme awareness tests and difficulties learning to decode written words.

In one variant of this auditory hypothesis, dyslexia is thought to involve a deficit specifically in the processing of rapid modulations in sound, usually referred to as a “rapid temporal processing” deficit 5,28,29 . This idea has been controversial (see Rosen 30,31 for review), but it remains one of the most widely cited accounts of auditory deficits in dyslexia 7 . One source of controversy is that the rapid temporal processing deficit is far from universal. In Tallal’s 5 study, which first proposed a causal relationship between rapid temporal processing and reading ability, only 8 out of 20 dyslexic children showed a deficit on a temporal order judgment task. Similarly, in a more comprehensive study of 17 dyslexic adults, only 11 had impaired performance on an extensive battery of auditory psychophysical tasks 32 . Moreover, in that study, tasks requiring temporal cue sensitivity were not systematically more effective than non-temporal tasks at separating dyslexics from controls. Several studies have also shown that dyslexic participants have heightened sensitivity to allophonic speech contrasts (differences in pronunciation that do not change the identity of the speech sound) 33,34,35 . These include contrasts marked primarily by temporal differences in the stimuli, which contradicts the hypothesis that the dyslexic brain lacks access to temporal information. Furthermore, other studies have shown that dyslexics have a normal ability to resolve spectrotemporal modulations in noise. This suggests that apparent speech perception difficulties arise from higher-level aspects of processing rather than from the encoding of sounds 36,37,38 . More recently, the rapid temporal processing hypothesis has been reframed as a deficit in processing dynamic, but not necessarily rapid, aspects of speech 10,39,40 .

A second reason to question the link between auditory processing deficits and dyslexia is that the standard technique for data analysis in psychophysical experiments is prone to severe bias. Many phoneme categorization studies follow the same procedure. Participants identify auditory stimuli that vary along some continuum as belonging to one of two groups, a psychometric function is fit to the resulting data, and the slopes of these psychometric functions are compared between groups of dyslexic and control readers (for reviews, see Vandermosten et al. 55 ; Noordenbos and Serniclaes 35 ). In this approach, a steep slope indicates a clear boundary between phoneme categories. A shallow slope suggests less defined categories (i.e., fuzzy phonological representations), possibly due to poor sensitivity to the auditory cue(s) that mark the phonemic contrast. Unfortunately, most studies in the dyslexia literature fit the psychometric function using algorithms that fix its asymptotes at zero and one. This is equivalent to assuming that categorization performance is perfect at the extremes of the stimulus continuum (i.e., assuming a “lapse rate” of zero). The assumption is questionable for two reasons. First, there is evidence that dyslexics may be less consistent in categorizing stimuli across the full continuum 35 . Second, there is evidence that attention, working memory, or task difficulty, rather than stimulus properties, may underlie group differences between readers with dyslexia and control subjects.

The zero lapse rate assumption is particularly problematic because fixing the asymptotes at zero and one leads to strongly downward-biased slope estimates, even when true lapse rates are fairly small 56,57 . In other words, suppose a participant makes chance errors on categorization and these errors happen on trials near the ends of the continuum. The canonical psychometric fitting routine used in most previous studies will then underestimate the steepness of the category boundary. Thus, a tendency to make a larger number of random errors (due to inattention, memory, or task-related factors) will be wrongly attributed to an indistinct category boundary on the contrast under study. It is precisely participants with dyslexia who more frequently show attention or working memory deficits. Research purporting to show less distinct phoneme boundaries in readers with dyslexia may therefore reflect non-auditory, non-linguistic differences between study populations.
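The downward bias can be reproduced with a small noise-free example. The Python sketch below uses only the standard library, and all of its numbers are hypothetical. It generates expected identification counts for a listener with σ = 1 and a 10% symmetric lapse rate, using the λ/2 + (1 − λ)Φ form. It then finds the best-fitting σ with the lapse rate held at several fixed values. For simplicity, the category boundary μ is held at its true value of 0.

Fixing λ = 0, which corresponds to asymptotes at zero and one, returns a shallower function (larger σ). Overestimating λ has the opposite effect, which illustrates the slope–lapse trade-off.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def psi_lapse(x, mu, sigma, lam):
    """Identification function with symmetric lapse: asymptotes lam/2 and 1 - lam/2."""
    return lam / 2.0 + (1.0 - lam) * norm_cdf((x - mu) / sigma)

def neg_log_lik(sigma, lam, levels, k, n, mu=0.0):
    """Binomial negative log-likelihood (counts may be non-integer expected values)."""
    nll = 0.0
    for x, ki, ni in zip(levels, k, n):
        p = min(max(psi_lapse(x, mu, sigma, lam), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        nll -= ki * math.log(p) + (ni - ki) * math.log(1.0 - p)
    return nll

# Hypothetical noise-free data: true sigma = 1, true lapse rate = 10%.
TRUE_SIGMA, TRUE_LAM = 1.0, 0.10
levels = [-3, -2, -1, 0, 1, 2, 3]           # continuum steps
n = [40] * len(levels)                      # trials per step
k = [ni * psi_lapse(x, 0.0, TRUE_SIGMA, TRUE_LAM) for x, ni in zip(levels, n)]

sigma_grid = [i / 100 for i in range(10, 301)]  # 0.10 .. 3.00
best_sigma = {}
for lam in (0.0, 0.05, 0.10, 0.15):         # lam = 0.0 mimics asymptotes fixed at 0 and 1
    best_sigma[lam] = min(sigma_grid, key=lambda s: neg_log_lik(s, lam, levels, k, n))
    print(f"lapse fixed at {lam:.2f}: best sigma = {best_sigma[lam]:.2f}")
```

Because the data contain no sampling noise, the bias for λ = 0 comes entirely from the model misspecification, not from chance.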

Unfortunately, most studies in the dyslexia literature that use psychometric functions to model categorization performance appear to suffer from this bias. Most do not report their analysis methods in enough detail to be certain. Based on published plots and the lack of any mention of asymptote estimation, however, we infer that the bias is widespread (e.g., Reed 58 ; Manis et al. 59 ; Breier et al. 60 ; Chiappe, Chiappe and Siegel 61 ; Maassen et al. 62 ; Bogliotti et al. 34 ; Zhang et al. 63 ). A few studies report using software that in principle supports asymptote estimation during psychometric fitting, but do not report the parameters used in their analysis (e.g., Vandermosten et al. 43,55 ). In one case, researchers who re-fit their psychometric curves without data from the continuum endpoints found a reduced effect size. This prompted the authors to wonder whether any effect would remain if an unbiased estimator of slope were used 64 . Only a few studies have fit psychometric functions with free asymptotic parameters or investigated differences in lapse rates 35,65,66 . However, their methods did not constrain the lapse rate, which can lead to upward-biased slope estimates 67 . This happens because slope and lapse parameters “trade off” in the optimization space of sigmoidal function fits 68 .

In light of all these problems—the inconsistency of findings, confounding influences of experimental task design, and bias introduced by analysis choices—it is reasonable to wonder whether children with dyslexia really have any abnormality in phoneme categorization, once those factors are all controlled for. The present study addresses the relationship between reading ability and phoneme categorization ability, and in particular, whether children with dyslexia show a greater deficit on phoneme contrasts that rely on temporally varying (dynamic) cues, as opposed to static cues. This study avoids the aforementioned methodological problems by using multiple paradigms with different attentional and memory demands, and by analyzing categorization performance using Bayesian estimation of the psychometric function 67 . In this approach, the four parameters of a psychometric function—the threshold, slope, and two asymptote parameters—are assigned prior probability distributions, which formalize the experimenter’s assumptions about their likely values. By allowing the asymptote parameters to vary, the slope is estimated in a less biased way than in traditional modeling approaches with fixed or unconstrained asymptotes. However, because fitting routines trade off between asymptote parameters and the slope parameter of a logistic model in optimization space 68 , it can be difficult to estimate both accurately at the same time. To address this difficulty, we first performed cross-validation on the prior distribution of the asymptote parameters to determine the optimal model to fit the data.
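As a rough illustration of how priors on the asymptotes enter such a fit, the sketch below compares a maximum-likelihood grid estimate with a maximum a posteriori (MAP) grid estimate. Both use a four-parameter function, γ + (1 − γ − λ)Φ((x − μ)/σ). This is not the Bayesian procedure used in the study. The Beta(1, 20) priors, the grid ranges, and the data are all hypothetical, and a full Bayesian analysis would summarize the whole posterior rather than report only its mode.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_prior(v, b):
    """Log density of a Beta(1, b) prior (hypothetical choice), favouring v near 0."""
    return math.log(b) + (b - 1.0) * math.log(1.0 - v)

def fit(levels, k, n, prior_b=None):
    """Grid search over (mu, sigma, gamma, lam) for
    psi(x) = gamma + (1 - gamma - lam) * Phi((x - mu) / sigma).
    prior_b=None returns the maximum-likelihood estimate; otherwise the MAP
    estimate under independent Beta(1, prior_b) priors on gamma and lam."""
    mus = [i / 10 for i in range(-10, 11)]        # -1.0 .. 1.0
    sigmas = [i / 10 for i in range(5, 17)]       # 0.5 .. 1.6
    offsets = [i / 100 for i in range(0, 21, 2)]  # 0.00 .. 0.20
    best_score, best = -math.inf, None
    for mu in mus:
        for sigma in sigmas:
            cdfs = [norm_cdf((x - mu) / sigma) for x in levels]  # reused for all asymptotes
            for gamma in offsets:
                for lam in offsets:
                    score = 0.0
                    for c, ki, ni in zip(cdfs, k, n):
                        p = gamma + (1.0 - gamma - lam) * c
                        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
                        score += ki * math.log(p) + (ni - ki) * math.log(1.0 - p)
                    if prior_b is not None:
                        score += log_prior(gamma, prior_b) + log_prior(lam, prior_b)
                    if score > best_score:
                        best_score, best = score, (mu, sigma, gamma, lam)
    return best

# Hypothetical identification data: "category B" responses out of 20 per continuum step.
levels = [-3, -2, -1, 0, 1, 2, 3]
k = [0, 1, 3, 9, 16, 18, 18]
n = [20] * len(levels)

mle = fit(levels, k, n)
map_est = fit(levels, k, n, prior_b=20.0)  # prior strength chosen only for illustration
print("MLE (mu, sigma, gamma, lam):", mle)
print("MAP (mu, sigma, gamma, lam):", map_est)
```

Because each prior density decreases as γ or λ grows, the MAP estimate can never place more total weight on the asymptotes than the MLE does. Concretely, (1 − γ)(1 − λ) is at least as large at the MAP estimate. The prior width is exactly the kind of setting that a cross-validation step could be used to choose.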

This paper presents data from 44 children aged 8–12 years with a wide range of reading abilities. Our experimental task is based on the design of Vandermosten et al. 43,55 , which assessed categorization performance for two kinds of stimulus continua: those that differed based on a spectrotemporal cue (dynamic), and those that differed based on a purely spectral cue (static). In the original study, the authors concluded that children with dyslexia are specifically impaired at categorizing sounds (both speech and non-speech) that differ on the basis of dynamic cues. The dynamic and static stimuli in their study were equated for overall length, but the durations of the cues relevant for categorization were not. In the dynamic stimuli, the cue (a vowel formant transition) was available for 100 ms. In the static stimuli, the cue (the formant frequency of a steady-state vowel) was available for 350 ms. This raises the question of whether cue duration, rather than the dynamic nature of the cue, was the source of the apparent categorization impairment among participants with dyslexia. The present study avoids this confound by changing the “static cue” stimuli from steady-state vowels to fricative consonants (a /ʃa/–/sa/ continuum). The relevant cue duration is therefore 100 ms in both the static (/ʃa/–/sa/) and dynamic (…–/da/) stimulus continua.

Additionally, Vandermosten and colleagues used a test paradigm in which listeners heard three sounds and were asked to decide whether the third sound was more like the first or the second (an ABX design). Here we included both an ABX task and a single-stimulus categorization task, to see whether the memory and attention demands of the ABX paradigm may have played a role in previous findings. Our approach has three parts: (a) assessing categorical perception of speech continua with static and dynamic cues, (b) varying the cognitive demands of the psychophysical paradigm, and (c) empirically determining the optimal parameterization of the psychometric function with cross-validation. Through these, we aim to clarify the role of auditory processing deficits in dyslexia.

## Results

### Human studies.

Figs. 5 (mean) and 6 (SD) show the fitted psychometric function (μ̂, σ̂) and confidence-scaling (k̂) parameters for each of our four subjects for yaw rotation about an earth-vertical rotation axis. Parameter fits are plotted vs. the number of trials in increments of 5 trials, starting at the 15th trial. [To demonstrate raw performance for individual test sessions, Appendix B (see Fig. B1) presents the parameter fits for each of the six individual tests for each subject.] As described in Methods, all parameter estimates were determined using maximum likelihood methods.

Fig. 5. Summary of human psychometric parameter estimates as trial number increases. Each column represents fitted parameters for 1 subject. A–D: average fitted psychometric width parameter (σ̂). E–H: average fitted confidence-scaling factor (k̂). I–L: average fitted psychometric function bias (μ̂). Thick black curves show average psychometric parameter estimates calculated using conventional forced-choice analyses. Thick red curves show average parameter estimates determined by fitting confidence probability judgment data. Error bars (thin gray curves and thin red curves, respectively) represent SD of parameter estimates.

Fig. 6. SD of human psychometric parameter estimates as trial number increases. Each column represents fitted parameters for one subject, in the same order as Fig. 5. A–D: SD of the fitted psychometric width parameter (σ̂). E–H: SD of the fitted psychometric function bias (μ̂). Black curves show SD of psychometric parameter estimates calculated using conventional forced-choice analyses. Gray curves show SD of parameter estimates determined via our CSD model fit.

Consistent with previous studies utilizing adaptive procedures (e.g., Chaudhuri and Merfeld 2013; Garcia-Perez and Alcala-Quintana 2005), the conventional estimates of the width of the psychometric function (σ̂) took between 50 and 100 trials to stabilize (Fig. 5, A–D, black curves). More specifically, using these conventional psychometric methods, the estimated width parameter (σ̂) was significantly lower after 20 trials than after 100 trials (repeated-measures ANOVA, n = 4 subjects, P = 0.011).

In contrast, estimates of the width parameter (σ̂) using our confidence fit technique required fewer than 20 trials to reach stable levels (Fig. 5, A–D, red curves). Specifically, the width parameter (σ̂) estimated using confidence probability judgments was not significantly different after 20 trials than after 100 trials (repeated-measures ANOVA, n = 4 subjects, P = 0.251). Furthermore, the estimated width parameter after 20 trials using confidence probability judgments was not significantly different from the estimated width parameter after 100 trials using conventional psychometric fit methods (repeated-measures ANOVA, n = 4 subjects, P = 0.907).

The parameter estimates obtained using conventional psychometric fits (Fig. 6, black traces) were also more variable than the fits obtained using our CSD model (Fig. 6, gray traces). In fact, the precision of the psychometric width estimate using the confidence model was about the same after 20 trials (average SD of 0.124 across subjects) as the conventional psychometric fit estimate after 100 trials (0.129).

The estimates of the shift of the psychometric functions (μ̂) showed a qualitatively similar pattern: the estimates that utilized confidence reached stable levels a little sooner and were more precise than the estimates provided by the conventional analysis. We also note that three of our subjects seemed well calibrated (Fig. 5, E–G), with fitted confidence-scaling factors near 1. The remaining subject had a fitted confidence-scaling factor near 2 (Fig. 5H), suggesting substantial underconfidence.

### Simulations.

We also simulated tens of thousands of test sessions to test the confidence fit procedures more thoroughly. The simulations were designed to mimic the human studies. The obvious difference is that we defined the simulated psychometric [Ψ(x)] and confidence [χ(x)] functions ourselves, so we knew the true parameters and could quantify parameter fit accuracy. For all simulated data sets, we fit the conventional binary forced-choice data and compared these fits with the CSD fits. Histograms show fitted parameters after 20 (Fig. 7, A–C) and 100 (Fig. 7, D–F) trials for 10,000 simulations. After as few as 20 trials, the CSD fit parameters showed relatively tight distributions (Fig. 7, B and C), whereas the binary fit parameters showed ragged distributions (Fig. 7A). After 100 trials, the binary fit parameters showed relatively tight distributions (Fig. 7D) that resembled those found for the CSD fit parameters after 20 trials (Fig. 7, B and C). The CSD fit parameters after 100 trials (Fig. 7, E and F) showed higher precision (i.e., lower variance) than the binary fit parameters after 100 trials (Fig. 7D). (See Fig. B2 for similar histograms after 100 trials for the other 2 simulation data sets.)
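The precision comparison for the binary fits can be sketched with a small simulation. The Python below is an illustration built on several assumptions and is not the authors' code. It uses the Ψ(x) parameters quoted in the Fig. 7 caption (μ = 0.5, σ = 1). Stimuli are placed uniformly at random on [−3, 3] rather than by an adaptive procedure. μ is held at its true value, and only σ is fit by maximum likelihood, using a golden-section search. The sketch runs 200 sessions per trial count instead of 10,000 to keep it fast. It does not implement the CSD confidence model, whose likelihood is not given in this section.

```python
import math
import random
import statistics

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

MU, SIGMA = 0.5, 1.0  # simulated psychometric parameters (Fig. 7 caption)

def nll_sigma(sigma, xs, ys, mu=MU):
    """Negative log-likelihood of binary responses for a given width sigma."""
    nll = 0.0
    for x, y in zip(xs, ys):
        p = min(max(norm_cdf((x - mu) / sigma), 1e-12), 1.0 - 1e-12)
        nll -= math.log(p) if y else math.log(1.0 - p)
    return nll

def fit_sigma(xs, ys, lo=0.05, hi=10.0, iters=40):
    """Golden-section search for the maximum-likelihood sigma within [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = nll_sigma(c, xs, ys), nll_sigma(d, xs, ys)
    for _ in range(iters):
        if fc < fd:                 # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = nll_sigma(c, xs, ys)
        else:                       # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = nll_sigma(d, xs, ys)
    return (a + b) / 2.0

def simulate_session(n_trials, rng):
    """One hypothetical session: uniform random stimuli, binary responses from Psi(x)."""
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n_trials)]
    ys = [rng.random() < norm_cdf((x - MU) / SIGMA) for x in xs]
    return xs, ys

rng = random.Random(1)
sd_of_fits = {}
for n_trials in (20, 100):
    fits = [fit_sigma(*simulate_session(n_trials, rng)) for _ in range(200)]
    sd_of_fits[n_trials] = statistics.stdev(fits)
    print(f"{n_trials} trials: mean sigma_hat = {statistics.mean(fits):.3f}, "
          f"SD = {sd_of_fits[n_trials]:.3f}")
```

Sessions in which the 20 responses happen to be perfectly separated drive σ̂ to the search bound. This is one source of the ragged binary distributions at low trial counts.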

Fig. 7. Parameter estimates for 10,000 simulated experiments with 20 and 100 trials. From left to right, the columns represent the fitted psychometric width parameter (σ̂), the fitted confidence-scaling factor (k̂), and the fitted psychometric function bias (μ̂), as shown on the x-axis at bottom. A and D: conventional binary forced-choice parameter estimates. B and E: parameter estimates determined via our CSD model fit for a well-calibrated subject (k = 1). C and F: parameter estimates determined via our CSD model fit for an underconfident subject (k = 2). The solid black line shows the actual parameter value (i.e., μ = 0.5 or σ = 1), the solid gray line shows the mean of fitted parameters, and the dashed gray lines indicate the SD on each side of the mean.

Following the format used for the human data (Figs. 5 and 6), simulation parameter fits are plotted vs. the number of trials in increments of 5 trials, starting at the 15th trial. The black curves in Figs. 8 and 9 show the fitted psychometric function parameters for the binary forced-choice data. The red (Fig. 8) and gray (Fig. 9) curves show the fitted psychometric and confidence function parameters obtained with the CSD model. (For direct quantitative comparisons, Appendix B summarizes data from all simulations in tabular form.)

Fig. 8. Summary of simulation parameter estimates as trial number increases. As illustrated via insets, each column represents a different simulated combination of the confidence function (red solid curves) and the fitted confidence function (red dashed curves). A, E, and I: well-calibrated subject (k = 1) when both the confidence and confidence fit functions are cumulative Gaussians. B, F, and J: underconfident subject (k = 2) when both the confidence and confidence fit functions are cumulative Gaussians. C, G, and K: underconfident subject when the confidence function is linear, χ(x) = m(x − μ) + 0.5 = 0.1443x + 0.428, with added zero-mean uniform noise [U(−0.1, +0.1)], and the confidence fit function is a cumulative Gaussian. D, H, and L: underconfident subject with the same linear confidence function and added zero-mean uniform noise [U(−0.05, +0.05)], when the confidence fit function is linear, χ̂(x) = m̂(x − μ̂) + 0.5. A–D: fitted psychometric width parameter (σ̂). E–G: fitted confidence-scaling factor (k̂). H: fitted slope of the confidence function. I–L: fitted psychometric function bias (μ̂). Thick black curves show average conventional forced-choice parameter estimates, which are identical for all conditions. Thick red curves show average parameter estimates determined by fitting confidence probability judgments. Error bars (thin gray curves and thin red curves, respectively) represent SD of parameter estimates.

Fig. 9. SD of simulation parameter estimates as trial number increases. Each column represents the same conditions as Fig. 8. A–D: SD of the fitted psychometric width parameter (σ̂). E–H: SD of the fitted psychometric function bias (μ̂). Black curves show SD of conventional forced-choice parameter estimates, which are identical for all conditions. Gray curves show SD of parameter estimates determined via our CSD model fit.

The simulated data (Figs. 8, A, E, and I, and 9, A and E, and Tables B1–B3, row 2) show that the CSD model yielded fit parameters that accurately matched the simulated ones when the simulated subject's confidence was well calibrated (k = 1). Here “well calibrated” means that the subject's confidence matches the psychometric function, χ(x) = Ψ(x) = Φ(x; μ = 0.5, σ = 1). Even when the subject's confidence was not well calibrated (k = 2), the confidence fit parameters matched the three confidence function parameters well (Figs. 8, B, F, and J, and 9, B and F, and Tables B1–B3, row 3). Apart from the fitted confidence-scaling factor (k̂), which settles near 2 (Fig. 8F) instead of 1 (Fig. 8E), the average psychometric parameter estimates for an underconfident subject were nearly the same as for a well-calibrated subject. The fitted psychometric width parameter (σ̂) even had a lower SD for an underconfident subject than for a well-calibrated subject (see Appendix B).

To test robustness, we used the same Gaussian confidence fit model (Eq. 2) while simulating a confidence model that differed from it in two ways. First, we modeled the confidence function as a linear function (slope of 0.1443, i.e., σ = 2) instead of a cumulative Gaussian. Second, we added zero-mean uniform noise, U(−0.1, 0.1), to the simulated confidence response. Despite these differences, the confidence fits of these simulated data closely resembled the earlier confidence fits (Figs. 8, C, G, and K, and 9, C and G, and Tables B1–B3, row 4). The main difference is that parameter fit precision was lower than for the first two simulation sets described above, though still better than for the conventional fits. For example, despite the severe noise (−10% to +10%), the fit precision for the width parameter (σ̂) after 20 trials using confidence matched the fit precision after about 50 trials using conventional analyses.

Finally, to demonstrate the flexibility of the confidence fit technique, we model the same linear confidence function from the previous paragraph, but we now add less extreme zero-mean uniform noise levels U(−0.05,0.05) and fit a linear confidence function that mimics the linearity of the true confidence function used for these simulations. The fit accuracy and precision were very good (Figs. 8, D, H, and L, and 9, D and H, and Tables B1–B3, row 5), demonstrating that the fitted psychometric function and confidence function need not be similar in form. (For some conventional confidence metrics, including goodness of fit parameters, see Table B4.)
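The numbers quoted for the linear confidence function can be checked directly. A uniform distribution with standard deviation σ has a cumulative distribution whose slope is 1/(2√3σ), which gives 0.1443 for σ = 2. We assume this is how the slope was chosen, since the text says only “i.e., σ = 2”. The sketch also includes a cumulative-Gaussian confidence function with scaling factor k, written as Φ(x; μ, kσ). That parameterization is our reading of how k is described (k = 1 well calibrated, k = 2 underconfident), not a formula given here. Clipping noisy confidence to [0, 1] is likewise an assumption.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

MU, SIGMA = 0.5, 1.0  # simulated psychometric parameters (Fig. 7 caption)

def chi_gaussian(x, k, mu=MU, sigma=SIGMA):
    """Assumed cumulative-Gaussian confidence function; k = 1 reproduces Psi(x)."""
    return norm_cdf((x - mu) / (k * sigma))

def uniform_cdf_slope(sd):
    """Slope of the CDF of a uniform distribution with standard deviation sd."""
    return 1.0 / (2.0 * math.sqrt(3.0) * sd)

def chi_linear(x, m, mu=MU, noise=0.0, rng=None):
    """Linear confidence m*(x - mu) + 0.5 plus optional U(-noise, +noise), clipped to [0, 1]."""
    c = m * (x - mu) + 0.5
    if rng is not None and noise > 0.0:
        c += rng.uniform(-noise, noise)
    return min(max(c, 0.0), 1.0)

m = uniform_cdf_slope(2.0)   # sigma = 2 for the underconfident linear case
intercept = 0.5 - m * MU     # chi(x) = m*x + intercept
print(f"chi(x) = {m:.4f}x + {intercept:.3f}")  # chi(x) = 0.1443x + 0.428

# Noisy linear confidence responses with the severe U(-0.1, +0.1) noise level.
rng = random.Random(0)
noisy = [chi_linear(x / 4.0, m, noise=0.1, rng=rng) for x in range(-40, 41)]
```

Passing `noise=0.05` reproduces the milder noise level used for the linear-fit simulation.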

Consistent with our reasoning, we found that tDCS did not affect sensitivity in the disparity and texture cue conditions (disparity anodal: t(11) = 1.32, P = 0.21; disparity cathodal: t(11) = 0.58, P = 0.58; texture anodal: t(11) = 0.63, P = 0.54; texture cathodal: t(11) = 1.08, P = 0.30; Fig. 5a). In the sham condition, we observed the expected behavioural benefit of cue combination: performance in the congruent-cue condition was significantly higher than in the single-cue conditions (disparity: t(11) = 7.57, P = 1.1 × 10⁻⁵, Cohen's d = 2.18; texture: t(11) = 5.67, P = 1.4 × 10⁻⁴, Cohen's d = 1.63).


Importantly, however, tDCS reduced the benefit of combining cues in the congruent and incongruent conditions. In particular, cathodal tDCS reduced sensitivity to congruent and incongruent cues (congruent: t(11) = 5.17, P = 3.0 × 10⁻⁴, Cohen's d = 1.49; incongruent: t(11) = 2.58, P = 0.02, Cohen's d = 0.74). The lower performance under anodal stimulation was not statistically significant (congruent: t(11) = 1.33, P = 0.21; incongruent: t(11) = 0.56, P = 0.61; Fig. 5a, b). This indicates that perturbing cortical excitability around V3B/KO disrupted observers' ability to integrate disparity and texture cues, while leaving sensitivity to the individual cues unaffected.

tDCS modulates the excitability of neural tissue both during stimulation (online effects) and after stimulation offset (offline effects) 35 . Although the modulatory effects of online and offline tDCS are similar, pharmacological work indicates that the neurophysiological mechanisms producing this modulation may differ 40 . We therefore tested whether online tDCS also affects cue integration. We repeated the experiment with a new set of participants, this time measuring behavioural performance during stimulation. We found the same pattern of results: sensitivity to combined congruent/incongruent cues was significantly reduced during cathodal tDCS (congruent: t(11) = 2.69, P = 0.02, Cohen's d = 0.78; incongruent: t(11) = 2.98, P = 0.01, Cohen's d = 0.86; Fig. 5c, d). These results show that both online and offline tDCS affect cue integration, and they replicate the main tDCS effects in a second cohort of participants.

To assess whether tDCS affected observers' bias, we analysed differences in the point of subjective equality between stimulation conditions. We found marginally significant effects (repeated-measures analysis of variance (RM ANOVA), offline: F(2,22) = 3.22, P = 0.06; online: F(2,22) = 2.72, P = 0.09). However, the differences were small and went in opposite directions for online and offline stimulation. Moreover, the largest difference between online and offline stimulation was between the sham conditions, which provide the control baseline (Supplementary Fig. 4). We therefore interpret these results as chance findings.

## References

1. Fechner G. Elements of Psychophysics. Holt, Rinehart & Winston; 1860/1966.

2. Gescheider GA. Psychophysics: The Fundamentals. Mahwah, NJ: Lawrence Erlbaum Associates; 1997.

3. Peirce JW. PsychoPy – Psychophysics software in Python. Journal of Neuroscience Methods. 2007;162(1–2):8–13.

4. Peirce JW. Generating stimuli for neuroscience using PsychoPy. Frontiers in Neuroinformatics. 2009;2:10. doi:10.3389/neuro.11.010.2008

5. Prins N, Kingdom FAA. Palamedes: MATLAB routines for analyzing psychophysical data. 2009. http://www.palamedestoolbox.org.

6. Wichmann FA, Hill NJ. The psychometric function: I. Fitting, sampling, and goodness of fit. Perception and Psychophysics. 2001a;63:1293–1313.

7. Wichmann FA, Hill NJ. The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception and Psychophysics. 2001b;63:1314–1329.


## Methods

### Participants

Observers were recruited from the University of Cambridge, had normal or corrected-to-normal vision, and were screened for stereo deficits. A priori sample sizes were established using effect sizes from previous MRS 14 and tDCS 35 studies to achieve 90% power. Twenty observers participated in the MRS experiment, but two were excluded from the analysis: one withdrew mid-scan, and a hardware fault stopped acquisition mid-scan for the other. Eighteen subjects (15 male; 17 right-handed; aged 25.1 ± 3.1 years) completed MRS for V3B/KO and M1, of whom 15 also returned for the (control) V1 scan. Twelve observers participated in each of the 5 tDCS experiments, for a total of 34 different observers (19 male; 31 right-handed; aged 24 ± 3.6 years). Experiments were approved by the University of Cambridge ethics committee, and all observers provided written informed consent.

### Apparatus and stimuli

Stimuli were generated in MATLAB (The MathWorks, Inc., Natick, MA) using Psychophysics Toolbox extensions 55,56 . Binocular presentation was achieved using a pair of Samsung 2233RZ LCD monitors (120 Hz, 1680 × 1050) viewed through mirrors in a Wheatstone stereoscope configuration. The viewing distance was 50 cm, and participants’ head position was stabilized using an eye mask, head rest and chin rest. Eye movements were recorded binocularly at 1 kHz using an EyeLink 1000 (SR Research Ltd, Ontario, Canada).

Stimuli were virtual planes slanted about the horizontal axis (Fig. 3a). Two cues to slant were independently manipulated: texture and disparity. The texture cue was generated by Voronoi tessellation of a regular grid of points (1° ± 0.1° point spacing) randomly jittered in two dimensions by up to 0.3° 21,57 . Each texture patch had on average 64 texture elements (textels), although the actual number of textels varied between trials depending on their size. Each textel was randomly assigned a grey level and shrunk about its centroid by 20%, creating the appearance of ‘cracks’ between textels. The width of these cracks also varied as a function of surface slant, providing additional texture information. Texture surfaces were mapped onto a vertical virtual surface and rotated about the horizontal axis by the specific texture-defined angle, before a perspective projection consistent with the physical viewing geometry was applied. To isolate the disparity cue, a random-dot stimulus was generated using the same parameters as the texture stimuli, i.e., an average of 64 dots with randomized grey level assignment. In the single-cue disparity and two-cue conditions, binocular disparity was calculated from the cyclopean view and applied to each vertex/dot based on the specific disparity-defined slant angle.

Surfaces were presented unilaterally (80% left and 20% right of fixation) inside a half-circle aperture (radius 6°) with a cosine edge profile to blur the appearance of depth edges. Stimuli were presented on a mid-grey background, surrounded by a grid of black and white squares (75% density) designed to provide an unambiguous background reference. In the stereoscopic conditions, observers could in principle discriminate surface slant based only on the difference in depth at the top/bottom of a pair of stimuli. Similarly, in the texture-only condition, observers could base their judgements on the difference in textel density at the top/bottom of a pair of stimuli. To minimize the availability of these cues, disparity-defined position was randomized by shifting the surface relative to the fixation plane (0° disparity) by up to ± 10% of the total surface depth. Texture-defined position in depth (which corresponded to average textel size) was randomized for each stimulus presentation by changing the point spacing in the initial grid of points by up to ± 10% 21 .

We presented four cue conditions: 2× single-cue (texture and disparity) and 2× two-cue conditions (congruent and incongruent). Stimuli in the single-cue texture condition were presented monocularly (right eye), whereas all other stimuli were presented binocularly.

### Procedure

Observers performed a two-interval forced-choice discrimination task in which the reference and test stimuli were presented in random order (Fig. 3b). Each stimulus was presented for 500 ms with an inter-stimulus interval of 300 ms. Following the offset of the second stimulus, the fixation cross changed from white to black, prompting observers to indicate with a keypress which stimulus was more slanted. No response time limit was enforced, but observers were encouraged to respond quickly. Following a response, the fixation cross changed back to white and a fixation period of 500 ms preceded the onset of the next trial. A method of constant stimuli was used to control the difference in slant between the reference and test stimuli. The MATLAB toolbox Psignifit (ref. 58; http://psignifit.sourceforge.net/) was used to fit psychometric functions to the data. Sensitivity to slant was derived from the slope of the psychometric function, and the point of subjective equality (PSE) from its threshold.
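The fitting step can be illustrated with a small maximum-likelihood fit of a cumulative Gaussian. This is a stand-in, not Psignifit, which has its own fitting procedures and options. The sketch below searches a grid of (μ, σ) values, assumes a fixed, symmetric lapse rate of 2% and uses hypothetical function names. The PSE corresponds to μ, and 1/σ is one common way to express sensitivity (slope).

```python
import math


def cum_gauss(x, mu, sigma, lapse=0.0):
    """P(test judged more slanted); lapses split evenly between responses."""
    phi = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return lapse / 2.0 + (1.0 - lapse) * phi


def neg_log_likelihood(levels, n_yes, n_total, mu, sigma, lapse):
    """Binomial negative log-likelihood of the observed response counts."""
    nll = 0.0
    for x, k, n in zip(levels, n_yes, n_total):
        p = min(max(cum_gauss(x, mu, sigma, lapse), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll


def fit_psychometric(levels, n_yes, n_total, lapse=0.02, n_grid=121):
    """Grid-search maximum-likelihood estimate of (mu, sigma).

    mu is the PSE (the 50% point when lapses are symmetric); 1/sigma indexes
    sensitivity. The lapse rate is fixed here, not fitted.
    """
    lo, hi = min(levels), max(levels)
    span = hi - lo
    mus = [lo + span * i / (n_grid - 1) for i in range(n_grid)]
    sigmas = [span * (0.01 + 0.99 * i / (n_grid - 1)) for i in range(n_grid)]
    _, mu, sigma = min(
        (neg_log_likelihood(levels, n_yes, n_total, m, s, lapse), m, s)
        for m in mus
        for s in sigmas
    )
    return mu, sigma
```

For example, `fit_psychometric(offsets, n_more_slanted, n_trials)` returns (μ, σ) for one condition, where `offsets` are the test-minus-reference slants.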

In the congruent-cue condition, reference stimuli had consistent texture and disparity slant ($S_\delta = S_\chi = 40^\circ$). We chose this slant angle because observers' sensitivity to disparity and texture cues was similar at this angle (at larger angles, observers become relatively more sensitive to the texture cue; ref. 59). Ensuring similar cue reliabilities (i.e., a 1:1 reliability ratio) gave us the greatest potential to detect the improved performance associated with combination. Specifically, the maximum possible benefit of combining independent cues is a factor of √2 when the two cues have equal reliability; this benefit is smaller when the two cues differ in reliability.
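The √2 ceiling follows from the quadratic-sum prediction for fusing two independent cues. Under that prediction, the combined sensitivity (1/σ) is the square root of the sum of the squared single-cue sensitivities. The sketch below uses hypothetical function names and expresses the ratio in sensitivity (1/σ) units.

```python
import math


def fused_sensitivity(s_a, s_b):
    """Quadratic-sum prediction for two independent cues (sensitivity = 1/sigma)."""
    return math.hypot(s_a, s_b)


def fusion_benefit(sensitivity_ratio):
    """Maximum gain of the fused estimate over the better cue.

    sensitivity_ratio = better-cue sensitivity / worse-cue sensitivity (>= 1).
    """
    best, other = 1.0, 1.0 / sensitivity_ratio
    return fused_sensitivity(best, other) / best
```

`fusion_benefit(1.0)` gives √2 ≈ 1.414, whereas `fusion_benefit(2.0)` gives √1.25 ≈ 1.118. A 2:1 imbalance therefore cuts the maximum gain from about 41% to about 12%. If a ratio is quoted in reliability (1/σ²) units instead, take its square root before passing it in.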

As we were testing the robustness of observers’ perception, we designed the stimulus in the incongruent-cue condition such that one cue was more reliable than the other. To achieve this, we took advantage of the fact that sensitivity to texture-defined slant increases with slant angle (ref. 59). This allowed us to manipulate cue reliability without changing aspects of the stimuli other than slant (i.e., we did not need to add noise or manipulate contrast, which might complicate comparisons between conditions). Specifically, for the incongruent condition, we combined a smaller disparity slant ($S_\delta = 20^\circ$) with a larger texture slant ($S_\chi = 50^\circ$), yielding a stimulus whose component cues differed in reliability (approximately a 2:1 ratio). We chose a 2:1 reliability ratio for the incongruent case because it (i) could be achieved while holding all stimulus parameters other than slant angle constant between congruent and incongruent conditions, and (ii) was predicted by the model to produce robust behaviour. In addition to the combined conditions, single-cue conditions were included for each of the slant angles used in the combined stimuli (i.e., $S_\delta \in \{40^\circ, 20^\circ\}$, $S_\chi \in \{40^\circ, 50^\circ\}$). We also included a test stimulus with 0° texture and disparity slant. This was intended to be easily discriminable from the reference stimuli and thus to provide a general measure of psychophysical performance by capturing the observers' lapse rate. In addition, six trials with reference stimuli selected at random were presented at the start of each block to refresh observers’ familiarity with the task. Observers were regularly prompted to maintain fixation throughout the experiment.

In the congruent- and single-cue conditions, the test stimuli were defined by congruent and single cues, respectively, within a range of ±20° around the reference stimulus (40°) in eight evenly spaced steps, i.e., ±[20.0, 14.3, 8.5, 2.8]. For the incongruent-cue condition, the test stimuli were defined by congruent cues within a range of ±25° around the midpoint between the slants defined by the incongruent cues of the reference stimulus (35°), in eight evenly spaced steps, i.e., ±[25.0, 17.8, 10.7, 3.6]. For participants who showed high precision in the incongruent condition during the initial familiarization stage, this range was reduced to ±14° to assess their sensitivity more closely. Because the incongruent reference stimuli were compared against congruent-cue test stimuli, the PSE in the incongruent condition expresses the perceived slant of the incongruent stimulus in terms of congruent stimuli.
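The eight test levels are evenly spaced across the stated range, so the step is (2 × half-range)/7. The short sketch below reproduces them; its function names are illustrative. The exact values are ±20.0, ±14.29, ±8.57 and ±2.86 for ±20°, and ±25.0, ±17.86, ±10.71 and ±3.57 for ±25°. The one-decimal values quoted above are therefore partly truncated rather than rounded.

```python
def offset_levels(half_range, n_steps=8):
    """Evenly spaced test-minus-reference offsets spanning +/- half_range."""
    step = 2.0 * half_range / (n_steps - 1)
    return [-half_range + k * step for k in range(n_steps)]


def slant_levels(reference, half_range, n_steps=8):
    """Absolute test slants (degrees) around a reference slant."""
    return [reference + d for d in offset_levels(half_range, n_steps)]
```

For example, `slant_levels(35, 25)` gives the congruent test slants around the 35° midpoint used for the incongruent condition.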

Before brain imaging/stimulation experiments, participants performed a familiarization session in the laboratory. This was used to introduce participants to viewing the stimuli in the stereoscope and ensure they could perform the slant discrimination task.

For the MRS experiment, participants took part in two further sessions. One session was used to acquire MRS measurements inside the MRI scanner while the participants were at rest (i.e., no active task was performed). The other session measured psychophysical performance on the slant discrimination task under the different experimental conditions. The two sessions were separated by 24–48 h and the order of sessions was counterbalanced across participants. For each condition, observers underwent two blocks of 214 trials. Condition order was randomized.

For the tDCS experiments, participants took part in three experimental sessions (sham, anodal or cathodal). Sessions were separated by at least 36 h, and session order was counterbalanced across participants. During the initial familiarization session, reference stimuli for the ipsilateral control trials were drawn at random from the pool of reference slants used in the main experiment. During stimulation sessions, the control reference slant was set to the slant that each observer could discriminate with 80% accuracy. The eye tracker was calibrated immediately before the onset of each block in tDCS sessions. Condition order was counterbalanced across stimulation sessions and subjects.

### Magnetic resonance spectroscopy

Magnetic resonance scanning was conducted on a 3T Siemens Prisma equipped with a 32-channel head coil. Anatomical T1-weighted images were acquired for spectroscopic voxel placement with an ‘MP-RAGE’ sequence. For detection of GABA, spectra were acquired using a macromolecule-suppressed MEGA-PRESS sequence: echo time = 68 ms; repetition time = 3000 ms; 256 transients of 2048 data points acquired in 13 min of experiment time; a 14.28 ms Gaussian editing pulse applied at 1.9 (ON) and 7.5 (OFF) p.p.m.; and 16 water-unsuppressed transients. Water suppression was achieved using variable power with optimized relaxation delays and outer volume suppression. Automated shimming followed by manual shimming was conducted to achieve a water linewidth of approximately 12 Hz.

Spectra were acquired from three locations: a target voxel (V3B/KO) and two control voxels (V1 and M1), each 30 × 30 × 20 mm (Supplementary Figure 8a). The V3B/KO voxel was positioned in the right hemisphere, adjacent to the midline, and rotated in the sagittal and axial planes so as to align with the posterior surface of the brain, while preventing protrusion from the occipital lobe and limiting inclusion of the ventricles. The V1 voxel was placed medially in the occipital lobe, with its lower face aligned with the cerebellar tentorium, and positioned so as to avoid the sagittal sinus and to ensure it remained within the occipital lobe. The M1 voxel was defined in the axial plane as being centred on the ‘hand knob’ area of the precentral gyrus and aligned to the upper surface of the brain in the sagittal and coronal planes. These locations are commonly used to define corresponding target and control voxels in studies linking GABA to cognitive processes (refs. 15,60).

Spectral quantification was conducted with GANNET 2.0 (ref. 61; Baltimore, MD, USA), a MATLAB toolbox designed for analysis of GABA MEGA-PRESS spectra, modified to fit a double-Gaussian to the GABA peak. Individual spectra were frequency- and phase-corrected before the ‘OFF’ spectra were subtracted from the ‘ON’ spectra, resulting in the edited spectrum (Supplementary Figure 8b). The edited GABA peak was modelled with a double-Gaussian (Supplementary Figure 8c), and GABA was quantified relative to water (GABA/H2O, with the water peak modelled as a mixed Gaussian–Lorentzian) in institutional units. The fitting residuals for water and GABA were divided by the amplitudes of their fitted peaks to produce normalized measures of uncertainty. These were combined in quadrature to produce a combined measure of uncertainty for each measurement (refs. 62,63). This combined fitting residual was relatively low across all participants and voxel locations, ranging from 3.8% to 9.4% (mean: 6.6% ± 0.2%).
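The combined uncertainty measure reduces to a one-line calculation. This sketch follows the description above, normalising each residual by its own peak amplitude and combining the two relative errors in quadrature. It is not taken from the GANNET source, and the function name is hypothetical.

```python
import math


def combined_fit_error(gaba_resid, gaba_amp, water_resid, water_amp):
    """Combined relative fitting uncertainty (%) for one GABA/H2O measurement.

    Each residual is normalised by its fitted peak amplitude, and the two
    relative errors are added in quadrature (square root of sum of squares).
    """
    return 100.0 * math.hypot(gaba_resid / gaba_amp, water_resid / water_amp)
```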

To ensure that variation in GABA concentrations between subjects was not due to differences in overall structural composition within the spectroscopy voxels, we segmented voxel content into GM, WM and cerebrospinal fluid (CSF). This segmentation was then used to apply a CSF correction (ref. 64) to the GABA/H2O measurements with the following equation:

$$C_{\mathrm{corr}} = \frac{C_{\mathrm{uncorr}}}{f_{\mathrm{GM}} + f_{\mathrm{WM}}}$$

where $C_{\mathrm{corr}}$ and $C_{\mathrm{uncorr}}$ are the CSF-corrected and -uncorrected GABA concentrations, respectively, and $f_{\mathrm{GM}}$ and $f_{\mathrm{WM}}$ are the proportions of GM and WM within the voxel. Segmentation was performed using the Statistical Parametric Mapping toolbox for MATLAB (SPM12, http://www.fil.ion.ucl.ac.uk/spm/). The DICOM of the voxel location was used as a mask to calculate the volume of each tissue type (GM, WM and CSF) for both visual and sensorimotor voxels.
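Applied per voxel, a CSF correction of this kind is a single division by the tissue fraction. The sketch assumes the form $C_{\mathrm{corr}} = C_{\mathrm{uncorr}}/(f_{\mathrm{GM}} + f_{\mathrm{WM}})$, i.e. that CSF contributes no GABA signal. The function names are hypothetical, and the tissue fractions are taken from segmented volumes inside the voxel mask.

```python
def tissue_fractions(v_gm, v_wm, v_csf):
    """Fraction of each tissue type from segmented volumes within the voxel mask."""
    total = v_gm + v_wm + v_csf
    return v_gm / total, v_wm / total, v_csf / total


def csf_correct(c_uncorr, f_gm, f_wm):
    """CSF-corrected concentration, assuming CSF contributes no GABA.

    f_gm and f_wm are the grey- and white-matter fractions of the voxel.
    """
    tissue = f_gm + f_wm
    if not 0.0 < tissue <= 1.0:
        raise ValueError("f_gm + f_wm must lie in (0, 1]")
    return c_uncorr / tissue
```

For a voxel that is 20% CSF (f_GM + f_WM = 0.8), a measured value of 2.0 becomes 2.5.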

### Transcranial direct current stimulation

Direct current stimulation was applied using a pair of conductive rubber electrodes (3 × 3 cm stimulating electrode, 5 × 5 cm reference electrode) held in saline-soaked synthetic sponges and delivered by a battery-driven constant current stimulator (neuroConn, Ilmenau, Germany). For seven participants, functional anatomical scans were used to identify areas V3B/KO in the right hemisphere and then neuronavigational equipment (Brainsight 2, Montreal, Canada) was used to locate the closest point to the centre of mass of this region on subjects’ scalp (Supplementary Figure 9a). The visual cortex electrode was then placed at this location. For the remaining subjects, the average location of this point, relative to positions of the international 10–20 electroencephalography system, was used to place the visual cortex electrode. For all subjects, the reference electrode was placed at position Cz. In the anodal and cathodal conditions, the tDCS current (1 mA) ramped up and down (20 s) before and after continuous application for 20 min. In the sham condition, the current was ramped up then immediately ramped down.
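The stimulation time course can be summarized as a piecewise-linear current profile. The 1 mA peak, the 20 s ramps and the 20 min plateau come from the text. The linear ramp shape and the sign convention (positive for anodal, negative for cathodal, relative to the visual-cortex electrode) are assumptions made for illustration, and the function name is hypothetical.

```python
def tdcs_current(t, condition, peak_ma=1.0, ramp_s=20.0, hold_s=20 * 60.0):
    """Stimulator current (mA) at time t (s) after onset.

    Linear ramps are assumed. Sign convention (illustrative): positive for
    anodal, negative for cathodal, relative to the visual-cortex electrode.
    """
    if t < 0:
        return 0.0
    if condition == "sham":
        # Ramp up, then immediately ramp back down: no plateau.
        if t > 2 * ramp_s:
            return 0.0
        level = t / ramp_s if t <= ramp_s else (2 * ramp_s - t) / ramp_s
        return peak_ma * level
    if condition not in ("anodal", "cathodal"):
        raise ValueError(f"unknown condition: {condition}")
    sign = 1.0 if condition == "anodal" else -1.0
    end = 2 * ramp_s + hold_s
    if t > end:
        return 0.0
    if t < ramp_s:
        level = t / ramp_s  # ramp up
    elif t <= ramp_s + hold_s:
        level = 1.0  # 20-min plateau
    else:
        level = (end - t) / ramp_s  # ramp down
    return sign * peak_ma * level
```

For example, `tdcs_current(10, "anodal")` is 0.5 mA, halfway up the initial ramp. The sham profile returns to 0 mA after 40 s.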

For participants with V3B/KO anatomically localized, FreeSurfer (https://surfer.nmr.mgh.harvard.edu) was used to reconstruct head models from anatomical scans, and SimNIBS (http://simnibs.de) was used to simulate the electric field density resulting from stimulation (Supplementary Figure 9b–d). Simulations indicated that current density was largely unilaterally localized and peaked around V3B/KO.

### Proscriptive integration model

Each primary input (unimodal unit) to the model is specified by its intensity (A) and its slant angle in radians (θ). The slant receptive field for each primary unit was modelled as a one-dimensional von Mises distribution

where $\theta_{\mathrm{cue\_pref}}$ indicates cue slant preference. Arbitrarily, $\theta_{\mathrm{cue\_pref}}$ takes n = 37 evenly distributed values between $-\frac{\pi}{2}$ and $\frac{\pi}{2}$, and the receptive field size, $\kappa_{\mathrm{cue}}$, was set to 2, producing a slant tuning bandwidth of approximately 10°. The response of each primary unit was assumed to scale linearly with cue intensity, $A_{\mathrm{cue}}$.

Combination units in the model were generated by drawing input from all possible pairs of unimodal units (disparity units denoted by the subscript δ and texture units by χ), such that there are 37 × 37 = 1369 combination units. Based on previous empirical evidence (ref. 27), we assume that combination units perform a summation of their inputs that increases monotonically, but sublinearly, with stimulus intensity

where $E(\theta_\delta, \theta_\chi)$ denotes the activity of the combination unit with disparity slant preference $\theta_\delta$ and texture slant preference $\theta_\chi$. The nonlinearity models the sublinear response functions of the combination units, which could be mediated by synaptic depression or normalization (ref. 28).
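The first two layers of the model can be sketched as follows. The equations for the unimodal and combination stages are not reproduced here, so both functional forms are assumptions. The first is a peak-normalised von Mises tuning curve, $A\exp[\kappa(\cos(\theta - \theta_{\mathrm{pref}}) - 1)]$. The second is a square-root nonlinearity applied to the summed inputs, chosen only as one example of a monotonic, sublinear function. The sketch does not attempt to reproduce the stated ~10° bandwidth, and all names are hypothetical.

```python
import math

N_PREFS = 37
# Preferred slants (radians), evenly spaced from -pi/2 to +pi/2.
PREFS = [-math.pi / 2 + k * math.pi / (N_PREFS - 1) for k in range(N_PREFS)]


def unimodal_response(theta, intensity, kappa=2.0):
    """Assumed von Mises tuning, scaled linearly by cue intensity and
    normalised so the preferred-slant response equals `intensity`."""
    return [intensity * math.exp(kappa * (math.cos(theta - p) - 1.0)) for p in PREFS]


def combination_layer(r_disp, r_tex, nonlinearity=math.sqrt):
    """One unit per (disparity, texture) preference pair: the two inputs are
    summed and passed through a sublinear function (sqrt assumed here)."""
    return {
        (i, j): nonlinearity(r_disp[i] + r_tex[j])
        for i in range(len(r_disp))
        for j in range(len(r_tex))
    }
```

With both cues at 0 rad and equal intensity, the most active of the 1369 combination units is the one preferring (0, 0), at index (18, 18).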

The activity of combination units is then passed to a one-dimensional layer of output units. Output units receive input from combination units along the incongruent unimodal preference diagonal with readout weights defined by a cosine, which peaks at unimodal cue preference

where $F_i$ denotes the response of output unit i and c denotes a temperature offset that models inhibitory dominance of sensory responses (ref. 30). This offset was assumed to be 0.05 for all simulations.

Activity was converted to firing rate by thresholding negative activity values to zero. The height and position of the peak(s) were used to assess the reliability and (slant) position of the estimate (ref. 29). For comparison, unimodal responses were generated by setting one of the cue intensities to zero.

For the simulation of Fig. 2c, cue intensities of $A_\delta = 1$ and $A_\chi = 8$ were used to achieve a 1:3 ratio of sensitivity, matching previous work (ref. 7). For the simulations in Fig. 4, stimulus intensities of $A_\delta = A_\chi = 1$ (single-cue and congruent) and $A_\delta = 1$, $A_\chi = 4$ (incongruent) were used to match the sensitivity ratios engineered for the behavioural stimuli (1:1 for congruent, 1:2 for incongruent). To simulate variable suppression, an additional parameter (β) was used to attenuate the negative readout weights. For each simulation, β was drawn at random from a Gaussian distribution (mean = 0.75, σ = 0.1). To simulate individual variability in sensitivity to cues, cue intensity was drawn from a Gaussian distribution (mean = A, σ = 0.1). To compare between simulations, we calculated the reliability of the single- and combined-cue signals relative to one another. For the simulation of Supplementary Figure 1, we systematically varied the variability of the cue intensities over the range σ = [0, 0.25].

In the tDCS experiments, we observed that sensitivity for congruent cues in the sham conditions was significantly higher than the maximum bound for fusion (that is, the quadratic sum; offline: $t_{11} = 4.3$, P = 0.001; online: $t_{11} = 3.7$, P = 0.003). This is likely because disparity and texture were not fully isolated in the single-cue conditions, and these ‘latent’ cues acted to reduce sensitivity to the ‘single’ cue (see Ban et al., ref. 10, for a discussion of this issue). Thus, to simulate the effects of tDCS with the model (Fig. 5e, f), we first simulated performance in the sham conditions by including a latent cue in single-cue simulations. The intensity of the latent cue was fit using the relative difference between single- and congruent-cue sensitivity (Fig. 5a, c) and held constant for both single-cue simulations. Having fit the model to behaviour in the sham condition, we simulated tDCS by varying two partially free parameters, which independently scaled the strength of the positive and negative cosinusoidal readout weights by a factor between zero and one. These parameters model the effects of tDCS on GABAergic (inhibitory) and glutamatergic (excitatory) neurotransmission. We first used the data from the congruent-cue conditions (Fig. 5a, c) to fit these parameters. We then applied these now-fixed parameters to the simulation of performance in the incongruent conditions to test their generalizability (Fig. 5b, d).

To simulate perceptual rivalry, the response of the output units $X_i$ is driven by activity of constant strength $F_i$ from the combination layer and produces mutual inhibition (γ) through lateral connections with weights defined by a half-wave rectified cosine function. The dynamics of the output units are further defined by slow adaptation (α) and stochastic variability (σ)

where $S[X_i]$ denotes a sigmoidal transformation (using a Naka–Rushton function) of the activity of $X_i$, W corresponds to Gaussian noise and $A_\theta$ represents adaptation

For the simulation in Fig. 6a, b, cue intensities of $A_\delta = 1$ and $A_\chi = 1$ were used to produce the constant activity $F(\theta)$ in the combination layer. Timescales of τ = 1 and $\tau_A = 125$ were used to define the temporal dynamics of inhibition and adaptation, respectively; γ = α = 7, and the SD of the noise was assumed to be σ = 0.005.
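A reduced version of these dynamics can be simulated with two competing pools standing in for the two peaks of the bimodal output. The equations are not reproduced here, so the update rule below is an assumed, generic mutual-inhibition-plus-adaptation scheme integrated with the Euler–Maruyama method. The Naka–Rushton exponent and semi-saturation constant, the time step and the function names are illustrative choices. τ, τ_A, γ, α and σ take the values given above.

```python
import math
import random


def naka_rushton(x, n=2.0, c50=0.5):
    """Sigmoidal transfer S[x]; exponent n and semi-saturation c50 are assumed."""
    if x <= 0.0:
        return 0.0
    return x ** n / (x ** n + c50 ** n)


def simulate_rivalry(f=(1.0, 1.0), gamma=7.0, alpha=7.0, sigma=0.005,
                     tau=1.0, tau_a=125.0, dt=0.05, steps=40000, seed=0):
    """Two output pools with constant drive f, mutual inhibition (gamma),
    slow adaptation (alpha, tau_a) and Gaussian noise (sigma).

    Assumed SDE: tau dX_i = (-X_i + f_i - gamma S[X_j] - alpha A_i) dt + sigma dW,
                 tau_a dA_i = (-A_i + S[X_i]) dt.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]  # pool activities
    a = [0.0, 0.0]  # adaptation states
    trace = []
    for _ in range(steps):
        s = [naka_rushton(v) for v in x]
        x = [
            x[i]
            + dt / tau * (-x[i] + f[i] - gamma * s[1 - i] - alpha * a[i])
            + sigma / tau * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for i in (0, 1)
        ]
        a = [a[i] + dt / tau_a * (-a[i] + s[i]) for i in (0, 1)]
        trace.append(tuple(x))
    return trace


def dominant_pool(trace):
    """Index (0 or 1) of the more active pool at each time step."""
    return [0 if x0 >= x1 else 1 for x0, x1 in trace]
```

`dominant_pool(simulate_rivalry())` gives the index of the dominant pool at each step, from which dominance durations can be read off. Whether and how often the pools alternate depends on the assumed transfer function as well as on γ, α and τ_A.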

Maximum likelihood predictions in Figs. 1c, d and 3c were simulated using the following equations:
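The equations are not reproduced here. The sketch below implements the standard reliability-weighted (maximum-likelihood) rule for two independent Gaussian cues, which is assumed to be the one meant. It weights each cue by its inverse variance, and the function name is hypothetical.

```python
def mle_combine(est_a, sigma_a, est_b, sigma_b):
    """Maximum-likelihood combination of two independent Gaussian cue estimates.

    Each cue is weighted by its reliability (1 / sigma**2); the combined
    sigma is never larger than the smaller single-cue sigma.
    """
    r_a, r_b = 1.0 / sigma_a ** 2, 1.0 / sigma_b ** 2
    w_a = r_a / (r_a + r_b)
    estimate = w_a * est_a + (1.0 - w_a) * est_b
    sigma = (1.0 / (r_a + r_b)) ** 0.5
    return estimate, sigma
```

With equal σ, cues at 20° and 50° combine to 35°, and σ is reduced by a factor of √2. Making the 20° cue noisier pulls the combined estimate towards 50°.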

## Modelling perceptual rivalry

Having established that the imaging data are consistent with the proscriptive model, we make a final observation about its usefulness for understanding other perceptual phenomena. When conflicting or ambiguous images are presented to a viewer, as in binocular rivalry or when viewing a Necker cube, viewers typically experience perceptual alternation over time. Traditionally, the study of such percepts has been kept fairly separate from models of routine perceptual processing (refs. 42,43). By contrast, here we show that proscription provides a natural basis that accommodates both routine perceptual estimates and alternating perceptual interpretations.

So far, we have considered robust perception in situations of conflict where one cue is considerably more reliable than the other. We now consider the case where cues are in conflict but equally reliable. In this case, there is no principled way to select one cue over the other, and the result is typically bistable perception (refs. 22,23). The proscriptive integration model accommodates this behaviour naturally: when conflicting cues of similar reliability are simulated, a bimodal population response is observed in the output (Fig. 6a, time 0).

Simulation of perceptual rivalry with the proscriptive integration model. a Perceptual rivalry produced with the model: time 0 shows the likelihood function represented in the output layer when two incongruent cues of equal reliability are simulated. Mutual inhibition between neuronal representations (and internal noise) results in dominance at time 1. Adaptation of the units encoding the dominant representation results in its gradual decay and the subsequent re-emergence of the non-dominant representation at time 2. b Likelihood of the simulated slant angles as a function of time, showing the typical dynamics of perceptual alternation. c Psychophysical data taken from van Ee et al. (ref. 23), showing the difference between cue estimates as a function of cue conflict (see Online Methods for a detailed description of data extraction and reanalysis). The blue line is a fit of the proscriptive model, and the shaded region indicates where perceptual rivalry was experienced. d An example subject from van Ee et al. (ref. 23) for whom one cue dominates perception and rivalry is not experienced. The blue line (barely visible) shows the fit of the proscriptive model in which the two cues are assigned very different reliabilities, such that one completely dominates the other.


We can model perceptual alternation by assuming a combination of mutual inhibition and adaptation between the competing, bimodal representations (refs. 19,42). The dynamics of adaptation allow us to simulate the temporal dynamics of perceptual bistability (Fig. 6b). In particular, mutual inhibition between neuronal representations (and internal noise) initially results in the dominance of one of the cues over the other (Fig. 6a, time 1). Through adaptation, however, the activity of units encoding the dominant representation gradually declines and is followed by the re-emergence of the non-dominant representation (Fig. 6a, time 2). This cycle determines the rate of alternation between perceptual estimates (Fig. 6b).

Given that the mechanism of mutual inhibition and adaptation is well established, it is perhaps unsurprising that it can be used here to produce bistability. However, by using this mechanism to extend the proscriptive integration model, we were able to capture the behaviour of observers viewing conflicting cues to slant, as measured in a previous study (ref. 23). Specifically, we simulated a range of cue conflicts and assessed the degree of bistability in the model's estimates. This allowed us to capture human psychophysical performance showing the situations in which bistability is experienced (Fig. 6c, shaded blue region). The previous work also reported cases in which participants did not experience bistability (Fig. 6d). We captured this in the model on the basis that such observers attribute unequal reliability to the presented cues, so that one cue always dominates.

## Experiment 2. Dependence of the Facial Identity After-Effect on Orientation Structure

Key psychophysical support for the psychological validity of the face space model (described in section “Introduction”) comes from the observation that identity after-effects are strongest between faces that belong to the same identity axis (Leopold et al., 2001; Rhodes and Jeffery, 2006). The finding that perception of upright faces seems to rely disproportionately on horizontal information suggests that the horizontal orientation band is largely responsible for carrying the relevant cues to facial identity. If so, identity after-effects should be driven preferentially by horizontal, and not by vertical, image structure. Experiment 2 addressed this issue directly by measuring identity after-effects when adapting faces contained either broadband, horizontal, or vertical information.

### Materials and Methods

#### Subjects

One of the authors (Steven C. Dakin) and two observers (DK and JAG) naïve to the purpose of the experiment (all wearing optical correction as necessary) participated in the experiment. DK and JAG provided their written informed consent prior to participation. All were experienced psychophysical observers. They familiarized themselves with the two test faces by passively viewing them for a period of at least 15 min before commencing testing. The protocol was approved by the faculty ethics committee.

#### Stimuli

We obtained full-front, uniformly lit digital photographs of two male subjects and manually located 31 key-points (Figure 3) on these images. Faces were masked from the background and normalized to have equal mean luminance. They were scaled and co-aligned with respect to the center of the eyes prior to morphing. We generated morphed versions of these images using custom software written in the MATLAB programming environment. Specifically, for a given image and a set of original and transformed key-points, we used the original key-points to generate a mesh over the face using Delaunay triangulation (i.e., we computed the set of triangles linking key-points such that no key-point lies inside the circumcircle of any triangle). We then used the same point-to-point correspondences with the transformed key-points to generate a set of transformed triangles (Figure 3). We used the original triangles as masks to cut out corresponding image regions from the face and stretched these into registration with the transformed triangles using MATLAB’s built-in 2D bilinear interpolation routine (interp2). The sum of all these stretched triangles is the morphed face.
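The per-triangle stretching step amounts to an inverse mapping through barycentric coordinates followed by bilinear sampling. The sketch illustrates that idea for a single pixel. It is not the authors' MATLAB code: it replaces `interp2` with a hand-written bilinear sampler, and the function names are hypothetical.

```python
def barycentric(p, tri):
    """Barycentric coordinates (l1, l2, l3) of point p in triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return l1, l2, 1.0 - l1 - l2


def map_point(p, dst_tri, src_tri):
    """Map a pixel in a transformed (destination) triangle back to the
    matching location in the original (source) triangle."""
    l1, l2, l3 = barycentric(p, dst_tri)
    x = l1 * src_tri[0][0] + l2 * src_tri[1][0] + l3 * src_tri[2][0]
    y = l1 * src_tri[0][1] + l2 * src_tri[1][1] + l3 * src_tri[2][1]
    return x, y


def bilinear(image, x, y):
    """Bilinearly interpolate image[row][col] at column x, row y.

    Assumes 0 <= x <= width - 1 and 0 <= y <= height - 1.
    """
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom
```

To warp a whole triangle, take each pixel inside the transformed triangle, map it back with `map_point` and sample the original image with `bilinear`. Summing the warped triangles then gives the output image.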

Figure 3. The location of the 31 key-points used for morphing is superimposed on an example “morphed-average” face from our set.

To generate morphs intermediate between the identities of the two faces, we first calculated a weighted average of the key-point locations of the two faces:

$$K_w = w\,K_1 + (1 - w)\,K_2$$

where $K_1$ and $K_2$ are the key-point locations of face ȱ and face Ȳ, and w is the weight given to face ȱ. We then morphed each face ($I_1$ and $I_2$) into registration with the new key-points (giving $\tilde{I}_1$ and $\tilde{I}_2$) and generated the final image by performing a weighted average (in the image domain) of these two images:

$$I_w = w\,\tilde{I}_1 + (1 - w)\,\tilde{I}_2$$

In the experiment we used test faces generated with seven values of w from 0.2 to 0.8 in steps of 0.1. Prior to presentation or filtering, we equated the RMS contrast in all SF bands of the stimulus. This ensured that all unfiltered face stimuli had identical power spectra. Examples of the unfiltered/broadband stimuli are shown in Figure 4A. Filtering methods were identical to those described above: face information was restricted to a Gaussian range of orientation energy (σ = 14°) centered on either horizontal or vertical orientation.
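The orientation filter can be described by its weighting over orientation. This is a minimal sketch, assuming a Gaussian weight wrapped with a 180° period (orientation is axial) and the stated σ = 14°; the function name is hypothetical. When such a weight is applied in the Fourier domain, the orientation of image structure is orthogonal to the direction of the corresponding Fourier component.

```python
import math


def orientation_weight(theta_deg, center_deg, sigma_deg=14.0):
    """Gaussian weight over orientation, wrapped with a 180-degree period
    because orientation is axial (0 and 180 degrees are the same)."""
    d = (theta_deg - center_deg + 90.0) % 180.0 - 90.0  # signed difference in [-90, 90)
    return math.exp(-0.5 * (d / sigma_deg) ** 2)
```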

#### Procedure

On un-adapted trials, observers were presented with a central fixation marker (200 ms) followed by a single morphed face stimulus that remained on the screen for 1000 ms. Observers then made a categorization response (using the computer keyboard) as to whether the face appeared more like face ȱ or face Ȳ. Responses were not timed, but observers were encouraged to respond promptly to reduce overall testing duration. On adapted trials, the initial fixation marker was followed by an adapting face stimulus that remained on the screen either for 30 s on the first trial (to build up adaptation) or for 5 s on subsequent trials. To avoid retinal adaptation, observers tracked a fixation marker that moved up and down the vertical midline of the face (±0.6° from the center) during the adaptation phase. The adapting face stimulus was always either 100% face ȱ or 100% face Ȳ, and in three different blocked conditions was either (a) unfiltered, or restricted to (b) horizontal or (c) vertical information. Note that the test was always a broadband face.

There was a total of seven conditions (no adaptation, and the three adapting conditions with each of faces ȱ and Ȳ). The order of testing was randomized on a subject-by-subject basis. Performance in each condition was evaluated in a run consisting of 56 trials: eight trials at each of the seven stimulus levels (w = 0.2–0.8 in steps of 0.1). At least two runs were conducted for each subject, giving a total of at least 784 trials per subject.
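The trial structure of one run can be sketched as a shuffled list of morph levels. The trial counts come from the text; the function and constant names are illustrative.

```python
import random

MORPH_LEVELS = [round(0.2 + 0.1 * k, 1) for k in range(7)]  # w = 0.2 ... 0.8


def make_run(trials_per_level=8, seed=None):
    """One 56-trial run: each morph level repeated, then shuffled."""
    run = [w for w in MORPH_LEVELS for _ in range(trials_per_level)]
    random.Random(seed).shuffle(run)
    return run
```

Seven conditions × 56 trials × 2 runs gives the stated minimum of 784 trials per subject.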

Experiments were run under the MATLAB programming environment incorporating elements of the PsychToolbox (Brainard, 1997). Stimuli were presented on a CRT monitor (LaCie Electron Blue 22) fitted with a Bits++ box (Cambridge Research Systems) operating in Mono++ mode to give true 14-bit contrast accuracy. The display was calibrated with a Minolta LS110 photometer, then linearized using a look-up table, and had a mean (background) and maximum luminance of 50 and 100 cd/m², respectively. The display was viewed at a distance of 110 cm. Face stimuli (adapters and test) were 8.5 cm wide by 11 cm tall, subtending 4.4 × 5.7° of visual angle.
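The stimulus size in degrees follows from the physical size s and the viewing distance d via θ = 2·arctan(s/2d). A minimal check (function name hypothetical):

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Full angle (degrees) subtended by an object centered on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))
```

`visual_angle_deg(8.5, 110)` ≈ 4.43° and `visual_angle_deg(11, 110)` ≈ 5.72°, which agree with the 4.4 × 5.7° given above.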

#### Data analyses

The probability that a subject categorized a given broadband test face as more like face Ȳ, as a function of the morph level (from face ȱ to face Ȳ), was fitted with a cumulative Gaussian function. This gave the point of subjective equality (PSE, or bias, i.e., the morph level leading to a 50% probability that the stimulus was categorized as face Ȳ) and the precision (the slope parameter of the best-fitting cumulative Gaussian, which is equivalent to the stimulus level eliciting 82% correct performance). These parameters were bootstrapped (based on 1024 bootstrap samples; Efron and Tibshirani, 1993) to yield 95% percentile-based confidence intervals that were used to derive PSE error bars (Figures 4D,G,J). Specifically, by assuming binomially distributed error in subjects’ responses (e.g., 8/10 “more like face Ȳ”) at each morph level, we could resample the data to generate a series of new response rates across morph levels. We then fitted each resampled set with a psychometric function to yield a new estimate of the PSE. By repeating this procedure, we obtained a distribution of PSEs from which we could compute confidence intervals.
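The bootstrap procedure can be sketched as follows. This illustrates the procedure under stated assumptions and is not the authors' code. The cumulative Gaussian is fitted by a coarse grid-search maximum-likelihood fit with no lapse parameter. Responses are resampled binomially from the observed proportion at each morph level, and the percentile interval uses simple index rounding. All names are hypothetical.

```python
import math
import random


def p_more_like_face2(x, mu, sigma):
    """Cumulative Gaussian: P('more like face Ȳ') at morph level x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_pse(levels, k, n, grid=61):
    """Grid-search maximum-likelihood fit; returns (PSE, sigma)."""
    mus = [i / (grid - 1) for i in range(grid)]  # candidate PSEs in [0, 1]
    sigmas = [0.01 + 0.5 * i / (grid - 1) for i in range(grid)]

    def nll(mu, s):
        total = 0.0
        for x, ki, ni in zip(levels, k, n):
            p = min(max(p_more_like_face2(x, mu, s), 1e-9), 1.0 - 1e-9)
            total -= ki * math.log(p) + (ni - ki) * math.log(1.0 - p)
        return total

    _, mu, sigma = min((nll(m, s), m, s) for m in mus for s in sigmas)
    return mu, sigma


def bootstrap_pse_ci(levels, k, n, n_boot=1024, seed=0):
    """95% percentile interval for the PSE from binomial resampling."""
    rng = random.Random(seed)
    pses = []
    for _ in range(n_boot):
        # Redraw each level's count from a binomial with the observed rate.
        k_star = [sum(rng.random() < ki / ni for _ in range(ni))
                  for ki, ni in zip(k, n)]
        pses.append(fit_pse(levels, k_star, n)[0])
    pses.sort()
    lower = pses[int(0.025 * (n_boot - 1))]
    upper = pses[int(math.ceil(0.975 * (n_boot - 1)))]
    return lower, upper
```

`bootstrap_pse_ci(levels, k, n)` returns the lower and upper bounds of the 95% interval for one condition. With the default 1024 resamples the pure-Python grid search is slow, so a vectorised fitter would be preferable in practice.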

### Results and Discussion

Figures 4B–J show the results from this experiment. Each data point (Figures 4B,C,E,F,H,I) is the probability that a given observer classed a stimulus at some morph level (between faces ȱ and Ȳ) as looking more like face Ȳ. Solid lines are the fitted psychometric functions used to estimate a PSE (the morph level at which that observer was equally likely to categorize the stimulus as either face). PSEs are plotted in the bar graphs to the right (Figures 4D,G,J). With no adaptation (black/white points, curves and bars), all curves were centered near the 50:50 morph level, indicating that all subjects were equally likely to categorize an equal mix of faces ȱ and Ȳ as face ȱ or Ȳ.

Adapting to a broadband version of face ȱ (Figures 4B,E,H, purple points and curves) shifted curves leftwards, and PSEs (Figures 4D,G,J, purple bars) fell below 0.5. A stimulus therefore needed to contain < 40% of face Ȳ to be equally likely to be classed as face ȱ or Ȳ. When subjects adapted to a broadband version of face Ȳ (Figures 4C,F,I, purple points and curves), the function shifted rightwards and the PSEs (Figures 4D,G,J, purple bars) were greater than 0.5. Subjects now needed the morph to contain > 60% of face Ȳ to be equally likely to class it as face ȱ or Ȳ. This is the standard identity after-effect: adapting to a given face pushes the subsequent discrimination function towards the adapted end of the morph continuum. The size of our effects is comparable to previous reports (Leopold et al., 2001).

Data from the horizontally and vertically filtered adapter conditions are shown as red and green data points, curves and bars, respectively. Adapting to a horizontally filtered face elicited a shift in the psychometric function for subsequent discrimination that was almost indistinguishable from the effect of adapting to the broadband face (compare purple and red bars). We note, however, that after adaptation to horizontally filtered face ȱ the psychometric function became shallower, indicating poorer discrimination. By contrast, adapting to vertically filtered faces elicited little adaptation, leading to biases comparable to estimates from the ‘no-adaptation’ condition (compare green and black/white bars).

Because we did not compare identity after-effects when adapter and test faces fell on the same or different identity vector(s), we cannot know whether the measured after-effects exclusively reflect adaptation to identity. Consequently, there are three alternative accounts of the weaker identity after-effect observed with vertical compared to horizontal content. (1) The adapted mechanism may be tuned both for identity strength and for the orientation of its input. (2) Alternatively, it may be tuned for identity strength only; in that case, if vertically filtered adapters looked more similar to one another, as indicated by the results of Experiment 1, they might induce less adaptation. (3) The effect could relate to the location of the morphing key-points varying more in the horizontal than in the vertical direction.

To investigate accounts (1) and (2), we measured psychometric functions for the identification of upright faces that were morphed between the two test identities. Faces were upright and either broadband or filtered to contain either horizontal or vertical information (with the same bandwidths as in the adaptation phase of the main experiment). We also measured identification performance with inverted broadband faces. The results, plotted in Figure 5, indicate that vertically filtered faces are about three times more difficult to discriminate from one another than horizontally filtered faces. The % values in parentheses give the identity change threshold, i.e., the identity increment leading to 82% correct identification. That discrimination of vertically filtered faces is so poor suggests that the weak adaptation to these stimuli may indeed arise because they do not elicit a sufficiently strong sense of identity. We return to this point in the “Discussion” section.

Figure 5. Psychometric functions for the discrimination of upright broadband faces (purple), inverted broadband faces (black/white), horizontally filtered upright faces (red) and vertically filtered upright faces (green). Identity thresholds (the identity levels eliciting 82% accurate discrimination) are given in parentheses.

We also looked at whether the location of key-points used to morph the faces may have influenced our results. We analyzed the x and y locations of 21 key-points (corresponding to the internal facial features as shown in Figure 3) drawn from 81 male faces. The standard deviation of the y coordinates was 60% higher than that of the x coordinates, indicating considerably more variation in the vertical than in the horizontal location of facial features. Because this increase in variation might arise from the elongated aspect ratio of faces (i.e., there is more room for variation in y coordinates), we also computed the ratio of Fano factors (a mean-corrected measure of dispersion), which was also higher along the y axis than along the x axis. The notion that such a difference might contribute to our findings relies on several further assumptions, notably the validity of both the location and sampling density of key-points, the linear relationship between key-point location and discriminability of feature change, etc. Nevertheless, this analysis indicates that the robust identity after-effects observed for horizontally filtered stimuli could at least in part originate from structural properties of faces preferentially supporting the transmission of information through the vertical location of features (i.e., along the y axis) (Goffaux and Rossion, 2007; Dakin and Watt, 2009).
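The two dispersion comparisons above can be sketched as follows. This assumes the key-point coordinates are pooled into one list per axis and measured from the image corner, so that means are positive (the Fano factor, variance divided by mean, depends on that origin). The function name and the example values are hypothetical.

```python
import statistics

def dispersion_excess(xs, ys):
    """Percentage by which y dispersion exceeds x dispersion: (SD, Fano factor)."""
    sd_pct = (statistics.stdev(ys) / statistics.stdev(xs) - 1.0) * 100.0
    fano_x = statistics.variance(xs) / statistics.mean(xs)  # Fano = variance / mean
    fano_y = statistics.variance(ys) / statistics.mean(ys)
    fano_pct = (fano_y / fano_x - 1.0) * 100.0
    return sd_pct, fano_pct

# Hypothetical pooled coordinates (pixels), not the study's data
x_coords = [1, 2, 3, 4, 5]
y_coords = [2, 4, 6, 8, 10]
sd_pct, fano_pct = dispersion_excess(x_coords, y_coords)
print(f"SD: +{sd_pct:.0f}%, Fano factor: +{fano_pct:.0f}%")
```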

With these caveats in mind, the present findings indicate that the visual mechanisms responsible for the representation of face identity, as indexed by the identity after-effect, are tuned to horizontal bands of orientation. We proposed that the advantage of encoding face identity based on the vertical arrangement of horizontal face information would be that this information remains available across viewpoint changes (Goffaux and Rossion, 2007; Dakin and Watt, 2009), a notion we explicitly test in the next experiment.

## Eye tracking data screening

Before analysis, eye movement data were screened to remove noisy and/or spurious recordings. Owing to the bespoke experimental setup (i.e., recording eye position from behind one-way mirrors in a haploscope) and the time-sensitive nature of brain stimulation (i.e., leaving insufficient time to redo or restart blocks), the eye tracker occasionally failed to track participants' eyes for an entire block. Of the 28 blocks (19%) that were omitted from the analysis, 27 had < 1% of data collected. We omitted the remaining block because of physiologically implausible variability in eye position signals, which indicated noisy tracking. Finally, before averaging trials, we removed points exceeding the radius of the stimulus (4.5°).
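A minimal sketch of the two automated screening rules described above. The data layout (one list of (x, y) gaze samples per block, in degrees relative to fixation, with None for samples the tracker lost) and all names are hypothetical; the manual exclusion of the block with implausible variability is not automated here.

```python
import math

MIN_VALID_FRACTION = 0.01  # blocks with < 1% of samples recorded are dropped
MAX_RADIUS_DEG = 4.5       # samples beyond the stimulus radius are removed

def keep_block(samples):
    """True if at least 1% of the block's samples were actually recorded."""
    if not samples:
        return False
    valid = sum(1 for s in samples if s is not None)
    return valid / len(samples) >= MIN_VALID_FRACTION

def trim_samples(samples, radius=MAX_RADIUS_DEG):
    """Drop lost samples and samples farther than `radius` degrees from fixation."""
    return [s for s in samples if s is not None and math.hypot(s[0], s[1]) <= radius]

# Hypothetical block: mostly lost samples plus a few recorded ones
block = [None] * 95 + [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0), (4.0, 4.0), (0.5, -0.2)]
if keep_block(block):
    print(trim_samples(block))  # keeps samples within 4.5 deg of fixation
```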

## Fit poisson distribution to data (histogram + line)

I need to do exactly what @interstellar asked in “Fit poisson distribution to data”, but within the R environment (not MATLAB).

So, I created a barplot with my observed values and I just need to fit a poisson distribution on it.

the barplot created is the following:

I am still new to R, so how could I use the fitdist function (or something else) to draw a line over my barplot?

Any help would be really appreciated.

I have worked out something, but I am not 100% sure it is correct:

However, the curve is not smooth.
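The original R code and barplot were not preserved, so here is a language-neutral sketch in Python of what the fit involves. The maximum-likelihood Poisson rate is the sample mean (in R, `fitdistrplus::fitdist(x, "pois")` returns that estimate), and the curve to overlay is the expected count n·P(K = k) at each integer k. Because the Poisson distribution is discrete, those expected counts exist only at integers, so a line joining them is piecewise linear rather than smooth; that is expected, not a sign of a bad fit. In R, the bar midpoints returned by `barplot()` give the x positions at which to draw them. The example data below are hypothetical.

```python
import math
from collections import Counter

def fit_poisson(observations):
    """Maximum-likelihood estimate of the Poisson rate: the sample mean."""
    return sum(observations) / len(observations)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def observed_and_expected(observations):
    """Rows of (k, observed count, fitted expected count n * P(K = k))."""
    lam = fit_poisson(observations)
    n = len(observations)
    observed = Counter(observations)
    rows = [(k, observed.get(k, 0), n * poisson_pmf(k, lam))
            for k in range(max(observations) + 1)]
    return lam, rows

data = [0, 1, 1, 2, 2, 2, 3, 3, 4]  # hypothetical counts
lam, table = observed_and_expected(data)
for k, obs, expected in table:
    print(f"k={k}: observed={obs}, expected={expected:.2f}")
```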

## Introduction

Dyslexia is a learning disability that affects between 5% and 17% of the population and poses a substantial economic and psychological burden for those affected 1,2,3 . Despite decades of research, it remains unclear why so many children without obvious intellectual, sensory, or circumstantial challenges find written word recognition so difficult.

One popular and persistent theory is that dyslexia arises as a result of an underlying auditory processing deficit 4,5,6,7,8,9 . According to this theory, a low-level auditory processing deficit disrupts the formation of a child’s internal model of speech sounds (phonemes) during early language learning; later, when young learners attempt to associate written letters (graphemes) with phonemes, they struggle because their internal representation of phonemes is compromised 10 .

In line with this hypothesis, many studies report group differences between dyslexics and typical reading control participants in auditory psychophysical tasks including amplitude modulation detection 11,12,13,14,15 , frequency modulation detection 16,17,18,19,20 , rise time discrimination, and duration discrimination 21,22,23,24,25 . Moreover, attributing dyslexia to an auditory deficit is appealing because one of the most effective predictors of reading difficulty is poor phonemic awareness—the ability to identify, segment and manipulate the phonemes within a spoken word 26,27 . If a child’s phoneme representation were abnormal due to auditory processing deficits, it could be a common factor underlying both poor performance in auditory phoneme awareness tests and difficulties learning to decode written words.

In one variant of this auditory hypothesis, dyslexia is thought to involve a deficit specifically in the processing of rapid modulations in sound, usually referred to as a “rapid temporal processing” deficit 5,28,29 . This idea has been controversial (see Rosen 30,31 for review), but it remains one of the most widely cited accounts of auditory deficits in dyslexia 7 . One source of controversy is that the rapid temporal processing deficit is far from universal: in Tallal’s 5 study that first proposed a causal relationship between rapid temporal processing and reading ability, only 8 out of 20 dyslexic children showed a deficit on a temporal order judgment task. Similarly, in a more comprehensive study of 17 dyslexic adults, only 11 had impaired performance on an extensive battery of auditory psychophysical tasks 32 . Moreover, in that study, tasks requiring temporal cue sensitivity were not systematically more effective than non-temporal tasks at separating dyslexics from controls. Several studies have also shown that dyslexic participants have heightened sensitivity to allophonic speech contrasts (differences in pronunciation that do not change the identity of the speech sound) 33,34,35 , including contrasts primarily marked by temporal differences in the stimuli, which contradicts the hypothesis that the dyslexic brain lacks access to temporal information. Furthermore, other studies have shown that dyslexics have normal abilities to resolve spectrotemporal modulations in noise, suggesting that apparent speech perception difficulties result from higher-level aspects of processing beyond encoding sounds 36,37,38 . More recently, the rapid temporal processing hypothesis has been reframed as a deficit in processing dynamic, but not necessarily rapid, aspects of speech 10,39,40 .

A second reason to question the link between auditory processing deficits and dyslexia is that the standard technique for data analysis in psychophysical experiments is prone to severe bias. Specifically, in many phoneme categorization studies, participants identify auditory stimuli that vary along some continuum as belonging to one of two groups, a psychometric function is fit to the resulting data, and the slopes of these psychometric functions are compared between groups of dyslexic and control readers (for reviews, see Vandermosten et al. 55 ; Noordenbos and Serniclaes 35 ). In this approach, a steep slope indicates a clear boundary between phoneme categories, whereas a shallow slope suggests less defined categories (i.e., fuzzy phonological representations), possibly due to poor sensitivity to the auditory cue(s) that mark the phonemic contrast. Unfortunately, most studies in the dyslexia literature fit the psychometric function using algorithms that fix its asymptotes at zero and one, which is equivalent to assuming that categorization performance is perfect at the extremes of the stimulus continuum (i.e., assuming a “lapse rate” of zero). This assumption is questionable in light of the evidence that dyslexics may be less consistent categorizing stimuli across the full continuum 35 , as well as evidence that attention, working memory, or task difficulty, rather than stimulus properties, may underlie group differences between readers with dyslexia and control subjects.

The zero lapse rate assumption is particularly problematic given that fixing the asymptotes at zero and one leads to strongly downward-biased slope estimates even when true lapse rates are fairly small 56,57 . In other words, if a participant makes chance errors on categorization, and these errors happen on trials near the ends of the continuum, the canonical psychometric fitting routine used in most previous studies will underestimate the steepness of the category boundary. Thus, a tendency to make a larger number of random errors (due to inattention, memory, or task-related factors) will be wrongly attributed to an indistinct category boundary on the contrast under study. Since it is precisely the subjects with dyslexia who more frequently show attention or working memory deficits, research purporting to show less distinct phoneme boundaries in readers with dyslexia may in fact reflect non-auditory, non-linguistic differences between study populations.
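To make the bias concrete, here is a minimal sketch with hypothetical parameter values: simulate a logistic psychometric function whose asymptotes sit at a symmetric 5% lapse rate, use the expected response counts as noise-free data, and fit a logistic whose asymptotes are fixed at zero and one by grid-search maximum likelihood. The grid search stands in for whatever optimizer a given study used.

```python
import math

def logistic(x, threshold, slope):
    return 1.0 / (1.0 + math.exp(-slope * (x - threshold)))

def with_lapses(x, threshold, slope, lapse):
    """Symmetric lapses: asymptotes at `lapse` and 1 - `lapse`."""
    return lapse + (1.0 - 2.0 * lapse) * logistic(x, threshold, slope)

def neg_log_likelihood(xs, k, n, threshold, slope):
    """Binomial NLL for a logistic whose asymptotes are fixed at 0 and 1."""
    nll = 0.0
    for x, ki, ni in zip(xs, k, n):
        p = min(max(logistic(x, threshold, slope), 1e-12), 1.0 - 1e-12)
        nll -= ki * math.log(p) + (ni - ki) * math.log(1.0 - p)
    return nll

def fit_fixed_asymptotes(xs, k, n):
    best = None
    for i in range(-20, 21):          # threshold from -1 to 1 in 0.05 steps
        t = 0.05 * i
        for j in range(1, 501):       # slope from 0.01 to 5 in 0.01 steps
            s = 0.01 * j
            nll = neg_log_likelihood(xs, k, n, t, s)
            if best is None or nll < best[0]:
                best = (nll, t, s)
    return best[1], best[2]

xs = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
n = [100] * len(xs)
TRUE_THRESHOLD, TRUE_SLOPE, TRUE_LAPSE = 0.0, 2.0, 0.05
# Expected counts (not random draws) so the result is deterministic
k = [ni * with_lapses(x, TRUE_THRESHOLD, TRUE_SLOPE, TRUE_LAPSE) for x, ni in zip(xs, n)]
fitted_threshold, fitted_slope = fit_fixed_asymptotes(xs, k, n)
print(f"true slope {TRUE_SLOPE}, fitted slope {fitted_slope:.2f}")
```

Even though the data contain no sampling noise, the fitted slope comes out shallower than the true slope of 2, because the only way a zero-lapse logistic can reach the 5%/95% plateaus is by flattening.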

Unfortunately, most studies in the dyslexia literature that use psychometric functions to model categorization performance appear to suffer from this bias. Although most do not report their analysis methods in sufficient detail to be certain, we infer (based on published plots, and the lack of any mention of asymptote estimation) that the bias is widespread (e.g., Reed 58 ; Manis et al. 59 ; Breier et al. 60 ; Chiappe, Chiappe and Siegel 61 ; Maassen et al. 62 ; Bogliotti et al. 34 ; Zhang et al. 63 ). A few studies report using software that in principle supports asymptote estimation during psychometric fitting, but do not report the parameters used in their analysis (e.g., Vandermosten et al. 43,55 ). In one case, researchers who re-fit their psychometric curves without data from the continuum endpoints found a reduced effect size, prompting the authors to wonder whether any effect would remain if an unbiased estimator of slope were used 64 . Only a few studies have fit psychometric functions with free asymptotic parameters or investigated differences in lapse rates 35,65,66 ; however, it is worth noting that their methods did not constrain the lapse rate, which can lead to upward-biased slope estimates 67 . This is a consequence of the fact that slope and lapse parameters “trade off” in the optimization space of sigmoidal function fits 68 .

In light of all these problems—the inconsistency of findings, confounding influences of experimental task design, and bias introduced by analysis choices—it is reasonable to wonder whether children with dyslexia really have any abnormality in phoneme categorization, once those factors are all controlled for. The present study addresses the relationship between reading ability and phoneme categorization ability, and in particular, whether children with dyslexia show a greater deficit on phoneme contrasts that rely on temporally varying (dynamic) cues, as opposed to static cues. This study avoids the aforementioned methodological problems by using multiple paradigms with different attentional and memory demands, and by analyzing categorization performance using Bayesian estimation of the psychometric function 67 . In this approach, the four parameters of a psychometric function—the threshold, slope, and two asymptote parameters—are assigned prior probability distributions, which formalize the experimenter’s assumptions about their likely values. By allowing the asymptote parameters to vary, the slope is estimated in a less biased way than in traditional modeling approaches with fixed or unconstrained asymptotes. However, because fitting routines trade off between asymptote parameters and the slope parameter of a logistic model in optimization space 68 , it can be difficult to estimate both accurately at the same time. To address this difficulty, we first performed cross-validation on the prior distribution of the asymptote parameters to determine the optimal model to fit the data.
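The study uses full Bayesian estimation of four parameters (ref. 67) and chooses the asymptote prior by cross-validation; neither is reproduced here. As a simplified illustration of how a prior and a bounded range stop the lapse rate from trading off freely against the slope, the sketch below does a grid-search MAP fit with a symmetric lapse, flat priors on threshold and slope, and a hypothetical Beta(1, 10) prior on the lapse, restricted to 0 to 0.1.

```python
import math

def psychometric(x, threshold, slope, lapse):
    """Logistic with symmetric asymptotes at `lapse` and 1 - `lapse`."""
    return lapse + (1.0 - 2.0 * lapse) / (1.0 + math.exp(-slope * (x - threshold)))

def log_posterior(xs, k, n, threshold, slope, lapse, prior_a=1.0, prior_b=10.0):
    """Binomial log-likelihood plus a Beta(prior_a, prior_b) log-prior on the lapse.

    Flat priors on threshold and slope; prior_a >= 1 is assumed.
    """
    log_prior = (prior_b - 1.0) * math.log1p(-lapse)
    if lapse > 0.0:
        log_prior += (prior_a - 1.0) * math.log(lapse)
    elif prior_a > 1.0:
        return -math.inf
    log_lik = 0.0
    for x, ki, ni in zip(xs, k, n):
        p = min(max(psychometric(x, threshold, slope, lapse), 1e-12), 1.0 - 1e-12)
        log_lik += ki * math.log(p) + (ni - ki) * math.log(1.0 - p)
    return log_lik + log_prior

def map_fit(xs, k, n):
    """Grid-search MAP estimate; the lapse grid itself caps the rate at 0.1."""
    best = None
    for i in range(21):                  # threshold -0.5 .. 0.5
        t = -0.5 + 0.05 * i
        for j in range(71):              # slope 0.5 .. 4.0
            s = 0.5 + 0.05 * j
            for m in range(11):          # lapse 0.00 .. 0.10
                lam = 0.01 * m
                lp = log_posterior(xs, k, n, t, s, lam)
                if best is None or lp > best[0]:
                    best = (lp, t, s, lam)
    return best[1:]

xs = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
n = [100] * len(xs)
# Simulated expected counts with true threshold 0, slope 2, lapse 0.05
k = [ni * psychometric(x, 0.0, 2.0, 0.05) for x, ni in zip(xs, n)]
threshold, slope, lapse = map_fit(xs, k, n)
print(f"threshold {threshold:.2f}, slope {slope:.2f}, lapse {lapse:.2f}")
```

Unlike the zero-lapse fit, this recovers a slope close to the true value, while the upper bound on the lapse prevents the opposite failure of inflating the slope.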

This paper presents data from 44 children, aged 8–12 years, with a wide range of reading abilities. Our experimental task is based on the design of Vandermosten et al. 43,55 , which assessed categorization performance for two kinds of stimulus continua: those that differed based on a spectrotemporal cue (dynamic), and those that differed based on a purely spectral cue (static). In the original study, the authors concluded that children with dyslexia are specifically impaired at categorizing sounds (both speech and non-speech) that differ on the basis of dynamic cues. However, although the dynamic and static stimuli in their study were equated for overall length, the duration of the cues relevant for categorization was not equal: in the dynamic stimuli, the cue (a vowel formant transition) was available for 100 ms, but in the static stimuli, the cue (the formant frequency of a steady state vowel) was available for 350 ms. This raises the question of whether cue duration, rather than the dynamic nature of the cue, was the source of apparent impairment in categorization among participants with dyslexia. The present study avoids this confound by changing the “static cue” stimuli from steady-state vowels to fricative consonants (a /ʃa/–/sa/ continuum), so that the relevant cue duration is 100 ms in both the static (/ʃa/–/sa/) and dynamic (…–/da/) stimulus continua. Additionally, Vandermosten and colleagues used a test paradigm in which listeners heard three sounds and were asked to decide if the third sound was more like the first or second (an ABX design). Here we included both an ABX task and a single-stimulus categorization task, to see whether the memory and attention demands of the ABX paradigm may have played a role in previous findings. Thus, by (a) assessing categorical perception of speech continua with static and dynamic cues, (b) varying the cognitive demands of the psychophysical paradigm, and (c) empirically determining the optimal parameterization of the psychometric function with cross-validation, we aim to clarify the role of auditory processing deficits in dyslexia.

## Methods

### Participants

Observers were recruited from the University of Cambridge, had normal or corrected-to-normal vision, and were screened for stereo deficits. A priori sample sizes were established using effect sizes from previous MRS 14 and tDCS 35 studies to achieve 90% power. Twenty observers participated in the MRS experiment; however, two were not included in the analysis: one withdrew mid-scan, and a hardware fault stopped acquisition mid-scan for the other. Eighteen subjects (15 male; 17 right-handed; 25.1 ± 3.1 years) completed MRS for V3B/KO and M1, of whom 15 also returned for the (control) V1 scan. Twelve observers participated in each of the 5 tDCS experiments, for a total of 34 different observers (19 male; 31 right-handed; 24 ± 3.6 years). Experiments were approved by the University of Cambridge ethics committee; all observers provided written informed consent.

### Apparatus and stimuli

Stimuli were generated in MATLAB (The MathWorks, Inc., Natick, MA) using Psychophysics Toolbox extensions 55,56 . Binocular presentation was achieved using a pair of Samsung 2233RZ LCD monitors (120 Hz, 1680 × 1050) viewed through mirrors in a Wheatstone stereoscope configuration. The viewing distance was 50 cm and participants’ head position was stabilized using an eye mask, head rest and chin rest. Eye movement was recorded binocularly at 1 kHz using an EyeLink 1000 (SR Research Ltd, Ontario, Canada).

Stimuli were virtual planes slanted about the horizontal axis (Fig. 3a). Two cues to slant were independently manipulated: texture and disparity. The texture cue was generated by Voronoi tessellation of a regular grid of points (1° ± 0.1° point spacing) randomly jittered in two dimensions by up to 0.3° 21,57 . Each texture patch had on average 64 texture elements (textels); however, the actual number of textels varied between trials depending on their size. Each textel was randomly assigned a grey level and shrunk about its centroid by 20%, creating the appearance of ‘cracks’ between textels. The width of these cracks also varied as a function of surface slant, thus providing additional texture information. Texture surfaces were mapped onto a vertical virtual surface and rotated about the horizontal axis by the specific texture-defined angle, before a perspective projection consistent with the physical viewing geometry was applied. To isolate the disparity cue, a random-dot stimulus was generated using the same parameters as in the texture stimuli, i.e., an average of 64 dots with randomized grey level assignment. In the single-cue disparity and two-cue conditions, binocular disparity was calculated from the cyclopean view and applied to each vertex/dot based on the specific disparity-defined slant angle.
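Two of the texture steps can be sketched directly: generating the jittered seed grid and shrinking each cell about its centroid by 20%. Assumptions, all ours: an 8 × 8 grid (matching the average of 64 textels), spacing drawn uniformly within 1° ± 0.1° per stimulus, and each coordinate jittered independently and uniformly by up to ±0.3°. The Voronoi cells themselves could be built with a library such as `scipy.spatial.Voronoi`, which is not used here; function names are hypothetical.

```python
import random

def jittered_grid(n_side=8, rng=None, base_spacing=1.0, spacing_jitter=0.1, max_jitter=0.3):
    """Seed points (degrees) for a Voronoi texture: square grid, random spacing and jitter."""
    rng = rng or random.Random()
    spacing = base_spacing * (1.0 + rng.uniform(-spacing_jitter, spacing_jitter))
    points = []
    for row in range(n_side):
        for col in range(n_side):
            points.append((col * spacing + rng.uniform(-max_jitter, max_jitter),
                           row * spacing + rng.uniform(-max_jitter, max_jitter)))
    return points, spacing

def polygon_centroid(vertices):
    """Area centroid of a simple polygon (shoelace formula); `vertices` is a list."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)

def shrink_cell(vertices, fraction=0.2):
    """Shrink a cell about its centroid, leaving 'cracks' between neighbouring cells."""
    cx, cy = polygon_centroid(vertices)
    scale = 1.0 - fraction
    return [(cx + scale * (x - cx), cy + scale * (y - cy)) for x, y in vertices]

seeds, spacing = jittered_grid(rng=random.Random(0))
print(len(seeds), round(spacing, 3))
print(shrink_cell([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]))
```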

Surfaces were presented unilaterally (80% left and 20% right of fixation) inside a half-circle aperture (radius 6°) with a cosine edge profile to blur the appearance of depth edges. Stimuli were presented on a mid-grey background, surrounded by a grid of black and white squares (75% density) designed to provide an unambiguous background reference. In the stereoscopic conditions, observers could theoretically discriminate surface slant based only on the difference in depth at the top/bottom of a pair of stimuli. Similarly, in the texture-only condition, observers could make judgements based on the difference in textel density at the top/bottom of a pair of stimuli. To minimize the availability of these cues, disparity-defined position was randomized by shifting the surface relative to the fixation plane (0° disparity) by up to ± 10% of the total surface depth. Texture-defined position in depth, which corresponded to average textel size, was randomized for each stimulus presentation by changing the point spacing in the initial grid of points by up to ± 10% 21 .

We presented four cue conditions: 2× single-cue (texture and disparity) and 2× two-cue conditions (congruent and incongruent). Stimuli in the single-cue texture condition were presented monocularly (right eye), whereas all other stimuli were presented binocularly.

### Procedure

Observers performed a two-interval forced-choice discrimination task in which the reference and test stimuli were presented in randomized order (Fig. 3b). Each stimulus was presented for 500 ms with an inter-stimulus interval of 300 ms. Following the offset of the second stimulus, the fixation cross changed from white to black, prompting observers to indicate with a keypress which stimulus was more slanted. No duration limit was enforced for responses, but observers were encouraged to respond quickly. Following a response, the fixation cross changed back to white and a fixation period of 500 ms preceded the onset of the next trial. A method of constant stimuli was used to control the difference in slant between the reference and test stimuli. The MATLAB toolbox Psignifit 58 (http://psignifit.sourceforge.net/) was used to fit psychometric functions to the data. Sensitivity to slant was derived from the slope of the psychometric function and the point of subjective equality (PSE) from the threshold.
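The authors used Psignifit; the sketch below is not Psignifit's algorithm but a plain grid-search maximum-likelihood fit of a cumulative Gaussian, to show how a PSE and a sensitivity come out of such a fit. It assumes responses are coded as "test judged more slanted" against the test-minus-reference slant offset, takes sensitivity as 1/σ (one common convention), ignores lapses, and uses simulated expected counts; all names are hypothetical.

```python
import math

def cumulative_gaussian(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def neg_log_likelihood(offsets, k, n, mu, sigma):
    nll = 0.0
    for x, ki, ni in zip(offsets, k, n):
        p = min(max(cumulative_gaussian(x, mu, sigma), 1e-12), 1.0 - 1e-12)
        nll -= ki * math.log(p) + (ni - ki) * math.log(1.0 - p)
    return nll

def fit_pse_and_sensitivity(offsets, k, n):
    best = None
    for i in range(41):                  # mu from -5 to 5 deg in 0.25 steps
        mu = -5.0 + 0.25 * i
        for j in range(73):              # sigma from 2 to 20 deg in 0.25 steps
            sigma = 2.0 + 0.25 * j
            nll = neg_log_likelihood(offsets, k, n, mu, sigma)
            if best is None or nll < best[0]:
                best = (nll, mu, sigma)
    _, mu, sigma = best
    return mu, 1.0 / sigma               # PSE (deg), sensitivity (1/deg)

offsets = [-20.0, -14.3, -8.5, -2.8, 2.8, 8.5, 14.3, 20.0]  # test minus reference (deg)
n = [50] * len(offsets)
# Simulated observer with PSE = 2 deg and sigma = 8 deg (expected counts, no noise)
k = [ni * cumulative_gaussian(x, 2.0, 8.0) for x, ni in zip(offsets, n)]
pse, sensitivity = fit_pse_and_sensitivity(offsets, k, n)
print(f"PSE {pse:.2f} deg, sensitivity {sensitivity:.3f} per deg")
```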

In the congruent-cue condition, reference stimuli had consistent texture and disparity slants (S δ = S χ = 40°). We chose this slant angle because observers’ sensitivity to disparity and texture cues was similar at it (at larger angles, observers become relatively more sensitive to the texture cue 59 ). Ensuring similar cue reliabilities (i.e., a 1:1 reliability ratio) gave us the greatest potential to detect the improved performance associated with combination. Specifically, the maximum possible benefit for combining independent cues is a factor of √2 when the two cues have equal reliability; this benefit is smaller when the two cues differ in reliability.
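The √2 figure follows from the standard rule for combining independent cues: reliabilities (inverse variances) add, so sensitivities (1/σ) combine as a quadratic sum. A minimal sketch, with hypothetical function names, comparing the combined sensitivity with the better single cue:

```python
import math

def combined_sensitivity(s1, s2):
    """Quadratic-sum prediction for two independent cues (sensitivity = 1/sigma)."""
    return math.hypot(s1, s2)

def fusion_benefit(s1, s2):
    """Improvement of the combined cue over the better single cue."""
    return combined_sensitivity(s1, s2) / max(s1, s2)

print(round(fusion_benefit(1.0, 1.0), 3))  # 1.414: equal sensitivities give sqrt(2)
print(round(fusion_benefit(1.0, 0.5), 3))  # 1.118: a 2:1 sensitivity ratio gives less
```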

As we were testing the robustness of observers’ perception, we designed the stimulus in the incongruent-cue condition such that one cue was more reliable than the other. To achieve this, we took advantage of the fact that sensitivity to texture-defined slant increases with slant angle 59 . This allowed us to manipulate cue reliability, without changing aspects of the stimuli other than slant (i.e., we did not need to add noise or manipulate contrast, which might complicate comparisons between conditions). Specifically, for the incongruent condition, we combined a smaller disparity slant (S δ = 20°) with a larger texture slant (S χ = 50°), yielding a stimulus whose component cue elements differed in reliability (approximately 2:1 ratio). We chose a 2:1 reliability ratio for the incongruent case, as this (i) could be achieved while holding all other stimulus parameters constant between congruent and incongruent conditions (except slant angle), and (ii) was predicted by the model to produce robust behaviour. In addition to the combined conditions, single-cue conditions were included, for each of the slant angles used in the combined stimuli (i.e., S δ = [40°,20°], S χ = [40°,50°]). We also included a test stimulus with 0° texture and disparity slant. This was intended to be easily discriminable from the reference stimuli and thus provide a generalized measure of psychophysical performance by capturing the lapse rate of the observers. In addition, we presented six trials with reference stimuli selected at random at the start of each block to refresh observers’ familiarity with the task. Observers were regularly prompted to maintain fixation throughout the experiment.

In the congruent- and single-cue conditions, the test stimuli were defined by congruent and single cues, respectively, within a range of ± 20° of the reference stimulus (40°) over eight evenly spaced steps, i.e., ± [20.0,14.3,8.5,2.8]. For the incongruent-cue condition, the test stimuli were defined by congruent cues, within a range of ± 25° of the midpoint between the slants defined by the incongruent cues of the reference stimulus (35°) over eight evenly spaced steps, i.e., ± [25.0,17.8,10.7,3.6]. For participants who showed high precision in the incongruent condition during the initial familiarization stage, this range was adjusted to ± 14° to more closely assess their sensitivity. As the incongruent reference stimulus was compared against congruent-cue test stimuli, the PSE in the incongruent condition provides an assessment of the perceived shape of the incongruent stimulus in terms of congruent stimuli.

Before brain imaging/stimulation experiments, participants performed a familiarization session in the laboratory. This was used to introduce participants to viewing the stimuli in the stereoscope and ensure they could perform the slant discrimination task.

For the MRS experiment, participants took part in two further sessions. One session was used to acquire MRS measurements inside the MRI scanner while the participants were at rest (i.e., no active task was performed). The other session measured psychophysical performance on the slant discrimination task under the different experimental conditions. The two sessions were separated by 24–48 h and the order of sessions was counterbalanced across participants. For each condition, observers underwent two blocks of 214 trials. Condition order was randomized.

For the tDCS experiments, participants took part in three experimental sessions (sham, anodal or cathodal). Each session was separated by at least 36 h and the order of sessions was counterbalanced across participants. During the initial familiarization session, reference stimuli for the ipsilateral control trials were drawn at random from the pool of reference slants used in the main experiment. During stimulation sessions, the control reference slant was set to that which individual observers could discriminate at 80% performance. Calibration of the eye tracker was performed immediately before the onset of each block in tDCS sessions. Condition order was counterbalanced across stimulation sessions and subjects.

### Magnetic resonance spectroscopy

Magnetic resonance scanning was conducted on a 3T Siemens Prisma equipped with a 32-channel head coil. Anatomical T1-weighted images were acquired for spectroscopic voxel placement with an ‘MP-RAGE’ sequence. For detection of GABA, spectra were acquired using a macromolecule-suppressed MEGA-PRESS sequence: echo time = 68 ms; repetition time = 3000 ms; 256 transients of 2048 data points were acquired in 13 min of experiment time; a 14.28 ms Gaussian editing pulse was applied at 1.9 (ON) and 7.5 (OFF) p.p.m.; and 16 water-unsuppressed transients were acquired. Water suppression was achieved using variable power with optimized relaxation delays and outer volume suppression. Automated shimming followed by manual shimming was conducted to achieve approximately 12 Hz water linewidth.

Spectra were acquired from three locations: a target (V3B/KO) and two control (V1 and M1) voxels (30 × 30 × 20 mm) (Supplementary Figure 8a). The V3B/KO voxel was positioned in the right hemisphere, adjacent to the median line, and rotated in the sagittal and axial planes so as to align with the posterior surface of the brain, while preventing protrusion from the occipital lobe and limiting inclusion of the ventricles. The V1 voxel was placed medially in the occipital lobe, with its lower face aligned with the cerebellar tentorium, and positioned so as to avoid including the sagittal sinus and to ensure it remained within the occipital lobe. The M1 voxel was defined in the axial plane as being centred on the ‘hand knob’ area of the precentral gyrus and aligned to the upper surface of the brain in the sagittal and coronal planes. These locations are commonly used for defining corresponding target and control voxels in studies linking GABA to cognitive processes 15,60 .

Spectral quantification was conducted with GANNET 2.0 61 (Baltimore, MD, USA), a MATLAB toolbox designed for analysis of GABA MEGA-PRESS spectra, modified to fit a double Gaussian to the GABA peak. Individual spectra were frequency and phase corrected before subtracting ‘ON’ and ‘OFF’, resulting in the edited spectrum (Supplementary Figure 8b). The edited GABA peak was modelled as a double Gaussian (Supplementary Figure 8c), and values of GABA relative to water (GABA/H2O, with water modelled as a mixed Gaussian–Lorentzian) were produced in institutional units. The fitting residuals for water and GABA were divided by the amplitudes of their fitted peaks to produce normalized measures of uncertainty. These were combined in quadrature (square root of the sum of squares) to produce a combined measure of uncertainty for each measurement 62,63 . This combined fitting residual was relatively low across all participants for all voxel locations, from 3.8% to 9.4% (mean: 6.6% ± 0.2%).
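A minimal sketch of the combined uncertainty measure, assuming the two amplitude-normalized fitting residuals are combined in quadrature. The function name and the input values are hypothetical, and GANNET's own fit-error computation may differ in detail.

```python
import math

def combined_fit_error(gaba_residual, gaba_amplitude, water_residual, water_amplitude):
    """Normalize each fitting residual by its peak amplitude, then combine in quadrature."""
    gaba_rel = gaba_residual / gaba_amplitude
    water_rel = water_residual / water_amplitude
    return math.hypot(gaba_rel, water_rel)

print(f"{combined_fit_error(0.03, 1.0, 0.04, 1.0):.1%}")  # 5.0%
```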

To ensure that variation in GABA concentrations between subjects was not due to differences in overall structural composition within the spectroscopy voxels, we performed a segmentation of voxel content into GM, WM and cerebrospinal fluid (CSF). This was then used to apply a CSF correction 64 to the GABA/H2O measurements with the following equation:

$$C_{\mathrm{corr}} = \frac{C_{\mathrm{uncorr}}}{f_{\mathrm{GM}} + f_{\mathrm{WM}}}$$

where $$C_{\mathrm{corr}}$$ and $$C_{\mathrm{uncorr}}$$ are the CSF-corrected and -uncorrected GABA concentrations, respectively, and $$f_{\mathrm{GM}}$$ and $$f_{\mathrm{WM}}$$ are the proportions of GM and WM within the voxel. Segmentation was performed using the Statistical Parametric Mapping toolbox for MATLAB (SPM12, http://www.fil.ion.ucl.ac.uk/spm/). The DICOM of the voxel location was used as a mask to calculate the volume of each tissue type (GM, WM and CSF) for both visual and sensorimotor voxels.
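The correction defined by the where-clause (corrected and uncorrected concentrations, GM and WM fractions) can be sketched under the assumption that it takes the common form of dividing the uncorrected concentration by the tissue fraction f_GM + f_WM, i.e. by 1 − f_CSF. Function name and values are hypothetical.

```python
def csf_correct(c_uncorr, f_gm, f_wm):
    """CSF-corrected concentration: divide by the tissue (GM + WM) fraction of the voxel."""
    tissue = f_gm + f_wm
    if not 0.0 < tissue <= 1.0:
        raise ValueError("GM + WM fraction must be in (0, 1]")
    return c_uncorr / tissue

print(csf_correct(1.5, f_gm=0.5, f_wm=0.25))  # 2.0 (voxel is 25% CSF)
```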

### Transcranial direct current stimulation

Direct current stimulation was applied using a pair of conductive rubber electrodes (3 × 3 cm stimulating electrode, 5 × 5 cm reference electrode) held in saline-soaked synthetic sponges and delivered by a battery-driven constant current stimulator (neuroConn, Ilmenau, Germany). For seven participants, functional anatomical scans were used to identify areas V3B/KO in the right hemisphere and then neuronavigational equipment (Brainsight 2, Montreal, Canada) was used to locate the closest point to the centre of mass of this region on subjects’ scalp (Supplementary Figure 9a). The visual cortex electrode was then placed at this location. For the remaining subjects, the average location of this point, relative to positions of the international 10–20 electroencephalography system, was used to place the visual cortex electrode. For all subjects, the reference electrode was placed at position Cz. In the anodal and cathodal conditions, the tDCS current (1 mA) ramped up and down (20 s) before and after continuous application for 20 min. In the sham condition, the current was ramped up then immediately ramped down.

For participants with V3B/KO anatomically localized, FreeSurfer (https://surfer.nmr.mgh.harvard.edu) was used to reconstruct head models from anatomical scans and SimNIBS (http://simnibs.de) used to simulate electric field density resulting from stimulation (Supplementary Figure 9b-d). Simulations indicated that current density was largely unilaterally localized and peaked around V3B/KO.

### Proscriptive integration model

Each primary input (unimodal unit) to the model is specified by its intensity (A) and its slant angle in radians (θ). The slant receptive field for each primary unit was modelled as a one-dimensional von Mises distribution

where θcue_pref indicates cue slant preference. Arbitrarily, θcue_pref takes n = 37 evenly distributed values between −π/2 and π/2, and the receptive field size, kcue, was chosen to be 2, producing a slant tuning bandwidth of approximately 10 degrees. The response of each primary unit was assumed to scale linearly with cue intensity, Acue.

Combination units in the model were generated by drawing input from all possible pairs of unimodal units, as denoted by a subscript (δ or χ), such that there are 37 × 37 = 1369 combination units. Based on previous empirical evidence 27 , we assume that combination units perform a summation of their inputs that increases monotonically, but sublinearly, with stimulus intensity

where E(θδ, θχ) denotes the activity of the combination unit with disparity slant preference θδ and texture slant preference θχ. The nonlinearity models sublinear response functions of the combination units, which could be mediated by means of synaptic depression or normalization 28 .

The activity of combination units is then passed to a one-dimensional layer of output units. Output units receive input from combination units along the incongruent unimodal preference diagonal with readout weights defined by a cosine, which peaks at unimodal cue preference

where F i denotes the response of the output unit and c denotes a temperature offset that models inhibitory dominance of sensory responses 30 . This offset was assumed to be 0.05 for all simulations.

Activity was converted to firing rate by thresholding negative activity values to zero. The height and position of the peak(s) were used to assess estimate reliability and (slant) position 29 . Unimodal responses, for comparison, were generated by setting one of the cue intensities to zero.

For the simulation of Fig. 2c, cue intensities of A δ = 1 and A χ = 8 were used to achieve a 1:3 ratio of sensitivity to match previous work 7 . For the simulations in Fig. 4, stimulus intensities of A δ = A χ = 1 (single and congruent) and A δ = 1, A χ = 4 (incongruent) were used to match the sensitivity ratios engineered for the behavioural stimuli (1:1 congruent; 1:2 incongruent). To simulate variable suppression, an additional parameter (β) was used to attenuate the negative readout weights. For each simulation, β was set to a value drawn at random from a Gaussian distribution (mean = 0.75, σ = 0.1). To simulate individual variability in sensitivity to cues, cue intensity was drawn from a Gaussian distribution (mean = A, σ = 0.1). To compare between simulations, we calculated the reliability of the single/combined cue signals relative to one another. For the simulation of Supplementary Figure 1, we systematically varied the variability of the cue intensities over σ = [0, 0.25].
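The per-simulation variability described above can be sketched as follows. Function and field names are hypothetical and the network itself is not implemented; each simulated observer simply gets its own β and cue intensities, drawn from Gaussians with the stated means and SDs. The `intensity_sd` argument is the quantity swept over [0, 0.25] for Supplementary Figure 1.

```python
import random

BETA_MEAN, BETA_SD = 0.75, 0.1  # attenuation of the negative readout weights

def draw_simulation_parameters(a_disparity, a_texture, rng, intensity_sd=0.1):
    """Draw one simulated observer's suppression factor and cue intensities."""
    return {
        "beta": rng.gauss(BETA_MEAN, BETA_SD),
        "a_disparity": rng.gauss(a_disparity, intensity_sd),
        "a_texture": rng.gauss(a_texture, intensity_sd),
    }

rng = random.Random(1)  # fixed seed so the draws are reproducible
incongruent_runs = [draw_simulation_parameters(1.0, 4.0, rng) for _ in range(1000)]
mean_beta = sum(r["beta"] for r in incongruent_runs) / len(incongruent_runs)
mean_texture = sum(r["a_texture"] for r in incongruent_runs) / len(incongruent_runs)
print(f"mean beta {mean_beta:.3f}, mean texture intensity {mean_texture:.3f}")
```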

In the tDCS experiments, we observed that sensitivity for congruent cues in the sham conditions was significantly higher than the maximum bound for fusion (that is, the quadratic sum of the single-cue sensitivities; offline, t11 = 4.3, P = 0.001; online, t11 = 3.7, P = 0.003). This is likely because disparity and texture were not fully isolated in the single-cue conditions, so that these ‘latent’ cues reduced sensitivity to the ‘single’ cue (see Ban et al. 10 for a discussion of this issue). Thus, to simulate the effects of tDCS with the model (Fig. 5e, f), we first simulated performance in the sham conditions by including a latent cue in the single-cue simulations. The intensity of the latent cue was fit using the relative difference between single- and congruent-cue sensitivity (Fig. 5a, c) and held constant for both single-cue simulations. Having fit the model to behaviour in the sham condition, we simulated tDCS by varying two partially free parameters, which independently scaled the strength of the positive and negative cosinusoidal readout weights by a factor between zero and one. These parameters model the effects of tDCS on GABAergic (inhibitory) and glutamatergic (excitatory) neurotransmission. We first fit these parameters to the data from the congruent-cue conditions (Fig. 5a, c). We then applied the now-fixed parameters to simulate performance in the incongruent conditions to test their generalizability (Fig. 5b, d).
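The fusion bound is the quadratic sum of the single-cue sensitivities, $\sqrt{s_\delta^2 + s_\chi^2}$. A minimal sketch, assuming sensitivity is expressed as $1/\sigma$ of each single-cue psychometric function (the exact definition used in the study is not stated here):

```python
import math

def fusion_bound(s_disparity, s_texture):
    """Quadratic-sum (maximum-likelihood) bound on combined-cue sensitivity."""
    return math.hypot(s_disparity, s_texture)

def exceeds_bound(s_congruent, s_disparity, s_texture):
    """True if a measured congruent-cue sensitivity lies above the fusion bound."""
    return s_congruent > fusion_bound(s_disparity, s_texture)
```

A congruent sensitivity above this bound, as reported for the sham conditions, is what motivates the latent-cue account.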

To simulate perceptual rivalry, the response of the output units $X_i$ is driven by activity of constant strength $F_i$ from the combination layer, and the output units inhibit one another (mutual inhibition, $\gamma$) through lateral connections with weights defined by a half-wave rectified cosine function. The dynamics of the output units are further governed by slow adaptation ($\alpha$) and stochastic variability ($\sigma$):

where $S[X_i]$ denotes a sigmoidal transformation (a Naka–Rushton function) of the activity of $X_i$, $W$ corresponds to Gaussian noise and $A_\theta$ represents adaptation:

For the simulation in Fig. 6a, b, cue intensities of $A_\delta = 1$ and $A_\chi = 1$ were used to produce the constant activity $F(\theta)$ in the combination layer. Time constants of $\tau = 1$ and $\tau_A = 125$ defined the temporal dynamics of inhibition and adaptation, respectively; the inhibition and adaptation strengths were $\gamma = \alpha = 7$; and the SD of the noise was $\sigma = 0.005$.
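The model equations are not reproduced in this copy, so the following is only a hedged sketch of the generic mechanism (mutual inhibition plus slow adaptation plus noise), using the stated values $\tau = 1$, $\tau_A = 125$, $\gamma = \alpha = 7$ and $\sigma = 0.005$. Everything else is assumed: the cosine lateral connections are collapsed into a single cross-inhibition term between two populations, the Naka–Rushton exponent and semi-saturation constant are illustrative, and the Euler step size is our choice.

```python
import numpy as np

def naka_rushton(x, n=2.0, c50=0.5, r_max=1.0):
    """Naka-Rushton sigmoid of rectified input (n, c50, r_max are assumed values)."""
    x = np.maximum(x, 0.0)
    return r_max * x**n / (x**n + c50**n)

def simulate_rivalry(F=(1.0, 1.0), tau=1.0, tau_a=125.0, gamma=7.0, alpha=7.0,
                     sigma=0.005, dt=0.1, t_max=2000.0, seed=0):
    """Euler simulation of two output populations that inhibit each other and adapt.

    Hypothetical reduced model: one population per cue interpretation.
    """
    rng = np.random.default_rng(seed)
    F = np.asarray(F, dtype=float)       # constant drive from the combination layer
    n_steps = int(round(t_max / dt))
    X = np.zeros(2)                      # output-unit activity
    A = np.zeros(2)                      # adaptation state
    trace = np.empty((n_steps, 2))
    for step in range(n_steps):
        S = naka_rushton(X)
        inhibition = gamma * S[::-1]     # each population is inhibited by the other
        dX = (-X + F - inhibition - alpha * A) / tau
        dA = (-A + S) / tau_a            # slow adaptation tracks each unit's own output
        X = X + dt * dX + sigma * np.sqrt(dt) * rng.standard_normal(2)
        A = A + dt * dA
        trace[step] = naka_rushton(X)
    return trace

trace = simulate_rivalry()
dominant = np.argmax(trace, axis=1)      # dominant population (ties resolve to 0)
n_switches = int(np.count_nonzero(np.diff(dominant)))
```

Strong cross-inhibition makes the symmetric state unstable, so noise selects a winner; adaptation of the winner then slowly erodes its dominance, which is the cycle the text attributes to perceptual alternation.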

Maximum likelihood predictions in Figs. 1c, d and 3c were simulated using the following equations:
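The equations referred to here are missing from this copy. Purely as a sketch, assuming the standard reliability-weighted (maximum-likelihood) cue-combination rule, each cue $i$ is weighted by $w_i = \sigma_i^{-2} / \sum_j \sigma_j^{-2}$, the combined estimate is $\sum_i w_i \hat{s}_i$, and the combined SD is $(\sum_i \sigma_i^{-2})^{-1/2}$:

```python
import numpy as np

def mle_prediction(estimates, sigmas):
    """Reliability-weighted cue combination (standard MLE rule, assumed here)."""
    est = np.asarray(estimates, dtype=float)
    sig = np.asarray(sigmas, dtype=float)
    reliability = 1.0 / sig**2
    weights = reliability / reliability.sum()
    combined_estimate = float(np.sum(weights * est))
    combined_sigma = float(np.sqrt(1.0 / reliability.sum()))
    return combined_estimate, combined_sigma, weights

# Conflicting cues: the more reliable cue (sigma = 1) pulls the estimate towards 0.
est, sd, w = mle_prediction([0.0, 10.0], [1.0, 2.0])
```

The combined SD here is the reciprocal of the quadratic-sum sensitivity used for the fusion bound.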

## References

1. Fechner G. Elements of Psychophysics. Holt, Rinehart & Winston; 1860/1966.

2. Gescheider GA. Psychophysics: The Fundamentals. Mahwah, New Jersey: Lawrence Erlbaum Associates; 1997.

3. Peirce JW. PsychoPy – Psychophysics software in Python. Journal of Neuroscience Methods. 2007;162(1–2):8–13.

4. Peirce JW. Generating stimuli for neuroscience using PsychoPy. Frontiers in Neuroinformatics. 2009;2:10. doi:10.3389/neuro.11.010.2008

5. Prins N, Kingdom FAA. Palamedes: MATLAB routines for analyzing psychophysical data. 2009. http://www.palamedestoolbox.org.

6. Wichmann FA, Hill NJ. The psychometric function: I. Fitting, sampling and goodness-of-fit. Perception and Psychophysics. 2001a;63:1293–1313.

7. Wichmann FA, Hill NJ. The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception and Psychophysics. 2001b;63:1314–1329.


Consistent with our reasoning, we found that sensitivity in the disparity- and texture-cue conditions was not affected by the application of tDCS (disparity anodal: t11 = 1.32, P = 0.21; disparity cathodal: t11 = 0.58, P = 0.58; texture anodal: t11 = 0.63, P = 0.54; texture cathodal: t11 = 1.08, P = 0.30; Fig. 5a). In the sham condition, we observed the expected behavioural benefit of combination: that is, performance in the congruent-cue condition was significantly higher than in the single-cue conditions (disparity, t11 = 7.57, P = 1.1 × 10⁻⁵, Cohen’s d = 2.18; texture, t11 = 5.67, P = 1.4 × 10⁻⁴, Cohen’s d = 1.63).


Importantly, however, the benefit of combining cues in the congruent and incongruent conditions was reduced by the application of tDCS. In particular, we found that cathodal tDCS reduced sensitivity to congruent and incongruent cues (congruent, t11 = 5.17, P = 3.0 × 10⁻⁴, Cohen’s d = 1.49; incongruent, t11 = 2.58, P = 0.02, Cohen’s d = 0.74), whereas the lower performance under anodal stimulation was not statistically significant (congruent, t11 = 1.33, P = 0.21; incongruent, t11 = 0.56, P = 0.61; Fig. 5a, b). This indicates that perturbing cortical excitability around V3B/KO disrupted observers’ ability to integrate disparity and texture cues, while leaving sensitivity to the individual cues unaffected.

tDCS modulates the excitability of neural tissue during stimulation (online effects) and after the offset of stimulation (offline effects) 35. Although the modulatory effects of online and offline tDCS are similar, pharmacological work indicates that the neurophysiological mechanism producing this modulation may differ 40. We therefore tested whether cue integration is also affected by online tDCS, repeating the experiment in a new set of participants and now measuring behavioural performance during stimulation. We found the same pattern of results: sensitivity to congruent/incongruent combined cues was significantly reduced during cathodal tDCS (congruent, t11 = 2.69, P = 0.02, Cohen’s d = 0.78; incongruent, t11 = 2.98, P = 0.01, Cohen’s d = 0.86; Fig. 5c, d). These results show that cue integration is affected by both online and offline tDCS, and they replicate the main effects of tDCS in a second cohort of participants.
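The reported effect sizes appear consistent with Cohen's $d_z$ for paired samples, $d_z = \bar{D}/s_D = t/\sqrt{n}$ with $n = 12$ (for example, $7.57/\sqrt{12} \approx 2.19$). This is our inference, not something the text states; the sketch below computes both quantities from paired data and checks the identity.

```python
import numpy as np

def paired_t_and_dz(a, b):
    """Paired t statistic and Cohen's d_z (mean difference / SD of differences)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = diff.size
    sd = diff.std(ddof=1)
    t = diff.mean() / (sd / np.sqrt(n))
    dz = diff.mean() / sd
    return t, dz

def dz_from_t(t, n):
    """Recover d_z from a paired t statistic and sample size."""
    return t / np.sqrt(n)

# Check against the reported (t, d) pairs, n = 12 participants.
reported = [(7.57, 2.18), (5.67, 1.63), (5.17, 1.49), (2.58, 0.74), (2.69, 0.78), (2.98, 0.86)]
recovered = [round(float(dz_from_t(t, 12)), 2) for t, _ in reported]
```

Small discrepancies in the second decimal are expected because the published t values are themselves rounded.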

To assess whether tDCS affected observers’ bias, we analysed differences in the point of subjective equality between stimulation conditions. We found marginally significant effects (repeated-measures analysis of variance (RM ANOVA), offline: F(2, 22) = 3.22, P = 0.06; online: F(2, 22) = 2.72, P = 0.09); however, the differences were small and in opposite directions for online and offline stimulation. Moreover, the largest difference between online and offline stimulation lies between the sham conditions, which provide the control baseline (Supplementary Figure 4). We therefore interpret these results as incidental.

## Modelling perceptual rivalry

Having established that the imaging data are consistent with the model describing proscription, we make a final remark about its usefulness for understanding other perceptual phenomena. When conflicting or ambiguous images are presented to a viewer, as in binocular rivalry or when viewing a Necker cube, viewers typically experience perceptual alternation over time. Traditionally, the study of such percepts has been kept quite separate from models of routine perceptual processing 42, 43. By contrast, here we show that proscription provides a natural basis that accommodates both routine perceptual estimates and alternating perceptual interpretations.

So far, we have considered robust perception in conflict situations where one cue is considerably more reliable than the other. We now consider the case in which cues conflict but are equally reliable. In this case there is no principled way to select one cue over the other, and the result is typically bistable perception 22, 23. The proscriptive integration model accommodates this behaviour naturally: when conflicting cues of similar reliability are simulated, a bimodal population response is observed in the output (Fig. 6a, time 0).

Simulation of perceptual rivalry with the proscriptive integration model. a Perceptual rivalry produced by the model: time 0 shows the likelihood function represented in the output layer when two incongruent cues of equal reliability are simulated. Mutual inhibition between neuronal representations (and internal noise) results in dominance at time 1. Adaptation of the units encoding the dominant representation results in the gradual decay and subsequent re-emergence of the non-dominant representation at time 2. b Probability of the simulated slant angles as a function of time, showing the typical dynamics of perceptual alternation. c Psychophysical data taken from van Ee et al. 23, showing the difference between cue estimates as a function of cue conflict (see Online Methods for a detailed description of data extraction and reanalysis). The blue line is a fit of the proscriptive model, and the shaded area indicates the experience of perceptual rivalry. d An example subject from van Ee et al. 23 for whom one cue dominates perception and rivalry is not experienced. The blue line (barely visible) shows the fit of the proscriptive model in which the two cues are assigned very different reliability values, such that one completely dominates the other.


We can model perceptual alternation by assuming a combination of mutual inhibition and adaptation between the competing, bimodal representations 19, 42. The dynamics of adaptation allow us to simulate the temporal dynamics of perceptual bistability (Fig. 6b). Specifically, mutual inhibition between neuronal representations (and internal noise) initially results in the dominance of one cue over the other (Fig. 6a, time 1). Through adaptation, however, the activity of the units encoding the dominant representation gradually declines and is followed by the re-emergence of the non-dominant representation (Fig. 6a, time 2). This cycle underlies the rate of switching between perceptual estimates (Fig. 6b).

Given that the mechanism of mutual inhibition and adaptation is well established, it is perhaps not surprising that it can be used here to produce bistability. However, by using this mechanism to extend the proscriptive integration model, we were able to capture the behaviour of observers viewing conflicting cues to slant, as measured in a previous study 23. Specifically, we simulated a range of cue conflicts and assessed the degree of bistability in the model’s estimates. This allowed us to capture human psychophysical performance, showing the situations in which bistability is experienced (Fig. 6c, shaded blue region). The previous work also reported cases in which participants did not experience bistability (Fig. 6d). We captured this in the model on the basis that such observers attribute unequal reliability to the presented cues, so that one cue always dominates.

## Eye tracking data screening

Before analysis, eye movement data were screened to remove noisy and/or spurious recordings. Owing to the bespoke experimental setup (i.e., recording eye position from behind one-way mirrors in a haploscope) and the time-sensitive nature of brain stimulation (i.e., leaving insufficient time to redo or restart blocks), the eye tracker would occasionally fail to track participants’ eyes for an entire block. Of the 28 blocks (19%) that were omitted from the analysis, 27 had < 1% of data collected. We omitted the remaining block because of (physiologically unlikely) variability in eye position signals that indicated noisy tracking performance. Finally, before averaging trials, we removed points exceeding the radius of the stimulus (4.5°).
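A minimal sketch of the two automatic screening rules (drop blocks with < 1% valid data; drop samples beyond the 4.5° stimulus radius), assuming lost samples are coded as NaN and positions are in degrees relative to the stimulus centre. The manual exclusion of the one block with implausible variability is not modelled, and `screen_block` is a hypothetical name.

```python
import numpy as np

def screen_block(x_deg, y_deg, min_valid_fraction=0.01, max_radius_deg=4.5):
    """Return the retained (x, y) samples for one block, or None if the block is omitted."""
    x = np.asarray(x_deg, dtype=float)
    y = np.asarray(y_deg, dtype=float)
    valid = np.isfinite(x) & np.isfinite(y)        # lost samples assumed coded as NaN
    if valid.mean() < min_valid_fraction:          # tracker failed for (almost) the whole block
        return None
    keep = valid.copy()
    # Drop samples farther from the stimulus centre than the stimulus radius.
    keep[valid] = np.hypot(x[valid], y[valid]) <= max_radius_deg
    return x[keep], y[keep]
```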

## Fit Poisson distribution to data (histogram + line)

I need to do exactly what @interstellar asked here ("Fit poisson distribution to data"), but in R rather than MATLAB.

So I created a barplot of my observed values, and I just need to fit a Poisson distribution to it.

The barplot I created is the following:

I am still new to R, so how could I use the fitdist function (or something else) to draw a line over my barplot?

Any help would be really appreciated.

I have worked something out, but I am not 100% sure it is correct:

However, the curve is not smooth.

## Experiment 2. Dependence of the Facial Identity After-Effect on Orientation Structure

Key psychophysical support for the psychological validity of the face space model (described in section “Introduction”) comes from the observation that identity after-effects are strongest between faces that belong to the same identity axis (Leopold et al., 2001; Rhodes and Jeffery, 2006). That perception of upright faces seems to rely disproportionately on horizontal information suggests that the horizontal orientation band carries most of the relevant cues to facial identity. If so, identity after-effects should be driven preferentially by horizontal rather than vertical image structure. Experiment 2 addressed this issue directly by measuring identity after-effects when adapting faces contained broadband, horizontal, or vertical information.

### Materials and Methods

#### Subjects

One of the authors (Steven C. Dakin) and two observers (DK and JAG) naïve to the purpose of the experiment (all wearing optical correction as necessary) participated in the experiment. DK and JAG provided their written informed consent prior to participation. All were experienced psychophysical observers. They familiarized themselves with the two test faces by passively viewing them for a period of at least 15 min before commencing testing. The protocol was approved by the faculty ethics committee.

#### Stimuli

We obtained full-frontal, uniformly lit digital photographs of two male subjects and manually located 31 key-points (Figure 3) on these images. Faces were masked from the background and normalized to have equal mean luminance. They were scaled and co-aligned with respect to the center of the eyes prior to morphing. We generated morphed versions of these images using custom software written in MATLAB. Specifically, for a given image and a set of original and transformed key-points, we used the original key-points to generate a mesh over the face using Delaunay triangulation (i.e., we computed the unique set of triangles linking the key-points such that no key-point lies inside the circumcircle of any triangle) and then used the same point-to-point correspondences with the transformed key-points to generate a set of transformed triangles (Figure 3). We used the original triangles as masks to cut out the corresponding image regions from the face and stretched these into registration with the transformed triangles using MATLAB’s built-in 2D bilinear interpolation routine (interp2). The sum of all these stretched triangles is the morphed face.
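Stretching one triangle into registration with another amounts to an affine map fixed by the three vertex correspondences. A minimal sketch of that step, assuming the triangulation is already available (for example, `scipy.spatial.Delaunay(points).simplices` returns vertex-index triples); the function names are hypothetical, and the pixel resampling done with `interp2` in the original is not shown.

```python
import numpy as np

def triangle_affine(src_tri, dst_tri):
    """2x3 affine matrix M mapping the three src vertices onto the three dst vertices."""
    src = np.hstack([np.asarray(src_tri, dtype=float), np.ones((3, 1))])  # rows [x, y, 1]
    dst = np.asarray(dst_tri, dtype=float)                                # rows [x', y']
    # Solve src @ M.T = dst for the six affine coefficients.
    return np.linalg.solve(src, dst).T

def apply_affine(M, points):
    """Map an (N, 2) array of points through the 2x3 affine matrix M."""
    pts = np.asarray(points, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M.T

src = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
dst = [[1.0, 1.0], [3.0, 1.0], [1.0, 4.0]]
M = triangle_affine(src, dst)
```

For inverse warping, one would instead compute `triangle_affine(dst, src)` and sample the source image at the mapped coordinates of each destination pixel.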

Figure 3. The locations of the 31 key-points used for morphing, superimposed on an example “morphed-average” face from our set.

To generate morphs intermediate between the identities of the two faces we first calculated a weighted average of the key-point locations of the two faces:

where $w$ is the weight given to face #1. We then morphed each face ($I_1$ and $I_2$) into registration with the new key-points (giving two warped images) and generated the final image by performing a weighted average (in the image domain) of these two images:
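The two equations are missing from this copy. As a sketch, assuming both averages are linear in $w$ (weight $w$ on face #1, $1 - w$ on face #2), the key-point average and the final cross-dissolve could be written as below. The warp that brings each face into registration with the averaged key-points is assumed to have been done already (for example, per triangle as above).

```python
import numpy as np

def average_keypoints(kp1, kp2, w):
    """Weighted key-point average; w is the weight given to face #1 (assumed linear)."""
    return w * np.asarray(kp1, dtype=float) + (1.0 - w) * np.asarray(kp2, dtype=float)

def cross_dissolve(warped1, warped2, w):
    """Weighted image-domain average of the two faces after warping to the new key-points."""
    return w * np.asarray(warped1, dtype=float) + (1.0 - w) * np.asarray(warped2, dtype=float)

# The seven test morph levels used in the experiment.
MORPH_LEVELS = np.linspace(0.2, 0.8, 7)
```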

In the experiment we used test faces generated with seven values of w from 0.2 to 0.8 in steps of 0.1. Prior to presentation or filtering, we equated the RMS contrast in all SF bands of the stimulus. This ensured that all unfiltered face stimuli had identical power spectra. Examples of the unfiltered/broadband stimuli are shown in Figure 4A. Filtering methods were identical to those described above: face information was restricted to a Gaussian range of orientation energy (σ = 14°) centered on either horizontal or vertical orientation.
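A minimal sketch of orientation filtering with a Gaussian profile over orientation (σ = 14°), applied in the Fourier domain. Assumptions: orientation refers to image structure (0° = horizontal, 90° = vertical), so a Fourier component at angle φ is assigned the structure orientation φ + 90°; the DC term is kept; and no spatial-frequency weighting is applied. This is an illustration, not the authors' filter code.

```python
import numpy as np

def orientation_filter(img, center_deg, sigma_deg=14.0):
    """Keep image structure near `center_deg` (0 = horizontal, 90 = vertical)."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # A Fourier component at angle phi encodes image structure oriented at phi + 90 deg.
    structure_deg = np.degrees(np.arctan2(fy, fx)) + 90.0
    # Signed angular distance on the 180-degree orientation circle.
    d = np.mod(structure_deg - center_deg + 90.0, 180.0) - 90.0
    mask = np.exp(-d**2 / (2.0 * sigma_deg**2))
    mask[0, 0] = 1.0  # keep the DC term (mean luminance)
    # Take the real part (discards tiny imaginary residue, e.g. from Nyquist terms).
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))

# Horizontal stripes: intensity varies along y only.
yy = np.arange(64)[:, None] * np.ones((1, 64))
stripes = np.cos(2 * np.pi * 4 * yy / 64)
```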

#### Procedure

On unadapted trials, observers were presented with a central fixation marker (200 ms) followed by a single morphed face stimulus that remained on the screen for 1000 ms. Observers then made a categorization response (using the computer keyboard) as to whether the face appeared more like face #1 or face #2. Responses were not timed, but observers were encouraged to respond promptly to reduce overall testing duration. On adapted trials, the initial fixation marker was followed by an adapting face stimulus that remained on the screen either for 30 s on the first trial (to build up adaptation) or for 5 s on subsequent trials. To avoid retinal adaptation, observers tracked a fixation marker that moved up and down the vertical midline of the face (± 0.6° from the center) during the adaptation phase. The adapting face stimulus was always either 100% face #1 or 100% face #2, and in three different blocked conditions was either (a) unfiltered or restricted to (b) horizontal or (c) vertical information. Note that the test was always a broadband face.

There was a total of seven conditions (no adaptation, plus the three adapting conditions with each of faces #1 and #2). The order of testing was randomized for each subject. Performance in each condition was evaluated in a run of 56 trials: eight trials at each of the seven stimulus levels (w = 0.2–0.8 in steps of 0.1). At least two runs were conducted for each subject, giving a total of at least 784 trials per subject.

Experiments were run under the MATLAB programming environment incorporating elements of the PsychToolbox (Brainard, 1997). Stimuli were presented on a CRT monitor (LaCie Electron Blue 22) fitted with a Bits++ box (Cambridge Research Systems) operating in Mono++ mode to give true 14-bit contrast accuracy. The display was calibrated with a Minolta LS110 photometer, then linearized using a look-up table, and had a mean (background) and maximum luminance of 50 and 100 cd/m², respectively. The display was viewed at a distance of 110 cm. Face stimuli (adapters and test) were 8.5 cm wide by 11 cm tall, subtending 4.4 × 5.7° of visual angle.
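The stated visual angles follow from the stimulus size and viewing distance via $\theta = 2\arctan(s / 2d)$. A short worked check (the helper name is ours):

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Full visual angle subtended by an object of a given size at a given distance."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

width_deg = visual_angle_deg(8.5, 110.0)    # about 4.43 deg
height_deg = visual_angle_deg(11.0, 110.0)  # about 5.72 deg
```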

#### Data analyses

The probability that a subject categorized a given broadband test face as more like face #2, as a function of morph level (from face #1 to face #2), was fitted with a cumulative Gaussian function to give the point of subjective equality (PSE or bias, i.e., the morph level leading to a 50% probability that the stimulus was categorized as face #2) and the precision (the slope parameter of the best-fitting cumulative Gaussian, which is equivalent to the stimulus level eliciting 82% correct performance). These parameters were bootstrapped (1024 bootstrap resamples; Efron and Tibshirani, 1993) to yield 95% percentile-based confidence intervals, which were used to derive PSE error bars (Figures 4D,G,J). Specifically, by assuming binomially distributed error in subjects’ responses (e.g., 8/10 “more like face #2”) at each morph level, we resampled the data to generate a series of new response rates across morph levels, which we fitted with a psychometric function to yield a new estimate of the PSE. By repeating this procedure, we obtained a distribution of PSEs from which we computed confidence intervals.
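A minimal sketch of this bootstrap, assuming a maximum-likelihood fit of $P(\text{face \#2}) = \Phi((x - \mu)/\sigma)$ by grid search and resampling of each morph level's response count from a binomial with the observed proportion. The grid ranges, the erf approximation and the function names are our choices (NumPy only), not the authors' MATLAB code.

```python
import numpy as np

def norm_cdf(z):
    """Standard normal CDF via the Abramowitz & Stegun 7.1.26 erf approximation (|error| < 1.5e-7)."""
    x = np.abs(z) / np.sqrt(2.0)
    t = 1.0 / (1.0 + 0.3275911 * x)
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))))
    erf = 1.0 - poly * np.exp(-x * x)
    return 0.5 * (1.0 + np.sign(z) * erf)

def fit_cum_gauss(levels, n_yes, n_trials, mus, sigmas):
    """Grid-search maximum-likelihood fit; returns (mu, sigma) = (PSE, slope parameter)."""
    p = norm_cdf((levels[None, None, :] - mus[:, None, None]) / sigmas[None, :, None])
    p = np.clip(p, 1e-9, 1.0 - 1e-9)
    ll = (n_yes * np.log(p) + (n_trials - n_yes) * np.log(1.0 - p)).sum(axis=-1)
    i, j = np.unravel_index(np.argmax(ll), ll.shape)
    return mus[i], sigmas[j]

def bootstrap_pse(levels, n_yes, n_trials, n_boot=1024, seed=0):
    """PSE plus a 95% percentile interval from binomial resampling of each morph level."""
    rng = np.random.default_rng(seed)
    mus, sigmas = np.linspace(0.0, 1.0, 201), np.linspace(0.02, 0.5, 49)
    pse, _ = fit_cum_gauss(levels, n_yes, n_trials, mus, sigmas)
    p_obs = n_yes / n_trials
    boot = np.empty(n_boot)
    for b in range(n_boot):
        boot[b], _ = fit_cum_gauss(levels, rng.binomial(n_trials, p_obs), n_trials, mus, sigmas)
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return pse, (lo, hi)

# Hypothetical counts: 16 trials at each of the seven morph levels.
levels = np.linspace(0.2, 0.8, 7)
n_trials = np.full(7, 16)
n_yes = np.array([1, 2, 5, 8, 11, 14, 15])
```

Calling `bootstrap_pse(levels, n_yes, n_trials)` returns the PSE and its percentile interval; a finer grid or a gradient-based optimiser would give more precise estimates.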

### Results and Discussion

Figures 4B–J show the results of this experiment. Each data point (Figures 4B,C,E,F,H,I) is the probability that a given observer classed a stimulus at some morph level (between faces #1 and #2) as looking more like face #2. Solid lines are the fitted psychometric functions used to estimate a PSE (the morph level at which that observer was equally likely to categorize the stimulus as either face). PSEs are plotted in the bar graphs to the right (Figures 4D,G,J). With no adaptation (black/white points, curves and bars), all curves were centered near the 50:50 morph level, indicating that all subjects were equally likely to categorize an equal mix of faces #1 and #2 as face #1 or #2.

Adapting to a broadband version of face #1 (Figures 4B,E,H, purple points and curves) shifted curves leftwards, and PSEs (Figures 4D,G,J, purple bars) fell below 0.5, indicating that a stimulus needed to contain < 40% of face #2 to be equally likely to be classed as face #1 or #2. When subjects adapted to a broadband version of face #2 (Figures 4C,F,I, purple points and curves), the function shifted rightwards and the PSEs (Figures 4D,G,J, purple bars) were greater than 0.5, indicating that subjects now needed the morph to contain > 60% of face #2 for it to be equally likely to be classed as face #1 or #2. This is the standard identity after-effect: adapting to a given face pushes the subsequent discrimination function towards the adapted end of the morph continuum. The size of our effects is comparable to previous reports (Leopold et al., 2001).

Data from the horizontally and vertically filtered adapter conditions are shown as red and green data points, curves and bars, respectively. Adapting to a horizontally filtered face elicited a shift in the psychometric function for subsequent discrimination that was almost indistinguishable from the effect of adapting to the broadband face (compare purple and red bars), although we note that after adaptation to horizontally filtered face #1 the psychometric function became shallower (indicating poorer discrimination). By contrast, adapting to vertically filtered faces elicited little adaptation, leading to biases comparable to those in the ‘no-adaptation’ condition (compare green and black/white bars).

Because we did not compare identity after-effects when adapter and test faces fell on the same or different identity vector(s), we cannot know whether the measured after-effects exclusively reflect adaptation to identity. Consequently, three alternative explanations could account for the weaker identity after-effect observed with vertical compared to horizontal content: (1) the adapted mechanism may be tuned both to identity strength and to the orientation of its input; (2) alternatively, it may be tuned to identity strength only (so that if vertically filtered adapters looked more similar to one another, as indicated by the results of Experiment 1, they might induce less adaptation); or (3) it could relate to the location of the morphing key-points varying more in the horizontal than in the vertical direction.

To investigate accounts (1) and (2), we measured psychometric functions for the identification of upright faces morphed between the two test identities. Faces were upright and either broadband or filtered to contain only horizontal or only vertical information (with the same bandwidths as in the adaptation phase of the main experiment). We also measured identification performance with inverted broadband faces. The results, plotted in Figure 5, indicate that vertically filtered faces are about three times more difficult to discriminate from one another than horizontally filtered faces (% values in parentheses indicate the identity change threshold, i.e., the identity increment leading to 82% correct identification). That discrimination of vertically filtered faces is so poor indicates that the weak adaptation may indeed be attributable to these stimuli not eliciting a sufficiently strong sense of identity. We return to this point in the section “Discussion”.

Figure 5. Psychometric functions are shown for the discrimination of upright/broadband faces (in purple), inverted/broadband faces (in black/white), horizontally filtered/upright faces (in red) and vertically filtered/upright faces (in green). Identity thresholds (the identity levels eliciting 82% accurate discrimination) are given in parentheses.

We also examined whether the location of the key-points used to morph the faces may have influenced our results. We analyzed the x and y locations of 21 key-points (corresponding to the internal facial features shown in Figure 3) drawn from 81 male faces. The standard deviation of the y coordinates was 60% higher than that of the x coordinates, indicating considerably more variation in the vertical than in the horizontal location of facial features. Because this increase in variation might arise from the elongated aspect ratio of faces (i.e., there is more room for variation in y coordinates), we also computed the ratio of Fano factors (a mean-corrected measure of dispersion), which was �% higher along the y axis than along the x axis. The notion that such a difference might contribute to our findings relies on several further assumptions, notably the validity of both the location and sampling density of the key-points, a linear relationship between key-point location and discriminability of feature change, etc. Nevertheless, this analysis indicates that the robust identity after-effects observed for horizontally filtered stimuli could at least partly originate from structural properties of faces that preferentially support the transmission of information through the vertical location of features (i.e., along the y axis) (Goffaux and Rossion, 2007; Dakin and Watt, 2009).

With these caveats in mind, the present findings indicate that the visual mechanisms responsible for representing face identity, as indexed by the identity after-effect, are tuned to horizontal bands of orientation. We proposed that the advantage of encoding face identity based on the vertical arrangement of horizontal face information would be that this information is available across viewpoint changes (Goffaux and Rossion, 2007; Dakin and Watt, 2009), a notion we explicitly test in the next experiment.

## RESULTS

### Human studies.

Fitted psychometric function ($\hat{\mu}$, $\hat{\sigma}$) and confidence-scaling ($\hat{k}$) parameters for each of our four subjects for yaw rotation about an earth-vertical rotation axis are shown in Figs. 5 (mean) and 6 (SD); parameter fits are plotted vs. the number of trials in increments of 5 trials, starting at the 15th trial. [To show raw performance for individual test sessions, Appendix B (see Fig. B1) presents the parameter fits for each of the six individual tests for each subject.] As described in Methods, all parameters are estimated using maximum likelihood methods.

Fig. 5. Summary of human psychometric parameter estimates as trial number increases. Each column represents fitted parameters for 1 subject. A–D: average fitted psychometric width parameter ($\hat{\sigma}$). E–H: average fitted confidence-scaling factor ($\hat{k}$). I–L: average fitted psychometric function bias ($\hat{\mu}$). Thick black curves show average psychometric parameter estimates calculated using conventional forced-choice analyses. Thick red curves show average parameter estimates determined by fitting confidence probability judgment data. Error bars (thin gray curves and thin red curves, respectively) represent SD of parameter estimates.

Fig. 6. SD of human psychometric parameter estimates as trial number increases. Each column represents fitted parameters for one subject, in the same order as Fig. 5. A–D: SD of the fitted psychometric width parameter ($\hat{\sigma}$). E–H: SD of the fitted psychometric function bias ($\hat{\mu}$). Black curves show SD of psychometric parameter estimates calculated using conventional forced-choice analyses. Gray curves show SD of parameter estimates determined via our CSD model fit.

Consistent with previous studies using adaptive procedures (e.g., Chaudhuri and Merfeld 2013; Garcia-Perez and Alcala-Quintana 2005), the conventional estimates of the width of the psychometric function ($\hat{\sigma}$) took between 50 and 100 trials to stabilize (Fig. 5, A–D, black curves). More specifically, using these conventional psychometric methods, the estimated width parameter ($\hat{\sigma}$) was significantly lower after 20 trials than after 100 trials (repeated-measures ANOVA, n = 4 subjects, P = 0.011).

In contrast, estimates of the width parameter ($\hat{\sigma}$) using our confidence fit technique required fewer than 20 trials to reach stable levels (Fig. 5, A–D, red curves). Specifically, the width parameter ($\hat{\sigma}$) estimated using confidence probability judgments was not significantly different after 20 trials than after 100 trials (repeated-measures ANOVA, n = 4 subjects, P = 0.251). Furthermore, the estimated width parameter after 20 trials using confidence probability judgments was not significantly different from the estimated width parameter after 100 trials using conventional psychometric fit methods (repeated-measures ANOVA, n = 4 subjects, P = 0.907).

Furthermore, the parameter estimates obtained using conventional psychometric fits (Fig. 6, black traces) were more variable than the fits obtained using our CSD model (Fig. 6, gray traces). In fact, the precision of the psychometric width estimate using the confidence model was about the same after 20 trials (average SD of 0.124 across subjects) as the conventional psychometric fit estimate after 100 trials (0.129).

The estimates of the shift of the psychometric functions ($\hat{\mu}$) showed a qualitatively similar pattern: the estimates that used confidence reached stable levels a little sooner and were more precise than those from the conventional analysis. We also note that three of our subjects seemed well calibrated (Fig. 5, E–G), with fitted confidence-scaling factors near 1, whereas the fourth subject had a fitted confidence-scaling factor near 2 (Fig. 5H), suggesting substantial underconfidence.

### Simulations.

We also simulated tens of thousands of test sessions to test the confidence fit procedures more thoroughly. The simulations were designed to mimic the human studies with the obvious difference being that we defined the simulated psychometric [Ψ(x)] and confidence [χ(x)] functions. Since we knew these simulated functions, this allowed us to quantify parameter fit accuracy. For all simulated data sets, we fit the conventional binary forced-choice data and compared and contrasted these fits with the CSD fits. Histograms show fitted parameters after 20 (Fig. 7, A–C) and 100 (Fig. 7, D–F) trials for 10,000 simulations. After as few as 20 trials, the CSD fit parameters demonstrated relatively tight distributions (Fig. 7, B and C) compared with the binary fits that show ragged distributions (Fig. 7A). After 100 trials, the binary fit parameters demonstrated relatively tight distributions (Fig. 7D) that mimicked those found for the CSD fit parameters after 20 trials (Fig. 7, B and C). The CSD fit parameters after 100 trials (Fig. 7, E and F) demonstrated higher precision (i.e., lower variance) than the binary fit parameters after 100 trials (Fig. 7D). (See Fig. B2 for similar histograms for 100 trials for the other 2 simulation data sets.)

Fig. 7. Parameter distributions for 10,000 simulated experiments with 20 and 100 trials. The columns from left to right represent the fitted psychometric width parameter ($\hat{\sigma}$), the fitted confidence-scaling factor ($\hat{k}$), and the fitted psychometric function bias ($\hat{\mu}$), as shown on the x-axis at bottom. A and D: conventional binary forced-choice parameter estimates. B and E: parameter estimates determined via our CSD model fit for a well-calibrated subject (k = 1). C and F: parameter estimates determined via our CSD model fit for an underconfident subject (k = 2). The solid black line shows the actual parameter value (i.e., μ = 0.5 or σ = 1), the solid gray line shows the mean of the fitted parameters, and the dashed gray lines indicate 1 SD on each side of the mean.

Mimicking the format previously used for the human data (Figs. 5 and 6), simulation parameter fits are plotted vs. the number of trials in increments of 5 trials, starting at the 15th trial. The black curves in Figs. 8 and 9 show the fitted psychometric function parameters for the binary forced-choice data; the red (Fig. 8) and gray (Fig. 9) curves show the fitted psychometric and confidence function parameters obtained with the CSD model. (To provide direct quantitative comparisons, Appendix B summarizes data from all simulations in tabular form.)
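The trial-increment format can be sketched as follows: each simulated session is refit on its first 15, 20, 25, ... trials, and the mean and SD of the estimates across sessions give curves like the black ones in Figs. 8 and 9. This minimal sketch covers only the conventional binary fit, using maximum likelihood over a coarse grid as a simple stand-in for whatever optimizer the authors used; the grid bounds, the uniform stimulus range, and all function names are assumptions.

```python
import math
import random
from statistics import mean, stdev

TRUE_MU, TRUE_SIGMA = 0.5, 1.0  # assumed true values (those quoted for Fig. 7)


def cum_gauss(x, mu, sigma):
    """Cumulative Gaussian evaluated with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def simulate_binary(n_trials, rng, x_range=(-3.0, 4.0)):
    """Binary forced-choice trials; uniform stimulus sampling is a hypothetical choice."""
    data = []
    for _ in range(n_trials):
        x = rng.uniform(*x_range)
        data.append((x, 1 if rng.random() < cum_gauss(x, TRUE_MU, TRUE_SIGMA) else 0))
    return data


def fit_binary(data):
    """Maximum-likelihood (mu, sigma) by a coarse grid search (steps of 0.1)."""
    best, best_ll = None, -math.inf
    for i in range(-15, 26):            # mu in [-1.5, 2.5]
        mu = i / 10
        for j in range(1, 31):          # sigma in [0.1, 3.0]
            sigma = j / 10
            ll = 0.0
            for x, r in data:
                # Clamp p away from 0 and 1 so the log-likelihood stays finite.
                p = min(max(cum_gauss(x, mu, sigma), 1e-9), 1.0 - 1e-9)
                ll += math.log(p if r else 1.0 - p)
            if ll > best_ll:
                best, best_ll = (mu, sigma), ll
    return best


def running_fits(data, start=15, step=5):
    """Refit on the first t trials for t = 15, 20, 25, ... (the schedule of Figs. 8 and 9)."""
    return {t: fit_binary(data[:t]) for t in range(start, len(data) + 1, step)}


def sigma_summary(n_sessions, n_trials, seed=0):
    """Mean and SD of the fitted width across sessions at each trial checkpoint."""
    rng = random.Random(seed)
    per_t = {}
    for _ in range(n_sessions):
        for t, (_, sigma_hat) in running_fits(simulate_binary(n_trials, rng)).items():
            per_t.setdefault(t, []).append(sigma_hat)
    return {t: (mean(v), stdev(v)) for t, v in sorted(per_t.items())}
```

A grid search keeps the sketch dependency-free and always returns a bounded estimate, even when a short session happens to be perfectly separable; a gradient-based optimizer would be faster and finer for 10,000 sessions.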

Fig. 8. Summary of simulation parameter estimates as trial number increases. As illustrated via insets, each column represents a different simulated combination of the confidence function (red solid curves) and the fitted confidence function (red dashed curves). A, E, and I: well-calibrated subject (k = 1) when both confidence and confidence fit functions are cumulative Gaussians. B, F, and J: underconfident subject (k = 2) when both confidence and confidence fit functions are cumulative Gaussians. C, G, and K: underconfident subject when the confidence function is linear, χ(x) = m(x − μ) + 0.5 = 0.1443x + 0.428, with added zero-mean uniform noise [U(−0.1,+0.1)], and the confidence fit function is a cumulative Gaussian. D, H, and L: underconfident subject with the same linear confidence function with added zero-mean uniform noise [U(−0.05,+0.05)] when the confidence fit function is linear, χ̂(x) = m̂(x − μ̂) + 0.5. A–D: fitted psychometric width parameter (σ̂). E–G: fitted confidence-scaling factor (k̂). H: fitted slope of confidence function (m̂). I–L: fitted psychometric function bias (μ̂). Thick black curves show average conventional forced-choice parameter estimates, which are identical for all conditions. Thick red curves show average parameter estimates determined by fitting confidence probability judgments. Error bars (thin gray curves and thin red curves, respectively) represent SD of parameter estimates.

Fig. 9. SD of simulation parameter estimates as trial number increases. Each column represents the same conditions as in Fig. 8. A–D: fitted psychometric width parameter (σ̂). E–H: fitted psychometric function bias (μ̂). Black curves show SD of conventional forced-choice parameter estimates, which are identical for all conditions. Gray curves show SD of parameter estimates determined via our CSD model fit.

The simulated data (Figs. 8, A, E, and I, and 9, A and E, and Tables B1–B3, row 2) show that the CSD model yielded fit parameters that accurately matched the simulated values when the simulated subject's confidence was well calibrated (k = 1), where “well calibrated” means that the subject's confidence function matches the psychometric function, χ(x) = Ψ(x), a cumulative Gaussian with μ = 0.5 and σ = 1. Even when the subject's confidence was not well calibrated (k = 2), the confidence fit parameters matched the three confidence function parameters well (Figs. 8, B, F, and J, and 9, B and F, and Tables B1–B3, row 3). In fact, apart from the fitted confidence-scaling factor (k̂) settling near a value of 2 (Fig. 8F) instead of 1 (Fig. 8E), the average psychometric parameter estimates for an underconfident subject were nearly the same as for a well-calibrated subject. The fitted psychometric width parameter (σ̂) even showed a lower SD for an underconfident subject than for a well-calibrated subject (see Appendix B).
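One way to read the scaling factor k, assuming (as the k = 2 and σ = 2 values in this section suggest) that the confidence function is a cumulative Gaussian whose width is k times the psychometric width: with k = 1 the confidence reports equal Ψ(x), and with k = 2 they are pulled toward 0.5. At x = μ + σ = 1.5, for example, a calibrated subject reports about 0.841 and an underconfident subject about 0.691. The sketch below is illustrative only and is not the paper's Eq. 2.

```python
from statistics import NormalDist

TRUE_MU, TRUE_SIGMA = 0.5, 1.0  # assumed simulation values (those quoted for Fig. 7)


def psi(x, mu=TRUE_MU, sigma=TRUE_SIGMA):
    """Psychometric function: cumulative Gaussian."""
    return NormalDist(mu, sigma).cdf(x)


def chi(x, k=1.0, mu=TRUE_MU, sigma=TRUE_SIGMA):
    """Confidence function, assumed here to be a cumulative Gaussian of width k * sigma.

    k = 1: well calibrated, chi(x) equals psi(x).
    k > 1: underconfident, reports pulled toward 0.5.
    """
    return NormalDist(mu, k * sigma).cdf(x)


# Compare a calibrated (k = 1) and an underconfident (k = 2) subject at a few stimuli.
for x in (-0.5, 0.5, 1.5, 2.5):
    print(f"x={x:+.1f}  psi={psi(x):.3f}  chi(k=1)={chi(x):.3f}  chi(k=2)={chi(x, k=2):.3f}")
```

Under this reading every confidence curve passes through 0.5 at x = μ regardless of k, which is consistent with the observation that μ̂ and σ̂ are recovered about equally well for both subjects while k̂ absorbs the miscalibration.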

To demonstrate robustness, we used the same Gaussian confidence fit model (Eq. 2) while simulating a confidence model that differed from it in two ways. First, we modeled the confidence function as a linear function (slope of 0.1443, i.e., σ = 2) instead of a cumulative Gaussian. Second, we added zero-mean uniform noise, U(−0.1, 0.1), to the simulated confidence response. Despite these differences, the confidence fits of these simulated data closely resemble the earlier confidence fits (Figs. 8, C, G, and K, and 9, C and G, and Tables B1–B3, row 4). The primary difference is that parameter fit precision was lower than for the first two simulation sets described above, though still better than for the conventional fits. For example, despite the severe noise (−10% to +10%), the fit precision for the width parameter (σ̂) after 20 trials using confidence matched the fit precision after about 50 trials using conventional analyses.
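The robustness test can be sketched as follows: the simulated confidence is a linear ramp plus U(−0.1, +0.1) noise, and a cumulative Gaussian is then fit to it by least squares. Two details are assumptions. First, the slope is computed as 1/(σ√12) with σ = 2, the slope of a uniform CDF whose SD is 2; this reproduces the 0.1443 slope and 0.428 intercept quoted in the Fig. 8 caption. Second, noisy reports are clipped to [0, 1]. The least-squares grid fit is a hypothetical stand-in for the paper's Eq. 2, which is not reproduced here.

```python
import math
import random

TRUE_MU = 0.5
SIGMA_EQUIV = 2.0
# Assumption: slope = 1 / (sigma * sqrt(12)), the slope of a uniform CDF with SD sigma = 2.
SLOPE = 1.0 / (SIGMA_EQUIV * math.sqrt(12.0))


def linear_confidence(x, noise=0.1, rng=None):
    """chi(x) = m (x - mu) + 0.5 plus U(-noise, +noise); clipping to [0, 1] is assumed."""
    c = SLOPE * (x - TRUE_MU) + 0.5
    if rng is not None and noise > 0:
        c += rng.uniform(-noise, noise)
    return min(max(c, 0.0), 1.0)


def cum_gauss(x, mu, width):
    """Cumulative Gaussian with mean mu and SD width."""
    return 0.5 * (1.0 + math.erf((x - mu) / (width * math.sqrt(2.0))))


def fit_gaussian_confidence(xs, cs):
    """Least-squares cumulative-Gaussian fit over a grid (hypothetical stand-in for Eq. 2)."""
    best, best_sse = None, math.inf
    for i in range(-10, 21):          # mu in [-1.0, 2.0], step 0.1
        mu = i / 10
        for j in range(5, 41):        # width in [0.5, 4.0], step 0.1
            width = j / 10
            sse = sum((cum_gauss(x, mu, width) - c) ** 2 for x, c in zip(xs, cs))
            if sse < best_sse:
                best, best_sse = (mu, width), sse
    return best


rng = random.Random(0)
xs = [rng.uniform(-3.0, 4.0) for _ in range(200)]
cs = [linear_confidence(x, noise=0.1, rng=rng) for x in xs]
mu_hat, width_hat = fit_gaussian_confidence(xs, cs)
```

Because a ramp is not a cumulative Gaussian, the fitted width (which plays the role of kσ) should not be expected to equal 2 exactly; in this setup it should come out somewhat above 2, while the fitted μ stays close to 0.5.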

Finally, to demonstrate the flexibility of the confidence fit technique, we modeled the same linear confidence function as in the previous paragraph, but added less extreme zero-mean uniform noise, U(−0.05, 0.05), and fit a linear confidence function that mimics the linearity of the true confidence function used for these simulations. Fit accuracy and precision were very good (Figs. 8, D, H, and L, and 9, D and H, and Tables B1–B3, row 5), demonstrating that the fitted psychometric function and confidence function need not be similar in form. (For some conventional confidence metrics, including goodness-of-fit parameters, see Table B4.)
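When the fitted confidence function is itself linear, χ̂(x) = m̂(x − μ̂) + 0.5, the fit has a closed form, because the model is an ordinary straight line c = a + bx with b = m̂ and a = 0.5 − m̂μ̂. The sketch below assumes an unweighted least-squares criterion (the paper's own criterion is not given here) and uses the caption's slope of 0.1443 with U(−0.05, +0.05) noise; the stimulus range is a hypothetical choice that keeps every report inside [0, 1].

```python
import random

TRUE_MU = 0.5
TRUE_SLOPE = 0.1443  # slope quoted in the Fig. 8 caption


def fit_linear_confidence(xs, cs):
    """Fit chi_hat(x) = m_hat (x - mu_hat) + 0.5 by ordinary least squares.

    Rewriting the model as c = a + b x with b = m_hat and a = 0.5 - m_hat * mu_hat
    gives m_hat = b and mu_hat = (0.5 - a) / b (valid whenever b != 0).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    c_bar = sum(cs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxc = sum((x - x_bar) * (c - c_bar) for x, c in zip(xs, cs))
    b = sxc / sxx
    a = c_bar - b * x_bar
    return b, (0.5 - a) / b


# Simulated reports: linear confidence plus U(-0.05, +0.05) noise, no clipping needed here.
rng = random.Random(1)
xs = [rng.uniform(-2.5, 3.5) for _ in range(100)]
cs = [TRUE_SLOPE * (x - TRUE_MU) + 0.5 + rng.uniform(-0.05, 0.05) for x in xs]
m_hat, mu_hat = fit_linear_confidence(xs, cs)
```

Because (a, b) and (m̂, μ̂) are related one-to-one for any nonzero slope, minimizing squared error over the straight line is the same as minimizing it over the original parameters, so no iterative optimizer is needed.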
