
Evidence that positive rewards are learnt faster than negative rewards?

It's folk psychology wisdom that it's easier to reward positive behavior than to punish negative behavior (e.g. any book on parenting or dog training), but is there any evidence in the cognitive science literature that this is indeed the case? And if so, have possible mechanisms for this phenomenon been proposed?

For some context, I'm asking this because coming from a computational perspective this doesn't really make sense. The typical reinforcement learning algorithm tries to maximize the expected discounted cumulative reward of behavior, and it makes no difference if that entails avoiding negative rewards or seeking out positive rewards.


This is a great question.

Short answer: No, the evidence does not suggest that positive reinforcement is universally more effective than negative reinforcement or punishment. However, there are still good reasons to focus on rewards over punishment in real-life training/learning situations.

Long answer:

The trouble for folk psychology began with Skinner's somewhat unfortunate choice of terminology… While Skinner advocated positive reinforcement over negative reinforcement or punishment, it seems to have remained an industry secret that what he meant by those terms is not what the general public thinks he meant.

The modern view of positive and negative reinforcement is that they are essentially synonyms. They are different ways of looking at the same thing, like describing a glass of water based on how full or how empty it is. Computationally, as you say, learning algorithms that assign positive values to targets or negative values to non-targets are mathematically equivalent.
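To make the computational point concrete, here is a minimal sketch using a made-up three-state problem (the function name, states, and reward values are all illustrative assumptions, not taken from any study mentioned here). It runs value iteration twice: once with a "+1 for the target behaviour" framing and once with a "-1 for everything except the target behaviour" framing, which is the same reward table lowered by a constant. Assuming an infinite-horizon discounted setting, the greedy policy should come out identical and the values should differ only by a constant.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Return optimal state values and the greedy policy.

    P[a, s, t] is the probability of moving from state s to state t
    under action a; R[s, a] is the immediate reward.
    """
    V = np.zeros(R.shape[0])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Hypothetical toy problem: 3 states in a ring, 2 actions.
# Action 0 = stay put, action 1 = step to the next state.
n_states, n_actions = 3, 2
P = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    P[0, s, s] = 1.0
    P[1, s, (s + 1) % n_states] = 1.0

# "Reward" framing: +1 for staying in state 2, 0 for everything else.
R_reward = np.zeros((n_states, n_actions))
R_reward[2, 0] = 1.0

# "Punishment" framing: the same table with every entry lowered by 1,
# so the target behaviour earns 0 and everything else earns -1.
R_punish = R_reward - 1.0

gamma = 0.9
V_rew, pi_rew = value_iteration(P, R_reward, gamma)
V_pun, pi_pun = value_iteration(P, R_punish, gamma)

print("policy (reward framing):    ", pi_rew)
print("policy (punishment framing):", pi_pun)
print("value difference:", V_rew - V_pun)  # about 1 / (1 - gamma) per state
```

One caveat on this sketch: the equivalence relies on the task never terminating. In episodic tasks where the agent can end the episode early, lowering every reward by a constant can change the optimal behaviour, because a shorter episode then means fewer penalties.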

Although they are often confused with positive and negative reinforcement, rewards and aversives are different terms with different meanings. Testing whether one is more effective than the other is tricky, as in practice they are usually qualitatively different. For example, is ice cream or spanking more effective for getting your kid to do their homework? The answer is: It depends - how much ice cream, how much spanking… ? Surely we can find a ratio of ice cream to spanking at which they are equally effective. The type of research that examines this question is interested in determining where that boundary is (here is an example). Thought experiment: How can qualitatively different feedback mechanisms be applied in machine learning?

Research more likely to answer your question compares positive and negative feedback that is arguably qualitatively equivalent. For example, compare gaining money to avoiding loss of money, reducing risk to avoiding increase in risk, and it's even possible to compare the effects on animals that are trained in a token economy. In recent years, the folk psychology idea that positive feedback is universally superior to negative feedback has been called into question by such research. A few examples:

  • Michael Perone reviews research demonstrating undesirable effects of positive reinforcement.
  • Comparing "well done!" to "got it wrong this time", Eveline Crone et al. found that before the age of twelve, children perform better with positive feedback, but older children and adults do better with negative feedback.
  • A study comparing "keep the word Good on the screen" to "keep the word Bad off the screen" found that negative reinforcement may be more effective for some types of learning.
  • Ayelet Fishbach and Stacey Finkelstein review research that suggests that negative feedback is more effective for experts, and motivates goal pursuit when it signals insufficient progress.
  • Lots of research into Loss Aversion suggests that avoiding the loss of money is a more powerful motivator than gaining an equivalent amount of money. However, this phenomenon may not translate to an effect on learning.
  • Despite all this, note that the evidence in favour of positive reinforcement is much greater - these are just some notable exceptions.

Neurological reasons for the difference in effectiveness between positive and negative feedback have been proposed - for example by Eveline Crone in the study cited above.

Training using rewards is preferred for animal trainers, parents, and teachers in most practical situations:

  • There is usually a much wider range of undesirable behaviour to punish than desirable behaviour to reward. Animals and young children have difficulty determining what the desirable behaviour is with only clues about what behaviours are undesirable. Adults are easier to work with because you can explain the desirable behaviour in words.
  • Punishment must be applied to all undesirable behaviour to be effective, while reward need only be applied to desirable behaviour, and even then only intermittently, to be effective.
  • Due to classical conditioning, subjects may attribute punishment to factors unrelated to their behaviour, such as the trainer or the classroom. They may learn to avoid the trainer, or avoid getting caught, rather than the desirable behaviour.
  • Notice how none of these points apply to machine learning, where the process is structured, constrained, and automated.

John Maag summarizes reasons to promote positive reinforcement in schools: Namely that teachers often find it convenient and effective (in the short-term) to administer punishment and are overly reliant on it in many situations where positive reinforcement would be more effective and desirable.

Nice video on the topic.


A Word From Verywell

The negativity bias can have a powerful impact on your behavior, but being aware of it means that you can take steps to adopt a more positive outlook on life. One of the best ways to combat the negativity bias is to take a more mindful approach: be aware of your own tendency toward negativity and consciously elevate happier thoughts to the forefront of awareness.

Ruminating on the negative can take a serious toll, so taking steps to combat this bias can play a role in boosting your mental well-being.


Behavior Modification

Behavior modification is a psychotherapeutic intervention primarily used to eliminate or reduce maladaptive behavior in children or adults. While some therapies, such as cognitive behavioral therapy, focus on changing thought processes that can affect behavior, behavior modification focuses on changing specific behaviors with little consideration of a person’s thoughts or feelings. The progress and outcome of the intervention can be measured and evaluated. First, a functional analysis of the antecedents and consequences of the problem behavior(s) must be conducted. This leads to the creation of the specific target behaviors that will become the focus of change. Then, certain variables can be manipulated via reinforcers and punishments to change problem behavior(s). The goal is to eliminate or reduce the maladaptive behavior.

Behavior modification is a type of behavior therapy. B. F. Skinner demonstrated that behavior could be shaped through reinforcement and/or punishment. Skinner noted that a reinforcer is a consequence that increases the likelihood of behavior to recur, while punishment is a consequence that decreases the chance. Positive and negative are used in mathematical terms. Positive indicates that something is added, and negative indicates something is subtracted or taken away. Thus, positive reinforcement occurs when a behavior is encouraged by rewards. If a child enjoys candy and cleaning the room is the desired behavior, the candy is a positive reinforcer (reward) because it is something that is given or added when the behavior occurs. This makes the behavior more likely to recur. Negative reinforcement is removing a stimulus as the consequence of behavior but results in a positive outcome for the individual. For example, a fine is dropped, and a person no longer has to go to jail. The removal of the negative stimulus (the fine) results in a positive outcome for the individual, no jail time.

Conversely, positive punishment is the addition of an adverse consequence. For example, a child gets spanked when he crosses the street without holding his mother’s hand. He then no longer crosses the street alone. The spanking is positive punishment because it is a consequence added to the situation that decreases the likelihood of the child crossing the street alone. Negative punishment is taking away favorable consequences to reduce an unwanted behavior. For example, if Emily doesn’t finish her homework on time, her cell phone gets taken away. She makes it a priority to finish her homework immediately after school before she does anything else. Removal of the cell phone would be a “negative” because it takes something away, decreasing the chance that she won’t finish her homework the next time.

Reinforcement and punishment both work independently, as well as together, as part of a behavior plan. Positive reinforcement works considerably better and faster than punishment. In child psychiatry, parents often come to the office angry and frustrated with their child because “nothing works.” They have tried multiple types of punishment when bad behavior has occurred, such as removing toys or privileges or placing the child in time out. Often, positive behaviors are not being reinforced. One immediate benefit of behavior modification plans is the shift away from solely punishing unwanted behavior to also rewarding good behavior.

(Table 1, Scott and Cogburn, 2017)

In Table 1, note that punishment and reinforcement have nothing to do with good or bad behavior, only with whether the consequence increases or decreases the likelihood that the behavior will recur.

There are several schedules of reinforcement that can impact behavior. When a behavior plan is initially set up, continuous reinforcement is used to establish and reinforce the behavior. Once the behavior has been established, continuous reinforcement can change to intermittent reinforcement, which is termed thinning. There are four types of intermittent reinforcement. They are:

Fixed interval where the person is reinforced after a set amount of time

Variable interval where the person is reinforced after a variable amount of time

Fixed ratio where the person is reinforced after a certain number of responses

Variable ratio where the person is reinforced after a variable number of responses. Variable ratio intermittent reinforcement is the most effective schedule to reinforce a behavior.

Fixed interval: rewarding a person at the end of each day

Variable interval: rewarding a person sometimes at the end of the day, sometimes at the end of the week, sometimes every few days

Fixed ratio: rewarding a person after completing the desired behavior four times

Variable ratio: rewarding a person after completing the desired behavior three times, then six times, then two times. Gambling is a real-world example of a variable ratio schedule of reinforcement.
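The four schedules can be sketched as simple rules that decide, response by response, whether a reinforcer is delivered. The code below is only an illustration under stated assumptions: the function names, the choice of uniform random draws for the variable schedules, and the convention that interval schedules reinforce the first response after the waiting time has elapsed are all assumptions made for this sketch, not details given in the text above.

```python
import random

def fixed_ratio(n_responses, ratio):
    """Reinforce every `ratio`-th response."""
    return [i % ratio == 0 for i in range(1, n_responses + 1)]

def variable_ratio(n_responses, mean_ratio, rng):
    """Reinforce after a random number of responses averaging `mean_ratio`."""
    rewards, count = [], 0
    required = rng.randint(1, 2 * mean_ratio - 1)
    for _ in range(n_responses):
        count += 1
        if count >= required:
            rewards.append(True)
            count = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
        else:
            rewards.append(False)
    return rewards

def fixed_interval(response_times, interval):
    """Reinforce the first response made at least `interval` time units
    after the previous reinforcer (or after time 0)."""
    rewards, last = [], 0.0
    for t in response_times:
        if t - last >= interval:
            rewards.append(True)
            last = t
        else:
            rewards.append(False)
    return rewards

def variable_interval(response_times, mean_interval, rng):
    """Like fixed_interval, but the waiting time is redrawn at random
    (averaging `mean_interval`) after each reinforcer."""
    rewards, last = [], 0.0
    wait = rng.uniform(0, 2 * mean_interval)
    for t in response_times:
        if t - last >= wait:
            rewards.append(True)
            last = t
            wait = rng.uniform(0, 2 * mean_interval)
        else:
            rewards.append(False)
    return rewards

rng = random.Random(0)  # seeded so the illustration is repeatable
fr = fixed_ratio(8, 4)                      # reinforcer on responses 4 and 8
fi = fixed_interval([1, 2, 3, 4, 5, 6], 3)  # reinforcer at times 3 and 6
vr = variable_ratio(1000, 4, rng)           # roughly one reinforcer per 4 responses
vi = variable_interval(list(range(1, 101)), 5, rng)
print(fr, fi, sum(vr), sum(vi))
```

From the learner's side, the ratio schedules depend only on how many responses have been made, while the interval schedules depend only on elapsed time. This is why the fixed ratio example above counts completed behaviors and the fixed interval example counts days.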


Negative (positive) contrast effect

Negative and positive contrast refer to the amount of reinforcement a subject or participant receives in a given situation. Negative contrast occurs when a subject starts out with a large reinforcer and then, after a while, is suddenly shifted to a smaller reinforcer, which decreases the behavior. [1] Positive contrast, on the other hand, occurs when a subject or participant is given a modest or small reward and then is suddenly shifted to a larger reward, which increases the behavior. [2] This theory is important in the understanding of learning. If someone is given a small reward when the learning is first presented, but is later given a larger reward, they will show more desire to continue learning. [3]


Washington University Open Scholarship

Schizophrenia (SCZ) is characterized by severe cognitive impairments and amotivation, generally referred to as negative symptoms, including anhedonia and/or avolition. Amotivation tends to exist in prodromal patients and persist over the illness course regardless of successful antipsychotic medications, which are known to reduce positive symptoms, including hallucinations and delusions (e.g., Horan, Blanchard, Clark, & Green, 2008; Tarbox et al., 2013). Importantly, amotivation is a promising predictor of later social functioning in SCZ, even after accounting for patients' cognitive impairments (e.g., Evensen et al., 2012; Faerden et al., 2010). Despite this crucial impact on functional outcome in SCZ, to date, no study has systematically investigated the neural mechanisms underlying amotivation in SCZ.

To date, it has been well documented that many of the cognitive impairments in SCZ may reflect a core deficit of non-emotional context processing, supported by the dorsolateral prefrontal cortex (DLPFC), and defined by the ability to maintain non-emotional context information necessary to regulate upcoming behavioral responses towards goal-directed behavior (e.g., Cohen, Barch, Carter, & Servan-Schreiber, 1999). Recent evidence from both animal and healthy human neuroimaging work suggests that the DLPFC plays a crucial role in representing and integrating reward-related context information. However, it remains unexplored whether individuals with SCZ can represent and integrate reward-related contextual information to modulate the cognitive control functions attributed to the DLPFC.

Thirty-six individuals with SCZ and twenty-seven healthy controls (HC) underwent behavioral and fMRI data collection at 3 Tesla while performing a modified response conflict processing task under two contexts, that is, no-reward baseline and reward contexts. Participants first performed baseline conditions without any knowledge regarding the future potential for incentives (Baseline-Context; BCXT). Each trial started with a baseline cue, "XX", that was pre-instructed to participants as being irrelevant to the task. After each "XX" cue, either a house or a building picture (with overlaid words that were either congruent or incongruent) was presented to each participant one at a time. The task was to categorize each picture as either a house or a building by pressing a certain button while ignoring the overlaid word. Following the baseline condition, participants performed additional reward blocks in which they were told that they could win money on some trials by responding quickly (faster than their median correct reaction time (RT) in the baseline) and accurately. Each trial was then preceded either by a "$20" cue (Reward-Cue; RC), indicating that a fast and correct response would be rewarded, or by an "XX" cue (Reward-Context; RCXT), indicating that no money could be won on the trial. After the target stimulus, participants received immediate feedback regarding the reward points they earned on the trial, as well as their cumulative earnings in points.
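The reward rule described above (a "$20" cue, a correct response, and an RT faster than the participant's median correct baseline RT) can be written out as a short sketch. The function names, the data layout, and the RT values in milliseconds are hypothetical and chosen only for illustration; they are not taken from the study's materials.

```python
from statistics import median

def rt_threshold(baseline_trials):
    """Median RT over correct baseline trials.

    `baseline_trials` is a list of (rt, correct) pairs.
    """
    return median(rt for rt, correct in baseline_trials if correct)

def earns_reward(cue, rt, correct, threshold):
    """A trial is rewarded only if it was cued with "$20", answered
    correctly, and answered faster than the baseline threshold."""
    return cue == "$20" and correct and rt < threshold

# Hypothetical baseline data: (RT in ms, response correct?)
baseline = [(500, True), (600, True), (700, True), (300, False)]
threshold = rt_threshold(baseline)  # the incorrect 300 ms trial is ignored

print(threshold)
print(earns_reward("$20", 550, True, threshold))  # fast, correct, reward cue
print(earns_reward("XX", 550, True, threshold))   # same response, no reward cue
```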

As such, this response conflict task paradigm enabled examination of: (1) reward context effects by comparing performance and brain activity when the cue, "XX" was presented in the baseline context versus in the reward context (BCXT vs. RCXT trials cued by the same cue, "XX") and (2) reward cue effects by comparing performance during RC (cued by "$20") versus RCXT (cued by "XX") within reward blocks. Importantly, by employing a mixed state-item fMRI design, I investigated both sustained (block-based) context-dependent and transient (trial-by-trial) reward-related cue activity at both behavioral and neural levels.



Learning active or reactive responses to fear involves different brain circuitry. This study examined how the nucleus accumbens (NAc), dorsal hippocampus (DH) and medial prefrontal cortex (mPFC) may interact in memory processing for these two kinds of responses. Male Wistar rats with cannulae implanted in these areas were trained on a contextual fear conditioning or inhibitory avoidance task that respectively engaged a reactive or active response to fear in the test. Immediately after training, a memory modulating factor released by stress, norepinephrine (NE), was infused into one region and 4% lidocaine into another to examine if an upstream activation effect could be blocked by the downstream suppression. Retention tested 1 day later showed that in both tasks posttraining infusion of NE at different doses into either the DH or mPFC enhanced retention, but the enhancement was blocked by concurrent infusion of lidocaine into the other region, suggesting reliance of the effect on functional integrity of both regions. Further, posttraining intra-NAc lidocaine infusion attenuated memory enhancement of NE infused into the DH or mPFC in the inhibitory avoidance task but did not do so in contextual fear conditioning. These results suggest that NE regulation of memory formation for the reactive and active responses to fear may rely on distinct interactions among the DH, mPFC and NAc.

Acute stress impairs set-shifting but not reversal learning

The ability to update and modify previously learned behavioral responses in a changing environment is essential for successful utilization of promising opportunities and for coping with adverse events. Valid models of cognitive flexibility that contribute to behavioral flexibility include set-shifting and reversal learning. One immediate effect of acute stress is the selective impairment of performance on higher-order cognitive control tasks mediated by the medial prefrontal cortex (mPFC) but not the hippocampus. Previous studies show that the mPFC is required for set-shifting but not for reversal learning; therefore, the aim of the present experiment is to assess whether exposure to acute stress (15 min of mild tail-pinch stress) given immediately before testing on either a set-shifting or a reversal learning task would impair performance selectively on the set-shifting task. An automated operant chamber-based task confirmed that exposure to acute stress significantly disrupts set-shifting but has no effect on reversal learning. Rats exposed to an acute stressor require significantly more trials to reach criterion and make significantly more perseverative errors. Thus, these data reveal that an immediate effect of acute stress is to impair mPFC-dependent cognition selectively by disrupting the ability to inhibit the use of a previously relevant cognitive strategy.

Effects of stereotypic behaviour and chronic mild stress on judgement bias in laboratory mice

Cognitive processes are influenced by underlying affective states, and tests of cognitive bias have recently been developed to assess the valence of affective states in animals. These tests are based on the fact that individuals in a negative affective state interpret ambiguous stimuli more pessimistically than individuals in a more positive state. Using two strains of mice we explored whether unpredictable chronic mild stress (UCMS) can induce a negative judgement bias and whether variation in the expression of stereotypic behaviour is associated with variation in judgement bias. Sixteen female CD-1 and 16 female C57BL/6 mice were trained on a tactile conditional discrimination test with grade of sandpaper as a cue for differential food rewards. Once they had learned the discrimination, half of the mice were subjected to UCMS for three weeks to induce a negative affective state. Although UCMS induced a reduced preference for the higher value reward in the judgement bias test, it did not affect saccharine preference or hypothalamic–pituitary–adrenal (HPA) activity. However, UCMS affected responses to ambiguous (intermediate) cues in the judgement bias test. While control mice showed a graded response to ambiguous cues, UCMS mice of both strains did not discriminate between ambiguous cues and tended to show shorter latencies to the ambiguous cues and the negative reference cue. UCMS also increased bar-mouthing in CD-1, but not in C57BL/6 mice. Furthermore, mice with higher levels of stereotypic behaviour made more optimistic choices in the judgement bias test. However, no such relationship was found for stereotypic bar-mouthing, highlighting the importance of investigating different types of stereotypic behaviour separately.

Sexually divergent changes in select brain proteins and neurosteroid levels after a history of ethanol drinking and intermittent PTSD-like stress exposure in adult C57BL/6J mice

Human studies reported that the number of past-year stressors was positively related to current drinking patterns, including binge drinking. In animal models, exposure to predator odor stress (PS), considered a model of traumatic stress, consistently increased ethanol intake. Recently, we reported that repeated PS significantly increased ethanol intake and had a synergistic interaction with prior binge drinking (binge group) in male but not in female C57BL/6J mice, when compared to mice without prior binge exposure (control group). The current studies utilized plasma and dissected prefrontal cortex (PFC) and hippocampal tissue from these animals and from age-matched naïve mice (naïve group). Western blots assessed relative protein levels of P450scc (an enzyme involved in the first step of steroidogenesis), of GABAA receptor α2 and α4 subunits, and of two proteins involved in synaptic plasticity – ARC (activity-regulated cytoskeletal protein) and synaptophysin. Gas chromatography-mass spectrometry simultaneously quantified 10 neurosteroid levels in plasma. A history of ethanol drinking and PS exposure produced brain regional and sex differences in the changes in proteins examined as well as in the pattern of neurosteroid levels versus (vs.) values in naïve mice. For instance, P450scc levels were significantly increased only in binge and control female PFC and hippocampus vs. naïve mice. Some neurosteroid levels were significantly altered by binge treatment in both males and females, whereas others were only significantly altered in males. These sexually divergent changes in neurosteroid and protein levels add to evidence for sex differences in the neurochemical systems influenced by traumatic stress and a history of ethanol drinking.

Broadening the etiological discourse on Alzheimer's disease to include trauma and posttraumatic stress disorder as psychosocial risk factors

Biomedical perspectives have long dominated research on the etiology and progression of Alzheimer's disease (AD) yet these approaches do not solely explain observed variations in individual AD trajectories. More robust biopsychosocial models regard the course of AD as a dialectical interplay of neuropathological and psychosocial influences. Drawing on this broader conceptualization, we conducted an extensive review of empirical and theoretical literature on the associations of trauma, posttraumatic stress disorder (PTSD) and AD to develop a working model that conceptualizes the role of psychosocial stressors and physiological mechanisms in the onset and course of AD. The proposed model suggests two pathways. In the first, previous life trauma acts as a risk factor for later-life onset of AD, either directly or mediated by PTSD or PTSD correlates. In the second, de novo AD experiential trauma is associated with accelerated cognitive decline, either directly or mediated through PTSD or PTSD correlates. Evidence synthesized in this paper indicates that previous life trauma and PTSD are strong candidates as psychosocial risk factors for AD and warrant further empirical scrutiny. Psychosocial and neurological-based intervention implications are discussed. A biopsychosocial approach has the capacity to enhance understanding of individual AD trajectories, moving the field toward ‘person-centered’ models of care.

Predator-scent stress, ethanol consumption and the opioid system in an animal model of PTSD

Emerging literature points to stress exposure as a potential contributor to the development of alcohol abuse, but animal models have yielded inconsistent results. Converging experimental data indicate that the endogenous opioid system modulates alcohol consumption and stress regulation. The aim of the present study is to examine the interplay between stress exposure, behavioral stress responses, ethanol (EtOH) consumption and the endogenous opioid system in an animal model of posttraumatic stress disorder. Rats were exposed to stress and then tested in a two-bottle free choice (TBC) assay or in a conditioned place preference paradigm. In some experiments, the endogenous opioid system was pharmacologically manipulated prior to stress exposure. The behavioral outcomes of stress exposure were assessed in an elevated plus-maze, with the acoustic startle response, and by monitoring the freezing response to trauma reminder. Immunoreactivity of phosphorylated opioid receptors in hippocampal subregions was also measured. Stress significantly increased the consumption of EtOH in the TBC assay. The severity of the behavioral response to stress was associated with EtOH consumption, cue-triggered freezing response to a trauma reminder, and endogenous levels of phosphorylated opioid receptors in the hippocampus. Pharmacologically manipulating the endogenous opioid system prior to stress exposure attenuated trauma cue-triggered freezing responses and blocked predator scent stress-induced potentiation of EtOH consumption. These data demonstrate a stress-induced potentiation of EtOH self-administration and reveal a clear association between individual patterns of the behavioral response to stress and alcohol preference, while indicating a role for the endogenous opioid system in the neurobiological response to stress.


Discussion

This study sought to determine whether the impact of reward and punishment generalizes across different types of motor skill learning, as implemented using a Serial Reaction Time Task (SRTT) and a Force Tracking Task (FTT). We found that punishment had opposing effects on performance of the two skills. During performance of the SRTT, training with punishment led to improved reaction times overall with minimal detriment to accuracy. In contrast, punishment impaired performance of the FTT. These effects were only present whilst feedback was being given; there was no effect of training with feedback on general or sequence-specific retention measured at 1 hour, 24 hours, and 30 days in either task. Our results refute any simple model of the interaction between feedback and performance. Instead, we show that the impact of feedback depends on the training environment and the skill being learned.

There may be a number of reasons for this task-specific effect of feedback. While both tasks rely on sequence learning, they differ with respect to the mechanism that facilitates improvement. The motivational salience of punishment (i.e. loss aversion) may explain the performance benefit seen on the SRTT, where the added attention facilitated by punishment has been hypothesized to recruit additional neural resources to aid SRTT performance 8,18 . However, a purely motivational account cannot explain the deleterious effect of punishment on performance of the FTT. Therefore, we need to consider alternative explanations that may account for the differential effects of reward and punishment on performance of these two tasks.

The two tasks also differ with respect to their motor demands. Specifically, in our implementation, performance on the FTT relies on more precise motor control than the SRTT. Within the motor system, others have reported that reward-related dopaminergic activity reduces motor noise 19 , while dopaminergic activity associated with punishment leads to an increase in motor variability, i.e. noise 20 . We found that punishment impaired general (i.e. non-sequence-specific) performance on the FTT. After one hour, during the retention test without feedback, the punishment group performed as well as the reward and control groups. We think that our findings are consistent with the hypothesis that punishment may increase motor noise, which may have led to impaired performance by the punishment group during training. Because motor variability was not directly measured in our implementation of the SRTT, participants would not be penalized for any variation in movement that did not impact reaction time directly. If an assessment of motor variability were considered in the evaluation of SRTT performance, one might find that punishment impairs this dimension of performance. Our implementations of the SRTT and the FTT do not include a direct measure of motor variability, so we cannot explicitly address this issue in the present study. Future work should examine this question.

The implementations of the tasks used here also differed with respect to the information content of a given instance of feedback. Ordinarily, learning on the SRTT relies on the positive prediction error encoded in the striatum that occurs on fixed-sequence trials 8,21 . The reward or punishment in the SRTT may augment this positive prediction error and facilitate performance and learning. In contrast, the moment-to-moment feedback given on the FTT is not associated with an instantaneous positive prediction error signal. Rather, our implementation of the FTT is similar to discontinuous motor tasks that rely on the cerebellum and may therefore not benefit from moment-to-moment feedback 22 (but also see Galea, et al. 4 for an additional account of cerebellar learning with feedback). Finally, although information content was not intentionally manipulated, this difference may also alter the effect of reward and punishment on these tasks.

Unlike prior studies, we saw no benefit of reward to retention 4,7,8,10 . Most studies that have looked at reward and punishment in skill learning have only examined immediate recall 4,8,10 , and only one study has shown a benefit of reward to long-term retention of a motor skill 7 . In their study, Abe, et al. 7 observed that the control and punishment groups evidenced diminished performance after 30-days compared to their post-training time-point. Importantly, Abe, et al. 7 also found that the reward group showed offline gains from the immediate time point to 24-hours after training, and this effect persisted through 30-days. So, while in our study the punishment and control group did not evidence forgetting from 24-hours to 30-days, potentially limiting our sensitivity to the effect of reward, the reward group in our study also did not show any offline-gains. As such, we are confident in our finding that reward did not impact retention.

While not discussed at length by Abe and colleagues, their punishment group performed significantly worse during training, suggesting that the skill was not learned as effectively by participants in that group. Therefore, it is unclear whether the difference in memory observed in their study can be attributed to a benefit of reward to consolidation or to ineffective acquisition when training with punishment. Our study design differed from the implementation used by Abe and colleagues 7 with respect to the input device (whole-hand grip force in our study, precision pinch force by Abe and colleagues), feedback timing, and trial duration. However, our result calls into question the robustness of the finding that reward benefits skill retention. We optimized our design to be sensitive to differences in online learning rather than retention, and future studies should examine other factors that influence the effect of feedback on retention of skill memories.

With respect to the SRTT, it is worth considering that our participants evidenced less sequence-specific learning than some others have found in unrewarded versions of this task, where the difference between sequence and random trials can be up to 80 ms 23,24,25 . However, there is considerable variability in the difference between sequence and random trials on the SRTT reported in the literature, and some groups have reported sequence-specific learning effects on the SRTT to be between 10 and 30 ms 26,27 . The difference reported after learning by the Control, Reward, and Punishment groups in our study is approximately equal to the difference for the rewarded group reported by Wachter, et al. 8 (30 ms) and more than observed in their control and punishment groups. This is evidence of substantially less sequence-specific knowledge than we observed in our study, and we are therefore confident that participants were able to learn and express sequence-specific knowledge in all three feedback conditions.

Finally, we recognize that there are difficulties in comparing performance across tasks. Because the tasks used here vary in performance outcome (response time in the SRTT, tracking error in the FTT), comparing them in a quantitative way is not possible. However, the dissociation in the effect of punishment in these contexts provides compelling evidence that the effect does depend on task. Moreover, our study brings together the previously disparate literature examining the effects of reward and punishment on skill learning. This result shines light on the challenge of extrapolating from a single experiment in a specific context to a more general account of skill learning.

Overall, we have demonstrated that punishment modulates online performance in a task-specific manner, and in our study we found that neither reward nor punishment modulates long-term retention of skill memories. These findings cast doubt on the commonly held hypothesis that reward is ubiquitously beneficial to memory and suggest that the interaction between feedback and learning should be better understood before feedback can be fully exploited in clinical situations.


Discussion

Across two experiments, we demonstrate a robust framing effect in both the win and the loss domains. Framing influences choices in both domains: subjects are more risk averse in the positive frame and more risk seeking in the negative frame, in the domain of wins as well as losses. Importantly, our ERP studies show that the initial evaluations of the positive and the negative frame differ within 300 ms in the win domain. However, in both ERP studies, we did not find a significant ERP difference between the two differently valenced frames in the loss domain, even though behavioral framing effects were significant.

Our results lend some neural support to the prospect theory account of the framing effect, which posits that positive frames are encoded as gains whereas negative frames are encoded as losses, suggesting that the description invariance principle is indeed violated in the framing paradigm. Two descriptions, “keep [amount]” and “lose [amount]” out of a given initial amount, represented the identical option but elicited distinct FRN and P300 responses. The FRN was more negative for the “lose” description than for the “keep” description, even though participants can easily reason that losing one part of the initial amount equals keeping the remainder. According to the motivational accounts of the FRN, which suggest that the FRN is sensitive to the valence of outcomes (Gehring and Willoughby, 2002), our data indicate that negative frames are encoded as losses whereas positive frames are encoded as gains in the win domain. The FRN effect cannot simply be explained by the valence of the words we used. For example, both words “keep” and “save” are positive but they elicited distinct FRNs. Moreover, previous studies using pleasant and unpleasant affectively valent words did not find differences in the FRNs in the time window between 200 and 300 ms post-stimulus (Kiehl et al., 1999; Bernat et al., 2001). Our results on P300 are also consistent with the view that the P300 is related to processes of attentional allocation and to high-level motivational/affective evaluation, being more positive for more positive stimuli (Olofsson et al., 2008). Thus, we provide direct neural evidence that the description invariance principle is violated and outcome evaluation depends on its representation. Our data favor the prospect theory account of framing over other theories that require deliberate reasoning.

It is surprising that the frame effect on FRN was only significant in the win domain but not in the loss domain. One possibility is that the negative frame in the win domain is encoded as a “worse than expected” negative prediction error, whereas the positive frame in the loss domain is encoded as a “better than expected” positive prediction error, and the FRN is only sensitive to negative prediction error (Nieuwenhuis et al., 2004). It is possible that when the initial amount is a gain, “lose [amount]” is encoded as a loss and produces a “worse than expected” negative prediction error. Previous studies have consistently shown that the FRN is sensitive to the negative prediction error (Holroyd et al., 2003, 2008, 2009; Holroyd and Krigolson, 2007). However, findings are mixed regarding whether the FRN is sensitive to the positive prediction error as well. Some studies have shown that the FRN is sensitive to the positive prediction error but with much smaller magnitude, compared with its sensitivity to the negative prediction error (Oliveira et al., 2007; Yu et al., 2011). Some studies found no effects of the positive prediction error on FRN amplitude (Holroyd et al., 2003; Krigolson and Holroyd, 2007; Bellebaum et al., 2010). It remains unclear why the ACC (or FRN amplitude) appears to be less responsive to the positive prediction error. A number of recent studies suggest that the neuromodulators dopamine and serotonin contribute differentially to coding for outcomes in the win and the loss domain, respectively. For example, it has been shown that dopamine agonists affected choices in the gain domain (both neurally and behaviorally) but not the loss domain (Pessiglione et al., 2006). Genetic variation in tonic dopamine and serotonin levels modifies risk seeking in gain and loss domains, respectively (Zhong et al., 2009). Accordingly, given that the FRN is believed to reflect a dopaminergic signal, we should not be surprised to see that it only reflects negative prediction error.
Recent ERP studies also showed no significant difference in the FRN for good and bad outcomes in the loss domain (Kreussel et al., 2012; Sambrook et al., 2012). A recent study demonstrated that size and probability of rewards modulate the FRN associated with wins but not losses (San Martin et al., 2010). This might be due to a separation of dopamine and serotonin coding functions in gain and loss domains. It is also possible that different sub-regions in the dopaminergic midbrain and the striatum encode different types of prediction error, and positive prediction error may not be sent to the ACC (Bayer and Glimcher, 2005; Pessiglione et al., 2006; Cohen et al., 2010). Our results contribute to a growing body of empirical evidence showing a greater modulation of the FRN for win feedback in comparison to loss feedback (Cohen et al., 2007; Holroyd et al., 2008; San Martin et al., 2010). Our findings, if replicated, suggest that different neural substrates may be involved in modulating the framing effect in the win and the loss domain. Although previous neuroimaging studies have focused on the framing effect in the win domain (De Martino et al., 2006; Roiser et al., 2009), the neural correlate of the framing effect in the loss domain is still unclear. There is accumulating evidence suggesting that the neural mechanisms underlying win and loss processing are different (O'Doherty et al., 2001, 2003, 2006; Ullsperger and von Cramon, 2003; Kringelbach, 2005; Nieuwenhuis et al., 2005; Liu et al., 2007). Other neuroimaging methods (e.g., fMRI) are needed to further examine the neural basis of framing in the loss domain.

Previous neuroimaging studies on framing focus only on the decision stage. Using a similar economic decision-making paradigm, two studies compared choices in accordance with the framing effect and choices against the framing effect at the decision stage (De Martino et al., 2006; Roiser et al., 2009). These studies highlight the interplay between the prefrontal cortex and the amygdala in the framing effect. The ACC is interpreted as exerting cognitive control over the emotional response in the amygdala. It has been shown that choices made counter to, relative to those made in accord with, the frame were associated with increased anterior cingulate–amygdala coupling in individuals homozygous for the long (la) allele at the 5-HTTLPR (Roiser et al., 2009). However, amygdala lesion patients did not show an abnormal framing effect (Talmi et al., 2010), suggesting that the amygdala may not play a causal role in framing, although it contributes to decision making in framing. Our findings suggest that the ACC may not only contribute to the framing effect by inhibiting amygdalar activity, but is also involved in the motivational evaluation of stimuli. Taken together, our findings suggest an important role of the ACC in framing.

Two other fMRI studies on the framing effect used the Asian disease problem. One study compared risky choices with sure option choices and found that the cognitive effort required to select a sure gain was considerably lower than that required to choose a risky gain in the positive frame but not in the negative frame (Gonzalez et al., 2005). Activation in frontal and parietal areas differed between risky and certain choices, but only for the positive and not for the negative frame. Another study compared choices in the positive frame with choices in the negative frame and found that choices in the positive frame were associated with enhanced activity in the inferior frontal gyrus, insula and parietal lobe (Zheng et al., 2010). No significantly increased neural activity for choices in the negative frame was reported. Our findings extend these studies by showing differential encoding of the frame before decisions are made.

It is worth noting that both behavioral and neural responses to frame differ between the blocked design experiment (Experiment 1) and the mixed design experiment (Experiment 2), suggesting that our experimental manipulation did influence subjects' behavioral and brain responses. In Experiment 2, as Figure 2 shows, participants were generally more likely to gamble across conditions, compared with their probability of gambling in Experiment 1. When wins and losses were presented within a block, the contrast between wins and losses became salient. This may induce a general loss aversion tendency and a risk-seeking strategy to compensate for loss aversion (i.e., gambling in the hope of keeping all wins and avoiding all losses). Moreover, in Experiment 2, the RTs for decisions in the win domain tended to be faster than the RTs in the loss domain (p = 0.087), in contrast with the finding that there was no RT difference between domains in Experiment 1. It is possible that when win and loss domains are mixed, the switching between domains makes decisions about losses more difficult. For the ERP results, as Figures 2 and 3 show, the ERP amplitudes were overall smaller in Experiment 2 than in Experiment 1. The results may be attributable to individual differences in these relatively small samples. Nevertheless, the main effects of frames remain significant and consistent across experiments, suggesting that our behavioral and ERP results are stable. It is important to notice that the loss-associated FRNs were not more negative than win-associated FRNs in the current study, probably because participants had already received win/loss feedback in the “initial amounts” stage. Thus, information in the “sure amounts” stage does not provide additional information on the win/loss dimension. The FRN may respond more to the framing manipulation than to the already known win/loss dimension.
Moreover, the ERP waveforms in Figures 2C,D show that in the loss domain, both negative and positive frames were coded as gains (i.e., they appear to be comparable in magnitude to the “win_positive” condition and the “win_negative” condition). In other words, when starting with an initial loss, it appears that both frames are coded as relatively advantageous outcomes. The ERP response here might be understood as the reward positivity to predictive cues (Holroyd et al., 2011). While FRN amplitude did not vary by frame valence in the loss domain, the fact that these two conditions appear to be coded as gains might be potentially meaningful. Future studies may further investigate this phenomenon.

Some limitations in our study are worth mentioning. First, although the ACC is generally believed to be the main generator of the FRN (Miltner et al., 1997; Gehring and Willoughby, 2002; Nieuwenhuis et al., 2004; Martin et al., 2009), our ERP studies did not provide direct evidence to link FRN amplitude with ACC activity. Other neuroimaging methods with high spatial resolution are needed to locate the source of the FRN more precisely. Although it is widely believed that the FRN is generated in the ACC, recent studies show that the sources of the FRN might be widely distributed (Carlson et al., 2011; Foti et al., 2011). Second, although we provide evidence that framing influences the initial option evaluation processes, it is still unclear whether framing also influences the subsequent decision-making processes. The interactions among several brain regions may underlie the effects of framing at the decision stages, and these processes could be better examined using fMRI. Third, due to the poor spatial resolution of ERP, our studies are silent about the brain regions underlying the observed framing effects in the win and loss domains. Finally, the P300 results were inconsistent across the two experiments in our study, possibly due to the difference in the experimental design (blocked vs. mixed). Although not the focus of the current study, we also reported the P300 findings for the sake of completeness. The functional significance of the P300 in reward processing is still under debate. More studies are needed to further elucidate the role of the P300 in assessing outcomes.

People make judgments based on their representations of events, rather than on the events themselves. Decision making is not description-invariant, as would be expected on a normative theory, and hence can change according to the representation that is provided. In prospect theory, it is the decision maker's private framing of the problem in terms of gains or losses that determines her evaluation of the options. Our findings demonstrate that framing influences decisions in both the win and the loss domains and provide neural evidence that the description invariance principle is violated in the framing effect.


Conclusions

The present study indicates that the behavioral and neural processing of positive and negative performance feedback is preserved in older adults. It was shown that positive performance feedback can serve as a reward in both older and younger adults. These results have important clinical implications for intervention studies aimed at improving cognitive performance in older adults. Whereas an extrinsic reward such as money would be unsuitable to use in cognitive training, performance feedback can easily be implemented in a training procedure. It has the additional benefit of being able to tap a neural mechanism that is intact in older adults.


METHODS

Participants

Ten healthy right-handed volunteers were recruited from the Aston University student population (2 men, 8 women; mean age ± SD = 21.67 ± 4.89 years). Volunteers were screened for a history of neurological or psychiatric disease, and those scoring over 10 on the Beck Depression Inventory (Beck et al., 1961) were also excluded. The purpose and risks of the study were explained to all volunteers, who gave written informed consent to participate, as approved by Aston University Ethics Committee.

Experimental Design

Participants performed a spatial n-back working memory task. Stimuli were presented in blocks of 12 trials. Each block had a duration of 42.5 sec and began with the presentation of an instruction screen (4 sec) and ended with performance feedback (2 sec). Within a block, each trial started with a fixation spot (500 msec), displayed in the center of a background configuration of randomly arrayed letters. This was followed by the presentation of a square box with several instances of the same letter inside, in uppercase or lowercase font (250 msec) (Figure 1). The box with letters was superimposed on the background of randomly arrayed letters. The box contained one of six possible letters and appeared in one of six possible locations. During the 2300-msec intertrial interval, the background of randomly arrayed letters was displayed alone. Participants had to indicate whether the spatial location of the box was the same (“target”) or different (“nontarget”) from that displayed either one or three trials ago (1-back and 3-back) by pressing a button on a response box. In addition, participants performed a control task, a 0-back task, which required participants to simply match the spatial location of each box displayed with the box that had been presented in the very first trial of the block.

Schematic figure of the different events associated within a trial in the spatial n-back task. Each trial consisted of a fixation spot (500 msec) presented in the middle of randomly arrayed letters, followed by the presentation of a square box (250 msec), which also had randomly arrayed letters inside. The spatial location of the square box varied from trial to trial, and appeared in one of six locations. During the intertrial interval (2300 msec), the background of randomly arrayed letters was displayed alone. In this period, participants indicated whether the box location was the same (“target”) or different (“nontarget”) from that displayed either one or three trials ago (1-back or 3-back).
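The target rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' stimulus code: the function names, the example location sequence, and the choice to leave unscored (as `None`) any trial with no earlier reference trial are assumptions.

```python
def label_nback(locations, n):
    """Label each trial of an n-back block (n >= 1) as 'target' or 'nontarget'.

    A trial is a target when its box location matches the location shown
    n trials earlier. The first n trials have nothing to compare against
    and are labelled None (an assumption; the source does not say how
    they were scored).
    """
    labels = []
    for i, loc in enumerate(locations):
        if i < n:
            labels.append(None)
        else:
            labels.append("target" if loc == locations[i - n] else "nontarget")
    return labels


def label_0back(locations):
    """0-back control: compare each box with the box from the first trial."""
    first = locations[0]
    # The first trial itself is left unscored (assumption).
    return [None] + ["target" if loc == first else "nontarget"
                     for loc in locations[1:]]


# Hypothetical block of 12 trials, locations numbered 1 to 6.
block = [2, 5, 2, 2, 1, 5, 2, 1, 4, 4, 6, 3]
one_back = label_nback(block, 1)
three_back = label_nback(block, 3)
zero_back = label_0back(block)
```

The same sequence yields different targets under different loads, which is why the 3-back condition places a higher demand on working memory than the 1-back condition.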


To examine the relative effects of reward and working memory on the PFC network, two orthogonal factors were manipulated in the n-back task: (1) memory load and (2) reinforcement. Memory load was manipulated by using two different levels of the n-back task with 1-back placing a low load and 3-back a high load. The n-back task was performed in association with two different values of financial reward: 10p and £1. The task was also performed with no reinforcement present. The instruction screen prior to the start of each block displayed information regarding the type of task (0-back, 1-back or 3-back) and the rewarding value of the task (no money, 10p, or £1). The experiment was divided into five runs of eight blocks (one each of the six experimental conditions and two control blocks). Run order was counterbalanced among participants.
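As a rough check on the design described above, the following Python sketch enumerates the factorial conditions and adds up the stated event durations. The variable names are illustrative, and the reward level of the control blocks is left as `None` because the text does not specify it.

```python
from itertools import product

LOADS = ["1-back", "3-back"]          # low and high memory load
REWARDS = ["no money", "10p", "£1"]   # reinforcement levels

# 2 x 3 factorial: six experimental conditions.
experimental_blocks = list(product(LOADS, REWARDS))

# Two 0-back control blocks per run; their reward level is not stated
# in the text, so it is left unspecified here (assumption).
control_blocks = [("0-back", None)] * 2

blocks_per_run = len(experimental_blocks) + len(control_blocks)
n_runs = 5
total_blocks = blocks_per_run * n_runs

# Nominal block duration from the stated event timings, in milliseconds.
trial_ms = 500 + 250 + 2300            # fixation + box + intertrial interval
block_ms = 4000 + 12 * trial_ms + 2000 # instruction + 12 trials + feedback
```

Under these assumptions there are 8 blocks per run and 40 blocks in total, and the listed event durations sum to 42.6 sec per block, close to the reported 42.5 sec.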

Participants were informed that during rewarded blocks each correct response to a target would result in winning the amount at stake; however, each incorrect response to a target would result in losing that amount. Participants were also notified that they would receive the total amount won during the task at the end of the experiment. Average winnings were £13.50 (SD = ±5.37).
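The payoff rule can be written as a small Python function. This is a minimal sketch under stated assumptions: the function name is hypothetical, stakes are expressed in pence, and responses to nontargets are assumed not to change the payoff because the text mentions only responses to targets.

```python
def block_payoff(stake_pence, target_responses_correct):
    """Net winnings (in pence) for one rewarded block.

    stake_pence: amount at stake per target (10 for 10p, 100 for £1,
        0 for an unrewarded block).
    target_responses_correct: one boolean per target trial, True when
        the participant responded correctly.
    """
    return sum(stake_pence if correct else -stake_pence
               for correct in target_responses_correct)


# Hypothetical examples: three hits and one miss at £1, one hit and one miss at 10p.
high_stake = block_payoff(100, [True, True, False, True])
low_stake = block_payoff(10, [True, False])
```

Because gains and losses are symmetric, a participant at chance on targets would expect to earn nothing in a block regardless of the stake.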

Image Acquisition and Analysis

fMRI was carried out on a 3-T Siemens Trio system at Aston University, using a T2*-weighted gradient-echo, echo-planar imaging sequence (TR = 3 sec, TE = 30 msec, flip angle = 90°, FOV = 25 × 25 cm, matrix size = 64 × 64).

The images consisted of 40 axial slices (angled at ∼20° away from the eyes, which allowed the slices to be angled away from the predominant direction of the intrinsic susceptibility-induced field gradients which can result in susceptibility and/or distortion artifacts in the ventral PFC). Similarly, the relatively low TE value (above) was chosen to decrease potential signal dropout, as this results in less phase dispersion across voxels at high field strengths. Slices were 3 mm thick (128 × 128 in-plane resolution). To allow equilibrium to reach steady state, two dummy volumes were collected before the start of each run (117 volumes) and discarded before analysis. T1-weighted scans were acquired for anatomical localization.

Analysis was performed using SPM2 (Wellcome Institute of Neurology; implemented in Matlab, Mathworks, MA). Prior to model application, brain volumes from each participant were realigned to the first volume to correct for head motion. Functional images were then spatially normalized into a standard Montreal Neurological Institute (MNI) echo-planar imaging template. Following this, we applied spatial smoothing with an isotropic Gaussian kernel filter of 10-mm full width at half maximum to facilitate intersubject averaging. Data were analyzed with a random effects model. For every participant, a single mean contrast image was produced for each comparison. The set of voxel values for each contrast constituted a statistical parametric map of the t statistic [SPM(t)], which was then transformed to the unit normal distribution, SPM(Z). We thresholded the activations at a voxel threshold of p < .001, uncorrected, and accepted as significant those clusters that survived at p < .05, corrected for multiple comparisons for the entire brain. Because our main focus is on a prespecified network of regions outlined in the Introduction (the lateral PFC, the VMPFC, the fronto-polar cortex, the dACC, and the dorsal striatum), for these regions we report activations that survive an uncorrected threshold of p < .001, but are significant at p < .05 when a small volume correction (SVC) (based on a sphere of diameter 10 mm, centered on the peak coordinates; Worsley et al., 1996) is applied. As SPM coordinates are given in MNI space, regions were identified by converting the coordinates to Talairach space with a nonlinear transform (Brett, Christoff, Cusack, & Lancaster, 2001).
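For readers unfamiliar with the MNI-to-Talairach step, the sketch below shows one widely circulated approximation, a piecewise-affine mapping that applies a small pitch rotation and different scaling above and below the z = 0 plane. Whether the authors used exactly these parameters is an assumption; the Python function name and parameter values here are illustrative and are not taken from the paper.

```python
import math


def mni_to_talairach(x, y, z):
    """Approximate MNI -> Talairach conversion (piecewise-affine sketch).

    Assumed parameters: a 0.05 rad rotation about the x axis, scaling of
    0.99 (x) and 0.97 (y), and a z scaling of 0.92 at or above z = 0
    and 0.84 below it.
    """
    pitch = 0.05
    zoom_x, zoom_y = 0.99, 0.97
    zoom_z = 0.92 if z >= 0 else 0.84
    c, s = math.cos(pitch), math.sin(pitch)
    tal_x = zoom_x * x
    tal_y = zoom_y * c * y + zoom_z * s * z
    tal_z = -zoom_y * s * y + zoom_z * c * z
    return (tal_x, tal_y, tal_z)


# Hypothetical MNI coordinate, in mm.
example = mni_to_talairach(10.0, 20.0, 30.0)
```

The mapping is linear within each half-space but not across the whole brain, which is why it is usually described as a nonlinear transform.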

FMRI Connectivity Analysis

PPI analysis allows the detection of interactions between brain regions in relation to a study paradigm, that is, the way in which activity in one brain region modulates activity in another region specifically in response to a cognitive/sensory process (in this case, for high memory load relative to low and for high rewards relative to low or no reward) (Friston et al., 1997).

Using SPM2, we extracted for each individual the deconvolved time course of activity from a sphere of 6 mm radius centered on the most significant voxels from the group analyses identified as reflecting changes in memory load and changes in motivation in the lateral PFC (BA 10, 46 and BA 44) and the subgenual cingulate (BA 25). An SPM was then computed for each individual that revealed areas where activation is predicted by the PPI term. The individual contrast images were then taken to the second level to perform a random effect analysis (using a one-sample t test). A stringent random effects model with a priori defined regions and a statistical threshold of p < .005 (uncorrected) was used, with an extent threshold of 7 voxels per cluster.
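The core idea of a PPI regressor can be illustrated with a toy regression in Python. This is a simplified sketch, not the SPM2 procedure: it multiplies a mean-centered seed time course by a contrast-coded task vector directly, skipping the deconvolution and hemodynamic reconvolution that SPM applies, and all data, names, and coefficient values are made up for illustration.

```python
import math


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def ppi_design(seed, psych):
    """Design rows: [intercept, psychological, physiological, interaction]."""
    mean = sum(seed) / len(seed)
    centered = [s - mean for s in seed]
    return [[1.0, p, s, p * s] for p, s in zip(psych, centered)]


# Toy data: 80 scans, task alternating every 10 scans (+1 high load, -1 low load).
n_scans = 80
psych = [1.0 if (t // 10) % 2 == 0 else -1.0 for t in range(n_scans)]
seed = [math.sin(0.7 * t) + 0.3 * math.cos(1.9 * t) for t in range(n_scans)]
X = ppi_design(seed, psych)

# Simulated target region whose coupling with the seed depends on the task.
target = [0.2 + 0.5 * row[2] + 1.0 * row[3] for row in X]
betas = ols(X, target)
```

In this toy case the fitted coefficient on the interaction column (`betas[3]`) recovers the simulated task-dependent coupling; in a real analysis that per-participant estimate is what would be carried to the second-level one-sample t test.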

We also used a physiophysiological interaction analysis to examine how the interaction between lateral PFC and VMPFC regions varies with activation in the dACC and the caudate nucleus. This analysis focuses on the physiological interaction between two brain regions rather than the interaction between one brain region and the psychological impact of a task condition. We examined the effect of the interaction between the right lateral PFC and the right dACC and also between the right lateral PFC and the right caudate nucleus. Subject-specific time series were extracted for each of these three regions, centered on the most significantly active voxel from the group analysis.

