In order to better understand the causes of obsessive-compulsive disorder, researchers constructed a model of behaviour. They demonstrated how the cycle between obsession and compulsion may be intensified when learning parameters for reinforcement and punishment are severely unbalanced. This research may enhance therapies for mental illness.
Researchers from Tamagawa University, the Advanced Telecommunications Research Institute International, and the Nara Institute of Science and Technology (NAIST) have shown that obsessive-compulsive disorder (OCD) can be understood as the result of unequal learning between reinforcement and punishment.
By testing their theoretical model empirically, they demonstrated that imbalances in the brain computations that link present outcomes to prior actions can result in disordered behaviour. Specifically, this can occur when the memory trace signal for past actions decays at different rates for good and bad outcomes. Here, “good” means the outcome was better than expected, and “bad” means it was worse than expected. The study thus offers an explanation of how OCD develops. OCD is a mental illness involving anxiety, characterised by intrusive, repetitive thoughts called obsessions and by specific repetitive behaviours called compulsions.
How OCD affects mental health
Even though OCD patients are aware that their obsessions or compulsions are unreasonable, they frequently feel powerless to alter their behaviour. In extreme circumstances, this may prevent the person from being able to lead a regular life. Compulsive actions are attempts to momentarily relieve tension brought on by obsessions. Examples include obsessive hand washing or continually verifying that doors are closed before leaving the house. But until recently, it was unclear how the cycle of obsessions and compulsions came to be intensified.
Now, a team led by researchers at NAIST has used reinforcement learning theory to model the disordered cycle associated with OCD. In this framework, an action whose outcome is better than predicted becomes more likely to be chosen again (positive prediction error), while an action whose outcome is worse than expected is suppressed (negative prediction error). When implementing reinforcement learning, it is important to consider delays as well as positive and negative prediction errors. In general, the outcome of a choice becomes available only after some delay.
Therefore, reinforcement and punishment should be assigned to recent choices within a certain time frame. This is called credit assignment, which is implemented as a memory trace in reinforcement learning theory. Ideally, memory trace signals for past actions decay at the same rate for both positive and negative prediction errors. However, this cannot be completely realized in discrete neural systems. Using simulations, NAIST scientists found that agents implicitly learn obsessive-compulsive behaviour when the trace decay factor for memory traces of past actions related to negative prediction errors (n-) is much smaller than that related to positive prediction errors (n+).
Put another way, the window of past actions that receive credit is much narrower for negative prediction errors than for positive prediction errors. “Our model, with imbalanced trace decay factors (n+ > n-), successfully represents the vicious circle of obsession and compulsion characteristic of OCD,” say co-first authors Yuki Sakai and Yutaka Sakai.
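The asymmetric credit assignment described above can be illustrated with a short sketch. It is not the researchers' model, only a simplified illustration under stated assumptions: the function name `assign_credit`, the learning rate `alpha`, and the example values 0.9 and 0.3 for the decay factors are all hypothetical. Each past action receives a share of the prediction error that shrinks geometrically with its age, and the decay factor depends on the sign of the error.

```python
def assign_credit(values, history, delta, alpha=0.1, n_pos=0.9, n_neg=0.3):
    """Spread a prediction error over recently chosen actions.

    values  : dict mapping action name -> current learned value
    history : list of chosen actions, oldest first, most recent last
    delta   : prediction error (positive = better than expected)
    alpha   : learning rate (hypothetical value)
    n_pos   : trace decay factor for positive errors (hypothetical value)
    n_neg   : trace decay factor for negative errors (hypothetical value)
    """
    # Pick the decay factor according to the sign of the prediction error.
    decay = n_pos if delta > 0 else n_neg
    updated = dict(values)
    # age 0 is the most recent action; older actions get decay**age credit.
    for age, action in enumerate(reversed(history)):
        credit = alpha * delta * decay ** age
        updated[action] = updated.get(action, 0.0) + credit
    return updated


if __name__ == "__main__":
    history = ["oldest", "middle", "latest"]
    start = {name: 0.0 for name in history}
    print("after reward:    ", assign_credit(start, history, delta=+1.0))
    print("after punishment:", assign_credit(start, history, delta=-1.0))
```

With n+ much larger than n-, a reward still reaches actions taken several steps earlier, while a punishment lands almost entirely on the most recent action. In this toy setting, earlier actions that contributed to a bad outcome are barely suppressed, which is the narrow view of past actions for negative prediction errors.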
To test this prediction, the researchers had 45 patients with OCD and 168 healthy control subjects play a computer-based game with monetary rewards and penalties. Patients with OCD showed a much smaller n- than n+, as the computational model predicted. In addition, this imbalance in trace decay factors (n+ > n-) was normalized by serotonin enhancers, which are first-line medications for the treatment of OCD. “Although we think that we always make rational decisions, our computational model proves that we sometimes implicitly reinforce maladaptive behaviors,” says corresponding author Saori C. Tanaka.
This computational model implies that patients with severely unbalanced trace decay factors may not respond to behavioural therapy alone, although it is currently difficult to identify treatment-resistant patients from their clinical symptoms alone. One day, it may be possible to use these findings to identify, before treatment begins, which patients are most likely to be resistant to behavioural therapy.