Non-action Learning: Saving Action-Associated Cost Serves as a Covert Reward

Tanimoto, S., Kondo, M., Morita, K., Yoshida, E., & Matsuzaki, M. (2020). Non-action Learning: Saving Action-Associated Cost Serves as a Covert Reward. Frontiers in Behavioral Neuroscience, 14. ISSN 1662-5153

Full text: pubmed-zip/versions/1/package-entries/fnbeh-14-00141/fnbeh-14-00141.pdf — Published Version (8 MB)

Abstract

“To do or not to do” is a fundamental decision that has to be made in daily life. Behaviors related to multiple “to do” choice tasks have long been explained by reinforcement learning, and “to do or not to do” tasks such as the go/no-go task have also been recently discussed within the framework of reinforcement learning. In this learning framework, alternative actions and/or the non-action to take are determined by evaluating explicitly given (overt) reward and punishment. However, we assume that there are real life cases in which an action/non-action is repeated, even though there is no obvious reward or punishment, because implicitly given outcomes such as saving physical energy and regret (we refer to this as “covert reward”) can affect the decision-making. In the current task, mice chose to pull a lever or not according to two tone cues assigned with different water reward probabilities (70% and 30% in condition 1, and 30% and 10% in condition 2). As the mice learned, the probability that they would choose to pull the lever decreased (<0.25) in trials with a 30% reward probability cue (30% cue) in condition 1, and in trials with a 10% cue in condition 2, but increased (>0.8) in trials with a 70% cue in condition 1 and a 30% cue in condition 2, even though a non-pull was followed by neither an overt reward nor avoidance of overt punishment in any trial. This behavioral tendency was not well explained by a combination of commonly used Q-learning models, which take only the action choice with an overt reward outcome into account. Instead, we found that the non-action preference of the mice was best explained by Q-learning models, which regarded the non-action as the other choice, and updated non-action values with a covert reward. We propose that “doing nothing” can be actively chosen as an alternative to “doing something,” and that a covert reward could serve as a reinforcer of “doing nothing.”
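The model class the abstract describes — a Q-learning agent that treats "non-action" as an explicit second choice and reinforces it with a covert reward — can be illustrated with a minimal simulation. This is an illustrative sketch only, not the authors' fitted models: the learning rate, inverse temperature, and covert-reward magnitude below are hypothetical values chosen so that the agent, like the mice in condition 1, comes to pull mostly on the high-probability (70%) cue and to withhold pulling on the low-probability (30%) cue.

```python
import math
import random


def simulate(n_trials=5000, p_reward=(0.7, 0.3), covert_reward=0.4,
             alpha=0.1, beta=10.0, seed=0):
    """Two-cue pull/no-pull task with Q-learning that treats 'no-pull'
    as an explicit action whose value is updated with a covert reward.
    All parameter values are illustrative, not the paper's fits.

    Returns the fraction of pull choices for each cue."""
    rng = random.Random(seed)
    # q[cue][action]: action 0 = pull, action 1 = no-pull
    q = [[0.0, 0.0], [0.0, 0.0]]
    pulls = [0, 0]
    counts = [0, 0]
    for _ in range(n_trials):
        cue = rng.randrange(2)
        # softmax (logistic) choice between pull and no-pull
        p_pull = 1.0 / (1.0 + math.exp(-beta * (q[cue][0] - q[cue][1])))
        if rng.random() < p_pull:
            # pull: overt water reward with cue-specific probability
            r = 1.0 if rng.random() < p_reward[cue] else 0.0
            q[cue][0] += alpha * (r - q[cue][0])
            pulls[cue] += 1
        else:
            # no-pull: no overt outcome, but a covert reward
            # (e.g. saved action cost) reinforces 'doing nothing'
            q[cue][1] += alpha * (covert_reward - q[cue][1])
        counts[cue] += 1
    return [pulls[i] / counts[i] for i in range(2)]


frac_high, frac_low = simulate()
```

With these illustrative settings, the pull value for the 70% cue converges above the covert reward (0.7 > 0.4), so pulling dominates for that cue, while for the 30% cue it converges below it (0.3 < 0.4), so the agent increasingly withholds the pull — a standard two-action Q-learner with no covert reward term would instead keep pulling on both cues, since non-action would never be reinforced.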

Item Type: Article
Subjects: STM Article > Biological Science
Depositing User: Unnamed user with email support@stmarticle.org
Date Deposited: 02 Jan 2023 12:42
Last Modified: 23 Feb 2024 03:57
URI: http://publish.journalgazett.co.in/id/eprint/3
