Reinforcement Learning Journal, vol. 2, 2024, pp. 547–562.
Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.
Both entropy-minimizing and entropy-maximizing objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments, depending on the environment's level of natural entropy. However, neither method alone results in an agent that will consistently learn intelligent behavior across environments. In an effort to find a single entropy-based method that will encourage emergent behaviors in any environment, we propose an agent that can adapt its objective online, depending on the entropy conditions it faces in the environment, by framing the choice as a multi-armed bandit problem. We devise a novel intrinsic feedback signal for the bandit, which captures the agent's ability to control the entropy in its environment. We demonstrate that such agents can learn to optimize task returns through entropy control alone in didactic environments for both high- and low-entropy regimes and learn skillful behaviors in certain benchmark tasks.
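As a rough illustration of the bandit framing described in the abstract, the following minimal sketch treats the two objectives as the arms of a bandit. It is not the authors' implementation. The UCB1 selection rule, the class name `ObjectiveBandit`, and the feedback function `entropy_control_feedback` (relative change in entropy against a baseline) are assumptions introduced here for illustration; the abstract states only that the feedback signal captures the agent's ability to control entropy in its environment.

```python
import math


class ObjectiveBandit:
    """Two-armed UCB1 bandit over intrinsic objectives (illustrative only)."""

    ARMS = ("minimize_entropy", "maximize_entropy")

    def __init__(self, exploration=2.0):
        self.exploration = exploration      # UCB1 exploration constant (assumed)
        self.counts = [0] * len(self.ARMS)  # number of pulls per arm
        self.means = [0.0] * len(self.ARMS) # running mean feedback per arm

    def select(self):
        """Return the index of the objective to use for the next episode."""
        # Try every arm once before applying the UCB1 rule.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(self.exploration * math.log(total) / n)
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, feedback):
        """Fold one feedback value into the running mean of the chosen arm."""
        self.counts[arm] += 1
        self.means[arm] += (feedback - self.means[arm]) / self.counts[arm]


def entropy_control_feedback(agent_entropy, baseline_entropy):
    """Hypothetical feedback: relative entropy change versus a baseline.

    This is a stand-in for the paper's intrinsic feedback signal, whose
    exact form is not given in the abstract.
    """
    return abs(agent_entropy - baseline_entropy) / max(abs(baseline_entropy), 1e-8)
```

In use, the agent would call `select()` at the start of each episode, train with the chosen objective, measure the resulting entropy, and pass the feedback to `update()`. An arm whose objective lets the agent move entropy further from the baseline accumulates a higher mean and is selected more often.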
Adriana Hugessen, Roger Creus Castanyer, Faisal Mohamed, and Glen Berseth. "Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning." Reinforcement Learning Journal, vol. 2, 2024, pp. 547–562.
BibTeX:
@article{hugessen2024surprise,
title={Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning},
author={Hugessen, Adriana and Castanyer, Roger Creus and Mohamed, Faisal and Berseth, Glen},
journal={Reinforcement Learning Journal},
volume={2},
pages={547--562},
year={2024}
}