Optimistic critics can empower small actors

By Olya Mastikhina, Dhruv Sreenivas, and Pablo Samuel Castro

Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.


Abstract:

Actor-critic methods have been central to many of the recent advances in deep reinforcement learning. The most common approach is to use _symmetric_ architectures, whereby both actor and critic have the same network topology and number of parameters. However, recent works have argued for the advantages of _asymmetric_ setups, specifically with the use of smaller actors. We perform broad empirical investigations and analyses to better understand the implications of this asymmetry and find that, in general, smaller actors result in performance degradation and overfit critics. Our analyses suggest _poor data collection_, due to value underestimation, as one of the main causes of this behavior, and further highlight the crucial role the critic can play in alleviating this pathology. We explore techniques to mitigate the observed value underestimation, which enables further research in asymmetric actor-critic methods.
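
As a concrete illustration of the ideas in the abstract, here is a minimal PyTorch sketch of an asymmetric actor-critic setup with an optimism knob on the bootstrap target. Everything here is an illustrative assumption rather than the paper's actual architecture or algorithm: the `mlp` helper, the layer sizes, and the blending coefficient `beta` are hypothetical, and interpolating between the min and max of twin critic heads is just one known way to make a clipped double-Q target less pessimistic.

```python
import torch
import torch.nn as nn

def mlp(sizes):
    # Plain fully connected network with ReLU between hidden layers.
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

# Asymmetric setup (illustrative sizes): full-size twin critics
# paired with a much smaller actor.
obs_dim, act_dim = 17, 6
critic1 = mlp([obs_dim + act_dim, 256, 256, 1])
critic2 = mlp([obs_dim + act_dim, 256, 256, 1])
actor = mlp([obs_dim, 32, act_dim])

def optimistic_td_target(reward, gamma, done, q1_next, q2_next, beta=0.5):
    # Standard clipped double-Q bootstraps from min(q1, q2), which is
    # pessimistic and can drive the underestimation described above.
    # Blending in the max (beta > 0) makes the target more optimistic;
    # beta = 0 recovers the usual pessimistic target.
    q_next = (1.0 - beta) * torch.min(q1_next, q2_next) \
        + beta * torch.max(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * q_next
```

Under this reading, a more optimistic target gives weight to the larger of the two value estimates, which counteracts the underestimation the abstract identifies as a driver of poor data collection by small actors.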


Citation Information:

Olya Mastikhina, Dhruv Sreenivas, and Pablo Samuel Castro. "Optimistic critics can empower small actors." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

BibTeX:
@article{mastikhina2025optimistic,
    title={Optimistic critics can empower small actors},
    author={Mastikhina, Olya and Sreenivas, Dhruv and Castro, Pablo Samuel},
    journal={Reinforcement Learning Journal},
    year={2025}
}