Reinforcement Learning Journal, vol. 3, 2024, pp. 1511–1532.
Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.
Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pretraining transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In reinforcement learning, researchers have recently adapted these approaches, developing models pretrained on expert trajectories. However, existing methods mostly rely on intricate pretraining objectives tailored to specific downstream applications. This paper conducts a comprehensive investigation of models referred to as pretrained action-state transformer agents (PASTA). Our study provides a unified framework and covers an extensive set of general downstream tasks, including behavioral cloning, offline reinforcement learning (RL), sensor-failure robustness, and dynamics-change adaptation. We systematically compare various design choices and offer valuable insights to aid practitioners in developing robust models. Key findings highlight the improved performance of component-level tokenization, the use of fundamental pretraining objectives such as next-token prediction or masked language modeling, and the simultaneous training of models across multiple domains. The models developed in this study contain fewer than 7M parameters, allowing a broad community to use them and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and contribute to robust policy learning.
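To make the two design choices named in the abstract concrete, here is a minimal sketch of component-level tokenization (each scalar component of a state or action becomes its own token) combined with a masked-prediction corruption step. All function names and the interleaving order are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def tokenize_trajectory(states, actions):
    """Component-level tokenization: every scalar component of each
    state and action becomes its own token, interleaved in time order.
    (Hypothetical sketch; the paper's exact scheme may differ.)"""
    tokens = []
    for s, a in zip(states, actions):
        tokens.extend(s.tolist())   # one token per state component
        tokens.extend(a.tolist())   # one token per action component
    return np.array(tokens)

def mask_for_mlm(tokens, mask_ratio=0.15, mask_value=0.0, rng=None):
    """Masked-prediction objective: hide a random subset of tokens;
    the model would be trained to reconstruct the hidden ones."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(tokens.shape) < mask_ratio
    corrupted = np.where(mask, mask_value, tokens)
    return corrupted, mask

# Example: 3 timesteps, 2-dim states, 1-dim actions -> 9 tokens total
states = np.arange(6, dtype=float).reshape(3, 2)
actions = np.arange(3, dtype=float).reshape(3, 1)
toks = tokenize_trajectory(states, actions)
corrupted, mask = mask_for_mlm(toks)
```

For a next-token-prediction objective, the same token sequence would instead be shifted by one position, with the model predicting token t+1 from tokens 1..t.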
Raphael Boige, Yannis Flet-Berliac, Lars C.P.M. Quaedvlieg, Arthur Flajolet, Guillaume Richard, and Thomas Pierrot. "PASTA: Pretrained Action-State Transformer Agents." Reinforcement Learning Journal, vol. 3, 2024, pp. 1511–1532.
BibTeX:
@article{boige2024pasta,
title={{PASTA}: {P}retrained Action-State Transformer Agents},
author={Boige, Raphael and Flet-Berliac, Yannis and Quaedvlieg, Lars C.P.M. and Flajolet, Arthur and Richard, Guillaume and Pierrot, Thomas},
journal={Reinforcement Learning Journal},
volume={3},
pages={1511--1532},
year={2024}
}