Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting

By Edoardo Cetin, Ahmed Touati, and Yann Ollivier

Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.


Abstract:

The forward-backward representation (FB) is a recently proposed framework (Touati et al., 2023; Touati and Ollivier, 2021) to train behavior foundation models (BFMs) that aim to provide efficient zero-shot policies for any new task specified in a given reinforcement learning (RL) environment, without training for each new task. Here we address two core limitations of FB model training. First, FB, like all successor-feature-based methods, relies on a linear encoding of tasks: at test time, each new reward function is linearly projected onto a fixed set of pretrained features. This limits both the expressivity and the precision of the task representation. We break the linearity limitation by introducing auto-regressive features for FB, which let fine-grained task features depend on coarser-grained task information. This can represent arbitrary nonlinear task encodings, thus significantly increasing the expressivity of the FB framework. Second, it is well known that training RL agents from offline datasets often requires specific techniques. We show that FB works well together with such offline RL techniques, by adapting techniques from (Nair et al., 2020a; Cetin et al., 2024) for FB. This is necessary to avoid flatlining performance on some datasets, such as DMC Humanoid. As a result, we produce efficient FB BFMs for a number of new environments. Notably, on the D4RL locomotion benchmark, the generic FB agent matches the performance of standard single-task offline agents (IQL, XQL). In many setups, the offline techniques are needed to get any decent performance at all. The auto-regressive features have a positive but moderate impact, concentrated on tasks requiring spatial precision and task generalization beyond the behaviors represented in the training set. Together, these results establish that generic, reward-free FB BFMs can be competitive with single-task agents on standard benchmarks, while suggesting that expressivity of the BFM is not a key limiting factor in the environments tested.
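For concreteness, here is a minimal PyTorch sketch of the two ingredients described in the abstract. This is not the paper's code: all shapes, module names, and hyperparameters are invented for illustration. The first part contrasts the standard linear task encoding of successor-feature methods with one plausible instantiation of auto-regressive features, where each feature block conditions on the coarser task components inferred so far, making the reward-to-task map nonlinear. The second part shows an advantage-weighted actor loss in the style of Nair et al. (2020a).

```python
# Minimal sketch only: shapes, names, and hyperparameters are invented;
# this illustrates the ideas in the abstract, not the paper's actual code.
import torch
import torch.nn as nn


def linear_task_encoding(phi, rewards):
    """Standard successor-feature-style task inference: the task vector z
    is a linear projection of reward samples onto fixed features phi."""
    # phi: (N, d) pretrained features at sampled states; rewards: (N,)
    return (phi * rewards.unsqueeze(-1)).mean(dim=0)  # (d,)


class AutoRegressiveFeatures(nn.Module):
    """Hypothetical auto-regressive task features: the k-th feature block
    also sees the coarser task blocks z_{<k} already inferred, so the final
    reward-to-task encoding is nonlinear."""

    def __init__(self, d_state=17, d_block=8, n_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_state + k * d_block, 128), nn.ReLU(),
                nn.Linear(128, d_block),
            )
            for k in range(n_blocks)
        )

    def encode_task(self, states, rewards):
        # states: (N, d_state); rewards: (N,) samples of the new reward.
        n = states.shape[0]
        z_prev = states.new_zeros(n, 0)  # no coarser blocks yet
        z_blocks = []
        for block in self.blocks:
            phi_k = block(torch.cat([states, z_prev], dim=-1))  # (N, d_block)
            z_k = (phi_k * rewards.unsqueeze(-1)).mean(dim=0)   # project rewards
            z_blocks.append(z_k)
            z_prev = torch.cat(z_blocks).expand(n, -1)          # condition next block
        return torch.cat(z_blocks)  # full task vector, coarse-to-fine


def advantage_weighted_loss(log_probs, q_values, values, beta=1.0, max_weight=100.0):
    """Offline actor loss in the style of AWR/AWAC (Nair et al., 2020a):
    log-likelihoods of dataset actions are weighted by exp(advantage / beta),
    keeping the policy close to the data while favoring better actions."""
    advantage = q_values - values
    weights = torch.exp(advantage / beta).clamp(max=max_weight)
    return -(weights.detach() * log_probs).mean()
```

In this sketch, test-time task inference would call `encode_task` on a handful of reward-labeled states from the new task and pass the resulting vector z to the pretrained policy, with no further training.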


Citation Information:

Edoardo Cetin, Ahmed Touati, and Yann Ollivier. "Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

BibTeX:
@article{cetin2025finer,
    title={Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting},
    author={Cetin, Edoardo and Touati, Ahmed and Ollivier, Yann},
    journal={Reinforcement Learning Journal},
    year={2025}
}