Hey @willccbb — this list from your tweet is a great enumeration of RL framework design decisions:
https://x.com/i/status/2037734454459089027
- context management
- user sims
- native tool parsing
- harness-in-sandbox
- harness-outside-of-sandbox
- no sandbox at all
- groupwise rewards
- intermediate rewards
- multiple environments
- resource management
- custom metrics/error handling
- offline evals
Suggest adding a prominent section (README or docs overview) that explicitly lists which patterns verifiers supports and links to relevant docs/examples for each.
This would help users quickly understand the design space and find what they need. Right now several of these are well-documented but not discoverable unless you already know to look, and a few (context management/compaction, user sims, offline evals) could use dedicated sections.