Implementing TALES Resource Server by christopherzc · Pull Request #203 · NVIDIA-NeMo/Gym

christopherzc · 2025-10-19T22:03:50Z

Contributing To NeMo-Gym — PR Answers (TALES Resource Server)

1) Necessary information

i. Corresponding dataset on the spreadsheet: N/A

ii. Description of the prompt (source + domain):

Domain: Text-game environments (multi-framework agentic evaluation).
Source: TALES / tale-suite tasks (tt_split branch). Prompts are the system prompt used in TALES plus the observation–action history.

iii. Description of the environment:

This is an implementation of TALES: a multi-framework agentic environment that evaluates an agent's ability to reason and make progress in open-ended, situated text environments. TALES consists of 5 text-adventure game frameworks with a total of 122 tasks (games). In rough order of difficulty, the frameworks are textworld, textworld_express, alfworld, scienceworld, and jericho.

iv. Description of the verifier:

Rewards/scores come from the underlying Gymnasium environments; the environment itself acts as the verifier.
Ground-truth walkthroughs exist but are not unique; many actions are accepted via nearest-neighbour parsing (e.g., “take lantern” / “get lantern” / “pick up lantern”).

v. Legal approval status:
N/A

2) Simple correctness check

i. Commands used to run the server for the uploaded data:

# 1) Ensure vLLM is installed and simple_weather example works (per main repo)
# 2) Start vLLM in another terminal (example):
vllm serve Qwen/Qwen3-30B-A3B --enable-expert-parallel --host 0.0.0.0 --port 8000

# 3) Start the TALES Gym server:
source .venv/bin/activate
config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
resources_servers/tales/configs/tales.yaml"
ng_run "+config_paths=[$config_paths]"

# 4) Generate 5 GPT-4o rollouts:
python resources_servers/tales/example_scripts/single_turn/generate_single_turn_gpt_rollouts.py

ii. Resulting rollout and judges (5 examples):
Please see the examples under data/gpt4o_single_turn_examples.

iii. Additional notes for running the server properly:
Please see the README.md under example_scripts/single_turn/ for more details.

examples_clean contains the stripped-down input-output for ease of viewing.

examples_full contained the entire input-output; it has been removed because the response id was triggering the secret detector.

3) Tests

Test files / command to run tests:

Notes on coverage / responsibilities:
Using the walkthroughs in Step 4 implicitly acts as a unit test for the environments. If needed, a test can be added that verifies the outcome of each walkthrough. (I would like to wait until the actual multi-turn setup is working before adding this.)

4) Reward profiling

Models:

Qwen 3 30B A3B
Qwen 3 235B Instruct (for agent/agentic coding/instruction/game) or Qwen 3 235B Thinking (for math/competition coding)

We generate 500+ prompt-response pairs for the specified model. As TALES is inherently multi-turn, not every step on a correct trajectory will return a reward. We do the following to emulate the reward distribution for single-turn domains.

Method (from README):

- Extract the walkthrough actions $(a_0, \dots, a_k)$ from each environment.
- Roll out the walkthrough to obtain $(obs_n, a_n, r_n)$ for each step $n$.
- For each step where $r_n \ne 0$, build a single-turn prompt containing the history up to step $n$ and ask the model to predict $a_n$.
- Sample 16 responses per prompt. For each unique predicted action $a_{pred}$, fast-forward the environment to step $n-1$ and execute $a_{pred}$ to check acceptance.
- Apply this across TextWorld, TextWorld-Express, ALFWorld, and ScienceWorld for walkthroughs of length < 5 (~600 prompts).
- For Qwen3-30B-A3B, prepend /no_think to the user input.
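
The method above can be sketched roughly as follows. This is a minimal illustration, not the code in generate_single_turn_rollouts.py: ToyTextEnv, rollout, build_prompts, and is_accepted are hypothetical names, the toy game stands in for a real TALES environment (whose Gymnasium reset/step signatures differ), and treating "acceptance" as a non-zero reward is an assumption.

```python
class ToyTextEnv:
    """Hypothetical stand-in for a TALES game (not the real Gymnasium API)."""

    # Assumption: synonyms are normalised by the game's own parser.
    SYNONYMS = {"get lantern": "take lantern", "pick up lantern": "take lantern"}

    def reset(self):
        self.has_lantern = False
        self.done = False
        return "You are in a dark room with a lantern and a door."

    def step(self, action):
        action = action.strip().lower()
        action = self.SYNONYMS.get(action, action)
        if action == "take lantern" and not self.has_lantern:
            self.has_lantern = True
            return "You take the lantern.", 1.0, False
        if action == "open door" and self.has_lantern:
            self.done = True
            return "The door opens.", 1.0, True
        return "Nothing happens.", 0.0, self.done


def rollout(env, walkthrough):
    """Replay the walkthrough and return [(obs_n, a_n, r_n)]."""
    obs = env.reset()
    steps = []
    for action in walkthrough:
        next_obs, reward, _ = env.step(action)
        steps.append((obs, action, reward))  # obs seen *before* acting
        obs = next_obs
    return steps


def build_prompts(steps, no_think=False):
    """One single-turn prompt per step whose walkthrough reward is non-zero."""
    prompts = []
    for n, (obs, _, reward) in enumerate(steps):
        if reward == 0:
            continue
        history = "".join(f"{o}\n> {a}\n" for o, a, _ in steps[:n])
        text = f"{history}{obs}\n> "
        if no_think:  # e.g. for Qwen3-30B-A3B
            text = "/no_think " + text
        prompts.append((n, text))
    return prompts


def is_accepted(env_factory, walkthrough, n, predicted):
    """Fast-forward a fresh env through a_0..a_{n-1}, then try the prediction."""
    env = env_factory()
    env.reset()
    for action in walkthrough[:n]:
        env.step(action)
    _, reward, _ = env.step(predicted)
    return reward != 0  # assumption: non-zero reward means accepted


walkthrough = ["look", "take lantern", "open door"]
steps = rollout(ToyTextEnv(), walkthrough)
prompts = build_prompts(steps)
rewarded_steps = [n for n, _ in prompts]
```

In this toy game only steps 1 and 2 yield a reward, so only those steps become prompts. A predicted "pick up lantern" at step 1 is accepted through the synonym table, while "open door" at the same step is not.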

Command used:

python resources_servers/tales/example_scripts/single_turn/generate_single_turn_rollouts.py

examples_clean contains the stripped-down input-output for ease of viewing. examples_full has been removed because the response id was triggering the secret detector.

Report the reward distribution (percent all-correct / all-incorrect / mixture):
See the outputs under data/single_turn_rollouts
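
For reference, the three-way breakdown requested above could be computed along these lines. This is a sketch under assumptions: reward_profile is a hypothetical helper, rewards are passed in as in-memory lists (one list of sampled rewards per prompt) rather than read from the files under data/single_turn_rollouts, whose schema is not reproduced here, and a reward greater than zero is treated as correct.

```python
from collections import Counter


def reward_profile(rewards_per_prompt):
    """Return the percentage of prompts that are all-correct, all-incorrect, or mixed.

    rewards_per_prompt: list of lists, one inner list of sampled rewards per prompt
    (for example, 16 samples each). Assumption: reward > 0 counts as correct.
    """
    if not rewards_per_prompt:
        raise ValueError("no prompts to profile")
    counts = Counter()
    for rewards in rewards_per_prompt:
        correct = [r > 0 for r in rewards]
        if all(correct):
            counts["all_correct"] += 1
        elif not any(correct):
            counts["all_incorrect"] += 1
        else:
            counts["mixture"] += 1
    total = len(rewards_per_prompt)
    return {
        key: 100.0 * counts[key] / total
        for key in ("all_correct", "all_incorrect", "mixture")
    }


# Made-up illustration data: four prompts with 16 samples each.
sample = [[1.0] * 16, [0.0] * 16, [1.0] * 8 + [0.0] * 8, [0.0] * 16]
profile = reward_profile(sample)
```

Prompts in the mixture bucket are the ones where sampled responses disagree, which is typically the part of the distribution of most interest for RL training.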

5) Training-based correctness check (after NeMo Gym + NeMo RL integration)

N/A (Was told this isn't ready yet)

6) PR Check and Review

Reviewer (independent reproduction):
Prithviraj Ammanabrolu (pammanabrolu@nvidia.com)

Reviewer checklist:

Verified steps 1–5 above
Checked correctness of 5 examples
Re-ran README procedure to regenerate dataset
After reproduction success, pinged @banghuaz-nvidia @bxyu-nvidia

Signing Your Work

All commits include a DCO sign-off:

git commit -s -m "Add TALES resource server integration and examples"

Pointers to examples & docs (from repo layout)

Single-turn examples & generator scripts: resources_servers/tales/example_scripts/single_turn/
- generate_single_turn_gpt_rollouts.py (5 GPT-4o examples)
- generate_single_turn_rollouts.py (~500+ prompts for reward profiling)
Multi-turn drafts & notes: see example_scripts/multi_turn/ and notes in README
Sample data: data/single_turn_rollouts/example_clean.jsonl (referenced in README)

Environment / Setup Recap

Java required (ScienceWorld only):

sudo apt-get update && sudo apt-get install -y default-jre default-jdk

Start vLLM (example):

vllm serve Qwen/Qwen3-30B-A3B --enable-expert-parallel --host 0.0.0.0 --port 8000

Start NeMo Gym server for TALES:

source .venv/bin/activate
config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
resources_servers/tales/configs/tales.yaml"
ng_run "+config_paths=[$config_paths]"


vadam5 · 2026-03-11T00:25:15Z

Sorry folks, this PR was closed by mistake when someone on our team force-pushed diverging refs to GitHub. We are working to remedy this and reopen the PR.

vadam5 · 2026-03-11T02:42:49Z

Replacement PR opened here: #874

chtruong814 and others added 30 commits August 25, 2025 16:39

Initial commit

51cc441

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

Add copy-pr-bot

7625f00

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

Add initial repo template

cd96ed4

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

Merge pull request #1 from NVIDIA-NeMo/chtruong/copy-pr-bot

d0c0cac

Add copy-pr-bot

Merge remote-tracking branch 'origin/main' into chtruong/template

9b1afdc

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

Add docstring parser

4847931

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

Fix docs build

817d85e

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

Fix secret detector

831d980

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

Change to use cpu runner for build

e89b476

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

Fix initial test

b5b5980

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

Use uv

2acdd68

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

Remove e2e coverage

cdea8b7

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

Merge pull request #2 from NVIDIA-NeMo/chtruong/template

ced800a

Add initial repo template

Update GitHub with Gitlab main (#3)

9501fdf

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Alias as Penguin (#4)

a6cd962

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Add Copyright docs README FAQ (#7)

d292374

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Dapo17k (#6)

7ebdbd2

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Fix docs build failures (#8)

10e2971

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Fix docs (#10)

f2e5eb9

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Improve Github SSH Key setup docs (#12)

8c753d1

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Comp-Coding Verifier (#5)

48212f8

Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com> Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>

VLLMModel docs in main Readme (#13)

0d4eb58

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Fix agent name in docs (#15)

0d2cf31

Signed-off-by: Brian Yu <bxyu@nvidia.com>

VLLMModel propogates token IDs (#11)

e5c2afd

Signed-off-by: Brian Yu <bxyu@nvidia.com>

VLLMModel tokenize params cleanup (#21)

90ca6a9

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Update Comp-Coding README.md (#26)

0ebd762

Docs improvements - remove Why NeMo Gym section and add CI/CD tests i…

b590d40

…nfo (#27) Signed-off-by: Brian Yu <bxyu@nvidia.com>

update readmes from ng_collect_traj to ng_collect_rollouts (#25)

323b66c

update readme for resources servers for updated cli Signed-off-by: chrismun <cmunley@nvidia.com>

bxyu-nvidia and others added 15 commits October 21, 2025 09:44

Large docs improvement PR from @cwing-nvidia (#208)

8deb3cb

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Add back How-To's and FAQs (#209)

7f58807

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Docs fixes (#210)

24148f5

Signed-off-by: Brian Yu <bxyu@nvidia.com>

README edits, removing the *_full.jsonl rollouts due to the response_…

87e75e8

…id being mistaken as a secret Signed-off-by: Christopher Z. Cui <czcui@ucsd.edu>

Commented out code that generates *_full.jsonl files to avoid trigger…

651d4a6

…ing secret checker Signed-off-by: Christopher Z. Cui <czcui@ucsd.edu>

Reverted vllm_model.yaml since it was getting picked up as a secret s…

2647a2d

…omehow Signed-off-by: Christopher Z. Cui <czcui@ucsd.edu>

Add README to docs folder (#216)

e8b1d53

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Ray comp coding infra (#195)

b7e958b

#193 --------- Signed-off-by: Sugam Devare <sdevare@nvidia.com>

Misc docs fixes (#218)

770097e

Signed-off-by: Brian Yu <bxyu@nvidia.com>

CLI help and command help; misc improvements (#229)

b783377

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Misc infra 20251024 (#234)

e2dff44

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Fix ray version mismatch (#231)

cb63942

Signed-off-by: Sugam Devare <sdevare@nvidia.com>

This was referenced Oct 27, 2025

Session ID not consistent when requerying same server_client instance #238

Open

openai_model doesn't accept assistant role messages #239

Closed

christopherzc added 2 commits October 27, 2025 18:55

Added more info to README, included scripts to generate bugs mentione…

095777c

…d in pull request Signed-off-by: Christopher Zhang Cui <czcui@ucsd.edu>

Updated to latest main

2d2e62c

Signed-off-by: Christopher Zhang Cui <czcui@ucsd.edu>

This was referenced Oct 27, 2025

Improve documentation on async/await patterns #240

Open

doc: model serving options (vLLM) #194

Closed

Improve multi-step agent documentation #242

Closed

Assistant message issue fixed, removing bug generation script and upd…

96f1854

…ating README Signed-off-by: Christopher Zhang Cui <czcui@ucsd.edu>

cmunley1 mentioned this pull request Nov 25, 2025

textworld resource server #115

Closed

snowmanwwg added the x-ucsd label Jan 4, 2026

vadam5 closed this Mar 10, 2026

vadam5 force-pushed the christopherzc/tales branch from 98c6cb9 to 96f1854 Compare March 10, 2026 23:08

vadam5 mentioned this pull request Mar 11, 2026

Implementing TALES Resources Server #874

Open

christopherzc wants to merge 111 commits into main from christopherzc/tales


19 participants
