
Implementing TALES Resource Server#203

Closed
christopherzc wants to merge 111 commits into main from christopherzc/tales

@christopherzc (Collaborator) commented Oct 19, 2025

Contributing To NeMo-Gym — PR Answers (TALES Resource Server)

1) Necessary information

i. Corresponding dataset on the spreadsheet: N/A

ii. Description of the prompt (source + domain):

  • Domain: Text-game environments (multi-framework agentic evaluation).
  • Source: [TALES / tale-suite] tasks (tt_split branch). Prompts are the system prompt used in TALES plus the observation–action history.

iii. Description of the environment:

  • This is an implementation of TALES, a multi-framework agentic environment that evaluates an agent's ability to reason about, and make progress in, open-ended, situated text environments. TALES consists of 5 text-adventure game frameworks with a total of 122 tasks (games); in roughly increasing order of difficulty, they are textworld, textworld_express, alfworld, scienceworld, and jericho.

iv. Description of the verifier:

  • Rewards/scores come from the underlying Gymnasium environments; the environment itself acts as the verifier.
  • Ground-truth walkthroughs exist but are not unique: many equivalent phrasings of an action are accepted via nearest-neighbour parsing (e.g., “take lantern” / “get lantern” / “pick up lantern”).

v. Legal approval status:
N/A


2) Simple correctness check

i. Commands used to run the server for the uploaded data:

# 1) Ensure vLLM is installed and the simple_weather example works (per the main repo)
# 2) Start vLLM in another terminal (example):
vllm serve Qwen/Qwen3-30B-A3B --enable-expert-parallel --host 0.0.0.0 --port 8000

# 3) Start the TALES Gym server:
source .venv/bin/activate
config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
resources_servers/tales/configs/tales.yaml"
ng_run "+config_paths=[$config_paths]"

# 4) Generate 5 GPT-4o rollouts:
python resources_servers/tales/example_scripts/single_turn/generate_single_turn_gpt_rollouts.py

ii. Resulting rollouts and judges (5 examples):
Please see the examples under data/gpt4o_single_turn_examples.

iii. Additional notes for running the server properly:
Please see the README.md under example_scripts/single_turn/ for more details.

examples_clean contains stripped-down inputs and outputs for easier viewing.

examples_full contains the complete inputs and outputs; it has been removed because the response IDs triggered the secret detector.

3) Tests

Test files / command to run tests:

Notes on coverage / responsibilities:
Replaying the walkthroughs in Step 4 implicitly acts as a unit test for the environments. If needed, a test verifying the outcome of each walkthrough can be added; I would like to wait until multi-turn support is working before adding it.
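A walkthrough-replay test of the kind mentioned above could look roughly like the sketch below. It rests on stated assumptions: `ToyGame` is a hypothetical stand-in for a TALES environment that follows the Gymnasium `step()` convention of returning `(observation, reward, terminated, truncated, info)`, and `replay_walkthrough` is an illustrative helper, not part of the TALES or NeMo Gym codebase.

```python
from dataclasses import dataclass


@dataclass
class ToyGame:
    """Hypothetical stand-in for a TALES environment (Gymnasium-style step API)."""

    solution: list[str]
    progress: int = 0
    score: int = 0

    def reset(self) -> tuple[str, dict]:
        self.progress, self.score = 0, 0
        return "You are in a dark room.", {}

    def step(self, action: str) -> tuple[str, float, bool, bool, dict]:
        reward = 0.0
        if self.progress < len(self.solution) and action == self.solution[self.progress]:
            self.progress += 1
            self.score += 1
            reward = 1.0
        won = self.progress == len(self.solution)
        obs = "You won!" if won else f"Step {self.progress} done."
        return obs, reward, won, False, {"score": self.score}


def replay_walkthrough(env, actions: list[str]) -> tuple[float, bool]:
    """Replay a walkthrough and return (total reward, whether the episode terminated)."""
    env.reset()
    total, terminated = 0.0, False
    for action in actions:
        _, reward, terminated, truncated, _ = env.step(action)
        total += reward
        if terminated or truncated:
            break
    return total, terminated


def test_walkthrough_reaches_end():
    walkthrough = ["take lantern", "open trapdoor", "go down"]
    total, terminated = replay_walkthrough(ToyGame(solution=walkthrough), walkthrough)
    # A correct walkthrough should finish the game and collect every reward.
    assert terminated
    assert total == len(walkthrough)
```

Against a real environment, the assertion would more likely compare the final score with the game's maximum score rather than with the walkthrough length.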


4) Reward profiling

Models:

  • Qwen 3 30B A3B
  • Qwen 3 235B Instruct (for agent/agentic coding/instruction/game) or Qwen 3 235B Thinking (for math/competition coding)

We generate 500+ prompt-response pairs for the specified model. Because TALES is inherently multi-turn, not every step of a correct trajectory returns a reward, so we do the following to emulate the reward distribution of single-turn domains.

Method (from README):

  • Extract the walkthrough actions $a_0, \dots, a_k$ from each environment.
  • Roll out the walkthrough to obtain triples $(\mathrm{obs}_n, a_n, r_n)$.
  • For each step where $r_n \ne 0$, build a single-turn prompt containing the history up to step $n$ and ask the model to predict $a_n$.
  • Sample 16 responses per prompt. For each unique predicted action $a_{\mathrm{pred}}$, fast-forward the environment to step $n-1$ and execute $a_{\mathrm{pred}}$ to check whether it is accepted.
  • Applied across TextWorld, TextWorld-Express, ALFWorld, and ScienceWorld for walkthroughs of length < 5 (~600 prompts).
  • For Qwen3-30B-A3B, prepend /no_think to the user input.
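The profiling procedure above can be sketched end to end on a toy game. This is a minimal illustration under assumptions, not the code in generate_single_turn_rollouts.py: `ToyGame` is a hypothetical deterministic game whose simplified `step()` returns `(observation, reward)`, `fake_model` stands in for the sampled policy, and "accepted" is taken to mean the predicted action earns a nonzero reward (the same criterion used to select the step).

```python
import random


class ToyGame:
    """Hypothetical deterministic game; only some solution steps yield a reward."""

    def __init__(self, solution, rewarded_steps):
        self.solution = solution
        self.rewarded_steps = rewarded_steps
        self.progress = 0

    def reset(self):
        self.progress = 0
        return "You are in a dark room."

    def step(self, action):
        """Simplified step returning (observation, reward) instead of a Gymnasium 5-tuple."""
        if action.startswith("get "):  # mimic lenient parsing of a synonym
            action = "take " + action[len("get "):]
        if self.progress < len(self.solution) and action == self.solution[self.progress]:
            reward = 1.0 if self.progress in self.rewarded_steps else 0.0
            self.progress += 1
            return f"Done: {action}.", reward
        return "Nothing happens.", 0.0


def rollout(env, walkthrough):
    """Return (obs_n, a_n, r_n) triples, where obs_n is the observation a_n was taken in."""
    obs = env.reset()
    trajectory = []
    for action in walkthrough:
        next_obs, reward = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
    return trajectory


def build_prompt(trajectory, n):
    """History of steps 0..n-1 followed by obs_n; the model must predict a_n."""
    lines = []
    for obs, action, _ in trajectory[:n]:
        lines += [f"Observation: {obs}", f"Action: {action}"]
    lines += [f"Observation: {trajectory[n][0]}", "Action:"]
    return "\n".join(lines)


def is_accepted(make_env, walkthrough, n, predicted):
    """Fast-forward through a_0..a_{n-1}, then check whether `predicted` earns a reward."""
    env = make_env()
    env.reset()
    for action in walkthrough[:n]:
        env.step(action)
    _, reward = env.step(predicted)
    return reward != 0


def profile(make_env, walkthrough, sample_fn, num_samples=16):
    """Per-prompt lists of 0/1 rewards, one entry per sampled response."""
    trajectory = rollout(make_env(), walkthrough)
    results = []
    for n, (_, _, reward) in enumerate(trajectory):
        if reward == 0:
            continue  # only steps with r_n != 0 become single-turn prompts
        prompt = build_prompt(trajectory, n)
        samples = [sample_fn(prompt) for _ in range(num_samples)]
        # Execute each unique prediction once, then score every sample with its verdict.
        verdicts = {a: is_accepted(make_env, walkthrough, n, a) for a in set(samples)}
        results.append([1.0 if verdicts[a] else 0.0 for a in samples])
    return results


walkthrough = ["take lantern", "open trapdoor", "go down"]
rng = random.Random(0)


def make_env():
    return ToyGame(walkthrough, rewarded_steps={0, 2})


def fake_model(prompt):
    """Hypothetical stand-in for the policy model (a real run would query vLLM)."""
    return rng.choice(["get lantern", "take lantern", "go north", "go down"])


rewards = profile(make_env, walkthrough, fake_model)
print([sum(r) / len(r) for r in rewards])
```

Caching verdicts per unique prediction mirrors the "for each unique predicted action" step, so duplicate samples do not trigger extra replays. For Qwen3-30B-A3B, `sample_fn` would also prepend /no_think to the user input.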

Command used:

python resources_servers/tales/example_scripts/single_turn/generate_single_turn_rollouts.py

As with the GPT-4o examples, examples_clean contains stripped-down inputs and outputs for easier viewing, and examples_full has been removed because the response IDs triggered the secret detector.

Report the reward distribution (percent all-correct / all-incorrect / mixture):
See the outputs under data/single_turn_rollouts.
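The three-way breakdown requested above can be computed directly from per-prompt sample rewards. The sketch below assumes each prompt is represented as a list of binary rewards, one per sampled response; that input format is an assumption, not the actual schema of the files under data/single_turn_rollouts.

```python
def reward_distribution(per_prompt_rewards: list[list[float]]) -> dict[str, float]:
    """Percent of prompts whose samples are all correct, all incorrect, or a mixture."""
    # Ignore prompts with no samples so they do not count as "all correct".
    per_prompt_rewards = [r for r in per_prompt_rewards if r]
    counts = {"all_correct": 0, "all_incorrect": 0, "mixture": 0}
    for rewards in per_prompt_rewards:
        if all(r > 0 for r in rewards):
            counts["all_correct"] += 1
        elif all(r == 0 for r in rewards):
            counts["all_incorrect"] += 1
        else:
            counts["mixture"] += 1
    total = len(per_prompt_rewards)
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: 100.0 * v / total for k, v in counts.items()}


example = [[1, 1, 1], [0, 0, 0], [1, 0, 1], [0, 0, 0]]
print(reward_distribution(example))
# {'all_correct': 25.0, 'all_incorrect': 50.0, 'mixture': 25.0}
```

Prompts in the mixture bucket are the ones that carry a useful learning signal for RL, since both correct and incorrect responses appear among the samples.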


5) Training-based correctness check (after NeMo Gym + NeMo RL integration)

N/A (I was told this is not ready yet.)


6) PR Check and Review

Reviewer (independent reproduction):
Prithviraj Ammanabrolu (pammanabrolu@nvidia.com)

Reviewer checklist:

  • Verified steps 1–5 above
  • Checked correctness of 5 examples
  • Re-ran README procedure to regenerate dataset
  • After reproduction success, pinged @banghuaz-nvidia @bxyu-nvidia

Signing Your Work

All commits include a DCO sign-off:

git commit -s -m "Add TALES resource server integration and examples"

Pointers to examples & docs (from repo layout)

  • Single-turn examples & generator scripts: resources_servers/tales/example_scripts/single_turn/

    • generate_single_turn_gpt_rollouts.py (5 GPT-4o examples)
    • generate_single_turn_rollouts.py (~500+ prompts for reward profiling)
  • Multi-turn drafts & notes: see example_scripts/multi_turn/ and notes in README

  • Sample data: data/single_turn_rollouts/example_clean.jsonl (referenced in README)


Environment / Setup Recap

  • Java required (ScienceWorld only):

    sudo apt-get update && sudo apt-get install -y default-jre default-jdk
  • Start vLLM (example):

    vllm serve Qwen/Qwen3-30B-A3B --enable-expert-parallel --host 0.0.0.0 --port 8000
  • Start NeMo Gym server for TALES:

    source .venv/bin/activate
    config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
    resources_servers/tales/configs/tales.yaml"
    ng_run "+config_paths=[$config_paths]"

chtruong814 and others added 30 commits August 25, 2025 16:39
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Migrated over from gitlab:

- Display aggregate metrics
- Aggregate generic keys using multineedle
- Display other dynamic aggregations
- Count string totals and unique values
- Remove TrainDataProcessor dependency, add test
- Remove dupe file read, fix arg types hints

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
…nfo (#27)

Signed-off-by: Brian Yu <bxyu@nvidia.com>
updated the following logging print when running ng_prepare_data from,
for example:

"Found 0 agent server instance configs withOUT datasets:"

to 

"Found 0 agent server instance configs WITHOUT datasets:" 

to match the format of the subsequent logs, for example: 
"Found 1 agent server instance configs WITH datasets:"

Signed-off-by: chrismun <cmunley@nvidia.com>
update readme for resources servers for updated cli

Signed-off-by: chrismun <cmunley@nvidia.com>
bxyu-nvidia and others added 15 commits October 21, 2025 09:44
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
## DRAFT - Seeking Additional Input

### What's Complete
- Types of contributions and priorities
- Development setup and workflow  
- DCO and commit signing (complete guide)
- CI/CD requirements and troubleshooting
- Quality control checklist for resource servers
- Common issues and troubleshooting

### What Needs Input 

#### Resource Server Guidelines
- These need to be updated for OSS community users @banghuaz-nvidia 

#### RL Framework Integrations
- I proposed a checklist of things, but need @bxyu-nvidia to help 

---
Addresses #132

---------

Signed-off-by: Chris Wing <cwing@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>
…id being mistaken as a secret

Signed-off-by: Christopher Z. Cui <czcui@ucsd.edu>
…ing secret checker

Signed-off-by: Christopher Z. Cui <czcui@ucsd.edu>
…omehow

Signed-off-by: Christopher Z. Cui <czcui@ucsd.edu>
…atures (#155)

- Add per-record regex extraction from template_metadata.output_regex
- Add full generation rescue when regex extraction fails (partial
credit)
- Add length-based threshold to skip regex for long answers (>120 chars)
- Add 3 new tests covering all new features (7/7 passing)
- Add example_openqa.jsonl with 5 diverse examples + rollouts + metrics
- Update README with new config fields and accurate defaults
- Optimize defaults for OpenQA while maintaining backward compatibility

All features only activate when template_metadata.output_regex is
present,
making them safe for existing datasets without template_metadata.

---------

Signed-off-by: Pritam Gundecha <pgundecha@nvidia.com>
…port STEM MCQA dataset (#128)

Adds support for custom answer extraction in MCQA resources server via
the optional `template_metadata.output_regex` field. This enables
handling STEM datasets with custom prompt formats that don't match the
standard grading modes.

---------

Signed-off-by: Pritam Gundecha <pgundecha@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
#193

---------

Signed-off-by: Sugam Devare <sdevare@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Sugam Devare <sdevare@nvidia.com>
…d in pull request

Signed-off-by: Christopher Zhang Cui <czcui@ucsd.edu>
Signed-off-by: Christopher Zhang Cui <czcui@ucsd.edu>
…ating README

Signed-off-by: Christopher Zhang Cui <czcui@ucsd.edu>
vadam5 closed this on Mar 10, 2026
vadam5 force-pushed the christopherzc/tales branch from 98c6cb9 to 96f1854 on March 10, 2026 23:08
@vadam5 (Contributor) commented Mar 11, 2026

Sorry, folks: this PR was closed by mistake when one of our team members force-pushed diverging refs to GitHub. We are working to remedy this and reopen the PR.

@vadam5 (Contributor) commented Mar 11, 2026

Replacement PR opened here: #874


Labels: resources-server (Resources servers (math, code, etc.)), x-ucsd
