feat(ether0): Add boxed and Answer: LETTER extraction fallbacks by jubick1337 · Pull Request #925 · NVIDIA-NeMo/Gym

jubick1337 · 2026-03-20T22:17:22Z

Add multi-format answer extraction to the ether0 verifier. When the original <answer> tag extraction fails, try \boxed{} and Answer: LETTER formats as fallbacks. This enables using ether0 data with standard GPQA/MCQ prompt formats.

Tested with Nano v3 on 10K curriculum: extraction rate improved from 15% (answer tags only) to 50% (with fallbacks), pass rate from 2.4% to 7.2%.

Signed-off-by: mnovikov <mnovikov@nvidia.com>
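The fallback order described above can be sketched roughly as follows. This is only an illustration of the idea, not the code merged in this PR: the function name `extract_answer`, the three regular expressions, and the choice to take the last match are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical patterns; the verifier's real regexes may differ.
ANSWER_TAG_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")  # no nested braces in this sketch
ANSWER_LETTER_RE = re.compile(r"Answer:\s*\(?([A-Z])(?![A-Za-z])")

# Order matters: the original <answer> format is tried first,
# so existing data keeps its current behavior.
_PATTERNS = (ANSWER_TAG_RE, BOXED_RE, ANSWER_LETTER_RE)


def extract_answer(text: str) -> Optional[str]:
    """Return the extracted answer, or None if no format matches."""
    for pattern in _PATTERNS:
        matches = pattern.findall(text)
        if matches:
            # Assumption: the last occurrence is the model's final answer.
            candidate = matches[-1].strip()
            if candidate:
                return candidate
    return None


if __name__ == "__main__":
    print(extract_answer("<answer>CCO</answer>"))
    print(extract_answer("Therefore \\boxed{C}"))
    print(extract_answer("Reasoning...\nAnswer: B"))
```

Trying the tag format first is what would keep such a change backwards compatible: a response that already contains an <answer> tag never reaches the fallback patterns.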

copy-pr-bot · 2026-03-20T22:17:26Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cmunley1 · 2026-03-21T04:38:32Z

LGTM, I will test to ensure backwards compatibility, thanks!

cmunley1 · 2026-03-21T05:11:31Z

Backwards compatibility seems fine:

ng_collect_rollouts     +agent_name=ether0_simple_agent     +input_jsonl_fpath=resources_servers/ether0/data/example.jsonl     +output_jsonl_fpath=resources_servers/ether0/data/ether0_rollouts.jsonl +limit=10
Limiting the number of rows to 10
Using `ether0_simple_agent` for rows that do not already have an agent ref
Repeating rows 1 times (in a pattern of abc to aabbcc)!
Reading rows: 5it [00:00, 45294.86it/s]
Clearing output fpath since `resume_from_cache=False`!
INFO:     127.0.0.1:16296 - "GET /global_config_dict_yaml HTTP/1.1" 200 OK
Collecting rollouts:   0%|                                                                                                | 0/5 [00:00<?, ?it/s](APIServer pid=3829230) INFO 03-20 22:10:32 [loggers.py:259] Engine 000: Avg prompt throughput: 54.5 tokens/s, Avg generation throughput: 37.3 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.6%, Prefix cache hit rate: 23.7%
(APIServer pid=3829230) INFO:     127.0.0.1:23906 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 4
(APIServer pid=3829230) INFO:     127.0.0.1:23918 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 3
(APIServer pid=3829230) INFO:     127.0.0.1:23878 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 2
(APIServer pid=3829230) INFO:     127.0.0.1:23862 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 1
(APIServer pid=3829230) INFO 03-20 22:10:42 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 189.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.4%, Prefix cache hit rate: 23.7%
(APIServer pid=3829230) INFO 03-20 22:10:52 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 57.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.8%, Prefix cache hit rate: 23.7%
(APIServer pid=3829230) INFO:     127.0.0.1:23892 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Collecting rollouts: 100%|████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:23<00:00,  4.78s/it]
Sorting results to ensure consistent ordering
Computing aggregate metrics
INFO:     127.0.0.1:12654 - "GET /global_config_dict_yaml HTTP/1.1" 200 OK

Key metrics for ether0_simple_agent:
{
    "mean/reward": 0.8,
    "mean/input_tokens": 294.8,
    "mean/output_tokens": 1173.2,
    "mean/total_tokens": 1468.0
}
Finished rollout collection! View results at:
Fully materialized inputs: resources_servers/ether0/data/ether0_rollouts_materialized_inputs.jsonl
Rollouts: resources_servers/ether0/data/ether0_rollouts.jsonl
Aggregate metrics: resources_servers/ether0/data/ether0_rollouts_aggregate_metrics.json

cmunley1 · 2026-03-21T05:12:55Z

aai also lgtm

ng_collect_rollouts     +agent_name=ether0_simple_agent     +input_jsonl_fpath=train_curriculum_10k_aai_prompts.jsonl     +output_jsonl_fpath=resources_servers/ether0/data/ether0_rollouts.jsonl +limit=10
Limiting the number of rows to 10
Using `ether0_simple_agent` for rows that do not already have an agent ref
Repeating rows 1 times (in a pattern of abc to aabbcc)!
Reading rows: 9it [00:00, 26141.78it/s]
Clearing output fpath since `resume_from_cache=False`!
INFO:     127.0.0.1:15516 - "GET /global_config_dict_yaml HTTP/1.1" 200 OK
Collecting rollouts:   0%|                                                                                               | 0/10 [00:00<?, ?it/s](APIServer pid=3829230) INFO 03-20 22:12:22 [loggers.py:259] Engine 000: Avg prompt throughput: 121.2 tokens/s, Avg generation throughput: 164.1 tokens/s, Running: 10 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.8%, Prefix cache hit rate: 19.1%
(APIServer pid=3829230) INFO:     127.0.0.1:20756 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=3829230) INFO:     127.0.0.1:20732 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 8
(APIServer pid=3829230) INFO:     127.0.0.1:20684 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=3829230) INFO:     127.0.0.1:20696 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 6
(APIServer pid=3829230) INFO:     127.0.0.1:20710 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=3829230) INFO:     127.0.0.1:20678 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 4
(APIServer pid=3829230) INFO:     127.0.0.1:20714 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=3829230) INFO:     127.0.0.1:20664 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 2
(APIServer pid=3829230) INFO 03-20 22:12:32 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 337.0 tokens/s, Running: 2 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.0%, Prefix cache hit rate: 19.1%
(APIServer pid=3829230) INFO:     127.0.0.1:20746 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 1
(APIServer pid=3829230) INFO:     127.0.0.1:20726 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Collecting rollouts: 100%|██████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:17<00:00,  1.72s/it]
Sorting results to ensure consistent ordering
Computing aggregate metrics
INFO:     127.0.0.1:64150 - "GET /global_config_dict_yaml HTTP/1.1" 200 OK

Key metrics for ether0_simple_agent:
{
    "mean/reward": 0.8,
    "mean/input_tokens": 290.4,
    "mean/output_tokens": 1043.2,
    "mean/total_tokens": 1333.6
}
Finished rollout collection! View results at:
Fully materialized inputs: resources_servers/ether0/data/ether0_rollouts_materialized_inputs.jsonl
Rollouts: resources_servers/ether0/data/ether0_rollouts.jsonl
Aggregate metrics: resources_servers/ether0/data/ether0_rollouts_aggregate_metrics.json

Signed-off-by: cmunley1 <cmunley@nvidia.com>

Merge branch 'main' into mnovikov/ether0-multi-format-extraction-v2

d65c942

cmunley1 self-requested a review March 21, 2026 05:13

cmunley1 previously approved these changes Mar 21, 2026

cmunley1 added 2 commits March 21, 2026 00:02

aai choices

349d608

Signed-off-by: cmunley1 <cmunley@nvidia.com>

lint

f958609

Signed-off-by: cmunley1 <cmunley@nvidia.com>

jubick1337 dismissed cmunley1’s stale review via f958609 March 21, 2026 07:02

cmunley1 previously approved these changes Mar 21, 2026

cmunley1 added 3 commits March 26, 2026 12:05

Merge branch 'main' into mnovikov/ether0-multi-format-extraction-v2

bdd2393

add more tests

affa91c

Signed-off-by: cmunley1 <cmunley@nvidia.com>

ruff

e6fdc0d

Signed-off-by: cmunley1 <cmunley@nvidia.com>

cmunley1 dismissed their stale review via e6fdc0d March 26, 2026 19:57

bxyu-nvidia approved these changes Mar 26, 2026

cmunley1 merged commit aca89a9 into main Mar 26, 2026
5 of 6 checks passed

cmunley1 merged 7 commits into main from mnovikov/ether0-multi-format-extraction-v2

3 participants
