
feat(ether0): Add boxed and Answer: LETTER extraction fallbacks#925

Merged
cmunley1 merged 7 commits into main from mnovikov/ether0-multi-format-extraction-v2 on Mar 26, 2026


@jubick1337
Contributor


Add multi-format answer extraction to the ether0 verifier. When the
original <answer> tag extraction fails, try \boxed{} and Answer: LETTER
formats as fallbacks. This enables using ether0 data with standard
GPQA/MCQ prompt formats.
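A minimal sketch of the fallback chain described above, for readers who have not opened the diff. The regexes, the `extract_answer`/`extract_boxed` names, the "last match wins" rule, and the order (`<answer>` tags, then `\boxed{}`, then `Answer: LETTER`) are assumptions for illustration, not the code merged in this PR:

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; the regexes actually used in the
# ether0 verifier are not shown on this page.
ANSWER_TAG_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
# "Answer: B", "Answer: (B)" or "Answer: B." alone on a line (GPQA/MCQ style).
ANSWER_LETTER_RE = re.compile(r"^\s*Answer:\s*\(?([A-Z])\)?\.?\s*$", re.MULTILINE)

BOXED_PREFIX = "\\boxed{"


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, allowing nested braces."""
    start = text.rfind(BOXED_PREFIX)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(BOXED_PREFIX):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never closed


def extract_answer(text: str) -> Optional[str]:
    """Try <answer> tags first, then \\boxed{}, then 'Answer: LETTER'."""
    # 1. Original format: checking it first leaves tagged responses unchanged.
    tagged = ANSWER_TAG_RE.findall(text)
    if tagged:
        return tagged[-1].strip()
    # 2. Fallback: LaTeX \boxed{...}.
    boxed = extract_boxed(text)
    if boxed is not None:
        return boxed
    # 3. Fallback: a multiple-choice letter after "Answer:".
    letters = ANSWER_LETTER_RE.findall(text)
    if letters:
        return letters[-1]
    return None  # nothing extractable
```

Trying the original `<answer>` format first is one way to keep previously tagged responses scored exactly as before, which is the backwards-compatibility property checked later in this thread. A brace-counting scan is used for `\boxed{}` because a plain regex would stop at the first `}` in answers such as `C_{6}H_{6}`.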

Tested with Nano v3 on the 10K curriculum: the answer extraction rate
improved from 15% (answer tags only) to 50% (with fallbacks), and the
pass rate from 2.4% to 7.2%.
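A quick sanity check on the reported numbers, assuming (not stated in the PR) that a response can only pass when an answer is extracted, so that P(pass) = P(extracted) × P(pass | extracted):

```python
# Rates reported in the PR description (Nano v3, 10K curriculum).
EXTRACTION_BEFORE, EXTRACTION_AFTER = 0.15, 0.50
PASS_BEFORE, PASS_AFTER = 0.024, 0.072


def pass_given_extracted(pass_rate: float, extraction_rate: float) -> float:
    """Accuracy among responses whose answer was extracted.

    Assumes a response can only pass if an answer was extracted, so
    P(pass) = P(extracted) * P(pass | extracted).
    """
    return pass_rate / extraction_rate


print(f"extraction gain: {EXTRACTION_AFTER / EXTRACTION_BEFORE:.2f}x")  # 3.33x
print(f"pass-rate gain:  {PASS_AFTER / PASS_BEFORE:.2f}x")  # 3.00x
print(f"pass | extracted, before: {pass_given_extracted(PASS_BEFORE, EXTRACTION_BEFORE):.1%}")  # 16.0%
print(f"pass | extracted, after:  {pass_given_extracted(PASS_AFTER, EXTRACTION_AFTER):.1%}")  # 14.4%
```

Under that assumption, accuracy on extracted answers stays roughly flat (about 16% vs. 14.4%), which suggests the higher pass rate comes mainly from recovering answers that were previously unparseable rather than from the model answering more accurately.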

Signed-off-by: mnovikov <mnovikov@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Mar 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cmunley1
Contributor

LGTM, I will test to ensure backwards compatibility, thanks!

@cmunley1
Contributor

backwards compatible seems fine

ng_collect_rollouts     +agent_name=ether0_simple_agent     +input_jsonl_fpath=resources_servers/ether0/data/example.jsonl     +output_jsonl_fpath=resources_servers/ether0/data/ether0_rollouts.jsonl +limit=10
Limiting the number of rows to 10
Using `ether0_simple_agent` for rows that do not already have an agent ref
Repeating rows 1 times (in a pattern of abc to aabbcc)!
Reading rows: 5it [00:00, 45294.86it/s]
Clearing output fpath since `resume_from_cache=False`!
INFO:     127.0.0.1:16296 - "GET /global_config_dict_yaml HTTP/1.1" 200 OK
Collecting rollouts:   0%|                                                                                                | 0/5 [00:00<?, ?it/s]
(APIServer pid=3829230) INFO 03-20 22:10:32 [loggers.py:259] Engine 000: Avg prompt throughput: 54.5 tokens/s, Avg generation throughput: 37.3 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.6%, Prefix cache hit rate: 23.7%
(APIServer pid=3829230) INFO:     127.0.0.1:23906 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 4
(APIServer pid=3829230) INFO:     127.0.0.1:23918 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 3
(APIServer pid=3829230) INFO:     127.0.0.1:23878 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 2
(APIServer pid=3829230) INFO:     127.0.0.1:23862 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 1
(APIServer pid=3829230) INFO 03-20 22:10:42 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 189.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.4%, Prefix cache hit rate: 23.7%
(APIServer pid=3829230) INFO 03-20 22:10:52 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 57.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.8%, Prefix cache hit rate: 23.7%
(APIServer pid=3829230) INFO:     127.0.0.1:23892 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Collecting rollouts: 100%|████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:23<00:00,  4.78s/it]
Sorting results to ensure consistent ordering
Computing aggregate metrics
INFO:     127.0.0.1:12654 - "GET /global_config_dict_yaml HTTP/1.1" 200 OK

Key metrics for ether0_simple_agent:
{
    "mean/reward": 0.8,
    "mean/input_tokens": 294.8,
    "mean/output_tokens": 1173.2,
    "mean/total_tokens": 1468.0
}
Finished rollout collection! View results at:
Fully materialized inputs: resources_servers/ether0/data/ether0_rollouts_materialized_inputs.jsonl
Rollouts: resources_servers/ether0/data/ether0_rollouts.jsonl
Aggregate metrics: resources_servers/ether0/data/ether0_rollouts_aggregate_metrics.json
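The `Key metrics` block reports per-rollout means. A minimal sketch of how such numbers could be recomputed from a rollouts JSONL file; the per-row keys `reward`, `input_tokens`, and `output_tokens` and the helper names are hypothetical, since the actual rollout schema is not shown on this page:

```python
import json
from statistics import mean
from typing import Dict, Iterable, List


def load_jsonl(path: str) -> List[dict]:
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def aggregate(rows: Iterable[dict]) -> Dict[str, float]:
    # Hypothetical per-row keys; the real rollout schema is not shown here.
    rows = list(rows)
    return {
        "mean/reward": mean(r["reward"] for r in rows),
        "mean/input_tokens": mean(r["input_tokens"] for r in rows),
        "mean/output_tokens": mean(r["output_tokens"] for r in rows),
        # Per-row total, so its mean equals the sum of the two means above.
        "mean/total_tokens": mean(r["input_tokens"] + r["output_tokens"] for r in rows),
    }


# Toy rows (not real data): five rollouts with 0/1 rewards, four correct.
rows = [{"reward": r, "input_tokens": 300, "output_tokens": 1000} for r in (1, 1, 1, 1, 0)]
print(aggregate(rows))
```

If rewards are 0/1, a `mean/reward` of 0.8 over the 5 rollouts above corresponds to 4 correct answers, and 294.8 + 1173.2 = 1468.0 matches the reported `mean/total_tokens`.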

@cmunley1
Contributor

aai also lgtm

ng_collect_rollouts     +agent_name=ether0_simple_agent     +input_jsonl_fpath=train_curriculum_10k_aai_prompts.jsonl     +output_jsonl_fpath=resources_servers/ether0/data/ether0_rollouts.jsonl +limit=10
Limiting the number of rows to 10
Using `ether0_simple_agent` for rows that do not already have an agent ref
Repeating rows 1 times (in a pattern of abc to aabbcc)!
Reading rows: 9it [00:00, 26141.78it/s]
Clearing output fpath since `resume_from_cache=False`!
INFO:     127.0.0.1:15516 - "GET /global_config_dict_yaml HTTP/1.1" 200 OK
Collecting rollouts:   0%|                                                                                               | 0/10 [00:00<?, ?it/s]
(APIServer pid=3829230) INFO 03-20 22:12:22 [loggers.py:259] Engine 000: Avg prompt throughput: 121.2 tokens/s, Avg generation throughput: 164.1 tokens/s, Running: 10 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.8%, Prefix cache hit rate: 19.1%
(APIServer pid=3829230) INFO:     127.0.0.1:20756 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=3829230) INFO:     127.0.0.1:20732 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 8
(APIServer pid=3829230) INFO:     127.0.0.1:20684 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=3829230) INFO:     127.0.0.1:20696 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 6
(APIServer pid=3829230) INFO:     127.0.0.1:20710 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=3829230) INFO:     127.0.0.1:20678 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 4
(APIServer pid=3829230) INFO:     127.0.0.1:20714 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=3829230) INFO:     127.0.0.1:20664 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 2
(APIServer pid=3829230) INFO 03-20 22:12:32 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 337.0 tokens/s, Running: 2 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.0%, Prefix cache hit rate: 19.1%
(APIServer pid=3829230) INFO:     127.0.0.1:20746 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Examples left:
1. ether0_simple_agent: 1
(APIServer pid=3829230) INFO:     127.0.0.1:20726 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Collecting rollouts: 100%|██████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:17<00:00,  1.72s/it]
Sorting results to ensure consistent ordering
Computing aggregate metrics
INFO:     127.0.0.1:64150 - "GET /global_config_dict_yaml HTTP/1.1" 200 OK

Key metrics for ether0_simple_agent:
{
    "mean/reward": 0.8,
    "mean/input_tokens": 290.4,
    "mean/output_tokens": 1043.2,
    "mean/total_tokens": 1333.6
}
Finished rollout collection! View results at:
Fully materialized inputs: resources_servers/ether0/data/ether0_rollouts_materialized_inputs.jsonl
Rollouts: resources_servers/ether0/data/ether0_rollouts.jsonl
Aggregate metrics: resources_servers/ether0/data/ether0_rollouts_aggregate_metrics.json

cmunley1 self-requested a review March 21, 2026 05:13
cmunley1 previously approved these changes Mar 21, 2026
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
cmunley1 previously approved these changes Mar 21, 2026
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
cmunley1 merged commit aca89a9 into main on Mar 26, 2026
5 of 6 checks passed
3 participants