feat (OpenQA): Add OpenQA support with per-record regex and rescue features #155

Merged: psgundecha-nv merged 7 commits into main from psgundecha/rl-templates-openqa on Oct 21, 2025

@psgundecha-nv
Contributor

  • Add per-record regex extraction from template_metadata.output_regex
  • Add full generation rescue when regex extraction fails (partial credit)
  • Add length-based threshold to skip regex for long answers (>120 chars)
  • Add 3 new tests covering all new features (7/7 passing)
  • Add example_openqa.jsonl with 5 diverse examples + rollouts + metrics
  • Update README with new config fields and accurate defaults
  • Optimize defaults for OpenQA while maintaining backward compatibility

All features only activate when template_metadata.output_regex is present,
making them safe for existing datasets without template_metadata.

Signed-off-by: Pritam Gundecha <pgundecha@nvidia.com>
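
The extraction behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `extract_answer`, the constant `MAX_REGEX_ANSWER_LEN`, and the choice to measure the length of `expected_answer` (rather than the generation) are all assumptions made for the example. Only `template_metadata.output_regex` and the 120-character threshold come from the description.

```python
import re
from typing import Optional

# Threshold taken from the PR description; the constant name is hypothetical.
MAX_REGEX_ANSWER_LEN = 120


def extract_answer(generation: str, record: dict) -> Optional[str]:
    """Return the answer captured by the record's own regex, or None.

    None means "no per-record extraction happened", so a caller can fall
    back to whatever it did before this feature existed.
    """
    metadata = record.get("template_metadata") or {}
    pattern = metadata.get("output_regex")
    if not pattern:
        # No template_metadata.output_regex: the feature stays inactive.
        return None

    # Assumption: the length check applies to the expected answer.
    if len(record.get("expected_answer", "")) > MAX_REGEX_ANSWER_LEN:
        return None

    matches = list(re.finditer(pattern, generation))
    if not matches:
        return None

    # Use the last match, on the assumption that the final answer comes last.
    last = matches[-1]
    captured = last.group(1) if last.re.groups else last.group(0)
    return captured.strip() if captured is not None else None
```

With a record whose `output_regex` is `\(\((.*?)\)\)`, calling `extract_answer("The result is ((42)).", record)` returns `"42"`, while a record without `template_metadata` returns `None`.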
- Add check_full_generation_on_fail config to compare expected vs full generation when first pass fails
- Add reward_if_full_generation_succeeds config for partial credit (default 0.5)
- Implement fallback logic: use full generation when use_per_record_regex=true, swap otherwise
- Add length threshold optimization to skip swap check for very long answers
- Maintain 100% backward compatibility with existing behavior
- All existing tests pass

Signed-off-by: Pritam Gundecha <pgundecha@nvidia.com>
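
A rough sketch of the rescue path for the `use_per_record_regex=true` case is shown below. The two config field names and the 0.5 default are taken from the commit message above; everything else is an assumption for illustration, including the `RescueConfig` class, the `score` function, the default of `check_full_generation_on_fail`, and the `is_equivalent` callable, which stands in for whatever comparison the verifier actually performs. The swap branch used when `use_per_record_regex` is false is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RescueConfig:
    # Field names from the commit message; the True default is an assumption.
    check_full_generation_on_fail: bool = True
    # Partial credit awarded when only the full generation matches.
    reward_if_full_generation_succeeds: float = 0.5


def score(
    extracted: Optional[str],
    full_generation: str,
    expected: str,
    cfg: RescueConfig,
    is_equivalent: Callable[[str, str], bool],
) -> float:
    """Full reward for a correct extracted answer, partial reward on rescue."""
    # First pass: compare the regex-extracted answer with the expected answer.
    if extracted is not None and is_equivalent(extracted, expected):
        return 1.0

    # Rescue: compare the expected answer against the whole generation.
    if cfg.check_full_generation_on_fail and is_equivalent(full_generation, expected):
        return cfg.reward_if_full_generation_succeeds

    return 0.0
```

For example, with a simple containment check such as `lambda candidate, expected: expected in candidate`, a generation that states the right answer without the required delimiters scores 0.5 instead of 0.0.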
@copy-pr-bot

copy-pr-bot bot commented Oct 14, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@psgundecha-nv force-pushed the psgundecha/rl-templates-openqa branch 2 times, most recently from ad6f55f to ec29578 (October 14, 2025 17:38)
@psgundecha-nv changed the title from "feat: add OpenQA support with per-record regex and rescue features" to "feat (OpenQA): Add OpenQA support with per-record regex and rescue features" (Oct 14, 2025)
@banghuaz-nvidia
Contributor

Do we already have a ready-to-train dataset on HF? If so, can we add the dataset link and instructions to both config.yaml and the README? This applies to both MCQA and OpenQA.

@@ -0,0 +1,5 @@
{"responses_create_params": {"input": [{"role": "user", "content": "Your final answer (and only the answer) must be enclosed in double parentheses. Solve the problem and include necessary explanations.\n\nConsider\n\\[\n\\frac{dx}{dt}=-x-e^{-rx}+\\sqrt{2}\\alpha+\\frac{3}{2},\n\\]\nwhere \\(r>0\\).\n\nWhat is the nearest bifurcation point to 0?"}]}, "expected_answer": "\\[-\\frac{\\sqrt{2}}{4}\\]", "uuid": "6279204c-9a94-5755-96be-5c313796d3b0", "reward_profiles": [{"model_hf_path": "Qwen/Qwen3-30B-A3B", "num_generations": 3, "pass_rate": 1.0}], "template_metadata": {"template_id": "openqa_generated_153", "template_prompt": "Your final answer (and only the answer) must be enclosed in double parentheses. Solve the problem and include necessary explanations.\n\n{problem}", "output_regex": "\\(\\((.*?)\\)\\)", "weight": 0.004310344827586207, "prompt_type": "generated", "format_type": "openqa"}}
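
To make the record above concrete, here is a small sketch that parses a trimmed copy of it and applies its `output_regex` to a model response. The `generation` string is invented for illustration, and the record is cut down to the two fields the example needs; how the verifier itself loads and applies the pattern is not shown in this thread.

```python
import json
import re

# Trimmed copy of the record above, keeping only the fields used here.
line = r'''{"expected_answer": "\\[-\\frac{\\sqrt{2}}{4}\\]", "template_metadata": {"output_regex": "\\(\\((.*?)\\)\\)", "format_type": "openqa"}}'''

record = json.loads(line)
pattern = record["template_metadata"]["output_regex"]

# Hypothetical model output that follows the template's formatting instruction.
generation = r"Final answer: ((\[-\frac{\sqrt{2}}{4}\]))"

match = re.search(pattern, generation)
extracted = match.group(1) if match else None

print(pattern)
print(extracted)
```

After JSON decoding, the pattern is `\(\((.*?)\)\)`, which captures the text between the double parentheses, so `extracted` equals the record's `expected_answer` for this response.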

What is the weight here?

@banghuaz-nvidia left a comment

Synced with Pritam on the questions there. Can confirm this works fine and is backward compatible.

@psgundecha-nv force-pushed the psgundecha/rl-templates-openqa branch from df8fc0f to 9d4cbcf (October 21, 2025 21:05)
@psgundecha-nv merged commit eb676a5 into main (Oct 21, 2025)
5 checks passed
@psgundecha-nv deleted the psgundecha/rl-templates-openqa branch (October 21, 2025 21:17)
abubakaria56 pushed a commit to abubakaria56/Gym that referenced this pull request Mar 2, 2026
…atures (NVIDIA-NeMo#155)