Add the RDKit-Chemistry RL Environment by danecor · Pull Request #921 · NVIDIA-NeMo/Gym

danecor · 2026-03-20T20:49:20Z

Implements a Nemo-Gym resources server for verifiable chemistry question answering, with and without tools (python + rdkit). The agent receives a natural-language chemistry question + SMILES molecule and must respond with a number or binary flag. The reward signal is generated from rdkit code that calculates deterministic numerical or boolean properties of SMILES: exact-match for integer/bool/fragment properties; 1/(1 + e) for continuous float properties.

In the no-tools scenario, the goal is to train the model to reason about molecular properties directly and improve basic chemistry skills. In the tools condition, the goal is to train the model to use rdkit to evaluate molecular properties when it is available.
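As a rough illustration of the reward rule described above, here is a minimal sketch. It assumes that `e` in `1/(1 + e)` denotes the absolute error between the predicted and reference values; the function name and the `is_continuous` flag are hypothetical and are not taken from `app.py`.

```python
def sketch_reward(predicted, actual, is_continuous: bool) -> float:
    """Hypothetical reward sketch; not the PR's actual implementation."""
    if not is_continuous:
        # Integer / bool / fragment properties: exact match scores 1, anything else 0.
        return 1.0 if predicted == actual else 0.0
    # Continuous float properties: assumed e = |predicted - actual|,
    # so a perfect answer scores 1.0 and the reward decays toward 0 as the error grows.
    error = abs(float(predicted) - float(actual))
    return 1.0 / (1.0 + error)
```

Under this reading the float reward stays in (0, 1], which keeps it on the same scale as the exact-match reward.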

Implements a Nemo-Gym resources server for verifiable chemistry question answering (direct generation variant). The agent receives a natural-language chemistry question + SMILES and must respond with a single number or binary flag.

The reward signal mirrors the offline benchmarking pipeline: exact-match for integer/bool/fragment properties; percentile-based threshold (must beat 95% of the ChEMBL prior) for continuous float properties.

Data generated from the full_strat025_perm benchmark experiment via scripts/export_nemo_gym_data.py and scripts/precompute_reward_stats.py in chemistry-benchmarking-for-lead-op (branch: dane/nemo-gym-rl-port).

Made-with: Cursor

Replace the percentile-based binary (0/1) float reward with a continuous negative absolute error: reward = -|predicted - actual|. A perfect prediction scores 0.0; larger errors give more negative rewards. This removes the dependency on reward_stats.json and features.parquet quantile precomputation.

- app.py: remove quantile machinery (numpy, FLOAT_ACCURACY_THRESHOLD, _float_reward, _load_reward_stats, ChemistryDirectConfig.reward_stats_path); simplify compute_reward() to three-line dispatch; add absolute_error field to ChemistryDirectVerifyResponse
- tests/test_app.py: replace quantile tests with MAE-based float reward tests
- data/reward_stats.json: deleted (no longer needed)
- README.md: update reward table and data generation instructions

Made-with: Cursor
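The dispatch described in this commit could look roughly like the sketch below. The function name, the `property_type` strings, and the returned tuple are illustrative assumptions; the actual `compute_reward()` signature and the shape of `ChemistryDirectVerifyResponse` are not shown in this PR page.

```python
from typing import Optional, Tuple


def sketch_compute_reward(
    predicted: float, actual: float, property_type: str
) -> Tuple[float, Optional[float]]:
    """Hypothetical dispatch: returns (reward, absolute_error)."""
    if property_type == "float":
        # Continuous properties: reward = -|predicted - actual|.
        absolute_error = abs(predicted - actual)
        return -absolute_error, absolute_error
    # Integer / bool / fragment properties: exact match, no error reported.
    return (1.0 if predicted == actual else 0.0), None
```

Because the float reward is unbounded below, a single wildly wrong prediction can dominate a batch; whether to clip or rescale it is a design choice this sketch leaves open.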


Support use_box_format flag on dataset rows: when true, extract the predicted value exclusively from \boxed{...} expressions (returning reward 0 if absent). When false, the existing permissive cascade is used unchanged. Also add compute_metrics() for per-method/property-type breakdowns and update get_key_metrics() to surface method-level stats. Made-with: Cursor
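The strict-versus-permissive behaviour described here can be sketched as follows. The regular expressions, the function name, and the "last number in the text" fallback are assumptions standing in for the existing permissive cascade, which this page does not show.

```python
import re
from typing import Optional

# Matches \boxed{...} with no nested braces (a simplifying assumption).
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")
# Matches plain integers, decimals, and scientific notation.
NUMBER_RE = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def sketch_extract_answer(text: str, use_box_format: bool) -> Optional[str]:
    """Hypothetical extraction; None means no answer found (reward 0)."""
    if use_box_format:
        # Strict mode: only \boxed{...} counts; take the last occurrence.
        boxed = BOXED_RE.findall(text)
        return boxed[-1].strip() if boxed else None
    # Permissive stand-in: fall back to the last number anywhere in the text.
    numbers = NUMBER_RE.findall(text)
    return numbers[-1] if numbers else None
```

In strict mode a response that states the right value without boxing it returns `None`, which matches the "reward 0 if absent" rule in the commit message.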

…erver Introduce sandbox_launcher.py that starts a local nemo_skills sandbox process directly from setup_webserver(), removing the need for external sandbox orchestration in sbatch/interactive scripts. Configured via SANDBOX_VENV_PATH and SANDBOX_DISCOVERY_PATH env vars, with fallback to no-op when unset. Made-with: Cursor
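The "fallback to no-op when unset" behaviour might be structured like this sketch. Only the two environment variable names come from the commit message; the function name, the entry-point module, the `--discovery-path` flag, and the assumed venv layout (`bin/python`) are hypothetical placeholders and do not reflect the real `sandbox_launcher.py` or any nemo_skills CLI.

```python
import os
import subprocess
from typing import Mapping, Optional


def sketch_maybe_launch_sandbox(
    env: Mapping[str, str] = os.environ,
) -> Optional[subprocess.Popen]:
    """Hypothetical launcher: starts a sandbox only when both env vars are set."""
    venv_path = env.get("SANDBOX_VENV_PATH")
    discovery_path = env.get("SANDBOX_DISCOVERY_PATH")
    if not venv_path or not discovery_path:
        # No-op fallback: leave sandbox orchestration to the caller.
        return None
    python_bin = os.path.join(venv_path, "bin", "python")
    # Placeholder command; the real entry point and arguments are not shown in the PR page.
    command = [
        python_bin,
        "-m",
        "hypothetical_sandbox_entrypoint",
        "--discovery-path",
        discovery_path,
    ]
    return subprocess.Popen(command)
```

Returning `None` rather than raising keeps setups that already run an external sandbox working unchanged.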


copy-pr-bot · 2026-03-20T20:49:24Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


danecor and others added 20 commits March 13, 2026 15:22

Rename chemistry_direct → rdkit_chemistry, branch → dane/rdkit-chemistry

616abcf

Made-with: Cursor

Shorten license.

11d8b95

Config for gpt oss 20b low.

bf7af8b

Updates to include mcp tools.

6efa376

Remove gpt-oss-20b-reasoning-low.yaml from branch

9139a43

This config should not be part of the rdkit_chemistry MR. Keeping locally for testing only. Made-with: Cursor

Update rdkit_chemistry config and README for mcp-python tool-use support

9c678e3

Add ns_tools wrapper config for sandboxed Python/RDKit execution, rename agent to rdkit_chemistry_agent, and rewrite README to document both direct and mcp-python methods. Made-with: Cursor

Update example data with new samples and use_box_format field

1ad524d

Replace example rows with new ChEMBL molecules covering all five property types for both direct and mcp-python methods. Add use_box_format boolean flag to each row; rows with boxed prompts set it to true. Made-with: Cursor

New prompt format, strict answer extraction.

41eb022

Updates example jsonl.

401d964

Update examples.

81e2d98

Parse both content and tool calls correctly.

f829caf

Update rdkit-chemistry-gym submodule to array-based rollout workflow

c73befc

Made-with: Cursor

Update to RDKit reward function (optional)

adac43b

Merge branch 'michelle/rdkit-chemistry_new_rewards' into 'dane/rdkit-…

dbb1c60

…chemistry' Update to RDKit reward function (optional) See merge request bxyu/nemo-gym!240

Revert defensive behavior in app.py

d53370a

Revert app.py

f54cdab

New example file.

100d7e8

jubick1337 reviewed Mar 21, 2026


resources_servers/rdkit_chemistry/app.py

danecor and others added 7 commits March 23, 2026 16:45

Merge branch 'dane/rdkit-chemistry' of https://gitlab-master.nvidia.c…

315d85a

…om/bxyu/nemo-gym into dane/rdkit-chemistry

Fix closing parentheses.

44c6ab7

Temp: retry-then-continue logic in core nemo-gym code.

1c6b6b8

try fix duplicated usage counting and

63c184e

Signed-off-by: Brian Yu <bxyu@nvidia.com>

empty commit for qa

3b63617

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Revert rollout retry experiment.

8980bba

Restore the default request timeout and remove the ad hoc retry path so rollout collection matches the mainline nemo_gym behavior again. Signed-off-by: Dane Corneil <dane.corneil@gretel.ai>

Merge branch 'bxyu/fix-938' into dane/rdkit-chemistry

8ceee69

danecor and others added 3 commits March 23, 2026 20:16

Revert "Merge branch 'bxyu/fix-938' into dane/rdkit-chemistry"

c2b3343

This reverts commit 8ceee69, reversing changes made to 8980bba.

try fix duplicated usage counting and

3b46072

Signed-off-by: Brian Yu <bxyu@nvidia.com> (cherry picked from commit 63c184e)

Update rdkit-chemistry-gym submodule.

aae3072

Record the RDKit Nano training workflow changes from the submodule so this branch points at the EOS training scripts and data-preparation flow. Signed-off-by: Dane Corneil <dane.corneil@gretel.ai>

danecor requested a review from a team as a code owner March 24, 2026 00:18

danecor closed this Mar 24, 2026

danecor mentioned this pull request Mar 24, 2026

Add the RDKit-Chemistry RL Environment #940

Closed

3 tasks

danecor wants to merge 31 commits into main from dane/rdkit-chemistry


4 participants
