
Add the RDKit-Chemistry RL Environment #921

Closed
danecor wants to merge 31 commits into main from dane/rdkit-chemistry



danecor commented Mar 20, 2026

Implements a Nemo-Gym resources server for verifiable chemistry question answering, with and without tools (Python + RDKit). The agent receives a natural-language chemistry question and a SMILES molecule and must respond with a number or a binary flag. The reward signal is generated by RDKit code that calculates deterministic numerical or boolean properties of the SMILES molecule: exact match for integer, boolean, and fragment properties, and 1/(1 + e) for continuous float properties.
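A minimal sketch of the reward dispatch described above. It assumes that `e` in 1/(1 + e) is the absolute error |predicted - actual|, which the PR does not spell out, and the property-type labels and the function name `compute_reward_sketch` are hypothetical; the server's actual `compute_reward()` may be structured differently.

```python
from typing import Union

Value = Union[int, float, bool, str]

# Hypothetical property-type labels; the server's real type names may differ.
EXACT_MATCH_TYPES = {"int", "bool", "fragment"}


def compute_reward_sketch(property_type: str, predicted: Value, actual: Value) -> float:
    """Exact match for discrete properties; 1/(1 + e) for continuous floats.

    Assumption: e is the absolute error |predicted - actual|.
    """
    if property_type in EXACT_MATCH_TYPES:
        # Integer, boolean, and fragment-count properties must match exactly.
        return 1.0 if predicted == actual else 0.0
    if property_type == "float":
        error = abs(float(predicted) - float(actual))
        # A perfect prediction scores 1.0; the reward decays toward 0 as error grows.
        return 1.0 / (1.0 + error)
    raise ValueError(f"unknown property type: {property_type!r}")


if __name__ == "__main__":
    print(compute_reward_sketch("int", 3, 3))         # exact match
    print(compute_reward_sketch("float", 2.5, 2.0))   # error 0.5 -> 1/1.5
```

Under this assumption the float reward stays bounded in (0, 1], on the same scale as the exact-match rewards, unlike the unbounded -|predicted - actual| variant from the earlier commit.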

In the no-tools scenario, the goal is to train the model to reason about molecular properties directly and improve its basic chemistry skills. In the tools scenario, the goal is to train the model to use RDKit to evaluate molecular properties when it is available.

danecor and others added 20 commits March 13, 2026 15:22
Implements a Nemo-Gym resources server for verifiable chemistry question
answering (direct generation variant).  The agent receives a natural-language
chemistry question + SMILES and must respond with a single number or binary
flag.  The reward signal mirrors the offline benchmarking pipeline:
exact-match for integer/bool/fragment properties; percentile-based threshold
(must beat 95% of the ChEMBL prior) for continuous float properties.

Data generated from the full_strat025_perm benchmark experiment via
scripts/export_nemo_gym_data.py and scripts/precompute_reward_stats.py in
chemistry-benchmarking-for-lead-op (branch: dane/nemo-gym-rl-port).

Made-with: Cursor
Replace the percentile-based binary (0/1) float reward with a continuous
negative absolute error: reward = -|predicted - actual|. A perfect prediction
scores 0.0; larger errors give more negative rewards. This removes the
dependency on reward_stats.json and features.parquet quantile precomputation.

- app.py: remove quantile machinery (numpy, FLOAT_ACCURACY_THRESHOLD,
  _float_reward, _load_reward_stats, ChemistryDirectConfig.reward_stats_path);
  simplify compute_reward() to three-line dispatch; add absolute_error field
  to ChemistryDirectVerifyResponse
- tests/test_app.py: replace quantile tests with MAE-based float reward tests
- data/reward_stats.json: deleted (no longer needed)
- README.md: update reward table and data generation instructions

Made-with: Cursor
This config should not be part of the rdkit_chemistry MR.
Keeping locally for testing only.

Made-with: Cursor
Add ns_tools wrapper config for sandboxed Python/RDKit execution,
rename agent to rdkit_chemistry_agent, and rewrite README to document
both direct and mcp-python methods.

Made-with: Cursor
Replace example rows with new ChEMBL molecules covering all five property
types for both direct and mcp-python methods. Add use_box_format boolean
flag to each row; rows with boxed prompts set it to true.

Made-with: Cursor
Support use_box_format flag on dataset rows: when true, extract the
predicted value exclusively from \boxed{...} expressions (returning
reward 0 if absent). When false, the existing permissive cascade is
used unchanged.
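A sketch of the `use_box_format` behavior described above, assuming the answer lives in the last `\boxed{...}` of the response. The brace-depth scan handles nested braces such as `\boxed{\frac{1}{2}}`. The non-boxed branch is a hypothetical stand-in (boxed answer first, then the last number in the text); the server's real "permissive cascade" is not shown in the PR, and the function names here are illustrative.

```python
import re
from typing import Optional

# Matches signed integers, decimals, and scientific notation.
_NUMBER = re.compile(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?")


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last boxed expression, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1
    for j in range(begin, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:j].strip()
    return None  # unbalanced braces: treat as absent


def extract_prediction(text: str, use_box_format: bool) -> Optional[str]:
    if use_box_format:
        # Boxed rows: only the boxed expression counts; None means reward 0.
        return extract_boxed(text)
    # Hypothetical permissive fallback: prefer a boxed answer, else the last number.
    boxed = extract_boxed(text)
    if boxed is not None:
        return boxed
    matches = _NUMBER.findall(text)
    return matches[-1] if matches else None
```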

Also add compute_metrics() for per-method/property-type breakdowns
and update get_key_metrics() to surface method-level stats.

Made-with: Cursor
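The per-method and per-property-type breakdown above can be sketched as a grouped mean. This assumes each rollout result carries `method`, `property_type`, and `reward` keys; the function name `summarize_rewards` and the `/`-separated metric keys are hypothetical, since the PR does not show the real `compute_metrics()` signature or output format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def summarize_rewards(results: Iterable[Mapping]) -> Dict[str, float]:
    """Mean reward overall, per method, and per (method, property_type)."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for r in results:
        keys = (
            "mean_reward",
            f"{r['method']}/mean_reward",
            f"{r['method']}/{r['property_type']}/mean_reward",
        )
        # Each result contributes to the overall, method, and method/type buckets.
        for key in keys:
            sums[key] += r["reward"]
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    rows = [
        {"method": "direct", "property_type": "float", "reward": 0.5},
        {"method": "direct", "property_type": "int", "reward": 1.0},
        {"method": "mcp-python", "property_type": "int", "reward": 0.0},
    ]
    for name, value in sorted(summarize_rewards(rows).items()):
        print(name, value)
```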
…erver

Introduce sandbox_launcher.py that starts a local nemo_skills sandbox
process directly from setup_webserver(), removing the need for external
sandbox orchestration in sbatch/interactive scripts.  Configured via
SANDBOX_VENV_PATH and SANDBOX_DISCOVERY_PATH env vars, with fallback
to no-op when unset.

Made-with: Cursor
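A sketch of the env-var gating with a no-op fallback, assuming the launcher does nothing unless both `SANDBOX_VENV_PATH` and `SANDBOX_DISCOVERY_PATH` are set. The actual command that `sandbox_launcher.py` runs is not shown in the PR, so the caller-supplied `build_command` and the function name `launch_sandbox_if_configured` are hypothetical.

```python
import os
import subprocess
from typing import Callable, List, Optional


def launch_sandbox_if_configured(
    build_command: Callable[[str, str], List[str]],
) -> Optional[subprocess.Popen]:
    """Start a sandbox subprocess only when both env vars are set; otherwise no-op."""
    venv_path = os.environ.get("SANDBOX_VENV_PATH")
    discovery_path = os.environ.get("SANDBOX_DISCOVERY_PATH")
    if not venv_path or not discovery_path:
        # Fallback: nothing configured, so leave sandbox orchestration to the caller.
        return None
    # Hypothetical: the caller decides how the venv and discovery path map to a command.
    return subprocess.Popen(build_command(venv_path, discovery_path))
```

Keeping the launch inside the server's startup path (rather than in sbatch scripts) means the same job script works whether or not a sandbox is configured.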
…chemistry'

Update to RDKit reward function (optional)

See merge request bxyu/nemo-gym!240
copy-pr-bot (bot) commented Mar 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

danecor and others added 7 commits March 23, 2026 16:45
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Restore the default request timeout and remove the ad hoc retry path so rollout collection matches the mainline nemo_gym behavior again.

Signed-off-by: Dane Corneil <dane.corneil@gretel.ai>
danecor and others added 3 commits March 23, 2026 20:16
This reverts commit 8ceee69, reversing
changes made to 8980bba.
Signed-off-by: Brian Yu <bxyu@nvidia.com>
(cherry picked from commit 63c184e)
Record the RDKit Nano training workflow changes from the submodule so this branch points at the EOS training scripts and data-preparation flow.

Signed-off-by: Dane Corneil <dane.corneil@gretel.ai>
danecor requested a review from a team as a code owner March 24, 2026 00:18
danecor closed this Mar 24, 2026

Labels

None yet

Projects

None yet


4 participants