README.md (18 additions, 18 deletions)
@@ -450,7 +450,7 @@ source .venv/bin/activate
pytest
```

At some point, you will want to actually add data that can be used to query your server. Please follow the instructions for [How To: Prepare and validate data for PR submission or RL training](#how-to-prepare-and-validate-data-for-pr-submission-or-rl-training).

If you need some dataset preprocessing or formatting scripts, please place them in your resources server directory, e.g. `resources_servers/simple_weather/my_preprocess_script.py`.
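As a rough sketch of what such a preprocessing script might look like (the function names are illustrative, not NeMo Gym API; the row shape follows the `responses_create_params` example dataset rows used elsewhere in this repo):

```python
import json

# Hypothetical preprocessing sketch (names are illustrative): wrap raw prompts
# into rows shaped like the example dataset rows used in this repo, i.e. a
# "responses_create_params" object holding a single user message.
def to_dataset_row(prompt: str) -> dict:
    return {
        "responses_create_params": {
            "input": [{"role": "user", "content": prompt}],
        }
    }

def write_jsonl(prompts, path):
    # One JSON object per line, the usual format for dataset files.
    with open(path, "w") as f:
        for prompt in prompts:
            f.write(json.dumps(to_dataset_row(prompt)) + "\n")
```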
@@ -501,7 +501,7 @@ Gitlab uses MLFlow to interface with its model artifact registry. You will need:
   2. The URI will look something like `https://gitlab-master.nvidia.com/api/v4/projects/191584/ml/mlflow/`
2. Your Gitlab token, which must have the `api` and `read_api` scopes.

Provide your MLFlow credentials in `env.yaml`:

```yaml
mlflow_tracking_uri: {your NeMo Gym Gitlab URI}
mlflow_tracking_token: {your Gitlab PAT}
```
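NeMo Gym consumes `env.yaml` itself. Purely as an illustration of how these two values relate to MLflow, here is a hedged sketch (the helper name and the flat key-value parsing are assumptions, not NeMo Gym API) that exports them as the standard environment variables the MLflow client reads:

```python
import os

# Illustrative only: NeMo Gym reads env.yaml itself. This hypothetical helper
# shows how the two credentials map onto the environment variables MLflow's
# client understands (MLFLOW_TRACKING_URI / MLFLOW_TRACKING_TOKEN).
def load_mlflow_env(path: str = "env.yaml") -> dict:
    creds = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and ":" in line:
                # Split on the first colon only, so URIs keep their "https://".
                key, _, value = line.partition(":")
                creds[key.strip()] = value.strip()
    os.environ["MLFLOW_TRACKING_URI"] = creds["mlflow_tracking_uri"]
    os.environ["MLFLOW_TRACKING_TOKEN"] = creds["mlflow_tracking_token"]
    return creds
```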
@@ -700,7 +700,7 @@ More often than not, the SHA256 displayed by Github (SHA256:xxxx) should be the

For developers who sign commits via SSH keys, this configuration lets VSCode source control sign commits properly:

```bash
git config gpg.format ssh
git config user.signingkey ~/.ssh/id_ed25519.pub
```
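A fuller one-time setup might look like the following (the key path is an example; `commit.gpgsign true` is an optional extra that signs every commit by default, so VSCode never needs a per-commit flag):

```shell
# Run inside the repository (or add --global to apply everywhere).
git config gpg.format ssh
git config user.signingkey ~/.ssh/id_ed25519.pub
# Optional: sign all commits by default.
git config commit.gpgsign true
```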
@@ -724,35 +724,35 @@ Tying back to NeMo Gym, NeMo Gym can be used to create synthetic data for SFT training

# FAQ: Why NeMo Gym?

NeMo Gym is a large-scale collection of high-quality verifier environments for multi-verifier RL training. To enable this, NeMo Gym provides infra support for the rollout server that runs 100+ verifiers in parallel.

The document below details why we designed NeMo Gym the way we did. It also includes a direct comparative study that clearly differentiates NeMo Gym from other environment frameworks.

\[Banghua\] As of Thu Aug 21:

1. NeMo Gym is completely different from any of the alternatives above in terms of data **coverage, quantity, and quality.** For math alone, NeMo Gym contains a 1M+ row, high-quality, verifiable math dataset curated by our internal team, with strong math-verify and LLM-as-a-judge support. In contrast, SkyRL and verifiers only ship a small training subset of GSM8K and AIME. We also have close to 10k SWE development tasks, which require both high-quality data curation effort and good infra support, while Aviary focuses only on scientific knowledge environments. **None of the existing frameworks support a general multi-turn tool-use agent with tools like search, code execution, and other synthetic tools.**
2. We will be a **superset** of all existing gym environments. We are already a superset of SkyRL Gym and verifiers, we have integrated all GEM environments, and we're working with Aviary to incorporate their environments as well.
3. As shown in Brian's comparison below, we have much **better infra support for scaling**, and the plan is to use NeMo Gym for 500B+ model training for quality improvement. This will make NeMo Gym battle-tested in frontier model training, while the other gyms are mostly used for smaller-scale experiments.

Key use case requirements to avoid training environment scale, complexity, and diversity limitations:

1. Can I easily build my environment without worrying about a training framework?
2. Can I easily call my model using OpenAI Responses and not worry about reasoning parsing?
3. Can I easily use your environment framework to build an agent application product?
4. Can I easily use your environment framework to build a simple multi-agent system?
5. Can I easily run individual SWE-bench task Docker containers?
6. Can I easily add an agent built with any agent framework?
7. Can I easily add any environment framework?
8. Can I easily simultaneously use `math-verify==0.7.0` and `math-verify==0.8.0` in 2 different environments?
9. Can I easily spin up multiple environments at once?
pyproject.toml (2 additions, 2 deletions)
@@ -64,10 +64,10 @@ dependencies = [
# 1. Why this dependency is here in NeMo Gym
# 2. When this dependency was last updated
# 3. The license of the dependencies.
#
# If you are adding or removing dependencies, please do your due diligence to update this information. PRs to main that modify dependencies will not be accepted unless this information is provided.
# The licenses of the below dependencies include: Apache 2.0, MIT, and BSD 3-Clause
#
# By design, most (if not all) dependencies are unfrozen here to be easier to consume. The core pieces we need are server infra like FastAPI, etc.
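For instance, a new entry following that comment convention might look like this (the package and details below are hypothetical, not an actual entry in this file):

```toml
# 1. httpx: hypothetical example - HTTP client used by a resources server
# 2. Last updated: 2025-08
# 3. License: BSD 3-Clause
"httpx",
```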
Verifies competitive programming solutions by executing submitted code against unit tests. The server consumes agent trajectories and returns a reward based on whether the assistant's code produces the correct outputs for given test inputs.

Data source: [Filtered competitive programming dataset](https://huggingface.co/datasets/Nexusflow/comp_prog_filtered_no_function); split=`train`

- Use only a user message with the problem statement and instructions (e.g., "You are an expert competitive programmer...").
- `verifier_metadata` (required):
  - `unit_tests` (required): dict with `inputs` and `outputs` arrays containing test cases.
    - `inputs`: list of strings representing stdin input for each test case
    - `outputs`: list of strings representing expected stdout output for each test case
  - `problem_id` (optional): unique identifier for the problem
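Concretely, a `verifier_metadata` payload following the schema above might look like this (the values are hypothetical):

```json
{
  "verifier_metadata": {
    "unit_tests": {
      "inputs": ["1 2\n", "10 20\n"],
      "outputs": ["3\n", "30\n"]
    },
    "problem_id": "sum-two-ints-001"
  }
}
```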
**Notes**
- All test cases must pass for a solution to receive a reward of 1.0
- Failed test cases result in a reward of 0.0 with detailed error information
### Test execution (for now)

- Code is executed using Python's `exec()` function in a controlled environment
- Each test case runs with redirected stdin/stdout:
  - `stdin` is populated with the test input
  - `stdout` is captured for comparison with expected output
- Available built-ins include common functions: `input`, `print`, `range`, `len`, `int`, `str`, `list`, etc.
- Newlines in test data are properly handled (converts `\\n` to actual newlines)
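The execution scheme described above can be sketched roughly as follows. This is an illustrative sketch, not the server's actual implementation (in particular, the restricted builtin whitelist is omitted here):

```python
import io
import sys

# Illustrative sketch of the redirect-and-compare scheme: run submitted code
# with stdin fed from the test input, capture stdout, and compare it with the
# expected output. The real server additionally restricts available builtins.
def run_test_case(code: str, test_input: str, expected_output: str) -> bool:
    real_stdin, real_stdout = sys.stdin, sys.stdout
    # Convert literal "\n" sequences in test data to actual newlines.
    sys.stdin = io.StringIO(test_input.replace("\\n", "\n"))
    sys.stdout = captured = io.StringIO()
    try:
        exec(code, {"__name__": "__main__"})
    except Exception:
        return False  # any runtime error fails the test case
    finally:
        sys.stdin, sys.stdout = real_stdin, real_stdout
    return captured.getvalue().strip() == expected_output.replace("\\n", "\n").strip()
```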
### Example dataset row

```json
{
  "responses_create_params": {
    "input": [
      {
        "role": "user",
        "content": "You are an expert competitive programmer. You will be given a problem statement and must output a complete Python solution that reads from stdin and writes to stdout.\n\nPolycarp has $n$ different binary words. A word called binary if it contains only characters '0' and '1'. For example, these words are binary: \"0001\", \"11\", \"0\" and \"0011100\".\n\nPolycarp wants to offer his set of $n$ binary words to play a game \"words\". In this game, players name words and each next word (starting from the second) must start with the last character of the previous word. The first word can be any. For example, these sequence of words can be named during the game: \"0101\", \"1\", \"10\", \"00\", \"00001\".\n\nWord reversal is the operation of reversing the order of the characters. For example, the word \"0111\" after the reversal becomes \"1110\", the word \"11010\" after the reversal becomes \"01011\".\n\nProbably, Polycarp has such a set of words that there is no way to put them in the order correspondent to the game rules. In this situation, he wants to reverse some words from his set so that: the final set of $n$ words still contains different words (i.e. all words are unique); there is a way to put all words of the final set of words in the order so that the final sequence of $n$ words is consistent with the game rules. \n\nPolycarp wants to reverse minimal number of words. Please, help him.\n\n\n-----Input-----\n\nThe first line of the input contains one integer $t$ ($1 \\le t \\le 10^4$) — the number of test cases in the input. Then $t$ test cases follow.\n\nThe first line of a test case contains one integer $n$ ($1 \\le n \\le 2\\cdot10^5$) — the number of words in the Polycarp's set. Next $n$ lines contain these words. All of $n$ words aren't empty and contains only characters '0' and '1'. The sum of word lengths doesn't exceed $4\\cdot10^6$. All words are different.\n\nGuaranteed, that the sum of $n$ for all test cases in the input doesn't exceed $2\\cdot10^5$. Also, guaranteed that the sum of word lengths for all test cases in the input doesn't exceed $4\\cdot10^6$.\n\n\n-----Output-----\n\nPrint answer for all of $t$ test cases in the order they appear.\n\nIf there is no answer for the test case, print -1. Otherwise, the first line of the output should contain $k$ ($0 \\le k \\le n$) — the minimal number of words in the set which should be reversed. The second line of the output should contain $k$ distinct integers — the indexes of the words in the set which should be reversed. Words are numerated from $1$ to $n$ in the order they appear. If $k=0$ you can skip this line (or you can print an empty line). If there are many answers you can print any of them.\n\n\n-----Example-----\nInput\n4\n4\n0001\n1000\n0011\n0111\n3\n010\n101\n0\n2\n00000\n00001\n4\n01\n001\n0001\n00001\n\nOutput\n1\n3 \n-1\n0\n\n2\n1 2"
      }
    ]
  }
}
```