Use cases, pain points, and background
Today there’s no single, consistent place or name for “the data script” for each resource server.
Data scripts are generally located in each resource server, but they are named inconsistently. Most are at the root of the resource server; only reasoning_gym uses a scripts/ subfolder.
| Resource server | Where the script lives | Script name(s) |
| --- | --- | --- |
| newton_bench | Root (next to app.py) | generate_dataset.py |
| reasoning_gym | scripts/ | scripts/create_dataset.py |
| ns_tools | Root | prepare_dataset.py |
| math_formal_lean | Root | prepare_nemotron_math_proofs.py, prepare_minif2f.py |
| multichallenge | Root | dataset_preprocess.py |
| arc_agi | Root | create_dataset.py |
Description:
We recommend standardizing the location of data scripts: each resource server keeps them under resources_servers/<name>/data_curation/.
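
To make the scope of the move concrete, here is a minimal sketch that maps each script listed in the table above to its proposed location. The helper names (`proposed_path`, `migration_plan`) are hypothetical, and keeping the existing file names is an assumption: this proposal only fixes the folder, not a standard script name.

```python
from pathlib import PurePosixPath

# Current script locations, relative to each resource server's root
# (taken from the table above).
CURRENT_SCRIPTS = {
    "newton_bench": ["generate_dataset.py"],
    "reasoning_gym": ["scripts/create_dataset.py"],
    "ns_tools": ["prepare_dataset.py"],
    "math_formal_lean": ["prepare_nemotron_math_proofs.py", "prepare_minif2f.py"],
    "multichallenge": ["dataset_preprocess.py"],
    "arc_agi": ["create_dataset.py"],
}

ROOT = PurePosixPath("resources_servers")


def proposed_path(server: str, script: str) -> PurePosixPath:
    """Return the proposed data_curation/ location for one script.

    Assumption: the file name is kept unchanged; only the folder changes.
    """
    return ROOT / server / "data_curation" / PurePosixPath(script).name


def migration_plan() -> list[tuple[str, str]]:
    """List (current path, proposed path) pairs for every known script."""
    plan = []
    for server, scripts in CURRENT_SCRIPTS.items():
        for script in scripts:
            plan.append((str(ROOT / server / script), str(proposed_path(server, script))))
    return plan


if __name__ == "__main__":
    for src, dst in migration_plan():
        print(f"{src} -> {dst}")
```

The sketch only computes paths and does not move any files, so it can serve as a checklist for the actual rename.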