This document provides an end-to-end example of building an Agent reinforcement learning pipeline using the ChatLearn, PyTorch FSDP, SGLang, and LangGraph frameworks.
ChatLearn currently supports building custom Agent workflows using SGLang + LangGraph for end-to-end agentic reinforcement learning training.
- Docker Image Preparation

We recommend running this example in PAI DSW or DLC. Specify the following image address when launching the instance:

dsw-registry.cn-shanghai.cr.aliyuncs.com/pai-training-algorithm/chatlearn:torch2.8.0-sglang0.5.2-ubuntu24.04-cuda12.6-py312

You can use the VPC address to accelerate image pulling, which requires modifying the image address according to your current region. For example, to launch a DSW instance in Shanghai, use the following image:

dsw-registry-vpc.cn-shanghai.cr.aliyuncs.com/pai-training-algorithm/chatlearn:torch2.8.0-sglang0.5.2-ubuntu24.04-cuda12.6-py312
- Code Preparation
git clone https://github.com/alibaba/ChatLearn.git && cd ChatLearn

We use the MATH-lighteval dataset as an example.
# Download the dataset
mkdir -p dataset
modelscope download --dataset AI-ModelScope/MATH-lighteval --local_dir dataset/MATH-lighteval
# Preprocess the dataset
python chatlearn/data/data_preprocess/math_lighteval_agent.py --input_dir dataset/MATH-lighteval --local_dir dataset/MATH-lighteval

When performing agentic training, you need to configure the following parameters:
runtime_args.task_type=agent # Task type
runtime_args.rollout_backend=sglang # Rollout backend engine; agent tasks only support sglang
runtime_args.use_rollout_manager=True # Use rollout manager to manage async requests
models.policy.is_sync_mode=False # Rollout backend engine uses async mode (only supported by sglang)

Run the following commands on an 8-GPU machine:
# Download model weights
modelscope download --model Qwen/Qwen3-8B --local_dir pretrained_models/Qwen3-8B
bash scripts/fsdp_sglang/train_fsdp_sglang_qwen3_8b_grpo_agentic_math_task.sh

ChatLearn uses LangGraph to build Agent workflows. To construct a custom Agent training process, you need to implement the following three components:
- Custom Dataset
Refer to the example file math_lighteval_agent to prepare your dataset. If your task dataset depends on additional information, you can add it in prompt_dataset, which will be passed through ChatLearn's execution graph.
Compared to Chat task datasets, you need to include two additional fields in the dictionary: agent_name and agent_cfg_path.
- agent_name: Used to construct a custom AgentGraph and route the corresponding data to the appropriate graph during Rollout sampling.
- agent_cfg_path: Path to the AgentGraph configuration file, used to configure any custom parameters for the AgentGraph. During training, you can access any parameter in this YAML file via cfg.xxx.
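As a sketch, a single preprocessed sample might look like the dictionary below. The field names agent_name and agent_cfg_path come from the text above; the other fields and all values are illustrative assumptions, not the actual output of the preprocessing script.

```python
# Hypothetical sample produced by dataset preprocessing; only agent_name and
# agent_cfg_path are mandated by the text, the rest is illustrative.
sample = {
    "prompt": "Solve: what is 2 + 3?",
    "ground_truth": "5",                          # extra info carried through the execution graph
    "agent_name": "math_eval_agent",              # routes the sample to the registered AgentGraph
    "agent_cfg_path": "path/to/math_eval.yaml",   # per-graph config, exposed as AgentGraph.cfg
}
```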
- Custom AgentGraph
Refer to the example file math_eval_agent_graph to build a custom AgentGraph. You need to implement the following core components:
- Registration
Register the agent_name in _graph_registry. The agent graph will later be constructed and routed using this agent_name.
@register("agent_name")
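A registry of this kind is typically a decorator that stores the class under its agent_name. The sketch below shows the general pattern; the names _graph_registry and register follow the text, but the implementation details are assumptions, not ChatLearn's actual code.

```python
# Minimal sketch of an agent-graph registry (illustrative, not ChatLearn's code).
_graph_registry = {}

def register(agent_name):
    """Decorator that records an AgentGraph class under its agent_name."""
    def decorator(cls):
        _graph_registry[agent_name] = cls
        return cls
    return decorator

@register("math_eval")
class MathEvalAgentGraph:
    pass

# Rollout can now look the class up by the agent_name carried in each sample.
assert _graph_registry["math_eval"] is MathEvalAgentGraph
```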
- Implementation of the build_graph function
Implement a custom Agent workflow based on LangGraph. Refer to the LangGraph documentation to understand LangGraph concepts.
- Implementation of the run execution function
This is an async function and serves as the specific entry point for executing the AgentGraph. Information passed through prompt_dataset for each data sample will be retained in kwargs and can be accessed as needed.
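The shape of such an entry point can be sketched as the async method below. The signature and return value are assumptions for illustration; the real AgentGraph.run in ChatLearn may differ.

```python
import asyncio

# Illustrative-only sketch of an async run entry point; not ChatLearn's API.
class MathEvalAgentGraph:
    async def run(self, prompt, **kwargs):
        # Extra fields added in prompt_dataset arrive here via kwargs.
        ground_truth = kwargs.get("ground_truth")
        # ... invoke the compiled LangGraph workflow here ...
        return {"prompt": prompt, "ground_truth": ground_truth}

result = asyncio.run(MathEvalAgentGraph().run("1+1=?", ground_truth="2"))
```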
- Agent Configuration YAML File (Optional)
Refer to math_eval.yaml to configure any custom attributes for the AgentGraph in your experiment. The content of the YAML file is stored in the AgentGraph.cfg attribute.
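For orientation, such a configuration file could look like the fragment below. The keys are hypothetical, not the actual contents of math_eval.yaml; any key you define becomes available during training as cfg.max_turns, cfg.temperature, and so on.

```yaml
# Hypothetical AgentGraph config in the style of math_eval.yaml (keys are illustrative).
max_turns: 4
temperature: 0.7
```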