In roll/pipeline/rlvr/rewards/llm_judge_reward_worker.py, the reward worker receives its tokenizer from:
```python
self.tokenizer = default_tokenizer_provider(model_args=self.worker_config.model_args)
```
This tokenizer needs to decode actor_infer's rollouts:
```python
response_text_list = self.tokenizer.batch_decode(data.batch["responses"], skip_special_tokens=True)
```
and to encode the LLM judge query:
```python
tokenized = self.tokenizer(text, return_tensors="pt")
```
This works when the LLM judge model and the trained model share the same tokenizer. However, if they differ, there may be no single tokenizer that can both decode the responses correctly and encode the judge query. To address this, it would be reasonable to add a new argument to the reward worker that allows specifying the LLM judge tokenizer separately from the trained model's tokenizer.
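
A minimal sketch of what that could look like inside the worker, assuming a hypothetical `judge_model_args` field on the worker config (the field name and placement are illustrative; only `default_tokenizer_provider`, `worker_config.model_args`, and the two call sites quoted above come from the existing code):

```python
# Initialization sketch: keep two tokenizers instead of one.
# "judge_model_args" is a hypothetical new config field added for illustration;
# default_tokenizer_provider and worker_config.model_args are from the existing worker.
self.tokenizer = default_tokenizer_provider(model_args=self.worker_config.model_args)

judge_model_args = getattr(self.worker_config, "judge_model_args", None)
self.judge_tokenizer = (
    default_tokenizer_provider(model_args=judge_model_args)
    if judge_model_args is not None
    else self.tokenizer  # fall back to the trained model's tokenizer when not configured
)

# Later, at the two call sites quoted above:
# decode actor_infer rollouts with the trained model's tokenizer ...
response_text_list = self.tokenizer.batch_decode(
    data.batch["responses"], skip_special_tokens=True
)
# ... and encode the judge prompt with the judge model's tokenizer.
tokenized = self.judge_tokenizer(text, return_tensors="pt")
```

When `judge_model_args` is not set, the worker would behave exactly as it does today, so the change would be backward compatible.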