In roll/pipeline/rlvr/rewards/llm_judge_reward_worker.py, the reward worker receives its tokenizer from:
```python
self.tokenizer = default_tokenizer_provider(model_args=self.worker_config.model_args)
```
This tokenizer needs to decode actor_infer's rollouts:
```python
response_text_list = self.tokenizer.batch_decode(data.batch["responses"], skip_special_tokens=True)
```
and to encode the LLM judge query:
```python
tokenized = self.tokenizer(text, return_tensors="pt")
```
This works when the LLM judge model and the trained model share the same tokenizer. However, if they differ, there may be no single tokenizer that can both decode the responses correctly and encode the judge query. To address this, it would be reasonable to add a new argument to the reward worker that allows specifying the LLM judge tokenizer separately from the trained model's tokenizer.
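
A minimal sketch of what that could look like inside the worker, assuming a hypothetical `judge_model_args` field on the worker config (the field name and placement are illustrative; only `default_tokenizer_provider`, `worker_config.model_args`, and the two call sites quoted above come from the existing code):

```python
# Initialization sketch: keep two tokenizers instead of one.
# "judge_model_args" is a hypothetical new config field added for illustration;
# default_tokenizer_provider and worker_config.model_args are from the existing worker.
self.tokenizer = default_tokenizer_provider(model_args=self.worker_config.model_args)

judge_model_args = getattr(self.worker_config, "judge_model_args", None)
self.judge_tokenizer = (
    default_tokenizer_provider(model_args=judge_model_args)
    if judge_model_args is not None
    else self.tokenizer  # fall back to the trained model's tokenizer when not configured
)

# Later, at the two call sites quoted above:
# decode actor_infer rollouts with the trained model's tokenizer ...
response_text_list = self.tokenizer.batch_decode(
    data.batch["responses"], skip_special_tokens=True
)
# ... and encode the judge prompt with the judge model's tokenizer.
tokenized = self.judge_tokenizer(text, return_tensors="pt")
```

When `judge_model_args` is not set, the worker would behave exactly as it does today, so the change would be backward compatible.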