
Issue with llm judge tokenizer #91

@pipiPdesu

Description

In roll/pipeline/rlvr/rewards/llm_judge_reward_worker.py, the reward worker receives its tokenizer from

```python
self.tokenizer = default_tokenizer_provider(model_args=self.worker_config.model_args)
```

This tokenizer needs to decode actor_infer's rollout:

```python
response_text_list = self.tokenizer.batch_decode(data.batch["responses"], skip_special_tokens=True)
```

and encode the LLM judge query:

```python
tokenized = self.tokenizer(text, return_tensors="pt")
```

This works as long as the LLM judge model and the trained model use the same tokenizer. If they differ, however, there may be no single tokenizer that can both decode the responses correctly and encode the judge query. To address this, it would be reasonable to add a new argument to the reward worker that allows the LLM judge tokenizer and the trained model tokenizer to be specified separately.
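To make the failure mode concrete: the same token IDs decode to different text under different tokenizers. A small illustration (the two BERT checkpoints are arbitrary examples of tokenizers with mismatched ID-to-token mappings, not models used here):

```python
from transformers import AutoTokenizer

# Illustrative only: any two tokenizers with different vocabularies show the same effect.
actor_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
judge_tok = AutoTokenizer.from_pretrained("bert-base-cased")

ids = actor_tok("hello world")["input_ids"]
print(actor_tok.decode(ids, skip_special_tokens=True))  # "hello world"
print(judge_tok.decode(ids, skip_special_tokens=True))  # unrelated tokens: the ID -> token mappings differ
```

A minimal sketch of the proposed change, mirroring the snippets above and assuming a hypothetical `judge_model_args` field on the worker config (the field name is illustrative, not the current API):

```python
# Sketch only: `judge_model_args` is a hypothetical new config field.
# The trained model's tokenizer decodes the actor's rollouts; a separate
# judge tokenizer encodes the query sent to the LLM judge.
self.tokenizer = default_tokenizer_provider(model_args=self.worker_config.model_args)
self.judge_tokenizer = default_tokenizer_provider(model_args=self.worker_config.judge_model_args)

# Decode the rollout with the trained model's tokenizer ...
response_text_list = self.tokenizer.batch_decode(data.batch["responses"], skip_special_tokens=True)
# ... and encode the judge prompt with the judge's tokenizer.
tokenized = self.judge_tokenizer(text, return_tensors="pt")
```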
