Hi,
I am trying to use the Creation MM benchmark to evaluate the Qwen-2.5 7B VLM. Since I don’t have an OpenAI API key, I’ve modified the code to use the Hyperbolic API with Meta LLaMA as the judge model.
For initial testing, I ran the workflow on only 5 data items from the dataset. As shown in the attached image, I am successfully retrieving results for the objective judge, but I encounter an error for the subjective judge. Could you please explain the difference between these two judges and why I might be seeing this error?
I also ran the following standalone script to verify access to the judge model, and that script works as expected.
from vlmeval.api import OpenAIWrapper
model = OpenAIWrapper('meta-llama/Llama-3.3-70B-Instruct', verbose=True, use_hyperbolic=True)
msgs = [dict(type='text', value='Hello!')]
code, answer, resp = model.generate_inner(msgs)
print(code, answer, resp)
(The code is slightly modified to accommodate the Hyperbolic API.)
Let me know if you need more context.