Open
Labels
needs discussion: This issue is still being discussed. It would be premature to implement it.
Description
Currently it seems to be mostly relevant for generative models (blog), but I talked with @tomaarsen and it sounds like we could do something similar, with mteb replacing Inspect AI as the runner.
I would say that we need the following:
1. Include the required metadata in the dataset repo (equivalent to the eval.yaml). I suppose we need the metadata as well as any additional settings such as `input_column_name`, `label_column_name`, etc. We could of course also allow a script itself as the config. We could even allow an "mteb version" tag for debugging.
2. We need to make this format loadable into MTEB.
3. We need a way for the community to push results (the leaderboard in this case will only be a task-specific leaderboard, not one for a full benchmark).
Fixing 1 and 2 would also allow us to convert a lot of our code into metadata files that can simply be loaded in.
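To make points 1 and 2 a bit more concrete, here is a rough sketch of what loading such a config could look like. Everything in it is an assumption for discussion: `TaskConfig`, `load_task_config`, the `name` and `task_type` keys, and the default column names are hypothetical and not part of mteb's current API. Only `input_column_name`, `label_column_name`, and the "mteb version" tag come from the proposal above. The config is shown as a plain dict (what you would get after parsing a YAML file) to keep the example dependency-free.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Any


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical schema for a config file stored in the dataset repo."""

    name: str                                # assumed required key
    task_type: str                           # assumed required key, e.g. "Classification"
    input_column_name: str = "text"          # assumed default
    label_column_name: str = "label"         # assumed default
    mteb_version: str | None = None          # optional tag, useful for debugging
    extra: dict[str, Any] = field(default_factory=dict)  # unrecognised settings


REQUIRED_KEYS = ("name", "task_type")
KNOWN_KEYS = {f.name for f in fields(TaskConfig)} - {"extra"}


def load_task_config(raw: dict[str, Any]) -> TaskConfig:
    """Validate a parsed config mapping and turn it into a TaskConfig."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"config is missing required keys: {missing}")

    # Split recognised settings from everything else so that unknown keys
    # are preserved instead of silently dropped.
    known = {k: v for k, v in raw.items() if k in KNOWN_KEYS}
    extra = {k: v for k, v in raw.items() if k not in KNOWN_KEYS}
    return TaskConfig(**known, extra=extra)


if __name__ == "__main__":
    parsed = {
        "name": "my_custom_task",            # placeholder values
        "task_type": "Classification",
        "input_column_name": "sentence",
        "mteb_version": "x.y.z",
    }
    print(load_task_config(parsed))
```

Keeping unknown keys in `extra` is one possible design choice: it would let older mteb versions load newer configs without failing, at the cost of hiding typos in setting names.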