Look into integrating `mteb` with Hugging Face's community evals

Currently it seems to be mostly relevant for generative models ([blog](https://huggingface.co/blog/community-evals)), but I talked with @tomaarsen and it sounds like we could do something similar, with `mteb` replacing Inspect AI as the runner.

<img width="1108" height="573" alt="Image" src="https://github.com/user-attachments/assets/454bac27-c5c5-44e5-8a31-67f470f3b070" />

---

I would say that we need the following:

1. Include the required metadata in the dataset repo (equivalent to the [eval.yaml](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro/blob/main/eval.yaml)). I suppose we need the task metadata as well as any additional settings such as `input_column_name`, `label_column_name`, etc. We could of course also allow a script itself as the config, and we could even allow an "mteb version" tag for debugging.
2. Make this format loadable into MTEB.
3. Provide a way for the community to push results.

(The leaderboard in this case would only be a task-specific leaderboard, not one for a full benchmark.)

Solving 1 and 2 would also allow us to convert a lot of our code into metadata files that can simply be loaded in.
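
To make points 1 and 2 a bit more concrete, here is a rough sketch of what loading such a metadata file could look like. It is not a proposal for the actual schema: `TaskConfig`, `load_task_config`, and every key except `input_column_name` and `label_column_name` are hypothetical names made up for illustration, and the input is assumed to be the already-parsed contents of the YAML file (a plain dict). It does not call any existing `mteb` API.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical in-memory form of a task metadata file (not an mteb class)."""

    name: str
    dataset_path: str  # hypothetical key: the dataset repo on the Hub
    task_type: str  # hypothetical key: e.g. "Classification"
    input_column_name: str = "text"
    label_column_name: str = "label"
    mteb_version: Optional[str] = None  # optional tag, mainly for debugging
    extra: Dict[str, Any] = field(default_factory=dict)


# Keys without a default must be present in the file.
REQUIRED_KEYS = ("name", "dataset_path", "task_type")


def load_task_config(raw: Dict[str, Any]) -> TaskConfig:
    """Validate a parsed metadata dict and turn it into a TaskConfig."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"metadata is missing required keys: {missing}")

    known = {f.name for f in fields(TaskConfig)} - {"extra"}
    kwargs = {key: value for key, value in raw.items() if key in known}
    # Unknown keys are kept rather than rejected, so newer files still load.
    extra = {key: value for key, value in raw.items() if key not in known}
    return TaskConfig(**kwargs, extra=extra)
```

Keeping unknown keys in `extra` instead of raising is one possible design choice; it would let task-specific settings live in the same file without the loader having to know about all of them up front.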
