21 Apr 23:47

KennethEnevoldsen

ab30c6f

2.12.26 Latest

2.12.26 (2026-04-21)

Fix

fix: HF benchmark result (#4344)
init benchmark eval results
add get score to benchmark
update scoring
add method for benchmark card creation
fix typing (990c1cf)

Unknown

[MVEB] Add fps implementation to Video Sampling (#4441)
fix: Reclassify SIBFLEURS as AudioClassification instead of AudioMultilabelClassification
fix: Move SIBFLEURS descriptive stats to AudioClassification
refactor: add FPS-based video frame sampling to collator

FramesCollator and VideoCollator now support two modes:

FPS-based (default, fps=2.0): frame count scales with video duration,
with max_frames=128 as a safety cap for long videos
Fixed-sample (num_frames=N): always selects exactly N frames uniformly,
preserving the previous behavior for models that need it

Existing callers (PE-AV, random baseline) switched to num_frames to
preserve their current fixed-sample behavior.

refactor: switch PE-AV and random baseline to FPS-based frame sampling

Both models now use the default FPS-based mode (fps=2.0, max_frames=256)
instead of fixed num_frames. This gives duration-proportional frame
coverage across videos of different lengths.

fix: address PR review - defaults to None, use end_stream_seconds

Set fps and max_frames defaults to None so models can skip collator
resampling and let their own processors handle frame selection
Use video.metadata.end_stream_seconds for duration instead of
computing num_frames / average_fps (handles VFR videos correctly)
When both fps and num_frames are None, return all frames as-is
instead of raising an error
PE-AV and random baseline explicitly set fps=2.0 to avoid decoding
all frames unnecessarily

refactor: expose collator params in PE-AV init

Allow fps, max_frames, num_frames, and max_samples to be configured
via the PE-AV wrapper constructor instead of being hardcoded.
Defaults to fps=2.0 matching the standard video understanding rate.

fix: address PR review - rename max_frames to max_fps_frames, raise on conflicting args
fix: rename max_fps_frames back to max_frames, clarify docstrings
fix: pass fps=None, num_frames=16 to 16-frame PE-AV variants

The *-16-frame checkpoints were trained with fixed 16-frame uniform sampling
(processor config has do_sample_frames=true, num_frames=16). Without
explicit loader_kwargs, the collator used the default fps=2.0, producing
~40 frames on typical clips that the processor then re-sampled down to 16 —
a distribution shift from training. Setting num_frames=16 makes the
collator do the sampling directly, and the processor's built-in sample
becomes an identity no-op.
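
The sampling rules described in this entry can be sketched roughly as follows. This is a minimal illustration, not the collator's actual code: `select_frame_indices` and its argument names are hypothetical, the clip duration is passed in directly rather than read from `video.metadata.end_stream_seconds`, and the rounding choices are assumptions.

```python
def select_frame_indices(total_frames, duration_seconds, fps=None,
                         num_frames=None, max_frames=None):
    """Return the indices of the frames to keep (hypothetical helper)."""
    if fps is not None and num_frames is not None:
        # The release notes say conflicting arguments raise an error.
        raise ValueError("Pass either fps or num_frames, not both.")
    if total_frames <= 0:
        return []
    if fps is None and num_frames is None:
        # No resampling requested: return every frame as-is.
        return list(range(total_frames))

    if num_frames is not None:
        # Fixed-sample mode: always pick exactly num_frames indices.
        target = num_frames
    else:
        # FPS mode: frame count scales with duration, capped by max_frames.
        target = max(1, int(round(duration_seconds * fps)))
        if max_frames is not None:
            target = min(target, max_frames)
        # Downsample only: never ask for more frames than exist.
        target = min(target, total_frames)

    if target == 1:
        return [0]
    step = (total_frames - 1) / (target - 1)
    return [round(i * step) for i in range(target)]
```

Under these assumptions a 20-second clip at `fps=2.0` yields 40 indices, while `num_frames=16` yields exactly 16 regardless of duration, which is the behaviour the 16-frame checkpoints rely on.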

fix: clarify fps docstrings - downsamples only, no upsampling (9363ea7)
Don't display license links in the documentation (#4465)

Fixes #4461 (46582d9)

[MVEB] Adding UCF101 Task (Clustering) (#4454) (b8b3722)
leaderboard: add MTEB(spa, v1) to Language-specific section (#4217)

Add MTEB(spa, v1) to leaderboard language-specific menu

Co-authored-by: Clemente <clemente@Clementes-MacBook-Pro.local> (e5521a6)

Add VALOR-32K retrieval tasks (#4453)
Add VALOR-32K retrieval tasks (v2t, t2v, va2t, t2va)

Adds four bidirectional multimodal retrieval tasks for the VALOR-32K
dataset (mteb/VALOR-32K), a vision-audio-language benchmark with 3,491
test samples.

Made-with: Cursor

fix: correct BibTeX field order for VALOR-32K citation

Made-with: Cursor (792f61f)

20 Apr 11:38

KennethEnevoldsen

2.12.25

c99c7de

2.12.25 (2026-04-20)

Fix

fix: drop unused modality columns in dataloader for cross-modal tasks (#4440)
fix: handle None text/image in multimodal retrieval tasks

Cross-modal retrieval tasks (CIRRIT2IRetrieval, NIGHTSI2IRetrieval,
Fashion200kI2TRetrieval, VisualNewsI2TRetrieval) have corpus/query
items where text or image can be None for single-modality entries.

_corpus_to_dict: handle None text/title gracefully
_custom_collate_fn: allow None values in batches instead of raising
_combine_queries_with_instruction_text: skip string ops on None text
random_baseline: skip None items when encoding each modality

Closes #4436

fix: handle None query text in dataloader instead of downstream models

Normalize None text to "" in _combine_queries_with_instruction_text,
matching the existing pattern in _corpus_to_dict. Revert random_baseline
and collation changes as they're no longer needed.

fix: drop unused modality columns in dataloader to prevent None errors

Cross-modal retrieval tasks have None values for modalities not used by
that side of the retrieval (e.g. text=None in image-only corpus for it2i
tasks). Instead of adding None-guards throughout the collate function and
models, drop columns for modalities not needed for the current prompt
type in _prepare_dataset. The task category (e.g. it2i) already encodes
which modalities each side needs.
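
The idea of dropping unused modality columns based on the task category can be sketched as below. This is an illustration on plain dictionaries rather than the real `_prepare_dataset`; the helper names, the letter-to-column mapping, and the `side` argument are all assumptions made for the example.

```python
# Assumed mapping from category letters to column names (illustrative only).
MODALITY_COLUMNS = {"t": "text", "i": "image", "a": "audio", "v": "video"}


def needed_columns(category, side):
    """Columns needed by one side of a category such as 'it2i'."""
    query_part, corpus_part = category.split("2")
    part = query_part if side == "query" else corpus_part
    return {MODALITY_COLUMNS[letter] for letter in part}


def drop_unused_modalities(rows, category, side):
    """Remove modality columns that this side of the task does not use."""
    keep = needed_columns(category, side)
    modality_columns = set(MODALITY_COLUMNS.values())
    return [
        {k: v for k, v in row.items() if k not in modality_columns or k in keep}
        for row in rows
    ]


corpus = [{"id": "d1", "image": "<image>", "text": None}]
print(drop_unused_modalities(corpus, "it2i", "corpus"))
```

Because the `text` column is removed before batching, nothing downstream ever sees the `None` value, so no extra guards are needed in the collate function or the models.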

Closes #4436

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> (eaf2c9d)

20 Apr 08:53

KennethEnevoldsen

2.12.24

30bc62c

2.12.24 (2026-04-20)

Fix

fix: remove columns with none (#4446)
remove columns with none
Apply suggestions from code review

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (63c0af6)

Unknown

[Mveb] vggsound classification (#4444)

[MVEB] Add VGGSound audio-visual classification tasks

Add VGGSoundVAClassification (video+audio, va2c) and VGGSoundVClassification (video-only, v2c) for the VGGSound audio-visual dataset (Chen et al., ICASSP 2020).
Dataset contains 9,888 test clips across 308 sound classes from YouTube videos. Audio is the primary signal in the original task; the v2c variant serves as a
video-only baseline. Uses 5-fold cross-validation since the released split only contains test. Follows the standard MVEB classification task structure. Addresses
part of #4130 (MVEB Overview - Classification).

Co-authored-by: Yashwanth Devavarapu <yashwanthdevavarapu@Yashwanths-MacBook-Pro.local> (bf113b4)

add mteb/Shot2Story20K dataset (#4449)

add mteb/Shot2Story20K_test dataset (b597f37)

Add YouCook2_val retrieval tasks (#4432)
Add YouCook2_val retrieval tasks (V2T, T2V, A2T, T2A)

Made-with: Cursor

Update mteb/tasks/retrieval/eng/youcook2_retrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

add stats
update

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> (f0e5ab6)

Add VATEX retrieval tasks (#4433)
Add VATEX_test_1k retrieval tasks (V2T, T2V, A2T, T2A)

Made-with: Cursor

Update mteb/tasks/retrieval/eng/vatex_retrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

Update mteb/tasks/retrieval/eng/vatex_retrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

update

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> (f93dc75)

model: add BidirLM/BidirLM-Omni-2.5B-Embedding (#4370)
feat: add BidirLM/BidirLM-Omni-2.5B-Embedding model implementation
fix: address reviewer comments on BidirLM-Omni-2.5B-Embedding:

Load model via SentenceTransformer with trust_remote_code=True
Remove the _get_instruction() description fallback. Add explicit
prompts for 22 MIEB/MAEB tasks

refactor: update to sentence transformers 5.4 and rely on encode function to get embedding
Apply suggestions from code review

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

Feat: instruct with prompt args chat template
Fix: rely on EncodeKwargs for encoder function
Feat: improve readability
Refactor: Change how modality are passed to encode
Fix: lint error
Refactor: args encode
Refactor: Import from Bidir
comments update
Simplify get instruction (no need for _lookup_prompt stripped)

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> (9e5c592)

19 Apr 13:23

KennethEnevoldsen

2.12.23

e6fbe38

2.12.23 (2026-04-19)

Fix

fix: query filtering for audio (#4430)

update query filtering (65025bb)

Unknown

[MVEB] Add SomethingSomethingV2 video classification task (#4434)
[MVEB] Add SomethingSomethingV2 video classification task
fix: correct bibtex authors for SomethingSomethingV2

Fix Peter Yiber -> Peter Yianilos
Fix Florian Bax -> Ingo Bax
Remove non-authors Manuel Gallo and Ahmed Mehri
Add missing authors: Moritz Mueller-Freitag, Florian Hoppe, Christian Thurau
Fix Materzynska -> Materzy{\'n}ska (proper diacritics)

Co-authored-by: zach <zacharie@example.com> (f5775fc)

18 Apr 23:08

KennethEnevoldsen

2.12.22

761d074

2.12.22 (2026-04-18)

Fix

fix: KeyError on aggregated tasks with eval_langs (#4439)

Fix KeyError on aggregated tasks with dict eval_langs

When aggregated tasks (e.g. VisualSTS17Multilingual) have eval_langs
as a dict, hf_subsets_to_langscripts lacks a "default" key. The
aggregated score uses "default" as subset, causing a KeyError in
TaskResult.from_task_results. Fall back to collecting all languages
from the mapping when the subset key is missing.
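
A minimal sketch of the fallback described above, assuming the mapping is a plain dictionary from subset name to a list of language-script codes. The function name `languages_for_subset` is hypothetical and the real logic in `TaskResult.from_task_results` may be structured differently.

```python
def languages_for_subset(hf_subsets_to_langscripts, subset):
    """Look up languages for a subset, falling back to all languages."""
    if subset in hf_subsets_to_langscripts:
        return list(hf_subsets_to_langscripts[subset])
    # Aggregated scores use "default", which a dict-style eval_langs lacks,
    # so collect every language from the mapping instead of raising KeyError.
    return sorted(
        {lang for langs in hf_subsets_to_langscripts.values() for lang in langs}
    )


mapping = {"en-en": ["eng-Latn"], "es-en": ["spa-Latn", "eng-Latn"]}
print(languages_for_subset(mapping, "default"))
```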

Closes #4437

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> (5fc2867)

Unknown

Remove skip_first for jamaltartistit (#4435) (16ba72a)
Fix: apply skip_first_result when computing hit_rate metric (#4427)
Fix skip_first_result not applied to hit_rate metric
lint

Co-authored-by: Rakshitha Ireddi <rakshithaireddi@Rakshithas-MacBook-Pro.local>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (8570b74)

[MVEB] Adding MUSIC-AVQA Task (Clustering) (#4426)
[MVEB] Adding MUSIC-AVQA Task (Clustering)
simplify description (04f2f4b)
[MVEB] Add Breakfast video classification task (#4431)
[MVEB] Add Breakfast video classification task

Add BreakfastClassification task for the Breakfast Actions dataset (Kuehne et al., CVPR 2014). The dataset contains 433 videos of 10 breakfast-related activities recorded in 18 kitchens. Uses 5-fold cross-validation since the dataset only has a test split.

Random baseline accuracy: 0.1247 (near-random for 10 classes).

Addresses part of #4130 (MVEB Overview - Classification).
lint

Co-authored-by: Yashwanth Devavarapu <yashwanthdevavarapu@Yashwanths-MacBook-Pro.local>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (98d682c)

Add ActivityNet_Captions_val2 video retrieval tasks (V2T and T2V) (#4429)
Add ActivityNet_Captions_val2 video retrieval tasks (V2T and T2V)

Made-with: Cursor

Fix all sort order for isort-style linting (RUF033)

Made-with: Cursor

Update mteb/tasks/retrieval/eng/activitynet_captions_t2v_retrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

Consolidate ActivityNet Captions retrieval tasks into a single file

Made-with: Cursor

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> (7539339)

add mteb/DiDeMo dataset (#4425)
add mteb/MSVD dataset
add mteb/DiDeMo dataset
add comb
update
Consolidate DiDeMo retrieval tasks into a single file

Merge 4 separate DiDeMo task files into didemo_retrieval.py with a
shared _load_didemo helper, reducing duplication while preserving
all task names and metadata.

Made-with: Cursor

Remove unused Dataset import from didemo_retrieval

Made-with: Cursor (2e00e5b)

add mteb/TUNA-Bench_1K dataset (#4428)

Add TUNA-Bench_1K video retrieval tasks (V2T and T2V)

Made-with: Cursor (b14dcc5)

Update vllm_wrapper.py (#4418)

fix compatibility with newer vllm versions (5ef64c3)

18 Apr 10:49

KennethEnevoldsen

2.12.21

dfb5eb1

2.12.21 (2026-04-18)

Ci

ci: add workflow to auto-update leaderboard model list (#4402)

Adds a standalone script that generates the model list from scratch
and a CI workflow that pushes it to the HF leaderboard space weekly,
on model file changes, or via manual dispatch.

Closes #4316 (18e8e63)

Fix

fix: Add required_dependencies to model meta (#4356)
add required_dependencies to model meta
add extra group name
add to model to python
update handling dependencies
fix deps
fix test
remove usage of requires_package
remove image/audio dependencies
fixes after merge
add deprecated function
fix test
skip check for baseline
fix test
update lock
optionally check torchaudio in test (e2e7174)

Unknown

Remove video folder (#4424)

remove video folder (011bbf5)

add mteb/MSVD dataset (#4413) (6427ea5)
Update dataset cardv2 (#4420)
update dataset card
fix cardv2 (a91046e)
Update dataset card (#4419)

update dataset card (43d1b21)

tests: Add test to ensure coverage of reference models (#4216)
Reference models tests
fix: address PR review comments for reference model tests

Use cache.load_results() instead of manually walking cache directories
Dynamically compute target benchmarks from all leaderboard benchmarks
minus an exclusion list, so new benchmarks are automatically tested
Add text-only modality check for task-model compatibility
Filter retrieval-only models by task type AND text modalities

fix: use isinstance check for retrieval subtypes

Check isinstance(task, AbsTaskRetrieval) instead of string comparison
with task.metadata.type, so reranking and instruction retrieval tasks
are correctly included for retrieval-only models like bm25s.
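
The difference between the two checks can be illustrated with stand-in classes. These are not mteb's classes: the names ending in `StandIn` and the `task_type` attribute are hypothetical, and the sketch assumes that reranking tasks subclass the retrieval base class, as the commit message implies.

```python
class AbsTaskRetrievalStandIn:
    task_type = "Retrieval"


class RerankingTaskStandIn(AbsTaskRetrievalStandIn):
    # A retrieval subtype that reports a different type string.
    task_type = "Reranking"


class ClassificationTaskStandIn:
    task_type = "Classification"


def is_retrieval_by_string(task):
    # Misses subtypes whose type string is not exactly "Retrieval".
    return task.task_type == "Retrieval"


def is_retrieval_by_isinstance(task):
    # Includes every subclass of the retrieval base class.
    return isinstance(task, AbsTaskRetrievalStandIn)


reranking = RerankingTaskStandIn()
print(is_retrieval_by_string(reranking), is_retrieval_by_isinstance(reranking))
```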

fix: handle empty sim_scores in confidence_scores

Return zero confidence scores when sim_scores list is empty,
which can happen when BM25 returns no results for a query
in reranking tasks.

fix: address PR review comments for reference model tests

Remove RTEB variant exclusions to test all RTEB benchmarks
(per Kenneth's feedback to include the full RTEB set)

fix: use benchmark_selector.py as source of truth for leaderboard benchmarks

Address Kenneth's review comments:

Use GP_BENCHMARK_ENTRIES + R_BENCHMARK_ENTRIES from benchmark_selector.py
instead of display_on_leaderboard flag (which includes benchmarks not
actually shown on the leaderboard)
Clean up EXCLUDED_BENCHMARKS to only contain actual leaderboard benchmarks
(multimodal ones that text-only reference models can't run)
Remove RTEB variant exclusions to test the full RTEB set

fix: remove all benchmark exclusions, rely on task-level filtering

Task-level filtering (_is_text_only_task, RETRIEVAL_ONLY_MODELS) already
handles model-task compatibility. No need to exclude entire benchmarks —
non-text tasks within multimodal benchmarks are skipped automatically.

fix: use display_on_leaderboard flag now that PR #4288 is merged

Simplify _get_target_benchmarks to use display_on_leaderboard=True,
which now correctly reflects the actual leaderboard (fixed in #4288).
Remove benchmark_selector imports and exclusion list — task-level
filtering handles model-task compatibility.

fix: pass Benchmark objects directly instead of names

Address Samoed's review: use Benchmark objects in parametrize
instead of looking up by name twice.

speedup test
fix issue with aggregate
fix: address review - reuse _check_model_modalities, trim workflow triggers
fix: restore TARGET_BENCHMARKS definition, remove stale _get_target_benchmarks call
fix: inline modality check to avoid private import, filter image-only tasks
fix: use strict modality subset check to exclude image/multimodal tasks
fix: restore RETRIEVAL_ONLY_MODELS for BM25 task filtering
fix: add mteb/benchmarks/** to workflow triggers

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (4a90b28)

[MVEB] Adding WorldSense1Min Task (Clustering) (#4393)
[MVEB] Adding WorldSense1Min Task (Clustering)
remove local test
Update mteb/tasks/init.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

removing stats
moving video clustering tasks to clustering
uncomment Video task
add results
update license
remove results

Co-authored-by: wissam-KH <wissam.siblini@komodohealth.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> (f46cb7b)

[MVEB] Adding AVE-Dataset Task (Clustering) (#4416)
[MVEB] Adding AVE-Dataset Task (Clustering)
uncomment video clustering task
remove results (61e7f3f)
tests: add regression test for double loading (#4407)

add regression test (e946e1e)

add HMDB51 dataset (#4398)
add HMDB51 dataset
update
Update mteb/abstasks/task_metadata.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

Update mteb/tasks/classification/eng/hmdb51_classification.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

fix lint

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> (22bc680)

16 Apr 14:46

KennethEnevoldsen

2.12.20

d340384

2.12.20 (2026-04-16)

Fix

fix: handle Transformers v5 BaseModelOutputWithPooling return types i… (#4328)
fix: handle Transformers v5 BaseModelOutputWithPooling return types

Transformers v5 changed get_text_features, get_image_features, and
get_audio_features to return BaseModelOutputWithPooling instead of
plain tensors. This caused AttributeError when tensor operations
like .norm() were applied directly to the output.

Added isinstance(output, BaseModelOutputWithPooling) checks to
extract pooler_output when needed, maintaining backward compatibility
with Transformers v4 tensor returns.

Affected model wrappers:

clap_models.py: text path (audio path already handled)
align_models.py: text and image paths
wav2clip_model.py: text path (CLIP encoder)
llm2clip_models.py: text and image paths
siglip_models.py: text and image paths (previously accessed
.pooler_output directly without fallback)
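
The compatibility check described above follows a simple pattern, sketched here without importing Transformers or PyTorch so that it runs anywhere. `PooledOutputStandIn` is a hypothetical stand-in for `BaseModelOutputWithPooling`, plain lists stand in for tensors, and `extract_embeddings` is an illustrative name rather than a function from the wrappers.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class PooledOutputStandIn:
    """Stand-in for BaseModelOutputWithPooling (illustration only)."""
    pooler_output: Any
    last_hidden_state: Any = None


def extract_embeddings(output, pooled_type=PooledOutputStandIn):
    """Return the pooled embedding whether output is wrapped or plain."""
    if isinstance(output, pooled_type):
        # v5-style return value: unwrap the pooled embedding.
        return output.pooler_output
    # v4-style return value: already a plain tensor.
    return output


print(extract_embeddings([0.1, 0.2]))
print(extract_embeddings(PooledOutputStandIn(pooler_output=[0.3, 0.4])))
```

In a real wrapper the `pooled_type` argument would presumably be the Transformers class itself, and the unwrapped value would then be safe to pass to tensor operations such as `.norm()`.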

Closes #4081

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

lint

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (367d554)

fix: double retrieval dataset loading (#4399)

fix retrieval dataset loading (316fca3)

16 Apr 12:26

KennethEnevoldsen

2.12.19

ebb85cf

2.12.19 (2026-04-16)

Documentation

docs: Update adding dataset checklist (#4394)
docs: Update adding dataset checklist

fix the checklist to make it less text-specific

docs: Add score reproduction to PR requirements (#4396)

add score reproduction to description

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> (dc58c76)

Fix

fix: Auto add base model to ModelMeta (#4395)
fetch source model from hub
fix tests
check if model has model card attr (ed1833a)

Unknown

model: Add Google Gemini embedding 2 (#4247)
Adding Google Gemini embedding 2 model
feat: add per-task prompt mapping and multimodal support for Gemini Embedding 2

Create GEMINI_EMBEDDING_2_PROMPTS dict with 132 per-task Google API
task type mappings from issue #4260
Add GoogleGeminiEmbeddingModel class using google-genai SDK with
support for text, image, and interleaved text+image inputs
Update ModelMeta to use new class, set modalities=["image", "text"]
Add google_genai optional dependency to pyproject.toml

feat: add audio modality support for Gemini Embedding 2

Add _audio_to_wav_bytes helper to convert numpy audio arrays to WAV
Handle audio inputs in encode() via Part.from_bytes with audio/wav MIME
Update modalities to ["audio", "image", "text"]
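
Converting raw samples to WAV bytes can be done with the standard library alone, as in the sketch below. It is not the `_audio_to_wav_bytes` helper from this change: it takes a plain sequence of floats in [-1, 1] instead of a numpy array, and the mono 16-bit format and the 16 kHz default are assumptions.

```python
import io
import struct
import wave


def audio_to_wav_bytes(samples, sample_rate=16000):
    """Encode float samples in [-1, 1] as mono 16-bit PCM WAV bytes."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buffer.getvalue()


wav_bytes = audio_to_wav_bytes([0.0, 0.5, -0.5])
print(len(wav_bytes))
```

The resulting bytes are what would then be handed to the API together with the `audio/wav` MIME type.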

fix: strip google/ prefix from model name for Gemini API

The google-genai SDK's embed_content doesn't handle the "google/"
prefix format. Strip it in the constructor like Voyage does.

fix: add exponential backoff retry for 429 rate limits

Retry up to 10 times with exponential backoff (60s, 120s, 240s...
up to 600s) when hitting API quota limits. Essential for large
multilingual benchmarks like MIRACL.
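
The retry schedule described above (60s, 120s, 240s, capped at 600s, up to 10 retries) can be sketched as follows. `RateLimitError`, `backoff_delays`, and `call_with_backoff` are hypothetical names for this example; the real wrapper would catch whatever exception the SDK raises for a 429 response. The `sleep` argument is injectable so the schedule can be exercised without waiting.

```python
import time


class RateLimitError(Exception):
    """Placeholder for the API's 429 quota error (assumption)."""


def backoff_delays(max_retries=10, base=60.0, cap=600.0):
    """Delays in seconds: the base doubles on each retry until it hits the cap."""
    return [min(base * 2 ** attempt, cap) for attempt in range(max_retries)]


def call_with_backoff(fn, max_retries=10, base=60.0, cap=600.0, sleep=time.sleep):
    """Call fn, sleeping and retrying whenever it raises RateLimitError."""
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return fn()
        except RateLimitError:
            sleep(delay)
    # Final attempt: if this one fails too, let the error propagate.
    return fn()


print(backoff_delays()[:5])
```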

refactor: address PR review comments

Replace 132-entry per-task dict with task-type defaults + 62
per-task overrides (KennethEnevoldsen: use metadata)
Add embed_dim parameter to GoogleGeminiEmbeddingModel (Samoed)
Add title formatting for retrieval corpus docs (Samoed)
Add batch size comment referencing API limits (Samoed)
Simplify encode() control flow
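
The shift from one entry per task to task-type defaults plus overrides amounts to a two-level lookup, sketched below. Every entry in both dictionaries, including the task name `ExampleTask`, is illustrative and not copied from the actual mapping, and `resolve_api_task_type` is a hypothetical name.

```python
# Illustrative defaults keyed by task type (not the real table).
TASK_TYPE_DEFAULTS = {
    "Classification": "CLASSIFICATION",
    "Clustering": "CLUSTERING",
    "STS": "SEMANTIC_SIMILARITY",
}

# Illustrative per-task override (hypothetical task name).
PER_TASK_OVERRIDES = {
    "ExampleTask": "QUESTION_ANSWERING",
}


def resolve_api_task_type(task_name, task_type,
                          defaults=TASK_TYPE_DEFAULTS,
                          overrides=PER_TASK_OVERRIDES):
    """Prefer a per-task override, else fall back to the task-type default."""
    if task_name in overrides:
        return overrides[task_name]
    return defaults.get(task_type)  # None when no default is defined


print(resolve_api_task_type("ExampleTask", "Classification"))
print(resolve_api_task_type("SomeOtherTask", "Classification"))
```

Keeping only the exceptions in the override table is what lets the dictionary shrink while most tasks pick up their setting from metadata.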

fix: replace print with logger.warning for lint compliance
fix: handle audio+text interleaved input and note MRL embed_dim support

Add audio+text branch in encode() for interleaved content
Note embed_dim supports [768, 1536, 3072] once PR #4170 is merged

fix: use MRL embed_dim list and remove duplicate logger
fix unused param

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> (633e41c)

Move Kinetics400 out of video and add zeroshot version (#4383)
distinguish between AV and V tasks
move out of video folder and add zeroshot version
fix task type
update task metadata based on discussions
fix mveb task type mapping
fix: Add is_beta to task metadata (#4392)
fix: Add is_beta to task metadata

Added is_beta to the task metadata
Added a warning on initializing a dataset when is_beta is True
Added exclude_beta to get_tasks and filter_tasks, for now I set it to False
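
How a beta flag and an `exclude_beta` filter might interact is sketched below. `TaskStandIn`, `select_tasks`, and `load_task` are hypothetical names for the example and do not reproduce the signatures of `get_tasks` or `filter_tasks`; only the default of `exclude_beta=False` is taken from the notes above.

```python
import warnings
from dataclasses import dataclass


@dataclass
class TaskStandIn:
    name: str
    is_beta: bool = False


def select_tasks(tasks, exclude_beta=False):
    """Keep all tasks by default; drop beta tasks only when asked to."""
    return [t for t in tasks if not (exclude_beta and t.is_beta)]


def load_task(task):
    """Warn when a beta task is initialised, then return it."""
    if task.is_beta:
        warnings.warn(f"{task.name} is a beta task and may change.", UserWarning)
    return task


tasks = [TaskStandIn("StableTask"), TaskStandIn("NewVideoTask", is_beta=True)]
print([t.name for t in select_tasks(tasks, exclude_beta=True)])
```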

todo:

add tests

add test and updates metadata
format
re-enable tests for beta datasets
format
feat: comment out MVEB task types without existing tasks

VideoClustering, VideoPairClassification, and VideoCentricQA are defined
in task_metadata but have no corresponding task implementations yet,
causing create_available_tasks.py to fail. Comment them out until tasks
are added. Also regenerate available_tasks docs and add qwen_omni_utils
optional dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: allow beta-only task types in create_available_tasks.py

Change assertion to <= so task types that only have beta tasks don't
break the docs generation. Use .get() with continue to skip task types
with no non-beta tasks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: skip video tasks missing descriptive_stats in metadata test

Skip Kinetics400 video tasks in test_all_metadata_is_filled_and_valid
until descriptive stats are added. Regenerate available_tasks docs.

revert: restore docs/overview/available_tasks to main
revert: remove all generated available_tasks changes from branch

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> (6ec3f40)

Add nicher92/saga-embed_v1 to MTEB models (#4371)
Add nicher92/saga-embed_v1 to MTEB models
Update training_datasets in ModelMeta
fix: fixed naming
Replace custom SagaModel class with standard SentenceTransformerEncoderWrapper and model_prompts dict
chore: remove lingering comment
Update mteb/models/model_implementations/saga_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

update meta
change parameters and memory usage

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (d7c521c)

16 Apr 09:54

KennethEnevoldsen

2.12.18

5af2aa0

2.12.18 (2026-04-16)

Fix

fix: Handle quantization in sentence transformers as an experiment (#4367)
handle sbert quants
move inside model encode
update type of prompt type
fix tests (e45cbaa)

16 Apr 08:44

KennethEnevoldsen

2.12.17

8c11692

2.12.17 (2026-04-16)

Fix

fix: Corrected incorrect model rename (#4391)

This gave the following incorrect warning:

DeprecationWarning: The model 'mteb/baseline-random-encoder' has been renamed to 'mteb/baseline-random-encoder'. To prevent this warning use the new name.
  model = mteb.get_model_meta("mteb/baseline-random-encoder") ([`b65730d`](https://github.com/embeddings-benchmark/mteb/commit/b65730d833b3e321be759b243f62090066faad45))

## Unknown

* model: add BidirLM text embedding family (270M, 0.6B, 1B, 1.7B) (#4374)

* model: add BidirLM text embedding family (270M, 0.6B, 1B, 1.7B)

* Apply suggestions from code review

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* run lint

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> ([`e8a4069`](https://github.com/embeddings-benchmark/mteb/commit/e8a40693b00005cb0362bd6c5798d61287a196f3))

* [MVEB] PE-AV Model, Kinetics400 Dataset, RavdessAV Dataset (#4199)

* fix: Reclassify SIBFLEURS as AudioClassification instead of AudioMultilabelClassification

* fix: Move SIBFLEURS descriptive stats to AudioClassification

* Adding video modality

* Add Kinetics-400 dataset

* Add pe_av model

* fix typo

* fix collator bug

* Edit selecting column in classification abstask

* Properly handle frames in PE_AV

* add self kwarg to method

* Add audio collator

* fix type error

* fix audio_video embeds object handling

* Add Ravdess_av clustering

* fix task metadata

* start video integration

* start video integration

* upd task structure

* upd video input type

* combine video and audio to dict

* fix task side

* fix pe_av model

* lower writer batch size

* fix col labels

* lint

* add pe_av model metadata

* fix datasets metadata

* remove accidentally committed files

* remove nested list structure from datasets

* edit collator to handle one video item

* multimodal collator + fix comments

* lint

* metadata update

* using forward pass to get embeds

* replace forward pass + add audio to msrvtt

* fix category metadata

* edit get embeddings

* add n_embedding_parameters

* change input col name to list

* lint + type check

* add classvar

* add str to classvar

* Change list to sequence

* lint + type check error

* edit dataloader and msrvtt handling of input column

* move sequence out of type checking

* fix random baseline

* add collator to random baseline

* restore previous dict structure + make audio optional

* clean structure

* lint

* safety check

* decrease writer batch size

* match msrvtt format

* type check fix

* refactor: keep video and audio as separate dataset columns

* fix: handle single-string input_column correctly in _prepare_dataset

* review fixes

* lint

* type hints fix

* address review: simplify input_column_name, remove VideoInputItem, fix collator output

- Revert input_column_name from Mapping[str, str] to str | Sequence[str]
- Remove VideoInputItem wrapper, pass frames tensor directly
- Make VideoCollator return BatchedInput (consistent with AudioCollator)
- MultimodalCollator uses static methods instead of chaining collators

* fix: update clustering_evaluator to use Sequence instead of Mapping

* fix: handle Sequence input_column_name in second create_dataloader call

* fix: skip statistics and text cleaning for multi-column video tasks

* fix: pass explicit None for TypedDict fields in multi-column statistics

* address Kenneth review: rename collators, update docs, simplify annotations

- Rename VideoCollator -> FramesCollator, MultimodalCollator -> VideoCollator
- Update VideoInput docstring to clarify frames-only, audio in AudioInput
- Update input_column_name docs in classification/clustering base classes
- Use ClassVar[Sequence[str]] for video task input_column_name
- Extract isinstance check to top of zeroshot evaluator __call__
- Improve task_pipelines.py skip comment for multi-column tasks
- Add TODO for MSR-VTT dataset reupload

* docs: link to encoder I/O types for default column names in input_column_name

* fix: raise NotImplementedError for multi-column task cleaning

* refactor: use tuples for input_column_name to avoid ClassVar

* refactor: move Sequence handling into create_dataloader, simplify callers

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> ([`5d3c845`](https://github.com/embeddings-benchmark/mteb/commit/5d3c8453db615a1a4ae4d53033711af21c1d502e))

* dataset: add BrowseComp-Plus (#4226)

* dataset: Add BrowseComp-Plus

* fix linting errors

* fixing bibtex formatting

* Split BrowseCompPlusRetrieval into gold_only and gold_and_evidence subsets

* fix: remove qa as a valid tag for metadata files

* simplify data loading by reuploading the data

---------

Co-authored-by: Kenneth <kennethenevoldsen@gmail.com> ([`e722b76`](https://github.com/embeddings-benchmark/mteb/commit/e722b7640ed1abee68c3df5023a186b36a15325f))

Releases: embeddings-benchmark/mteb
