model: Add tomoro-colqwen3-embed embedding models by 01234568 · Pull Request #3627 · embeddings-benchmark/mteb

01234568 · 2025-11-26T10:22:03Z

Add inference code and requirements for tomoro-colqwen3-embed 8B and 4B models.

They are based on merged Qwen3-VL-Instruct and Qwen3-Embedding checkpoints, fine-tuned using the ColQwen method. They produce one 320-dimensional embedding per text or image token. The fine-tuning data is a subset of the training data of nvidia/llama-nemoretriever-colembed-3b-v1.
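For readers unfamiliar with per-token (multi-vector) embeddings: ColQwen-style models are usually scored with ColBERT-style late interaction (MaxSim), where each query token vector is matched to its best document token vector and the maxima are summed. The sketch below illustrates that idea in plain Python, using toy 4-dimensional vectors rather than the 320 dimensions mentioned above. `maxsim_score` and `dot` are hypothetical helper names for illustration, not functions from this PR or from mteb, and the actual implementation may differ (for example in normalization or batching).

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late-interaction score (hypothetical helper).

    For each query token, take the best-matching document token
    (maximum dot product), then sum those maxima over all query tokens.
    """
    if not query_tokens or not doc_tokens:
        raise ValueError("both inputs need at least one token embedding")
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy data with illustrative values only: 2 query tokens, 4 dimensions each.
query = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
doc_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
doc_b = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

score_a = maxsim_score(query, doc_a)  # doc_a overlaps with both query tokens
score_b = maxsim_score(query, doc_b)  # doc_b is orthogonal to the query
```

Ranking documents by this score puts `doc_a` ahead of `doc_b`, since only `doc_a` has token vectors aligned with the query tokens.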

Model checkpoints available at:
https://huggingface.co/TomoroAI/tomoro-colqwen3-embed-4b
https://huggingface.co/TomoroAI/tomoro-colqwen3-embed-8b

Checklist:

- I have filled out the ModelMeta object to the extent possible
- I have ensured that my model can be loaded using
  - mteb.get_model(model_name, revision) and
  - mteb.get_model_meta(model_name, revision)
- I have tested the implementation works on a representative set of tasks.
- The model is public, i.e., it is available either as an API or the weights are publicly available to download

…milarity scoring
- Changed default dtype from float16 to bfloat16 for improved performance.
- Added max_num_visual_tokens parameter to AutoProcessor initialization.
- Refined embedding extraction logic to avoid boolean casting issues.
- Introduced support for score_multi_vector in similarity computation.
- Added new model metadata for colqwen3_4b with relevant attributes.

mteb/models/model_implementations/colqwen_models.py

Samoed

Great work!

01234568 · 2025-11-27T12:22:49Z

Updated the Hugging Face revision to include an updated processor that supports fused embeddings. It was tested locally.

KennethEnevoldsen

Great addition!

A few minor things on the metadata; otherwise, do see @Samoed's comment on get_fused_embeddings.

mteb/models/model_implementations/colqwen_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

01234568 · 2025-12-01T13:22:17Z

Any remaining items from my side?

Samoed · 2025-12-01T13:56:23Z

I don't think so

tankm and others added 6 commits November 26, 2025 01:42

feat(colqwen3): add wrapper and model metadata

27d0ca6

Merge branch 'embeddings-benchmark:main' into feat/tomoro-colqwen

93d3155

fix(colqwen): require transformers>=4.57 and refresh metadata, set revision

fbd961b

refactor(colqwen): reorder wrappers and metadata definitions for clarity

d05d89f

chore(colqwen): set release date for tomoro colqwen3 8b

37e6acd

Samoed reviewed Nov 26, 2025

mteb/models/model_implementations/colqwen_models.py (outdated, resolved)

mteb/models/model_implementations/colqwen_models.py (outdated, resolved)

Samoed reviewed Nov 26, 2025

mteb/models/model_implementations/colqwen_models.py (outdated, resolved)

mteb/models/model_implementations/colqwen_models.py (outdated, resolved)

chore(colqwen): remove unused methods and fix lint errors

2c51fa0

hxssgaa mentioned this pull request Nov 26, 2025

Add tomoro-colqwen3 results for Vidore v1-v3 embeddings-benchmark/results#332

Merged

5 tasks

Samoed added the "new model" label (Questions related to adding a new model to the benchmark) Nov 26, 2025

feat(colqwen3): add fused image-text encoding path

dd6da48

Samoed reviewed Nov 26, 2025

mteb/models/model_implementations/colqwen_models.py (outdated, resolved)

tankm and others added 3 commits November 27, 2025 19:31

refactor(colqwen): unify encode method with get_fused_embeddings

19ed0fa

Merge branch 'embeddings-benchmark:main' into feat/tomoro-colqwen

4abbfd8

chore(colqwen): update encoding progress message

411056f

Samoed approved these changes Nov 27, 2025

Samoed requested a review from KennethEnevoldsen November 27, 2025 12:02

chore(colqwen): update model revisions for colqwen models

3f0a246

KennethEnevoldsen reviewed Nov 28, 2025

mteb/models/model_implementations/colqwen_models.py (outdated, resolved)

Samoed reviewed Nov 29, 2025

mteb/models/model_implementations/colqwen_models.py (outdated, resolved)

01234568 and others added 2 commits November 30, 2025 09:50

docs(colqwen): update train data annotation

1cae25b

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

Merge branch 'embeddings-benchmark:main' into feat/tomoro-colqwen

21a3f4b

Samoed requested a review from KennethEnevoldsen December 1, 2025 13:55

KennethEnevoldsen approved these changes Dec 3, 2025

KennethEnevoldsen changed the title ~~Model: Add tomoro-colqwen3-embed embedding models~~ model: Add tomoro-colqwen3-embed embedding models Dec 3, 2025

KennethEnevoldsen merged commit 71ac96c into embeddings-benchmark:main Dec 3, 2025
9 of 10 checks passed

KennethEnevoldsen merged 14 commits into embeddings-benchmark:main from 01234568:feat/tomoro-colqwen

4 participants
