model: Add tomoro-colqwen3-embed embedding models #3627

Merged
KennethEnevoldsen merged 14 commits into embeddings-benchmark:main from 01234568:feat/tomoro-colqwen on Dec 3, 2025

01234568 (Contributor) commented Nov 26, 2025

Add inference code and requirements for the tomoro-colqwen3-embed 8B and 4B models.

Both models are built from merged Qwen3-VL-Instruct and Qwen3-Embedding checkpoints and fine-tuned with the ColQwen method. They produce one 320-dimensional embedding per text or image token. The fine-tuning data is a subset of the training data used for nvidia/llama-nemoretriever-colembed-3b-v1.
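
Because the models emit one vector per token instead of a single pooled vector, relevance between a query and a document is typically computed with ColBERT-style late interaction (MaxSim): each query token is matched to its best document token and the matches are summed. The following is a minimal sketch of that idea in plain Python with toy 3-dimensional vectors (the real embeddings are 320-dimensional). The helper names `dot` and `maxsim_score` are illustrative assumptions and are not taken from this PR.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token take the highest
    dot product against any document token, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy multi-vector embeddings: one vector per token.
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
doc_a = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
doc_b = [[0.0, 0.0, 1.0]]

print(maxsim_score(query, doc_a))  # 1.5
print(maxsim_score(query, doc_b))  # 0.0
```

Here `doc_a` ranks above `doc_b` because both query tokens find a partially or fully matching document token.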

Model checkpoints available at:
https://huggingface.co/TomoroAI/tomoro-colqwen3-embed-4b
https://huggingface.co/TomoroAI/tomoro-colqwen3-embed-8b

Checklist:

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.
  • The model is public, i.e. is available either as an API or the weights are publicly available to download
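
The loading checks in the checklist can be sketched roughly as follows. This assumes the models are registered in mteb under the same names as the Hugging Face repositories linked above, which this conversation does not state explicitly; `check_loading` and `MODEL_NAMES` are hypothetical names used only for illustration.

```python
# Assumption: registry names match the Hugging Face repository IDs.
MODEL_NAMES = [
    "TomoroAI/tomoro-colqwen3-embed-4b",
    "TomoroAI/tomoro-colqwen3-embed-8b",
]


def check_loading(model_name, revision=None, load_weights=False):
    """Fetch the ModelMeta entry and, optionally, instantiate the model."""
    import mteb  # imported lazily; requires mteb to be installed

    meta = mteb.get_model_meta(model_name, revision=revision)
    print(meta.name, meta.revision)
    if load_weights:
        # Downloads the checkpoint; needs enough memory for a 4B/8B model.
        return mteb.get_model(model_name, revision=revision)
    return meta
```

Calling `check_loading(MODEL_NAMES[0])` only resolves metadata; pass `load_weights=True` to exercise the full model loader.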

tankm and others added 6 commits November 26, 2025 01:42
…milarity scoring

- Changed default dtype from float16 to bfloat16 for improved performance.
- Added max_num_visual_tokens parameter to AutoProcessor initialization.
- Refined embedding extraction logic to avoid boolean casting issues.
- Introduced support for score_multi_vector in similarity computation.
- Added new model metadata for colqwen3_4b with relevant attributes.
Samoed added the "new model" label Nov 26, 2025
Samoed (Member) left a comment

Great work!

01234568 (Contributor, Author) commented

Updated the Hugging Face revision to one whose processor supports fused embeddings. It was tested locally.

KennethEnevoldsen (Contributor) left a comment

Great addition!

A few minor things on the metadata; otherwise, please see @Samoed's comment on get_fused_embeddings.

01234568 (Contributor, Author) commented Dec 1, 2025

Any remaining items from my side?

Samoed (Member) commented Dec 1, 2025

I don't think so

KennethEnevoldsen changed the title from "Model: Add tomoro-colqwen3-embed embedding models" to "model: Add tomoro-colqwen3-embed embedding models" Dec 3, 2025
KennethEnevoldsen merged commit 71ac96c into embeddings-benchmark:main Dec 3, 2025
9 of 10 checks passed
Labels

new model Questions related to adding a new model to the benchmark

4 participants
