
model: Add eager-embed embedding model #3602

Merged

Samoed merged 10 commits into embeddings-benchmark:main from eagerworks:feat/eagerembed on Nov 26, 2025

Conversation

@jpbalarini (Contributor) commented Nov 22, 2025

Add inference code for the eager-embed embedding model.
eager-embed-v1 is a multimodal dense embedding model with a 2560-dimensional embedding space, based on Qwen3-VL and fine-tuned on multiple public datasets.

More info here:
https://huggingface.co/eagerworks/eager-embed-v1
https://github.com/eagerworks/eager-embed

Checklist:

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.
  • The model is public, i.e. it is available either as an API or the weights are publicly available to download
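
For reference, a minimal sketch of the loading checks from the checklist above, assuming the ModelMeta added in this PR registers the model under its Hugging Face repo name (the exact registered name is an assumption, not confirmed by the PR text):

import mteb

# Both calls below are the checks named in the checklist; the model name
# "eagerworks/eager-embed-v1" is assumed from the linked Hugging Face repo.
model = mteb.get_model("eagerworks/eager-embed-v1")
meta = mteb.get_model_meta("eagerworks/eager-embed-v1")
print(meta.name, meta.revision)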

@KennethEnevoldsen (Contributor) left a comment

A few minor comments - otherwise the submission looks good

@KennethEnevoldsen (Contributor) left a comment

A few minor comments, otherwise this looks good

@Samoed (Member) commented

Also, it would probably be better to integrate your model with Sentence Transformers.

@jpbalarini (Contributor, Author) commented

Do you mean loading the model from sentence transformers instead of from transformers? What do I need to change?

@Samoed (Member) commented Nov 24, 2025

Yes. This can be complicated. You can see how this was done for other models, for example (a usage sketch follows the list below):

  1. mmE5: PR 1, PR 2
  2. Jasper https://huggingface.co/NovaSearch/jasper_en_vision_language_v1/tree/main
  3. gme-Qwen2 https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct/discussions/9
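
For illustration, once the Hugging Face repo ships a custom Sentence Transformers module (as the Jasper and gme-Qwen2 repos above do), usage could look roughly like the sketch below. This is a hedged example, not the model's documented API:

from sentence_transformers import SentenceTransformer

# Sketch only: assumes eager-embed-v1 publishes custom Sentence Transformers
# modules on the Hub, so trust_remote_code is required to load them.
model = SentenceTransformer("eagerworks/eager-embed-v1", trust_remote_code=True)
embeddings = model.encode(["Query: what does a multimodal embedding model do?"])
print(embeddings.shape)  # expected (1, 2560) if the 2560-dim output is kept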

@jpbalarini (Contributor, Author) commented

@KennethEnevoldsen @Samoed Thanks for your comments; the code is much cleaner now. I implemented most of them and left some questions. Thanks!

@Samoed added the "new model" label (Questions related to adding a new model to the benchmark) on Nov 24, 2025
@KennethEnevoldsen (Contributor) left a comment

I think this is good to merge - @Samoed do you have any remaining issues?

@Samoed (Member) commented Nov 25, 2025

@jpbalarini Did you try to encode images and texts together, without separating them by image/text modality?

@jpbalarini (Contributor, Author) commented Nov 25, 2025

@Samoed I did, but I was getting batch-related errors when running some tasks (specifically Vidore2ESGReportsHLRetrieval). I rolled back the combined-encoding changes just to check whether I would hit the same issue without them, and it's the same:

Traceback (most recent call last):
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/evaluate_mteb.py", line 73, in <module>
    evaluate_mteb_with_custom_model()
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/evaluate_mteb.py", line 54, in evaluate_mteb_with_custom_model
    results = mteb.evaluate(model=model, tasks=tasks, encode_kwargs={"batch_size": 8})
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/mteb/evaluate.py", line 377, in evaluate
    _res = evaluate(
           ^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/mteb/evaluate.py", line 473, in evaluate
    result = _evaluate_task(
             ^^^^^^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/mteb/evaluate.py", line 168, in _evaluate_task
    task_results[split] = task.evaluate(
                          ^^^^^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/mteb/abstasks/retrieval.py", line 310, in evaluate
    return super().evaluate(
           ^^^^^^^^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/mteb/abstasks/abstask.py", line 183, in evaluate
    scores[hf_subset] = self._evaluate_subset(
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/mteb/abstasks/retrieval.py", line 372, in _evaluate_subset
    results = retriever(
              ^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/mteb/_evaluators/retrieval_evaluator.py", line 62, in __call__
    return search_model.search(
           ^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/mteb/models/search_wrappers.py", line 96, in search
    query_embeddings = self.model.encode(
                       ^^^^^^^^^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/mteb_wrapper.py", line 88, in encode
    text_embeddings = self.get_text_embeddings(inputs, prompt_type=prompt_type, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/mteb_wrapper.py", line 182, in get_text_embeddings
    for batch in tqdm(inputs, desc="Encoding texts"):
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
               ^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 701, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 757, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
    return self.collate_fn(data)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/torch/utils/data/_utils/collate.py", line 398, in default_collate
    return collate(batch, collate_fn_map=default_collate_fn_map)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/torch/utils/data/_utils/collate.py", line 172, in collate
    key: collate(
         ^^^^^^^^
  File "/mnt/data/QWEN_EMBEDDINGS/eager-embed-v1/.venv/lib/python3.12/site-packages/torch/utils/data/_utils/collate.py", line 207, in collate
    raise RuntimeError("each element in list of batch should be of equal size")
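
For context, this RuntimeError is raised by PyTorch's default collate function when dict fields within a batch contain lists of different lengths. A minimal, illustrative repro (not the PR's code):

from torch.utils.data import default_collate

# Two dict samples whose "image" lists differ in length; default_collate
# raises the same RuntimeError seen in the traceback above.
default_collate([
    {"image": [1, 2]},
    {"image": [1, 2, 3]},
])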

And then I remembered why I added the modality separation in the first place (it was because of this bug): #3602 (comment)

Just in case, here's my attempt at the unified embeddings method (I get the same error as above when running the benchmark):

def encode(
        self,
        inputs: DataLoader[BatchedInput],
        *,
        task_metadata: TaskMetadata,
        hf_split: str,
        hf_subset: str,
        prompt_type: PromptType | None = None,
        **kwargs: Any,
    ) -> Array:
        """Encode inputs (text and/or images) into embeddings."""
        from qwen_vl_utils import process_vision_info

        all_embeddings: list[torch.Tensor] = []

        with torch.no_grad():
            for batch in tqdm(inputs, desc="Encoding"):
                batch_texts = batch.get("text", [])
                batch_images = batch.get("image", [])
                
                messages = []
                for i in range(max(len(batch_texts), len(batch_images))):
                    text_content = batch_texts[i] if batch_texts else ""
                    image_content = batch_images[i] if batch_images else None
                    
                    query_prefix = ('Query: ' if prompt_type == PromptType.query else '')
                    
                    content = [
                        {
                            'type': 'text',
                            'text': f'{query_prefix}{text_content}'
                        }
                    ]
                    
                    if image_content is not None:
                        content.append({
                            'type': 'image',
                            'image': image_content,
                            'resized_height': self.image_size,
                            'resized_width': self.image_size
                        })

                    messages.append([{
                        'role': 'user',
                        'content': content
                    }])

                # Prepare inputs
                texts = [
                    self.processor.apply_chat_template(
                        msg, tokenize=False, add_generation_prompt=False
                    ) + "<|endoftext|>"
                    for msg in messages
                ]

                image_inputs = None
                video_inputs = None
                if batch_images:
                    image_inputs, video_inputs = process_vision_info(messages)

                model_inputs = self.processor(
                    text=texts,
                    images=image_inputs,
                    videos=video_inputs,
                    padding='longest',
                    return_tensors='pt'
                ).to(self.device)

                # Get embeddings
                output = self.mdl(**model_inputs, return_dict=True, output_hidden_states=True)
                embeddings = self.get_embedding(output.hidden_states[-1])
                embeddings = embeddings.cpu().to(torch.float32)
                embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=-1)
                
                all_embeddings.append(embeddings)

        # Concatenate all embeddings
        return torch.cat(all_embeddings, dim=0)

I assume I must be doing something wrong with how I handle the tensors, but I've been debugging this for several hours with no luck so far.

@Samoed mentioned this pull request Nov 25, 2025
@Samoed (Member) commented Nov 25, 2025

@jpbalarini I've added a fix #3618. Thank you for reporting!

@jpbalarini (Contributor, Author) commented

@jpbalarini I've added a fix #3618. Thank you for reporting!

You're welcome! Let me add the latest changes and rerun the benchmark to verify that everything works as expected.

@Samoed (Member) commented Nov 26, 2025

@jpbalarini Is this the final version? Have you submitted all the results with texts and images processed together?

@jpbalarini (Contributor, Author) commented

@jpbalarini Is this the final version? Have you submitted all the results with texts and images processed together?

Yes @Samoed, I updated the new results here (and I added ViDoRe v3 too).
It should be ready to merge! Thanks for all the comments.

@Samoed (Member) commented Nov 26, 2025

Great work!

@Samoed enabled auto-merge (squash) on November 26, 2025, 20:16
@Samoed merged commit 7e2fa98 into embeddings-benchmark:main on Nov 26, 2025
11 checks passed