[Bug] SentenceTransformerReranker VRAM leak #6112

@kuopching

Description

I'm seeing a VRAM leak when using SentenceTransformerReranker.
The _rerank method creates a new CrossEncoder on every call. The instance should be created once and reused.
Something like:

if self.sentence_transformer_client is None:
    self.sentence_transformer_client = CrossEncoder(model_name_or_path=self.model, model_kwargs=self.model_kwargs)
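
To illustrate the idea without loading a real model, here is a minimal sketch of the lazy-initialization pattern. StubCrossEncoder and CachedReranker are hypothetical names for illustration only: the stub stands in for sentence_transformers.CrossEncoder and just counts how often it is constructed, and this is not the actual Agno implementation.

```python
class StubCrossEncoder:
    """Hypothetical stand-in for sentence_transformers.CrossEncoder."""

    instances = 0  # counts how many times a "model" was constructed

    def __init__(self, model_name_or_path, model_kwargs=None):
        StubCrossEncoder.instances += 1
        self.model_name_or_path = model_name_or_path
        self.model_kwargs = model_kwargs

    def predict(self, pairs):
        # Dummy relevance score: the length of the document text.
        return [float(len(doc)) for _query, doc in pairs]


class CachedReranker:
    """Hypothetical reranker that builds its client once and reuses it."""

    def __init__(self, model="stub-model", model_kwargs=None):
        self.model = model
        self.model_kwargs = model_kwargs
        self.sentence_transformer_client = None

    def _get_client(self):
        # Create the client only on first use, then return the cached one.
        if self.sentence_transformer_client is None:
            self.sentence_transformer_client = StubCrossEncoder(
                model_name_or_path=self.model, model_kwargs=self.model_kwargs
            )
        return self.sentence_transformer_client

    def rerank(self, query, documents):
        client = self._get_client()
        scores = client.predict([(query, doc) for doc in documents])
        ranked = sorted(zip(documents, scores), key=lambda p: p[1], reverse=True)
        return [doc for doc, _score in ranked]


reranker = CachedReranker()
for _ in range(3):
    reranker.rerank("query", ["a", "bbb", "cc"])

# The stub is constructed once, no matter how many times rerank() runs.
print(StubCrossEncoder.instances)
```

With the current behavior the counter would go up on every call. With the cached client it stays at one.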

Thank you

Steps to Reproduce

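# Import paths assumed for Agno 2.x
from agno.knowledge.embedder.ollama import OllamaEmbedder
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.reranker.sentence_transformer import SentenceTransformerReranker
from agno.vectordb.lancedb import LanceDb, SearchType

knowledge = Knowledge(
    max_results=5,
    vector_db=LanceDb(
        uri="tmp/kbase",
        table_name="PP_SMIB",
        search_type=SearchType.hybrid,
        embedder=OllamaEmbedder(id="snowflake-arctic-embed2:latest", dimensions=1024),
        reranker=SentenceTransformerReranker(),
    ),
)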

Agent Configuration (if applicable)

from textwrap import dedent

from agno.agent import Agent
from agno.models.vllm import VLLM

agent = Agent(
    model=VLLM(
        id="Qwen/Qwen3-4B-Instruct-2507",
        top_k=16,
        enable_thinking=False,
        base_url="http://localhost:9999/v1",
    ),
    knowledge=knowledge,
    search_knowledge=True,
    add_knowledge_to_context=True,
    instructions=dedent("""
        You are a knowledge retrieval specialist. Your responsibilities include:

        1. Thoroughly understanding the user’s query;
        2. Retrieving the most relevant information from the knowledge base;
        3. Evaluating whether the retrieved results are pertinent to the query;
        4. Returning the results in a structured format, including content and source information.
    """),
    debug_mode=False,
    markdown=True,
)

Expected Behavior

No VRAM leak: the reranker model should be loaded once and reused across calls.

Actual Behavior

VRAM leaks, since a new CrossEncoder is created on every _rerank call.
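
One way to observe this, assuming PyTorch with CUDA is available in the same process as the reranker, is to log the allocated VRAM between searches. This is only a sketch. vram_allocated_mb is a hypothetical helper name, and it returns None when PyTorch or a CUDA device is not available.

```python
def vram_allocated_mb():
    """Return VRAM currently allocated by PyTorch in MiB, or None if unavailable."""
    try:
        import torch  # assumed to be installed alongside sentence-transformers
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # memory_allocated() reports bytes held by tensors on the current device.
    return torch.cuda.memory_allocated() / (1024 ** 2)


# Call this before and after each knowledge search and compare the values.
print("VRAM allocated (MiB):", vram_allocated_mb())
```

If the value keeps climbing across repeated searches, that is consistent with the model being loaded again each time.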

Environment

- OS: Windows 11, WSL2
- Agno version: 2.4.0
- Python: 3.11.12

Labels: bug, stale
