model: support for LlamaBidirectionalModel architecture #18220
Conversation
Review comment:

> Instead of duplicating the model code, perhaps just template it as for non-SWA/SWA models, just with/without cache and output.

Diff context: `raise ValueError(f"Unprocessed experts: {experts}")`
Review comment on `@ModelBase.register("LlamaBidirectionalModel")`:
> Have you tested if adding LlamaBidirectionalForSequenceClassification too just works?
Reply:

> I have not tested it, but I have had a look at nvidia/llama-nemotron-rerank-1b-v2. I think the classifier tensor (score.weight) needs to be added for it to work.
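For illustration only, here is a rough sketch of the kind of extra tensor mapping that classifier head would require. The GGUF-side name `cls.output.weight` and the helper function are assumptions, not taken from this PR's diff:

```python
# Hypothetical sketch: mapping the HF classifier tensor to a GGUF tensor name.
# The target name "cls.output.weight" is an assumption for illustration; the
# real mapping lives in llama.cpp's conversion tooling.

# Extra mapping a LlamaBidirectionalForSequenceClassification converter
# might need on top of the plain embedding model.
CLASSIFIER_TENSOR_MAP = {
    "score.weight": "cls.output.weight",
}

def map_tensor_name(hf_name: str) -> str:
    """Return the GGUF tensor name for an HF tensor, if a mapping exists."""
    if hf_name in CLASSIFIER_TENSOR_MAP:
        return CLASSIFIER_TENSOR_MAP[hf_name]
    raise KeyError(f"no mapping for tensor {hf_name!r}")

print(map_tensor_name("score.weight"))  # -> cls.output.weight
```

Without such a mapping, conversion would fail (or silently drop the head) when it encounters the unrecognized `score.weight` tensor.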
This PR adds support for the LlamaBidirectionalModel architecture, resolving feature request #17478. The implementation enables bidirectional LLaMA embedding models such as:

- nvidia/llama-embed-nemotron-8b
- nvidia/llama-nemotron-embed-1b-v2

while keeping changes minimal by reusing/copying the existing LLAMA implementation wherever possible.
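For context, the defining difference in a bidirectional LLaMA variant is the attention mask: a causal model hides future positions from each token, while the bidirectional variant lets every token attend to every other token, as embedding models typically require. A minimal sketch of the two masks:

```python
# Sketch of the core architectural difference between causal and
# bidirectional attention (illustration only, not llama.cpp code).

def causal_mask(n: int) -> list[list[int]]:
    """1 where query position i may attend to key position j (j <= i)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n: int) -> list[list[int]]:
    """All-ones mask: every token attends to every token."""
    return [[1] * n for _ in range(n)]

print(causal_mask(3))         # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(bidirectional_mask(3))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```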
Key Points

- LlamaBidirectionalModel

Validation

- nvidia/llama-embed-nemotron-8b
- nvidia/llama-nemotron-embed-1b-v2

GGUF Models
- sabafallah/llama-nemotron-embed-1b-v2-GGUF
- sabafallah/llama-embed-nemotron-8b-GGUF
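These embedding GGUFs reduce per-token hidden states to a single vector via pooling. Assuming mean pooling (an assumption; the actual pooling type is model-dependent and recorded in the GGUF metadata), the reduction looks like:

```python
# Sketch of mean pooling over a sequence of per-token hidden states
# (illustration only; the pooling type used by a given model may differ).

def mean_pool(hidden: list[list[float]]) -> list[float]:
    """Average a [seq_len x dim] matrix over the sequence dimension."""
    seq_len = len(hidden)
    dim = len(hidden[0])
    return [sum(row[d] for row in hidden) / seq_len for d in range(dim)]

print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```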
Tests

- llama-nemotron-embed-1b-v2