
Conversation

@arnavgarg1 (Contributor)

This PR updates the README to add another usage option for EETQ through a new open-source framework called LoRAX, which allows users to serve thousands of fine-tuned models on a single GPU.

LoRAX: https://predibase.github.io/lorax/

Docs in LoRAX for EETQ: https://predibase.github.io/lorax/guides/quantization/#eetq

PR to add EETQ support into LoRAX: predibase/lorax@1be43cf
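
To give a feel for what this looks like end to end, here is a minimal sketch of querying a LoRAX deployment whose base model was loaded with EETQ quantization (per the quantization guide linked above). The host/port, adapter name, and prompt are placeholders, and the request shape assumes LoRAX's TGI-style `/generate` REST endpoint; treat it as an illustration rather than the definitive client code.

```python
# Minimal sketch: send a prompt to a running LoRAX server whose base model
# was started with EETQ quantization enabled (see the guide linked above).
# Assumptions: the server listens on http://localhost:8080, and
# "my-org/my-lora-adapter" is a hypothetical LoRA adapter the server can load.
import requests

payload = {
    "inputs": "Explain what EETQ int8 quantization does.",
    "parameters": {
        "max_new_tokens": 64,
        # LoRAX routes each request to the requested fine-tuned adapter,
        # layered on top of the shared (quantized) base model.
        "adapter_id": "my-org/my-lora-adapter",
    },
}

resp = requests.post("http://localhost:8080/generate", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```

Dropping the `adapter_id` parameter queries the base model directly; either way, the EETQ-quantized weights are what gets served.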

@SidaZh (Collaborator) commented Jan 29, 2024

Excellent project, and thank you very much for introducing EETQ.

@SidaZh (Collaborator) commented Jan 29, 2024

We have some recent updates that should help improve performance at small batch sizes. Feel free to keep following the project. @arnavgarg1

@SidaZh merged commit 8877e96 into NetEase-FuXi:main on Jan 29, 2024
@arnavgarg1 (Contributor, PR author)

> We have some recent updates that should help improve performance at small batch sizes. Feel free to keep following the project. @arnavgarg1

That sounds fantastic; I'll keep an eye out and give it a try. Happy to report performance numbers back if that's useful!

@SidaZh (Collaborator) commented Feb 1, 2024

> > We have some recent updates that should help improve performance at small batch sizes. Feel free to keep following the project. @arnavgarg1
>
> That sounds fantastic; I'll keep an eye out and give it a try. Happy to report performance numbers back if that's useful!

We verified the improvement and submitted it upstream to the TGI community; you can see the performance data in that PR.
