Conversation
I would say more ... but perhaps I am blind ... It would be very interesting to have documentation on how to train a model (ideally from scratch) using PDFs or a web site (HTML pages), like whitead/paper-qa does.
It's also easy to load the cached pickle from the build script and filter out dynamic functions when looping over the module's functions, without a separate split step. I still can't figure out how the database in this repo has tunings for NT_matmul; maybe a custom schedule?
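The filter-while-looping idea above can be sketched in plain Python. This is a hypothetical illustration only: the cache layout, file contents, and the `dynamic` flag are assumptions, not this repo's actual build-script format.

```python
import pickle

# Hypothetical cache: function name -> metadata, with a flag marking
# dynamic-shape functions (stand-in for whatever the build script stores).
cache = {
    "matmul_static": {"dynamic": False},
    "matmul_dyn": {"dynamic": True},
    "softmax": {"dynamic": False},
}

# Round-trip through pickle, as if reading the build script's cached file.
blob = pickle.dumps(cache)
loaded = pickle.loads(blob)

# Filter out dynamic functions in the same loop, no separate split step.
static_funcs = [name for name, meta in loaded.items() if not meta["dynamic"]]
print(static_funcs)
```

The point is just that the predicate can run inside the single iteration over the loaded module's functions, so no second pass or split file is needed.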
Hi @MarcelDelhez, what we mean by "tuning" here is tuning kernel performance (making it faster), not fine-tuning weights.
I agree that "tuning" was not the correct word. My concern was more about learning from one's own documents.
Agreed that "tuning" is a pretty overloaded term - in this particular case, I am referring to an "auto-tuning compiler", which is the key to GPU performance. With TVM Unity auto-tuning, MLC LLM is able to generate performant code on average phones that runs those 3B/7B models as fast as 10 tok/sec.
Depends on mlc-ai/relax#204, which has been decomposed into 5 PRs and sent separately to mainline:
`tune_tir` to tune IRModule of TIR (apache/tvm#14784)