Conversation
I would say more ... but perhaps I am blind ... It would be very interesting to have documentation on how to train a model (ideally from scratch) using PDFs or a web site (HTML pages), like whitead/paper-qa does.
It's also easy to load the cached pickle from the build script and filter out dynamic functions when looping over the module's functions, without a separate split step. I still can't figure out how the database in this repo has tunings for NT_matmul; maybe a custom schedule?
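The filter-while-looping idea above can be sketched in plain Python. This is a hypothetical illustration only: the cache layout, file contents, and the `dynamic` flag are assumptions, not this repo's actual build-script format.

```python
import pickle

# Hypothetical cache: function name -> metadata, with a flag marking
# dynamic-shape functions (stand-in for whatever the build script stores).
cache = {
    "matmul_static": {"dynamic": False},
    "matmul_dyn": {"dynamic": True},
    "softmax": {"dynamic": False},
}

# Round-trip through pickle, as if reading the build script's cached file.
blob = pickle.dumps(cache)
loaded = pickle.loads(blob)

# Filter out dynamic functions in the same loop, no separate split step.
static_funcs = [name for name, meta in loaded.items() if not meta["dynamic"]]
print(static_funcs)
```

The point is just that the predicate can run inside the single iteration over the loaded module's functions, so no second pass or split file is needed.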
Hi @MarcelDelhez, what we mean by "tuning" here is tuning kernel performance (making it faster), not fine-tuning weights.
I agree that "tuning" was not the correct word. My concern was more about learning from one's own documents.
Agreed that "tuning" is a pretty overloaded term - in this particular case, I am referring to an "auto-tuning compiler", which is the key to GPU performance. With TVM Unity auto-tuning, MLC LLM is able to generate performant code on average phones that runs those 3B/7B models as fast as 10 tok/sec.
Depends on mlc-ai/relax#204, which has been decomposed into 5 PRs and sent separately to mainline:
`tune_tir` to tune IRModule of TIR (apache/tvm#14784)