Conversation
|
We should add types |
|
Need some help with the Python code. In the meantime, I will now add v1 backward compatibility in |
|
We should change all lengths / sizes / counts to uint64_t, not only the tensor dimensions, just to be safe and future-proof. |
|
I tested loading a couple GGUF v1 models, the backward compatibility seems to work fine. |
|
Similarly, no issues loading various v1 models. |
|
We can actually use |
|
Looks good, is the plan to update the metadata values for the lengths/etc before merge? |
@klosax Ah, that's useful. For a 7b q4_0 model, I use I don't need |
I don't think those parameters are needed. Maybe we should have a new parameter |
That logic is actually somewhat wrong, because the k-quants code can choose a different type than It will probably work for the non-k-quants types, but I'm pretty sure k-quants won't work. (There were also some changes to the decisions k-quants makes for LLaMA2 70B models, so in that particular case it wouldn't pass through all the tensors even if the other issues were dealt with.) |
|
Thanks. I used |
|
Thanks everyone for testing. We should merge this - anything else we want to try before that? |
|
I am a long-term enthusiast of whisper.cpp, which I use by default nowadays to transcribe my podcast Unmaking Sense.
|
Did you press it more than once? It queues a stop and gives you back control, and then if pressed again, it exits the program. Try playing with it a bit more :)
did you use the prompt template? |
It seems that if you use Ctrl-C while the assistant is printing a reply, it behaves as expected and described, but if you press it afterwards, it aborts. Thanks for the hint.
I hadn't, but now I have. Thank you, again. Unfortunately it seems to lead to a collapse of the quality of the response to a point where it is worthless, but I therefore obviously need to investigate the process more. |
|
If you'd need to follow up, I'd suggest making an issue specifically to discuss your problem. This is a pull request that doesn't seem directly related. |
Adding 64-bit support as discussed: ggml-org/ggml#302 (comment)
Help with testing is appreciated. Should be backward compatible with v1.