I'd like the option to have direct token-by-token streaming instead of kobold's current buffering method. Direct streaming makes things feel much faster and more responsive.
My idea is to have it as a setting you can toggle between kobold's buffering method and the more direct method (which is what llama.cpp does).
Also, you told me in the voice call to remind you to investigate whether there's a way to set how many tokens to buffer per chunk.
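Rough sketch of what I mean, in case it helps. This is not how kobold or llama.cpp actually implement streaming; `stream_chunks` and `tokens_per_chunk` are names I made up for illustration, and I'm assuming the generated tokens arrive as plain strings. The idea is that one setting could cover both requests: a value of 1 gives direct token-by-token streaming, and anything larger buffers that many tokens before sending a chunk.

```python
from typing import Iterable, Iterator, List


def stream_chunks(tokens: Iterable[str], tokens_per_chunk: int = 1) -> Iterator[str]:
    """Yield text in chunks of `tokens_per_chunk` tokens (hypothetical helper).

    tokens_per_chunk=1 behaves like direct token-by-token streaming;
    larger values buffer that many tokens before emitting a chunk.
    """
    if tokens_per_chunk < 1:
        raise ValueError("tokens_per_chunk must be >= 1")
    buffer: List[str] = []
    for token in tokens:
        buffer.append(token)
        # Flush as soon as the buffer reaches the configured size.
        if len(buffer) >= tokens_per_chunk:
            yield "".join(buffer)
            buffer.clear()
    # Flush whatever is left when generation ends mid-chunk.
    if buffer:
        yield "".join(buffer)
```

With something like this, the toggle is just a choice of value: `stream_chunks(tokens, 1)` for the direct mode, `stream_chunks(tokens, n)` for a buffered mode with `n` tokens per chunk.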