ggml-cuda : add rope f16, restore performance with parallel decoding#3272
Merged
ggerganov merged 4 commits into custom-attention-mask from Sep 20, 2023