ARM64 implementation for poly.PackLe16 #563
    // length N/2.
    func (p *Poly) PackLe16(buf []byte) {
        p.packLe16Generic(buf)
        // early bounds so we don't have to in assembly code
I don't mind the check that much, but I dislike that you write that we have to check it. I think it's optional. Most internal functions have a bunch of prerequisites, which aren't always easy to check. One prerequisite you don't check here is that the coefficients of p are indeed less than 16. That's fine: inspecting the call sites, we see that this always holds. The same goes for the length of the buffer passed.
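To make the two prerequisites concrete, here is a minimal sketch of what checking both of them explicitly could look like. It is not the code from this PR: the `Poly` stand-in type, the value `N = 256`, the name `packLe16Checked`, and the low-nibble-first layout are all assumptions made for illustration.

```go
package main

import "fmt"

// N is the number of coefficients per polynomial.
// Assumption for this sketch: N = 256.
const N = 256

// Poly is a hypothetical stand-in for the polynomial type discussed above.
type Poly [N]uint32

// packLe16Checked packs two coefficients into each byte of buf, with the
// even-indexed coefficient in the low nibble (assumed layout). Unlike an
// internal function that trusts its callers, it verifies both prerequisites:
// buf holds at least N/2 bytes and every coefficient is less than 16.
func packLe16Checked(p *Poly, buf []byte) error {
	if len(buf) < N/2 {
		return fmt.Errorf("buffer too short: got %d bytes, need %d", len(buf), N/2)
	}
	for i := 0; i < N/2; i++ {
		lo, hi := p[2*i], p[2*i+1]
		if lo >= 16 || hi >= 16 {
			return fmt.Errorf("coefficient pair %d is out of range", i)
		}
		buf[i] = byte(lo) | byte(hi)<<4
	}
	return nil
}

func main() {
	var p Poly
	p[0], p[1] = 3, 10
	buf := make([]byte, N/2)
	if err := packLe16Checked(&p, buf); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%#02x\n", buf[0]) // 3 | 10<<4 = 0xa3
}
```

The coefficient check costs a comparison per pair on every call, which is the kind of overhead an internal function can skip when its call sites already guarantee the precondition.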
I wanted to throw in one thing. I'm coming from C etc., where the ABI defines which registers are caller-saved and which are callee-saved. For Go, I am not 100% sure what must be guaranteed. I've read somewhere that Go has no callee-saved registers. Is that still true, or are there any caveats? In my personal projects I have always assumed so, and I never ran into issues.
There are caveats. They're documented here.
To test performance difference on arm64 chips: "go test -benchmem -run=^$ ./sign/internal/dilithium -bench=Le16"
On my machine (Apple M1 Max) on average:
Also keep in mind that these are microbenchmarks!
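For readers who want to reproduce this kind of measurement outside the repository, a benchmark that a `-bench=Le16` filter would match might look roughly like the sketch below. Everything in it is an assumption for illustration: the `Poly` stand-in type, `N = 256`, the portable `packLe16` loop, and the helper name `benchPackLe16` are not taken from the PR. It uses `testing.Benchmark` so that it runs as a plain program without `go test`.

```go
package main

import (
	"fmt"
	"testing"
)

// N is the number of coefficients per polynomial (assumed to be 256 here).
const N = 256

// Poly is a hypothetical stand-in for the polynomial type.
type Poly [N]uint32

// packLe16 is a portable reference loop: two coefficients per output byte,
// even-indexed coefficient in the low nibble (assumed layout).
// It assumes all coefficients are < 16 and len(buf) >= N/2.
func packLe16(p *Poly, buf []byte) {
	for i := 0; i < N/2; i++ {
		buf[i] = byte(p[2*i]) | byte(p[2*i+1])<<4
	}
}

// benchPackLe16 times repeated calls to packLe16 on a fixed input.
func benchPackLe16(b *testing.B) {
	var p Poly
	for i := range p {
		p[i] = uint32(i % 16) // keep every coefficient below 16
	}
	buf := make([]byte, N/2)
	b.SetBytes(N / 2) // report throughput in bytes written per call
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		packLe16(&p, buf)
	}
}

func main() {
	res := testing.Benchmark(benchPackLe16)
	fmt.Println(res.String(), res.MemString())
}
```

Because the loop body is only a few nanoseconds long, results like these are sensitive to inlining, CPU frequency scaling, and cache state, so they say little about end-to-end signing performance.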