scanner: fix heredoc serialization buffer overflows#287
scanner: fix heredoc serialization buffer overflows#287amaanq merged 1 commit intotree-sitter:masterfrom
Conversation
The heredoc word length was serialized as a single byte, which silently truncated identifiers of >= 256 characters. The deserializer would then read the wrong length, leaving unread bytes in the buffer, and hit the assert(size == length) check.
| for (uint32_t i = 0; i < scanner->open_heredocs.size; i++) { | ||
| Heredoc *heredoc = array_get(&scanner->open_heredocs, i); | ||
| if (size + 2 + heredoc->word.size >= TREE_SITTER_SERIALIZATION_BUFFER_SIZE) { | ||
| if (size + 3 + sizeof(uint32_t) + heredoc->word.size >= TREE_SITTER_SERIALIZATION_BUFFER_SIZE) { |
There was a problem hiding this comment.
I believe 2 is incorrect in the old code: we write 3 bools below, not 2.
|
@amaanq any feedback on this PR? I'd love to get it merged as the bug is causing crashes in GitHub's extraction pipeline. |
|
@look Sorry this slipped through the cracks! I'll try to get CI passing so we can take a look soon. Thanks! |
eb2f0bb to
66e964b
Compare
|
@WillLillis thanks for taking a look! I also tried to fix CI in #288 -- you've made the equivalent change in this branch so that PR can be closed. |
f5090ba to
de8da6e
Compare
|
We'll address the CI issues a little later, that's blocked by fixes to several of the bindings repos. This fix is fine to go in now. (Passed in CI, and I also verified locally) |
|
Thanks @look, and sorry for the delay (also appreciate the disclaimer a lot :) ) |
|
No problem. Thanks for accepting the contribution! |
I identified a crash in GitHub's symbol extraction system related to tree-sitter-ruby. Investigation revealed it was caused by very long HEREDOC identifiers (these are legal in Ruby, I checked...).
The heredoc word length was serialized as a single byte, which silently truncated identifiers of >= 256 characters. The deserializer would then read the wrong length, leaving unread bytes in the buffer, and hit the
assert(size == length)check.To fix, I adopted the approach used in tree-sitter-bash and tree-sitter-php: store the HEREDOC identifier length in a
uint32_t.This PR includes a reproduction test. If you run
tree-sitter testonmasterwithtest/corpus/literals.txtyou will get:On this branch, it is fixed.
Disclaimer: I used GitHub Copilot to help diagnose and fix this bug.