@hildjj, @nathan7
node-cbor depends on some very clever work combining generators with readable streams, which has been copied into the node-cbor/vendor/binary-parse-stream directory. The README.md is rather sparse and says it's based on https://github.com/nathan7/binary-parser, but that link is now broken. However, https://github.com/nathan7/binary-parse-stream still seems to be present, if a bit stale (4 years since the last commit).
Nonetheless, the approach lets a stream-based parser be written as sync-looking code, which is very impressive. My gut feeling, however, is that it may limit the performance of node-cbor, in particular because the parser yields at least once for each parsed data item, which may be as short as one byte of incoming data. All of these yields look like a substantial source of overhead for large data sets. A chunk-based approach, where the parser stays synchronous while it processes an entire Buffer, seems like it could offer substantial performance gains.
ECMAScript async iterators seem like they might be a good forward-looking, chunk-oriented approach. TC39 declared async iteration "finished" in January 2018, and V8 and Node v10 both seem to have adopted it, at least experimentally.
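To make the idea concrete, here is a rough sketch of what I have in mind. It is not CBOR and not the binary-parse-stream API: the framing (a one-byte length followed by that many payload bytes) and the names `parseItems` and `demo` are made up purely for illustration. The point is that there is one `await` and at most one `yield` per incoming chunk, and everything inside a chunk is parsed synchronously.

```javascript
// Hypothetical framing for illustration only (not CBOR):
// each item is a one-byte length followed by that many payload bytes.
async function * parseItems (source) {
  let pending = Buffer.alloc(0) // bytes left over from the previous chunk

  for await (const chunk of source) {
    // One await per incoming chunk; everything below is synchronous.
    pending = pending.length ? Buffer.concat([pending, chunk]) : chunk

    const items = []
    let offset = 0
    while (offset < pending.length) {
      const len = pending[offset]
      if (offset + 1 + len > pending.length) break // item continues in the next chunk
      items.push(pending.subarray(offset + 1, offset + 1 + len))
      offset += 1 + len
    }

    pending = pending.subarray(offset)
    if (items.length) yield items // one yield per chunk, not per item
  }

  if (pending.length) throw new Error('truncated input')
}

// Usage: any async iterable of Buffers works as a source.
async function demo () {
  async function * chunks () {
    yield Buffer.from([2, 0x61, 0x62, 1]) // "ab", then the start of the next item
    yield Buffer.from([0x63, 0]) // "c", then an empty item
  }

  const batches = []
  for await (const items of parseItems(chunks())) {
    batches.push(items.map((item) => item.toString('latin1')))
  }
  return batches // [['ab'], ['c', '']]
}
```

Yielding a batch of items per chunk is a deliberate choice in this sketch: it keeps the number of async hops proportional to the number of chunks rather than the number of items. Whether that actually beats the generator-per-item approach would need benchmarking.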
I'm considering writing a binary parsing framework similar to binary-parse-stream, but based on async iterators instead. Before I dig in alone, I thought it would be valuable to run the idea by you both, to see what you think and whether you know of other implementations doing something like this.