ref(metrics): Add normalization and update set metrics hashing#658
ref(metrics): Add normalization and update set metrics hashing#658
Conversation
5f41028 to
801d7ca
Compare
Swatinem
left a comment
There was a problem hiding this comment.
The code looks correct and has good test coverage, so no problem there.
My main concerns are:
- performance, quite a bit so
- dependencies
At the very least, you should follow the performance best practices of the regex crate here: https://docs.rs/regex/latest/regex/#avoid-re-compiling-regexes-especially-in-a-loop
I would consider that a blocker, as you are compiling the same regex over and over again in every single call.
Both for performance and dependencies: Do we really care that tags are limited to N graphemes? Why not unicode chars or even bytes? As in a bunch of cases (keys mainly?) we are filtering for ASCII chars anyway, iterating over graphemes is absolute overkill.
Otherwise, there is a bunch of String allocation happening all over the place which might be avoidable. Those are just a drop in the bucket compared to the regex topic mentioned above, so it might not be worth micro-optimizing the last drop of performance out of this.
Speaking of micro-optimization, this very much reminds me of the work I did recently in rust-lang/rust#121150 which I see is about to land today 🎉
Thanks @Swatinem! I will optimize the regex usage. The graphemes can be removed for tag keys since, as you mentioned, we already filter away multi-byte characters. But for tag values, which allow the full UTF-8 character range, the code will panic if we truncate the tag value in the middle of a multi-byte character. Do you have any suggestion for how to truncate the tag value safely without graphemes? Maybe we can catch the panic and then reduce the truncation with 1 byte until it works? |
|
"catching panics" is not really a thing because people can (and often do) use
Depending on what our goal here is (bytes, chars or graphemes), you might as well just copy over the underlying implementation from |
Great thanks! |
sl0thentr0py
left a comment
There was a problem hiding this comment.
lgtm, but please wait for Arpad's approve to merge and release
c08c0b4 to
c8c0aeb
Compare
There was a problem hiding this comment.
Looks very good. I have one criticism, and it's admittedly nitpicky: I don't think the From impls on NormalizedName/Tags/Unit are appropriate. Specifically, they are neither lossless nor value-preserving, as per https://doc.rust-lang.org/std/convert/trait.From.html#when-to-implement-from. I think free functions normalize_name/tags/unit exported from the normalization module would be better here.
90d0182 to
77f1a42
Compare
77f1a42 to
835f025
Compare
|
@Swatinem Fixed the regex optimization, removed graphemes, and reduced the number of new string allocations 👍 |
@loewenheim On it! |
cab00f7 to
26e4893
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #658 +/- ##
==========================================
+ Coverage 73.09% 73.56% +0.46%
==========================================
Files 62 66 +4
Lines 7448 7727 +279
==========================================
+ Hits 5444 5684 +240
- Misses 2004 2043 +39 |
|
@loewenheim I removed the name and unit structs and just use a function instead now. For the tags, I export a function but keep the struct. Does this look ok? |
26e4893 to
75179bf
Compare
75179bf to
f3e7a57
Compare
| [dependencies] | ||
| cadence = { version = "0.29.0", optional = true } | ||
| crc32fast = "1.4.0" | ||
| itertools = "0.13.0" |
There was a problem hiding this comment.
Yes, crc32 is used for hashing set values and itertools is used for sorting and joining the metric tags! :)
There was a problem hiding this comment.
similar to #659, we should make both these dependencies optional.
Also instead of manually sorting the metric tags, how about you switch to a BTreeMap which is sorted by definition?
Add metrics normalization in accordance with the metrics developer docs.
Add metric name, unit, and tag truncation to adhere to the metrics user docs.
Add
to_envelopefor a singleMetricinstance to facilitate sending metrics from sentry-cli.Change hash function to crc32 as described here, this ensures compatibility between different SDKs using the same metric.
Use
clone_frominstead ofcloneandclone_intoinstead ofto_ownedin a couple of places as suggested by clippy.