Feature/2005 large composite disposable perf#2092
Merged
Conversation
Once the number of elements in a CompositeDisposable gets into the thousands, Remove became very expensive because it performed a linear search. This change causes CompositeDisposable to switch into an alternate mode using a dictionary to enable hash-based lookup once it grows over a particular size. With under 1024 elements, the behaviour is as before. The number of fields remains the same so this should not change the memory footprint of small CompositeDisposables. However, it does require it to check which of the two modes it is in, so there will likely be a small performance penalty for small CompositeDisposables, so we will need to see if this changes any benchmarks in a significant way before merging this change.
Not necessary with the dictionary. We did it for the list purely because we deliberately allowed the list to become peppered with nulls to avoid having to resize it on every call to Remove.
HowardvanRooijen
approved these changes
Apr 29, 2024
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Addresses #2005
This change makes the
CompositeDisposablefar more efficient when it is dealing with large numbers (thousands) ofIDisposableobjects. This matters in "fan out/in" scenarios where items are split into large numbers of groups withGroupBy, and then later recombined (usually with some per-group processing having been applied) with eitherSelectManyorMerge, because these operators useCompositeDisposableto keep track of their subscriptions to upstream observables (the ones that emerge byGroupByin the scenario being described here).Fan out/in is an important technique in Rx.NET, as described at https://introtorx.com/chapters/transformation-of-sequences#the-significance-of-selectmany and https://introtorx.com/chapters/partitioning so we want this to work well even with large numbers of groups.
Before,$O(N^2)$ performance (where $N$ is the number of groups being merged) when the source finally calls
CompositeDisposableused aList<IDisposable>internally, which meant that itsRemovemethod had to perform a linear scan of the list to find the item to remove. This resulted inOnCompleted. If you had over 100,000 groups, this could end up taking seconds to complete!With this change, the
CompositeDisposablecontinues to use the same strategy as before for small and moderate sized lists because there's no observable performance problem with that, and it is likely to be more efficient for small collections than a strategy optimized for larger collections. And awful lot ofCompositeDisposableinstances end up with only a handful of items, so it's important that it continues to perform well in that scenario.But with large numbers of disposables, it switches to a different strategy in which it uses a dictionary to track disposables.
We use a dictionary instead of the more obvious
HashSet<IDisposable>becauseCompositeDisposablehas always allowed you add the same disposable instance multiple times. Although it would not be a good idea to depend on this, we've never prevented it, and since this is a public type, some Rx users might depend on it. So the dictionary has anintvalue keeping track of how many times each particular disposable has been added. We expect this to be1in most cases in practice.As the discussion at #2005 (comment) shows, we've added benchmarks that verify that this does address the performance problem described. We've also checked that this doesn't cause any significant change in performance for any other benchmarks. (That's important because
CompositeDisposableis widely used within Rx.NET, so changes of this kind have the potential to have a broad negative impact. We wanted to make sure we hadn't made normal operation much slower just to improve this one particular and relatively unusual scenario.)