
opt triple remote flow controller #15983

Merged
zrlw merged 3 commits into apache:3.3 from EarthChen:opt-tri-remote-flow-benchmarks
Jan 16, 2026

Conversation

@EarthChen
Member

What is the purpose of the change?

This PR introduces the TripleHttp2RemoteFlowController, a specialized flow controller for the Triple protocol. It extends Netty's DefaultHttp2RemoteFlowController and optimizes the byte distribution strategy to enhance throughput and performance in high-concurrency scenarios.

Checklist

  • Make sure there is a GitHub issue field for the change.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write the unit tests necessary to verify your logic. If a new feature or significant change is committed, please remember to add a sample in the dubbo-samples project.
  • Make sure GitHub Actions can pass.

@EarthChen
Member Author

Why a 16KB allocationQuantum Improves HTTP/2 Performance

In HTTP/2's WeightedFairQueueByteDistributor, the allocationQuantum defines the maximum number of bytes allocated to a stream before the distributor moves to the next stream in the queue. Setting this to 16KB is an industry-standard optimization for balancing throughput and fairness.


1. Alignment with HTTP/2 Frame Architecture

The default SETTINGS_MAX_FRAME_SIZE for most HTTP/2 implementations is 16,384 bytes (16KB).

  • Reduced Logic Cycles: If the quantum is smaller than the frame size (e.g., 1KB), the distributor must run its complex weighting logic 16 times just to produce one full 16KB data frame.
  • Optimal Framing: Setting the quantum to 16KB allows the distributor to grant enough "credit" for exactly one full frame in a single allocation cycle, significantly reducing CPU overhead.
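
The cycle-count claim above can be checked with plain arithmetic. The following is a standalone sketch, not Netty code; the class and method names are illustrative:

```java
public class QuantumMath {
    // Default SETTINGS_MAX_FRAME_SIZE in HTTP/2 (RFC 9113): 16,384 bytes
    static final int MAX_FRAME_SIZE = 16_384;

    // Distribution cycles needed to grant one full frame's worth of credit,
    // when each cycle allocates at most `quantum` bytes to a stream.
    static int cyclesPerFrame(int quantum) {
        return (MAX_FRAME_SIZE + quantum - 1) / quantum; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(cyclesPerFrame(1024));      // 1KB quantum  -> 16 cycles per frame
        System.out.println(cyclesPerFrame(16 * 1024)); // 16KB quantum -> 1 cycle per frame
    }
}
```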

2. Efficiency vs. Fairness (The "Goldilocks" Zone)

The distributor manages a classic trade-off:

  • Too Small (< 4KB): High "fairness" but low throughput. The system spends too much time switching between streams (context switching), leading to fragmented writes and high CPU usage.
  • Too Large (> 64KB): High throughput but low fairness. Large streams (like a 5MB image) can "hog" the connection, causing Head-of-Line (HoL) Blocking for small, critical resources like CSS or API responses.
  • The 16KB Choice: It is large enough to keep the TCP "pipe" full and minimize system calls, but small enough to ensure that high-priority streams can still "interleave" frequently.

3. Impact on the Network Stack

  • System Call Reduction: Larger allocations allow the application to pass bigger buffers to the socket at once. This reduces the frequency of expensive write() system calls.
  • TCP Congestion Control: 16KB chunks help maintain a steady flow of data that fits well within modern TCP Congestion Windows (CWND), preventing the "stuttering" effect caused by sending many tiny packets.

Comparison Table

| Feature | Small Quantum (e.g., 1KB) | 16KB Quantum (Optimized) | Large Quantum (e.g., 128KB) |
|---|---|---|---|
| CPU Overhead | High (heavy scheduling) | Low (efficient) | Minimal |
| Multiplexing | Granular / very fair | Balanced | Poor (coarsened) |
| Throughput | Suboptimal | High | Maximal |
| Latency (TTFB) | Low | Low | High (potential blocking) |

Implementation Example (Netty)

```java
// Initialize the distributor with the current connection
WeightedFairQueueByteDistributor dist = new WeightedFairQueueByteDistributor(connection);

// Optimization: a 16KB quantum balances fairness and high throughput.
// This aligns with the default HTTP/2 SETTINGS_MAX_FRAME_SIZE (16,384 bytes).
dist.allocationQuantum(16 * 1024);

// The distributor is supplied to the remote flow controller at construction
// time (DefaultHttp2RemoteFlowController has no post-construction setter for it)
Http2RemoteFlowController remoteFlowController =
        new DefaultHttp2RemoteFlowController(connection, dist);
```

@codecov-commenter

codecov-commenter commented Jan 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.73%. Comparing base (b6f2a0f) to head (394ae85).
⚠️ Report is 16 commits behind head on 3.3.

Additional details and impacted files
@@             Coverage Diff              @@
##                3.3   #15983      +/-   ##
============================================
- Coverage     60.76%   60.73%   -0.03%     
+ Complexity    11769    11767       -2     
============================================
  Files          1948     1949       +1     
  Lines         88752    88760       +8     
  Branches      13382    13382              
============================================
- Hits          53926    53908      -18     
- Misses        29305    29323      +18     
- Partials       5521     5529       +8     
| Flag | Coverage Δ |
|---|---|
| integration-tests-java21 | 32.22% <100.00%> (-0.05%) ⬇️ |
| integration-tests-java8 | 32.34% <100.00%> (+<0.01%) ⬆️ |
| samples-tests-java21 | 34.86% <100.00%> (-0.02%) ⬇️ |
| samples-tests-java8 | 32.51% <100.00%> (-0.08%) ⬇️ |
| unit-tests-java11 | 59.00% <100.00%> (+<0.01%) ⬆️ |
| unit-tests-java17 | 58.47% <100.00%> (+<0.01%) ⬆️ |
| unit-tests-java21 | 58.47% <100.00%> (+0.01%) ⬆️ |
| unit-tests-java25 | 58.43% <100.00%> (-0.03%) ⬇️ |
| unit-tests-java8 | 58.97% <100.00%> (-0.03%) ⬇️ |

Flags with carried forward coverage won't be shown.


```java
 * @return a fully initialized {@link Http2RemoteFlowController} instance.
 */
public static Http2RemoteFlowController newController(Http2Connection connection) {
    WeightedFairQueueByteDistributor dist = new WeightedFairQueueByteDistributor(connection);
```
Member

UniformStreamByteDistributor might be a better choice?

Member Author

@EarthChen Jan 9, 2026

No!

Example:
Suppose you have a connection performing two operations simultaneously:

  1. Stream A: Downloading a 1GB log file (Weight: 16).
  2. Stream B: A very short control command "STOP_SERVICE" (Weight: 16).
  • With Uniform: the distributor cycles between the streams in fixed-size chunks, treating both identically. If the network is congested, the critical control command may have to wait a significant amount of time behind queued log data.
  • With WFQ: Stream B can be given a higher weight. Even with equal weights, WFQ's scheduling tracks each stream's allocated bytes more precisely when handling concurrent streams, so the control command can be interleaved with the bulk data and dispatched with minimal delay.
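
A toy model of this scenario (a standalone sketch; `bytesBeforeCommandDone` and the visit orders are illustrative, not Netty APIs): it counts how many bytes hit the wire before the small command stream completes, given the order in which the distributor visits streams and a 16KB per-visit cap.

```java
public class DistributorSketch {
    // Bytes written before the small stream "B" finishes, when streams are
    // visited in the given order and each visit grants at most `quantum` bytes.
    static long bytesBeforeCommandDone(String[] visitOrder, long pendingA, long pendingB, int quantum) {
        long a = pendingA, b = pendingB, written = 0;
        while (b > 0) {
            for (String s : visitOrder) {
                if (s.equals("A") && a > 0) {
                    long grant = Math.min(quantum, a);
                    a -= grant;
                    written += grant;
                } else if (s.equals("B") && b > 0) {
                    long grant = Math.min(quantum, b);
                    b -= grant;
                    written += grant;
                    if (b == 0) return written;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) {
        int quantum = 16 * 1024;
        long logFile = 1L << 30; // Stream A: 1GB log download
        long command = 12;       // Stream B: tiny "STOP_SERVICE" command
        // Round-robin that happens to visit A first: the command waits
        // behind a full 16KB chunk of log data.
        System.out.println(bytesBeforeCommandDone(new String[]{"A", "B"}, logFile, command, quantum)); // 16396
        // Weight-aware scheduling that visits B first: the command goes out immediately.
        System.out.println(bytesBeforeCommandDone(new String[]{"B", "A"}, logFile, command, quantum)); // 12
    }
}
```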

Member

@RainYuY left a comment

LGTM

Member

@wangchengming666 left a comment

LGTM

@zrlw zrlw merged commit 2daf878 into apache:3.3 Jan 16, 2026
33 checks passed
EarthChen added a commit that referenced this pull request Jan 18, 2026