feat: Fast slow separate#3213
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR implements a fast/slow thread pool separation mechanism with cross-pool borrowing to improve resource utilization under unbalanced load conditions. Commands are classified and routed to either a fast or slow pool, with intelligent borrowing when one pool is busy and another is idle.
Key changes include:
- Thread pool task borrowing mechanism based on queue occupancy and EMA-smoothed wait times
- Enhanced observability with thread pool metrics (latency buckets, scheduled/completed task counts, borrow attempts)
- Dynamic thread pool resizing via CONFIG SET commands
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
| conf/pika.conf | Adds thread pool borrowing configuration parameters, EMA thresholds, and enables slow-cmd-pool by default |
| tools/pika_migrate/conf/pika.conf | Contains an unintended typo in userblacklist and adds blank line |
| include/pika_define.h | Defines TaskPoolType enum for fast/slow pool classification |
| include/pika_conf.h | Adds configuration fields and accessors for borrowing thresholds and EMA parameters |
| include/pika_server.h | Introduces ThreadPoolMetrics struct and methods for pool management, borrowing logic, and EMA tracking |
| include/pika_admin.h | Extends INFO command to support "threadpool" section |
| include/pika_client_conn.h | Adds pool_type tracking to BgTaskArg and connection state |
| include/pika_binlog.h | Fixes printf format specifier from %llu to %lu |
| src/pika_conf.cc | Loads borrowing and EMA configuration from pika.conf |
| src/pika_server.cc | Implements core borrowing logic, metrics tracking, EMA calculations, pool resizing, and decision algorithms |
| src/pika_client_conn.cc | Records queue wait times, updates EMA statistics, and tracks per-pool metrics during task execution |
| src/pika_admin.cc | Implements CONFIG GET/SET for borrowing parameters and INFO threadpool output |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| bool PikaServer::ResizeSlowCmdThreadPool(size_t new_size) { | ||
| if (new_size == 0 || new_size > 24) { |
There was a problem hiding this comment.
The upper bound check is inconsistent with the error message. The code checks for 24 but the error message states the limit is 1024. For slow command thread pool, a limit of 1024 would be reasonable, but the validation should be consistent. Consider either changing the check to allow up to 1024 or updating the error message to reflect the actual limit of 24.
| if (new_size == 0 || new_size > 24) { | |
| if (new_size == 0 || new_size > 1024) { |
| // QUEUE_SIZE_THRESHOLD_PERCENTAGE is used to represent a percentage value and should be within the range of 0 to 100. | ||
| const size_t QUEUE_SIZE_THRESHOLD_PERCENTAGE = 75; | ||
|
|
||
| // ThreadPoolMetrics |
There was a problem hiding this comment.
The comment has a typo: "threadpool metrics" should be capitalized or include proper punctuation as "ThreadPool Metrics" or "thread pool metrics".
| // ThreadPoolMetrics | |
| // ThreadPool Metrics |
| role_(PIKA_ROLE_SINGLE) { | ||
| role_(PIKA_ROLE_SINGLE), | ||
| fast_pool_metrics_(new ThreadPoolMetrics()), | ||
| slow_pool_metrics_(new ThreadPoolMetrics()){ |
There was a problem hiding this comment.
The initialization uses curly braces instead of parentheses which is inconsistent with the previous line (line 110). While both are valid, consistency within the same initialization list is preferred.
| # [NOTICE] If this user password is the same as admin password (including both being empty), | ||
| # the value of this parameter will be ignored and all users are considered as administrators, | ||
| # in this scenario, users are not subject to the restrictions imposed by the userblacklist. | ||
| # in this scenario, users are not subject to the restrictions imposed by the userb lacklist. |
There was a problem hiding this comment.
There's an extra space inserted in the middle of "userblacklist", changing it to "userb lacklist". This appears to be an unintentional edit and should be reverted.
| # in this scenario, users are not subject to the restrictions imposed by the userb lacklist. | |
| # in this scenario, users are not subject to the restrictions imposed by the userblacklist. |
| while (SlowCmdThreadPoolCurQueueSize() != 0) { | ||
| std::this_thread::sleep_for(std::chrono::milliseconds(100)); | ||
| } |
There was a problem hiding this comment.
Blocking wait in resize operation could cause performance issues. The code waits in a tight loop checking queue size every 100ms, which blocks the CONFIG SET command execution. If the queue never drains (e.g., due to continuous load), this could hang indefinitely. Consider adding a timeout or a maximum retry count to prevent indefinite blocking.
| res_.AppendStringRaw("-ERR Can't resize slow-cmd-pool-size when there are tasks in the queue\r\n"); | ||
| return; | ||
| } | ||
| long int slow_cmd_thread_pool_size = (1 > ival || 24 < ival) ? 1 : ival; |
There was a problem hiding this comment.
The validation bound (24) is inconsistent with the upper bound check in line 2733 which allows values up to 1024. This creates a mismatch where ConfigSet allows larger values than ResizeSlowCmdThreadPool can actually handle. The limits should be consistent across both validation points.
src/pika_server.cc
Outdated
| if (last > 0) { | ||
| uint64_t elapsed = now - last; | ||
| if (elapsed > 1000) { // If last update was more than 1ms ago, increase alpha to adapt faster | ||
| alpha_num = std::min(alpha_num * 2, alpha_den / 2); | ||
| } | ||
| } |
There was a problem hiding this comment.
The adaptive alpha calculation could cause issues. When elapsed > 1000 microseconds (1ms), alpha_num is doubled, but this modification is only applied locally and doesn't update the global EMA parameters. This means the adaptation effect is inconsistent - it only affects this one update. Additionally, if multiple threads call this simultaneously with large elapsed times, they could all compute different alpha values leading to unpredictable EMA behavior. Consider either removing this adaptive logic or implementing it more carefully with proper synchronization.
| if (last > 0) { | |
| uint64_t elapsed = now - last; | |
| if (elapsed > 1000) { // If last update was more than 1ms ago, increase alpha to adapt faster | |
| alpha_num = std::min(alpha_num * 2, alpha_den / 2); | |
| } | |
| } | |
| (void)last; // last is currently unused; retained for potential future logic |
| std::atomic<uint64_t> tasks_completed{0}; | ||
| std::atomic<uint64_t> borrow_attempts{0}; | ||
|
|
||
| // latency(1ms, 5ms, 10ms, 50ms, 100ms, 500ms, 1s, 5s, >5s) |
There was a problem hiding this comment.
The comment is unclear and has a typo. "latency(1ms" uses Chinese parentheses instead of English ones, and the description "latency" should be more specific like "Latency buckets in microseconds".
| // latency(1ms, 5ms, 10ms, 50ms, 100ms, 500ms, 1s, 5s, >5s) | |
| // Latency buckets (1ms, 5ms, 10ms, 50ms, 100ms, 500ms, 1s, 5s, >5s) |
src/pika_server.cc
Outdated
| } | ||
| pika_client_processor_->SchedulePool(func, arg); | ||
| // if slow cmd thread pool disabled | ||
| if(!g_pika_conf->slow_cmd_pool()) { |
There was a problem hiding this comment.
Missing space after 'if'. The code has 'if(!g_pika_conf->slow_cmd_pool())' but should be 'if (!g_pika_conf->slow_cmd_pool())' for consistent code style.
| if(!g_pika_conf->slow_cmd_pool()) { | |
| if (!g_pika_conf->slow_cmd_pool()) { |
| } | ||
| size_t current_queue_size = g_pika_server->SlowCmdThreadPoolCurQueueSize(); | ||
| if (current_queue_size > 0) { | ||
| res_.AppendStringRaw("-ERR Can't resize slow-cmd-pool-size when there are tasks in the queue\r\n"); |
There was a problem hiding this comment.
The error message uses inconsistent terminology. Earlier in the code and function name, it's referred to as "slow-cmd-thread-pool-size", but the error message says "slow-cmd-pool-size". The error message should match the actual configuration parameter name for clarity.
| res_.AppendStringRaw("-ERR Can't resize slow-cmd-pool-size when there are tasks in the queue\r\n"); | |
| res_.AppendStringRaw("-ERR Can't resize slow-cmd-thread-pool-size when there are tasks in the queue\r\n"); |
aa9a567 to
cb1696b
Compare
cb1696b to
f030ae5
Compare
1. 概况
围绕“命令线程池借用”将命令处理拆分为快/慢线程池,并引入跨池借用机制,同时完善线程池相关的监控输出与在线配置能力,提升负载不均衡下的资源利用率与可观测性。
2. 核心功能点
2.1 快慢命令线程池分离(Fast / Slow)
命令处理任务在调度阶段会被分配到:
slow-cmd-pool且命令被判定为慢命令时使用(slow-cmd-list)2.2 跨池借用(Borrow)调度策略
引入借用逻辑:当某池处于忙碌、另一池处于空闲时,允许把任务投递到对方池子以分担压力。
决策入口统一在
PikaServer::DecidePoolType(...),路由入口在PikaServer::ScheduleClientPool(...)。在任务参数中记录
pool_type,并在执行入口将其写入连接上下文,使得:3. 忙闲判断指标
使用队列等待时间以及线程池任务队列占比作为判断指标;
队列等待时间(queue-wait)EMA 指标
queue_wait_us = dequeue_ts_ - enqueue_ts_queue-wait做平滑计算。队列长度占比判断 busy/idle:
threadpool-borrow-threshold-percentthreadpool-idle-threshold-percent活跃任务数占比 (
busy_by_active)active_tasks实时追踪线程池正在执行的任务数active_tasks >= (thread_num * 80%)时 → 认为忙碌组合判定规则(用于借用 Busy/Idle)
busy = busy_by_queue || busy_by_ema || busy_by_activeidle = idle_by_queue && idle_by_emaBusy 判定:
只要任一信号表明“忙”,就认为忙。
Idle 判定:更保守
必须两个信号都表明“闲”,才认为该池是空闲,才允许借用。
4. 可观测性
线程池运行信息(
info threadpool)线程池信息输出增强,包含:
size,queue_size,max_queue_size,usageborrow_attemptstasks_scheduled5. 参数动态调整
5.1 新增/扩展的 CONFIG GET
支持查询:
threadpool-borrow-enablethreadpool-borrow-threshold-percentthreadpool-idle-threshold-percent5.2 新增/扩展的 CONFIG SET
支持在线设置:
thread-pool-size(快池线程数动态调整,队列为空时允许)slow-cmd-thread-pool-size(慢池线程数动态调整,队列为空时允许)threadpool-borrow-enablethreadpool-borrow-threshold-percentthreadpool-idle-threshold-percent6. pika.conf 新增配置项(线程池 EMA 参数)
在配置文件中新增以下参数,用于控制 EMA 与阈值(单位 microseconds):
threadpool-ema-alpha-numeratorthreadpool-ema-alpha-denominatorthreadpool-fast-busy-thresholdthreadpool-fast-idle-thresholdthreadpool-slow-busy-thresholdthreadpool-slow-idle-threshold7. 调度链路(从接收请求到入池/借用)
redis_cmds、pool_type等)。ScheduleClientPool调用DecidePoolType选择目标池: