@liaoxin01 liaoxin01 commented Jun 24, 2024

Proposed changes

When the number of concurrent stream loads exceeds the number of threads in the remote scanner, stream load may get stuck. The cause is that each libevent thread blocks until its stream load completes; when the tasks handled by the scanner threads have no intersection with the loads the libevent threads are waiting on, neither side can make progress.
The fix converts the libevent thread's synchronous wait into asynchronous execution: the stream load executor thread invokes a callback when the load finishes.
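
To illustrate the idea, here is a minimal, self-contained sketch of the callback pattern described above. It is not the Doris implementation: `LoadContext`, `WorkerPool`, `execute_async`, and `run_async_loads` are hypothetical names invented for this example, and the pool is only a stand-in for the stream load executor threads. The point it shows is that the submitting thread never waits per request, so the number of in-flight loads can exceed the number of worker threads without anything blocking.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for a stream load context (not the Doris type).
struct LoadContext {
    int id = 0;
};

// Minimal fixed-size pool standing in for the executor threads (assumption).
class WorkerPool {
public:
    explicit WorkerPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }
    // Drains the remaining queue, then joins every worker.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stop_ = false;
};

using Callback = std::function<void(std::shared_ptr<LoadContext>)>;

// Returns immediately; a worker thread invokes `cb` once the load is done.
void execute_async(WorkerPool& pool, std::shared_ptr<LoadContext> ctx, Callback cb) {
    pool.submit([ctx, cb] {
        // ... the actual load work would run here ...
        cb(ctx);
    });
}

// Submits `loads` requests from one caller thread that never blocks per
// request, and returns how many completion callbacks ran.
int run_async_loads(int loads, std::size_t pool_threads) {
    std::atomic<int> completed{0};
    {
        WorkerPool pool(pool_threads);
        for (int i = 0; i < loads; ++i) {
            auto ctx = std::make_shared<LoadContext>();
            ctx->id = i;
            execute_async(pool, ctx,
                          [&completed](std::shared_ptr<LoadContext>) { ++completed; });
        }
    }  // pool destructor drains the queue and joins the workers
    return completed.load();
}
```

For example, calling `run_async_loads(64, 2)` submits 64 loads to a pool of 2 threads. In a blocking design the caller would instead wait on each load in turn, which is the kind of wait the description above identifies as the source of the hang.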

@liaoxin01

run buildall

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

return execute_plan_fragment(ctx, [](std::shared_ptr<StreamLoadContext> ctx) {});
}

Status StreamLoadExecutor::execute_plan_fragment(

warning: function 'execute_plan_fragment' exceeds recommended size/complexity thresholds [readability-function-size]

Status StreamLoadExecutor::execute_plan_fragment(
                           ^
Additional context

be/src/runtime/stream_load/stream_load_executor.cpp:73: 95 lines including whitespace and comments (threshold 80)

Status StreamLoadExecutor::execute_plan_fragment(
                           ^

@doris-robot

TPC-H: Total hot run time: 39860 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d5c6ccff47a5793398184b22c6a66748c6976e09, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17601	4348	4309	4309
q2	2007	193	182	182
q3	10477	1116	1077	1077
q4	10182	822	756	756
q5	7483	2724	2649	2649
q6	224	136	136	136
q7	933	596	609	596
q8	9233	2069	2052	2052
q9	9003	6533	6454	6454
q10	8983	3722	3739	3722
q11	445	248	246	246
q12	467	243	236	236
q13	17765	3013	3001	3001
q14	276	217	215	215
q15	513	487	480	480
q16	523	374	378	374
q17	958	695	747	695
q18	8124	7468	7303	7303
q19	7187	1469	1498	1469
q20	643	322	313	313
q21	4984	3251	3929	3251
q22	403	344	344	344
Total cold run time: 118414 ms
Total hot run time: 39860 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4431	4246	4236	4236
q2	373	274	271	271
q3	3003	2896	2964	2896
q4	1989	1709	1755	1709
q5	5595	5495	5475	5475
q6	224	128	130	128
q7	2255	1896	1914	1896
q8	3270	3434	3424	3424
q9	8785	8784	8753	8753
q10	4136	3766	3805	3766
q11	594	499	495	495
q12	839	661	622	622
q13	17296	3200	3176	3176
q14	309	292	265	265
q15	544	500	485	485
q16	501	446	456	446
q17	1801	1523	1501	1501
q18	8316	7835	7842	7835
q19	1804	1462	1601	1462
q20	3142	1896	1873	1873
q21	5193	5058	4876	4876
q22	726	565	565	565
Total cold run time: 75126 ms
Total hot run time: 56155 ms

@doris-robot

TPC-DS: Total hot run time: 173762 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d5c6ccff47a5793398184b22c6a66748c6976e09, data reload: false

query1	932	381	355	355
query2	6411	2421	2286	2286
query3	6627	205	210	205
query4	19706	17417	17377	17377
query5	3549	456	454	454
query6	246	169	178	169
query7	4580	292	292	292
query8	332	303	295	295
query9	8414	2457	2437	2437
query10	554	304	269	269
query11	10607	9947	10024	9947
query12	109	83	88	83
query13	1634	373	374	373
query14	10251	6687	7674	6687
query15	249	201	188	188
query16	7743	277	274	274
query17	1694	554	537	537
query18	1922	286	290	286
query19	206	156	158	156
query20	91	83	83	83
query21	211	126	128	126
query22	4373	4008	4109	4008
query23	33789	33621	33802	33621
query24	11150	2819	2891	2819
query25	660	411	407	407
query26	1298	159	156	156
query27	2933	330	338	330
query28	7751	2159	2142	2142
query29	934	656	661	656
query30	262	157	152	152
query31	977	758	769	758
query32	89	53	56	53
query33	753	310	309	309
query34	991	489	474	474
query35	800	659	646	646
query36	1145	987	961	961
query37	164	74	78	74
query38	2935	2860	2819	2819
query39	904	851	827	827
query40	209	127	126	126
query41	56	53	58	53
query42	120	102	108	102
query43	582	549	545	545
query44	1180	739	740	739
query45	198	164	165	164
query46	1105	705	745	705
query47	1902	1758	1748	1748
query48	376	300	303	300
query49	871	415	453	415
query50	798	389	397	389
query51	6940	6892	6749	6749
query52	109	92	92	92
query53	354	289	295	289
query54	942	450	433	433
query55	74	73	69	69
query56	287	256	279	256
query57	1138	1032	1024	1024
query58	243	240	249	240
query59	3650	3501	3313	3313
query60	291	267	276	267
query61	90	93	89	89
query62	597	455	445	445
query63	317	285	283	283
query64	8909	2258	1743	1743
query65	3266	3124	3124	3124
query66	826	325	350	325
query67	15634	14932	15071	14932
query68	8452	542	602	542
query69	736	539	398	398
query70	1195	1097	1155	1097
query71	510	277	271	271
query72	8893	5166	5932	5166
query73	1586	326	323	323
query74	5894	5575	5519	5519
query75	5217	2648	2640	2640
query76	5085	963	906	906
query77	767	305	301	301
query78	10573	10188	9786	9786
query79	6727	509	516	509
query80	1943	462	469	462
query81	557	232	220	220
query82	781	105	101	101
query83	293	166	166	166
query84	265	86	88	86
query85	1320	280	267	267
query86	399	325	302	302
query87	3268	3094	3095	3094
query88	4415	2462	2444	2444
query89	504	427	389	389
query90	1936	190	188	188
query91	127	99	99	99
query92	57	49	48	48
query93	5119	521	516	516
query94	1155	192	188	188
query95	416	310	317	310
query96	616	274	265	265
query97	3224	3042	3057	3042
query98	221	203	200	200
query99	1145	872	867	867
Total cold run time: 292436 ms
Total hot run time: 173762 ms

@doris-robot

ClickBench: Total hot run time: 30.74 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d5c6ccff47a5793398184b22c6a66748c6976e09, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.22	0.04	0.04
query4	1.68	0.07	0.07
query5	0.49	0.51	0.50
query6	1.13	0.73	0.72
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.55	0.50	0.49
query10	0.54	0.54	0.55
query11	0.16	0.12	0.11
query12	0.14	0.12	0.12
query13	0.59	0.58	0.58
query14	0.76	0.79	0.79
query15	0.83	0.80	0.82
query16	0.36	0.37	0.36
query17	1.03	1.03	0.98
query18	0.22	0.26	0.22
query19	1.81	1.68	1.72
query20	0.01	0.01	0.01
query21	15.41	0.65	0.64
query22	4.32	8.01	1.99
query23	18.26	1.45	1.32
query24	2.08	0.22	0.24
query25	0.16	0.08	0.09
query26	0.27	0.18	0.17
query27	0.09	0.08	0.09
query28	13.19	1.02	1.03
query29	12.62	3.28	3.29
query30	0.25	0.06	0.06
query31	2.86	0.40	0.38
query32	3.29	0.48	0.50
query33	2.89	2.93	2.93
query34	17.28	4.41	4.45
query35	4.48	4.49	4.50
query36	0.64	0.47	0.48
query37	0.18	0.16	0.16
query38	0.16	0.15	0.14
query39	0.04	0.03	0.03
query40	0.17	0.14	0.14
query41	0.10	0.04	0.05
query42	0.06	0.04	0.07
query43	0.04	0.04	0.04
Total cold run time: 109.55 s
Total hot run time: 30.74 s

@gavinchou gavinchou left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jun 24, 2024
@github-actions

PR approved by at least one committer and no changes requested.

@github-actions

PR approved by anyone and no changes requested.

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit ea39daa into apache:master Jun 25, 2024
liaoxin01 added a commit that referenced this pull request Jun 25, 2024
dataroaring pushed a commit that referenced this pull request Jun 26, 2024
Userwhite pushed a commit to Userwhite/incubator-doris that referenced this pull request Jul 12, 2025
…e better http performance

* fix some bug for partition

* fix for thrift

* fix the thrift exit bug

* Revert "[feat](http): use async reply to provide better http performance

* ensure free order

* [Fix](stream-load) Fix stream load stuck under high concurrency (apache#36772)

See merge request: !740"
Revert commit d9e74efa762c8161a5ca3df4290bbd0ab896f1ef

See merge request: !745"
Revert commit 396cb2ec7e0b1a21bc0d7424c627f0d9321884bc
dataroaring pushed a commit to Userwhite/incubator-doris that referenced this pull request Jul 15, 2025
Labels

approved Indicates a PR has been approved by one committer. dev/3.0.0-merged reviewed
