Commit 2812843

richox and zhangli20 authored
doc: update tpc-h benchmark result (#614)
Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
1 parent 427d084 commit 2812843

6 files changed, +89 -207 lines changed

README.md

Lines changed: 4 additions & 6 deletions
@@ -144,15 +144,13 @@ spark.sql.adaptive.localShuffleReader.enabled false

 ## Performance

-Check [Benchmark Results](./benchmark-results/20240701-blaze300.md) with the latest date for the performance
-comparison with vanilla Spark 3.3.3. The benchmark result shows that Blaze save about 50% time on TPC-DS/TPC-H 1TB datasets.
-Stay tuned and join us for more upcoming thrilling numbers.
+Check [TPC-H Benchmark Results](./benchmark-results/tpch.md).
+The latest benchmark result shows that Blaze saves more than 50% of the query time on the TPC-H 1TB dataset compared with vanilla Spark 3.5.

-TPC-DS Query time: ([How can I run TPC-DS benchmark?](./tpcds/README.md))
-![20240701-query-time-tpcds](./benchmark-results/spark-3.3-vs-blaze300-query-time-20240701.png)
+Stay tuned and join us for more upcoming thrilling numbers.

 TPC-H Query time:
-![20240701-query-time-tpch](./benchmark-results/spark-3.3-vs-blaze300-query-time-20240701-tpch.png)
+![tpch-blaze400-spark351.png](./benchmark-results/tpch-blaze400-spark351.png)

 We also encourage you to benchmark Blaze and share the results with us. 🤗

benchmark-results/20240701-blaze300.md

Lines changed: 0 additions & 201 deletions
This file was deleted.
Binary file not shown.
-90.3 KB
Binary file not shown.
44.4 KB

benchmark-results/tpch.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
# TPC-H 1TB Benchmark

### Versions
- Blaze version: [4.0.0](https://github.com/blaze-init/blaze/tree/v4.0.0)
- Vanilla Spark version: spark-3.5.1

### Environment
Hadoop 2.10.2 in cluster mode, running on 7 nodes. See [Kwai server conf](./kwai1-hardware-conf.md).
Java version: 1.8.0_102.

### Configuration

Common configurations:
```conf
spark.master yarn
spark.shuffle.service.enabled true
spark.shuffle.service.port 7337

spark.driver.memory 20g
spark.driver.memoryOverhead 4096

spark.executor.instances 10000
spark.dynamicAllocation.maxExecutors 10000

spark.io.compression.codec lz4
spark.sql.parquet.compression.codec zstd

# enabled in spark 3.5 by default
spark.sql.optimizer.runtime.bloomFilter.enabled true

# enable HashJoin for small tables, which is faster in both spark and blaze
# note: SortMergeJoin is still used for joining big tables with this configuration enabled
spark.sql.join.preferSortMergeJoin false
```

Configurations for vanilla Spark:
```conf
spark.executor.memory 4g
spark.executor.memoryOverhead 2048
spark.executor.cores 5
```

Configurations for Blaze:

Note: this configuration is widely used in the production environment at Kuaishou Inc., without any benchmark-only optimizations. (For example, setting `spark.blaze.forceShuffledHashJoin true` forces HashJoin instead of SortMergeJoin and yields much faster benchmark results, but this is unacceptable in a production environment.)

```conf
spark.executor.memory 3g
spark.executor.memoryOverhead 3072
spark.blaze.enable true
spark.sql.extensions org.apache.spark.sql.blaze.BlazeSparkSessionExtension
spark.shuffle.manager org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager
```
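
The two executor configurations above split memory differently between heap and overhead, yet they appear to add up to the same per-executor total. The sketch below checks that arithmetic. The helper `to_mib` is hypothetical (it is not part of Spark or Blaze), and it assumes that a bare `memoryOverhead` number is read as MiB, which is how Spark documents that setting.

```python
def to_mib(value):
    """Convert a Spark-style size such as '4g' or '2048' to MiB.

    Assumption: a bare number means MiB, as Spark documents for memoryOverhead.
    Only the 'g' and 'm' suffixes used in this document are handled.
    """
    value = value.strip().lower()
    if value.endswith("g"):
        return int(value[:-1]) * 1024
    if value.endswith("m"):
        return int(value[:-1])
    return int(value)


# spark.executor.memory + spark.executor.memoryOverhead for each setup
vanilla_total = to_mib("4g") + to_mib("2048")
blaze_total = to_mib("3g") + to_mib("3072")

print(f"vanilla Spark executor: {vanilla_total} MiB")
print(f"Blaze executor:         {blaze_total} MiB")
```

Both totals come to 6144 MiB (6 GiB) per executor. Blaze simply moves 1 GiB from the JVM heap to overhead.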

### Benchmark result
Blaze shows a 2.12x query-time speedup compared with Spark 3.5, using the same CPU/memory resources.

![tpch-blaze400-spark351.png](tpch-blaze400-spark351.png)

| Query | Spark    | Blaze   | Speedup |
| ----- | -------- | ------- | ------- |
| q01   | 40.473   | 19.834  | 2.04    |
| q02   | 20.527   | 11.639  | 1.76    |
| q03   | 69.091   | 31.199  | 2.21    |
| q04   | 59.58    | 16.585  | 3.59    |
| q05   | 100.958  | 52.267  | 1.93    |
| q06   | 26.713   | 7.928   | 3.37    |
| q07   | 64.729   | 28.175  | 2.30    |
| q08   | 64.465   | 35.043  | 1.84    |
| q09   | 103.011  | 53.203  | 1.94    |
| q10   | 46.543   | 21.805  | 2.13    |
| q11   | 16.458   | 8.561   | 1.92    |
| q12   | 26.626   | 13.784  | 1.93    |
| q13   | 53.072   | 15.445  | 3.44    |
| q14   | 31.561   | 9.279   | 3.40    |
| q15   | 59.57    | 19.212  | 3.10    |
| q16   | 14.533   | 5.944   | 2.44    |
| q17   | 141.243  | 54.49   | 2.59    |
| q18   | 129.022  | 79.808  | 1.62    |
| q19   | 19.561   | 10.149  | 1.93    |
| q20   | 42.451   | 15.934  | 2.66    |
| q21   | 177.553  | 107.276 | 1.66    |
| q22   | 17.429   | 8.244   | 2.11    |
|       |          |         |         |
| sum   | 1325.169 | 625.804 | 2.12    |
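
The headline figures can be recomputed from the table. The following is a minimal sketch, not part of the benchmark tooling: it assumes the Speedup column is Spark time divided by Blaze time, and that the overall figure is the ratio of the two column sums. The names `TIMES` and `speedup` are illustrative only.

```python
# (Spark time, Blaze time) per query, copied from the table above.
TIMES = {
    "q01": (40.473, 19.834), "q02": (20.527, 11.639), "q03": (69.091, 31.199),
    "q04": (59.58, 16.585), "q05": (100.958, 52.267), "q06": (26.713, 7.928),
    "q07": (64.729, 28.175), "q08": (64.465, 35.043), "q09": (103.011, 53.203),
    "q10": (46.543, 21.805), "q11": (16.458, 8.561), "q12": (26.626, 13.784),
    "q13": (53.072, 15.445), "q14": (31.561, 9.279), "q15": (59.57, 19.212),
    "q16": (14.533, 5.944), "q17": (141.243, 54.49), "q18": (129.022, 79.808),
    "q19": (19.561, 10.149), "q20": (42.451, 15.934), "q21": (177.553, 107.276),
    "q22": (17.429, 8.244),
}


def speedup(spark_time, blaze_time):
    """Ratio of Spark time to Blaze time; 2.0 means Blaze took half as long."""
    return spark_time / blaze_time


spark_total = sum(s for s, _ in TIMES.values())
blaze_total = sum(b for _, b in TIMES.values())

overall = speedup(spark_total, blaze_total)    # ratio of the column sums
time_saved = 1 - blaze_total / spark_total     # fraction of Spark time avoided

print(f"totals: Spark {spark_total:.3f}, Blaze {blaze_total:.3f}")
print(f"overall speedup: {overall:.2f}x")
print(f"time saved: {time_saved:.1%}")
```

Under these assumptions the script reproduces the `sum` row (1325.169 vs 625.804, 2.12x), which corresponds to roughly 52.8% less total query time. Because the overall figure is a ratio of sums, long-running queries such as q21 and q18 weigh more heavily than they would in a plain average of the per-query speedups.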
