|
| 1 | +# TPC-H 1TB Benchmark |
| 2 | + |
| 3 | +### Versions |
| 4 | +- Blaze version: [4.0.0](https://github.com/blaze-init/blaze/tree/v4.0.0) |
| 5 | +- Vanilla Spark version: spark-3.5.1 |
| 6 | + |
| 7 | +### Environment |
| 8 | +Hadoop 2.10.2 in cluster mode, running on 7 nodes. See [Kwai server conf](./kwai1-hardware-conf.md). |
| 9 | +Java version: 1.8.0_102. |
| 10 | + |
| 11 | +### Configuration |
| 12 | + |
| 13 | +Common configurations: |
| 14 | +```conf |
| 15 | +spark.master yarn |
| 16 | +spark.shuffle.service.enabled true |
| 17 | +spark.shuffle.service.port 7337 |
| 18 | +
|
| 19 | +spark.driver.memory 20g |
| 20 | +spark.driver.memoryOverhead 4096 |
| 21 | +
|
| 22 | +spark.executor.instances 10000 |
| 22 | +spark.dynamicAllocation.maxExecutors 10000 |
| 24 | +
|
| 25 | +spark.io.compression.codec lz4 |
| 26 | +spark.sql.parquet.compression.codec zstd |
| 27 | +
|
| 28 | +# enabled in spark 3.5 by default |
| 29 | +spark.sql.optimizer.runtime.bloomFilter.enabled true |
| 30 | +
|
| 30 | +# enable HashJoin for small tables, which is faster in both Spark and Blaze |
| 31 | +# note: SortMergeJoin is still used for joining big tables when this setting is applied |
| 33 | +spark.sql.join.preferSortMergeJoin false |
| 34 | +``` |
| 35 | + |
| 35 | +Configurations for vanilla Spark: |
| 37 | +```conf |
| 38 | +spark.executor.memory 4g |
| 39 | +spark.executor.memoryOverhead 2048 |
| 40 | +spark.executor.cores 5 |
| 41 | +``` |
| 42 | + |
| 42 | +Configurations for Blaze: |
| 44 | + |
| 44 | +Note: this configuration is widely used in the production environment at Kuaishou Inc. and contains no benchmark-only optimizations. For example, setting `spark.blaze.forceShuffledHashJoin true` forces HashJoin instead of SortMergeJoin and gives much faster benchmark results, but it is unacceptable in a production environment. |
| 46 | + |
| 47 | +```conf |
| 48 | +spark.executor.memory 3g |
| 49 | +spark.executor.memoryOverhead 3072 |
| 50 | +spark.blaze.enable true |
| 51 | +spark.sql.extensions org.apache.spark.sql.blaze.BlazeSparkSessionExtension |
| 52 | +spark.shuffle.manager org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager |
| 53 | +``` |
| 54 | + |
| 54 | +### Benchmark result |
| 55 | +Blaze achieves a 2.12x speedup in total query time compared with Spark 3.5, using the same CPU and memory resources. |
| 57 | + |
| 58 | + |
| 59 | + |
| 59 | +| Query | Spark | Blaze | Speedup |
| 61 | +| --- | -------- | ------- | ------- | |
| 62 | +| q01 | 40.473 | 19.834 | 2.04 | |
| 63 | +| q02 | 20.527 | 11.639 | 1.76 | |
| 64 | +| q03 | 69.091 | 31.199 | 2.21 | |
| 65 | +| q04 | 59.58 | 16.585 | 3.59 | |
| 66 | +| q05 | 100.958 | 52.267 | 1.93 | |
| 67 | +| q06 | 26.713 | 7.928 | 3.37 | |
| 68 | +| q07 | 64.729 | 28.175 | 2.30 | |
| 69 | +| q08 | 64.465 | 35.043 | 1.84 | |
| 70 | +| q09 | 103.011 | 53.203 | 1.94 | |
| 71 | +| q10 | 46.543 | 21.805 | 2.13 | |
| 72 | +| q11 | 16.458 | 8.561 | 1.92 | |
| 73 | +| q12 | 26.626 | 13.784 | 1.93 | |
| 74 | +| q13 | 53.072 | 15.445 | 3.44 | |
| 75 | +| q14 | 31.561 | 9.279 | 3.40 | |
| 76 | +| q15 | 59.57 | 19.212 | 3.10 | |
| 77 | +| q16 | 14.533 | 5.944 | 2.44 | |
| 78 | +| q17 | 141.243 | 54.49 | 2.59 | |
| 79 | +| q18 | 129.022 | 79.808 | 1.62 | |
| 80 | +| q19 | 19.561 | 10.149 | 1.93 | |
| 81 | +| q20 | 42.451 | 15.934 | 2.66 | |
| 82 | +| q21 | 177.553 | 107.276 | 1.66 | |
| 83 | +| q22 | 17.429 | 8.244 | 2.11 | |
| 84 | +| | | | | |
| 85 | +| sum | 1325.169 | 625.804 | 2.12 | |
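
The Speedup column appears to be the Spark time divided by the Blaze time, rounded to two decimals, and the headline 2.12x figure matches the ratio of the two totals in the `sum` row. The following is a minimal sketch of that arithmetic, not the benchmark's own tooling; the helper name `speedup` and the choice of three sample rows are illustrative assumptions.

```python
# Timings copied from three rows of the table above (same units as the table).
spark_times = {"q01": 40.473, "q04": 59.58, "q18": 129.022}
blaze_times = {"q01": 19.834, "q04": 16.585, "q18": 79.808}


def speedup(spark: float, blaze: float) -> float:
    """Hypothetical helper: ratio of Spark time to Blaze time, rounded to 2 decimals."""
    return round(spark / blaze, 2)


# Per-query speedup for the sampled rows.
per_query = {q: speedup(spark_times[q], blaze_times[q]) for q in spark_times}

# Overall speedup from the totals in the "sum" row.
overall = speedup(1325.169, 625.804)

print(per_query)
print(overall)
```

Because the overall figure is a ratio of totals rather than an average of per-query ratios, long-running queries such as q21 and q18 weigh more heavily in it than short ones.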