Skip to content

[CH-191] Support generate exec#574

Merged
zzcclp merged 1 commit intoapache:mainfrom
bigo-sg:gluten_191
Dec 13, 2022
Merged

[CH-191] Support generate exec#574
zzcclp merged 1 commit intoapache:mainfrom
bigo-sg:gluten_191

Conversation

@taiyang-li
Copy link
Copy Markdown
Contributor

@taiyang-li taiyang-li commented Nov 17, 2022

What changes were proposed in this pull request?

Support generate exec. close Kyligence/ClickHouse#191

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

test by CH[[194]]

@github-actions
Copy link
Copy Markdown

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[Gluten-${ISSUES_ID}] ${detailed message}

See also:

@taiyang-li taiyang-li force-pushed the gluten_191 branch 2 times, most recently from 0bda556 to 51b8a23 Compare November 17, 2022 06:31
@taiyang-li
Copy link
Copy Markdown
Contributor Author

taiyang-li commented Nov 17, 2022

Spark plan:

0: jdbc:hive2://localhost:10000/> explain formatted  select explode(array(id, id+1)) as elem, cast(id as string) as spark  from range(3);   


+----------------------------------------------------+
|                        plan                        |
+----------------------------------------------------+
| == Physical Plan ==
CHNativeColumnarToRow (6)
+- * ProjectExecTransformer (4)
   +- * GenerateExecTransformer (3)
      +- RowToCHNativeColumnar (2)
         +- * Range (1)


(1) Range [codegen id : 1]
Output [1]: [id#121L]
Arguments: Range (0, 3, step=1, splits=None)

(2) RowToCHNativeColumnar
Input [1]: [id#121L]

(3) GenerateExecTransformer
Input [1]: [id#121L]
Arguments: explode(array(id#121L, (id#121L + 1))), [id#121L], false, [elem#122L]

(4) ProjectExecTransformer
Input [2]: [id#121L, elem#122L]
Arguments: [elem#122L, cast(id#121L as string) AS spark#115]

(5) WholeStageCodegenTransformer (8)
Input [2]: [elem#122L, spark#115]

(6) CHNativeColumnarToRow
Input [2]: [elem#122L, spark#115]

Return result:

0: jdbc:hive2://localhost:10000/> select explode(array(id, id+1)) as elem, cast(id as string) as spark  from range(3);   
+-------+--------+
| elem  | spark  |
+-------+--------+
| 0     | 0      |
| 1     | 0      |
| 1     | 1      |
| 2     | 1      |
| 2     | 2      |
| 3     | 2      |
+-------+--------+

@taiyang-li
Copy link
Copy Markdown
Contributor Author

Native arrayJoin in clickhouse backend is almost 3x faster than Spark explode.

echo "SELECT * FROM generateRandom('a Array(String), b String', 1, 20, 10) limit 100000 FORMAT Parquet"  | clickhouse-client > test_generate_exec.parquet  

> CREATE TEMPORARY VIEW test_generate_exec
USING org.apache.spark.sql.parquet
OPTIONS (
  path  "/data1/liyang/cppproject/gluten/test_generate_exec.parquet"
) ;

> set spark.gluten.sql.columnar.generate=true;
> select count(*) from (select explode(a), b from test_generate_exec);
+-----------+
| count(1)  |
+-----------+
| 500257    |
+-----------+
1 row selected (0.205 seconds)



> set spark.gluten.sql.columnar.generate=false;
> select count(*) from (select explode(a), b from test_generate_exec);  
+-----------+
| count(1)  |
+-----------+
| 500257    |
+-----------+
1 row selected (0.594 seconds)
 

@taiyang-li
Copy link
Copy Markdown
Contributor Author

@rui-mo Can you help review this pr, thanks!

@taiyang-li
Copy link
Copy Markdown
Contributor Author

@rui-mo It it possible running gluten CI/CD with clickhouse version binding to Kyligence/ClickHouse#194 ?

@rui-mo
Copy link
Copy Markdown
Contributor

rui-mo commented Nov 25, 2022

@rui-mo It it possible running gluten CI/CD with clickhouse version binding to Kyligence/ClickHouse#194 ?

Sure. I think it is OK.

import io.glutenproject.expression.ExpressionMappings._

object CHExpressionUtil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove?

Copy link
Copy Markdown
Contributor

@lviiii lviiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@zzcclp zzcclp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zzcclp zzcclp merged commit 4d12a03 into apache:main Dec 13, 2022
RelCommon common = 1;
Rel input = 2;

Expression generator = 3;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this relation be removed in favor of the official Substrait repo's ExpandRel?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GenerateRel and ExpandRel is different. The former is used to represent spark lateral view explode/posexpode, and the latter is used to represent spark grouping sets or with cube

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support explode function

5 participants