Skip to content

[connector] pass Fluss schema to lake writer#1192

Merged
luoyuxia merged 2 commits intoapache:mainfrom
xx789633:lake_schema
Jun 25, 2025
Merged

[connector] pass Fluss schema to lake writer#1192
luoyuxia merged 2 commits intoapache:mainfrom
xx789633:lake_schema

Conversation

@xx789633
Copy link
Copy Markdown
Contributor

@xx789633 xx789633 commented Jun 25, 2025

Purpose

Use the schema in Fluss as the single source of truth for the lake writers to avoid any inconsistency.

Moreover, we sometimes need the Fluss schema to pick the correct field writers when creating lake writer. For example, currently the LocalZonedTimestampType and TimestampType type in Fluss map to the same Arrow type:
https://github.com/alibaba/fluss/blob/main/fluss-common/src/main/java/com/alibaba/fluss/utils/ArrowUtils.java#L476
https://github.com/alibaba/fluss/blob/main/fluss-common/src/main/java/com/alibaba/fluss/utils/ArrowUtils.java#L489

When we tier the table to a data lake with Arrow schema, we are not able to choose the appropriate type of filed writers.

Brief change log

Add the schema in Fluss to WriterInitContext.

Tests

n/a

API and Format

n/a

Documentation

n/a

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Jun 25, 2025

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwang9208 Thanks for the pr. Left minor comments. Otherwise, LGTM!

@Nullable
String partition();

Schema schema();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
add java doc for this method

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to consider compatibility for example use default Optional<Schema> schema() {return Optional.empty()}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to keep code clean since currently it still for inner use in some degree.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I forgot it. Fixed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there will be any compatibility issues. The tiering reader will automatically fill the WriterInitContext struct with the Fluss schema and it depends on the connector to decide whether to use it or not. For now, Paimon doesn't even touch this field.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compatibility issue actually exists, WriterInitContext is a public API with annotation WriterInitContext, add a method in this API is an compatibility-breaking change, imaging this case, user implement their own xxLakeWriter base on Fluss 0.7 API. And once they bump fluss version from 0.7 to 0.8, they need to adjust their xxLakeWriter implementation and build fluss-lake-xx.jar for 0.8 instead of using existing fluss-lake-xx.jar of 0.7.
But as there should not have this user case at this moment and 0.8 will be a apache version, I agree to keep code clean and introduce this method here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. Thanks for the clarification! @leonardBang


@Override
public com.alibaba.fluss.metadata.Schema schema() {
return null;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: throw unsupportException.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggestion. Fixed.

@luoyuxia luoyuxia merged commit 32c066f into apache:main Jun 25, 2025
4 checks passed
polyzos pushed a commit to polyzos/fluss that referenced this pull request Aug 30, 2025
polyzos pushed a commit to Alibaba-HZY/fluss that referenced this pull request Aug 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants