[connector] pass Fluss schema to lake writer#1192
Conversation
luoyuxia
left a comment
There was a problem hiding this comment.
@cwang9208 Thanks for the pr. Left minor comments. Otherwise, LGTM!
| @Nullable | ||
| String partition(); | ||
|
|
||
| Schema schema(); |
There was a problem hiding this comment.
nit:
add java doc for this method
There was a problem hiding this comment.
Do we need to consider compatibility for example use default Optional<Schema> schema() {return Optional.empty()}
There was a problem hiding this comment.
I prefer not to keep code clean since currently it still for inner use in some degree.
There was a problem hiding this comment.
Sorry I forgot it. Fixed.
There was a problem hiding this comment.
I don't think there will be any compatibility issues. The tiering reader will automatically fill the WriterInitContext struct with the Fluss schema and it depends on the connector to decide whether to use it or not. For now, Paimon doesn't even touch this field.
There was a problem hiding this comment.
The compatibility issue actually exists, WriterInitContext is a public API with annotation WriterInitContext, add a method in this API is an compatibility-breaking change, imaging this case, user implement their own xxLakeWriter base on Fluss 0.7 API. And once they bump fluss version from 0.7 to 0.8, they need to adjust their xxLakeWriter implementation and build fluss-lake-xx.jar for 0.8 instead of using existing fluss-lake-xx.jar of 0.7.
But as there should not have this user case at this moment and 0.8 will be a apache version, I agree to keep code clean and introduce this method here.
There was a problem hiding this comment.
Noted. Thanks for the clarification! @leonardBang
|
|
||
| @Override | ||
| public com.alibaba.fluss.metadata.Schema schema() { | ||
| return null; |
There was a problem hiding this comment.
nit: throw unsupportException.
There was a problem hiding this comment.
Thanks for suggestion. Fixed.
Purpose
Use the schema in Fluss as the single source of truth for the lake writers to avoid any inconsistency.
Moreover, we sometimes need the Fluss schema to pick the correct field writers when creating lake writer. For example, currently the LocalZonedTimestampType and TimestampType type in Fluss map to the same Arrow type:
https://github.com/alibaba/fluss/blob/main/fluss-common/src/main/java/com/alibaba/fluss/utils/ArrowUtils.java#L476
https://github.com/alibaba/fluss/blob/main/fluss-common/src/main/java/com/alibaba/fluss/utils/ArrowUtils.java#L489
When we tier the table to a data lake with Arrow schema, we are not able to choose the appropriate type of filed writers.
Brief change log
Add the schema in Fluss to WriterInitContext.
Tests
n/a
API and Format
n/a
Documentation
n/a