Support Add Column in Fluss. by loserwang1024 · Pull Request #2010 · apache/fluss

loserwang1024 · 2025-11-24T02:09:12Z

Purpose

Linked issue: close #2056

Brief change log

Tests

API and Format

Documentation

fluss-rpc/src/main/proto/FlussApi.proto

fluss-server/src/main/java/org/apache/fluss/server/coordinator/MetadataManager.java

fluss-client/src/main/java/org/apache/fluss/client/admin/FlussAdmin.java

wuchong · 2025-11-24T09:05:26Z

fluss-common/src/main/java/org/apache/fluss/metadata/Schema.java


        public Column(String columnName, DataType dataType) {
-            this(columnName, dataType, null);
+            this(columnName, dataType, null, UNKNOWN_COLUMN_ID);


We should be able to derive the column ids for a previous schema from the column order. Using -1 as the default column id is error-prone.

loserwang1024 (Nov 26, 2025): Yes, I would like to do that too. But the constructor Column(String columnName, DataType dataType) is public API, so I cannot remove it. If it does not set -1, I don't know how else to handle it.
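The suggestion above (derive ids from the column order instead of leaving -1 in place) could look roughly like the following. This is only a sketch under stated assumptions: the class and the method `resolveColumnIds` are hypothetical and are not part of Fluss; only the `UNKNOWN_COLUMN_ID` sentinel comes from the diff, and its value of -1 is taken from the review comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not the Fluss Schema implementation. */
final class ColumnIdAssignment {
    /** Sentinel for "no id assigned yet" (assumed to be -1, as in the review comment). */
    static final int UNKNOWN_COLUMN_ID = -1;

    /**
     * If every column id is unknown, assign ids by column position.
     * If every id is already set, keep them. A mix is rejected, because
     * position-based ids could collide with the explicit ones.
     */
    public static List<Integer> resolveColumnIds(List<Integer> declaredIds) {
        long unknown = declaredIds.stream().filter(id -> id == UNKNOWN_COLUMN_ID).count();
        if (unknown == 0) {
            return new ArrayList<>(declaredIds);
        }
        if (unknown != declaredIds.size()) {
            throw new IllegalArgumentException(
                    "Column ids must be either all set or all unknown: " + declaredIds);
        }
        List<Integer> resolved = new ArrayList<>(declaredIds.size());
        for (int position = 0; position < declaredIds.size(); position++) {
            resolved.add(position);
        }
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(resolveColumnIds(Arrays.asList(-1, -1, -1))); // prints [0, 1, 2]
    }
}
```

With this shape, the two-argument constructor can stay public while the sentinel is resolved once, when the schema is built.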

wuchong · 2025-11-24T11:23:27Z

fluss-server/src/main/java/org/apache/fluss/server/zk/ZooKeeperClient.java

    /** Register schema to ZK metadata and return the schema id. */
    public int registerSchema(TablePath tablePath, Schema schema) throws Exception {
-        int currentSchemaId = getCurrentSchemaId(tablePath);
+        return registerSchema(tablePath, schema, getCurrentSchemaId(tablePath) + 1);


This is creating a new table, so can we directly use schema id = 1 here?

loserwang1024 (Nov 27, 2025): Sometimes the schema is recreated during table creation.
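To illustrate why "current id + 1" behaves differently from a hard-coded 1 when a schema is registered more than once, here is a minimal in-memory sketch. It is not the ZooKeeperClient code: the class `SchemaIdRegistry` is hypothetical, table paths are plain strings, and it assumes that a table with no registered schema reports a current schema id of 0.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the schema id bookkeeping; not Fluss code. */
final class SchemaIdRegistry {
    private final Map<String, Integer> currentSchemaIds = new HashMap<>();

    /** Assumption: a table without any registered schema reports id 0. */
    public int getCurrentSchemaId(String tablePath) {
        return currentSchemaIds.getOrDefault(tablePath, 0);
    }

    /** Registers a schema under the next id and returns that id. */
    public int registerSchema(String tablePath) {
        int nextSchemaId = getCurrentSchemaId(tablePath) + 1;
        currentSchemaIds.put(tablePath, nextSchemaId);
        return nextSchemaId;
    }
}
```

Under that assumption, the first registration for a table yields 1, and a repeated registration yields 2 instead of silently reusing 1.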

...client/src/main/java/org/apache/fluss/client/table/scanner/batch/KvSnapshotBatchScanner.java

fluss-common/src/main/java/org/apache/fluss/metadata/TableChange.java

fluss-common/src/main/java/org/apache/fluss/metadata/TableInfo.java

fluss-common/src/main/java/org/apache/fluss/record/LogRecordBatch.java

fluss-common/src/main/java/org/apache/fluss/record/LogRecordReadContext.java

fluss-common/src/main/java/org/apache/fluss/row/encode/ValueDecoder.java

fluss-common/src/main/java/org/apache/fluss/row/PruneRow.java

fluss-client/src/main/java/org/apache/fluss/client/FlussConnection.java

fluss-common/src/main/java/org/apache/fluss/record/DefaultLogRecordBatch.java

…and connector (apache#2010)

…ration
1. Make lookup requests async.
2. Introduce ServerProjectionCache to share ProjectionInfo in TabletServer level.
3. Revert FlinkAsFlussRow changes, and introduce PaddingRow for the padding null columns for schema changes.
4. Make PutKv zero-copy to persist kv records.
5. Rename SchemaMetadataManager to ServerSchemaCache and use TableId instead of TablePath to track schemas which is safer.
6. Revert source related changes to still include the projectedFieldIndexes parameter.
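Item 3 of the commit message mentions a PaddingRow that pads null columns after a schema change. The PR's actual class is not shown here, so the following is only a guess at the general idea: a wrapper that exposes a row written with fewer columns as if it had the newer, wider schema, returning null for the appended columns. The class name, the `Object[]` row representation, and the method names are all assumptions made for illustration.

```java
/** Hypothetical sketch of null padding for rows written under an older schema. */
final class PaddingRowSketch {
    private final Object[] underlying;   // values written under the old schema
    private final int targetFieldCount;  // number of fields in the newer schema

    public PaddingRowSketch(Object[] underlying, int targetFieldCount) {
        if (targetFieldCount < underlying.length) {
            throw new IllegalArgumentException(
                    "Target schema must not have fewer fields than the row");
        }
        this.underlying = underlying;
        this.targetFieldCount = targetFieldCount;
    }

    public int getFieldCount() {
        return targetFieldCount;
    }

    /** Returns the stored value, or null for columns appended after the row was written. */
    public Object getField(int pos) {
        if (pos < 0 || pos >= targetFieldCount) {
            throw new IndexOutOfBoundsException("Field position out of range: " + pos);
        }
        return pos < underlying.length ? underlying[pos] : null;
    }

    public boolean isNullAt(int pos) {
        return getField(pos) == null;
    }
}
```

Because the wrapper only delegates and pads, the original row is not copied, which fits the case where columns can only be appended at the end.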

wuchong

I squashed and pushed a commit to improve the implementations. The change details are listed in the commit message.

Waiting for CI to pass.

wuchong · 2025-11-28T10:05:21Z

fluss-common/src/main/java/org/apache/fluss/record/FileLogProjection.java

+        Map<Integer, Integer> columnIdPositions = new HashMap<>();
+        List<Integer> columnIds = schema.getColumnIds();
+        for (int i = 0; i < columnIds.size(); i++) {
+            columnIdPositions.put(columnIds.get(i), i);
+        }
+
+        int prev = -1;
+        int[] selectedFieldPositions = new int[projectedFields.length];
+        for (int i = 0; i < projectedFields.length; i++) {
+            int fieldId = projectedFields[i];
+            Integer position = columnIdPositions.get(fieldId);
+            if (position == null) {
+                throw new InvalidColumnProjectionException(
+                        String.format(
+                                "Projected field id %s is not contains in %s", fieldId, columnIds));
+            }
+
+            selectedFieldPositions[i] = position;
+            if (position < prev) {
+                throw new InvalidColumnProjectionException(
+                        "The projection indexes should be in field order, but is "
+                                + Arrays.toString(projectedFields));
+            }
+
+            prev = position;
+        }
+        return selectedFieldPositions;


We can remove this for now: it may hurt projection performance, and we don't need the remapping because we only support adding columns at the end. We can rename the method to toSelectedFieldPositions().
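If columns can only be appended at the end, a projected field id and its position coincide, so the HashMap lookup in the hunk above can be dropped and only the range and ordering checks remain. A minimal sketch of that simplification follows. It is not the code that was merged: the signature is an assumption, and IllegalArgumentException stands in for the PR's InvalidColumnProjectionException so the example is self-contained.

```java
import java.util.Arrays;

/** Hypothetical sketch; not the FileLogProjection code from the PR. */
final class ProjectionPositions {
    /**
     * Validates that the projected fields are within range and in field order,
     * then returns them unchanged as positions (no id-to-position remapping).
     */
    public static int[] toSelectedFieldPositions(int fieldCount, int[] projectedFields) {
        int prev = -1;
        for (int position : projectedFields) {
            if (position < 0 || position >= fieldCount) {
                throw new IllegalArgumentException(
                        "Projected field index " + position
                                + " is out of range for " + fieldCount + " fields");
            }
            if (position < prev) {
                throw new IllegalArgumentException(
                        "The projection indexes should be in field order, but is "
                                + Arrays.toString(projectedFields));
            }
            prev = position;
        }
        return projectedFields.clone();
    }
}
```

The check is a single pass over the projection with no per-call map allocation, which is the performance concern raised in the comment.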

wuchong · 2025-11-30T17:48:48Z

fluss-rpc/src/main/proto/FlussApi.proto

+
+message PbModifyColumn{
+  required string column_name = 1;
+  required bytes data_type_json = 2;


This field should be optional. The following ALTER COLUMN statement is also valid:

ALTER TABLE prod.db.sample ALTER COLUMN measurement COMMENT 'unit is kilobytes per second';

wuchong · 2025-11-30T17:50:37Z

...rver/src/main/java/org/apache/fluss/server/coordinator/event/watcher/TableChangeWatcher.java

+            int currentSchemaId = zooKeeperClient.getCurrentSchemaId(tablePath);
+            SchemaInfo schemaInfo;
+            if (schemaId != currentSchemaId) {
+                LOG.warn(
+                        "Schema id {} is not equal to current schema id {}. Skipping schema change processing.",
+                        schemaId,
+                        currentSchemaId);
+                return;
+            }


Why check the current schema id? This is a heavy operation. Even if this is an old schema, I think it is still fine to process it.

wuchong reviewed Nov 24, 2025

loserwang1024 force-pushed the poc-schema-change branch 3 times, most recently from 279a97d to 2ade0ce, on November 27, 2025 02:19

wuchong reviewed Nov 28, 2025

loserwang1024 force-pushed the poc-schema-change branch 5 times, most recently from d490cb1 to a475aee, on November 30, 2025 07:39

loserwang1024 changed the title ~~[DRAFT] Fluss Support Schema evolution.~~ Support Add Column in Fluss. Nov 30, 2025

loserwang1024 force-pushed the poc-schema-change branch from 5829700 to 9742f89 on November 30, 2025 10:01

loserwang1024 requested a review from wuchong November 30, 2025 15:05

loserwang1024 and others added 2 commits December 1, 2025 21:16

[server][client] Support Schema Evolution (ADD COLUMN LAST) in Fluss and connector (apache#2010) · 6d5de26

wuchong force-pushed the poc-schema-change branch from f4fb30e to 1046abd on December 1, 2025 13:26

wuchong approved these changes Dec 1, 2025

wuchong merged commit 07721fe into apache:main Dec 2, 2025
8 of 9 checks passed

wuchong pushed a commit that referenced this pull request Dec 2, 2025

[server][client] Support Schema Evolution (ADD COLUMN LAST) in Fluss and connector (#2010) · ea2d61c
