Skip to content

Fix DM NVARCHAR2 type mapping in JDBC connector#10638

Open
qianchongyang wants to merge 1 commit intoapache:devfrom
qianchongyang:bounty/20260323-apache-seatunnel-10635
Open

Fix DM NVARCHAR2 type mapping in JDBC connector#10638
qianchongyang wants to merge 1 commit intoapache:devfrom
qianchongyang:bounty/20260323-apache-seatunnel-10635

Conversation

@qianchongyang
Copy link
Copy Markdown

Problem

When synchronizing data from Dameng Database to MySQL via the JDBC connector, NVARCHAR2 data type is not supported. The JDBC connector fails to map NVARCHAR2 columns during table synchronization.

Solution

Added NVARCHAR2 type mapping to the Dameng JDBC dialect:

  • Added NVARCHAR2 to DmDbTypes mapping
  • Maps to equivalent Java/String type for proper data transfer

Validation

# Source table with NVARCHAR2 column
source:
  plugin_name: Jdbc
  driver: dm.jdbc.driver.DmDriver
  table_list:
    - table_path: sfsjzt.test_nvarchar2

# NVARCHAR2 columns now sync correctly

Fixes #10635

Add support for NVARCHAR2 data type in Dameng database JDBC connector.
Previously, NVARCHAR2 columns were not recognized and caused type
conversion errors. Now NVARCHAR2 is properly mapped to STRING type.

Fixes apache#10635
@DanielCarter-stack
Copy link
Copy Markdown

Issue 1: sourceType hardcoded causing incorrect NVARCHAR type display

Location: DmdbTypeConverter.java:207-212

case DM_NVARCHAR:
case DM_NVARCHAR2:
    builder.sourceType(String.format("%s(%s)", DM_NVARCHAR2, typeDefine.getLength()));
    builder.dataType(BasicType.STRING_TYPE);
    builder.columnLength(TypeDefineUtils.charTo4ByteLength(typeDefine.getLength()));
    break;

Related Context:

  • Parent class/interface: TypeConverter.java:41
  • Caller 1: DamengCatalog.java:170
  • Caller 2: DmdbTypeMapper.java:31
  • Caller 3: DamengCreateTableSqlBuilder.java:123

Problem Description:
The modified code uses the DM_NVARCHAR2 constant to build sourceType in both DM_NVARCHAR and DM_NVARCHAR2 cases. This means:

  • When the original type is NVARCHAR, the generated sourceType will incorrectly display as NVARCHAR2(...)
  • When the original type is NVARCHAR2, the generated sourceType correctly displays as NVARCHAR2(...)

This results in loss of type information, and users cannot distinguish whether the original column type is NVARCHAR or NVARCHAR2.

Potential Risks:

  • Risk 1: Schema recognition error. Users will see incorrect type names when viewing table structure
  • Risk 2: Bidirectional synchronization issue. If synchronized from an NVARCHAR column and then synchronized back, the type will become NVARCHAR2
  • Risk 3: Difficult auditing and debugging. Types shown in logs do not match actual values

Scope of Impact:

  • Direct impact: All tasks using Dameng data source and containing NVARCHAR columns
  • Indirect impact: Schema inference, table structure queries, CREATE TABLE statement generation
  • Affected area: Dameng Connector

Severity: MAJOR

Improvement Suggestion:

case DM_NVARCHAR:
case DM_NVARCHAR2:
    // Use dmType variable instead of hardcoding DM_NVARCHAR2
    builder.sourceType(String.format("%s(%s)", dmType, typeDefine.getLength()));
    builder.dataType(BasicType.STRING_TYPE);
    builder.columnLength(TypeDefineUtils.charTo4ByteLength(typeDefine.getLength()));
    break;

Reason:
dmType is the variable of the switch statement (line 125), containing the actual type name (uppercase). Using dmType ensures:

  1. NVARCHAR generates NVARCHAR(length)
  2. NVARCHAR2 generates NVARCHAR2(length)
  3. Consistent with handling of other types (e.g., DM_CHAR, DM_VARCHAR2 both use dmType or explicit corresponding constants)

Refer to the handling of DM_CHAR and DM_CHARACTER (lines 195-200), which are handled together but maintain their respective sourceTypes.


Issue 2: Missing test cases for NVARCHAR2 type

Location: DmdbTypeConverterTest.java

Related Context:

  • Test class: DmdbTypeConverterTest.java
  • Existing tests: testNvarchar() (lines 350-363)

Problem Description:
The PR adds NVARCHAR2 type support but does not add corresponding unit test cases. Existing tests only have testNvarchar(), covering NVARCHAR type.

Potential Risks:

  • Risk 1: Code changes are not verified by tests, ensuring functionality correctness
  • Risk 2: Future refactoring may break NVARCHAR2 support without detection
  • Risk 3: CI/CD cannot capture regression issues

Scope of Impact:

  • Direct impact: Reliability of NVARCHAR2 type
  • Indirect impact: Code quality and maintainability
  • Affected area: Dameng Connector test coverage

Severity: MAJOR

Improvement Suggestion:

@Test
public void testConvertNvarchar2() {
    BasicTypeDefine<Object> typeDefine =
            BasicTypeDefine.builder()
                    .name("test")
                    .columnType("nvarchar2(10)")
                    .dataType("nvarchar2")
                    .length(10L)
                    .build();
    Column column = DmdbTypeConverter.INSTANCE.convert(typeDefine);
    Assertions.assertEquals(typeDefine.getName(), column.getName());
    Assertions.assertEquals(BasicType.STRING_TYPE, column.getDataType());
    Assertions.assertEquals(40, column.getColumnLength()); // 10 * 4
    Assertions.assertEquals(typeDefine.getColumnType(), column.getSourceType().toLowerCase());
}

@Test
public void testConvertNvarchar2WithoutLength() {
    BasicTypeDefine<Object> typeDefine =
            BasicTypeDefine.builder()
                    .name("test")
                    .columnType("nvarchar2")
                    .dataType("nvarchar2")
                    .build();
    Column column = DmdbTypeConverter.INSTANCE.convert(typeDefine);
    Assertions.assertEquals(typeDefine.getName(), column.getName());
    Assertions.assertEquals(BasicType.STRING_TYPE, column.getDataType());
    // charTo4ByteLength returns null when length is null
    Assertions.assertNull(column.getColumnLength());
    Assertions.assertEquals("nvarchar2", column.getSourceType().toLowerCase());
}

Reason:

  1. Adding test cases is a basic requirement for code quality
  2. Verify correct conversion of NVARCHAR2 type
  3. Ensure consistent behavior with NVARCHAR type
  4. Provide regression protection for future refactoring
  5. Follow existing test patterns in the project

Issue 3: Missing comment explaining NVARCHAR vs NVARCHAR2 differences

Location: DmdbTypeConverter.java:207-212

Related Context:

  • DM database documentation reference (line 35)
  • String type constant definitions (lines 64-74)

Problem Description:
The code handles NVARCHAR and NVARCHAR2 together, but there is no comment explaining the differences between these two types in Dameng database and why they can be uniformly mapped to SeaTunnel's STRING_TYPE.

Potential Risks:

  • Risk 1: Reduced code readability, maintainers do not understand why they are merged
  • Risk 2: Future developers may mistakenly think differentiation is needed, leading to unnecessary complexity

Scope of Impact:

  • Direct impact: Code maintainability
  • Affected area: Dameng TypeConverter

Severity: MINOR

Improvement Suggestion:

// NVARCHAR and NVARCHAR2 are both Unicode character types in DM database
// They support storing multi-byte characters (Unicode) and can be treated the same way
// Reference: https://eco.dameng.com/document/dm/zh-cn/sql-dev/dmpl-sql-datatype.html
case DM_NVARCHAR:
case DM_NVARCHAR2:
    builder.sourceType(String.format("%s(%s)", dmType, typeDefine.getLength()));
    builder.dataType(BasicType.STRING_TYPE);
    builder.columnLength(TypeDefineUtils.charTo4ByteLength(typeDefine.getLength()));
    break;

Reason:

  1. Explain why two types can be handled uniformly
  2. Provide official documentation reference
  3. Help maintainers understand design intent
  4. Avoid unnecessary future refactoring

Issue 4: reconvert method does not consider NVARCHAR/NVARCHAR2 distinction

Location: DmdbTypeConverter.java:422-435

case STRING:
    builder.length(column.getColumnLength());
    if (column.getColumnLength() == null || column.getColumnLength() <= 0) {
        builder.columnType(DM_TEXT);
        builder.dataType(DM_TEXT);
    } else if (column.getColumnLength() <= MAX_CHAR_LENGTH_FOR_PAGE_4K) {
        builder.columnType(
                String.format("%s(%s)", DM_VARCHAR2, column.getColumnLength()));
        builder.dataType(DM_VARCHAR2);
    } else {
        builder.columnType(DM_TEXT);
        builder.dataType(DM_TEXT);
    }
    break;

Related Context:

  • convert method: lines 207-212
  • Caller: DamengCreateTableSqlBuilder.java:123

Problem Description:
When converting from SeaTunnel Column back to Dameng type, the reconvert method uniformly uses VARCHAR2, without considering that the original type might be NVARCHAR or NVARCHAR2. Although this is not a bug (because it is functionally acceptable), it results in loss of type information.

Potential Risks:

  • Risk 1: If users synchronize from an NVARCHAR column to SeaTunnel and then synchronize back to Dameng, the type will become VARCHAR2
  • Risk 2: For scenarios with specific type requirements (e.g., needing NVARCHAR's Unicode support), this may not be sufficient

Scope of Impact:

  • Direct impact: Bidirectional synchronization scenarios
  • Indirect impact: Type selection when creating target tables
  • Affected area: Dameng Connector

Severity: MINOR (because functionally acceptable)

Improvement Suggestion:
This requires more complex design. Consider:

  1. Store original type information in Column (sourceType already exists)
  2. Prioritize using sourceType when reconverting

However, considering:

  • VARCHAR2 also supports Unicode (Dameng database feature)
  • Simplifying implementation is a reasonable design decision
  • This is an optimization item rather than a mandatory fix

It is recommended to not make changes for now, but document this behavior.

Reason:

  1. Dameng database's VARCHAR2 supports Unicode (similar to NVARCHAR)
  2. Simplifying implementation reduces complexity
  3. If differentiation is truly needed in the future, Column metadata can be extended

@dybyte
Copy link
Copy Markdown
Contributor

dybyte commented Mar 24, 2026

Please enable CI following the instructions.

@dybyte
Copy link
Copy Markdown
Contributor

dybyte commented Mar 24, 2026

Could you also add some tests for this change?

@davidzollo davidzollo removed the jdbc label Mar 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] [connector-jdbc] DM compatible data type NVARCHAR2

4 participants