feat: add `custom_string_literal_override` to unparser Dialect trait by goldmedal · Pull Request #20590 · apache/datafusion

goldmedal · 2026-02-27T11:46:31Z

Which issue does this PR close?

Closes #.

Rationale for this change

When unparsing queries targeting databases like MSSQL, non-ASCII string literals need special handling. MSSQL requires the N'...' (national string literal) prefix for strings containing Unicode characters. Currently the unparser always emits single-quoted strings with no way for dialects to customize this behavior.

What changes are included in this PR?

Add a new custom_string_literal_override method to the Dialect trait with a default implementation returning None (no override).
Consolidate the Utf8, Utf8View, and LargeUtf8 match arms in scalar_value_to_sql and route them through the new dialect hook.
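The two changes above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code from the PR: `StringScalar`, `DefaultDialect`, `quote`, and `scalar_string_to_sql` are hypothetical stand-ins, and the hook returns an already rendered `String` here, whereas the real method returns `Option<ast::Expr>`. The quote-doubling in `quote` is likewise an assumption of the sketch.

```rust
/// Hypothetical stand-in for the three string variants of `ScalarValue`.
pub enum StringScalar {
    Utf8(Option<String>),
    Utf8View(Option<String>),
    LargeUtf8(Option<String>),
}

/// Simplified stand-in for the unparser `Dialect` trait.
pub trait Dialect {
    /// Returns `None` by default, meaning "no override".
    /// (Assumption: rendered SQL text instead of an AST node.)
    fn custom_string_literal_override(&self, _s: &str) -> Option<String> {
        None
    }
}

/// A dialect that keeps the default behavior.
pub struct DefaultDialect;
impl Dialect for DefaultDialect {}

/// Standard single-quoted literal; embedded quotes are doubled.
pub fn quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', "''"))
}

/// One consolidated match arm covers all three string variants and
/// asks the dialect hook first, falling back to a quoted literal.
pub fn scalar_string_to_sql(dialect: &dyn Dialect, value: &StringScalar) -> String {
    match value {
        StringScalar::Utf8(Some(s))
        | StringScalar::Utf8View(Some(s))
        | StringScalar::LargeUtf8(Some(s)) => dialect
            .custom_string_literal_override(s)
            .unwrap_or_else(|| quote(s)),
        // Null strings are rendered as NULL in this sketch.
        StringScalar::Utf8(None)
        | StringScalar::Utf8View(None)
        | StringScalar::LargeUtf8(None) => "NULL".to_string(),
    }
}
```

Because the default implementation returns `None`, a dialect that does not override the hook produces the same single-quoted output for all three variants.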

Are these changes tested?

Yes. A test-only MsSqlDialect is defined in the test module to verify:

ASCII strings produce standard single-quoted literals (no N prefix)
Non-ASCII strings produce national string literals (N'...')
The default dialect is unaffected (no N prefix regardless of content)

It has been used by Wren AI in production for a while: Canner#8

Are there any user-facing changes?

Yes. The Dialect trait gains a new method custom_string_literal_override. This is a non-breaking change since the method has a default implementation. Dialect implementors can override it to customize string literal unparsing.
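For dialect implementors, an override might look like the sketch below. It is self-contained and only approximates the real API: `MsSqlLikeDialect`, `DefaultDialect`, and `unparse_string` are hypothetical names, the hook returns rendered SQL text rather than `ast::Expr`, and the `is_ascii` check is an assumption based on the behavior the tests in this PR describe.

```rust
/// Simplified stand-in for the unparser `Dialect` trait.
pub trait Dialect {
    /// Default implementation: no override.
    fn custom_string_literal_override(&self, _s: &str) -> Option<String> {
        None
    }
}

/// Dialect that relies on the default (no override).
pub struct DefaultDialect;
impl Dialect for DefaultDialect {}

/// Hypothetical MSSQL-like dialect used only for illustration.
pub struct MsSqlLikeDialect;

impl Dialect for MsSqlLikeDialect {
    fn custom_string_literal_override(&self, s: &str) -> Option<String> {
        if s.is_ascii() {
            // ASCII strings keep the standard single-quoted form.
            None
        } else {
            // Non-ASCII strings become national string literals.
            Some(format!("N'{}'", s.replace('\'', "''")))
        }
    }
}

/// Stand-in for the unparser's string-literal path: ask the dialect
/// first, then fall back to a plain single-quoted literal.
pub fn unparse_string(dialect: &dyn Dialect, s: &str) -> String {
    dialect
        .custom_string_literal_override(s)
        .unwrap_or_else(|| format!("'{}'", s.replace('\'', "''")))
}
```

Existing dialects need no changes, since they inherit the default implementation; only a dialect that wants different literal syntax has to implement the method.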

kosiew

@goldmedal
Thanks for working on this.

kosiew · 2026-03-12T13:55:33Z

datafusion/sql/src/unparser/dialect.rs

+    ///
+    /// For example, MSSQL requires non-ASCII strings to use national string
+    /// literal syntax (`N'datafusion資料融合'`).
+    fn custom_string_literal_override(&self, _s: &str) -> Option<ast::Expr> {


Since this only affects scalar UTF8 literal unparsing today, a narrower name or a helper scoped to scalar string literals may make the API easier to understand and extend. Perhaps string_literal_to_sql – to keep the focus on literals, not “any scalar”?

kosiew · 2026-03-12T13:58:55Z

datafusion/sql/src/unparser/expr.rs

+        let unparser = Unparser::new(&dialect);
+
+        let expr =
+            Expr::Literal(ScalarValue::Utf8(Some("national string".to_string())), None);


The regression test currently exercises ScalarValue::Utf8, but the change also covers Utf8View and LargeUtf8.

Adding tests for each would make the coverage line up with the implementation.

goldmedal · 2026-03-15T11:07:02Z

Thanks @kosiew for reviewing. All the comments have been addressed.

kosiew

lgtm
🚀

alamb · 2026-03-16T20:44:56Z

Thanks -- looks good to me @kosiew and @goldmedal

goldmedal · 2026-03-17T01:42:44Z

Thanks @kosiew and @alamb 👍

goldmedal added 2 commits February 27, 2026 19:21

introduce to_unicode_string_literal for unparser dialect

3887d2a

rename the method and make mssql dialect be testing only

dfb5eda

github-actions bot added the sql SQL Planner label Feb 27, 2026

fix format

a98bdbd

kosiew reviewed Mar 12, 2026

goldmedal added 2 commits March 14, 2026 20:25

rename the api

387fbb4

add test for Utf8view and LargeUtf8

8a5693b

goldmedal requested a review from kosiew March 15, 2026 11:07

kosiew approved these changes Mar 16, 2026

alamb added this pull request to the merge queue Mar 16, 2026

Merged via the queue into apache:main with commit bd071be Mar 16, 2026
30 checks passed

goldmedal deleted the feat/unparse-unicode-literal branch March 17, 2026 01:42

alamb merged 5 commits into apache:main from goldmedal:feat/unparse-unicode-literal
