Overview
The ConversionSpecifier::format method in datafusion/spark/src/function/string/format_string.rs contains highly repetitive branches for handling %c (character) conversion across all signed and unsigned integer scalar types (Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64). Each branch duplicates the same validation and conversion logic, making the code difficult to maintain and prone to inconsistency when invariants change.
Problem Statement
Currently, the code has near-identical match arms for each numeric type:
Int8, Int16, Int32, Int64: All call signed_to_char(*value as i64)
UInt8, UInt16, UInt32, UInt64: All call unsigned_to_char(*value as u64)
Maintenance burden: When a new invariant or fix must be applied to %c handling (as happened in PR #22077), every branch must be updated individually, increasing the risk of missed updates and regressions.
Example of duplication:
(ConversionType::CharLower | ConversionType::CharUpper, Some(value)) => {
self.format_char(string, signed_to_char(*value as i64)?)
}
This pattern appears ~8 times with only the type changed.
Solution
Create a helper dispatch mechanism that unifies the conversion logic once, eliminating duplication:
Overview
The
ConversionSpecifier::formatmethod indatafusion/spark/src/function/string/format_string.rscontains highly repetitive branches for handling%c(character) conversion across all signed and unsigned integer scalar types (Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64). Each branch duplicates the same validation and conversion logic, making the code difficult to maintain and prone to inconsistency when invariants change.Problem Statement
Currently, the code has near-identical match arms for each numeric type:
Int8,Int16,Int32,Int64: All callsigned_to_char(*value as i64)UInt8,UInt16,UInt32,UInt64: All callunsigned_to_char(*value as u64)Maintenance burden: When a new invariant or fix must be applied to
%chandling (as happened in PR #22077), every branch must be updated individually, increasing the risk of missed updates and regressions.Example of duplication:
This pattern appears ~8 times with only the type changed.
Solution
Create a helper dispatch mechanism that unifies the conversion logic once, eliminating duplication: