Add UTF-16/UTF-8 byte offset conversion utilities by tilladam · Pull Request #10873 · slint-ui/slint

tilladam · 2026-02-26T14:27:57Z

Summary

Adds a unicode_utils module to i-slint-core that consolidates duplicated UTF-16/UTF-8 offset conversion code
Replaces inline implementations in android-activity/javahelper.rs, qt/qt_window.rs, and core/items/text.rs with calls to the shared module
The byte offset boundary helpers (ceil_byte_offset, floor_byte_offset) replace a hand-rolled floor_char_boundary equivalent (the stdlib str::floor_char_boundary requires Rust 1.91, while the MSRV is 1.88)
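A minimal sketch of what such boundary-snapping polyfills could look like. It assumes they match str::floor_char_boundary / str::ceil_char_boundary for in-range offsets and clamp offsets past the end to text.len() (the clamping is an assumption of this sketch, not taken from the PR). It relies only on str::is_char_boundary, which is available well below the 1.88 MSRV.

```rust
/// Sketch of a polyfill for `str::floor_char_boundary`: the largest UTF-8
/// char boundary <= `byte_offset`. Offsets past the end clamp to `text.len()`.
pub fn floor_byte_offset(text: &str, byte_offset: usize) -> usize {
    if byte_offset >= text.len() {
        return text.len();
    }
    let mut offset = byte_offset;
    // Offset 0 is always a boundary, so this cannot underflow; a UTF-8
    // sequence is at most 4 bytes, so it steps back at most 3 times.
    while !text.is_char_boundary(offset) {
        offset -= 1;
    }
    offset
}

/// Sketch of a polyfill for `str::ceil_char_boundary`: the smallest UTF-8
/// char boundary >= `byte_offset`. Offsets past the end clamp to `text.len()`.
pub fn ceil_byte_offset(text: &str, byte_offset: usize) -> usize {
    if byte_offset >= text.len() {
        return text.len();
    }
    let mut offset = byte_offset;
    // `text.len()` is always a boundary, so this stops at the end at the latest.
    while !text.is_char_boundary(offset) {
        offset += 1;
    }
    offset
}

fn main() {
    // 'a' is byte 0, '😀' occupies bytes 1..5, 'b' is byte 5; len() == 6.
    let text = "a😀b";
    assert_eq!(floor_byte_offset(text, 3), 1); // snaps back to the start of '😀'
    assert_eq!(ceil_byte_offset(text, 3), 5); // snaps forward to the start of 'b'
    assert_eq!(floor_byte_offset(text, 5), 5); // already on a boundary
    assert_eq!(ceil_byte_offset(text, 100), 6); // clamped to the end
}
```

Once the MSRV reaches 1.91, helpers like these could presumably be replaced by the stdlib methods.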

Functions

Byte offset utilities:

is_valid_byte_offset — check if offset is on a UTF-8 character boundary
floor_byte_offset / ceil_byte_offset — snap to nearest valid boundary
byte_offset_to_char_count / char_count_to_byte_offset — character count conversions
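As an illustration only, the remaining byte offset helpers could be built on stdlib str methods roughly as follows. The signatures are inferred from the names in the list above, and the clamping in char_count_to_byte_offset is an assumption. These helpers count chars (Unicode scalar values), not grapheme clusters, so a combining accent counts as a character of its own.

```rust
/// True if `byte_offset` lies on a UTF-8 char boundary of `text`.
/// (`str::is_char_boundary` already returns false past the end and true at
/// `text.len()`, so the explicit length check only documents intent.)
pub fn is_valid_byte_offset(text: &str, byte_offset: usize) -> bool {
    byte_offset <= text.len() && text.is_char_boundary(byte_offset)
}

/// Number of chars before `byte_offset`; panics if it is not a char boundary.
pub fn byte_offset_to_char_count(text: &str, byte_offset: usize) -> usize {
    text[..byte_offset].chars().count()
}

/// Byte offset reached after skipping `char_count` chars, clamped to
/// `text.len()` when `char_count` is at or past the number of chars.
pub fn char_count_to_byte_offset(text: &str, char_count: usize) -> usize {
    text.char_indices()
        .nth(char_count)
        .map_or(text.len(), |(byte_offset, _)| byte_offset)
}

fn main() {
    // 'e' (1 byte) + U+0301 COMBINING ACUTE ACCENT (2 bytes) + '😀' (4 bytes) + 'x'
    let text = "e\u{301}😀x";
    assert!(!is_valid_byte_offset(text, 2)); // inside U+0301
    assert!(is_valid_byte_offset(text, 3));
    assert!(!is_valid_byte_offset(text, 9)); // past the end (len() == 8)
    assert_eq!(byte_offset_to_char_count(text, 7), 3); // 'e', U+0301, '😀'
    assert_eq!(char_count_to_byte_offset(text, 3), 7); // 'x' starts at byte 7
    assert_eq!(char_count_to_byte_offset(text, 10), 8); // clamped
}
```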

UTF-16 conversions (needed for Android InputConnection + iOS UITextInput):

utf16_offset_to_byte_offset — UTF-16 → UTF-8 (None for invalid mid-surrogate offsets)
byte_offset_to_utf16_offset — UTF-8 → UTF-16
utf16_offset_to_byte_offset_clamped — UTF-16 → UTF-8 with forward clamping (matches original convert_utf16_index_to_utf8 semantics)
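For reference, the two core UTF-16 conversions can be sketched with str::char_indices and char::len_utf16. This is not the PR's code. It assumes that offsets past the end of the text yield None, and it reproduces the doc examples quoted in the review below ("a😀b" at UTF-16 offset 2 gives None, at 3 gives Some(5)).

```rust
use std::cmp::Ordering;

/// Sketch: UTF-8 byte offset -> UTF-16 code unit offset.
/// Panics (via slicing) if `byte_offset` is not a char boundary of `text`.
pub fn byte_offset_to_utf16_offset(text: &str, byte_offset: usize) -> usize {
    text[..byte_offset].encode_utf16().count()
}

/// Sketch: UTF-16 code unit offset -> UTF-8 byte offset.
/// Returns None if the offset falls between the two halves of a surrogate
/// pair, or (an assumption of this sketch) lies past the end of the text.
pub fn utf16_offset_to_byte_offset(text: &str, utf16_offset: usize) -> Option<usize> {
    let mut utf16_pos: usize = 0;
    for (byte_offset, ch) in text.char_indices() {
        match utf16_pos.cmp(&utf16_offset) {
            // The target sits exactly at the start of this char.
            Ordering::Equal => return Some(byte_offset),
            // The previous char stepped over the target: it was mid-surrogate-pair.
            Ordering::Greater => return None,
            // Not there yet: a char is 1 or 2 UTF-16 code units.
            Ordering::Less => utf16_pos += ch.len_utf16(),
        }
    }
    // The end of the text is a valid position too.
    (utf16_pos == utf16_offset).then_some(text.len())
}

fn main() {
    let text = "a😀b";
    assert_eq!(utf16_offset_to_byte_offset(text, 2), None); // inside the surrogate pair
    assert_eq!(utf16_offset_to_byte_offset(text, 3), Some(5));
    // Roundtrip over every char boundary, including the end of the text.
    for byte_offset in [0, 1, 5, 6] {
        let utf16 = byte_offset_to_utf16_offset(text, byte_offset);
        assert_eq!(utf16_offset_to_byte_offset(text, utf16), Some(byte_offset));
    }
}
```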

Test plan

39 unit tests + 3 doctests covering ASCII, BMP, supplementary-plane (emoji) characters, combining characters, surrogate pair edge cases, empty strings, and roundtrip conversions
cargo build for i-slint-core, i-slint-backend-qt, i-slint-backend-android-activity
cargo test -p i-slint-core unicode_utils

Split out from #10557 per review feedback to use smaller increments.

Add unicode_utils module to i-slint-core with utility functions for converting between UTF-8 byte offsets and UTF-16 code unit offsets, and for snapping byte offsets to character boundaries. Replace duplicate inline implementations in the Android backend (javahelper.rs), Qt backend (qt_window.rs), and core text handling (text.rs) with calls to the shared module.

tronical

I'm generally in favour of de-duplicating code when there's value in it, in terms of improving readability or unifying complexity. I have the feeling that the LSP might contain similar duplicated code.

That said, I prefer introducing code when it's used.

So here the Android and Qt backends could benefit from a shared byte_offset_to_utf16_offset function, because both of them need it. I even agree :-) that the expression becomes more readable, i.e. byte_offset_to_utf16_offset is better than in_str[..utf8_index].encode_utf16().count(). We could bike-shed over whether to call them indices or offsets, but I'm fine either way :-).

Also, while the Qt and Android backends use different versions of the inverse (clamped vs. non-clamped), they can be seen as "siblings", and I think it's also fine to share them in this PR.
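To make the "siblings" distinction concrete, here is a hypothetical sketch of the clamped variant. It assumes "forward clamping" means that an offset inside a surrogate pair snaps forward to the end of that character and that an offset past the end clamps to text.len(); the exact semantics of the original convert_utf16_index_to_utf8 are not shown in this thread. Where the non-clamped utf16_offset_to_byte_offset returns None for such offsets, this version always produces a usable byte offset.

```rust
/// Hypothetical clamped variant: never fails. An offset inside a surrogate
/// pair snaps forward to the end of that char; offsets past the end clamp
/// to `text.len()`.
pub fn utf16_offset_to_byte_offset_clamped(text: &str, utf16_offset: usize) -> usize {
    let mut utf16_pos: usize = 0;
    for (byte_offset, ch) in text.char_indices() {
        // First char that starts at or after the target: use its start.
        if utf16_pos >= utf16_offset {
            return byte_offset;
        }
        utf16_pos += ch.len_utf16();
    }
    text.len()
}

fn main() {
    let text = "a😀b";
    assert_eq!(utf16_offset_to_byte_offset_clamped(text, 1), 1); // start of '😀'
    assert_eq!(utf16_offset_to_byte_offset_clamped(text, 2), 5); // mid-pair: snaps forward
    assert_eq!(utf16_offset_to_byte_offset_clamped(text, 99), 6); // clamped to the end
}
```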

However, I'd prefer the other functions to be introduced together with the changes that use them, so that their value is more apparent.

internal/core/items/text.rs

tronical · 2026-02-27T07:40:11Z

internal/core/unicode_utils.rs

+/// ```
+pub fn byte_offset_to_utf16_offset(text: &str, byte_offset: usize) -> usize {
+    assert!(
+        is_valid_byte_offset(text, byte_offset),


This is, as far as I can tell, the only call site of is_valid_byte_offset(), and here I feel the one-liner it wraps (offset <= text.len() && text.is_char_boundary(offset)) is more readable than the function call.

I'd prefer to inline here.

However, why is a panic here better than what I suspect would be "rounding up" behaviour?

Code-wise this seems "resolved" in the sense that the check is now inlined, but I'm curious what you think about my panic question :)
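The two contracts behind the panic question can be compared side by side. The first function inlines the boundary check as suggested above (its assert message is made up for illustration); the second, under the hypothetical name byte_offset_to_utf16_offset_rounding_up, rounds an invalid offset up to the next char boundary instead of panicking.

```rust
/// Contract A: panic on an offset that is not a char boundary
/// (boundary check inlined, as suggested in the review).
pub fn byte_offset_to_utf16_offset(text: &str, byte_offset: usize) -> usize {
    assert!(
        byte_offset <= text.len() && text.is_char_boundary(byte_offset),
        "byte offset {byte_offset} is not a char boundary" // illustrative message
    );
    text[..byte_offset].encode_utf16().count()
}

/// Contract B (hypothetical): round an invalid offset up to the next char
/// boundary, clamping to `text.len()`, so the call can never panic.
pub fn byte_offset_to_utf16_offset_rounding_up(text: &str, byte_offset: usize) -> usize {
    let mut offset = byte_offset.min(text.len());
    while !text.is_char_boundary(offset) {
        offset += 1;
    }
    text[..offset].encode_utf16().count()
}

fn main() {
    let text = "a😀b"; // '😀' occupies bytes 1..5 and 2 UTF-16 code units
    assert_eq!(byte_offset_to_utf16_offset(text, 5), 3);
    // Byte 3 is inside '😀': contract A would panic, contract B rounds up to byte 5.
    assert_eq!(byte_offset_to_utf16_offset_rounding_up(text, 3), 3);
    assert_eq!(byte_offset_to_utf16_offset_rounding_up(text, 99), 4);
}
```

A panic surfaces caller bugs early, while rounding up could keep IME-driven code paths (Android InputConnection, iOS UITextInput) from crashing on an off-boundary offset; which trade-off is preferable is the open question in this comment.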

tronical · 2026-02-27T07:42:19Z

internal/core/unicode_utils.rs

+/// assert_eq!(utf16_offset_to_byte_offset("a😀b", 2), None); // inside surrogate pair
+/// assert_eq!(utf16_offset_to_byte_offset("a😀b", 3), Some(5));
+/// ```
+pub fn utf16_offset_to_byte_offset(text: &str, utf16_offset: usize) -> Option<usize> {


There's no call site for this function yet. I'd prefer to introduce this function in a PR that uses the code.

Add unicode_utils module to i-slint-core with utility functions for converting between UTF-8 byte offsets and UTF-16 code unit offsets, and for snapping byte offsets to character boundaries. Replace duplicate inline implementations in the Android backend (javahelper.rs), Qt backend (qt_window.rs), and core text handling (text.rs) with calls to the shared module. floor_byte_offset / ceil_byte_offset are polyfills for str::floor_char_boundary / str::ceil_char_boundary (stabilized in Rust 1.91, MSRV is currently 1.88).

tronical

AFAICS there are two unresolved comments :)

tilladam and others added 2 commits February 26, 2026 15:24

[autofix.ci] apply automated fixes

c42349a

tronical reviewed Feb 27, 2026


tilladam force-pushed the unicode-utils branch from f7d2453 to d6d9a47 on February 27, 2026 08:01

tilladam force-pushed the unicode-utils branch from 017e104 to 6fa0195 on February 27, 2026 08:06

[autofix.ci] apply automated fixes

0e3fcf7

tronical requested changes Mar 18, 2026


tilladam wants to merge 4 commits into slint-ui:master from tilladam:unicode-utils

2 participants

Conversation

tilladam commented Feb 26, 2026

Summary

Functions

Test plan

Uh oh!

tronical left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tronical Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

tronical Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

tronical Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

tronical left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants