fix(multibyte): handle utf-8 multibyte characters in text manipulation operations #146

Merged
YousefHadder merged 2 commits into main from fix/utf8-multibyte-character-handling
Nov 30, 2025

YousefHadder (Owner) commented Nov 30, 2025

Fixes #144: footnote, link, and image insertion, as well as list operations, corrupted multibyte UTF-8 characters (Chinese, emoji, etc.). The fix splits strings at character boundaries instead of using byte-level operations.

Copilot AI review requested due to automatic review settings November 30, 2025 05:55
YousefHadder changed the title from "fix: handle UTF-8 multibyte characters in text manipulation operations" to "fix: handle utf-8 multibyte characters in text manipulation operations" on Nov 30, 2025
Copilot AI left a comment

Pull request overview

This PR fixes a critical bug where text manipulation operations (list handling, link/image/footnote insertion) were corrupting UTF-8 multibyte characters (Chinese, emoji, etc.) by incorrectly using byte-level string operations instead of character-aware operations.

Key Changes

  • Introduced two new UTF-8-safe utility functions (split_at_cursor and split_after_cursor) that properly handle character boundaries using vim.fn.charidx and vim.fn.byteidx
  • Updated all text manipulation operations across the codebase to use these new utilities instead of direct byte-level string.sub() calls
  • Added comprehensive test coverage (142 lines of new tests) for the utility functions covering ASCII, Chinese characters, emoji, and edge cases

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| spec/markdown-plus/utils_spec.lua | Adds 142 lines of comprehensive UTF-8 tests for both new utility functions, covering ASCII, Chinese, emoji, and edge cases |
| lua/markdown-plus/utils.lua | Implements two new UTF-8-safe string splitting functions with proper character boundary detection using vim functions |
| lua/markdown-plus/list/handlers.lua | Updates handle_enter, continue_list_content, and handle_tab to use UTF-8-safe splitting for list content manipulation |
| lua/markdown-plus/links/init.lua | Updates insert_link to use UTF-8-safe splitting and corrects cursor positioning after insertion |
| lua/markdown-plus/images/init.lua | Updates insert_image to use UTF-8-safe splitting and corrects cursor positioning after insertion |
| lua/markdown-plus/format/init.lua | Refactors get_word_boundaries to iterate by characters instead of bytes, with proper UTF-8 boundary detection |
| lua/markdown-plus/footnotes/insertion.lua | Updates insert_footnote to use UTF-8-safe splitting for reference insertion |
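
The character-wise word scan described for get_word_boundaries can be pictured roughly like this. The sketch is a Python illustration, not the plugin's Lua code: the function name `word_boundaries`, the whitespace-only word delimiter, and the choice to return byte offsets are all assumptions made for the example.

```python
from typing import Optional, Tuple


def word_boundaries(line: str, char_idx: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) byte offsets of the word containing char_idx.

    Hypothetical helper: words are assumed to be whitespace-delimited, and
    None is returned when the index is out of range or sits on whitespace.
    """
    if not (0 <= char_idx < len(line)) or line[char_idx].isspace():
        return None

    # Walk outwards one character (not one byte) at a time.
    start = char_idx
    while start > 0 and not line[start - 1].isspace():
        start -= 1
    end = char_idx + 1
    while end < len(line) and not line[end].isspace():
        end += 1

    # Convert character indices back to byte offsets for byte-based APIs.
    def to_byte(i: int) -> int:
        return len(line[:i].encode("utf-8"))

    return to_byte(start), to_byte(end)


line = "say 你好 now"
bounds = word_boundaries(line, 5)  # character index 5 is "好"
print(bounds)
print(line.encode("utf-8")[bounds[0]:bounds[1]].decode("utf-8"))
```

Scanning by character and converting to byte offsets only at the end keeps the returned range aligned with character boundaries, so slicing the underlying bytes with it cannot split a multibyte character.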

YousefHadder changed the title from "fix: handle utf-8 multibyte characters in text manipulation operations" to "fix(multibyte): handle utf-8 multibyte characters in text manipulation operations" on Nov 30, 2025
YousefHadder merged commit 7fc2f32 into main on Nov 30, 2025
15 of 16 checks passed
YousefHadder deleted the fix/utf8-multibyte-character-handling branch on November 30, 2025 06:00
Development

Successfully merging this pull request may close these issues.

[BUG] The word which is non-enligsh is broken when I insert footnote.
