Skip to content

Handled UTF-8 BOM#1285

Merged
ehuss merged 3 commits into
rust-lang:masterfrom
FrankHB:patch-1
Nov 10, 2020
Merged

Handled UTF-8 BOM#1285
ehuss merged 3 commits into
rust-lang:masterfrom
FrankHB:patch-1

Conversation

@FrankHB
Copy link
Copy Markdown
Contributor

@FrankHB FrankHB commented Jul 25, 2020

Fixed #1155 .

Copy link
Copy Markdown
Contributor

@ehuss ehuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Can you also add a test for this?

Comment thread src/book/book.rs Outdated
})?;

if content.as_bytes().starts_with(b"\xef\xbb\xbf") {
content = content[3..].to_string()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to avoid the extra allocation, perhaps this could be content.replace_range(..3, "");

@ehuss ehuss added the S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. label Sep 6, 2020
Signed-off-by: FrankHB <frankhb1989@gmail.com>
@FrankHB
Copy link
Copy Markdown
Contributor Author

FrankHB commented Sep 29, 2020

Not sure where should be the test. Is an additional BOM in tests/dummy_book/src/first/unicode.md enough?

@ehuss
Copy link
Copy Markdown
Contributor

ehuss commented Sep 30, 2020

I'm a little concerned about having invisible bytes in a file. unicode.md is already quite subtle. But I think if you have an editor that can't handle a BOM properly, it will have a risk of corrupting the existing unicode, so it's probably fine.

My only other idea is to create a simple 1-chapter book in a temp directory like book_with_a_reserved_filename_does_not_build does. Whichever seems easiest!

@FrankHB
Copy link
Copy Markdown
Contributor Author

FrankHB commented Oct 2, 2020

Sorry, I don't get the point of the temp directory.

As of the subtleness... I largely agree. Nevertheless, I can use a hex editor to make sure only the BOM is inserted exactly. Anyway, the visible part of the file has already warned us about this. (Or one more visible warning about invisible bytes?)

@ehuss
Copy link
Copy Markdown
Contributor

ehuss commented Oct 5, 2020

Creating a book in Rust code (and writing it to a temp directory) would allow writing the content in a Rust string where the characters can be escaped and explicit. For example, fs::write(chapter, "\u{feff}# Example chapter").

Signed-off-by: FrankHB <frankhb1989@gmail.com>
@ehuss ehuss removed the S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. label Nov 10, 2020
Copy link
Copy Markdown
Contributor

@ehuss ehuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@ehuss ehuss merged commit 643d5ec into rust-lang:master Nov 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing support of byte order mark

2 participants