Commit 5478b84

Merge pull request #715 from jawah/release-3.4.6
Release 3.4.6
2 parents 7411396 + 5c0a09e commit 5478b84

24 files changed: 960 additions & 235 deletions

.github/workflows/cd.yml

Lines changed: 25 additions & 2 deletions
```diff
@@ -85,7 +85,6 @@ jobs:
     CIBW_ENVIRONMENT: CHARSET_NORMALIZER_USE_MYPYC='1'
     CIBW_TEST_REQUIRES: pytest
     CIBW_TEST_COMMAND: pytest -c {package} {package}/tests
-    CIBW_SKIP: "cp31?t-*"
 - name: Upload artifacts
   uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f
   with:
@@ -204,4 +203,28 @@ jobs:
   env:
     GITHUB_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
   run: |
-    gh release upload ${{ github.ref_name }} dist/* --repo ${{ github.repository }}
+    set -euo pipefail
+
+    files=(dist/*)
+    batch_size=50
+    max_retries=3
+
+    for (( i=0; i<${#files[@]}; i+=batch_size )); do
+      batch=("${files[@]:i:batch_size}")
+      echo "Uploading batch $((i/batch_size + 1)) (${#batch[@]} files)..."
+
+      for (( attempt=1; attempt<=max_retries; attempt++ )); do
+        if gh release upload ${{ github.ref_name }} "${batch[@]}" --repo ${{ github.repository }} --clobber; then
+          break
+        fi
+
+        if (( attempt == max_retries )); then
+          echo "Failed to upload batch after $max_retries attempts"
+          exit 1
+        fi
+
+        delay=$(( attempt * 15 ))
+        echo "Upload failed (attempt $attempt/$max_retries). Retrying in ${delay}s..."
+        sleep "$delay"
+      done
+    done
```
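The release step above replaces a single `gh release upload` invocation with batched uploads plus linear-backoff retries. The same pattern can be sketched in Python; the function, the fake `flaky_upload` callable, and the file names below are illustrative, not part of the actual workflow:

```python
import time

def upload_in_batches(files, upload, batch_size=50, max_retries=3, sleep=time.sleep):
    """Upload `files` in fixed-size batches, retrying each batch with a
    linearly growing delay (15s, 30s, ...) before giving up."""
    for i in range(0, len(files), batch_size):
        batch = files[i:i + batch_size]
        for attempt in range(1, max_retries + 1):
            if upload(batch):  # e.g. a wrapper around `gh release upload`
                break
            if attempt == max_retries:
                raise RuntimeError(f"batch failed after {max_retries} attempts")
            sleep(attempt * 15)  # linear backoff, as in the shell script

# simulate a transient failure on the very first upload call
calls = []
def flaky_upload(batch):
    calls.append(list(batch))
    return len(calls) > 1

upload_in_batches([f"dist/pkg-{i}.whl" for i in range(120)], flaky_upload, sleep=lambda _: None)
print(len(calls))  # 4 calls: batch 1 retried once, batches 2 and 3 succeed first try
```

Batching sidesteps per-request limits when a release ships many wheels, and `--clobber` in the workflow makes retries idempotent: a re-uploaded asset overwrites the partial one instead of failing.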

.github/workflows/ci.yml

Lines changed: 2 additions & 1 deletion
```diff
@@ -89,7 +89,7 @@ jobs:
 - name: Coverage WITH preemptive
   run: nox -s coverage -- --coverage 97 --with-preemptive
 - name: Coverage WITHOUT preemptive
-  run: nox -s coverage -- --coverage 95
+  run: nox -s coverage -- --coverage 96
 - name: "Upload artifact"
   uses: "actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f"
   with:
@@ -166,6 +166,7 @@ jobs:
 - "3.12"
 - "3.13"
 - "3.14"
+- "3.14t"
 os: [ ubuntu-latest, macos-latest, windows-latest ]
 env:
   PYTHONIOENCODING: utf8 # only needed for Windows (console IO output encoding)
```
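The new `3.14t` matrix entry targets the freethreaded CPython build; together with dropping `CIBW_SKIP: "cp31?t-*"` in `cd.yml`, freethreaded wheels are now both built and tested. A small sketch with the stdlib `fnmatch` module shows which wheel tags that shell-style glob used to exclude (the tag strings are illustrative):

```python
from fnmatch import fnmatch

# `cp31?t-*`: `?` matches exactly one character, so this skips any
# freethreaded cp310t..cp319t interpreter tag, on any platform.
SKIP_PATTERN = "cp31?t-*"

tags = [
    "cp314-manylinux_x86_64",   # regular 3.14 wheel
    "cp314t-manylinux_x86_64",  # freethreaded 3.14 wheel
    "cp313t-win_amd64",         # freethreaded 3.13 wheel
]
skipped = [t for t in tags if fnmatch(t, SKIP_PATTERN)]
print(skipped)  # ['cp314t-manylinux_x86_64', 'cp313t-win_amd64']
```

Removing the pattern from `CIBW_SKIP` means cibuildwheel no longer filters those tags out, so the freethreaded wheels mentioned in the changelog reach PyPI.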

CHANGELOG.md

Lines changed: 15 additions & 0 deletions
```diff
@@ -2,6 +2,21 @@
 All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
+## [3.4.6](https://github.com/Ousret/charset_normalizer/compare/3.4.5...3.4.6) (2026-03-15)
+
+### Changed
+- Flattened the logic in `charset_normalizer.md` for higher performance. Removed `eligible(..)` and `feed(...)`
+  in favor of `feed_info(...)`.
+- Raised upper bound for mypy[c] to 1.20, for our optimized version.
+- Updated `UNICODE_RANGES_COMBINED` using Unicode blocks v17.
+
+### Fixed
+- Edge case where noise difference between two candidates can be almost insignificant. (#672)
+- CLI `--normalize` writing to wrong path when passing multiple files in. (#702)
+
+### Misc
+- Freethreaded pre-built wheels now shipped in PyPI starting with 3.14t. (#616)
+
 ## [3.4.5](https://github.com/Ousret/charset_normalizer/compare/3.4.4...3.4.5) (2026-03-06)
 
 ### Changed
```

README.md

Lines changed: 26 additions & 25 deletions
```diff
@@ -31,49 +31,50 @@
 > A library that helps you read text from an unknown charset encoding.<br /> Motivated by `chardet`,
 > I'm trying to resolve the issue by taking a new approach.
 > All IANA character set names for which the Python core library provides codecs are supported.
+> You can also register your own set of codecs, and yes, it would work as-is.
 
 <p align="center">
 >>>>> <a href="https://charsetnormalizerweb.ousret.now.sh" target="_blank">👉 Try Me Online Now, Then Adopt Me 👈 </a> <<<<<
 </p>
 
 This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.
 
-| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
-|--------------------------------------------------|:-----------------------------------------------------------:|:-----------------------------------------------------------------------------------------------:|:-----------------------------------------------:|
-| `Fast` | |||
-| `Universal**` | |||
-| `Reliable` **without** distinguishable standards | |||
-| `Reliable` **with** distinguishable standards | |||
-| `License` | _Public Domain_<br>and/or<br>_LGPL-2.1_***<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
-| `Native Python` | |||
-| `Detect spoken language` | || N/A |
-| `UnicodeDecodeError Safety` | |||
-| `Whl Size (min)` | 500 kB | 150 kB | ~200 kB |
-| `Supported Encoding` | 99 | [99](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |
-| `Can register custom encoding` | |||
+| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
+|--------------------------------------------------|:---------------------------------------------:|:-----------------------------------------------------------------------------------------------:|:-----------------------------------------------:|
+| `Fast` | |||
+| `Universal`[^1] | |||
+| `Reliable` **without** distinguishable standards | |||
+| `Reliable` **with** distinguishable standards | |||
+| `License` | _Disputed_[^2]<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
+| `Native Python` | |||
+| `Detect spoken language` | || N/A |
+| `UnicodeDecodeError Safety` | |||
+| `Whl Size (min)` | 500 kB | 150 kB | ~200 kB |
+| `Supported Encoding` | 99 | [99](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |
+| `Can register custom encoding` | |||
 
 <p align="center">
 <img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
 </p>
 
-*\*\* : They are clearly using specific code for a specific encoding even if covering most of used one.*<br>
-*\*\*\* : The vast majority of the code is issued from an LLM agent (Claude), even if the author label this project now as MIT in his own name, it's clearly debatable. Most jurisdictions on copyright laws would nullify the license. With my personal education, **Public Domain or/and LGPL-2.1** is the most likely one based on Anthropic declarations about how they train their LLMs and the LGPL-2.1 itself (the original license as it's still the same statistical principle behind the scene, hugely refactored).*<br>
+[^1]: They are clearly using specific code for a specific encoding even if covering most of used one.
+[^2]: Chardet 7.0+ was relicensed from LGPL-2.1 to MIT following an AI-assisted rewrite. This relicensing is disputed on two independent grounds: **(a)** the original author [contests](https://github.com/chardet/chardet/issues/327) that the maintainer had the right to relicense, arguing the rewrite is a derivative work of the LGPL-licensed codebase since it was not a clean room implementation; **(b)** the copyright claim itself is [questionable](https://github.com/chardet/chardet/issues/334) given the code was primarily generated by an LLM, and AI-generated output may not be copyrightable under most jurisdictions. Either issue alone could undermine the MIT license. Beyond licensing, the rewrite raises questions about responsible use of AI in open source: key architectural ideas pioneered by charset-normalizer - notably decode-first validity filtering (our foundational approach since v1) and encoding pairwise similarity with the same algorithm and threshold — surfaced in chardet 7 without acknowledgment. The project also imported test files from charset-normalizer to train and benchmark against it, then claimed superior accuracy on those very files. Charset-normalizer has always been MIT-licensed, encoding-agnostic by design, and built on a verifiable human-authored history.
 
 ## ⚡ Performance
 
-This package offer acceptable performances against Chardet. Here are some numbers.
+This package offer better performances (99th, and 95th) against Chardet. Here are some numbers.
 
-| Package | Accuracy | Mean per file (ms) | File per sec (est) |
-|-------------------------------------------------|:--------:|:------------------:|:------------------:|
-| [chardet 7](https://github.com/chardet/chardet) | 89 % | **5 ms** | 200 file/sec |
-| charset-normalizer | **97 %** | 8 ms | 125 file/sec |
+| Package | Accuracy | Mean per file (ms) | File per sec (est) |
+|---------------------------------------------------|:--------:|:------------------:|:------------------:|
+| [chardet 7.1](https://github.com/chardet/chardet) | 89 % | 3 ms | 333 file/sec |
+| charset-normalizer | **97 %** | 3 ms | 333 file/sec |
 
-| Package | 99th percentile | 95th percentile | 50th percentile |
-|-------------------------------------------------|:---------------:|:---------------:|:---------------:|
-| [chardet 7](https://github.com/chardet/chardet) | 32 ms | 17 ms | 1 ms |
-| charset-normalizer | 63 ms | 29 ms | 3 ms |
+| Package | 99th percentile | 95th percentile | 50th percentile |
+|---------------------------------------------------|:---------------:|:---------------:|:---------------:|
+| [chardet 7.1](https://github.com/chardet/chardet) | 32 ms | 17 ms | < 1 ms |
+| charset-normalizer | 16 ms | 10 ms | 1 ms |
 
-_updated as of Mars 2026 using CPython 3.12, and Chardet 7_
+_updated as of March 2026 using CPython 3.12, Charset-Normalizer 3.4.6, and Chardet 7.1.0_
 
 ~Chardet's performance on larger file (1MB+) are very poor. Expect huge difference on large payload.~ No longer the case since Chardet 7.0+
```

_mypyc_hook/backend.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -7,7 +7,7 @@
 from setuptools import build_meta as _orig  # type: ignore[import-untyped]
 
 USE_MYPYC = os.getenv("CHARSET_NORMALIZER_USE_MYPYC", "0") == "1"
-MYPYC_SPEC = "mypy>=1.4.1,<=1.19.1"
+MYPYC_SPEC = "mypy>=1.4.1,<=1.20"
 
 # Expose all the PEP 517 hooks from setuptools
 get_requires_for_build_sdist = _orig.get_requires_for_build_sdist
```
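The bumped `MYPYC_SPEC` feeds the in-tree PEP 517 backend, which only requests mypy[c] when the opt-in environment variable is set. A minimal sketch of that gating pattern follows; the wrapper below is illustrative and simplified, not the project's actual hook:

```python
import os

MYPYC_SPEC = "mypy>=1.4.1,<=1.20"

def get_requires_for_build_wheel(config_settings=None):
    """PEP 517 hook sketch: ask for mypy[c] only when the opt-in env var
    is set, so plain installs never pull the compiler toolchain."""
    requires = ["setuptools"]
    if os.getenv("CHARSET_NORMALIZER_USE_MYPYC", "0") == "1":
        requires.append(MYPYC_SPEC)
    return requires

os.environ["CHARSET_NORMALIZER_USE_MYPYC"] = "1"
print(get_requires_for_build_wheel())  # ['setuptools', 'mypy>=1.4.1,<=1.20']
```

Keeping the upper bound pinned (`<=1.20`) matters because mypyc's emitted C API can change between minor releases, and the compiled wheels are built against a tested range.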
