Commit 024fda4

closes bpo-45190: Update Unicode data to version 14.0.0. (GH-28336)
1 parent 797c8eb commit 024fda4

11 files changed: 33032 additions and 31939 deletions

Doc/library/stdtypes.rst

Lines changed: 1 addition & 1 deletion
@@ -352,7 +352,7 @@ Notes:
    The numeric literals accepted include the digits ``0`` to ``9`` or any
    Unicode equivalent (code points with the ``Nd`` property).
 
-   See https://www.unicode.org/Public/13.0.0/ucd/extracted/DerivedNumericType.txt
+   See https://www.unicode.org/Public/14.0.0/ucd/extracted/DerivedNumericType.txt
    for a complete list of code points with the ``Nd`` property.
 
 
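The documented rule for numeric literals can be checked interactively. The following is a minimal sketch, not part of the commit; the sample strings and variable names are illustrative assumptions. It shows that `int()` accepts digits from any script as long as every character has the ``Nd`` property, and rejects a numeric character outside that category.

```python
import unicodedata

# Digits from several scripts; each character has General_Category "Nd".
samples = {
    "ascii": "123",
    "arabic_indic": "\u0661\u0662\u0663",  # ARABIC-INDIC DIGIT ONE..THREE
    "devanagari": "\u0967\u0968\u0969",    # DEVANAGARI DIGIT ONE..THREE
    "fullwidth": "\uff11\uff12\uff13",     # FULLWIDTH DIGIT ONE..THREE
}

parsed = {}
for label, text in samples.items():
    # Collect the categories of all characters, then parse the string.
    categories = {unicodedata.category(ch) for ch in text}
    parsed[label] = (categories, int(text))

for label, (categories, value) in parsed.items():
    print(f"{label:13} categories={sorted(categories)} int()={value}")

# U+00B2 SUPERSCRIPT TWO is numeric but has category "No", so int() rejects it.
try:
    int("\u00b2")
    superscript_accepted = True
except ValueError:
    superscript_accepted = False
print("superscript two accepted:", superscript_accepted)
```

Which code points count as ``Nd`` depends on the UCD version the interpreter was built with, so digits from scripts added in Unicode 14.0.0 would only parse on a build that includes this commit.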

Doc/library/unicodedata.rst

Lines changed: 4 additions & 4 deletions
@@ -17,8 +17,8 @@
 
 This module provides access to the Unicode Character Database (UCD) which
 defines character properties for all Unicode characters. The data contained in
-this database is compiled from the `UCD version 13.0.0
-<https://www.unicode.org/Public/13.0.0/ucd>`_.
+this database is compiled from the `UCD version 14.0.0
+<https://www.unicode.org/Public/14.0.0/ucd>`_.
 
 The module uses the same names and symbols as defined by Unicode
 Standard Annex #44, `"Unicode Character Database"
@@ -175,6 +175,6 @@ Examples:
 
 .. rubric:: Footnotes
 
-.. [#] https://www.unicode.org/Public/13.0.0/ucd/NameAliases.txt
+.. [#] https://www.unicode.org/Public/14.0.0/ucd/NameAliases.txt
 
-.. [#] https://www.unicode.org/Public/13.0.0/ucd/NamedSequences.txt
+.. [#] https://www.unicode.org/Public/14.0.0/ucd/NamedSequences.txt
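To see which UCD version a given interpreter carries, and how the two footnoted data files surface through the module, something like the following sketch can be used. It is not part of the commit; the chosen alias and named sequence are examples picked for illustration, and the variable names are assumptions.

```python
import unicodedata

# Version string of the UCD compiled into this interpreter, e.g. "14.0.0".
ucd_version = tuple(int(part) for part in unicodedata.unidata_version.split("."))
print("UCD version:", unicodedata.unidata_version)

# Name alias (from NameAliases.txt): U+01A2 has the formal name
# "LATIN CAPITAL LETTER OI" and the corrected alias "LATIN CAPITAL LETTER GHA".
alias_char = unicodedata.lookup("LATIN CAPITAL LETTER GHA")
formal_name = unicodedata.name(alias_char)
print(f"U+{ord(alias_char):04X} formal name: {formal_name}")

# Named sequence (from NamedSequences.txt): one name maps to several code points.
sequence = unicodedata.lookup("LATIN SMALL LETTER R WITH TILDE")
print([f"U+{ord(ch):04X}" for ch in sequence])
```

An interpreter built from this commit or later should report a version of at least 14.0.0; older builds report the version they were generated from.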

Doc/reference/lexical_analysis.rst

Lines changed: 3 additions & 3 deletions
@@ -316,16 +316,16 @@ The Unicode category codes mentioned above stand for:
 * *Nd* - decimal numbers
 * *Pc* - connector punctuations
 * *Other_ID_Start* - explicit list of characters in `PropList.txt
-  <https://www.unicode.org/Public/13.0.0/ucd/PropList.txt>`_ to support backwards
+  <https://www.unicode.org/Public/14.0.0/ucd/PropList.txt>`_ to support backwards
   compatibility
 * *Other_ID_Continue* - likewise
 
 All identifiers are converted into the normal form NFKC while parsing; comparison
 of identifiers is based on NFKC.
 
 A non-normative HTML file listing all valid identifier characters for Unicode
-4.1 can be found at
-https://www.unicode.org/Public/13.0.0/ucd/DerivedCoreProperties.txt
+14.0.0 can be found at
+https://www.unicode.org/Public/14.0.0/ucd/DerivedCoreProperties.txt
 
 
 .. _keywords:
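The NFKC rule quoted in the hunk above can be observed directly. This is a small sketch, not part of the commit; the identifier `ﬁle` (spelled with U+FB01 LATIN SMALL LIGATURE FI) and the variable names are illustrative assumptions.

```python
import unicodedata

# "\ufb01le" is the ligature "fi" followed by "le"; its NFKC form is plain "file".
source_name = "\ufb01le"
normalized = unicodedata.normalize("NFKC", source_name)

# Compile and run an assignment that uses the ligature spelling.
namespace = {}
exec(f"{source_name} = 1", namespace)

# exec() adds "__builtins__"; everything else was created by the assignment.
stored_names = [name for name in namespace if not name.startswith("__")]
print("valid identifier:", source_name.isidentifier())
print("NFKC form:", normalized)
print("name stored by the parser:", stored_names)
```

Because normalization happens while parsing, the name bound in the namespace is the NFKC form, and both spellings refer to the same variable in source code.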

Doc/whatsnew/3.11.rst

Lines changed: 5 additions & 0 deletions
@@ -239,6 +239,11 @@ time
   interval specified with nanosecond precision.
   (Contributed by Livius and Victor Stinner in :issue:`21302`.)
 
+unicodedata
+-----------
+
+* The Unicode database has been updated to version 14.0.0. (:issue:`45190`).
+
 
 Removed
 =======

Lib/test/test_unicodedata.py

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@
 class UnicodeMethodsTest(unittest.TestCase):
 
     # update this, if the database changes
-    expectedchecksum = 'fbdf8106a3c7c242086b0a9efa03ad4d30d5b85d'
+    expectedchecksum = '4739770dd4d0e5f1b1677accfc3552ed3c8ef326'
 
     @requires_resource('cpu')
     def test_method_checksum(self):
@@ -71,7 +71,7 @@ class UnicodeFunctionsTest(UnicodeDatabaseTest):
 
     # Update this if the database changes. Make sure to do a full rebuild
     # (e.g. 'make distclean && make') to get the correct checksum.
-    expectedchecksum = 'd1e37a2854df60ac607b47b51189b9bf1b54bfdb'
+    expectedchecksum = '98d602e1f69d5c5bb8a5910c40bbbad4e18e8370'
 
     @requires_resource('cpu')
     def test_function_checksum(self):
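The 40-character values above have the length of a SHA-1 hex digest. As a rough illustration of why such a checksum changes whenever the database changes, here is a simplified sketch. It is an assumption about the general technique, not the actual body of `test_function_checksum`; the helper name `property_checksum` and the selection of properties are hypothetical.

```python
import hashlib
import unicodedata


def property_checksum(code_points):
    """Hash a handful of unicodedata properties for each code point.

    Illustrative only: the real test covers more properties and the
    whole code space, so its digests differ from the ones produced here.
    """
    digest = hashlib.sha1()
    for code in code_points:
        char = chr(code)
        fields = [
            unicodedata.category(char),
            unicodedata.bidirectional(char),
            str(unicodedata.combining(char)),
            str(unicodedata.mirrored(char)),
            unicodedata.decomposition(char),
        ]
        # All of these property values are plain ASCII strings.
        digest.update("".join(fields).encode("ascii"))
    return digest.hexdigest()


# Two small ranges: Basic Latin, and the tail of the main CJK block whose
# last code points were assigned in Unicode 14.0.0.
basic_latin = property_checksum(range(0x80))
cjk_tail = property_checksum(range(0x9FF0, 0xA000))
print(basic_latin)
print(cjk_tail)
```

Any newly assigned or reclassified code point alters at least one hashed field, which is why the expected values must be regenerated after each database update.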
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Update Unicode databases to Unicode 14.0.0.

Modules/unicodedata.c

Lines changed: 3 additions & 3 deletions
@@ -1045,9 +1045,9 @@ is_unified_ideograph(Py_UCS4 code)
 {
     return
         (0x3400 <= code && code <= 0x4DBF) || /* CJK Ideograph Extension A */
-        (0x4E00 <= code && code <= 0x9FFC) || /* CJK Ideograph */
-        (0x20000 <= code && code <= 0x2A6DD) || /* CJK Ideograph Extension B */
-        (0x2A700 <= code && code <= 0x2B734) || /* CJK Ideograph Extension C */
+        (0x4E00 <= code && code <= 0x9FFF) || /* CJK Ideograph */
+        (0x20000 <= code && code <= 0x2A6DF) || /* CJK Ideograph Extension B */
+        (0x2A700 <= code && code <= 0x2B738) || /* CJK Ideograph Extension C */
         (0x2B740 <= code && code <= 0x2B81D) || /* CJK Ideograph Extension D */
         (0x2B820 <= code && code <= 0x2CEA1) || /* CJK Ideograph Extension E */
         (0x2CEB0 <= code && code <= 0x2EBE0) || /* CJK Ideograph Extension F */
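For readers who want to experiment with the new range boundaries, the hunk above can be mirrored in Python. This is a sketch under stated assumptions: only the ranges visible in the hunk are included (the C function may list more), the helper `ideograph_name` is hypothetical, and the `CJK UNIFIED IDEOGRAPH-<hex>` naming pattern is taken from the Unicode naming rule for unified ideographs rather than from this diff.

```python
# Inclusive ranges copied from the "+" side of the hunk. The C function may
# contain further ranges that the hunk does not show.
UNIFIED_IDEOGRAPH_RANGES = [
    (0x3400, 0x4DBF),    # CJK Ideograph Extension A
    (0x4E00, 0x9FFF),    # CJK Ideograph (upper bound was 0x9FFC)
    (0x20000, 0x2A6DF),  # CJK Ideograph Extension B (was 0x2A6DD)
    (0x2A700, 0x2B738),  # CJK Ideograph Extension C (was 0x2B734)
    (0x2B740, 0x2B81D),  # CJK Ideograph Extension D
    (0x2B820, 0x2CEA1),  # CJK Ideograph Extension E
    (0x2CEB0, 0x2EBE0),  # CJK Ideograph Extension F
]


def is_unified_ideograph(code):
    """Return True if `code` falls inside one of the listed ranges."""
    return any(low <= code <= high for low, high in UNIFIED_IDEOGRAPH_RANGES)


def ideograph_name(code):
    """Derive the algorithmic name for a unified ideograph, or None."""
    if is_unified_ideograph(code):
        return f"CJK UNIFIED IDEOGRAPH-{code:X}"
    return None


# Code points just inside and just outside the boundaries moved by this commit.
for code in (0x9FFC, 0x9FFF, 0xA000, 0x2A6DF, 0x2A6E0, 0x2B738, 0x2B739):
    print(f"U+{code:04X}", is_unified_ideograph(code), ideograph_name(code))
```

The three widened upper bounds add U+9FFD..U+9FFF, U+2A6DE..U+2A6DF, and U+2B735..U+2B738 to the set of code points treated as unified ideographs.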

Modules/unicodedata_db.h

Lines changed: 3250 additions & 3161 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Modules/unicodename_db.h

Lines changed: 28414 additions & 27462 deletions
Some generated files are not rendered by default.

Objects/unicodetype_db.h

Lines changed: 1345 additions & 1299 deletions
Some generated files are not rendered by default.
