Merge pull request #46 from eadensplace/unicode-flag-u

longo-andrea · web-flow · commit 8241bee66dbf · 2019-10-04T16:02:08.000+02:00
Unicode: flag "u" and class \p{...}
diff --git a/9-regular-expressions/20-regexp-unicode/article.md b/9-regular-expressions/20-regexp-unicode/article.md
@@ -1,88 +1,88 @@
 
 # Unicode: flag "u"
 
-The unicode flag `/.../u` enables the correct support of surrogate pairs.
+The Unicode flag `/.../u` enables correct support for surrogate pairs.
 
-Surrogate pairs are explained in the chapter <info:string>.
+Surrogate pairs are explained in the chapter <info:string>.
 
-Let's briefly review them here. In short, normally characters are encoded with 2 bytes. That gives us 65536 characters maximum. But there are more characters in the world.
+Let's briefly review them here. In short, characters are normally encoded with 2 bytes, which gives us at most 65536 characters. But there are more characters than that in the world.
 
-So certain rare characters are encoded with 4 bytes, like `𝒳` (mathematical X) or `😄` (a smile).
+So certain rarer characters are encoded with 4 bytes, like `𝒳` (mathematical X) or `😄` (a smile).
 
-Here are the unicode values to compare:
+Here are the Unicode values for comparison:
 
-| Character  | Unicode | Bytes  |
+| Character  | Unicode | Bytes  |
 |------------|---------|--------|
 | `a` | 0x0061 |  2 |
 | `≈` | 0x2248 |  2 |
 |`𝒳`| 0x1d4b3 | 4 |
 |`𝒴`| 0x1d4b4 | 4 |
 |`😄`| 0x1f604 | 4 |
 
-So characters like `a` and `≈` occupy 2 bytes, and those rare ones take 4.
+So characters like `a` and `≈` occupy 2 bytes, and the rare ones take 4.
 
-The unicode is made in such a way that the 4-byte characters only have a meaning as a whole.
+Unicode is designed so that 4-byte characters only have meaning as a whole.
 
-In the past JavaScript did not know about that, and many string methods still have problems. For instance, `length` thinks that here are two characters:
+In the past JavaScript knew nothing about that, and many string methods still have problems. For instance, `length` thinks there are two characters here:
 
 ```js run
 alert('😄'.length); // 2
 alert('𝒳'.length); // 2
 ```
 
-...But we can see that there's only one, right? The point is that `length` treats 4 bytes as two 2-byte characters. That's incorrect, because they must be considered only together (so-called "surrogate pair").
+...But we can see that there's only one, right? The point is that `length` treats a 4-byte character as two 2-byte characters. That's incorrect, because the two halves must only be considered together (a so-called "surrogate pair").
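As a rough illustration of the point above (a sketch, not part of the original article): `length` counts 2-byte code units, while string iteration walks whole characters. The helper name `countCodePoints` is made up for this example.

```js
// Hypothetical helper: counts whole characters (code points)
// instead of 2-byte code units.
// The spread operator uses the string iterator, which keeps surrogate pairs together.
function countCodePoints(str) {
  return [...str].length;
}

const smile = '😄';

console.log(smile.length);                       // 2 (two 2-byte code units)
console.log(countCodePoints(smile));             // 1 (one character)
console.log(smile.codePointAt(0).toString(16));  // 1f604, the value listed in the table
```

Spreading the string is only one way to do this; `Array.from(str)` relies on the same iterator.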
 
-Normally, regular expressions also treat "long characters" as two 2-byte ones.
+Normally, regular expressions also treat these "long characters" as two 2-byte ones.
 
-That leads to odd results, for instance let's try to find `pattern:[𝒳𝒴]` in the string `subject:𝒳`:
+That leads to odd results. For instance, let's try to find `pattern:[𝒳𝒴]` in the string `subject:𝒳`:
 
 ```js run
-alert( '𝒳'.match(/[𝒳𝒴]/) ); // odd result (wrong match actually, "half-character")
+alert( '𝒳'.match(/[𝒳𝒴]/) ); // odd result (actually a wrong match: "half a character")
 ```
 
-The result is wrong, because by default the regexp engine does not understand surrogate pairs.
+The result is wrong because, by default, the regexp engine does not understand surrogate pairs.
 
-So, it thinks that `[𝒳𝒴]` are not two, but four characters:
-1. the left half of `𝒳` `(1)`,
-2. the right half of `𝒳` `(2)`,
-3. the left half of `𝒴` `(3)`,
-4. the right half of `𝒴` `(4)`.
+So it thinks that `[𝒳𝒴]` contains not two, but four characters:
+1. the left half of `𝒳` `(1)`,
+2. the right half of `𝒳` `(2)`,
+3. the left half of `𝒴` `(3)`,
+4. the right half of `𝒴` `(4)`.
 
-We can list them like this:
+We can list them like this:
 
 ```js run
 for(let i=0; i<'𝒳𝒴'.length; i++) {
   alert('𝒳𝒴'.charCodeAt(i)); // 55349, 56499, 55349, 56500
 };
 ```
 
-So it finds only the "left half" of `𝒳`.
+So it finds only the "left half" of `𝒳`.
 
-In other words, the search works like `'12'.match(/[1234]/)`: only `1` is returned.
+In other words, the search works like `'12'.match(/[1234]/)`: only `1` is returned.
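To make the "half-character" visible, here is a small sketch that inspects what the match actually returns. The variable name `half` is only illustrative.

```js
// Without the "u" flag the character class holds four separate code units,
// so the match stops after the first code unit of '𝒳'.
const half = '𝒳'.match(/[𝒳𝒴]/)[0];

console.log(half.length);         // 1, although '𝒳'.length is 2
console.log(half.charCodeAt(0));  // 55349, the first ("left") half of the pair
console.log(half === '𝒳');        // false: half a pair is not the character
```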
 
-## The "u" flag
+## The "u" flag
 
-The `/.../u` flag fixes that.
+The `/.../u` flag fixes that.
 
-It enables surrogate pairs in the regexp engine, so the result is correct:
+It enables surrogate pair support in the regexp engine, so the result is correct:
 
 ```js run
 alert( '𝒳'.match(/[𝒳𝒴]/u) ); // 𝒳
 ```
 
-Let's see one more example.
+Let's look at one more example.
 
-If we forget the `u` flag and occasionally use surrogate pairs, then we can get an error:
+If we forget the `u` flag and happen to use surrogate pairs, we can get an error:
 
 ```js run
-'𝒳'.match(/[𝒳-𝒴]/); // SyntaxError: invalid range in character class
+'𝒳'.match(/[𝒳-𝒴]/); // SyntaxError: invalid range in character class
 ```
 
-Normally, regexps understand `[a-z]` as a "range of characters with codes between codes of `a` and `z`.
+Normally, regexps interpret `[a-z]` as a "range of characters with codes between the codes of `a` and `z`".
 
-But without `u` flag, surrogate pairs are assumed to be a "pair of independent characters", so `[𝒳-𝒴]` is like `[<55349><56499>-<55349><56500>]` (replaced each surrogate pair with code points). Now we can clearly see that the range `56499-55349` is unacceptable, as the left range border must be less than the right one.
+But without the `u` flag, a surrogate pair is treated as a "pair of independent characters", so `[𝒳-𝒴]` is like `[<55349><56499>-<55349><56500>]` (each surrogate pair replaced with its codes). Now we can clearly see that the range `56499-55349` is unacceptable, because the left boundary of a range must not be greater than the right one.
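The difference can be checked without crashing a script by compiling the pattern with the `RegExp` constructor inside `try..catch`. This is a minimal sketch; `tryCompile` is a hypothetical helper, not something from the article.

```js
// Hypothetical helper: tries to compile a pattern and reports the outcome.
function tryCompile(source, flags) {
  try {
    new RegExp(source, flags);
    return 'ok';
  } catch (err) {
    return err.name; // the error message text differs between engines
  }
}

console.log(tryCompile('[𝒳-𝒴]', ''));   // SyntaxError: the range 56499-55349 is reversed
console.log(tryCompile('[𝒳-𝒴]', 'u'));  // ok: the range runs from 0x1d4b3 to 0x1d4b4
```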
 
-Using the `u` flag makes it work right:
+With the `u` flag it works correctly:
 
 ```js run
 alert( '𝒴'.match(/[𝒳-𝒵]/u) ); // 𝒴
diff --git a/9-regular-expressions/21-regexp-unicode-properties/article.md b/9-regular-expressions/21-regexp-unicode-properties/article.md
@@ -1,81 +1,81 @@
 
-# Unicode character properties \p
+# Unicode character properties \p
 
-[Unicode](https://en.wikipedia.org/wiki/Unicode), the encoding format used by JavaScript strings, has a lot of properties for different characters (or, technically, code points). They describe which "categories" character belongs to, and a variety of technical details.
+[Unicode](https://en.wikipedia.org/wiki/Unicode), the character standard behind JavaScript strings, defines many properties for each character (or, technically, code point). They describe which "categories" a character belongs to, along with a variety of technical details.
 
-In regular expressions these can be set by `\p{…}`. And there must be flag `'u'`.
+In regular expressions these can be matched with `\p{…}`, and the `u` flag is required.
 
-For instance, `\p{Letter}` denotes a letter in any of language. We can also use `\p{L}`, as `L` is an alias of `Letter`, there are shorter aliases for almost every property.
+For instance, `\p{Letter}` denotes a letter in any language. We can also use `\p{L}`, since `L` is an alias of `Letter`; there are shorter aliases for almost every property.
 
-Here's the main tree of properties:
+Here's the main tree of properties:
 
-- Letter `L`:
-  - lowercase `Ll`, modifier `Lm`, titlecase `Lt`, uppercase `Lu`, other `Lo`
-- Number `N`:
-  - decimal digit `Nd`, letter number `Nl`, other `No`
-- Punctuation `P`:
-  - connector `Pc`, dash `Pd`, initial quote `Pi`, final quote `Pf`, open `Ps`, close `Pe`, other `Po`
-- Mark `M` (accents etc):
-  - spacing combining `Mc`, enclosing `Me`, non-spacing `Mn`
-- Symbol `S`:
-  - currency `Sc`, modifier `Sk`, math `Sm`, other `So`
-- Separator `Z`:
-  - line `Zl`, paragraph `Zp`, space `Zs`
-- Other `C`:
-  - control `Cc`, format `Cf`, not assigned `Cn`, private use `Co`, surrogate `Cs`
+- Letter `L`:
+  - lowercase `Ll`, modifier `Lm`, titlecase `Lt`, uppercase `Lu`, other `Lo`
+- Number `N`:
+  - decimal digit `Nd`, letter number `Nl`, other `No`
+- Punctuation `P`:
+  - connector `Pc`, dash `Pd`, initial quote `Pi`, final quote `Pf`, open `Ps`, close `Pe`, other `Po`
+- Mark `M` (accents etc.):
+  - spacing combining `Mc`, enclosing `Me`, non-spacing `Mn`
+- Symbol `S`:
+  - currency `Sc`, modifier `Sk`, math `Sm`, other `So`
+- Separator `Z`:
+  - line `Zl`, paragraph `Zp`, space `Zs`
+- Other `C`:
+  - control `Cc`, format `Cf`, not assigned `Cn`, private use `Co`, surrogate `Cs`
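A few quick checks, offered as an illustrative sketch, show how the categories above behave in practice and why the `u` flag matters. The sample characters are arbitrary choices.

```js
// "L" is an alias of "Letter": both match a letter from any script.
console.log(/\p{Letter}/u.test('я'));  // true
console.log(/\p{L}/u.test('я'));       // true

// Subcategories narrow the match.
console.log(/\p{Lu}/u.test('Я'));      // true  (uppercase letter)
console.log(/\p{Lu}/u.test('я'));      // false (lowercase letter)
console.log(/\p{Nd}/u.test('7'));      // true  (decimal digit)
console.log(/\p{Sc}/u.test('€'));      // true  (currency symbol)

// Without "u", \p loses its special meaning and the pattern is read literally.
console.log(/\p{L}/.test('я'));        // false
console.log(/\p{L}/.test('p{L}'));     // true
```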
 
-```smart header="More information"
-Interested to see which characters belong to a property? There's a tool at <http://cldr.unicode.org/unicode-utilities/list-unicodeset> for that.
+```smart header="More information"
+Interested in seeing which characters belong to a property? There's a tool for that at <http://cldr.unicode.org/unicode-utilities/list-unicodeset>.
 
-You could also explore properties at [Character Property Index](http://unicode.org/cldr/utility/properties.jsp).
+You can also explore properties in the [Character Property Index](http://unicode.org/cldr/utility/properties.jsp).
 
-For the full Unicode Character Database in text format (along with all properties), see <https://www.unicode.org/Public/UCD/latest/ucd/>.
+For the full Unicode Character Database in text format (along with all properties), see <https://www.unicode.org/Public/UCD/latest/ucd/>.
 ```
 
-There are also other derived categories, like:
-- `Alphabetic` (`Alpha`), includes Letters `L`, plus letter numbers `Nl` (e.g. roman numbers Ⅻ), plus some other symbols `Other_Alphabetic` (`OAltpa`).
-- `Hex_Digit` includes hexadecimal digits: `0-9`, `a-f`.
-- ...Unicode is a big beast, it includes a lot of properties.
+There are also other derived categories, like:
+- `Alphabetic` (`Alpha`) includes Letters `L`, plus letter numbers `Nl` (e.g. the Roman numeral Ⅻ), plus some other symbols `Other_Alphabetic` (`OAlpha`).
+- `Hex_Digit` includes hexadecimal digits: `0-9`, `a-f`, `A-F`.
+- ...Unicode is a big beast, it includes a lot of properties.
 
-For instance, let's look for a 6-digit hex number:
+For instance, let's look for a 6-digit hex number:
 
 ```js run
-let reg = /\p{Hex_Digit}{6}/u; // flag 'u' is required
+let reg = /\p{Hex_Digit}{6}/u; // flag 'u' is required
 
 alert("color: #123ABC".match(reg)); // 123ABC
 ```
 
-There are also properties with a value. For instance, Unicode "Script" (a writing system) can be Cyrillic, Greek, Arabic, Han (Chinese) etc, the [list is long]("https://en.wikipedia.org/wiki/Script_(Unicode)").
+There are also properties with a value. For instance, the Unicode "Script" (a writing system) can be Cyrillic, Greek, Arabic, Han (Chinese), etc.; the [list is long](https://en.wikipedia.org/wiki/Script_(Unicode)).
 
-To search for characters in certain scripts ("alphabets"), we should supply `Script=<value>`, e.g. to search for cyrillic letters: `\p{sc=Cyrillic}`, for Chinese glyphs: `\p{sc=Han}`, etc:
+To search for characters in certain scripts ("alphabets"), we should supply `Script=<value>` (or the short form `sc=<value>`), e.g. to search for Cyrillic letters: `\p{sc=Cyrillic}`, for Chinese glyphs: `\p{sc=Han}`, etc:
 
 ```js run
-let regexp = /\p{sc=Han}+/gu; // get chinese words
+let regexp = /\p{sc=Han}+/gu; // get Chinese words
 
 let str = `Hello Привет 你好 123_456`;
 
 alert( str.match(regexp) ); // 你好
 ```
 
-## Building multi-language \w
+## Building a multi-language \w
 
-The pattern `pattern:\w` means "wordly characters", but doesn't work for languages that use non-Latin alphabets, such as Cyrillic and others. It's just a shorthand for `[a-zA-Z0-9_]`, so `pattern:\w+` won't find any Chinese words etc.
+The pattern `pattern:\w` means "word characters", but it doesn't work for languages that use non-Latin alphabets, such as Cyrillic and others. It's just shorthand for `[a-zA-Z0-9_]`, so `pattern:\w+` won't find Chinese words, etc.
 
-Let's make a "universal" regexp, that looks for wordly characters in any language. That's easy to do using Unicode properties:
+Let's make a "universal" regexp that looks for word characters in any language. That's easy to do using Unicode properties:
 
 ```js
 /[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]/u
 ```
 
-Let's decipher. Just as `pattern:\w` is the same as `pattern:[a-zA-Z0-9_]`, we're making a set of our own, that includes:
+Let's decipher it. Just as `pattern:\w` is the same as `pattern:[a-zA-Z0-9_]`, we're making a set of our own that includes:
 
-- `Alphabetic` for letters,
-- `Mark` for accents, as in Unicode accents may be represented by separate code points,
-- `Decimal_Number` for numbers,
-- `Connector_Punctuation` for the `'_'` character and alike,
-- `Join_Control` -– two special code points with hex codes `200c` and `200d`, used in ligatures e.g. in arabic.
+- `Alphabetic` for letters,
+- `Mark` for accents, since in Unicode accents may be represented by separate code points,
+- `Decimal_Number` for numbers,
+- `Connector_Punctuation` for the `'_'` character and the like,
+- `Join_Control` for two special code points with hex codes `200c` and `200d`, used in ligatures, e.g. in Arabic.
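A side-by-side run, sketched here with a sample string chosen only for illustration, shows what the set above buys us compared with plain `\w`. The name `wordLike` is hypothetical.

```js
const str = 'Hello Привет 你好 123_456';

// Plain \w only knows [a-zA-Z0-9_], so the Cyrillic and Chinese words are skipped.
console.log(str.match(/\w+/g));
// [ 'Hello', '123_456' ]

// The "universal" set built from Unicode properties (the "+" and the "g" flag
// are added here so that whole words are collected).
const wordLike = /[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]+/gu;
console.log(str.match(wordLike));
// [ 'Hello', 'Привет', '你好', '123_456' ]
```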
 
-Or, if we replace long names with aliases (a list of aliases [here](https://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt)):
+Or, if we replace the long names with aliases (a list of aliases is [here](https://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt)):
 
 ```js run
 let regexp = /([\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_C}]+)/gu;