e.g.
using System;
using System.Globalization;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        CultureInfo.CurrentCulture = new CultureInfo("tr-TR");
        var r = new Regex(@"[A-Z]", RegexOptions.IgnoreCase);
        Console.WriteLine(r.IsMatch("\u0131")); // should print true, but prints false
    }
}
In Turkish, uppercase I lowercases to dotless ı (\u0131), so the above repro should print true. But whereas Regex uses the target culture when dealing with individual characters in a set:
SingleRange range = rangeList[i];
if (range.First == range.Last)
{
    char lower = culture.TextInfo.ToLower(range.First);
    rangeList[i] = new SingleRange(lower, lower);
}
when it instead has a range with multiple characters, it delegates to this AddLowercaseRange function:
private void AddLowercaseRange(char chMin, char chMax)
which doesn't factor the target culture into its decision, instead using a precomputed table:
private static readonly LowerCaseMapping[] s_lcTable = new LowerCaseMapping[]
@tarekgh, @GrabYourPitchforks, am I correct that such a table couldn't possibly be right, given that different cultures case differently?
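To illustrate why a single culture-agnostic table looks suspicious, here is a minimal sketch that lowercases the same character under two cultures. It is not part of the Regex implementation: the type CultureLowercaseDemo and the helper LowerIn are hypothetical names, and the sketch assumes culture data is available (i.e. the process is not running in invariant globalization mode).

```csharp
using System;
using System.Globalization;

// Hypothetical demo type; not part of System.Text.RegularExpressions.
public static class CultureLowercaseDemo
{
    // Lowercases a single char using the named culture's casing rules.
    public static char LowerIn(string cultureName, char c) =>
        CultureInfo.GetCultureInfo(cultureName).TextInfo.ToLower(c);

    public static void Main()
    {
        // Assumes culture data is present; with it, en-US maps 'I' to
        // U+0069 while tr-TR maps 'I' to U+0131 (dotless ı).
        foreach (string name in new[] { "en-US", "tr-TR" })
        {
            char lower = LowerIn(name, 'I');
            Console.WriteLine($"{name}: U+{(int)lower:X4}");
        }
    }
}
```

Since the same input yields different lowercase characters depending on the culture, a single static mapping can agree with at most one of them.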
Note that if the above repro is instead changed to spell out the whole range of uppercase letters:
using System;
using System.Globalization;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        CultureInfo.CurrentCulture = new CultureInfo("tr-TR");
        var r = new Regex(@"[ABCDEFGHIJKLMNOPQRSTUVWXYZ]", RegexOptions.IgnoreCase);
        Console.WriteLine(r.IsMatch("\u0131")); // prints true
    }
}
it then correctly prints true.
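That difference suggests a possible stopgap: expand a range into an explicit set before constructing the Regex, so that each character goes through the culture-aware single-character path. The following is only a sketch under that assumption. ExplicitSetDemo and ExpandRange are hypothetical names, and the helper assumes the range contains only characters that need no escaping inside a set (letters and digits).

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical demo type illustrating the "spell out the range" variant.
public static class ExplicitSetDemo
{
    // Hypothetical helper: turns a range such as A..Z into "[ABC...Z]".
    // Assumes every char in the range is safe unescaped inside a set
    // (true for letters and digits, not for ']', '\\', '^', or '-').
    public static string ExpandRange(char first, char last)
    {
        if (first > last)
            throw new ArgumentException("first must not exceed last");

        var sb = new StringBuilder("[");
        // Loop over int so the counter cannot wrap around at char.MaxValue.
        for (int c = first; c <= last; c++)
            sb.Append((char)c);
        return sb.Append(']').ToString();
    }

    public static void Main()
    {
        CultureInfo.CurrentCulture = new CultureInfo("tr-TR");
        string pattern = ExpandRange('A', 'Z');
        Console.WriteLine(pattern); // [ABCDEFGHIJKLMNOPQRSTUVWXYZ]

        // Same check as the second repro above, which is reported to print true.
        var r = new Regex(pattern, RegexOptions.IgnoreCase);
        Console.WriteLine(r.IsMatch("\u0131"));
    }
}
```

This only sidesteps the range path; it does not address whether s_lcTable itself is correct.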
cc: @eerhardt, @pgovind
Source locations for the snippets above, all in runtime/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs at fd82afe:
Lines 551 to 556: culture-aware lowercasing of single-character ranges
Line 569: AddLowercaseRange
Line 301: s_lcTable