Frozen/String/Hashing: replace unsafe pointer arithmetic with safe span indexing#125547
Conversation
Frozen/String/Hashing: replace unsafe pointer arithmetic with safe span indexing (Co-authored-by: EgorBo <523221+EgorBo@users.noreply.github.com>)
Tagging subscribers to this area: @dotnet/area-system-collections
This comment was marked as outdated.
@copilot please use the benchmark skill to validate performance of this code using public APIs and for different lengths.
This comment was marked as outdated.
Thanks for writing a benchmark with an infinite loop in Setup… 😞
... Sorry about that — @EgorBot -linux_amd -osx_arm64

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Collections.Frozen;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[MemoryDiagnoser]
public class Bench
{
    private FrozenDictionary<string, int> _ordinalDict = default!;
    private FrozenDictionary<string, int> _ignoreCaseDict = default!;
    private string[] _keys = default!;

    [Params(1, 3, 4, 5, 8, 20)]
    public int KeyLength { get; set; }

    private static string GenerateKey(int index, int length)
    {
        var chars = new char[length];
        for (int i = length - 1; i >= 0; i--)
        {
            chars[i] = (char)('a' + (index % 26));
            index /= 26;
        }
        return new string(chars);
    }

    [GlobalSetup]
    public void Setup()
    {
        int maxUnique = 1;
        for (int i = 0; i < KeyLength && maxUnique < 100; i++)
            maxUnique *= 26;
        int numKeys = Math.Min(100, maxUnique);

        var dict = new Dictionary<string, int>(numKeys);
        for (int i = 0; i < numKeys; i++)
            dict[GenerateKey(i, KeyLength)] = i;

        _ordinalDict = dict.ToFrozenDictionary(StringComparer.Ordinal);
        _ignoreCaseDict = dict.ToFrozenDictionary(StringComparer.OrdinalIgnoreCase);
        _keys = dict.Keys.ToArray();
    }

    [Benchmark]
    public int Lookup_Ordinal()
    {
        int sum = 0;
        var dict = _ordinalDict;
        var keys = _keys;
        for (int i = 0; i < keys.Length; i++)
            sum += dict[keys[i]];
        return sum;
    }

    [Benchmark]
    public int Lookup_OrdinalIgnoreCase()
    {
        int sum = 0;
        var dict = _ignoreCaseDict;
        var keys = _keys;
        for (int i = 0; i < keys.Length; i++)
            sum += dict[keys[i]];
        return sum;
    }

    [Benchmark]
    public FrozenDictionary<string, int> Create_Ordinal()
    {
        var source = new Dictionary<string, int>(_keys.Length);
        var keys = _keys;
        for (int i = 0; i < keys.Length; i++)
            source[keys[i]] = i;
        return source.ToFrozenDictionary(StringComparer.Ordinal);
    }

    [Benchmark]
    public FrozenDictionary<string, int> Create_OrdinalIgnoreCase()
    {
        var source = new Dictionary<string, int>(_keys.Length);
        var keys = _keys;
        for (int i = 0; i < keys.Length; i++)
            source[keys[i]] = i;
        return source.ToFrozenDictionary(StringComparer.OrdinalIgnoreCase);
    }
}
```
This comment was marked as outdated.
Both jobs timed out — @EgorBot -linux_amd -osx_arm64

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Collections.Frozen;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    private FrozenDictionary<string, int> _ordinalDict = default!;
    private FrozenDictionary<string, int> _ignoreCaseDict = default!;
    private string[] _keys = default!;

    [Params(4, 5, 8, 20)]
    public int KeyLength { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        var dict = new Dictionary<string, int>(50);
        for (int i = 0; i < 50; i++)
        {
            var chars = new char[KeyLength];
            int idx = i;
            for (int j = KeyLength - 1; j >= 0; j--)
            {
                chars[j] = (char)('a' + (idx % 26));
                idx /= 26;
            }
            dict[new string(chars)] = i;
        }

        _ordinalDict = dict.ToFrozenDictionary(StringComparer.Ordinal);
        _ignoreCaseDict = dict.ToFrozenDictionary(StringComparer.OrdinalIgnoreCase);
        _keys = dict.Keys.ToArray();
    }

    [Benchmark]
    public int Lookup_Ordinal()
    {
        int sum = 0;
        var dict = _ordinalDict;
        var keys = _keys;
        for (int i = 0; i < keys.Length; i++)
            sum += dict[keys[i]];
        return sum;
    }

    [Benchmark]
    public int Lookup_OrdinalIgnoreCase()
    {
        int sum = 0;
        var dict = _ignoreCaseDict;
        var keys = _keys;
        for (int i = 0; i < keys.Length; i++)
            sum += dict[keys[i]];
        return sum;
    }
}
```
@copilot, benchmark results are ready. Please analyze the results and act accordingly.
Benchmark analysis across both platforms shows no regressions — the PR is consistently equal or slightly faster:

Linux AMD (EPYC Turin):
macOS ARM64 (Apple M2):

The safe span-indexing approach generates equal or better codegen than the unsafe pointer arithmetic on both x64 and ARM64. No action needed — the change is a net positive for both safety and performance.
@copilot, benchmark results are ready. Please analyze the results and act accordingly.
Second round of benchmarks confirms the same trend — no regressions, the PR is consistently equal or faster:

Linux AMD (EPYC 9V45 Turin):
macOS ARM64 (Apple M4):

The improvement on x64 is particularly notable at longer key lengths, consistent with the main …
EgorBo
left a comment
LGTM. I've checked locally that the safe impl generates no bounds checks; it comes with a bit of extra codegen in the `default:` case due to loop cloning, but the fast loop is as fast as the unsafe code.
Pull request overview
This PR updates the Frozen string hashing implementation in System.Collections.Immutable to remove unsafe pointer-based reads and replace them with safe `ReadOnlySpan<char>` indexing plus explicit 2x-UTF16-to-`uint` bit packing.
Changes:
- Removed `unsafe`/`fixed`/`uint*` casts from `Hashing.GetHashCodeOrdinal` and replaced them with span indexing and explicit packing of two UTF-16 chars into a `uint`.
- Applied the same safe refactor to `Hashing.GetHashCodeOrdinalIgnoreCaseAscii`, including using `LowercaseChar` for trailing single chars.
- Removed the now-unused `System.Runtime.InteropServices` import.
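The shape of the change can be sketched as follows. This is a simplified illustration of the packing pattern, not the exact runtime code: the seed constants and final mix follow the classic two-lane djb2-style string hash and are assumptions, and the real implementation interleaves the two lanes differently.

```csharp
using System;
using System.Numerics;

internal static class HashSketch
{
    // Illustrative sketch of the span-based ordinal hash loop.
    public static int GetHashCodeOrdinalSketch(ReadOnlySpan<char> s)
    {
        uint hash1 = (5381 << 16) + 5381; // assumed seed, not the runtime's constant
        uint hash2 = hash1;

        int i = 0;
        // Two chars at a time, packed into a uint with explicit shifts:
        // this replaces the old unaligned ((uint*)ptr)[0] read and produces
        // the same value on little-endian platforms, with no unsafe code.
        for (; i + 1 < s.Length; i += 2)
        {
            uint pair = (uint)s[i] | ((uint)s[i + 1] << 16);
            hash1 = (BitOperations.RotateLeft(hash1, 5) + hash1) ^ pair;
        }

        // Odd trailing char, processed individually.
        if (i < s.Length)
        {
            hash2 = (BitOperations.RotateLeft(hash2, 5) + hash2) ^ s[i];
        }

        return (int)(hash1 + (hash2 * 1_566_083_941)); // assumed final mix
    }
}
```

Because the loop bound is expressed directly against `s.Length`, the JIT can prove the `s[i]`/`s[i + 1]` accesses in range and elide the bounds checks, which is why the safe version can match the pointer version.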
src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/String/Hashing.cs
```csharp
char* ptrChar = (char*)ptrUInt32;
while (length-- > 0)
{
    hash2 = (BitOperations.RotateLeft(hash2, 5) + hash2) ^ (*ptrChar | LowercaseUInt32);
```
A copy-paste issue in the previous code: the single-char loop uses `LowercaseUInt32` (meant for two chars at a time) instead of `LowercaseChar`. Not a correctness issue.
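To illustrate the difference being described, here is a standalone snippet. The constant values are assumptions based on the ASCII lowercase bit (`0x20`), not copied from the runtime source:

```csharp
const uint LowercaseUInt32 = 0x0020_0020u; // assumed: OR-lowercases BOTH chars of a packed pair
const uint LowercaseChar   = 0x0020u;      // assumed: OR-lowercases a single ASCII char

char c = 'A'; // 0x0041

// Old single-char path: using the pair mask on one char also sets a
// spurious bit in the (empty) high 16-bit lane of the uint.
uint withPairMask   = c | LowercaseUInt32; // 0x0020_0061

// Corrected single-char path: only the low char is lowercased.
uint withSingleMask = c | LowercaseChar;   // 0x0000_0061 ('a')
```

Since the extra bit is applied consistently on every single-char iteration, the old code still hashed deterministically — hence "not a correctness issue", just an inconsistency.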
PTAL @tannergooding @MihaZupan @stephentoub. No new bounds checks in the safer code; benchmarks even report slight improvements.
IIRC, it was done this way not because of bounds checks on Core, but because of performance on netfx. Does this not regress collection lookup times there?
I guess I can manually check, but it definitely regresses on netfx due to bounds checks, like pretty much all "reduce-unsafe" PRs do, I think. If the goal is to avoid perf regressions for netfx at all costs, there will be quite a small area to remove unsafe from 😞
The goal isn't "remove unsafe at all costs", either. We're trying to remove unsafe thoughtfully, where either it's not necessary because there's just as good or better a way to do it without unsafe, or because it's not adding enough value to be worth the extra risk. For places where it still adds sufficient value, we're happy to keep using unsafe. For packages that build downlevel, performance on netfx absolutely needs to factor into those equations.
@stephentoub I've checked - it is a big regression for .NET Framework for large inputs (for lengths 1-5 it's a fraction of a nanosecond, but for len=100 it could be 120 ns vs 20 ns) - bench.cs. Do I just copy-paste the unsafe version for netfx with `#if`? I believe there is value in replacing this code with a safer implementation. The current code is difficult to verify at a glance that it doesn't go out of bounds, contains a non-portable unaligned access, and appears to have a copy-paste mistake. The new version resolves these issues and is also faster.
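The `#if` split being asked about could look something like the following. This is a sketch only: the helper names are hypothetical, and it assumes the package multi-targets a modern TFM plus `netstandard2.0` for .NET Framework (which defines `NET` only on the former):

```csharp
public static int GetHashCodeOrdinal(ReadOnlySpan<char> s)
{
#if NET
    // Modern JITs elide the bounds checks in the span-indexing fast loop,
    // so the safe implementation costs nothing here.
    return GetHashCodeOrdinalSafe(s);      // hypothetical helper
#else
    // netfx's JIT does not elide these bounds checks (observed ~120 ns vs
    // ~20 ns at len=100 above), so keep the original fixed/pointer code
    // in the downlevel build.
    return GetHashCodeOrdinalUnsafe(s);    // hypothetical helper
#endif
}
```

The trade-off discussed in the thread is that this doubles the code to maintain, which is part of why the reviewers leaned toward leaving the downlevel code untouched instead.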
Personally, I would prefer instead that we address: then this code won't exist in this package at all, at least for .NET 11, and we'd just leave the code untouched for downlevel (other than if there's a copy/paste bug to fix like you suggest).
Ok, I guess that new API can reuse this impl once it's introduced.
`Hashing.GetHashCodeOrdinal` and `Hashing.GetHashCodeOrdinalIgnoreCaseAscii` used `unsafe`/`fixed`/`uint*` casts to read two UTF-16 chars at a time as a packed `uint`. Replace with explicit bit-packing via safe span indexing — `((uint)s[i] | ((uint)s[i+1] << 16))` — which is mathematically identical on all little-endian .NET platforms and removes the `unsafe` blocks and the `System.Runtime.InteropServices` import entirely.

- `GetHashCodeOrdinal`, `case 4`: `((uint*)src)[0/1]` → explicit pack `((uint)s[0] | ((uint)s[1] << 16))` etc.
- `default:`: `uint* ptrUInt32` + `while (length >= 4)` → index-based `for` loop; trailing `char*` pointer walk → second `for` loop over `s[i]`
- `GetHashCodeOrdinalIgnoreCaseAscii`: `| LowercaseUInt32` on packed pairs and `| LowercaseChar` on trailing single chars (the old code erroneously used `LowercaseUInt32` on single chars, which set a spurious high-lane bit; semantically equivalent since these are internal non-persisted hashes, but now consistent)

Hash values are only used internally within the frozen collection build+lookup cycle and are never persisted, so no stable-hash contract is broken.
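The "mathematically identical on little-endian" claim can be checked directly with a standalone snippet. This is an illustration, not runtime code; `MemoryMarshal.AsBytes` + `BitConverter.ToUInt32` reproduce the platform-endian `uint` read that the old pointer cast performed:

```csharp
using System;
using System.Runtime.InteropServices;

ReadOnlySpan<char> s = "ab";

// What the old code did, conceptually: reinterpret two adjacent
// UTF-16 code units as a single platform-endian uint.
uint viaReinterpret = BitConverter.ToUInt32(MemoryMarshal.AsBytes(s));

// What the new code does: explicit little-endian bit packing.
uint viaPacking = (uint)s[0] | ((uint)s[1] << 16);

Console.WriteLine(viaReinterpret == viaPacking); // True on little-endian platforms
```

All officially supported .NET platforms are little-endian, which is why the thread treats the two forms as interchangeable; on a hypothetical big-endian target the packed form would differ from the reinterpreted read (but would still be a valid, consistent hash).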
Original prompt
please take only Hashing changes (don't remove other changes for now, just don't commit them) and send a PR to dotnet/runtime. Use the benchmark skill to validate perf impact.
**Chronological Review:**
1. User requested a plan to find and fix trivially-replaceable unsafe code in dotnet/runtime Libraries and CoreLib
2. Multiple exploration agents (agent-0 through agent-10) searched the codebase for unsafe patterns: pointers, `fixed` statements, `Unsafe.*`, `MemoryMarshal.*`, `Marshal.*` APIs
3. Initial plan was created with candidates including Enum.cs (`Unsafe.As` replacement), FixedBufferExtensions.cs, WebHeaderEncoding.cs, Hashing.cs
4. User rejected the plan: "do not replace unsafe with 'safe' Unsafe, MemoryMarshal. Wasn't it clear??" — meaning replacements must use truly safe APIs (Span, indexing, ToString, etc.), not just swap one unsafe API for another
5. Plan was revised with 4 truly-safe candidates: FixedBufferExtensions.cs, WebHeaderEncoding.cs, StringParser.cs, Hashing.cs
6. Plan approved, autopilot mode entered
7. Baseline build run successfully (clr+libs)
8. All 4 files were edited with safe replacements
9. All affected projects built successfully (CoreLib, HttpListener, Process, Collections.Immutable including netstandard2.0)
10. Tests run: HttpListener (693 pass), Common (1695 pass), Immutable Collections (22068 pass), Process (363 pass) — all 0 failures
11. Code review completed — all changes approved
12. User then asked to update `GetHashCodeOrdinalIgnoreCaseAscii` to match the shape they'd manually changed `GetHashCodeOrdinal` to (using `for` loops with `s.Length` instead of a decrementing `length` counter)
13. That change was made, built, tested (22068 pass)
14. User then asked to save all changes EXCEPT Hashing.cs to a .diff patch, keeping only Hashing.cs modified

**Intent Mapping:**
Technical Inventory:
- Repository: `C:\prj\runtime-main2`, branch `replace-fixed-buffers`
- `dotnet build` for individual projects; `dotnet build /t:test` for tests

Code Archaeology:
- `fixed` + `new string(ptr)` → `span.Slice().ToString()` (REVERTED, saved to patch)
- `fixed` pointer loops → Span indexing (REVERTED, saved to patch)
- `fixed` + `char*` iteration → `string.AsSpan()` indexing in 4 parse methods (REVERTED, saved to patch)
- `fixed` + `(uint*)` casts → span indexing + bit packing `((uint)s[i] | ((uint)s[i+1] << 16))` (STILL MODIFIED)

Progress Assessment:
- `for` loops, `non-hashing-unsafe-cleanup.diff` ✅

Recent Commands Analysis:
- `git diff -- <3 files> > non-hashing-unsafe-cleanup.diff` — saved patch
- `git checkout -- <3 files>` — reverted the 3 non-Hashing files
- `git diff --stat` — confirmed only Hashing.cs remains modified (84 insertions, 100 deletions)
- `task_complete` called with summary

1. Conversation Overview:
- Primary Objectives: User requested: "Please walk over the codebase and try to find places with unsafe code that you can trivially make safe." Rules: only trivial changes, ignore interop/PInvokes, focus on optimization-related unsafe code (pointers for performance), replace with safe alternatives like `Span`. Only focus on Libraries and CoreLib. Do not commit changes in git.
- Session Context: Extensive codebase exploration (11 background agents) → plan creation → plan rejection ("do not replace unsafe with 'safe' Unsafe, MemoryMarshal") → revised plan → implementation of 4 file changes → builds + tests → user manually tweaked one file → matching update → patch extraction for 3 files, keeping only Hashing.cs modified
- User Intent Evolution: Initial broad search → narrowed to truly safe replacements only (no `Unsafe.*`/`MemoryMarshal.*` in replacement code) → user wanted to ship Hashing.cs separately, saving other changes as a .diff patch

- Technical Foundation:
- Repository: dotnet/runtime at `C:\prj\runtime-main2`, branch `replace-fixed-buffers`, commit `53c6c4e7d3d`
- .NET SDK: 11.0.100-preview.1.26104.118 at `C:\prj\runtime-main2\.dotnet\dotnet.exe`
- Build approach: `dotnet build <project>.csproj` for individual projects; `dotnet build /t:test <test-project>.csproj` for tests
- Target frameworks vary: System.Net.HttpL...

Created from Copilot CLI via the copilot delegate command.