Atomics.wait returning "not-equal" does not provide happens-before ordering for third-party stores with 3+ workers.

Chromium Issue: https://issues.chromium.org/issues/495679735
This repository contains a minimal, runnable demonstration of a memory ordering bug in V8 (Chrome/Node.js) where the Atomics.wait / memory.atomic.wait32 "not-equal" fast path fails to establish happens-before relationships, causing stale reads of shared memory.
https://lostbeard.github.io/v8-atomics-wait-bug/
Run the tests directly in your browser — no install required. The demo auto-escalates iterations until the bug is detected or 100K iterations pass clean.
When 3 or more Web Workers synchronize using a generation-counting barrier with Atomics.wait / Atomics.notify:
- Workers write data to shared memory
- Workers enter a barrier (atomic arrival counter + generation bump + wait/notify)
- After the barrier, workers read each other's data
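The steps above can be sketched as a generation-counting barrier. This is a minimal sketch, not the repo's actual code: the slot layout (`view[0]` = arrival counter, `view[1]` = generation counter) and names (`barrierEnter`, `COUNT_IDX`, `GEN_IDX`) are illustrative assumptions.

```javascript
// Assumed slot layout: view[0] = arrival counter, view[1] = generation.
const COUNT_IDX = 0;
const GEN_IDX = 1;

function barrierEnter(view, totalWorkers) {
  const myGen = Atomics.load(view, GEN_IDX);
  const arrived = Atomics.add(view, COUNT_IDX, 1) + 1; // add returns old value
  if (arrived === totalWorkers) {
    // Last arriver: reset the counter, bump the generation, wake waiters.
    Atomics.store(view, COUNT_IDX, 0);
    Atomics.add(view, GEN_IDX, 1);
    Atomics.notify(view, GEN_IDX);
  } else {
    // Everyone else waits for the generation to change. If the last
    // arriver already bumped it, Atomics.wait returns "not-equal"
    // immediately -- the path where the stale reads are observed.
    Atomics.wait(view, GEN_IDX, myGen);
  }
}
```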
Expected: All workers see all other workers' writes after the barrier.
Actual (V8): Workers whose Atomics.wait returns "not-equal" (because the generation was already bumped by the last arriver) do not see prior stores from other workers. ~66% of cross-worker reads are stale.
The ~66% figure matches the barrier structure: with 3 workers, the last arriver never waits and reads fresh data, while the other 2 workers take the "not-equal" wait path and read stale data: 2 of every 3 workers' reads, i.e. 66.7%.
The happens-before edge flows: Writer's stores -> Last Arriver (arrival counter) -> Atomics.notify -> Woken Waiters. But when a waiter's Atomics.wait returns "not-equal" (the generation already changed before wait was called), V8 appears to skip the full seq_cst memory fence. The return value is correct (the generation did change), but the ordering guarantee is missing.
Replacing Atomics.wait with while (Atomics.load(view, genIdx) === myGen) {} fixes the issue completely. Every Atomics.load is seq_cst — when the load finally observes the new generation, the total order property of seq_cst guarantees that all prior stores from all threads are visible. No ambiguity, no fast paths.
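A minimal sketch of that spin replacement (the function name is illustrative; this stands in for the waiter branch of the barrier):

```javascript
// Workaround sketch: replace the waiter branch's Atomics.wait with a
// seq_cst spin on the generation counter.
function spinUntilGenChanges(view, genIdx, myGen) {
  // Each Atomics.load is seq_cst; once it observes the bumped generation,
  // all stores ordered before the bump are guaranteed visible.
  while (Atomics.load(view, genIdx) === myGen) {
    // busy-wait (burns CPU, but establishes the missing ordering)
  }
}
```

The cost is CPU burn while waiting, which is why wait/notify is normally preferred; the repo ships the spin version because it is the only variant that passes on all tested engines.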
Open index.html in Chrome. The page uses a service worker to enable cross-origin isolation (SharedArrayBuffer support) automatically.
If the service worker doesn't activate (e.g., file:// protocol), serve locally:
# Python
python -m http.server 8080
# Node.js
npx serve -p 8080
# Then open http://localhost:8080

# Bug reproducer (3 workers + wait/notify) — expect FAIL
node three_worker_barrier.mjs
# Workaround (3 workers + spin barrier) — expect PASS
node spin_barrier_3w.mjs

Three tests isolate the bug precisely:
Node.js 22.14.0 (V8):

| Test | Workers | Barrier | Stale Reads | Error Rate | Result |
|---|---|---|---|---|---|
| 1. Control (2 workers) | 2 | wait/notify | 0 | 0% | PASS |
| 2. Bug trigger (3 workers) | 3 | wait/notify | ~98,000 / 150,000 | ~66% | FAIL |
| 3. Workaround (spin) | 3 | spin (Atomics.load) | 0 | 0% | PASS |

Chrome 146 (V8):

| Test | Workers | Barrier | Stale Reads | Error Rate | Result |
|---|---|---|---|---|---|
| 1. Control (2 workers) | 2 | wait/notify | 0 / 200,000 | 0% | PASS |
| 2. Bug trigger (3 workers) | 3 | wait/notify | 39,368 / 375,000 | 10.5% | FAIL |
| 3. Workaround (spin) | 3 | spin (Atomics.load) | 0 / 288,000 | 0% | PASS |

Firefox 148 (SpiderMonkey):

| Test | Workers | Barrier | Stale Reads | Error Rate | Result |
|---|---|---|---|---|---|
| 1. Control (2 workers) | 2 | wait/notify | 0 / 200,000 | 0% | PASS |
| 2. Bug trigger (3 workers) | 3 | wait/notify | 1,897 / 3,000 | 63.2% | FAIL |
| 3. Workaround (spin) | 3 | spin (Atomics.load) | 0 / 9,000 | 0% | PASS |
- Test 1 proves the barrier algorithm is correct with 2 workers.
- Test 2 proves it breaks with 3 workers and Atomics.wait on all tested engines (V8, SpiderMonkey). Firefox fails at just 1,000 iterations with 63.2% — nearly identical to Node.js V8's ~66%.
- Test 3 proves the spin workaround fixes it on all tested engines.
Section 25.4.12 — The agent enters the WaiterList critical section, compares the value, and returns "not-equal" if they differ. The critical section entry/exit should synchronize with Atomics.notify per the memory model.
Section 29 — Defines Synchronize events and happens-before. Atomics.notify synchronizes-with agents it wakes. The "not-equal" path should also establish ordering through the shared critical section, but V8 appears to optimize this away.
memory.atomic.wait32 — Performs a sequentially consistent atomic read (seq_cst) as its first step. The seq_cst ordering should apply regardless of whether the result is "ok", "not-equal", or "timed-out".
| Environment | Engine | Error Rate | Status |
|---|---|---|---|
| Node.js 22.14.0 | V8 12.4.254.21 | ~66% stale reads | Affected — highly reproducible |
| Chrome 146 | V8 ~14.6.x | 10.5% stale reads | Affected — confirmed with escalating test |
| Chrome Canary 148.0.7751.0 | V8 (latest) | 1 / 135,000 stale reads | Affected — rare but confirmed |
| Firefox 148 | SpiderMonkey | 63.2% stale reads | Affected — fails at 1K iterations |
| Android Chrome (ARM) | V8 (latest) | 22.3% (2 workers!), 6.8% (3 workers) | Affected — ARM fails even with 2 workers |
| Safari (JavaScriptCore) | N/A | — | Not tested |
CRITICAL UPDATE: This is NOT engine-specific. Firefox 148 (SpiderMonkey) exhibits 63.2% stale reads at just 1,000 iterations — nearly identical to Node.js V8's ~66%. Chrome V8 ~14.6 shows a lower rate (10.5%) but all three engines fail. This points to either:
- A spec gap in the ECMAScript/WebAssembly memory model (the "not-equal" path genuinely lacks ordering guarantees)
- A platform-level issue (Windows WaitOnAddress / futex implementation)
- A hardware-level issue (AMD Ryzen 5 7500F TSO behavior)
The fact that two completely independent JavaScript engines (V8 and SpiderMonkey) exhibit the same bug at nearly the same rate strongly suggests this is a spec-level issue, not an engine implementation bug.
ARM is the definitive proof. On Android (MediaTek Dimensity 8300, ARM Cortex-A715/A510), the bug manifests with just 2 workers at a 22.3% error rate — the same 2-worker test that passes on every x86 system. x86's Total Store Order (TSO) hardware memory model was partially masking the bug by providing store ordering that ARM's relaxed memory model does not. The Atomics.wait "not-equal" fast path is missing a memory fence that ARM requires and x86 provides implicitly.
System tested on: Windows 11, AMD Ryzen 5 7500F (6 cores / 12 threads)
The browser demo auto-escalates Test 2 from 1,000 to 100,000 iterations (doubling each round) until a stale read is detected. Results on Chrome 146:
- Test 1 (2 workers, wait/notify, 50K): PASS — 0 / 200,000 stale reads
- Test 2 (3 workers, wait/notify, escalating): FAIL — 39,368 / 375,000 stale reads (10.5%)
- Test 3 (3 workers, spin, matched iterations): PASS — 0 / 288,000 stale reads
If Test 2 reaches 100,000 iterations with 0 stale reads, the bug is considered not applicable to that environment.
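The escalation policy described above can be sketched as follows. This is an illustrative shape, not the demo's actual code; `runTest2` is a hypothetical stand-in for the demo's test runner, which takes an iteration count and resolves to the number of stale reads observed.

```javascript
// Double the iteration count each round until a stale read is detected
// or the 100,000-iteration ceiling passes clean.
async function escalate(runTest2) {
  let iters = 1000;
  while (true) {
    const staleReads = await runTest2(iters);
    if (staleReads > 0) return { iters, staleReads, bug: true };
    if (iters >= 100_000) return { bug: false }; // clean at the ceiling
    iters = Math.min(iters * 2, 100_000);
  }
}
```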
This bug affects any multi-worker SharedArrayBuffer code that uses Atomics.wait/Atomics.notify barriers with 3+ workers — including WebAssembly memory.atomic.wait32/memory.atomic.notify.
This bug was discovered by the SpawnDev.ILGPU development team while implementing multi-worker WebAssembly kernel dispatch in SpawnDev.ILGPU v4.6.0. The library compiles .NET GPU kernels to WebAssembly and dispatches them across multiple Web Workers with barrier synchronization.
The team:
- TJ (Todd Tanner / @LostBeard) — Project lead, SpawnDev.ILGPU author
- Riker (Claude CLI #1) — Isolated the bug to the wait32 "not-equal" return path, built the definitive 3-test reproducer proving 2 workers pass / 3 workers fail / spin works
- Data (Claude CLI #2) — Confirmed the 2/3 stale-read fraction analysis, correlated it with seq_cst spec requirements, identified the "not-equal" fast path as the likely V8 implementation gap
- Tuvok (Claude CLI #3) — Traced the full fence layout and barrier protocol, confirming generation advancement logic correctness
The workaround — pure spin barriers using i32.atomic.load instead of memory.atomic.wait32 — is shipped in SpawnDev.ILGPU v4.6.0, where all 249 Wasm backend tests pass with 0 failures.
V8Bug/
├── index.html # Interactive browser demo (run all 3 tests)
├── worker.js # Shared worker (both barrier modes)
├── coi-serviceworker.js # Cross-origin isolation for GitHub Pages
├── style.css # Dark theme styling
├── three_worker_barrier.mjs # Node.js reproducer (FAIL expected)
├── spin_barrier_3w.mjs # Node.js control test (PASS expected)
├── LICENSE # MIT
└── README.md # This file
MIT