Skip to content

fix: abort connection establishment when connection is closed#1737

Open
arthurschreiber wants to merge 2 commits into
masterfrom
fix-close-during-connect
Open

fix: abort connection establishment when connection is closed#1737
arthurschreiber wants to merge 2 commits into
masterfrom
fix-close-during-connect

Conversation

@arthurschreiber

Copy link
Copy Markdown
Collaborator

Calling close while a connection was being established did not abort the connection process. Nothing connected close to the abort controller inside initialiseConnection, and the connection process never re-checked the connection state after resuming from await points.

If close was called before the socket was created, the connection process would continue unaffected, log in, reset the closed flag, and emit a successful connect event after end was already emitted, leaving behind a fully established "zombie" connection. If close was called after the socket was created, destroying the socket only emitted a close event, which the login logic ignores, so the connection process would dangle until the connect timer fired, emitting a connect event with an ETIMEOUT error and a duplicate end event.

Introduce a closeController that is created in connect and aborted by close with an ECLOSE connection error. Its signal is combined with the per-attempt timeout signal, so every abort checkpoint in the connection process now also observes connection closure:

  • Check for early abort after the socket is connected, as abort listeners added to an already aborted signal are never called.
  • No longer reset the closed flag during connection establishment, as it allowed a closed connection to be resurrected.
  • Route the connection failure path through cleanupConnection so that the end event is emitted exactly once.
  • Make the transient failure retry delay abortable so that closing during a retry wait does not emit a spurious retry event.

Calling `close` while a connection was being established did not abort
the connection process. Nothing connected `close` to the abort
controller inside `initialiseConnection`, and the connection process
never re-checked the connection state after resuming from `await`
points.

If `close` was called before the socket was created, the connection
process would continue unaffected, log in, reset the `closed` flag,
and emit a successful `connect` event after `end` was already emitted,
leaving behind a fully established "zombie" connection. If `close` was
called after the socket was created, destroying the socket only
emitted a `close` event, which the login logic ignores, so the
connection process would dangle until the connect timer fired,
emitting a `connect` event with an `ETIMEOUT` error and a duplicate
`end` event.

Introduce a `closeController` that is created in `connect` and aborted
by `close` with an `ECLOSE` connection error. Its signal is combined
with the per-attempt timeout signal, so every abort checkpoint in the
connection process now also observes connection closure:

- Check for early abort after the socket is connected, as abort
  listeners added to an already aborted signal are never called.
- No longer reset the `closed` flag during connection establishment,
  as it allowed a closed connection to be resurrected.
- Route the connection failure path through `cleanupConnection` so
  that the `end` event is emitted exactly once.
- Make the transient failure retry delay abortable so that closing
  during a retry wait does not emit a spurious `retry` event.
@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR fixes a real and subtle race condition: calling close() during connect() could leave behind a "zombie" connected session (if close() was called before the socket was created) or a dangling attempt that eventually emits a spurious connect error and duplicate end event (if called after the socket was created). The root causes were: nothing wired close() to the async connection flow, and this.closed = false inside the login path was allowing a closed connection to be resurrected.

The fix is conceptually clean: introduce a closeController whose signal is composed with the per-attempt timeout signal, so every await checkpoint in initialiseConnection() automatically observes a closure.


Correctness

Event ordering is correct. When close() is called:

  1. cleanupConnection() runs synchronously, setting closed = true and scheduling emit('end') via process.nextTick
  2. The initialiseConnection() rejection handler runs as a microtask (after close() returns), scheduling emit('connect', err) via process.nextTick

Since both are nextTick-scheduled, and the end nextTick is enqueued before the connect nextTick, end always fires first. The tests correctly validate this with assert.strictEqual(endCount, 1) inside the connect callback.

signal.throwIfAborted() after socket connects (line 2072) is necessary and correct. Adding an abort listener to an already-aborted AbortSignal is a no-op — the listener is never called. This explicit check closes the window where close() is called while connectOnPort is in flight but the socket callback fires after the signal is already aborted.

Removal of this.closed = false is the key fix for the zombie connection bug. Previously, this reset the closed flag mid-connection, allowing a closed connection to complete its login and emit connect successfully.

The finally block in performTransientFailureRetry correctly cleans up both the timer (clearTimeout) and the event listener (removeEventListener) on both success and abort paths, preventing leaks.


Issues

Missing test for retry delay abort. The PR description specifically calls out "Make the transient failure retry delay abortable so that closing during a retry wait does not emit a spurious retry event", but there's no test covering close() being called during the connectionRetryInterval wait. The existing two tests cover pre-socket and mid-login scenarios. A third test triggering a transient error, waiting until the retry delay is active, then calling close() would complete coverage.

closeController is never reset after use. After a successful or failed connection, closeController remains set on the instance. A later call to close() on an established connection will abort the controller with the message "Connection closed before the connection was established." — which is misleading. This is functionally harmless (the aborted signal is only observed by initialiseConnection(), which has already returned), but it could confuse future readers or tooling.

Unused chunks variables in the test. In connection-close-test.ts, both const chunks: Buffer[] = [] arrays are populated in for await loops but never read. Minor cleanup.


Minor Observations

  • this.closeController!.signal (non-null assertion) at lines 2018 and 3419 is safe — both are only reachable from initialiseConnection(), which is only called after closeController is assigned. The invariant holds, but it implicitly relies on the call chain remaining stable.
  • AbortSignal.any() was already in use at line 2067 (pre-existing code), so adding it at line 2018 is consistent with the project's established Node.js version requirements.
  • Routing connection failure through cleanupConnection() in the rejection handler (replacing the manual this.closed = true + two process.nextTick calls) is a good simplification — it centralises end emission and makes the failure path match the success path's cleanup.

Summary

The fix is well-reasoned and correctly addresses all the root causes described in the PR. The implementation is idiomatic and the two tests cover the primary scenarios with a realistic mock server (full PRELOGIN → LOGIN7 → Initial SQL sequence). The main gap is a test for the transient retry abort path. Everything else is minor.

@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.84%. Comparing base (5717f91) to head (33e8902).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1737      +/-   ##
==========================================
+ Coverage   79.82%   79.84%   +0.02%     
==========================================
  Files          90       90              
  Lines        4901     4908       +7     
  Branches      923      923              
==========================================
+ Hits         3912     3919       +7     
  Misses        687      687              
  Partials      302      302              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@claude

claude Bot commented Jun 14, 2026

Copy link
Copy Markdown

Code Review: fix: abort connection establishment when connection is closed

Overview

This PR fixes a real and tricky race condition: calling close() during connect() previously left behind zombie connections or caused duplicate lifecycle events. The root cause was that nothing connected close to the async initialiseConnection process. The fix introduces a closeController: AbortController whose signal is composed with the per-attempt timeout signal via AbortSignal.any(), so every await checkpoint in the connection pipeline observes closure.

The design is sound. A few observations below.


Correctness

Event ordering when close() is called synchronously after connect()

When close() is called before initialiseConnection() has a chance to reject, the sequence is:

  1. close()cleanupConnection() queues nextTick(() => emit('end')) and sets closed = true
  2. initialiseConnection() eventually rejects (signal already aborted)
  3. Rejection handler queues nextTick(() => emit('connect', err)), then calls cleanupConnection() (no-op — closed is already true)

This means end fires before the connect callback in the "close immediately" scenario. The test asserts this (assert.strictEqual(endCount, 1) inside the connect callback), so it appears intentional. Worth a brief comment documenting the ordering contract since it's counterintuitive.

performReRouting() is covered — it calls initialiseConnection() which picks up the same closeController!.signal. No gap there.

performTransientFailureRetry() correctly uses only closeController.signal (not the overall combined timeout signal), which is right because each initialiseConnection() call creates its own fresh timeout.


Missing test coverage

The retry wait abort path is new code:

// src/connection.ts:3441-3455
const closeSignal = this.closeController!.signal;
closeSignal.throwIfAborted();
const onAbort = () => { reject(closeSignal.reason); };
closeSignal.addEventListener('abort', onAbort, { once: true });
...

There's no test that calls close() during a transient-failure retry interval. That scenario is what motivated adding reject to withResolvers and the new onAbort listener. A third test case covering it would complete the coverage and guard against regression.


Minor observations

Non-null assertions on closeController: this.closeController!.signal appears in initialiseConnection and performTransientFailureRetry. These are safe because both are only reachable after connect() sets closeController, but the pattern hides a class of future bugs. An alternative is to throw with a clear message if closeController is undefined, though the non-null assertion is an acceptable choice here.

closeController is never cleared after a successful connection: After initialiseConnection() resolves successfully, this.closeController still holds the now-unused controller. It's not harmful (the signal is not aborted, the controller is not referenced elsewhere), but nulling it out post-connection would make the lifecycle clearer and signal to readers that the field is only meaningful during connection establishment.


Summary

The fix is correct, well-targeted, and handles the described zombie-connection and duplicate-end-event bugs. The AbortSignal.any() composition pattern is idiomatic and clean. The main gap is a missing test for close-during-retry-interval. Everything else is minor. The PR is in good shape.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant