Skip to content

feat: add opt-in automatic recovery for promise and callback APIs#833

Merged
cressie176 merged 15 commits intoamqp-node:mainfrom
ShiriNmi1520:recoveryConnection
May 9, 2026
Merged

feat: add opt-in automatic recovery for promise and callback APIs#833
cressie176 merged 15 commits intoamqp-node:mainfrom
ShiriNmi1520:recoveryConnection

Conversation

@ShiriNmi1520
Copy link
Copy Markdown
Contributor

Summary

This PR adds a generic, opt-in recovery mechanism to amqplib for both public APIs:

  • Promise API (require('amqplib'))
  • Callback API (require('amqplib/callback_api'))

Recovery is enabled only when connect(..., { recovery: ... }) is provided, so the default behaviour remains unchanged.

Motivation

Fix #25

What changed

  • Added shared recovery engine: lib/recovery.js
  • Wired recovery into:
    • channel_api.js
    • callback_api.js
  • Added docs in README.md for usage/options/events
  • Added unit tests in test/recovery.js

Recovery options

  • initialDelay
  • maxDelay
  • factor
  • jitter
  • maxRetries (also accepts maxAttempts)
  • setup(model, client[, done]) called after each successful connect/reconnect

Mermaid sequence chart

sequenceDiagram
    autonumber
    participant App
    participant Client as amqplib (recovering client)
    participant Broker as RabbitMQ

    App->>Client: connect(url, { recovery })
    Client->>Broker: open TCP + AMQP handshake
    Broker-->>Client: open-ok
    Client->>Client: run recovery.setup(...)
    Client-->>App: emit "connect"

    Note over Broker,Client: normal operation

    Broker-x Client: socket/connection closed
    Client-->>App: emit "disconnect"
    Client-->>App: emit "reconnectScheduled"

    loop retry with backoff+jitter
        Client->>Broker: reconnect + handshake
        alt success
            Broker-->>Client: open-ok
            Client->>Client: run recovery.setup(...)
            Client-->>App: emit "connect"
        else failed and retries remain
            Client-->>App: emit "connectFailed"
            Client-->>App: emit "reconnectScheduled"
        end
    end

    alt retries exhausted
        Client-->>App: emit "reconnectFailed"
    end
Loading

Backward compatibility

  • No behavior change unless recovery is explicitly configured.

Notes / limitations

  • This provides connection-level recovery and setup replay.
  • It does not guarantee transparent replay of in-flight channel operations across disconnects.

Signed-off-by: Ivan <s20026352@gmail.com>
Signed-off-by: Ivan <s20026352@gmail.com>
Signed-off-by: Ivan <s20026352@gmail.com>
Signed-off-by: Ivan <s20026352@gmail.com>
Signed-off-by: Ivan <s20026352@gmail.com>
@ShiriNmi1520 ShiriNmi1520 marked this pull request as draft February 13, 2026 07:59
@ShiriNmi1520 ShiriNmi1520 marked this pull request as ready for review February 13, 2026 08:03
Signed-off-by: Ivan <s20026352@gmail.com>
@cressie176
Copy link
Copy Markdown
Collaborator

Thank you for this @ShiriNmi1520, it looks like a very comprehensive piece of work. I've have some time off next week and will review.

Resolved conflicts in callback_api.js, channel_api.js, and
package-lock.json. Updated recovery code to use arrow functions and
node: protocol imports to match main branch conventions. Renamed
test/recovery.js to recovery.test.js and converted from suite/test
to describe/it for the Node.js built-in test runner.
Comment thread lib/recovery.js
@cressie176
Copy link
Copy Markdown
Collaborator

@ShiriNmi1520, thank you for your patience. It is a really good PR. Just have a lot on at the moment, so taking time to get to it.

Comment thread lib/recovery.js Outdated
Comment thread lib/recovery.js
Comment thread lib/recovery.js
Comment thread lib/recovery.js
Comment thread lib/recovery.js Outdated
Comment thread lib/recovery.js
Comment thread lib/recovery.js
Comment thread lib/recovery.js Outdated
@cressie176
Copy link
Copy Markdown
Collaborator

@ShiriNmi1520 I've finished my review now. I have a few questions which I suspect will lead to some changes. I also wanted to add that this is some of most well written code I have ever seen. Very well crafted. Thank you.

@ShiriNmi1520
Copy link
Copy Markdown
Contributor Author

@ShiriNmi1520 I've finished my review now. I have a few questions which I suspect will lead to some changes. I also wanted to add that this is some of most well written code I have ever seen. Very well crafted. Thank you.

Appreciate your review, please allow me some time to implement modifications based on your review :)

waitForConnect previously resolved with the raw amqplib model, letting
callers bypass the recovery wrapper and hold a reference that becomes
invalid after a reconnect.

RecoveringPromiseModel.waitForConnect now resolves with the wrapper,
and RecoveringCallbackModel.waitForConnect invokes its callback with
the wrapper. The internal _initialWait no longer carries the model,
and connectWithRecovery* helpers drop the now-redundant
.then(() => recovering) shim.
When unbinding a closed model, swap the error listener for a no-op
instead of removing it. Late errors emitted during socket teardown
would otherwise have no listener and crash the application.

Mark the reconnect setTimeout as unref'd so a pending retry does not
keep the event loop alive and block graceful shutdown.
When _scheduleReconnect runs while _stopped is true, it can only have
been triggered from _connect's catch path after close() already rejected
initial and pending waiters. Both reject calls were idempotent no-ops at
that point, so remove them.
The underlying connection emits update-secret-ok when the broker
acknowledges a ConnectionUpdateSecret request. _bindModel now listens
for it on the model and re-emits it on the core, and wireCommonEvents
forwards it to the outer wrapper, matching the treatment of blocked,
unblocked, and connectionError.
Drop the undocumented maxAttempts alias; maxRetries is the only public
key and matches the implemented semantics (retries exclude the initial
connect).

Drop the client parameter from setup. Exposing RecoveringCore leaked
private methods into the public surface, and the wrapper that callers
already hold provides the same lifecycle hooks.
@ShiriNmi1520
Copy link
Copy Markdown
Contributor Author

Thank you again for taking the time to review everything so carefully. Each of your points was helpful and made a difference. I've pushed five follow-up commits, with each one addressing a separate change.

Comment thread lib/recovery.js Outdated
Comment thread lib/recovery.js
Align event naming with the rest of the surface (`update-secret-ok`,
`disconnect`): `connectFailed`, `reconnectScheduled`, `reconnectFailed`
become `connect-failed`, `reconnect-scheduled`, `reconnect-failed`.

Rename `connectionError` to `error` so existing handlers registered on
the underlying connection keep working when wrapped in recovery, and so
EventEmitter's default crash-on-unhandled-error semantics are preserved.
@cressie176
Copy link
Copy Markdown
Collaborator

My bad re unref - adding it broke the recovery tests in Node 18, 20 and 22. Thinking about it, we didn't want to unref the timer between recovery attempts, as if the event loop was drained it would have caused the application to exit. You already clear the timer when close is called, so the user can still exit the application gracefully.

@cressie176 cressie176 merged commit 36cb910 into amqp-node:main May 9, 2026
4 checks passed
@cressie176
Copy link
Copy Markdown
Collaborator

Published as amqplib v1.1.0. Thank you very much

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Auto-reconnection

2 participants