feat: add opt-in automatic recovery for promise and callback APIs#833
feat: add opt-in automatic recovery for promise and callback APIs#833cressie176 merged 15 commits intoamqp-node:mainfrom
Conversation
Signed-off-by: Ivan <s20026352@gmail.com>
Signed-off-by: Ivan <s20026352@gmail.com>
Signed-off-by: Ivan <s20026352@gmail.com>
Signed-off-by: Ivan <s20026352@gmail.com>
Signed-off-by: Ivan <s20026352@gmail.com>
Signed-off-by: Ivan <s20026352@gmail.com>
|
Thank you for this @ShiriNmi1520, it looks like a very comprehensive piece of work. I've have some time off next week and will review. |
Resolved conflicts in callback_api.js, channel_api.js, and package-lock.json. Updated recovery code to use arrow functions and node: protocol imports to match main branch conventions. Renamed test/recovery.js to recovery.test.js and converted from suite/test to describe/it for the Node.js built-in test runner.
92e53bd to
b26f16a
Compare
|
@ShiriNmi1520, thank you for your patience. It is a really good PR. Just have a lot on at the moment, so taking time to get to it. |
|
@ShiriNmi1520 I've finished my review now. I have a few questions which I suspect will lead to some changes. I also wanted to add that this is some of most well written code I have ever seen. Very well crafted. Thank you. |
Appreciate your review, please allow me some time to implement modifications based on your review :) |
waitForConnect previously resolved with the raw amqplib model, letting callers bypass the recovery wrapper and hold a reference that becomes invalid after a reconnect. RecoveringPromiseModel.waitForConnect now resolves with the wrapper, and RecoveringCallbackModel.waitForConnect invokes its callback with the wrapper. The internal _initialWait no longer carries the model, and connectWithRecovery* helpers drop the now-redundant .then(() => recovering) shim.
When unbinding a closed model, swap the error listener for a no-op instead of removing it. Late errors emitted during socket teardown would otherwise have no listener and crash the application. Mark the reconnect setTimeout as unref'd so a pending retry does not keep the event loop alive and block graceful shutdown.
When _scheduleReconnect runs while _stopped is true, it can only have been triggered from _connect's catch path after close() already rejected initial and pending waiters. Both reject calls were idempotent no-ops at that point, so remove them.
The underlying connection emits update-secret-ok when the broker acknowledges a ConnectionUpdateSecret request. _bindModel now listens for it on the model and re-emits it on the core, and wireCommonEvents forwards it to the outer wrapper, matching the treatment of blocked, unblocked, and connectionError.
Drop the undocumented maxAttempts alias; maxRetries is the only public key and matches the implemented semantics (retries exclude the initial connect). Drop the client parameter from setup. Exposing RecoveringCore leaked private methods into the public surface, and the wrapper that callers already hold provides the same lifecycle hooks.
|
Thank you again for taking the time to review everything so carefully. Each of your points was helpful and made a difference. I've pushed five follow-up commits, with each one addressing a separate change. |
Align event naming with the rest of the surface (`update-secret-ok`, `disconnect`): `connectFailed`, `reconnectScheduled`, `reconnectFailed` become `connect-failed`, `reconnect-scheduled`, `reconnect-failed`. Rename `connectionError` to `error` so existing handlers registered on the underlying connection keep working when wrapped in recovery, and so EventEmitter's default crash-on-unhandled-error semantics are preserved.
|
My bad re unref - adding it broke the recovery tests in Node 18, 20 and 22. Thinking about it, we didn't want to unref the timer between recovery attempts, as if the event loop was drained it would have caused the application to exit. You already clear the timer when close is called, so the user can still exit the application gracefully. |
|
Published as amqplib v1.1.0. Thank you very much |
Summary
This PR adds a generic, opt-in recovery mechanism to
amqplibfor both public APIs:require('amqplib'))require('amqplib/callback_api'))Recovery is enabled only when
connect(..., { recovery: ... })is provided, so the default behaviour remains unchanged.Motivation
Fix #25
What changed
lib/recovery.jschannel_api.jscallback_api.jsREADME.mdfor usage/options/eventstest/recovery.jsRecovery options
initialDelaymaxDelayfactorjittermaxRetries(also acceptsmaxAttempts)setup(model, client[, done])called after each successful connect/reconnectMermaid sequence chart
sequenceDiagram autonumber participant App participant Client as amqplib (recovering client) participant Broker as RabbitMQ App->>Client: connect(url, { recovery }) Client->>Broker: open TCP + AMQP handshake Broker-->>Client: open-ok Client->>Client: run recovery.setup(...) Client-->>App: emit "connect" Note over Broker,Client: normal operation Broker-x Client: socket/connection closed Client-->>App: emit "disconnect" Client-->>App: emit "reconnectScheduled" loop retry with backoff+jitter Client->>Broker: reconnect + handshake alt success Broker-->>Client: open-ok Client->>Client: run recovery.setup(...) Client-->>App: emit "connect" else failed and retries remain Client-->>App: emit "connectFailed" Client-->>App: emit "reconnectScheduled" end end alt retries exhausted Client-->>App: emit "reconnectFailed" endBackward compatibility
recoveryis explicitly configured.Notes / limitations