
Add optional retry logic to RabbitMQ publisher for transient connection failures#295

Draft
Copilot wants to merge 5 commits into main from copilot/add-amqpstorm-wrapper

Conversation

Contributor

Copilot AI commented Oct 26, 2025

Workers crash when publishing status after hitting "Connection dead, no heartbeat or data received in >= 60s", terminating game servers unnecessarily on transient network issues.

Changes

Enhanced RabbitPublisher with optional retry

  • Added retry_config parameter to __init__() that defaults to None (no retry by default)
  • When retry_config is provided, publish() operations retry with exponential backoff
  • When retry_config is None (default), publish fails immediately without retry to prevent duplicate messages
  • Added _ensure_channel() to validate channel health before publish operations
  • Leverages existing is_transient_rmq_error filter from retry module
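The behavior described above can be sketched as a small retry loop. This is an illustrative sketch only, not the actual implementation: the `RetryConfig` field names are taken from the usage example below, and `publish_with_retry` is a hypothetical helper standing in for the logic inside `publish()`.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class RetryConfig:
    # Field names mirror the usage example; the real definitions
    # live in libs.python.retry.
    max_attempts: int = 3
    initial_delay: float = 0.5
    max_delay: float = 5.0
    exponential_base: float = 2.0
    exception_filter: Optional[Callable[[Exception], bool]] = None


def publish_with_retry(publish_once: Callable[[], Any],
                       config: Optional[RetryConfig]) -> None:
    """Run publish_once, retrying with exponential backoff when config is set."""
    if config is None:
        publish_once()  # default: fail fast so no duplicate messages are sent
        return
    delay = config.initial_delay
    for attempt in range(1, config.max_attempts + 1):
        try:
            publish_once()
            return
        except Exception as exc:
            # Only retry errors the filter classifies as transient.
            transient = (config.exception_filter is None
                         or config.exception_filter(exc))
            if attempt == config.max_attempts or not transient:
                raise
            time.sleep(min(delay, config.max_delay))
            delay *= config.exponential_base
```

With `config=None` a single failure propagates immediately; with a config, only transient errors are retried and the last attempt re-raises.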

Added ResilientConnection wrapper (optional for future use)

  • Validates connection health and auto-reconnects on stale connections
  • Invalidates failed connections to force clean reconnection on retry
  • Supports context manager pattern for resource cleanup
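A minimal sketch of what such a wrapper might look like. The `connect_fn` factory and the `is_open` attribute are assumptions for illustration; the real class wraps an AMQPStorm connection.

```python
from typing import Any, Callable


class ResilientConnection:
    """Illustrative wrapper: validates connection health, reconnects when stale."""

    def __init__(self, connect_fn: Callable[[], Any]):
        self._connect_fn = connect_fn  # factory that opens a fresh connection
        self._conn = None

    def get(self) -> Any:
        # Reconnect if there is no cached connection or the cached one is stale.
        if self._conn is None or not getattr(self._conn, "is_open", False):
            self._conn = self._connect_fn()
        return self._conn

    def invalidate(self) -> None:
        # Drop a failed connection so the next get() reconnects cleanly.
        self._conn = None

    def __enter__(self) -> Any:
        return self.get()

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is not None:
            self.invalidate()  # force a clean reconnect after a failure
        return False  # never swallow the exception
```

Invalidating on error rather than reconnecting inline keeps the failure path cheap: the next caller pays the reconnect cost only when the connection is actually needed again.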

Usage

Default behavior (no retry) - prevents duplicate messages:

# Default: Fails immediately on connection timeout (no retry)
publisher = RabbitPublisher(connection, binding_configs)
publisher.publish("status: RUNNING")

Opt-in retry behavior - explicitly enable when duplicate message handling is in place:

from libs.python.retry import RetryConfig, is_transient_rmq_error

# Explicitly enable retry with configuration
retry_config = RetryConfig(
    max_attempts=3,
    initial_delay=0.5,
    max_delay=5.0,
    exponential_base=2.0,
    exception_filter=is_transient_rmq_error,
)
publisher = RabbitPublisher(connection, binding_configs, retry_config=retry_config)
publisher.publish("status: RUNNING")  # Retries up to 3 times on transient failures
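Assuming the usual reading of these fields, where the n-th backoff waits `initial_delay * exponential_base**n` capped at `max_delay` (an assumption about the retry module, not confirmed by this PR), the schedule for the config above can be computed like this:

```python
def backoff_delays(max_attempts: int, initial_delay: float,
                   max_delay: float, exponential_base: float) -> list[float]:
    # Delay applied after each failed attempt; no delay follows the last attempt.
    return [min(initial_delay * exponential_base ** n, max_delay)
            for n in range(max_attempts - 1)]


# For max_attempts=3, initial_delay=0.5, max_delay=5.0, exponential_base=2.0
# the waits between the three attempts are 0.5s and then 1.0s.
```

The `max_delay` cap only matters for longer schedules; with three attempts the delays never reach it.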

Testing

  • 11 tests for ResilientConnection (using pytest)
  • 8 tests for enhanced RabbitPublisher retry logic (using pytest)
  • All 19 tests passing
  • Includes test for default no-retry behavior
Original prompt

Add a lightweight wrapper for the AMQPStorm connection to handle timeouts/retries

The worker successfully started the CS2 dedicated server process but immediately encountered a fatal AMQP connection error (Connection dead, no heartbeat or data received in >= 60s) while attempting to publish its "RUNNING" status, leading to the immediate shutdown of the game server service.



Copilot AI and others added 3 commits October 26, 2025 23:39
Co-authored-by: QMasterMoo <3915399+QMasterMoo@users.noreply.github.com>
Co-authored-by: QMasterMoo <3915399+QMasterMoo@users.noreply.github.com>
Co-authored-by: QMasterMoo <3915399+QMasterMoo@users.noreply.github.com>
Copilot AI changed the title [WIP] Add wrapper for AMQPStorm connection to handle timeouts and retries Add retry logic to RabbitMQ publisher for transient connection failures Oct 26, 2025
Copilot AI requested a review from QMasterMoo October 26, 2025 23:47
…onal

Co-authored-by: QMasterMoo <3915399+QMasterMoo@users.noreply.github.com>
Copilot AI changed the title Add retry logic to RabbitMQ publisher for transient connection failures Add optional retry logic to RabbitMQ publisher for transient connection failures Oct 28, 2025
Copilot AI requested a review from QMasterMoo October 28, 2025 00:00
