Skip to content

Comments

sending err back to fetch to trigger retry#2136

Open
aswanidutt wants to merge 5 commits intomainfrom
aswanidutt/VAULT-38829-Vault-Agent-retrybackoff-config
Open

sending err back to fetch to trigger retry#2136
aswanidutt wants to merge 5 commits intomainfrom
aswanidutt/VAULT-38829-Vault-Agent-retrybackoff-config

Conversation

@aswanidutt
Copy link
Collaborator

@aswanidutt aswanidutt commented Feb 19, 2026

issue: When a rotating secret that has rotation_period but ttl=0, it should not be treated as a rotating secret. Instead, it should wait and retry exponentially until maxbackoff=5m

What Changed:

  • Modified leaseCheckWait() to return (time.Duration, error) instead of just time.Duration
  • Added ErrVaultTTLZero error constant for TTL=0 detection
  • When a rotating secret returns ttl=0, the error propagates to trigger the existing retry mechanism
  • Added maxJitter constant (50%)
  • Updated VaultReadQuery and VaultWriteQuery to handle and propagate the error
  • Added comprehensive test coverage for TTL=0 scenarios

Fixes: VAULT-38829-https://hashicorp.atlassian.net/browse/VAULT-38829?

before this change: checking to rotate every second

image

after the code change : Exponential backoff: 250ms,500ms,1s,2s, 4s, 8s, 16s, 32s, 64s, the max is 5m

image

var retry bool
var sleep time.Duration

// Check if this is a TTL=0 error and use custom retry logic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following why ttl=0 needs custom retry logic. Seems like whatever calls consul template should be able to set the RetryConfig as needed?

Copy link
Collaborator Author

@aswanidutt aswanidutt Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tvoran when I talked to Steve about Citibank s problem, he said they dont want frequent pings but a exponential backoff retry instead and he agreed to a 5 min maxbackoff with unlimited retries is a good fit considering other customers also. So, the default of consul template`s retry.go - max 12 retries of 1 minute each is not going to work in our case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, and it looks like the desired retry behavior can be configured in the RetryConfig? We should try to avoid one-off behavior if at all possible.

From what I can tell in command/agent/internal/ctmanager/runner_config.go, it looks like the RetryConfig parameters are set from the agent configuration. That's why I'm suggesting treating ttl=0 returned from a static role as an error in consul-template and letting consul-template run its exponential backoff retries as configured.

@aswanidutt aswanidutt requested a review from tvoran February 24, 2026 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants