Skip to content

fix: Use a new context for the domain name check loop.#1007

Merged
hessjcg merged 2 commits intomainfrom
fix-dns-resolver-error
Aug 27, 2025
Merged

fix: Use a new context for the domain name check loop.#1007
hessjcg merged 2 commits intomainfrom
fix-dns-resolver-error

Conversation

@hessjcg
Copy link
Collaborator

@hessjcg hessjcg commented Aug 27, 2025

This fixes a bug in the go connector that was exposed by the Auth Proxy. If
the Auth Proxy is configured with a domain name, and then the DNS record is
changed to point to a new instance. Connections to the new instance will be
terminated every 30 seconds.

This was caused by a bug in how the MonitoredCache set up the background
Context for for the goroutine that checks for changes in the DNS record.

test: Adds test to simulate the way that the Auth Proxy calls the dialer:

Added test TestDialerClosesOldConnectionsAfterDnsChange which will demonstrate
that the dialer behaves correctly when it conencts using a DNS name, and then
the value of the DNS record changes.
The dialer should

  • close existing connections,
  • open new connections to the updated instance.
  • new connections should stay open after the DNS loop completes its first cycle.

This also makes the mock resolver more realistic. If it is called with a context that
is done, the resolver will return an error.

fix: Create a new context for the domain name check loop.

When a monitoredCache runs a goroutine to check for changes in the domain
name, it needs its own context. The proxy (and other library users) call
dialer.Dial(ctx...), they pass in a context with a 30 second timeout.

The background process that checks for domains should not use this, it should
create its own context with no timeout.

This fixes a customer-discovered bug in the proxy.

@hessjcg hessjcg requested a review from a team as a code owner August 27, 2025 17:06
@hessjcg hessjcg force-pushed the fix-dns-resolver-error branch from 6aea1dc to 008b416 Compare August 27, 2025 17:29
Added test TestDialerClosesOldConnectionsAfterDnsChange which will demonstrate
that the dialer behaves correctly when it conencts using a DNS name, and then 
the value of the DNS record changes.
The dialer should
- close existing connections, 
- open new connections to the updated instance. 
- new connections should stay open after the DNS loop completes its first cycle. 

This also makes the mock resolver more realistic. If it is called with a context that
is done, the resolver will return an error.
When a monitoredCache runs a goroutine to check for changes in the domain
name, it needs its own context. The proxy (and other library users) call
dialer.Dial(ctx...), they pass in a context with a 30 second timeout. 

The background process that checks for domains should not use this, it should
create its own context with no timeout. 

This fixes a customer-discovered bug in the proxy.
@hessjcg hessjcg force-pushed the fix-dns-resolver-error branch from 008b416 to ddcf0dd Compare August 27, 2025 19:24
@hessjcg hessjcg merged commit 908d0cf into main Aug 27, 2025
15 checks passed
@hessjcg hessjcg deleted the fix-dns-resolver-error branch August 27, 2025 20:52
hessjcg added a commit to GoogleCloudPlatform/cloud-sql-proxy that referenced this pull request Aug 27, 2025
…y fix for DNS name configurations. (#2487)

This incorporates a high-priority when the proxy is configured using the go connector.

See GoogleCloudPlatform/cloud-sql-go-connector#1007
hessjcg added a commit to GoogleCloudPlatform/cloud-sql-proxy-operator that referenced this pull request Aug 28, 2025
Operator will use the proxy that includes the fix to the check DNS name
changed loop. 

See GoogleCloudPlatform/cloud-sql-go-connector#1007
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants