STF-322: Bounded transport-failure retry in WebServiceClient#619
STF-322: Bounded transport-failure retry in WebServiceClient#619
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a transport-failure retry mechanism to the WebServiceClient, defaulting to one retry for specific IOExceptions such as connection resets, broken pipes, and connect timeouts. The implementation includes a new maxRetries configuration in the WebServiceClient.Builder, updated documentation in the README and CHANGELOG, and comprehensive test coverage for various retry scenarios. Feedback was provided regarding a potential integer overflow in the sendWithRetry method; if maxRetries is set to Integer.MAX_VALUE, the loop condition fails and results in a NullPointerException because lastException remains null. A refactor was suggested to use a while loop to avoid this overflow and simplify the logic.
| IOException lastException = null; | ||
| int attempts = maxRetries + 1; | ||
| for (int i = 0; i < attempts; i++) { | ||
| try { | ||
| return httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream()); | ||
| } catch (IOException e) { | ||
| if (!isRetriableTransportFailure(e) || i == attempts - 1) { | ||
| throw e; | ||
| } | ||
| lastException = e; | ||
| } | ||
| } | ||
| // Unreachable: loop either returns or throws. | ||
| throw lastException; |
There was a problem hiding this comment.
The current implementation of sendWithRetry has a potential NullPointerException due to integer overflow and contains redundant logic.
If maxRetries is set to Integer.MAX_VALUE, the expression maxRetries + 1 overflows to Integer.MIN_VALUE. Consequently, the loop condition i < attempts (i.e., 0 < -2147483648) evaluates to false, the loop is skipped entirely, and the method proceeds to throw lastException;. Since lastException is initialized to null, a NullPointerException is thrown instead of the intended transport error.
Additionally, the lastException variable is redundant because the loop is designed to either return a successful response or throw the current exception e on the final attempt or upon encountering a non-retriable failure. Refactoring to a while loop avoids the overflow risk and simplifies the logic.
private HttpResponse<InputStream> sendWithRetry(HttpRequest request)
throws IOException, InterruptedException {
int attempts = 0;
while (true) {
try {
return httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream());
} catch (IOException e) {
if (!isRetriableTransportFailure(e) || attempts >= maxRetries) {
throw e;
}
attempts++;
}
}
}|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements a configurable retry mechanism for transport-level failures in the WebServiceClient. It introduces a maxRetries setting in the WebServiceClient.Builder, defaulting to one retry for issues like connection resets, broken pipes, and connection timeouts, while specifically excluding request-phase timeouts and HTTP 4xx/5xx responses. The changes include updates to the documentation, logic to handle retries within the client, and a comprehensive suite of tests to verify the new behavior and ensure correct thread interruption handling. I have no feedback to provide.
|
@gemini-code-assist review |
There was a problem hiding this comment.
Code Review
This pull request introduces a transport-failure retry mechanism to the WebServiceClient, defaulting to one retry for transient errors such as connection resets or broken pipes. It adds a maxRetries(int) configuration to the Builder and includes comprehensive documentation and tests. Review feedback identifies a discrepancy between the broad implementation of the retry predicate and the narrower description in the PR, as well as an inconsistency regarding the handling of HttpConnectTimeoutException.
I am having trouble creating individual review comments. Click here to see my feedback.
src/main/java/com/maxmind/minfraud/WebServiceClient.java (401-422)
There is a discrepancy between the PR description and the implementation of this predicate. The description mentions a "narrow" predicate that "walks the cause chain" to match specific errors like Connection reset or Broken pipe. However, the current implementation is broad, retrying any IOException that isn't a timeout or interruption.
If the intent was to limit retries to specific transport failures as described, the implementation should be updated to inspect the exception's cause or message. If the broad approach is intentional (which is often acceptable for a single retry on a stale connection), the PR description should be updated to avoid confusion.
src/main/java/com/maxmind/minfraud/WebServiceClient.java (409-411)
The PR summary states that HttpConnectTimeoutException is retried, but this implementation returns false for all HttpTimeoutException instances, which includes HttpConnectTimeoutException. The README and CHANGELOG also state that it is excluded.
If the intention was indeed to retry connection timeouts while excluding request-phase timeouts (as suggested by the PR summary), you should check for HttpConnectTimeoutException specifically before the general HttpTimeoutException check.
if (e instanceof java.net.http.HttpConnectTimeoutException) {
return true;
}
if (e instanceof HttpTimeoutException) {
return false;
}When the JDK HttpClient pool reuses an idle connection that an intermediary (load balancer, proxy, NAT) has silently closed, the next send() fails with "Connection reset", "Broken pipe", or related transport errors. A single retry recovers transparently without exposing this race to callers. The default JDK keep-alive timeout exceeds many intermediaries' idle timeout, making this mismatch the common case. The retry predicate is permissive by exclusion: any IOException from httpClient.send() is retried EXCEPT HttpTimeoutException (covering both request-phase and connect-phase timeouts, since HttpConnectTimeoutException is a subclass) and InterruptedIOException. Both timeouts are customer-set budgets that retrying would silently extend; InterruptedIOException is a user-cancellation signal. HTTP 4xx and 5xx responses are surfaced as HttpException (and subclasses) from a separate code path -- they come back as HttpResponse objects rather than IOExceptions, so the predicate is structurally unable to retry them. Customers can opt out via .maxRetries(0). Default is 1 (one retry, two total attempts). The interrupt flag is restored before rewrapping InterruptedException, and a pre-set interrupt short-circuits the predicate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing catch (InterruptedException) blocks in reportTransaction() and responseFor() rewrap into MinFraudException without restoring the thread's interrupt status, silently swallowing the cancellation signal. Per Java's interruption protocol, code that catches InterruptedException without rethrowing it should re-set the flag so callers up the stack can observe the cancellation. This is an independent bug fix bundled into the STF-322 retry work because the retry feature exposes the path more often. Per project commit hygiene it lands as a separate commit so it can be cherry-picked or reverted on its own. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cover all 8 scenarios: connection-reset retry on score and reportTransaction endpoints, no retry on HttpTimeoutException, retry on connect timeout (deterministic via a closed local ServerSocket), no retry on 4xx/5xx, .maxRetries(0) opt-out, and pre-interrupt short-circuit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
WebServiceClient.Builder.maxRetries(int)(defaults to 1) so a single transient transport failure (Connection reset,Broken pipe, EOF, closed channel, etc.) is retried transparently.IOExceptionfromhttpClient.send()is retried EXCEPTHttpTimeoutException(which covers both request-phase and connect-phase timeouts, sinceHttpConnectTimeoutExceptionextends it) andInterruptedIOException. Both timeouts are customer-set budgets that retrying would silently extend;InterruptedIOExceptionis a user-cancellation signal we honor. HTTP 4xx/5xx come back asHttpResponseobjects (notIOExceptions) and are surfaced through the existing exception hierarchy, so the predicate is structurally unable to retry them.InterruptedExceptionrewrap paths. Updates README and CHANGELOG; opt out via.maxRetries(0).Fixes STF-322. Parallel PR for
geoip2-javauses a byte-identical predicate (maxmind/GeoIP2-java#693).Test plan
mvn checkstyle:checkcleanmvn verify— 235/235 pass (24 existing + 8 newWebServiceClientTestcases)jdk.httpclient.keepalive.timeoutas the complementary workaround🤖 Generated with Claude Code