
Don't crash the server when there's a listen error#587

Merged
jbr merged 2 commits into http-rs:master from jbr:dont-crash-on-listen-errors
Jun 16, 2020

Conversation

@jbr jbr (Member) commented Jun 11, 2020

This change is an attempt to address #577, and also seems like correct behavior for a server: regardless of what goes wrong with any individual request, we should attempt to continue responding to subsequent requests. Previously, the use of `?` returned from the `listen` function's scope, which shut down the server.
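The difference can be sketched with a toy loop (hypothetical helper names, not tide's actual code): using `?` on each accept result propagates the first error out of the loop and the server stops, while matching and continuing keeps serving later connections.

```rust
use std::io;

// Toy stand-in for an accept loop: each item is one incoming-connection result.
// Mirrors the old behavior: `?` returns from this function on the first error,
// so the "server" shuts down.
fn serve_with_question_mark(incoming: Vec<io::Result<&str>>) -> io::Result<usize> {
    let mut handled = 0;
    for stream in incoming {
        let _stream = stream?; // first Err aborts the whole loop
        handled += 1;
    }
    Ok(handled)
}

// Mirrors the fix: a failed accept is skipped and every subsequent
// "connection" is still handled.
fn serve_with_continue(incoming: Vec<io::Result<&str>>) -> usize {
    let mut handled = 0;
    for stream in incoming {
        let _stream = match stream {
            Ok(s) => s,
            Err(_) => continue, // log and keep accepting
        };
        handled += 1;
    }
    handled
}

fn main() {
    let incoming = || {
        vec![
            Ok("conn-1"),
            Err(io::Error::new(io::ErrorKind::Other, "accept failed")),
            Ok("conn-2"),
        ]
    };
    assert!(serve_with_question_mark(incoming()).is_err()); // stopped early
    assert_eq!(serve_with_continue(incoming()), 2); // kept going
}
```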

Extracting the inner functions was mostly for clarity; they will be further refactored/extracted in future PRs.

closes #577

@jbr jbr requested a review from yoshuawuyts June 11, 2020 23:43
@yoshuawuyts yoshuawuyts (Member) left a comment


LGTM

@yoshuawuyts (Member)
I think we should merge this, though the more structural fix seems to be #386 by @tailhook, which allows setting more fine-grained limits. Happy to merge this in the interim!

@tailhook

As far as I can see, this will peg a single CPU at 100% usage if ENFILE or EMFILE is encountered (depending on your use case, that may be worse than simply crashing, because after a crash a supervisor would restart the process). I've tried to explain this in the book: https://book.async.rs/patterns/accept-loop.html#handling-errors. Let me know if the book is not clear enough on this issue.

Sorry for not having had enough time to finish #386.
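The distinction at stake can be sketched as follows (a minimal sketch, not tide's actual implementation; the errno values are the Linux ones): transient errors are tied to one connection and are safe to skip, while EMFILE/ENFILE signal fd exhaustion and call for backing off before the next accept.

```rust
use std::io;

// Linux errno values (assumption: Linux; other platforms may differ).
const ENFILE: i32 = 23; // system-wide file table full
const EMFILE: i32 = 24; // per-process fd limit reached

/// Errors scoped to one connection: safe to skip and accept the next one.
fn is_transient_error(e: &io::Error) -> bool {
    use io::ErrorKind::*;
    matches!(e.kind(), ConnectionRefused | ConnectionAborted | ConnectionReset)
}

/// Resource exhaustion: retrying immediately would spin the CPU at 100%,
/// so the accept loop should sleep before trying again.
fn is_fd_exhaustion(e: &io::Error) -> bool {
    matches!(e.raw_os_error(), Some(ENFILE) | Some(EMFILE))
}

fn main() {
    assert!(is_fd_exhaustion(&io::Error::from_raw_os_error(EMFILE)));
    assert!(!is_transient_error(&io::Error::from_raw_os_error(EMFILE)));
    assert!(is_transient_error(&io::Error::from(io::ErrorKind::ConnectionReset)));
}
```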

```rust
let stream = match stream {
    Err(ref e) if is_transient_error(e) => continue,
    Err(error) => {
        let delay = std::time::Duration::from_millis(500);
```
Member
This seems kind of large? I guess it should be pretty unusual so maybe it's fine. Ideally we'd have a dynamically increasing backoff here.

Member Author

I didn't have a good sense of an appropriate number here, so I used the same number as in the book section cited above.

@tailhook

Well, it's 500ms in the book because it's reasonably fine to write a log message every 500ms. Always skipping the log message would make the issue a nightmare to debug, and sleeping less while logging only rarely would substantially complicate the code (which is fine for production, but bad for a book).

But at the end of the day, you should not expect code to hit this timeout often. I always consider this an infrastructure issue: either bump the file descriptor limit if you have enough memory, lower the listen queue, or put a load balancer/queue in front of the process to get rid of the error. So a large timeout here is fine. Also, TCP delays can be larger than 500ms, so any client should be fine waiting that long.
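The "dynamically increasing backoff" suggested above could look like doubling the delay after each consecutive failure up to a cap, resetting to the base once an accept succeeds. A sketch with illustrative bounds (the values are made up, not from this PR):

```rust
use std::time::Duration;

// Illustrative bounds, not values from this PR.
const BASE: Duration = Duration::from_millis(50);
const MAX: Duration = Duration::from_millis(2000);

/// Doubles the retry delay after each consecutive accept failure, capped at
/// `MAX`. The caller would reset the delay to `BASE` after a successful accept.
fn next_delay(current: Duration) -> Duration {
    (current * 2).min(MAX)
}

fn main() {
    let mut d = BASE;
    let mut seen = Vec::new();
    for _ in 0..7 {
        seen.push(d.as_millis());
        d = next_delay(d);
    }
    // Delay grows geometrically, then saturates at the cap.
    assert_eq!(seen, [50, 100, 200, 400, 800, 1600, 2000]);
}
```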

@jbr jbr (Member Author) commented Jun 16, 2020

Merging this based on @yoshuawuyts' comment above, especially since this is a short-term fix for multiple reasons:

  • When/if async-listen is Clone (Derive Clone for ByteStream and Token tailhook/async-listen#3), tide will be able to use that instead of this code
  • When tide supports multiple simultaneous listeners (a near-term goal), we'll need a totally different approach that pools open fds across multiple listeners of different types, and that will require rewriting this yet again

@jbr jbr merged commit d2f56d3 into http-rs:master Jun 16, 2020


Development

Successfully merging this pull request may close this issue: "Too many open files" error when benchmarking.

4 participants