RFC: Streaming response bodies #158

@julik

I started work on streaming response bodies in https://github.com/toland/patron/tree/response-body-callback, using a Ruby callback to receive the body data as it arrives. However, Patron's current API is not a good fit for this, because the Response object is only returned once curl_easy_perform has returned. In practice, this leads to a pattern like this:

sess = Patron::Session.new
sess.on_body do |body_chunk_binary_str|
  ...
  # This block will execute first
end
resp = sess.get('/endpoint') # This will execute second
resp.status #=> 401, we were actually unauthorized 😭
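To make the ordering problem concrete, here is a small pure-Ruby simulation. `FakeSession` and `FakeResponse` are hypothetical stand-ins, not Patron classes: the `on_body` callback fires while the simulated transfer runs, and the response (and its status) is only constructed after the transfer is over, mirroring the pattern above.

```ruby
# Hypothetical stand-ins for the current callback-based behavior.
FakeResponse = Struct.new(:status)

class FakeSession
  def initialize(status, chunks)
    @status = status
    @chunks = chunks
    @on_body = nil
  end

  def on_body(&blk)
    @on_body = blk
  end

  def get(_path)
    # Plays the role of curl_easy_perform: every chunk goes to the callback...
    @chunks.each { |chunk| @on_body&.call(chunk) }
    # ...and only afterwards does the caller get to see the status.
    FakeResponse.new(@status)
  end
end

events = []
sess = FakeSession.new(401, ["Unauthorized"])
sess.on_body { |chunk| events << [:body, chunk] }
resp = sess.get('/endpoint')
events << [:status, resp.status]
# events == [[:body, "Unauthorized"], [:status, 401]]
```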

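Because the status code only becomes available after the body has been delivered, implementing streaming response bodies on top of this API is awkward: by the time we learn the request failed, the error body has already gone through the streaming callback. I can envision the following APIs instead: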

sess = Patron::Session.new
resp = sess.streaming_get('/get') # This must return a StreamingResponse or similar
begin
  if resp.status != 200
    error_body = resp.receive_body_and_close
     raise "Invalid status #{resp.status} - #{error_body}"
  end
  resp.each do |body_chunk|
    # do something with a body chunk
  end
ensure
  resp.close # because we are still inside `curl_easy_perform` at this stage
end
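One way a `StreamingResponse` like the one above could work is to run the transfer on a worker thread and hand events to the caller through a bounded queue. The following is only a pure-Ruby sketch under that assumption: `StreamingResponse`, `receive_body_and_close` and the lambdas passed to the transfer block are hypothetical, and the transfer block stands in for `curl_easy_perform`. The constructor blocks only until the status is known, `each` yields chunks as they arrive, and `close` kills the worker, which is where a real implementation would abort the curl transfer.

```ruby
# Hypothetical sketch: StreamingResponse is not a Patron class. A background
# thread stands in for curl_easy_perform and hands events to the caller.
class StreamingResponse
  include Enumerable

  attr_reader :status

  # The block plays the role of the libcurl transfer. It receives one lambda
  # to report the status and one to deliver body chunks. The sketch assumes
  # the status is always reported before any chunk.
  def initialize(&transfer)
    # Capacity 1 gives back-pressure: the producer blocks until the consumer
    # has taken the previous event, so chunks do not pile up in Ruby.
    @events = SizedQueue.new(1)
    @finished = false
    @thread = Thread.new do
      begin
        transfer.call(->(s) { @events << [:status, s] },
                      ->(chunk) { @events << [:chunk, chunk] })
        @events << [:done, nil]
      rescue StandardError => e
        @events << [:error, e]
      end
    end
    # Wait only until the status is known, not until the body is complete.
    kind, value = @events.pop
    raise value if kind == :error
    @status = value if kind == :status
    @finished = (kind == :done)
  end

  # Yields body chunks as they arrive from the worker thread.
  def each
    return enum_for(:each) unless block_given?
    until @finished
      kind, value = @events.pop
      case kind
      when :chunk
        yield value
      when :done
        @finished = true
      when :error
        @finished = true
        raise value
      end
    end
    self
  end

  # Reads the remaining body into one string, then closes.
  def receive_body_and_close
    to_a.join
  ensure
    close
  end

  # Stops the worker if it is still running (the real thing would abort
  # curl_easy_perform here) and waits for it to exit.
  def close
    @finished = true
    @thread.kill
    @thread.join
    nil
  end
end

resp = StreamingResponse.new do |report_status, deliver_chunk|
  report_status.call(200)
  %w[foo bar baz].each { |chunk| deliver_chunk.call(chunk) }
end
received = []
begin
  if resp.status != 200
    error_body = resp.receive_body_and_close
    raise "Invalid status #{resp.status} - #{error_body}"
  end
  resp.each { |chunk| received << chunk }
ensure
  resp.close
end
```

Because `close` kills the worker, a caller can also stop reading a never-ending response early (for example with `resp.first`) and still release it.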

We could stretch the streaming paradigm even further and use a pattern like the one in https://github.com/Tonkpils/celluloid-eventsource, which is very similar to EventMachine and is also known as "callback spaghetti":

sess = Patron::AsyncSession.new
sess.on_status_and_headers do |status, header_hash|
  if status != 200
    sess.abort
  end
end
sess.on_body_chunk do |chunk|
  @event_bus.deliver(chunk)
end
# This will actually block the current thread until `curl_easy_perform` returns
sess.perform!
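The callback style can be modelled in a few lines of plain Ruby to show the control flow. This is a sketch only: `SketchAsyncSession` is hypothetical, and the "transfer" is a canned status, header hash and list of chunks rather than a real request. `perform!` blocks until everything is delivered or a callback calls `abort`, and it returns whether the body was delivered in full.

```ruby
# Hypothetical sketch of the callback-style API; nothing here exists in Patron.
class SketchAsyncSession
  def initialize(status, headers, chunks)
    @status = status
    @headers = headers
    @chunks = chunks
    @aborted = false
    @on_status_and_headers = nil
    @on_body_chunk = nil
  end

  def on_status_and_headers(&blk)
    @on_status_and_headers = blk
  end

  def on_body_chunk(&blk)
    @on_body_chunk = blk
  end

  # Callbacks call this to ask for the transfer to stop.
  def abort
    @aborted = true
  end

  # Stands in for curl_easy_perform: blocks until the transfer finishes or a
  # callback aborts it. Returns true only if every chunk was delivered.
  def perform!
    @on_status_and_headers&.call(@status, @headers)
    return false if @aborted
    @chunks.each do |chunk|
      @on_body_chunk&.call(chunk)
      return false if @aborted
    end
    true
  end
end

delivered = []
sess = SketchAsyncSession.new(401, { "Content-Type" => "text/plain" }, ["nope"])
sess.on_status_and_headers { |status, _headers| sess.abort if status != 200 }
sess.on_body_chunk { |chunk| delivered << chunk }
completed = sess.perform!
# completed is false and delivered is empty: the body callback never ran.
```

In the C extension the abort flag would presumably be checked inside the curl write callback; libcurl stops the transfer with `CURLE_WRITE_ERROR` when that callback returns a byte count different from the one it was given.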

Yet another option is going for a minimalist API akin to what Excon uses, which would reduce the calling code to this:

sess = Patron::Session.new
sess.streaming_get('/url') do |status, header_hash, body_chunk|
  # Here you can abort or continue. The same status and header_hash
  # get yielded with every chunk, and to abort we need to implement
  # something like a...
  throw :patron_abort
  # or alternatively just a
  break
  # which would make `streaming_get` return `nil` right away, unwinding
  # through the calling C code.
end

That is only possible if we remove the Response objects from the picture, but that is probably not a big loss, because code calling long-polled endpoints is going to look very different from code that reads responses eagerly anyway.
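Both exits in the Excon-style API unwind the stack rather than returning normally, so the cleanup has to live in an `ensure`. The pure-Ruby sketch below shows that `throw :patron_abort` (caught by the method) and `break` (which makes the call return `nil`) both reach the cleanup step. `sketch_streaming_get` and `cleanup_log` are hypothetical: the canned chunks stand in for the transfer, and the log entry marks where the real code would release the curl handle. On the C side, the equivalent would presumably need something like `rb_ensure` around the code that yields.

```ruby
# Hypothetical sketch of the minimalist, Excon-style API.
def sketch_streaming_get(status, headers, chunks, cleanup_log)
  # `throw :patron_abort` from the caller's block lands here.
  catch(:patron_abort) do
    chunks.each { |chunk| yield status, headers, chunk }
  end
  nil
ensure
  # Runs on normal completion, on `throw :patron_abort`, and on `break`.
  cleanup_log << :cleaned_up
end

log = []
seen = []
sketch_streaming_get(200, {}, %w[a b c], log) do |_status, _headers, chunk|
  seen << chunk
  throw :patron_abort if chunk == "b"
end
# seen == ["a", "b"], log == [:cleaned_up]

log2 = []
result = sketch_streaming_get(200, {}, %w[a b c], log2) { |*| break }
# result is nil, log2 == [:cleaned_up]
```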

When you use a streaming response you don't want to buffer on the Ruby side. You probably want the opposite: no buffering at all, with the curl buffer size set to a minimum, so we cannot buffer until curl_easy_perform returns. Besides, we might be dealing with long-running responses here, in which case curl_easy_perform keeps running until the process or thread quits.

Aside from the fact that this needs a lot of rearchitecting of the C code, what are people's thoughts on these APIs, and which would be a better fit for Patron? I am at a loss. What I do know is that all of these are possible, but I'd rather double-check that we all like the direction before I make this change, which is going to span hundreds of lines of C.
