Posting SSE-conformant content with BentoML #5545
Hello, so far I have found that it is quite easy to implement streaming in BentoML by yielding the outputs. However, I am wondering how I could achieve SSE-conformant streaming, i.e. sending the JSON in the style `data: some_data\n\n`. Simply building this as a string didn't work, as it was not a valid base model. I have seen older pull requests on this topic where it was achieved with runners, but I have since found that those are outdated. Thanks in advance.
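For reference, the `data: some_data\n\n` framing is just plain text, so a JSON payload can be wrapped into an SSE event with a small helper. A minimal sketch (the `sse_event` name is my own, not a BentoML API):

```python
import json

def sse_event(payload: dict) -> str:
    # SSE frames each event as "data: <text>" followed by a blank line.
    return f"data: {json.dumps(payload)}\n\n"

print(sse_event({"token": "hello"}))
```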
Replies: 2 comments
SSE streaming with BentoML can be tricky. At RevolutionAI (https://revolutionai.io), we have implemented streaming endpoints for LLM responses. Working pattern:

```python
import bentoml
from starlette.responses import StreamingResponse

@bentoml.service
class LLMService:
    @bentoml.api
    async def stream_generate(self, prompt: str):
        async def generate():
            for token in model.generate_stream(prompt):
                yield f"data: {token}\n\n"

        return StreamingResponse(generate(), media_type="text/event-stream")
```

Key points:

- Each event is prefixed with `data: ` and terminated by a blank line (`\n\n`)
- The `text/event-stream` media type tells clients to treat the response as SSE
- Returning a Starlette `StreamingResponse` directly sidesteps the response-model validation that made your plain-string attempt fail
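On the consuming side, the `data:` lines can be split back out of the raw stream. A minimal stdlib sketch (the `parse_sse_data` name is hypothetical, and it ignores other SSE fields such as `event:` and `id:`):

```python
def parse_sse_data(raw: str) -> list[str]:
    # Events are separated by blank lines; each "data: " line carries a payload.
    events = []
    for block in raw.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                events.append(line[len("data: "):])
    return events
```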
What specific SSE behavior are you trying to achieve?
SSE with BentoML works great! At RevolutionAI (https://revolutionai.io) we use it for streaming LLM responses. Quick pattern:

```python
import bentoml
from starlette.responses import StreamingResponse

@bentoml.service
class LLMService:
    @bentoml.api
    async def stream(self, prompt: str):
        async def generate():
            for token in llm.stream(prompt):
                yield f"data: {token}\n\n"
            yield "data: [DONE]\n\n"

        return StreamingResponse(
            generate(),
            media_type="text/event-stream",
        )
```

Tips:

- Set `Cache-Control: no-cache` so proxies don't buffer the stream
- Use the `text/event-stream` content type
- Send a `[DONE]` marker to signal the end of the stream
Works with OpenAI-compatible clients!
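A client following the `[DONE]` convention can simply accumulate tokens until the marker arrives. A minimal sketch (the `collect_until_done` name is mine; it assumes the `data:` payloads have already been extracted):

```python
def collect_until_done(events: list[str]) -> str:
    # Concatenate streamed tokens, stopping at the OpenAI-style "[DONE]" marker.
    tokens = []
    for data in events:
        if data == "[DONE]":
            break
        tokens.append(data)
    return "".join(tokens)
```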