Server-Sent Events (SSE) with BentoML works great! At RevolutionAI (https://revolutionai.io) we use it for streaming LLM responses.

Quick pattern:

```python
import bentoml
from starlette.responses import StreamingResponse

@bentoml.service
class LLMService:
    @bentoml.api
    async def stream(self, prompt: str) -> StreamingResponse:
        async def generate():
            # `llm` is your model client; stream() is assumed to
            # yield tokens one at a time
            for token in llm.stream(prompt):
                yield f"data: {token}\n\n"
            yield "data: [DONE]\n\n"

        return StreamingResponse(
            generate(),
            media_type="text/event-stream",
            headers={"Cache-Control": "no-cache"},
        )
```
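
The `data: ...\n\n` framing is the core of the SSE wire format: each event is a `data:` field terminated by a blank line. A tiny helper (hypothetical, not part of BentoML) keeps that framing in one place:

```python
def sse_event(data: str) -> str:
    """Frame a payload as a Server-Sent Events message:
    a `data:` field terminated by a blank line."""
    return f"data: {data}\n\n"

sse_event("hello")   # returns "data: hello\n\n"
sse_event("[DONE]")  # returns "data: [DONE]\n\n"
```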

Tips:

- Set `Cache-Control: no-cache` so proxies and browsers don't buffer or cache the stream
- Use the `text/event-stream` content type
- End the stream with a `[DONE]` marker

Works with OpenAI-compatible clients!
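
On the client side, a consumer just reads `data:` lines until it sees `[DONE]`. A minimal sketch of that loop (pure Python; in practice you'd iterate over the lines of an HTTP response body rather than a canned list):

```python
def parse_sse_lines(lines):
    """Parse an iterable of SSE lines, yielding each data payload
    until the [DONE] marker is reached."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        yield payload

# Canned stream standing in for an HTTP response body:
stream = ["data: Hello\n", "\n", "data: world\n", "\n", "data: [DONE]\n"]
list(parse_sse_lines(stream))  # returns ['Hello', 'world']
```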

Answer selected by bilaljo