Server-Sent Events (SSE) with BentoML works great! At RevolutionAI (https://revolutionai.io) we use it for streaming LLM responses.

Quick pattern:

```python
import bentoml
from starlette.responses import StreamingResponse

@bentoml.service
class LLMService:
    @bentoml.api
    async def stream(self, prompt: str) -> StreamingResponse:
        async def generate():
            # `llm` is your model client; stream() is assumed to
            # yield tokens one at a time
            for token in llm.stream(prompt):
                yield f"data: {token}\n\n"
            yield "data: [DONE]\n\n"

        return StreamingResponse(
            generate(),
            media_type="text/event-stream",
            headers={"Cache-Control": "no-cache"},
        )
```
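
The `data: ...\n\n` framing is the core of the SSE wire format: each event is a `data:` field terminated by a blank line. A tiny helper (hypothetical, not part of BentoML) keeps that framing in one place:

```python
def sse_event(data: str) -> str:
    """Frame a payload as a Server-Sent Events message:
    a `data:` field terminated by a blank line."""
    return f"data: {data}\n\n"

sse_event("hello")   # returns "data: hello\n\n"
sse_event("[DONE]")  # returns "data: [DONE]\n\n"
```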

Tips:

- Set `Cache-Control: no-cache` so proxies and browsers don't buffer or cache the stream
- Use the `text/event-stream` content type
- End the stream with a `[DONE]` marker

Works with OpenAI-compatible clients!
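
On the client side, a consumer just reads `data:` lines until it sees `[DONE]`. A minimal sketch of that loop (pure Python; in practice you'd iterate over the lines of an HTTP response body rather than a canned list):

```python
def parse_sse_lines(lines):
    """Parse an iterable of SSE lines, yielding each data payload
    until the [DONE] marker is reached."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        yield payload

# Canned stream standing in for an HTTP response body:
stream = ["data: Hello\n", "\n", "data: world\n", "\n", "data: [DONE]\n"]
list(parse_sse_lines(stream))  # returns ['Hello', 'world']
```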

Answer selected by bilaljo