doc: model serving options (vLLM) #194

@cwing-nvidia

Description

Current tutorials only demonstrate model serving with OpenAI's Responses API, and we lack documentation for configuring our other supported inference options.

Design
The vLLM model server should get a dedicated page in the Model Server section. The page should cover how the middleware converts between the Chat Completions and Responses API formats, how to use a hosted vLLM endpoint, and how to run vLLM locally.
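
As a rough illustration of what the middleware portion of the page would need to explain, here is a minimal sketch of mapping a Responses-style request onto Chat Completions parameters. The field names follow OpenAI's public API shapes (`instructions`/`input` vs. `messages`); the helper itself is hypothetical and not the actual middleware in this repo.

```python
# Hypothetical sketch: translate a Responses-style request body into a
# Chat Completions request body. Field names follow OpenAI's public API
# shapes; the real middleware may differ.

def responses_to_chat(request: dict) -> dict:
    """Map a Responses API request onto Chat Completions parameters."""
    messages = []

    # Responses carries the system prompt in `instructions`;
    # Chat Completions expects a leading system message instead.
    if instructions := request.get("instructions"):
        messages.append({"role": "system", "content": instructions})

    # `input` may be a bare string or a list of message-like items.
    user_input = request.get("input", "")
    if isinstance(user_input, str):
        messages.append({"role": "user", "content": user_input})
    else:
        for item in user_input:
            messages.append(
                {"role": item.get("role", "user"), "content": item.get("content", "")}
            )

    return {
        "model": request["model"],
        "messages": messages,
        # `max_output_tokens` (Responses) maps to `max_tokens` (Chat Completions).
        "max_tokens": request.get("max_output_tokens"),
    }


if __name__ == "__main__":
    print(responses_to_chat({
        "model": "my-model",
        "instructions": "You are a helpful assistant.",
        "input": "Hello!",
    }))
```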

Acceptance Criteria:

  • Getting started guides should link to the vLLM docs for ease of user navigation
  • A dedicated doc page for vLLM, covering the middleware, hosted endpoints, and local serving (a usage sketch follows below)
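
For the local-serving part of the page, a minimal sketch of what the doc could show: vLLM exposes an OpenAI-compatible server (e.g. started with `vllm serve <model>`), so the standard `openai` client works against it. The base URL, port, and model name below are assumptions for illustration.

```python
# Minimal sketch, assuming a local vLLM OpenAI-compatible server started with
# something like: vllm serve meta-llama/Llama-3.1-8B-Instruct
# The base URL/port and model name are assumptions, not values from this repo.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

The same client code works against a hosted vLLM endpoint by swapping the `base_url`, which is one reason a single doc page can cover both setups.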

Labels

  • community-request: Issue reported or requested by someone from the community
  • documentation: Improvements to documentation
  • usability: Improvements to user experience
