doc: model serving options (vLLM) #194

@cwing-nvidia

Description

Current tutorials only demonstrate model serving with OpenAI's Responses API, and we lack documentation for configuring our other supported inference options.

Design
The vLLM model server should get a dedicated page in the Model Server section. The page should cover how the middleware converts between the Chat Completions and Responses API formats, how to use a hosted vLLM endpoint, and how to run vLLM locally.
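
As a rough illustration of what the middleware portion of the page would need to explain, here is a minimal sketch of mapping a Responses-style request onto Chat Completions parameters. The field names follow OpenAI's public API shapes (`instructions`/`input` vs. `messages`); the helper itself is hypothetical and not the actual middleware in this repo.

```python
# Hypothetical sketch: translate a Responses-style request body into a
# Chat Completions request body. Field names follow OpenAI's public API
# shapes; the real middleware may differ.

def responses_to_chat(request: dict) -> dict:
    """Map a Responses API request onto Chat Completions parameters."""
    messages = []

    # Responses carries the system prompt in `instructions`;
    # Chat Completions expects a leading system message instead.
    if instructions := request.get("instructions"):
        messages.append({"role": "system", "content": instructions})

    # `input` may be a bare string or a list of message-like items.
    user_input = request.get("input", "")
    if isinstance(user_input, str):
        messages.append({"role": "user", "content": user_input})
    else:
        for item in user_input:
            messages.append(
                {"role": item.get("role", "user"), "content": item.get("content", "")}
            )

    return {
        "model": request["model"],
        "messages": messages,
        # `max_output_tokens` (Responses) maps to `max_tokens` (Chat Completions).
        "max_tokens": request.get("max_output_tokens"),
    }


if __name__ == "__main__":
    print(responses_to_chat({
        "model": "my-model",
        "instructions": "You are a helpful assistant.",
        "input": "Hello!",
    }))
```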

Acceptance Criteria:

  • Getting started guides should link to the vLLM docs for ease of user navigation
  • A dedicated doc page for vLLM, covering the middleware, hosted endpoints, and local serving (a usage sketch follows below)
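
For the local-serving part of the page, a minimal sketch of what the doc could show: vLLM exposes an OpenAI-compatible server (e.g. started with `vllm serve <model>`), so the standard `openai` client works against it. The base URL, port, and model name below are assumptions for illustration.

```python
# Minimal sketch, assuming a local vLLM OpenAI-compatible server started with
# something like: vllm serve meta-llama/Llama-3.1-8B-Instruct
# The base URL/port and model name are assumptions, not values from this repo.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

The same client code works against a hosted vLLM endpoint by swapping the `base_url`, which is one reason a single doc page can cover both setups.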

Labels

  • community-request: Issue reported or requested by someone from the community
  • documentation: Improvements to documentation
  • usability: Improvements to user experience
