---
title: Deploying vLLM on Your Linux Server
description: A complete step-by-step guide for installing vLLM, configuring systemd, setting up virtual environments, and troubleshooting GPU-backed inference servers.
pubDate: 2025-12-03
heroImage: ../../assets/vllm-linux.png
tags:
- vLLM
- Linux
- LLM
---

# 🚀 Deploying vLLM on Your Linux Server

Running **vLLM** as a persistent, reliable background service is one of the best ways to expose a fast local LLM API on your Linux machine.
This guide walks through:

- Installing dependencies
- Creating a virtual environment
- Setting up a **systemd** service
- Running vLLM from a fixed directory (`/home/nurbot/ws/models`)
- Checking logs and debugging
- Enabling auto-start on boot

---

# 🧰 1. Install System Dependencies

```bash
sudo apt-get update
sudo apt-get install -y python3-pip python3-venv docker.io
```

Docker is optional but useful if you want containerized workflows.

---

# 🎮 2. Verify NVIDIA GPU Support (Optional but Recommended)

Check whether the machine has working NVIDIA drivers:

```bash
nvidia-smi
```

If the command is missing, install the NVIDIA drivers before running GPU-backed vLLM.

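For a compact summary of what the driver sees, `nvidia-smi` can also report just the GPU name and memory:

```bash
# One line per GPU: model name and total memory.
nvidia-smi --query-gpu=name,memory.total --format=csv
```
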
---

# 🐍 3. Create the vLLM Virtual Environment

We place it in `/opt/vllm-env`:

```bash
sudo python3 -m venv /opt/vllm-env
sudo chown -R $USER:$USER /opt/vllm-env
source /opt/vllm-env/bin/activate
```

Install vLLM (its OpenAI-compatible server ships with the package) along with the `openai` client for testing:

```bash
pip install vllm openai
```

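As a quick sanity check, confirm the install succeeded and that PyTorch (pulled in as a vLLM dependency) can see the GPU:

```bash
# Prints the vLLM version and whether CUDA is visible to PyTorch.
python -c "import vllm, torch; print(vllm.__version__, torch.cuda.is_available())"
```
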
---

# 📁 4. Configure Where vLLM Runs From

We want vLLM to run from:

```
/home/nurbot/ws/models
```

This directory holds the launch script at `infrastructure/scripts/start_vllm.sh`; a sketch of what that script might contain follows below.

Ensure the start script is executable:

```bash
chmod +x /home/nurbot/ws/models/infrastructure/scripts/start_vllm.sh
```

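If you haven't written the script yet, here is a minimal sketch; the host, port, and model fallback are assumptions to adapt to your setup:

```bash
#!/usr/bin/env bash
# Minimal launch-script sketch -- host, port, and model fallback are assumptions.
set -euo pipefail

# Use the virtual environment created in step 3.
source /opt/vllm-env/bin/activate

# Serve the OpenAI-compatible API; MODEL_NAME is injected by the systemd unit (step 5).
exec python -m vllm.entrypoints.openai.api_server \
  --model "${MODEL_NAME:-facebook/opt-125m}" \
  --host 0.0.0.0 \
  --port 8000
```

Using `exec` replaces the shell with the server process, so systemd supervises and restarts vLLM itself rather than a wrapper shell.
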
---

# 🧩 5. Create the Systemd Service

Create the service file:

```bash
sudo nano /etc/systemd/system/vllm.service
```

Paste:

```ini
[Unit]
Description=vLLM Inference Server
After=network.target

[Service]
Type=simple
User=nurbot
WorkingDirectory=/home/nurbot/ws/models
ExecStart=/home/nurbot/ws/models/infrastructure/scripts/start_vllm.sh
Restart=always
Environment=MODEL_NAME=facebook/opt-125m

[Install]
WantedBy=multi-user.target
```

The `Environment=` line passes `MODEL_NAME` to the launch script, so you can switch models by editing the unit instead of the script.

Then reload systemd:

```bash
sudo systemctl daemon-reload
```

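Optionally, ask systemd to sanity-check the unit file before starting it:

```bash
systemd-analyze verify /etc/systemd/system/vllm.service
```
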
---

# ▶️ 6. Starting, Stopping, and Enabling the Service

Start vLLM:

```bash
sudo systemctl start vllm
```

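Stop it when you need to:

```bash
sudo systemctl stop vllm
```
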
Check its status:

```bash
systemctl status vllm
```

Enable auto-start on boot:

```bash
sudo systemctl enable vllm
```

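Once the service is active, you can confirm the OpenAI-compatible API responds. This assumes the server listens on port 8000, as in the start-script sketch above:

```bash
# Simple completion request against the OpenAI-compatible endpoint.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook/opt-125m", "prompt": "Hello, my name is", "max_tokens": 16}'
```
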
---

# 📡 7. Checking Logs

To follow vLLM's logs in real time:

```bash
journalctl -u vllm -f
```

To see historical logs:

```bash
journalctl -u vllm
```

To jump to the end of the log with explanatory context (useful right after a failure):

```bash
journalctl -u vllm -xe
```

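To filter for error-level messages specifically, use journalctl's priority flag:

```bash
# Only entries logged at priority "err" or worse from the last hour.
journalctl -u vllm -p err --since "1 hour ago"
```
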
---

# 🛠 8. Troubleshooting

### **Service says “failed”**

Run:

```bash
systemctl status vllm
journalctl -u vllm -xe
```

Common issues:

- Wrong `ExecStart` path
- Missing execute permission on the start script
- Python crash inside vLLM
- GPU not available / out of memory (see the note below)

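For GPU out-of-memory failures, one knob worth knowing is vLLM's `--gpu-memory-utilization` option, which caps the fraction of VRAM the server pre-allocates (0.9 by default). For example, in the launch script:

```bash
# Cap vLLM's VRAM pre-allocation at 80% of the GPU (the default is 0.9).
exec python -m vllm.entrypoints.openai.api_server \
  --model "${MODEL_NAME:-facebook/opt-125m}" \
  --gpu-memory-utilization 0.8
```
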
---

# 🎯 Conclusion

You now have a fully functional **vLLM OpenAI-compatible server** running as a background service on Linux.
It's stable, auto-starts on reboot, logs to systemd, and uses a clean virtual environment with GPU acceleration.

Natural extensions to this setup include:

- Logging to `/var/log/vllm`
- Running multiple models
- Adding an Nginx reverse proxy
- Token-based authentication