Checklist
Describe the bug
I started a vllm server and a sglang server with the same model. The sglang server outputs unreadable tokens at high temperature, while the vllm server does not.
Example:
sampling_params = {"temperature":1,"n":1}
sglang server: 'Expossible! description! description!爽 ipairs soc!!爽爽爽爽'
vllm server: "Let's start by using the given information to set up three equations:\n1."
sampling_params = {"temperature":0.2,"n":1}
sglang server: "Let's start by using the given information to set up three equations:\n1."
vllm server: "Let's start by using the given information to set up three equations:\n1."
Maybe it's related to #523; I don't know how to fix it.
Reproduction
CUDA_VISIBLE_DEVICES=0 nohup python3 -m sglang.launch_server --model-path llama3-8B-instruct --port 9554 --disable-cuda-graph --mem-fraction-static 0.75 --max-prefill-tokens 12800
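A minimal client sketch for comparing the two servers with the same sampling parameters. Several things here are assumptions and not part of the original report: the vllm server is assumed to run its OpenAI-compatible API on port 8000, the sglang server is queried through its native `/generate` endpoint on the port from the command above, `PROMPT` is a hypothetical placeholder for the actual prompt, and the token limit of 32 is arbitrary.

```python
import json
import urllib.request

SGLANG_URL = "http://localhost:9554/generate"      # port taken from the launch command
VLLM_URL = "http://localhost:8000/v1/completions"  # assumption: vllm's default port
MODEL = "llama3-8B-instruct"
PROMPT = "<your prompt here>"  # hypothetical placeholder, not the original prompt


def build_sglang_request(prompt, temperature, n=1, max_new_tokens=32):
    """Return (url, payload) for sglang's native /generate endpoint."""
    payload = {
        "text": prompt,
        "sampling_params": {
            "temperature": temperature,
            "n": n,
            "max_new_tokens": max_new_tokens,
        },
    }
    return SGLANG_URL, payload


def build_vllm_request(prompt, temperature, n=1, max_tokens=32):
    """Return (url, payload) for vllm's OpenAI-compatible completions endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "temperature": temperature,
        "n": n,
        "max_tokens": max_tokens,
    }
    return VLLM_URL, payload


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))


def main():
    # Same two temperatures as in the example above.
    for temperature in (1, 0.2):
        sglang_out = post_json(*build_sglang_request(PROMPT, temperature))
        vllm_out = post_json(*build_vllm_request(PROMPT, temperature))
        print("temperature:", temperature)
        # Assumed response shapes: {"text": ...} for sglang with n=1,
        # {"choices": [{"text": ...}]} for the OpenAI-compatible API.
        print("sglang server:", repr(sglang_out["text"]))
        print("vllm server:", repr(vllm_out["choices"][0]["text"]))
```

Calling `main()` with both servers running prints one pair of outputs per temperature, which makes the difference at `temperature=1` easy to compare side by side.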
Environment
python3 -m sglang.check_env
Python: 3.9.17 (main, Jul 5 2023, 20:41:20) [GCC 11.2.0]
CUDA available: True
GPU 0,1: NVIDIA A800-SXM4-80GB
CUDA_HOME: /zhanghongbo/CUDA/cuda-11_8
NVCC: Cuda compilation tools, release 11.8, V11.8.89
CUDA Driver Version: 525.105.17
525.105.17
PyTorch: 2.3.0+cu118
flashinfer: 0.1.1+cu118torch2.3
requests: 2.31.0
tqdm: 4.66.1
numpy: 1.25.0
aiohttp: 3.8.5
fastapi: 0.110.0
hf_transfer: Module Not Found
huggingface_hub: 0.23.2
interegular: 0.3.3
packaging: 24.0
pillow: Module Not Found
psutil: 5.9.8
pydantic: 2.5.0
uvicorn: 0.23.2
uvloop: 0.19.0
zmq: 25.1.2
vllm: 0.5.3.post1
openai: 1.30.0
anthropic: Module Not Found
litellm: Module Not Found
NVIDIA Topology:
        GPU0    GPU1    NIC0    CPU Affinity    NUMA Affinity
GPU0    X       NV8     SYS     48-63           3
GPU1    NV8     X       SYS     48-63           3
NIC0    SYS     SYS     X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
ulimit soft: 1048576