[Bug] generate wrong sequences with higher temperature #771

@StevenZHB

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

I started a vLLM server and an SGLang server with the same model. With a high temperature, the SGLang server outputs unreadable tokens, while the vLLM server does not.
Example:
sampling_params = {"temperature":1,"n":1}
sglang server: 'Expossible! description! description!爽 ipairs soc!!爽爽爽爽'
vllm server: "Let's start by using the given information to set up three equations:\n1."

sampling_params = {"temperature":0.2,"n":1}
sglang server: "Let's start by using the given information to set up three equations:\n1."
vllm server: "Let's start by using the given information to set up three equations:\n1."
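As background for why the temperature 1 output looks like a bug rather than ordinary sampling noise: temperature is conventionally applied by dividing the logits by it before the softmax, so temperature 1 leaves the model's distribution unchanged and 0.2 only sharpens it. The sketch below illustrates that with made-up logits; it is not taken from either server's code, and the function name is hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate tokens (assumption, for illustration only).
logits = [4.0, 2.0, 0.0]

for temperature in (0.2, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 4) for p in probs])
```

In this toy case the top token still holds most of the probability mass at temperature 1, which is consistent with vLLM producing readable text at that setting.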

Maybe it's related to #523. I don't know how to fix it.

Reproduction

CUDA_VISIBLE_DEVICES=0 nohup python3 -m sglang.launch_server --model-path llama3-8B-instruct --port 9554 --disable-cuda-graph --mem-fraction-static 0.75 --max-prefill-tokens 12800
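A client along the following lines could send the same prompt and sampling parameters to both servers. This is a minimal sketch, not the script used for the report: it assumes SGLang's native `/generate` endpoint on the port from the command above and vLLM's OpenAI-compatible `/v1/completions` endpoint. The helper names, the vLLM URL, and the token limits are illustrative assumptions, and the original prompt is not included in this issue.

```python
import json
import urllib.request


def _post_json(url, body):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_sglang_request(base_url, prompt, sampling_params, max_new_tokens=32):
    """Request for SGLang's native /generate endpoint."""
    # max_new_tokens is an assumed limit, added so the outputs stay short.
    params = dict(sampling_params, max_new_tokens=max_new_tokens)
    return _post_json(base_url + "/generate",
                      {"text": prompt, "sampling_params": params})


def build_vllm_request(base_url, model, prompt, sampling_params, max_tokens=32):
    """Request for an OpenAI-compatible /v1/completions endpoint."""
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    body.update(sampling_params)  # temperature and n are top-level fields here
    return _post_json(base_url + "/v1/completions", body)


def send(request, timeout=60):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

For example, `send(build_sglang_request("http://localhost:9554", prompt, {"temperature": 1, "n": 1}))` would query the SGLang server started above, and the same `prompt` and parameters can then be passed to `build_vllm_request` with whatever address the vLLM server listens on.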

Environment

python3 -m sglang.check_env
Python: 3.9.17 (main, Jul  5 2023, 20:41:20) [GCC 11.2.0]
CUDA available: True
GPU 0,1: NVIDIA A800-SXM4-80GB
CUDA_HOME: /zhanghongbo/CUDA/cuda-11_8
NVCC: Cuda compilation tools, release 11.8, V11.8.89
CUDA Driver Version: 525.105.17
525.105.17
PyTorch: 2.3.0+cu118
flashinfer: 0.1.1+cu118torch2.3
requests: 2.31.0
tqdm: 4.66.1
numpy: 1.25.0
aiohttp: 3.8.5
fastapi: 0.110.0
hf_transfer: Module Not Found
huggingface_hub: 0.23.2
interegular: 0.3.3
packaging: 24.0
pillow: Module Not Found
psutil: 5.9.8
pydantic: 2.5.0
uvicorn: 0.23.2
uvloop: 0.19.0
zmq: 25.1.2
vllm: 0.5.3.post1
openai: 1.30.0
anthropic: Module Not Found
litellm: Module Not Found
NVIDIA Topology: 
        GPU0    GPU1    NIC0    CPU Affinity    NUMA Affinity
GPU0     X      NV8     SYS     48-63   3
GPU1    NV8      X      SYS     48-63   3
NIC0    SYS     SYS      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0


ulimit soft: 1048576

Labels: bug (Something isn't working)
