Pull requests: vllm-project/vllm
Bugfix: fix broken download of models from ModelScope (#5233), opened Jun 3, 2024 by liuyhwangyh
[Core][Doc] Default to multiprocessing for single-node distributed case (#5230), opened Jun 3, 2024 by njhill
[Bugfix] Fix torch.compile() error when using MultiprocessingGPUExecutor (#5229), opened Jun 3, 2024 by zifeitong
[Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to True (#5226), opened Jun 3, 2024 by zifeitong
[Misc] Add speculative decoding to the throughput benchmarking script (#5223), opened Jun 3, 2024 by abhibambhaniya
[Bugfix] During testing, use pytest monkeypatch to safely override the env var that indicates the vLLM backend (#5210), opened Jun 3, 2024 by afeldman-nm (see the sketch after this list)
[Misc] Improve error message when LoRA parsing fails (#5194), opened Jun 2, 2024 by DarkLight1337
[Core][Prefix Caching] Fix hashing logic for non-full blocks (#5188), opened Jun 2, 2024 by zhuohan123
[Bugfix][Frontend] vLLM api_server.py errors when used with prompt_token_ids (#5187), opened Jun 1, 2024 by TikZSZ
[Kernel] Switch fp8 layers to use the CUTLASS kernels (#5183), opened Jun 1, 2024 by tlrmchlsmth (Draft)
Bugfix: CUDA out of memory leads to 'AsyncEngineDeadError: Background loop has errored already' (#5173), opened Jun 1, 2024 by charent
[Bugfix] Fix KeyError: 1 when using LoRA adapters (#5164), opened May 31, 2024 by BlackBird-Coding
[Core] Bump up the default of --gpu_memory_utilization to be more similar to TensorRT Triton's default (#5158), opened May 31, 2024 by alexm-neuralmagic
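The pytest technique named in #5210 is generally applicable: pytest's built-in monkeypatch fixture sets an environment variable for a single test and undoes the change on teardown, so the override never leaks into other tests. A minimal sketch, assuming "VLLM_ATTENTION_BACKEND" and "FLASH_ATTN" as illustrative names (the PR itself may target a different variable or value):

import os

def test_backend_env_override(monkeypatch):
    # monkeypatch.setenv sets the variable for this test only; the
    # previous value (or its absence) is restored automatically when
    # the test finishes, passing or failing.
    monkeypatch.setenv("VLLM_ATTENTION_BACKEND", "FLASH_ATTN")  # assumed names
    assert os.environ["VLLM_ATTENTION_BACKEND"] == "FLASH_ATTN"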