Pull requests: vllm-project/vllm
Bugfix: fix broken download of models from ModelScope (#5233), opened Jun 3, 2024 by liuyhwangyh
[Core][Doc] Default to multiprocessing for single-node distributed case (#5230), opened Jun 3, 2024 by njhill
[Bugfix] Fix torch.compile() error when using MultiprocessingGPUExecutor (#5229), opened Jun 3, 2024 by zifeitong
[Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to True (#5226), opened Jun 3, 2024 by zifeitong
[Misc] Add speculative decoding to the throughput benchmarking script (#5223), opened Jun 3, 2024 by abhibambhaniya
[Bugfix] During testing, use pytest monkeypatch to safely override the env var that indicates the vLLM backend (#5210), opened Jun 3, 2024 by afeldman-nm (see the sketch after this list)
[Misc] Improve error message when LoRA parsing fails (#5194), opened Jun 2, 2024 by DarkLight1337
[Core][Prefix Caching] Fix hashing logic for non-full blocks (#5188), opened Jun 2, 2024 by zhuohan123
[Bugfix][Frontend] vLLM api_server.py errors when used with prompt_token_ids (#5187), opened Jun 1, 2024 by TikZSZ
[Kernel] Switch fp8 layers to use the CUTLASS kernels (#5183), opened Jun 1, 2024 by tlrmchlsmth (Draft)
Bugfix: CUDA out of memory leads to 'AsyncEngineDeadError: Background loop has errored already' (#5173), opened Jun 1, 2024 by charent
[Bugfix] Fix KeyError: 1 when using LoRA adapters (#5164), opened May 31, 2024 by BlackBird-Coding
[Core] Bump up the default of --gpu_memory_utilization to be more similar to TensorRT Triton's default (#5158), opened May 31, 2024 by alexm-neuralmagic
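The pytest technique named in #5210 is generally applicable: pytest's built-in monkeypatch fixture sets an environment variable for a single test and undoes the change on teardown, so the override never leaks into other tests. A minimal sketch, assuming "VLLM_ATTENTION_BACKEND" and "FLASH_ATTN" as illustrative names (the PR itself may target a different variable or value):

import os

def test_backend_env_override(monkeypatch):
    # monkeypatch.setenv sets the variable for this test only; the
    # previous value (or its absence) is restored automatically when
    # the test finishes, passing or failing.
    monkeypatch.setenv("VLLM_ATTENTION_BACKEND", "FLASH_ATTN")  # assumed names
    assert os.environ["VLLM_ATTENTION_BACKEND"] == "FLASH_ATTN"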