
Releases: ggerganov/llama.cpp

b3078 (03 Jun 19:33, bde7cd3)
llama : offload to RPC in addition to other backends (#7640)

* llama : offload to RPC in addition to other backends

* fix copy_tensor being called on the src buffer instead of the dst buffer
  - always initialize views in the view_src buffer
  - add RPC backend to Makefile build
  - add endpoint to all RPC object names

* add rpc-server to Makefile

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>
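
For context, the RPC backend exposes a remote rpc-server process as a regular ggml backend, so graphs can be offloaded to it alongside local backends. A minimal sketch of connecting from client code, assuming a server is already listening (the endpoint address is a placeholder):

```cpp
// Minimal sketch: treat a running rpc-server as an ordinary ggml backend.
// The endpoint "192.168.1.10:50052" is a placeholder.
#include "ggml-backend.h"
#include "ggml-rpc.h"
#include <cstdio>

int main() {
    ggml_backend_t backend = ggml_backend_rpc_init("192.168.1.10:50052");
    if (backend == nullptr) {
        fprintf(stderr, "failed to connect to the RPC endpoint\n");
        return 1;
    }
    printf("connected: %s\n", ggml_backend_name(backend));
    // ... allocate buffers and evaluate graphs here as with any local backend ...
    ggml_backend_free(backend);
    return 0;
}
```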

b3077 (03 Jun 19:10, a5735e4)
ggml : use OpenMP as a thread pool (#7606)

* ggml : add OpenMP for multi-threaded processing

* ggml : limit the number of threads used to avoid deadlock

* update shared state n_threads in parallel region

* clear NUMA affinity for the main thread even with OpenMP

* enable OpenMP by default

* fix the MSVC build

* disable OpenMP on macOS

* ci : disable OpenMP with thread sanitizer

* Update ggml.c

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
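
This change replaces ggml's hand-rolled thread spawning with an OpenMP parallel region, so the runtime's persistent worker pool is reused across graph evaluations instead of creating threads each time. A simplified sketch of the pattern (not the actual ggml.c code):

```cpp
// Simplified sketch of the OpenMP thread-pool pattern (not the actual ggml.c
// code): a single parallel region dispatches per-thread work; the OpenMP
// runtime reuses its worker threads across invocations.
#include <omp.h>
#include <cstdio>

static void compute_thread(int ith, int nth) {
    // each thread processes the slice of work identified by ith/nth
    printf("thread %d of %d\n", ith, nth);
}

int main() {
    const int n_threads = 4;
    #pragma omp parallel num_threads(n_threads)
    {
        // re-read the thread count inside the region: the runtime may grant
        // fewer threads than requested (see "update shared state n_threads")
        compute_thread(omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
```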

b3076 (03 Jun 19:08, 0b832d5)
make: fix debug options not being applied to NVCC (#7714)

b3075 (03 Jun 11:29, 3d7ebf6)
Vulkan Mixture of Experts (MoE) support (#7628)

* Finish Vulkan mul_mat_id implementation

* Add Vulkan sum_rows and div ops

* Fix MUL_MAT_ID matrix matrix shader

* Fix MUL_MAT_ID matrix vector shader dispatch size

* Fix MUL_MAT_ID matrix vector shader and dispatch code

* Update Vulkan CPU offload for MUL_MAT_ID

* Fix crash when using split mode none and setting a main GPU
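
MUL_MAT_ID is the matrix multiplication used by MoE layers: a per-token expert id selects which expert weight matrix multiplies that token's activations. A scalar reference of the semantics the new shaders implement (names and layout are illustrative, not the ggml API):

```cpp
// Scalar reference for MUL_MAT_ID semantics (illustrative, not the ggml API):
// the id of each token selects one of the expert weight matrices.
#include <vector>

void mul_mat_id_ref(const std::vector<std::vector<float>> & as, // [n_expert][n_rows*n_cols], row-major experts
                    const std::vector<float> & x,               // [n_tokens*n_cols] activations
                    const std::vector<int> & ids,               // [n_tokens] expert id per token
                    std::vector<float> & y,                     // [n_tokens*n_rows] output
                    int n_rows, int n_cols, int n_tokens) {
    for (int t = 0; t < n_tokens; ++t) {
        const std::vector<float> & a = as[ids[t]]; // select this token's expert
        for (int r = 0; r < n_rows; ++r) {
            float sum = 0.0f;
            for (int c = 0; c < n_cols; ++c) {
                sum += a[r*n_cols + c] * x[t*n_cols + c];
            }
            y[t*n_rows + r] = sum;
        }
    }
}
```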

b3074 (03 Jun 11:16, a10cda5)
cmake : add pkg-config spec file for llama.cpp (#7702)
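
With the spec file installed, downstream builds can locate the headers and library through pkg-config instead of hard-coded paths. A minimal consumer, assuming llama.cpp is installed and the pkg-config module is named llama:

```cpp
// Minimal pkg-config consumer (assumes an installed llama.pc named "llama"):
//   c++ hello_llama.cpp $(pkg-config --cflags --libs llama) -o hello_llama
#include "llama.h"

int main() {
    llama_backend_init();   // initialize the ggml backends
    // ... load a model, create a context, run inference ...
    llama_backend_free();   // release backend resources
    return 0;
}
```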

b3073 (03 Jun 09:47, 6f28a33)
llama : support tied embeddings for MiniCPM (#7664)

* support lm_head

* remove the code block

Co-authored-by: zhangkaihuo <zhangkaihuo@modelbest.cn>
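
Tied embeddings means the output projection (lm_head) reuses the token-embedding matrix instead of storing a separate tensor: the logits are the hidden state multiplied by the transposed embedding table. A scalar sketch of the idea (illustrative, not the llama.cpp loading code):

```cpp
// Tied embeddings, scalar sketch (illustrative): one [n_vocab x n_embd] matrix
// both embeds input tokens and projects hidden states back to logits.
#include <vector>

void logits_from_tied_embeddings(const std::vector<float> & tok_embd, // [n_vocab*n_embd]
                                 const std::vector<float> & hidden,   // [n_embd]
                                 std::vector<float> & logits,         // [n_vocab]
                                 int n_vocab, int n_embd) {
    for (int v = 0; v < n_vocab; ++v) {
        float sum = 0.0f;
        for (int e = 0; e < n_embd; ++e) {
            sum += tok_embd[v*n_embd + e] * hidden[e]; // dot(embedding row, hidden state)
        }
        logits[v] = sum; // the embedding row doubles as the output weight row
    }
}
```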

b3072 (03 Jun 07:20, 549279d)
llama : avoid double token-to-piece cache (#7654)

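The idea is to build the token-id-to-text-piece mapping once and reuse it, rather than constructing it in two places. A sketch of such a cache, assuming a hypothetical token_to_piece decoding function (llama_token_to_piece in the real API):

```cpp
// Sketch of a single token-to-piece cache (illustrative, not the llama.cpp
// code); token_to_piece is a hypothetical stand-in for the real detokenizer.
#include <string>
#include <vector>

struct piece_cache {
    std::vector<std::string> pieces; // pieces[id] = decoded text of token id

    void init(int n_vocab, std::string (*token_to_piece)(int id)) {
        pieces.resize(n_vocab);
        for (int id = 0; id < n_vocab; ++id) {
            pieces[id] = token_to_piece(id); // decode each token exactly once
        }
    }

    const std::string & piece(int id) const { return pieces[id]; }
};
```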

b3071 (03 Jun 07:09, 9e405b6)
kompute : implement op_getrows_f32 (#6403)

op_getrows_f32 has been required since https://github.com/ggerganov/llama.cpp/pull/6122 for the Vulkan backend with Kompute to be functional.

As such, implement this op to make the backend functional again.
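
GET_ROWS gathers rows of a source matrix by index, e.g. looking up embedding rows for a batch of token ids. A scalar reference of the f32 op's semantics (illustrative, not the Kompute shader code):

```cpp
// Scalar reference for GET_ROWS on f32 data (illustrative, not the Kompute
// shader): copy the rows of src selected by ids into dst.
#include <vector>

void get_rows_f32_ref(const std::vector<float> & src, // [n_rows_src*n_cols]
                      const std::vector<int> & ids,   // row indices to gather
                      std::vector<float> & dst,       // [ids.size()*n_cols]
                      int n_cols) {
    for (size_t i = 0; i < ids.size(); ++i) {
        for (int c = 0; c < n_cols; ++c) {
            dst[i*n_cols + c] = src[(size_t)ids[i]*n_cols + c];
        }
    }
}
```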

b3070 (02 Jun 22:36, 3413ae2)
fix a bug introduced by the use of calloc (#7701)

compilade pointed this out on the previous PR

b3067 (02 Jun 09:59, 9422c5e)
[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)

* Update rpc-server.cpp to include SYCL backend

Draft PR to address inclusion of the SYCL backend for the RPC server

* Update rpc-server.cpp
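
This lets rpc-server create a SYCL device backend when one is compiled in, falling back to the CPU backend otherwise. A sketch of that selection logic, assuming the GGML_USE_SYCL compile-time flag (mirroring the fallback pattern rpc-server.cpp already uses for other backends):

```cpp
// Sketch of rpc-server backend selection with SYCL included (assumes the
// GGML_USE_SYCL compile-time flag; mirrors the existing fallback pattern).
#include "ggml-backend.h"
#ifdef GGML_USE_SYCL
#include "ggml-sycl.h"
#endif
#include <cstdio>

static ggml_backend_t create_backend() {
    ggml_backend_t backend = nullptr;
#ifdef GGML_USE_SYCL
    backend = ggml_backend_sycl_init(0); // use SYCL device 0
    if (backend == nullptr) {
        fprintf(stderr, "failed to initialize the SYCL backend\n");
    }
#endif
    if (backend == nullptr) {
        backend = ggml_backend_cpu_init(); // fall back to the CPU backend
    }
    return backend;
}
```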