
Releases: ggerganov/llama.cpp

b3078 (03 Jun 19:33, bde7cd3)
llama : offload to RPC in addition to other backends (#7640)

* llama : offload to RPC in addition to other backends

* fix copy_tensor being called on the src buffer instead of the dst buffer
  - always initialize views in the view_src buffer
  - add RPC backend to Makefile build
  - add endpoint to all RPC object names

* add rpc-server to Makefile

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>
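
For context, the RPC backend exposes a remote rpc-server process as a regular ggml backend, so graphs can be offloaded to it alongside local backends. A minimal sketch of connecting from client code, assuming a server is already listening (the endpoint address is a placeholder):

```cpp
// Minimal sketch: treat a running rpc-server as an ordinary ggml backend.
// The endpoint "192.168.1.10:50052" is a placeholder.
#include "ggml-backend.h"
#include "ggml-rpc.h"
#include <cstdio>

int main() {
    ggml_backend_t backend = ggml_backend_rpc_init("192.168.1.10:50052");
    if (backend == nullptr) {
        fprintf(stderr, "failed to connect to the RPC endpoint\n");
        return 1;
    }
    printf("connected: %s\n", ggml_backend_name(backend));
    // ... allocate buffers and evaluate graphs here as with any local backend ...
    ggml_backend_free(backend);
    return 0;
}
```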

b3077 (03 Jun 19:10, a5735e4)
ggml : use OpenMP as a thread pool (#7606)

* ggml : add OpenMP for multi-threaded processing

* ggml : limit the number of threads used to avoid deadlock

* update shared state n_threads in parallel region

* clear NUMA affinity for the main thread even with OpenMP

* enable OpenMP by default

* fix the MSVC build

* disable OpenMP on macOS

* ci : disable OpenMP with thread sanitizer

* Update ggml.c

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
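
This change replaces ggml's hand-rolled thread spawning with an OpenMP parallel region, so the runtime's persistent worker pool is reused across graph evaluations instead of creating threads each time. A simplified sketch of the pattern (not the actual ggml.c code):

```cpp
// Simplified sketch of the OpenMP thread-pool pattern (not the actual ggml.c
// code): a single parallel region dispatches per-thread work; the OpenMP
// runtime reuses its worker threads across invocations.
#include <omp.h>
#include <cstdio>

static void compute_thread(int ith, int nth) {
    // each thread processes the slice of work identified by ith/nth
    printf("thread %d of %d\n", ith, nth);
}

int main() {
    const int n_threads = 4;
    #pragma omp parallel num_threads(n_threads)
    {
        // re-read the thread count inside the region: the runtime may grant
        // fewer threads than requested (see "update shared state n_threads")
        compute_thread(omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
```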

b3076 (03 Jun 19:08, 0b832d5)
make: fix debug options not being applied to NVCC (#7714)

b3075 (03 Jun 11:29, 3d7ebf6)
Vulkan Mixture of Experts (MoE) support (#7628)

* Finish Vulkan mul_mat_id implementation

* Add Vulkan sum_rows and div ops

* Fix MUL_MAT_ID matrix matrix shader

* Fix MUL_MAT_ID matrix vector shader dispatch size

* Fix MUL_MAT_ID matrix vector shader and dispatch code

* Update Vulkan CPU offload for MUL_MAT_ID

* Fix crash when using split mode none and setting a main GPU
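
MUL_MAT_ID is the matrix multiplication used by MoE layers: a per-token expert id selects which expert weight matrix multiplies that token's activations. A scalar reference of the semantics the new shaders implement (names and layout are illustrative, not the ggml API):

```cpp
// Scalar reference for MUL_MAT_ID semantics (illustrative, not the ggml API):
// the id of each token selects one of the expert weight matrices.
#include <vector>

void mul_mat_id_ref(const std::vector<std::vector<float>> & as, // [n_expert][n_rows*n_cols], row-major experts
                    const std::vector<float> & x,               // [n_tokens*n_cols] activations
                    const std::vector<int> & ids,               // [n_tokens] expert id per token
                    std::vector<float> & y,                     // [n_tokens*n_rows] output
                    int n_rows, int n_cols, int n_tokens) {
    for (int t = 0; t < n_tokens; ++t) {
        const std::vector<float> & a = as[ids[t]]; // select this token's expert
        for (int r = 0; r < n_rows; ++r) {
            float sum = 0.0f;
            for (int c = 0; c < n_cols; ++c) {
                sum += a[r*n_cols + c] * x[t*n_cols + c];
            }
            y[t*n_rows + r] = sum;
        }
    }
}
```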

b3074 (03 Jun 11:16, a10cda5)
cmake : add pkg-config spec file for llama.cpp (#7702)
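
With the spec file installed, downstream builds can locate the headers and library through pkg-config instead of hard-coded paths. A minimal consumer, assuming llama.cpp is installed and the pkg-config module is named llama:

```cpp
// Minimal pkg-config consumer (assumes an installed llama.pc named "llama"):
//   c++ hello_llama.cpp $(pkg-config --cflags --libs llama) -o hello_llama
#include "llama.h"

int main() {
    llama_backend_init();   // initialize the ggml backends
    // ... load a model, create a context, run inference ...
    llama_backend_free();   // release backend resources
    return 0;
}
```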

b3073 (03 Jun 09:47, 6f28a33)
llama : support tied embeddings for MiniCPM (#7664)

* support lm_head

* remove the code block

Co-authored-by: zhangkaihuo <zhangkaihuo@modelbest.cn>
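
Tied embeddings means the output projection (lm_head) reuses the token-embedding matrix instead of storing a separate tensor: the logits are the hidden state multiplied by the transposed embedding table. A scalar sketch of the idea (illustrative, not the llama.cpp loading code):

```cpp
// Tied embeddings, scalar sketch (illustrative): one [n_vocab x n_embd] matrix
// both embeds input tokens and projects hidden states back to logits.
#include <vector>

void logits_from_tied_embeddings(const std::vector<float> & tok_embd, // [n_vocab*n_embd]
                                 const std::vector<float> & hidden,   // [n_embd]
                                 std::vector<float> & logits,         // [n_vocab]
                                 int n_vocab, int n_embd) {
    for (int v = 0; v < n_vocab; ++v) {
        float sum = 0.0f;
        for (int e = 0; e < n_embd; ++e) {
            sum += tok_embd[v*n_embd + e] * hidden[e]; // dot(embedding row, hidden state)
        }
        logits[v] = sum; // the embedding row doubles as the output weight row
    }
}
```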

b3072 (03 Jun 07:20, 549279d)
llama : avoid double token-to-piece cache (#7654)

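The idea is to build the token-id-to-text-piece mapping once and reuse it, rather than constructing it in two places. A sketch of such a cache, assuming a hypothetical token_to_piece decoding function (llama_token_to_piece in the real API):

```cpp
// Sketch of a single token-to-piece cache (illustrative, not the llama.cpp
// code); token_to_piece is a hypothetical stand-in for the real detokenizer.
#include <string>
#include <vector>

struct piece_cache {
    std::vector<std::string> pieces; // pieces[id] = decoded text of token id

    void init(int n_vocab, std::string (*token_to_piece)(int id)) {
        pieces.resize(n_vocab);
        for (int id = 0; id < n_vocab; ++id) {
            pieces[id] = token_to_piece(id); // decode each token exactly once
        }
    }

    const std::string & piece(int id) const { return pieces[id]; }
};
```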

b3071 (03 Jun 07:09, 9e405b6)
kompute : implement op_getrows_f32 (#6403)

op_getrows_f32 has been required since https://github.com/ggerganov/llama.cpp/pull/6122 for the Vulkan backend with Kompute to be functional.

As such, implement this op to make the backend functional again.
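
GET_ROWS gathers rows of a source matrix by index, e.g. looking up embedding rows for a batch of token ids. A scalar reference of the f32 op's semantics (illustrative, not the Kompute shader code):

```cpp
// Scalar reference for GET_ROWS on f32 data (illustrative, not the Kompute
// shader): copy the rows of src selected by ids into dst.
#include <vector>

void get_rows_f32_ref(const std::vector<float> & src, // [n_rows_src*n_cols]
                      const std::vector<int> & ids,   // row indices to gather
                      std::vector<float> & dst,       // [ids.size()*n_cols]
                      int n_cols) {
    for (size_t i = 0; i < ids.size(); ++i) {
        for (int c = 0; c < n_cols; ++c) {
            dst[i*n_cols + c] = src[(size_t)ids[i]*n_cols + c];
        }
    }
}
```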

b3070 (02 Jun 22:36, 3413ae2)
fix a bug introduced by the use of calloc (#7701)

compilade pointed this out on the previous PR

b3067 (02 Jun 09:59, 9422c5e)
[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)

* Update rpc-server.cpp to include SYCL backend

Draft PR to address inclusion of the SYCL backend for the RPC server

* Update rpc-server.cpp
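
This lets rpc-server create a SYCL device backend when one is compiled in, falling back to the CPU backend otherwise. A sketch of that selection logic, assuming the GGML_USE_SYCL compile-time flag (mirroring the fallback pattern rpc-server.cpp already uses for other backends):

```cpp
// Sketch of rpc-server backend selection with SYCL included (assumes the
// GGML_USE_SYCL compile-time flag; mirrors the existing fallback pattern).
#include "ggml-backend.h"
#ifdef GGML_USE_SYCL
#include "ggml-sycl.h"
#endif
#include <cstdio>

static ggml_backend_t create_backend() {
    ggml_backend_t backend = nullptr;
#ifdef GGML_USE_SYCL
    backend = ggml_backend_sycl_init(0); // use SYCL device 0
    if (backend == nullptr) {
        fprintf(stderr, "failed to initialize the SYCL backend\n");
    }
#endif
    if (backend == nullptr) {
        backend = ggml_backend_cpu_init(); // fall back to the CPU backend
    }
    return backend;
}
```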