Releases: ggerganov/llama.cpp

b1488

05 Nov 13:04
3d48f42
llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)

as done in https://github.com/ggerganov/llama.cpp/pull/3827

b1487

05 Nov 09:26
c41ea36
cmake : MSVC instruction detection (fixed up #809) (#3923)

* Add detection code for avx

* Only check hardware when option is ON

* Modify per code review suggestions

* Build locally will detect CPU

* Fixes CMake style to use lowercase like everywhere else

* cleanup

* fix merge

* linux/gcc version for testing

* msvc combines avx2 and fma into /arch:AVX2 so check for both

* cleanup

* msvc only version

* style

* Update FindSIMD.cmake

---------

Co-authored-by: Howard Su <howard0su@gmail.com>
Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>

b1486

05 Nov 09:06
a7fac01
ci : use intel sde when ci cpu doesn't support avx512 (#3949)

b1485

05 Nov 07:34
48ade94
cuda : revert CUDA pool stuff (#3944)

* Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)"

This reverts commit 629f917cd6b96ba1274c49a8aab163b1b189229d.

* Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)"

This reverts commit d6069051de7165a4e06662c89257f5d2905bb156.

ggml-ci

b1483

03 Nov 20:21
d9b33fe
metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938)
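The title above does not say which quantity is rounded, so the following is only a minimal sketch of the general "round up to a multiple of 16" idiom, not the code from the commit. The helper name `round_up_16` is hypothetical and does not come from the llama.cpp source.

```cpp
#include <cstddef>

// Hypothetical helper (not from llama.cpp): round n up to the next
// multiple of 16. Adding 15 and clearing the low four bits works
// because 16 is a power of two.
constexpr std::size_t round_up_16(std::size_t n) {
    return (n + 15) & ~static_cast<std::size_t>(15);
}

// Compile-time checks of the behaviour described above.
static_assert(round_up_16(0)  == 0,  "0 is already a multiple of 16");
static_assert(round_up_16(1)  == 16, "1 rounds up to 16");
static_assert(round_up_16(16) == 16, "exact multiples are unchanged");
static_assert(round_up_16(17) == 32, "17 rounds up to 32");
```

Values that are already multiples of 16 pass through unchanged, so the helper can safely be applied to a size more than once.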

b1481

03 Nov 12:36
abb77e7
ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)

b1477

02 Nov 20:39
629f917
cuda : add ROCM aliases for CUDA pool stuff (#3918)

b1476

02 Nov 20:12
51b2fc1
cmake : fix relative path to git submodule index (#3915)

b1474

02 Nov 19:07
c7743fe
cuda : fix const ptrs warning causing ROCm build issues (#3913)

b1473

02 Nov 17:50
d606905
cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)

* Using CUDA memory pools for async alloc/dealloc.

* If the CUDA device doesn't support memory pools, then use the old implementation.

* Removed redundant cublasSetStream

---------

Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>