Releases · ggerganov/llama.cpp

05 Nov 13:04

3d48f42

b1488

llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)

as done in https://github.com/ggerganov/llama.cpp/pull/3827


05 Nov 09:26

github-actions

b1487

c41ea36


cmake : MSVC instruction detection (fixed up #809) (#3923)

* Add detection code for avx

* Only check hardware when option is ON

* Modify per code review suggestions

* Local builds will detect the CPU

* Fixes CMake style to use lowercase like everywhere else

* cleanup

* fix merge

* linux/gcc version for testing

* msvc combines avx2 and fma into /arch:AVX2 so check for both

* cleanup

* msvc only version

* style

* Update FindSIMD.cmake

---------

Co-authored-by: Howard Su <howard0su@gmail.com>
Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>
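
The note that MSVC folds AVX2 and FMA into a single `/arch:AVX2` switch can be illustrated with a small decision function. This is only a sketch of the reasoning, not the contents of `FindSIMD.cmake`: the `CpuFeatures` struct and `msvc_arch_flag` function are hypothetical names introduced here, and the real check lives in CMake rather than C++.

```cpp
#include <string>

// Hypothetical model of the CPU features a detection step might report.
struct CpuFeatures {
    bool avx;
    bool avx2;
    bool fma;
};

// Pick an MSVC /arch flag (illustrative only).
// /arch:AVX2 allows the compiler to emit both AVX2 and FMA instructions,
// so it is selected only when the CPU reports both features.
std::string msvc_arch_flag(const CpuFeatures& cpu) {
    if (cpu.avx && cpu.avx2 && cpu.fma) {
        return "/arch:AVX2";
    }
    if (cpu.avx) {
        return "/arch:AVX";
    }
    return "";  // no extra flag: keep the compiler default
}
```

Under this assumption, a CPU that reports AVX2 without FMA falls back to `/arch:AVX`, which avoids generating FMA instructions the hardware cannot run.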


05 Nov 09:06

github-actions

b1486

a7fac01


ci : use intel sde when ci cpu doesn't support avx512 (#3949)


05 Nov 07:34

github-actions

b1485

48ade94


cuda : revert CUDA pool stuff (#3944)

* Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)"

This reverts commit 629f917cd6b96ba1274c49a8aab163b1b189229d.

* Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)"

This reverts commit d6069051de7165a4e06662c89257f5d2905bb156.

ggml-ci


03 Nov 20:21

github-actions

b1483

d9b33fe


metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938)
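
The release note does not say which value is rounded. Assuming it is a length in bytes handed to the compute command encoder, rounding up to a multiple of 16 could look like the sketch below. The helper name `round_up_16` is hypothetical and is not taken from the commit.

```cpp
#include <cstddef>

// Round n up to the next multiple of 16 (returns n if it is already aligned).
// Because 16 is a power of two, adding 15 and clearing the low four bits
// gives the same result as ((n + 15) / 16) * 16.
constexpr std::size_t round_up_16(std::size_t n) {
    return (n + 15) & ~static_cast<std::size_t>(15);
}
```

For example, a length of 17 becomes 32, while a length of 16 is left unchanged.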


03 Nov 12:36

github-actions

b1481

abb77e7


ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)


02 Nov 20:39

github-actions

b1477

629f917


cuda : add ROCM aliases for CUDA pool stuff (#3918)


02 Nov 20:12

github-actions

b1476

51b2fc1


cmake : fix relative path to git submodule index (#3915)


02 Nov 19:07

github-actions

b1474

c7743fe


cuda : fix const ptrs warning causing ROCm build issues (#3913)


02 Nov 17:50

github-actions

b1473

d606905


cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)

* Using CUDA memory pools for async alloc/dealloc.

* If the CUDA device doesn't support memory pools, then use the old implementation.

* Removed redundant cublasSetStream

---------

Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>

