Releases · ggerganov/llama.cpp

18 May 16:58

059031b

b2928 Latest

ci : re-enable sanitizer runs (#7358)

* Revert "ci : temporary disable sanitizer builds (#6128)"

This reverts commit 4f6d1337ca5a409dc74aca8c479b7c34408a69c0.

* ci : trigger

Assets 21

cudart-llama-bin-win-cu11.7.1-x64.zip      293 MB   2024-05-18T16:58:45Z
cudart-llama-bin-win-cu12.2.0-x64.zip      413 MB   2024-05-18T16:58:51Z
llama-b2928-bin-macos-arm64.zip            41.3 MB  2024-05-18T16:58:59Z
llama-b2928-bin-macos-x64.zip              37.9 MB  2024-05-18T16:59:00Z
llama-b2928-bin-ubuntu-x64.zip             45.9 MB  2024-05-18T16:59:01Z
llama-b2928-bin-win-avx-x64.zip            6.61 MB  2024-05-18T16:59:02Z
llama-b2928-bin-win-avx2-x64.zip           6.59 MB  2024-05-18T16:59:03Z
llama-b2928-bin-win-avx512-x64.zip         6.61 MB  2024-05-18T16:59:04Z
llama-b2928-bin-win-clblast-x64.zip        7.79 MB  2024-05-18T16:59:04Z
llama-b2928-bin-win-cuda-cu11.7.1-x64.zip  64.9 MB  2024-05-18T16:59:05Z
Source code (zip)                                   2024-05-18T15:55:54Z
Source code (tar.gz)                                2024-05-18T15:55:54Z

18 May 15:14

github-actions

b2927

511182e

android : use "ci-android" branch for CI (#7341)

* android : use "ci-android" branch for CI

* ggml : disable SIMD exp and silu for 32-bit ARM

ggml-ci

* android : do not fetch, use add_subdirectory instead

* cmake : provide binary dir

Assets 21

18 May 15:08

github-actions

b2926

133d99c

CUDA: deduplicate FlashAttention code (#7352)

Assets 21

18 May 14:59

github-actions

b2923

0f98acf

llama : add support for larger Granite Code Models (20B, 34B) (#7324)

Tie the weights for ARCH_STARCODER to support the larger Granite code models.
Partially addresses ggerganov/issues/7116
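
As a rough illustration of what tying weights generally means (the output projection reuses the token-embedding matrix instead of a separate output matrix), here is a minimal plain-Python sketch. The names `output_logits`, `token_embd`, and `output_weight` are hypothetical and chosen for this example; this is not the llama.cpp implementation.

```python
# Illustrative only: generic weight tying in plain Python, not llama.cpp code.

def output_logits(hidden, token_embd, output_weight=None):
    """Project a hidden state to vocabulary logits.

    hidden:        list of floats, length n_embd
    token_embd:    list of vocabulary rows, each of length n_embd
    output_weight: separate output matrix, or None when weights are tied
    """
    # With tied weights there is no separate output matrix, so the
    # token-embedding rows are reused for the final projection.
    weight = token_embd if output_weight is None else output_weight
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]


token_embd = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # 3 tokens, n_embd = 2
hidden = [0.5, 0.25]
tied_logits = output_logits(hidden, token_embd)
```

A model file that ships without a separate output tensor can still be evaluated this way, which is the general reason a loader might fall back to the embedding tensor.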

A few things still remain to be fixed.
Currently requires `--override-kv tokenizer.ggml.add_bos_token=bool:false`
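
The override above has the shape `KEY=TYPE:VALUE`. The sketch below shows one way such a string could be split and typed. It is a hypothetical helper written for illustration, not llama.cpp's own parser; only the `bool` case is taken from the note above, and the `int` and `float` branches are assumptions.

```python
# Hypothetical helper for illustration; not the parser used by llama.cpp.

def parse_kv_override(arg):
    """Split 'KEY=TYPE:VALUE' into (key, typed_value)."""
    key, sep, rest = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=TYPE:VALUE, got {arg!r}")
    type_name, sep, raw = rest.partition(":")
    if not sep:
        raise ValueError(f"missing TYPE: prefix in {arg!r}")
    if type_name == "bool":
        if raw not in ("true", "false"):
            raise ValueError(f"invalid bool value {raw!r}")
        return key, raw == "true"
    # Assumed additional types, not confirmed by the release note.
    if type_name == "int":
        return key, int(raw)
    if type_name == "float":
        return key, float(raw)
    raise ValueError(f"unsupported type {type_name!r}")


key, value = parse_kv_override("tokenizer.ggml.add_bos_token=bool:false")
```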

Assets 21

18 May 13:54

github-actions

b2922

ca57e0f

perplexity : ndot progress and show stats with < 100 tasks (#7348)

Fix a floating-point error in ndot progress printing, and allow end-of-run stats with fewer tasks when running multiple-choice tasks.

Assets 21

18 May 08:42

github-actions

b2921

c1b295e

Update and fix Vulkan soft_max and argsort implementations (#7237)

* Update and fix Vulkan softmax implementation

* Update and fix Vulkan argsort implementation

Assets 21

18 May 02:15

github-actions

b2918

0583484

ggml : fix quants nans when all the group weights are very close to zero (#7313)

Assets 21

18 May 02:15

github-actions

b2917

ef277de

cmake : fix typo in AMDGPU_TARGETS (#7356)

Assets 21

18 May 00:20

github-actions

b2916

b43272a

Unicode codepoint flags for custom regexes (#7245)

* Replace CODEPOINT_TYPE_* with codepoint_flags
* Update and bugfix brute force random test
* Deterministic brute force random test
* Unicode normalization NFD
* Get rid of BOM
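
For readers unfamiliar with NFD (canonical decomposition), the short sketch below uses Python's standard `unicodedata` module to show its effect on a single character. It only illustrates the normalization form itself; how the commit applies it inside the tokenizer is not shown here, and the helper name `to_nfd` is made up for this example.

```python
import unicodedata


def to_nfd(text):
    """Return text in Unicode Normalization Form D (canonical decomposition)."""
    return unicodedata.normalize("NFD", text)


composed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE as one codepoint
decomposed = to_nfd(composed)

# After NFD the character is a base letter followed by a combining accent.
codepoints = [f"U+{ord(ch):04X}" for ch in decomposed]
```

Here `codepoints` is `["U+0065", "U+0301"]`: the letter `e` followed by COMBINING ACUTE ACCENT.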

Assets 21

17 May 17:55

github-actions

b2915

0fc1e82

CUDA: faster large batch FA without tensor cores (#7314)

Assets 21
