Releases: ggerganov/llama.cpp
b2928
b2927
android : use "ci-android" branch for CI (#7341)
* android : use "ci-android" branch for CI
* ggml : disable SIMD exp and silu for 32-bit ARM
ggml-ci
* android : do not fetch, use add_subdirectory instead
* cmake : provide binary dir
b2926
CUDA: deduplicate FlashAttention code (#7352)
b2923
llama : add support for larger Granite Code Models (20B, 34B) (#7324)
Tie the weights for ARCH_STARCODER to support the larger Granite code models. Partially addresses ggerganov/issues/7116. A few things still remain to be fixed; currently requires `--override-kv tokenizer.ggml.add_bos_token=bool:false`.
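The `--override-kv` requirement in the b2923 note can be illustrated with a small sketch that assembles the command line. The binary path `./main`, the model file name `granite-20b-code.gguf`, the prompt, and the helper `build_llama_cmd` are assumptions made for illustration; only the `--override-kv tokenizer.ggml.add_bos_token=bool:false` pair comes from the release note.

```python
import shlex


def build_llama_cmd(binary, model_path, prompt, kv_overrides):
    """Build an argv list; each override becomes its own --override-kv pair.

    `kv_overrides` maps a metadata key to a "type:value" string,
    e.g. {"tokenizer.ggml.add_bos_token": "bool:false"}.
    """
    argv = [binary, "-m", model_path, "-p", prompt]
    for key, typed_value in kv_overrides.items():
        argv += ["--override-kv", f"{key}={typed_value}"]
    return argv


cmd = build_llama_cmd(
    "./main",                  # assumed binary name and location
    "granite-20b-code.gguf",   # hypothetical model file
    "def fib(n):",             # hypothetical prompt
    {"tokenizer.ggml.add_bos_token": "bool:false"},  # from the release note
)

# Print a shell-quoted command that could be pasted into a terminal.
print(shlex.join(cmd))
```

Keeping the arguments as a list (rather than one string) means the same `cmd` could be handed to `subprocess.run` without any extra quoting.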
b2922
perplexity : ndot progress and show stats with < 100 tasks (#7348)
Fix floating point error with ndot printing; allow end stats on lower task counts for multiple-choice tasks.
b2921
Update and fix Vulkan soft_max and argsort implementations (#7237)
* Update and fix Vulkan softmax implementation
* Update and fix Vulkan argsort implementation
b2918
ggml : fix quants nans when all the group weights are very close to zero (#7313)
b2917
cmake : fix typo in AMDGPU_TARGETS (#7356)
b2916
Unicode codepoint flags for custom regexs (#7245)
* Replace CODEPOINT_TYPE_* with codepoint_flags
* Update and bugfix brute force random test
* Deterministic brute force random test
* Unicode normalization NFD
* Get rid of BOM
b2915
CUDA: faster large batch FA without tensor cores (#7314)