Releases · ggerganov/llama.cpp

02 Jun 22:36

3413ae2

b3070 Latest


fix bug introduced in using calloc (#7701)

compilade pointed this out on the previous MR

Assets 21

cudart-llama-bin-win-cu11.7.1-x64.zip, 293 MB, 2024-06-02T22:36:48Z
cudart-llama-bin-win-cu12.2.0-x64.zip, 413 MB, 2024-06-02T22:36:57Z
llama-b3070-bin-macos-arm64.zip, 42 MB, 2024-06-02T22:37:10Z
llama-b3070-bin-macos-x64.zip, 38.6 MB, 2024-06-02T22:37:12Z
llama-b3070-bin-ubuntu-x64.zip, 46.5 MB, 2024-06-02T22:37:14Z
llama-b3070-bin-win-avx-x64.zip, 6.69 MB, 2024-06-02T22:37:16Z
llama-b3070-bin-win-avx2-x64.zip, 6.67 MB, 2024-06-02T22:37:17Z
llama-b3070-bin-win-avx512-x64.zip, 6.68 MB, 2024-06-02T22:37:18Z
llama-b3070-bin-win-clblast-x64.zip, 7.86 MB, 2024-06-02T22:37:19Z
llama-b3070-bin-win-cuda-cu11.7.1-x64.zip, 106 MB, 2024-06-02T22:37:20Z
Source code (zip), 2024-06-02T21:59:54Z
Source code (tar.gz), 2024-06-02T21:59:54Z
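
The binary asset names listed for b3070 appear to follow a `llama-<build>-bin-<os>[-<variant>]-<arch>.zip` pattern. That pattern is inferred only from the names shown here, not from any documented naming scheme, so the sketch below is an assumption-laden helper for picking an asset by platform. `ASSET_RE` and `parse_asset` are hypothetical names, and the `cudart-...` runtime archives are deliberately left unmatched.

```python
import re

# Pattern inferred from the b3070 asset names shown above (an assumption,
# not an official naming scheme): llama-<build>-bin-<os>[-<variant>]-<arch>.zip
ASSET_RE = re.compile(
    r"^llama-(?P<build>b\d+)-bin-(?P<os>macos|ubuntu|win)"
    r"(?:-(?P<variant>.+))?-(?P<arch>arm64|x64)\.zip$"
)


def parse_asset(name):
    """Split an asset file name into build, os, variant and arch.

    Returns a dict, or None when the name does not fit the inferred pattern
    (for example the cudart-* runtime archives).
    """
    match = ASSET_RE.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for asset in (
        "llama-b3070-bin-macos-arm64.zip",
        "llama-b3070-bin-win-avx2-x64.zip",
        "llama-b3070-bin-win-cuda-cu11.7.1-x64.zip",
        "cudart-llama-bin-win-cu11.7.1-x64.zip",
    ):
        print(asset, "->", parse_asset(asset))
```

Names without a variant segment, such as the macOS and Ubuntu builds, come back with `variant` set to `None`.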

02 Jun 09:59

github-actions

b3067

9422c5e

[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)

* Update rpc-server.cpp to include SYCL backend

Draft PR to address inclusion of SYCL backend for RPC server

* Update rpc-server.cpp

Assets 21

01 Jun 22:37

github-actions

b3066

e141ce6

Fix FlashAttention debug test, FP32 assert (#7684)

Assets 21

01 Jun 20:14

github-actions

b3065

2e66683

server : new UI (#7633)

* ic

* migrate my early work

* add the belonging stuff: css,favicon etc

* de prompts

* chore: Update HTML meta tags in index.html file

* add api-key css classes

* some necessary fixes

* Add API key CSS classes and update styling in style.css

* clean the code

* move API to the top, rearrange param sliders. update css

* add tooltips to the parameters with comprehensible explanations

* fix FloatField and BoolField tooltips

* fix grammar field width

* use template literals for promptFormats.js

* update const ModelGenerationInfo

* remove ms per token, since not relevant for most webui users and use cases

* add phi-3 prompt template

* add phi3 to dropdown

* add css class

* update forgotten css theme

* add user message suffix

* fix chatml & add llama3 format

* fix llama3 prompt template

* more prompt format fixes

* add more common stop tokens

* add missing char

* do not separate with new line or comma

* move prompt style

* add hacky llama2 prompt solution, reduce redundancy in promptFormats.js

* fix toggle state localstorage

* add cmd-r prompt and reduce redundancy

* set default prompt to empty

* move files, clean code

* fix css path

* add a button to the new ui

* move new ui to "/public" due to otherwise problematic CORS behaviour

* include new ui in cpp

* fix wrong link to old ui

* renaming to ensure consistency

* fix typos "prompt-format" -> "prompt-formats"

* use correct indent

* add new ui files to makefile

* fix typo

Assets 21

01 Jun 14:48

github-actions

b3063

750f60c

CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681)

Assets 21

31 May 16:12

github-actions

b3058

30e238b

Improve HIP compatibility (#7672)

Assets 21

31 May 12:19

github-actions

b3056

0c27e6f

ggml : fix loongson compile warnings (#7537)

* ggml : fix loongson compile warnings

ggml-ci

* Fix loongarch quantize test failure.

Fix an unexpected error introduced while rebasing the code.

* tests : disable json test due to lack of python on the CI node

ggml-ci

---------

Co-authored-by: junchao-loongson <zhaojunchao@loongson.cn>

Assets 21

30 May 22:43

github-actions

b3051

5921b8f

llama : cache llama_token_to_piece (#7587)

* llama : cache llama_token_to_piece

ggml-ci

* llama : use vectors and avoid has_cache

ggml-ci

* llama : throw on unknown tokenizer types

ggml-ci

* llama : print a log of the total cache size
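
The commit titles above describe caching the token-to-piece conversion in vectors and logging the total cache size. As a rough illustration of that general idea only, and not of the actual llama.cpp code, the sketch below precomputes one piece per token id in a list and reports the cache size in bytes. Every name here (`build_piece_cache`, `cache_size_bytes`, `detokenize`, `TOY_VOCAB`) is hypothetical, and the vocabulary is a toy stand-in.

```python
def build_piece_cache(vocab_size, token_to_piece):
    """Precompute the text piece for every token id, indexed by id."""
    return [token_to_piece(tok) for tok in range(vocab_size)]


def cache_size_bytes(cache):
    """Total size of all cached pieces when encoded as UTF-8."""
    return sum(len(piece.encode("utf-8")) for piece in cache)


def detokenize(tokens, cache):
    """Join cached pieces; no per-token conversion work at lookup time."""
    return "".join(cache[tok] for tok in tokens)


# Toy vocabulary standing in for a real tokenizer (hypothetical data).
TOY_VOCAB = ["<s>", "Hello", ",", " world", "!"]


def toy_token_to_piece(tok):
    """Stand-in for a comparatively expensive token-to-text conversion."""
    return TOY_VOCAB[tok]


cache = build_piece_cache(len(TOY_VOCAB), toy_token_to_piece)
text = detokenize([1, 2, 3, 4], cache)

if __name__ == "__main__":
    print(text)
    print("total cache size:", cache_size_bytes(cache), "bytes")
```

Indexing a list by token id keeps each lookup constant-time and needs no separate "is this entry cached" flag, which is one plausible reading of "use vectors and avoid has_cache".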

Assets 21

30 May 14:39

github-actions

b3046

9c4c9cc

Move convert.py to examples/convert-legacy-llama.py (#7430)

* Move convert.py to examples/convert-no-torch.py

* Fix CI, scripts, readme files

* convert-no-torch -> convert-legacy-llama

* Move vocab thing to vocab.py

* Fix convert-no-torch -> convert-legacy-llama

* Fix lost convert.py in ci/run.sh

* Fix imports

* Fix gguf not imported correctly

* Fix flake8 complaints

* Fix check-requirements.sh

* Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE

* Review fixes

Assets 21

30 May 14:23

github-actions

b3045

59b0d07

faster avx512 exp implementation (#7551)

* faster avx512 exp implementation

* x->r

* improve accuracy, handle special cases

* remove `e`

Assets 21
