Releases: ggerganov/llama.cpp
b1500
server : add min_p param (#3877)
* Update server.cpp with min_p after it was introduced in https://github.com/ggerganov/llama.cpp/pull/3841
* Use spaces instead of tabs
* Update index.html.hpp after running deps.sh
* Fix test - fix line ending
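For readers unfamiliar with the parameter, here is a minimal sketch of min-p filtering, assuming the usual definition: a token is kept only if its probability is at least `min_p` times the probability of the most likely token. The function name `min_p_filter` and the dict-based interface are hypothetical and chosen for illustration; this is not the server's implementation.

```python
def min_p_filter(probs, min_p):
    """Keep tokens whose probability is >= min_p * max probability, then renormalize.

    probs: dict mapping token -> probability (hypothetical interface).
    min_p: relative cutoff in [0, 1]; 0 disables the filter.
    """
    if not probs:
        return {}
    # The cutoff scales with the top token, so a confident distribution
    # prunes more aggressively than a flat one.
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    # Renormalize so the surviving probabilities sum to 1 again.
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


if __name__ == "__main__":
    example = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
    print(min_p_filter(example, 0.2))
```

With the example distribution and `min_p = 0.2`, the cutoff is 0.1, so only the lowest-probability token is dropped before renormalization.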
b1499
ggml-alloc : fix backend assignments of views (#3982)
b1497
make : do not add linker flags when compiling static llava lib (#3977)
b1496
ggml : fix backward rope after YaRN (#3974)
* fix backward process of rope
  The rope backward process was broken after the YaRN RoPE (#2268) implementation, due to missing changes in the backward functions. The code for the backward process is nearly identical to the forward process: the only difference is the sign of the sin-values. To avoid future regressions, remove the near-duplicate backward functions and reuse the forward code: for this, a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`. The sin-values are negated when forward is false.
* fix finetune rope call to use correct default attn_factor of 1.0f
* remove unused `ggml_rope_xpos_back`
  It is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants.
* fix comments explaining the sine sign in ggml_forward_rope
* add missing function arguments in declaration
* fix function argument type in declaration
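The claim that the backward pass differs from the forward pass only in the sign of the sin-values can be illustrated with a toy example on a single pair of values. This is a sketch only: `rope_pair` is a hypothetical helper, not ggml code, and it ignores YaRN scaling, `attn_factor`, and the per-dimension frequency schedule.

```python
import math


def rope_pair(x0, x1, theta, forward=True):
    """Rotate the pair (x0, x1) by theta; negate sin when forward is False.

    With forward=False this applies the transposed (inverse) rotation,
    which is what the gradient needs in the backward pass.
    """
    cos_t = math.cos(theta)
    sin_t = math.sin(theta)
    if not forward:
        sin_t = -sin_t  # the only difference between the two passes
    return (x0 * cos_t - x1 * sin_t, x0 * sin_t + x1 * cos_t)


if __name__ == "__main__":
    y0, y1 = rope_pair(1.0, 2.0, 0.7)
    # Applying the backward variant with the same angle undoes the rotation.
    print(rope_pair(y0, y1, 0.7, forward=False))
```

Sharing one code path with a `forward` flag, as the commit describes, means any later change to the rotation is picked up by both passes.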
b1495
Use params when loading models in llava-cli (#3976)
llava-cli was loading models with default params and ignoring settings from the cli. This switches to a generic function to load the params from the cli options.
b1494
cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)
* prototyping the idea of running on CPU with a GGML_USE_CUBLAS=on build
* doc: add comments to ggml_cublas_loaded()
* fix defined(...)
b1493
llava : expose as a shared library for downstream projects (#3613)
* wip llava python bindings compatibility
* add external llava API
* add base64 in-prompt image support
* wip refactor image loading
* refactor image load out of llava init
* cleanup
* further cleanup; move llava-cli into its own file and rename
* move base64.hpp into common/
* collapse clip and llava libraries
* move llava into its own subdir
* wip
* fix bug where base64 string was not removed from the prompt
* get libllava to output in the right place
* expose llava methods in libllama.dylib
* cleanup memory usage around clip_image_*
* cleanup and refactor *again*
* update headerdoc
* build with cmake, not tested (WIP)
* Editorconfig
* Editorconfig
* Build with make
* Build with make
* Fix cyclical deps on Windows
* attempt to fix build on Windows
* attempt to fix build on Windows
* Upd TODOs
* attempt to fix build on Windows+CUDA
* Revert changes in cmake
* Fix according to review comments
* Support building as a shared library
* address review comments

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
b1492
ggml-cuda : fix f16 mul mat (#3961)
* ggml-cuda : fix f16 mul mat
  ggml-ci
* silence common.cpp warning (bonus)
b1491
Allow common process_escapes to handle \x sequences (#3928)
* Allow common process_escapes to handle \x sequences
* Fix edge case when second hex digit is NUL
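As an illustration of what handling `\x` sequences involves, here is a small sketch of an escape processor that accepts `\xHH` with exactly two hex digits and leaves malformed sequences untouched. It is an assumption-laden analogue, not the code in common.cpp: the set of simple escapes is chosen for illustration, and it emits a character per `\xHH` where the C++ code works on bytes. A string that ends after one hex digit plays the role of the NUL edge case mentioned above.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"
# Simple one-character escapes (an illustrative set, not taken from common.cpp).
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"', "'": "'"}


def process_escapes(text):
    """Expand backslash escapes, including \\xHH, in text."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[nxt])
                i += 2
                continue
            if nxt == "x":
                digits = text[i + 2:i + 4]
                # Require two valid hex digits; a truncated or invalid
                # sequence falls through and is copied literally.
                if len(digits) == 2 and all(d in HEX_DIGITS for d in digits):
                    out.append(chr(int(digits, 16)))
                    i += 4
                    continue
        # Ordinary character, or a backslash that starts no known escape.
        out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(process_escapes(r"A\x42C"))
```

Copying malformed sequences through unchanged, rather than raising an error, keeps the sketch tolerant of prompts that contain stray backslashes.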
b1489
cuda : fix disabling device with --tensor-split 1,0 (#3951)

Co-authored-by: slaren <slarengh@gmail.com>