
llamafile v0.7.1

@jart released this 13 Apr 04:21

This release fixes bugs in the 0.7.0 release.

  • Fix 2 embeddings-related issues in server.cpp (#324)
  • Detect search query to start webchat (#333)
  • Use LLAMAFILE_GPU_ERROR value -2 instead of -1 (#291)
  • Fix --silent-prompt flag regression (#328)
  • Clamp out-of-range values in the K quantizer (ef0307e); see the clamping sketch after this list
  • Update to the latest q5_k quantization code (a8b0b15)
  • Change the file format magic number for the bf16 file format recently introduced in 0.7.0. This is a breaking change, due to a numbering conflict with the upstream project. We're still waiting on a permanent assignment for bfloat16, so this could potentially change again. Follow ggerganov/llama.cpp#6412 for updates; a short illustration of the conflict follows this list.
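
The clamping fix is the standard guard in block quantizers: values are stored as small integers relative to a per-block scale, so a quotient that falls outside the integer range the format can hold must be bounded before packing. A minimal sketch of the idea; the function name, the 6-bit range, and the rounding choice are illustrative assumptions, not the actual k-quants internals:

    #include <algorithm>
    #include <cmath>
    #include <cstdint>

    // Sketch only: quantize one value against a block scale, clamping
    // the result to a hypothetical signed 6-bit range so out-of-range
    // inputs can no longer corrupt the packed block.
    static int8_t quantize_with_clamp(float x, float scale) {
        int q = (int) lroundf(x / scale);   // nearest representable step
        q = std::clamp(q, -32, 31);         // bound to the storable range
        return (int8_t) q;
    }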
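
One way to picture the numbering conflict behind the magic-number change: a GGUF-style file identifies data formats by integer id, so the on-disk number is part of the format contract. The enum and ids below are made up for illustration; the real assignment is what ggerganov/llama.cpp#6412 is tracking:

    #include <cstdint>

    // Illustrative only: hypothetical type ids, not the real ones.
    enum ggml_type_sketch : uint32_t {
        TYPE_F32  = 0,
        TYPE_F16  = 1,
        // If two projects independently assign id 30 to different types,
        // files written by one are misread by the other; hence the
        // renumbering in 0.7.1 and the wait for a permanent upstream
        // assignment for bfloat16.
        TYPE_BF16 = 30,
    };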

Mixtral 8x22b and Grok support are not available in this release, but they are available if you build llamafile from source on the main branch at HEAD. We're currently dealing with an AMD Windows GPU support regression there. Once it's resolved, a 0.8 release will ship.