
llama.cpp failing #371

Conversation

@bet0x commented Apr 22, 2024

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it with `--recursive`, it works.
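For reference, the working sequence is just the suggested commands with `--recursive` added (a sketch; `LLAMA_CUDA=1` assumes an NVIDIA CUDA toolchain, drop it for a CPU-only build):

```bash
# Clone with submodules; the plain clone suggested in the error message
# skips them, which appears to be why the build then fails.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
```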
@danielhanchen (Contributor) commented

Oh my, I need to check this ASAP. Thanks for the heads up!

@dynamite9999 commented

I have the same issue. I followed the instructions to clone and make, but I still get the same error. However, if I manually run the conversion inside llama.cpp, it partially works on the untrained, unmerged model:

```bash
..INFO:hf-to-gguf:Model successfully exported to '../unsloth/llama-3-8b-bnb-4bit/ggml-model-f16.gguf'
```
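For anyone reproducing the manual step, the conversion plus quantization inside llama.cpp looks roughly like this (a sketch; the script and binary names are from April 2024 llama.cpp checkouts and have since been renamed, and the model paths are assumptions based on the log line above):

```bash
cd llama.cpp

# Convert the HF checkpoint to an f16 GGUF. Newer llama.cpp trees name
# this script convert_hf_to_gguf.py instead.
python convert-hf-to-gguf.py ../unsloth/llama-3-8b-bnb-4bit \
    --outtype f16 \
    --outfile ../unsloth/llama-3-8b-bnb-4bit/ggml-model-f16.gguf

# Quantize the f16 GGUF to q4_k_m. Newer trees rename this binary llama-quantize.
./quantize ../unsloth/llama-3-8b-bnb-4bit/ggml-model-f16.gguf \
    ../unsloth/llama-3-8b-bnb-4bit/ggml-model-q4_k_m.gguf \
    q4_k_m
```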

Here is the error message from `model.save_pretrained_gguf(TRAINED_GGUF_MODEL, tokenizer, quantization_method = "q4_k_m")`, which worked until a few days ago.

```bash
Unsloth: Converting llama model. Can use fast conversion = True.
==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp will take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits will take 3 minutes.
\        /    [2] Converting GGUF 16bits to q4_k_m will take 20 minutes.
 "-____-"     In total, you will have to wait around 26 minutes.

Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at unsloth/llama-3-8b-bnb-4bit into f16 GGUF format.
The output location will be ./unsloth/llama-3-8b-bnb-4bit-unsloth.F16.gguf
This will take 3 minutes...
Traceback (most recent call last):
  File "/home/d/hp/NetAnalytics/dev/netai/syslog/syslog_scraper_netai/t59_nie_func_data/nie_trainer.v1.py", line 1264, in <module>
    main()
  File "/home/d/hp/NetAnalytics/dev/netai/syslog/syslog_scraper_netai/t59_nie_func_data/nie_trainer.v1.py", line 1232, in main
    model.save_pretrained_gguf(new_model, tokenizer, quantization_method = "q4_k_m")
  File "/home/d/.local/lib/python3.11/site-packages/unsloth/save.py", line 1340, in unsloth_save_pretrained_gguf
    file_location = save_to_gguf(model_type, new_save_directory, quantization_method, first_conversion, makefile)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/d/.local/lib/python3.11/site-packages/unsloth/save.py", line 964, in save_to_gguf
    raise RuntimeError(
RuntimeError: Unsloth: Quantization failed for ./unsloth/llama-3-8b-bnb-4bit-unsloth.F16.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```
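Before re-running the quantization step, it may be worth checking that the rebuild actually produced the binary Unsloth shells out to (a sketch; file names assume an April 2024 llama.cpp checkout):

```bash
# Both of these must exist for the GGUF export to succeed; if quantize is
# missing, the make step above did not complete.
ls -l llama.cpp/convert-hf-to-gguf.py llama.cpp/quantize
```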
