convert.py still fails on llama3 8B-Instruct downloaded directly from Meta (Huggingface works) #7339

Open
aleloi opened this issue May 17, 2024 · 5 comments


@aleloi

aleloi commented May 17, 2024

I downloaded the Llama 3 8B Instruct weights directly from Meta (not Hugging Face): https://llama.meta.com/llama-downloads. I then tried to run the convert script using the command suggestions from the comments in #6745 and #6819.

The tokenizer.model file in the download contains this. It's definitely not Protobuf; I'm not sure whether it's BPE:

IQ== 0
Ig== 1
Iw== 2
JA== 3
JQ== 4
Jg== 5
Jw== 6
KA== 7
KQ== 8
Kg== 9

I'm running llama.cpp at current master, which is commit 29c60d8. I skimmed the discussion in #6745 and #6920 for a solution, couldn't find one, and downloaded the Hugging Face version of Llama 3 8B Instruct instead, which converted without issues. Here are a few of the commands that I tried to run:

python convert.py ../Meta-Llama-3-8B-Instruct/ --outfile /models/meta-llama/ggml-meta-llama-3-8b-f16.gguf  --outtype f16

INFO:convert:Loading model file ../Meta-Llama-3-8B-Instruct/consolidated.00.pth
INFO:convert:model parameters count : 8030261248 (8B)
INFO:convert:params = Params(n_vocab=128256, n_embd=4096, n_layer=32, n_ctx=4096, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('../Meta-Llama-3-8B-Instruct'))
Traceback (most recent call last):
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1714, in <module>
    main()
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1671, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1522, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1507, in _create_vocab_by_path
    vocab = cls(self.path)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 506, in __init__
    self.sentencepiece_tokenizer.LoadFromFile(str(fname_tokenizer))
  File "/home/alex/.pyenv/versions/llama.cpp/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: could not parse ModelProto from ../Meta-Llama-3-8B-Instruct/tokenizer.model

(llama.cpp) alex@ml-burken:~/test-run-llama-cpp/llama.cpp$ python convert.py ../Meta-Llama-3-8B-Instruct/ --outfile /models/meta-llama/ggml-meta-llama-3-8b-f16.gguf --vocab-type bpe --outtype f16
INFO:convert:Loading model file ../Meta-Llama-3-8B-Instruct/consolidated.00.pth
INFO:convert:model parameters count : 8030261248 (8B)
INFO:convert:params = Params(n_vocab=128256, n_embd=4096, n_layer=32, n_ctx=4096, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('../Meta-Llama-3-8B-Instruct'))
Traceback (most recent call last):
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1714, in <module>
    main()
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1671, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1522, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1512, in _create_vocab_by_path
    raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['bpe']
@giannisanni

I have a similar problem. I merged two Llama 3 8B models with mergekit, and I now want to convert the result to GGUF.

This is the output I got:

(.venv) PS C:\Users\gsanr\PycharmProjects\llama.cpp> python convert.py penny-dolphin-einstean-llama3 --outfile penny-dolphin-einstein-llama3.gguf --outtype f16
Loading model file penny-dolphin-einstean-llama3\model-00001-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00001-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00002-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00003-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00004-of-00004.safetensors
params = Params(n_vocab=128258, n_embd=4096, n_layer=32, n_ctx=8192, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=WindowsPath('penny-dolphin-einstean-llama3'))
Traceback (most recent call last):
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 1555, in <module>
    main()
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 1522, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 1424, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 1409, in _create_vocab_by_path
    vocab = cls(self.path)
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 533, in __init__
    raise TypeError('Llama 3 must be converted with BpeVocab')
TypeError: Llama 3 must be converted with BpeVocab

@sdmorrey

Could it be related to this issue? #7289

@jukofyork
Contributor

Have you tried using convert-hf-to-gguf.py instead?

@aleloi
Author

aleloi commented May 21, 2024

convert-hf-to-gguf.py expects a config.json file in the model folder. The HF version has one that looks like this:

{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.0.dev0",
  "use_cache": true,
  "vocab_size": 128256
}

The Meta version doesn't have one, but it has a params.json that looks like this and seems to specify similar parameters. It doesn't list "architectures", though, which is a required key for the convert-hf script:

{
   "dim": 4096,
    "n_layers": 32,
    "n_heads": 32,
    "n_kv_heads": 8,
    "vocab_size": 128256,
    "multiple_of": 1024,
    "ffn_dim_multiplier": 1.3,
    "norm_eps": 1e-05,
    "rope_theta": 500000.0
}
(llama.cpp) alex@ml-burken:~/test-run-llama-cpp/llama.cpp$ python convert-hf-to-gguf.py  ../Meta-Llama-3-8B-Instruct --outfile  ../llama-3-8b-instruct-converted.bin
INFO:hf-to-gguf:Loading model: Meta-Llama-3-8B-Instruct
Traceback (most recent call last):
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert-hf-to-gguf.py", line 2546, in <module>
    main()
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert-hf-to-gguf.py", line 2521, in main
    hparams = Model.load_hparams(dir_model)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert-hf-to-gguf.py", line 351, in load_hparams
    with open(dir_model / "config.json", "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '../Meta-Llama-3-8B-Instruct/config.json'
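
A possible workaround I considered (an untested sketch; the field mapping below is my guess from comparing the two files above) is to synthesize a minimal config.json from params.json before running convert-hf-to-gguf.py. Values that params.json doesn't carry (intermediate_size, max_position_embeddings, token ids, dtype) are copied from the HF config shown above rather than derived:

import json
from pathlib import Path

meta_dir = Path("../Meta-Llama-3-8B-Instruct")
params = json.loads((meta_dir / "params.json").read_text())

# Guessed mapping from Meta's params.json keys to HF config.json keys.
config = {
    "architectures": ["LlamaForCausalLM"],
    "model_type": "llama",
    "hidden_size": params["dim"],                 # 4096
    "num_hidden_layers": params["n_layers"],      # 32
    "num_attention_heads": params["n_heads"],     # 32
    "num_key_value_heads": params["n_kv_heads"],  # 8
    "vocab_size": params["vocab_size"],           # 128256
    "rms_norm_eps": params["norm_eps"],           # 1e-05
    "rope_theta": params["rope_theta"],           # 500000.0
    # Not present in params.json; copied from the HF config above.
    "intermediate_size": 14336,
    "max_position_embeddings": 8192,
    "bos_token_id": 128000,
    "eos_token_id": 128009,
    "torch_dtype": "bfloat16",
}

(meta_dir / "config.json").write_text(json.dumps(config, indent=2))

Even with a config.json in place, though, the Meta download still ships consolidated.00.pth and the tokenizer.model shown above instead of safetensors plus the HF tokenizer files, so convert-hf-to-gguf.py would most likely need more than this.
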

@teleprint-me
Contributor

teleprint-me commented May 22, 2024

Llama 3 uses the GPT-2 vocab and the tiktoken encoder/decoder. The conversion scripts only implement support for the HF releases.
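
For reference, the official llama3 repo loads that tokenizer.model with tiktoken roughly as sketched below; the real pre-tokenizer regex and the full special-token list live in Meta's tokenizer.py, so the placeholders here are illustrative only (local paths may also need the blobfile package installed):

import tiktoken
from tiktoken.load import load_tiktoken_bpe

# tokenizer.model is a tiktoken rank file (base64 token + rank per line),
# not a sentencepiece ModelProto, which is why the sentencepiece loader
# in convert.py fails on it.
mergeable_ranks = load_tiktoken_bpe("../Meta-Llama-3-8B-Instruct/tokenizer.model")

enc = tiktoken.Encoding(
    name="llama3",
    pat_str=r"\S+|\s+",  # placeholder; the real pattern is in Meta's tokenizer.py
    mergeable_ranks=mergeable_ranks,
    special_tokens={"<|begin_of_text|>": len(mergeable_ranks)},  # illustrative subset
)
print(enc.encode("hello world"))
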

I'm working on streamlining this entire process because converting has become cumbersome, and I would like a more fluid experience.

If I can get the initial groundwork ironed out (it's proving challenging), then I'll see if I can get it in there, time permitting.

If not, I'll hopefully have it set up so someone else can easily plug it in and play with it.

For now, it's best to use the HF-to-GGUF script, as the official release isn't currently supported due to the complicated nature of how BPE is implemented.

Also, it looks like convert.py will be moved to examples to reduce confusion, since the majority of users are using Hugging Face models. I'm not sure what the future of convert.py is, but it looks like it will still be kept around, which I appreciate.
