onnxruntime_genai.onnxruntime_genai.OrtException: Non-zero status code returned while running GroupQueryAttention node. Name:'/model/layers.0/attn/GroupQueryAttention' Status Message: cos_cache dimension 1 must be <= head_size / 2 and a multiple of 8. #482

Positronx · 2024-05-21T15:57:29Z

I built onnxruntime_genai from source with the CUDA execution provider, then installed the Python wheel.
When I try to run the microsoft/phi-2 model, generation fails in the GroupQueryAttention node.

Here is the command I used to build the phi-2 model from Hugging Face:

python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -e cuda -p int4 -o ./example-models/phi2-int4-cuda

Here is the Python code to reproduce the error:

import onnxruntime_genai as og
import time

prompt = '''def is_prime(n):
    """
    Determine if n is prime or not
    """'''

# Forward slashes avoid the invalid '\p' escape sequence and also work on Windows.
model = og.Model('example-models/phi2-int4-cuda')

tokenizer = og.Tokenizer(model)

tokens = tokenizer.encode(prompt)

params=og.GeneratorParams(model)
params.set_search_options(max_length=100)
params.input_ids = tokens

start_time = time.time()

output_tokens=model.generate(params)[0]

end_time = time.time()

text = tokenizer.decode(output_tokens)

print(text)

Here is the error I get:

onnxruntime_genai.onnxruntime_genai.OrtException: Non-zero status code returned while running GroupQueryAttention node. Name:'/model/layers.0/attn/GroupQueryAttention' Status Message: cos_cache dimension 1 must be <= head_size / 2 and a multiple of 8.

OS: Windows 10
Architecture: x64
Language: Python
ONNX Runtime version: 1.17.1
onnxruntime_genai version: 0.3.0-dev
CUDA version: 12.3

natke · 2024-05-21T18:08:49Z

Hi, can you please try with ONNX Runtime version 1.18.0?

https://github.com/microsoft/onnxruntime/releases/tag/v1.18.0

Positronx · 2024-05-22T12:40:34Z

Thank you @natke, that solved the issue.

Positronx closed this as completed May 22, 2024
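
For anyone who hits the same error, a quick check of the installed ONNX Runtime version before loading the model may save some time. The sketch below is only an illustration: the helper names (`parse_version`, `ort_is_new_enough`, `installed_ort_version`) are made up for this example, and the 1.18.0 threshold is an assumption based solely on this thread, where 1.17.1 failed and 1.18.0 worked.

```python
# Minimum version assumed from this thread (1.17.1 failed, 1.18.0 worked).
MIN_ORT = (1, 18, 0)


def parse_version(text):
    """Turn '1.18.0' or '1.19.0.dev2024' into a 3-tuple of ints."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev2024'
        parts.append(int(digits))
    parts = parts[:3]
    # Pad short versions such as '1.18' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def ort_is_new_enough(installed, minimum=MIN_ORT):
    """Return True if the version string is at least `minimum`."""
    return parse_version(installed) >= minimum


def installed_ort_version():
    """Return onnxruntime.__version__, or None if the package is not installed."""
    try:
        import onnxruntime
    except ImportError:
        return None
    return onnxruntime.__version__


if __name__ == "__main__":
    found = installed_ort_version()
    if found is None:
        print("onnxruntime is not installed")
    elif ort_is_new_enough(found):
        print(f"onnxruntime {found} meets the assumed minimum")
    else:
        print(f"onnxruntime {found} is older than 1.18.0; consider upgrading")
```

The check reads `onnxruntime.__version__` rather than a package name, so it should behave the same whether the CPU or the GPU wheel is installed.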
