onnxruntime_genai.onnxruntime_genai.OrtException: Non-zero status code returned while running GroupQueryAttention node. Name:'/model/layers.0/attn/GroupQueryAttention' Status Message: cos_cache dimension 1 must be <= head_size / 2 and a multiple of 8. #482

Closed
Positronx opened this issue May 21, 2024 · 2 comments


Positronx commented May 21, 2024

I built onnxruntime_genai from source with the CUDA execution provider, then installed the Python wheel.
I tried to run the microsoft/phi-2 model, but it fails with an error from the GroupQueryAttention node.

Here is the command to build the phi-2 model from Hugging Face:

python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -e cuda -p int4 -o ./example-models/phi2-int4-cuda

Here is the Python code to reproduce the error:

import onnxruntime_genai as og
import time

prompt = '''def is_prime(n):
    """
    Determine if n is prime or not
    """'''

# Raw string so the Windows backslash is not read as an escape sequence
model = og.Model(r'example-models\phi2-int4-cuda')

tokenizer = og.Tokenizer(model)

tokens = tokenizer.encode(prompt)

params=og.GeneratorParams(model)
params.set_search_options(max_length=100)
params.input_ids = tokens

start_time = time.time()

output_tokens=model.generate(params)[0]

end_time = time.time()

text = tokenizer.decode(output_tokens)

print(text)
print(f"Generation took {end_time - start_time:.2f} s")

Here is the error that I get:

onnxruntime_genai.onnxruntime_genai.OrtException: Non-zero status code returned while running GroupQueryAttention node. Name:'/model/layers.0/attn/GroupQueryAttention' Status Message: cos_cache dimension 1 must be <= head_size / 2 and a multiple of 8.

OS: Windows 10
Architecture: x64
Language: Python
ONNX Runtime version: 1.17.1
onnxruntime_genai version: 0.3.0-dev
CUDA version: 12.3

Contributor

natke commented May 21, 2024

Hi, can you please try with ONNX Runtime version 1.18.0?

https://github.com/microsoft/onnxruntime/releases/tag/v1.18.0
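
To confirm which version is actually installed before retrying, a small check along these lines may help. It is only a sketch: the helper names (`parse_version`, `is_new_enough`, `installed_ort_version`) are made up for illustration, and it only inspects the `onnxruntime` Python package. A from-source build of onnxruntime_genai may link against a different ONNX Runtime library, which this check would not see.

```python
from itertools import takewhile

# Minimum ONNX Runtime version suggested in this thread.
MIN_ORT = (1, 18, 0)


def parse_version(text):
    """Turn '1.17.1' or '1.18.0rc1' into a 3-tuple of ints such as (1, 17, 1)."""
    parts = []
    for piece in text.split(".")[:3]:
        # Keep only the leading digits, so '0rc1' becomes 0.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '1.18' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_new_enough(version_text, minimum=MIN_ORT):
    """Return True if version_text is at least the minimum version."""
    return parse_version(version_text) >= minimum


def installed_ort_version():
    """Return the installed onnxruntime version string, or None if not importable."""
    try:
        import onnxruntime as ort
    except ImportError:
        return None
    return ort.__version__


if __name__ == "__main__":
    found = installed_ort_version()
    if found is None:
        print("onnxruntime is not importable in this environment")
    elif is_new_enough(found):
        print(f"onnxruntime {found} meets the suggested minimum")
    else:
        print(f"onnxruntime {found} is older than 1.18.0; consider upgrading")
```

With the versions reported above, `is_new_enough("1.17.1")` is false and `is_new_enough("1.18.0")` is true.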

@Positronx
Author

Thank you @natke, that solved the issue.
