onnxruntime_genai.onnxruntime_genai.OrtException: Non-zero status code returned while running GroupQueryAttention node. Name:'/model/layers.0/attn/GroupQueryAttention' Status Message: cos_cache dimension 1 must be <= head_size / 2 and a multiple of 8.
#482
Closed
Positronx opened this issue May 21, 2024 · 2 comments
I built onnxruntime_genai from source with the CUDA execution provider, then installed the Python wheel.
I tried to run the microsoft/phi-2 model, but it fails in the GroupQueryAttention node.
Here is the command to build the phi-2 model from Hugging Face:
python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -e cuda -p int4 -o ./example-models/phi2-int4-cuda
Here is the Python code to reproduce the error:
import onnxruntime_genai as og
import time
prompt = '''def is_prime(n):
"""
Determine if n is prime or not
"""'''
# Raw string so the backslash in the Windows path is not read as an escape
model = og.Model(r'example-models\phi2-int4-cuda')
tokenizer = og.Tokenizer(model)
tokens = tokenizer.encode(prompt)
params = og.GeneratorParams(model)
params.set_search_options(max_length=100)
params.input_ids = tokens
start_time = time.time()
# The exception is raised during generation
output_tokens = model.generate(params)[0]
end_time = time.time()
text = tokenizer.decode(output_tokens)
print(text)
Here is the error that I obtain:
onnxruntime_genai.onnxruntime_genai.OrtException: Non-zero status code returned while running GroupQueryAttention node. Name:'/model/layers.0/attn/GroupQueryAttention' Status Message: cos_cache dimension 1 must be <= head_size / 2 and a multiple of 8.
OS: Windows 10
Architecture: x64
Language: Python
Onnxruntime version: 1.17.1
Onnxruntime_genai version: 0.3.0-dev
Cuda version: 12.3
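For anyone trying to read the status message: the condition it states can be written as a small check. This is only a sketch of the condition as worded in the message, not the actual onnxruntime kernel code. The function name `cos_cache_dim_ok` and the sample values are made up for illustration and are not taken from the phi-2 model.

```python
def cos_cache_dim_ok(cos_dim: int, head_size: int) -> bool:
    """Check the condition quoted in the error message.

    cos_dim   : size of dimension 1 of the cos_cache input (hypothetical name)
    head_size : per-head hidden size of the attention layer
    """
    # "cos_cache dimension 1 must be <= head_size / 2 ..."
    # (written as 2 * cos_dim <= head_size to avoid integer-division rounding)
    within_half_head = 2 * cos_dim <= head_size
    # "... and a multiple of 8"
    multiple_of_8 = cos_dim % 8 == 0
    return within_half_head and multiple_of_8


if __name__ == "__main__":
    # Illustrative values only
    for cos_dim, head_size in [(64, 128), (48, 128), (20, 128), (72, 128)]:
        print(cos_dim, head_size, cos_cache_dim_ok(cos_dim, head_size))
```

Comparing the real cos_cache shape and head size of the exported model against this condition may help narrow down whether the model export or the installed runtime is rejecting the shape.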