
Error while running Phi-3 with DML #336

Open
tomas-pet opened this issue Apr 27, 2024 · 9 comments

@tomas-pet

Here was my input command:
python model-qa.py -m Phi-3-mini-128k-instruct-onnx/directml/directml-int4-awq-block-128 -l 2048

Here is the error I am getting:
Input: hi

Output: Traceback (most recent call last):
  File "model-qa.py", line 82, in <module>
    main(args)
  File "model-qa.py", line 47, in main
    generator.compute_logits()
onnxruntime_genai.onnxruntime_genai.OrtException: Failed to parse the cuda graph annotation id: -1
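
For context, the failing generator.compute_logits() call sits inside the token-generation loop that the example scripts use. The following is only a minimal sketch of that loop, assuming the onnxruntime-genai 0.2.x Python API; the model path, search options, and prompt template are placeholders inferred from the command above, not the actual script contents.

```python
import onnxruntime_genai as og

# Model folder passed with -m above (DirectML int4 variant of Phi-3 mini)
model = og.Model("Phi-3-mini-128k-instruct-onnx/directml/directml-int4-awq-block-128")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)  # corresponds to -l 2048
# Assumed Phi-3 chat template wrapping the "hi" input
params.input_ids = tokenizer.encode("<|user|>\nhi<|end|>\n<|assistant|>")

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()       # the OrtException in the trace is raised here
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end="", flush=True)
```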

@PatriceVignola
Contributor

Hi @tomas-pet,
Which version of onnxruntime-genai-directml do you have?

@tomas-pet
Author

I am using the latest version.

@tomas-pet
Author

Any update on this?

@natke
Contributor

natke commented Apr 30, 2024

@tomas-pet We need some more information to try and repro this.

Can you share the output of pip list?

Which model are you using? Did you download it from HuggingFace?

Can you please share the genai_config.json file from the model folder?

@tomas-pet
Author

Here is the output of pip list:

Package Version
accelerate 0.29.2
aiohttp 3.9.3
aiosignal 1.3.1
async-timeout 4.0.3
attrs 23.2.0
auto-gptq 0.7.1
certifi 2024.2.2
charset-normalizer 3.3.2
cmake 3.29.1
colorama 0.4.6
coloredlogs 15.0.1
datasets 2.18.0
diffusers 0.27.2
dill 0.3.8
filelock 3.13.3
flatbuffers 24.3.25
frozenlist 1.4.1
fsspec 2024.2.0
gekko 1.1.0
huggingface-hub 0.22.2
humanfriendly 10.0
idna 3.6
importlib-metadata 7.1.0
inquirerpy 0.3.4
Jinja2 3.1.3
MarkupSafe 2.1.5
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.1
numpy 1.24.4
onnx 1.16.0
onnxruntime-directml 1.17.3
onnxruntime-genai 0.1.0
onnxruntime-genai-directml 0.2.0rc4
optimum 1.18.0
ort-nightly-qnn 1.18.0.dev20240428001
packaging 24.0
pandas 2.0.3
peft 0.10.0
pfzy 0.3.4
pillow 10.3.0
pip 21.1.1
prompt-toolkit 3.0.43
protobuf 5.26.1
psutil 5.9.8
pyarrow 15.0.2
pyarrow-hotfix 0.6
pyreadline3 3.4.1
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.1
regex 2023.12.25
requests 2.31.0
rouge 1.0.1
safetensors 0.4.2
sentencepiece 0.2.0
setuptools 56.0.0
six 1.16.0
sympy 1.12
tokenizers 0.15.2
torch 2.2.2
tqdm 4.66.2
transformers 4.40.0.dev0
typing-extensions 4.10.0
tzdata 2024.1
urllib3 2.2.1
wcwidth 0.2.13
xxhash 3.4.1
yarl 1.9.4
zipp 3.18.1

I am following the instructions from here: https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi-3-tutorial.md. I am using the Phi-3-mini-128k-instruct-onnx model, and I downloaded it exactly as the tutorial describes: git clone https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx

Attached is genai_config.json.
genai_config.json

@PatriceVignola
Contributor

Hi @tomas-pet,

Which hardware are you using? I see that you have the ort-nightly-qnn package installed, but onnxruntime-genai-directml doesn't officially support ARM builds yet. It might work thanks to the x64 emulation layer, but it will probably use a lot more memory than expected, won't be nearly as performant, and might break in unexpected ways.

Nevertheless, we recently made changes to adapter selection that could potentially fix your issue. You can test it out by building from source.

@tomas-pet
Author

I am still getting the same error. The problem is in your phi3-qa.py.
Look at this line:
params.try_use_cuda_graph_with_max_batch_size(1)

By default, this assumes I am using CUDA.
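
One way to sanity-check that hypothesis (a hedged sketch, not part of the original script; model and args are assumed to exist as they do in the example) is to make the graph-capture request optional and re-run:

```python
# Hypothetical edit inside the example script's setup code: only request
# graph capture when explicitly enabled, to see whether the error changes.
use_graph_capture = False  # set to True to restore the original behavior

params = og.GeneratorParams(model)
params.set_search_options(max_length=args.max_length)
if use_graph_capture:
    params.try_use_cuda_graph_with_max_batch_size(1)
```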

@PatriceVignola
Contributor

params.try_use_cuda_graph_with_max_batch_size(1) is not the issue here. The name is misleading, but it also enables the DML graph; it's probably something we should rename eventually.

Can you tell me which hardware you're trying to run on? We don't support ARM builds yet, and although it might work with the x64 emulation layer, it's probably not going to be the best experience even if it does work. We'll be adding ARM builds in the future to have a good native experience on those devices.

Either way, if you tell me which GPU/hardware you're running on, I can try to see if I can reproduce your issue.
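
If it helps, here is a small sketch for collecting that information using the standard library and the installed onnxruntime package (an assumption-laden helper for this thread, not part of the examples):

```python
import platform
import onnxruntime as ort  # provided here by the onnxruntime-directml wheel

# CPU architecture and Python version as seen by this process
print("machine:    ", platform.machine())
print("processor:  ", platform.processor())
print("python:     ", platform.python_version())

# Which onnxruntime build is actually imported, and which execution providers it exposes
print("ort version:", ort.__version__)
print("ort device: ", ort.get_device())
print("providers:  ", ort.get_available_providers())
```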

@natke
Contributor

natke commented May 21, 2024

Hi @tomas-pet, can you please share the hardware you are running on?
