
Error while running Phi-3 with DML #336

Open
tomas-pet opened this issue Apr 27, 2024 · 9 comments

@tomas-pet

Here was my input command:
python model-qa.py -m Phi-3-mini-128k-instruct-onnx/directml/directml-int4-awq-block-128 -l 2048

Here is the error I am getting:
Input: hi

Output: Traceback (most recent call last):
  File "model-qa.py", line 82, in <module>
    main(args)
  File "model-qa.py", line 47, in main
    generator.compute_logits()
onnxruntime_genai.onnxruntime_genai.OrtException: Failed to parse the cuda graph annotation id: -1
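
For context, the failing generator.compute_logits() call sits inside the token-generation loop that the example scripts use. The following is only a minimal sketch of that loop, assuming the onnxruntime-genai 0.2.x Python API; the model path, search options, and prompt template are placeholders inferred from the command above, not the actual script contents.

```python
import onnxruntime_genai as og

# Model folder passed with -m above (DirectML int4 variant of Phi-3 mini)
model = og.Model("Phi-3-mini-128k-instruct-onnx/directml/directml-int4-awq-block-128")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)  # corresponds to -l 2048
# Assumed Phi-3 chat template wrapping the "hi" input
params.input_ids = tokenizer.encode("<|user|>\nhi<|end|>\n<|assistant|>")

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()       # the OrtException in the trace is raised here
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end="", flush=True)
```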

@PatriceVignola
Contributor

Hi @tomas-pet,
Which version of onnxruntime-genai-directml do you have?

@tomas-pet
Author

I am using the latest version.

@tomas-pet
Author

Any update on this?

@natke
Contributor

natke commented Apr 30, 2024

@tomas-pet We need some more information to try and repro this.

Can you share the output of pip list?

Which model are you using? Did you download it from HuggingFace?

Can you please share the genai_config.json file from the model folder?

@tomas-pet
Author

Here is the output of pip list:

Package Version
accelerate 0.29.2
aiohttp 3.9.3
aiosignal 1.3.1
async-timeout 4.0.3
attrs 23.2.0
auto-gptq 0.7.1
certifi 2024.2.2
charset-normalizer 3.3.2
cmake 3.29.1
colorama 0.4.6
coloredlogs 15.0.1
datasets 2.18.0
diffusers 0.27.2
dill 0.3.8
filelock 3.13.3
flatbuffers 24.3.25
frozenlist 1.4.1
fsspec 2024.2.0
gekko 1.1.0
huggingface-hub 0.22.2
humanfriendly 10.0
idna 3.6
importlib-metadata 7.1.0
inquirerpy 0.3.4
Jinja2 3.1.3
MarkupSafe 2.1.5
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.1
numpy 1.24.4
onnx 1.16.0
onnxruntime-directml 1.17.3
onnxruntime-genai 0.1.0
onnxruntime-genai-directml 0.2.0rc4
optimum 1.18.0
ort-nightly-qnn 1.18.0.dev20240428001
packaging 24.0
pandas 2.0.3
peft 0.10.0
pfzy 0.3.4
pillow 10.3.0
pip 21.1.1
prompt-toolkit 3.0.43
protobuf 5.26.1
psutil 5.9.8
pyarrow 15.0.2
pyarrow-hotfix 0.6
pyreadline3 3.4.1
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.1
regex 2023.12.25
requests 2.31.0
rouge 1.0.1
safetensors 0.4.2
sentencepiece 0.2.0
setuptools 56.0.0
six 1.16.0
sympy 1.12
tokenizers 0.15.2
torch 2.2.2
tqdm 4.66.2
transformers 4.40.0.dev0
typing-extensions 4.10.0
tzdata 2024.1
urllib3 2.2.1
wcwidth 0.2.13
xxhash 3.4.1
yarl 1.9.4
zipp 3.18.1

I am following the instructions from here: https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi-3-tutorial.md. I am using the Phi-3-mini-128k-instruct-onnx model, and I downloaded it exactly as the tutorial describes: git clone https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx

Attached is genai_config.json.
genai_config.json

@PatriceVignola
Contributor

Hi @tomas-pet,

Which hardware are you using? I see that you have the ort-nightly-qnn package installed, but onnxruntime-genai-directml doesn't officially support ARM builds yet. It might work thanks to the x64 emulation layer, but it will probably use a lot more memory than expected, won't be nearly as performant, and might break in unexpected ways.

Nevertheless, we recently made changes to adapter selection that could potentially fix your issue. You can test it out by building from source.

@tomas-pet
Author

I am still getting the same error. The problem is in your phi3-qa.py.
Look at this line:
params.try_use_cuda_graph_with_max_batch_size(1)

By default, this assumes I am using CUDA.
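
One way to sanity-check that hypothesis (a hedged sketch, not part of the original script; model and args are assumed to exist as they do in the example) is to make the graph-capture request optional and re-run:

```python
# Hypothetical edit inside the example script's setup code: only request
# graph capture when explicitly enabled, to see whether the error changes.
use_graph_capture = False  # set to True to restore the original behavior

params = og.GeneratorParams(model)
params.set_search_options(max_length=args.max_length)
if use_graph_capture:
    params.try_use_cuda_graph_with_max_batch_size(1)
```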

@PatriceVignola
Contributor

params.try_use_cuda_graph_with_max_batch_size(1) is not the issue here. The name is misleading, but it also enables the DML graph; it's probably something we should rename eventually.

Can you tell me which hardware you're trying to run on? We don't support ARM builds yet, and although it might work with the x64 emulation layer, it's probably not going to be the best experience even if it does work. We'll be adding ARM builds in the future to have a good native experience on those devices.

Either way, if you tell me which GPU/hardware you're running on, I can try to see if I can reproduce your issue.
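
If it helps, here is a small sketch for collecting that information using the standard library and the installed onnxruntime package (an assumption-laden helper for this thread, not part of the examples):

```python
import platform
import onnxruntime as ort  # provided here by the onnxruntime-directml wheel

# CPU architecture and Python version as seen by this process
print("machine:    ", platform.machine())
print("processor:  ", platform.processor())
print("python:     ", platform.python_version())

# Which onnxruntime build is actually imported, and which execution providers it exposes
print("ort version:", ort.__version__)
print("ort device: ", ort.get_device())
print("providers:  ", ort.get_available_providers())
```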

@natke
Contributor

natke commented May 21, 2024

Hi @tomas-pet, can you please share the hardware you are running on?
