AMD ROCm: 8x22B Model Causes 100% GPU Utilization Stall #7344

Open
Trat8547 opened this issue May 17, 2024 · 0 comments
When running WizardLM-2-8x22B, the model loads into VRAM but then freezes at 100% GPU utilization when it attempts to process the KV cache. Power draw sits at 100 W out of 300 W and stays there until the server is terminated.

Oddly, Llama-3-70B works fine with the setup below, but fails with other kernels and ROCm versions.

  • OS: Ubuntu 22.04.4
  • Linux Kernel: 5.19.0-50-generic
  • Virtualization: Xen Hypervisor
  • GPU: 2× MI100
  • ROCm: 6.0.0
  • Llama.cpp/Server Version: Any

After switching to kernel 6.5 with ROCm 6.0 or 6.1, neither Llama-3-70B nor WizardLM-2-8x22B works; both trigger the same 100% utilization stall.

  • Adding iommu=pt to the kernel command line has no effect
  • Setting GPU_MAX_HW_QUEUES=1 has no effect on any ROCm version or kernel
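For reference, the two workarounds above can be attempted as follows. This is a sketch: the GRUB edit is the usual way to apply a kernel parameter on Ubuntu, and the server invocation is illustrative (the model filename and layer count are placeholders, not from the original report).

```shell
# Workaround 1: pass-through IOMMU mode.
# Append iommu=pt to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
#   sudo update-grub && sudo reboot

# Workaround 2: limit ROCm to a single hardware queue for this process only.
# -m points at the GGUF model file, -ngl offloads layers to the GPUs
# (filename and layer count here are placeholders).
GPU_MAX_HW_QUEUES=1 ./server -m WizardLM-2-8x22B.Q4_K_M.gguf -ngl 99

# Verify the variable is visible to a child process the same way:
GPU_MAX_HW_QUEUES=1 sh -c 'echo "queues=$GPU_MAX_HW_QUEUES"'
```

Setting the variable inline scopes it to the single server process, so it cannot leak into other ROCm workloads on the machine.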