AMD ROCm: 8x22B Model Causes 100% GPU Utilization Stall #7344

Open
Trat8547 opened this issue May 17, 2024 · 0 comments
When running WizardLM-2-8x22B, the model loads into VRAM but then freezes at 100% GPU utilization when it attempts to process the KV cache. Power draw sits at 100 W out of 300 W and stays there until the server is terminated.

Oddly, Llama-3-70B works fine with the setup below, but fails with other kernels and ROCm versions.

  • OS: Ubuntu 22.04.4
  • Linux Kernel: 5.19.0-50-generic
  • Virtualization: Xen Hypervisor
  • GPU: 2× MI100
  • ROCm: 6.0.0
  • Llama.cpp/Server Version: Any

After switching to kernel 6.5 with ROCm 6.0 or 6.1, neither Llama-3-70B nor WizardLM-2-8x22B works; both trigger the same 100% utilization stall.

  • Adding iommu=pt to the kernel command line has no effect
  • Setting GPU_MAX_HW_QUEUES=1 has no effect on any ROCm version or kernel
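For reference, the two workarounds above can be attempted as follows. This is a sketch: the GRUB edit is the usual way to apply a kernel parameter on Ubuntu, and the server invocation is illustrative (the model filename and layer count are placeholders, not from the original report).

```shell
# Workaround 1: pass-through IOMMU mode.
# Append iommu=pt to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
#   sudo update-grub && sudo reboot

# Workaround 2: limit ROCm to a single hardware queue for this process only.
# -m points at the GGUF model file, -ngl offloads layers to the GPUs
# (filename and layer count here are placeholders).
GPU_MAX_HW_QUEUES=1 ./server -m WizardLM-2-8x22B.Q4_K_M.gguf -ngl 99

# Verify the variable is visible to a child process the same way:
GPU_MAX_HW_QUEUES=1 sh -c 'echo "queues=$GPU_MAX_HW_QUEUES"'
```

Setting the variable inline scopes it to the single server process, so it cannot leak into other ROCm workloads on the machine.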