OS: Windows 11, running Text Generation WebUI, up to date on all releases.
Processor: Intel Core i5-8500 3GHz (6 Cores - no HT)
Memory: 16GB System Memory
GPUs: Five NVIDIA RTX 3060 12GB cards (first iteration, released during Covid)
Model: Coomand-R-35B-v1-OLD_Q4_K_M.gguf
Model Parameters:
n-gpu-layers: 41 (41 of 41, loading FULLY into VRAM)
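For reference, the load roughly corresponds to the following llama-cpp-python call (the backend text-generation-webui uses for GGUF files). This is only a sketch: the context size and tensor split shown are assumptions for illustration, not settings taken from this report.

```python
# Sketch of the equivalent llama-cpp-python load. Only n_gpu_layers comes from
# this report; n_ctx and tensor_split are assumed values for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="Coomand-R-35B-v1-OLD_Q4_K_M.gguf",
    n_gpu_layers=41,               # all 41 layers offloaded to VRAM
    n_ctx=8192,                    # assumed context length
    tensor_split=[1, 1, 1, 1, 1],  # spread layers evenly across the five cards
)
```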
The error is exactly what it says: a call to cudaMalloc failed with an "out of memory" error, which is something outside the control of llama.cpp. NVIDIA may be able to help you with that, but I suspect that increasing the amount of system memory would fix the issue; I have found that CUDA allocations can fail when the system is low on memory.
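To narrow down which pool is actually exhausted, a quick diagnostic sketch like the one below can log free VRAM per GPU and free system RAM immediately before the load. It assumes the optional pynvml (nvidia-ml-py) and psutil packages are installed.

```python
# Rough diagnostic: print free VRAM per GPU and free system RAM just before
# loading the model, to see which pool cudaMalloc is actually running out of.
import pynvml
import psutil

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {i}: {mem.free / 2**30:.1f} GiB free of {mem.total / 2**30:.1f} GiB")

ram = psutil.virtual_memory()
print(f"System RAM: {ram.available / 2**30:.1f} GiB available of {ram.total / 2**30:.1f} GiB")
pynvml.nvmlShutdown()
```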
I moved all the cards to a new system with 128GB of system RAM, and the same issue is occurring. The model loads without issue in Transformers in 4-bit, 8-bit, and at full precision in VRAM. It also loads as a Q8 GGUF on the same system. Something about Command-R is not happy with llama.cpp. @slaren it uses ~77GB of system RAM and 96GB of VRAM before bombing when I choose llama.cpp. That's not normal. A 35B model should fit just fine in 96GB of VRAM and 128GB of system memory at 8192 context.
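As a rough sanity check of that claim, here is a back-of-the-envelope estimate. The layer count, hidden size, and quantized file size below are assumed approximate figures for Command-R v01, not numbers measured on this system.

```python
# Back-of-the-envelope footprint estimate; all figures are assumptions
# (approximate Command-R v01 architecture and quant size), not measured values.
GIB = 2**30

weights_bytes = 21.5 * GIB   # approx. on-disk size of a 35B Q4_K_M GGUF
n_layers = 40                # assumed Command-R layer count
hidden = 8192                # assumed hidden size; v01 has no GQA, so K/V width == hidden
n_ctx = 8192                 # context length used in this report
kv_bytes = 2 * n_layers * n_ctx * hidden * 2   # K + V cache, fp16
print(f"weights ~{weights_bytes / GIB:.1f} GiB, KV cache ~{kv_bytes / GIB:.1f} GiB")
# Roughly 21.5 GiB + 10 GiB, far below 96 GiB of VRAM, so exhausting ~77 GiB of
# host RAM plus 96 GiB of VRAM does point at something other than the model itself.
```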
Output from Model Load:
This really doesn't make any sense to me, as a 35B-parameter model at Q4 should load into 50GB of VRAM without issue.