
Unable to use fine-tuned Llama 3 model on CPU #477

Open
code-ksu opened this issue May 16, 2024 · 3 comments
@code-ksu

Hello,

I have fine-tuned a Llama 3 model and now I would love to use it on a CPU. I tried to use device_map = 'cpu' when loading the model.
However, I am still encountering CUDA issues, such as:

RuntimeError: CUDA error: an illegal memory access was encountered

or my kernel crashing outright.

After taking a deeper look into the code, I've noticed that many parts are hardwired to use CUDA: https://github.com/search?q=repo%3Aunslothai%2Funsloth+cuda&type=code

Could you provide any tips on how to use my fine-tuned model on the CPU, or let me know if it's not possible?

Thank you!

@danielhanchen
Contributor

Oh, for inference on CPU only, please use transformers directly. Sadly, we don't support CPU.
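To illustrate what "use transformers directly" could look like: a minimal sketch, assuming the fine-tuned model was merged and saved with save_pretrained to a local directory. The checkpoint path here is hypothetical, and the heavy load is kept behind a main guard.

```python
# Hedged sketch: load a saved (merged) checkpoint with plain transformers
# on CPU, bypassing unsloth entirely. The directory name is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_on_cpu(checkpoint_dir: str):
    """Load a merged checkpoint onto the CPU with plain transformers."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint_dir,
        device_map="cpu",           # keep all weights on the CPU
        torch_dtype=torch.float32,  # CPUs generally lack fast bf16/fp16 kernels
    )
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_on_cpu("path/to/merged_model")
    inputs = tokenizer("Hello", return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that this assumes a full merged checkpoint; a LoRA-adapter-only save would need the adapter merged into the base model first.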

@code-ksu
Author

Thank you for your answer. I already feared that would be the case.
I was wondering: is it possible to convert the model I already trained with unsloth into a plain transformers model? Or is there a way to import the checkpoints into a compatible transformers model?

@erwe324

erwe324 commented May 25, 2024

@code-ksu I believe the model can be loaded directly into Transformers. Moreover, I don't know your use case, but converting to GGUF (llama.cpp) may also help for CPU inference.
