
Unable to use fine-tuned Llama 3 model on CPU #477

Open
code-ksu opened this issue May 16, 2024 · 3 comments
@code-ksu

Hello,

I have fine-tuned a Llama 3 model and now I would love to use it on a CPU. I tried to use device_map = 'cpu' when loading the model.
However, I am still encountering CUDA issues, such as:

RuntimeError: CUDA error: an illegal memory access was encountered

or my kernel crashing outright.

After taking a deeper look into the code, I've noticed that many parts are hardwired to use CUDA: https://github.com/search?q=repo%3Aunslothai%2Funsloth+cuda&type=code

Could you provide any tips on how to use my fine-tuned model on the CPU, or let me know if it's not possible?

Thank you!

@danielhanchen
Contributor

Oh, for inference on CPU only, please use transformers directly. Sadly, we don't support CPU.
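To illustrate what "use transformers directly" could look like: a minimal sketch, assuming the fine-tuned model was merged and saved with save_pretrained to a local directory. The checkpoint path here is hypothetical, and the heavy load is kept behind a main guard.

```python
# Hedged sketch: load a saved (merged) checkpoint with plain transformers
# on CPU, bypassing unsloth entirely. The directory name is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_on_cpu(checkpoint_dir: str):
    """Load a merged checkpoint onto the CPU with plain transformers."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint_dir,
        device_map="cpu",           # keep all weights on the CPU
        torch_dtype=torch.float32,  # CPUs generally lack fast bf16/fp16 kernels
    )
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_on_cpu("path/to/merged_model")
    inputs = tokenizer("Hello", return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that this assumes a full merged checkpoint; a LoRA-adapter-only save would need the adapter merged into the base model first.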

@code-ksu
Author

Thank you for your answer. I already feared that would be the case.
I was wondering: is it possible to convert the model I already trained with unsloth into a plain transformers model? Or is there a way to import the checkpoints into a compatible transformers model?

@erwe324

erwe324 commented May 25, 2024

@code-ksu I believe the model can be loaded directly into Transformers. Moreover, I don't know your use case, but converting to GGUF (llama.cpp) may also help for CPU inference.
