cuda : clear error after buffer allocation failure #7376
My understanding is that this error occurs either when running out of memory (OOM) or when no CUDA device is available.
It should only happen on OOM, but the goal is to let applications load a different model, or the same model with fewer layers offloaded, without crashing the process or causing further errors.
Buffer allocation should be a recoverable error, but the CUDA error was not cleared, which may cause the next operation to fail.