Container suddenly stopping without explicit reason #1652

Open
tontan2545 opened this issue May 8, 2024 · 0 comments

tontan2545 commented May 8, 2024

Hi, I've been running a particular model in Kubernetes using Cog. Under high load (4-5 predictions in the queue), the Cog model seems to stop without reporting a reason. We initially suspected a memory issue, but on further investigation we found that plenty of memory is still available, so memory seems unlikely to be the cause.
Any hypotheses about what might cause this would be greatly appreciated, and I'm happy to follow up on them.
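
One way to double-check the memory hypothesis is to look at the last termination state Kubernetes recorded for the container, since an OOM kill is normally reported there with reason `OOMKilled` and exit code 137. The following is only a minimal sketch: it assumes the pod JSON was obtained with something like `kubectl get pod <pod-name> -o json`, and the function name `last_terminations` and the sample data are hypothetical, not taken from the cluster in this report.

```python
from typing import Any, Dict, List


def last_terminations(pod: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the last recorded termination of each container in a pod.

    `pod` is the parsed JSON of a single Pod object. Containers that have
    never been terminated are skipped.
    """
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated is None:
            continue
        results.append({
            "container": status.get("name"),
            "reason": terminated.get("reason"),      # e.g. "OOMKilled", "Error", "Completed"
            "exit_code": terminated.get("exitCode"),  # 137 means the process got SIGKILL
            "restarts": status.get("restartCount", 0),
        })
    return results


# Hypothetical sample data, trimmed to the fields the function reads.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "cog-model",
                "restartCount": 3,
                "lastState": {
                    "terminated": {"reason": "OOMKilled", "exitCode": 137}
                },
            }
        ]
    }
}

for entry in last_terminations(sample_pod):
    print(entry)
```

If the reason comes back as something other than `OOMKilled` (or there is no termination recorded at all), that would support the observation that memory is not the cause.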

Here's an example of the logs. Keep in mind that we have multiple replicas running, and the logs shown come from all pods.

Note: There are no cog.server.runner exception logs at all, just a plain shutdown by cog http.
[Image: Screenshot 2567-05-09 at 01 09 13]

tontan2545 changed the title from "COG Container suddenly stopping without explicit reason" to "Container suddenly stopping without explicit reason" on May 8, 2024