Concourse workers keep restarting #8953
Anonymous-Coward asked this question in Q&A (unanswered)
Replies: 1 comment
We run a Concourse setup (v7.10.0) on Kubernetes. Workers use persistent volume claims for their data. We have a few hundred pipelines, each with on average fewer than 3 jobs and about 10 input resources.

We recently upgraded to Kubernetes 1.29.2. Since the upgrade, workers keep restarting because the liveness check occasionally returns 503. Things ran smoothly before that, on Kubernetes 1.27.x.

This effectively renders Concourse unusable: the restarting workers keep breaking jobs, since in-progress tasks keep disappearing.

What could be the cause?

What I've tried so far:
- set the liveness check's initial delay to 120s and its timeout to 15s
- paused all pipelines, scaled everything to 0, and deleted all persistent volumes
- deleted everything from the volumes and containers tables in the Concourse database
- scaled back up to 6 workers, the count we ran before the Kubernetes update

Less than 30 minutes after scaling back up, every single worker had already restarted several times, even with no pipeline enabled.

What I see in the logs - the only suspicious thing:
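To see whether the failing probes are genuine 503 responses or just slow answers that hit the probe timeout, one option is to poll the worker's healthcheck endpoint directly from inside the pod, independent of kubelet. A minimal Go sketch; the port (8888) and path (/) are assumptions based on the worker's default healthcheck bind settings, so adjust them to your deployment:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Match the liveness probe's timeout (15s above) so that slow
	// responses surface here the same way kubelet would see them.
	client := &http.Client{Timeout: 15 * time.Second}
	for {
		start := time.Now()
		// Assumed default healthcheck address for a Concourse worker.
		resp, err := client.Get("http://127.0.0.1:8888/")
		if err != nil {
			fmt.Printf("%s error after %v: %v\n",
				time.Now().Format(time.RFC3339), time.Since(start), err)
		} else {
			fmt.Printf("%s status=%d latency=%v\n",
				time.Now().Format(time.RFC3339), resp.StatusCode, time.Since(start))
			resp.Body.Close()
		}
		time.Sleep(5 * time.Second)
	}
}
```

Consistent latencies just under the timeout would point at a slow dependency rather than a hard failure.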
- Based on the code here (concourse/worker/healthchecker.go, lines 44 to 62 at 705404a): you should check your logs for either …
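For context, a minimal sketch (not the actual Concourse source) of the shape of a healthcheck like the one referenced above, assuming it probes the worker's co-located components (Garden and Baggageclaim) under a deadline and returns 503 when a probe fails; the type and field names here are illustrative:

```go
package sketch

import (
	"context"
	"net/http"
	"time"
)

// Checker reports worker liveness based on two dependency probes.
// The pluggable probe functions are an assumption for illustration.
type Checker struct {
	Timeout      time.Duration               // healthcheck deadline (hypothetical setting)
	CheckGarden  func(context.Context) error // probe of the container runtime component
	CheckBaggage func(context.Context) error // probe of the volume manager component
}

func (c Checker) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	ctx, cancel := context.WithTimeout(r.Context(), c.Timeout)
	defer cancel()

	// A failed or slow probe becomes a 503, which kubelet counts as a
	// liveness failure; enough consecutive failures restart the pod.
	if err := c.CheckGarden(ctx); err != nil {
		http.Error(w, "garden probe failed: "+err.Error(), http.StatusServiceUnavailable)
		return
	}
	if err := c.CheckBaggage(ctx); err != nil {
		http.Error(w, "baggageclaim probe failed: "+err.Error(), http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}
```

Under that assumption, each 503 the kubelet sees corresponds to one of the worker's local components failing to answer within the timeout, so the log lines surrounding each probe failure are the ones worth checking.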