-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Sidecar Containers] Sidecar containers finish time needs to be accounted for in job controller #124942
[Sidecar Containers] Sidecar containers finish time needs to be accounted for in job controller #124942
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall
e8cf4da
to
b7d9a05
Compare
b7d9a05
to
020b059
Compare
020b059
to
57e2681
Compare
57e2681
to
e4348a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/assign @alculquicondor
/cc @atiratree
LGTM label has been added. Git tree hash: f084e0312fa47363218ca898c6a51c71a0c8ca03
|
pkg/controller/job/backoff_utils.go
Outdated
// We need to check InitContainerStatuses here also, | ||
// because with the sidecar (restartable init) containers, | ||
// sidecar containers will always finish later than regular containers. | ||
return latestFinishTime(finishTime, p.Status.InitContainerStatuses) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about checking the feature gate here to not break the existing behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if a sidecar container is already created, we should always take it into consideration. Regardless of whether the featuregate is enabled.
And for other init containers(not sidecar). I think this won't break the existing behavior as they will always terminate before the regular container.
pkg/controller/job/backoff_utils.go
Outdated
// We need to check InitContainerStatuses here also, | ||
// because with the sidecar (restartable init) containers, | ||
// sidecar containers will always finish later than regular containers. | ||
return latestFinishTime(finishTime, p.Status.InitContainerStatuses) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to only check finish time of restartable init containers - otherwise we may change the current behavior for some Pod that behaves badly and reports incorrect finish time for the regular init container. I am mostly concerned of some badly written controllers that set this time to some random large value for the regular init containers.
Since sidecars are new, accounting for them is not supposed to break annything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am mostly concerned of some badly written controllers that set this time to some random large value for the regular init containers.
Well, I'm not aware of this...
I added check in the second commit, let's wait for some reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked the simplicity of the previous solution, but I'm not very familiar with the sidecar containers yet.
I am mostly concerned of some badly written controllers that set this time to some random large value for the regular init containers.
Would the container finish time, in case of sidecar containers, be set by a user-supplied controller? I think if the finish at time is set by kubelet then we could rely on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is set by kubelet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simplify the code and check the InitContainers regardless of the restartPolicy
. For safety, we just wrap with the SidecarContainers
feature-gate check as suggested in #124942 (comment).
This way we complicate the code only until the feature-gate is around. WDYT @alculquicondor @SergeyKanzhelev ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only worry is to encounter some crazy controller like one that made me write this comment: #124918. Semantically I am all for not checking for restartPolicy
. Breaking some barely supported scenario is what keeping me uneasy here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I agree with the current implementation that restricts the change to initContainers with restartPolicy: Always.
aa3d4b0
to
3a2a500
Compare
pkg/controller/job/backoff_utils.go
Outdated
if containerState.State.Terminated == nil { | ||
return nil | ||
finishTime := latestFinishTime(nil, p.Status.ContainerStatuses, nil) | ||
// We need to check InitContainerStatuses here also, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code doesn't look risky, but we should be consistent and check the feature gate to skip the additional calculations when the feature is disabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, most of cases it is. However there is almost no validation on consistency of statuses: #124918
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you saying that the code i this PR is more risky than it appears?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the only risk is for the crazy scenarios when some sort of virtual kubelet was setting some bad values for init containers and it used to be ignored before. This is why I suggest only add a logic for sidecars.
Added check of feature gate to skip the additional calculations. |
Quick question: Is there any other component in Kubernetes that cares about the latest termination timestamp of a Pod? |
Not sure about this. But I check the code, only find kubelet and job controller using this field |
Wondering if the code for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
one non-blocking nit
/assign @atiratree @alculquicondor
LGTM label has been added. Git tree hash: 0ae82c023f46d7e05e2d9c4976524c8cbf1c2d19
|
ba5cbf7
to
d972820
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
LGTM label has been added. Git tree hash: 70ea193e40b21a87c09f2ad5596626f282062d5b
|
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, AxeZhan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Today the function getFinishTimeFromContainers (pkg/controller/job/backoff_utils.go) only account for the regular containers finish time,
presumably assuming that init containers have completed before before.
However, with the sidecar (restartable init) containers, sidecar containers will always finish later than
regular containers. And those needs to be accounted for the calculation.
Which issue(s) this PR fixes:
Fixes #124937
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: