New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update flannel to use sd-notify #24
Conversation
Note that systemd will still prompt a daemon-reload for docker. |
Nice, LGTM, if it works. :^) |
great! The only disadvantage is that we will need runc 1.0. On my Fedora 25 I still see runc version 0.1.1 |
I believe the functionality was added to runc 0.9.0. I did some testing in f25 and it seemed to work fine with 0.1.1 (I put in a delay and docker waited for the sd-notify --ready). Unless you mean something else @giuseppe |
I cannot get it working on my machine. I think I am hitting: systemd/systemd#2739 I can confirm that if I add a
If systemd-notify exits before systemd handles the message I have in the journald:
Also https://github.com/shishir-a412ed/runc-notify/blob/master/sd_notify.c works for me, as the programs doesn't exit after the call to This was the reason I added the hacky loop in service.template :( |
Strange, I cannot replicate that locally on f25, will take a deeper look first. Although locally, for a simple run script I set up, the service seems to notify ready before I hit sd_notify, somewhat of the opposite of what you're seeing. I suppose using the c command may be better if it can avoid the potential race condition. Will take a look. |
I still cannot replicate the systemd issue, but I changed it to use shishir's sd_notify and added the binary. Does this version work for you @giuseppe ? Note: the service runs fine, but when I stop the service, the service goes into a failed state, namely: |
@yuqi-zhang might it depend from the retcode of sd-notify? What happens if it exits with success? It should do that on signals instead of "return 1" and "return 2". |
Ah right, sorry let me fix that Edit: Upon further inspection it doesn't seem that handling SIGTERM changes anything. The current upstream image also displays said behaviour (i.e. /bin/runc start flannel fails with 143) without the notify. I'm still investigating, but do you think it would be ok to just mask that code as a success @giuseppe ? |
I opened a separate issue for the service failure for now. As for systemd-notify, we currently have 3 ways to include it as per discussion with @giuseppe , either as a binary, as a .c to be built in the container from source, or we can include it as part of atomic itself. What do you think @rhatdan ? |
I think adding this to atomic itself is the sanest thing. atomic-sdnotify. I was hoping we could do this in python, but I don's see any common bindings. |
@rhatdan runc 1.0.0-rc3 has been in -testing for a week now. Has the CVE-2016-9962 fix too. Everyone, please add karma so we can get it into stable: https://bodhi.fedoraproject.org/updates/FEDORA-2017-0200646669 |
before we add another tool, @yuqi-zhang, could you please verify if there is any work in progress in systemd/systemd#2739? Fixing it in systemd would probably be the best solution and we can use directly systemd-notify |
Looks like this stalled 9 months ago. |
Yeah there doesn't seem to be a fix in the foreseeable future. Interestingly this should really only happen in non-root cases (unless I'm mis-understanding the conversation in the original thread). I still haven't replicated the original issue locally either, so I'm not even sure I should open a new issue in systemd or something along those lines... |
it is a race condition (which I can reproduce almost all the time :)). Another solution would be to pass the PID used by runc inside the container itself, so that systemd-notify can use this PID instead of letting systemd guessing it and failing when the process is already gone. |
Isn't the way this is supposed to work is that PID 1 of the container is supposed to call sd_notify, which triggeres runc to call sd_notify which passes it on to systemd? |
from what I can see, all that runc does is to setup a bind mount to NOTIFY_SOCKET without intercepting anything. |
Is that a bug? Does systemd verify that the processes that is notifying the socket is the child of systemd? |
It is controlled via Controls access to the service status notification socket, as accessible via the sd_notify(3) call.Takes one of none (the default), main or all. If none, no daemon status updates are accepted from the service processes, all status update messages are ignored. If main, only service updates sent from the main process of the service are accepted. If all, all services updates from all members of the service's control group are accepted. This option should be set to open access to the notification socket when using Type=notify or WatchdogSec= (see above). If those options are used but NotifyAccess= is not configured, it will be implicitly set to main. We can specify to systemd-notify the PID of the process that is ready, but inside the container we will have PID from the new PID namespace so it is not really helpful, we will need to pass this information from outside. |
Well is PID1 of the container the same as runc's pid? runc just execs the pd and disappears, So as long as pid1 send the notify, it should work correctly. |
well the issue with the race condition still exists. I am able to reproduce here, a @yuqi-zhang have you a chance to verify it with systemd guys? |
The current support of systemd-notify has a race condition as the message send to the systemd notify socket might be dropped if the sender process is not running by the time systemd checks for the sender of the datagram. A proper fix of this in systemd would require changes to the kernel to maintain the cgroup of the sender process when it is dead (but it is not probably going to happen...) Generally, the solution to this issue is to specify the PID in the message itself so that systemd has not to guess the sender, but this wouldn't work when running in a PID namespace as the container will pass the PID known in its namespace (something like PID=1,2,3..) and systemd running on the host is not able to map it to the runc service. The proposed solution is to have a proxy in runc that forwards the messages to the host systemd. Example of this issue: projectatomic/atomic-system-containers#24 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@yuqi-zhang could you try if it still work with opencontainers/runc#1308 for you? It solves the issue for me here, let's see if it will be accepted upstream :) |
The current support of systemd-notify has a race condition as the message send to the systemd notify socket might be dropped if the sender process is not running by the time systemd checks for the sender of the datagram. A proper fix of this in systemd would require changes to the kernel to maintain the cgroup of the sender process when it is dead (but it is not probably going to happen...) Generally, the solution to this issue is to specify the PID in the message itself so that systemd has not to guess the sender, but this wouldn't work when running in a PID namespace as the container will pass the PID known in its namespace (something like PID=1,2,3..) and systemd running on the host is not able to map it to the runc service. The proposed solution is to have a proxy in runc that forwards the messages to the host systemd. Example of this issue: projectatomic/atomic-system-containers#24 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The current support of systemd-notify has a race condition as the message send to the systemd notify socket might be dropped if the sender process is not running by the time systemd checks for the sender of the datagram. A proper fix of this in systemd would require changes to the kernel to maintain the cgroup of the sender process when it is dead (but it is not probably going to happen...) Generally, the solution to this issue is to specify the PID in the message itself so that systemd has not to guess the sender, but this wouldn't work when running in a PID namespace as the container will pass the PID known in its namespace (something like PID=1,2,3..) and systemd running on the host is not able to map it to the runc service. The proposed solution is to have a proxy in runc that forwards the messages to the host systemd. Example of this issue: projectatomic/atomic-system-containers#24 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The current support of systemd-notify has a race condition as the message send to the systemd notify socket might be dropped if the sender process is not running by the time systemd checks for the sender of the datagram. A proper fix of this in systemd would require changes to the kernel to maintain the cgroup of the sender process when it is dead (but it is not probably going to happen...) Generally, the solution to this issue is to specify the PID in the message itself so that systemd has not to guess the sender, but this wouldn't work when running in a PID namespace as the container will pass the PID known in its namespace (something like PID=1,2,3..) and systemd running on the host is not able to map it to the runc service. The proposed solution is to have a proxy in runc that forwards the messages to the host systemd. Example of this issue: projectatomic/atomic-system-containers#24 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe the patch works for me as well. If it gets accepted I'll revert back to the CLI notify. Thanks for the patch! |
@yuqi-zhang Add a comment on the runc patch please. |
The current support of systemd-notify has a race condition as the message send to the systemd notify socket might be dropped if the sender process is not running by the time systemd checks for the sender of the datagram. A proper fix of this in systemd would require changes to the kernel to maintain the cgroup of the sender process when it is dead (but it is not probably going to happen...) Generally, the solution to this issue is to specify the PID in the message itself so that systemd has not to guess the sender, but this wouldn't work when running in a PID namespace as the container will pass the PID known in its namespace (something like PID=1,2,3..) and systemd running on the host is not able to map it to the runc service. The proposed solution is to have a proxy in runc that forwards the messages to the host systemd. Example of this issue: projectatomic/atomic-system-containers#24 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The current support of systemd-notify has a race condition as the message send to the systemd notify socket might be dropped if the sender process is not running by the time systemd checks for the sender of the datagram. A proper fix of this in systemd would require changes to the kernel to maintain the cgroup of the sender process when it is dead (but it is not probably going to happen...) Generally, the solution to this issue is to specify the PID in the message itself so that systemd has not to guess the sender, but this wouldn't work when running in a PID namespace as the container will pass the PID known in its namespace (something like PID=1,2,3..) and systemd running on the host is not able to map it to the runc service. The proposed solution is to have a proxy in runc that forwards the messages to the host systemd. Example of this issue: projectatomic/atomic-system-containers#24 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The current support of systemd-notify has a race condition as the message send to the systemd notify socket might be dropped if the sender process is not running by the time systemd checks for the sender of the datagram. A proper fix of this in systemd would require changes to the kernel to maintain the cgroup of the sender process when it is dead (but it is not probably going to happen...) Generally, the solution to this issue is to specify the PID in the message itself so that systemd has not to guess the sender, but this wouldn't work when running in a PID namespace as the container will pass the PID known in its namespace (something like PID=1,2,3..) and systemd running on the host is not able to map it to the runc service. The proposed solution is to have a proxy in runc that forwards the messages to the host systemd. Example of this issue: projectatomic/atomic-system-containers#24 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@yuqi-zhang the patch got merged into runc. We can use systemd-notify now. |
Remove hacky loop to check for Docker configs, and update to use sd-notify instead to alert systemd when flannel is ready. Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
cba7f48
to
18d69b4
Compare
@giuseppe I've reverted it to use the CLI sd-notify. Thanks for the patch in runc! |
the patch LGTM, thanks for working on it. Let's wait until the patched runc gets into Fedora before we merge this PR |
☀️ Test successful - status-redhatci |
Remove hacky loop to check for Docker configs, and update to use
sd-notify instead to alert systemd when flannel is ready.