Update flannel to use sd-notify #24

yuqi-zhang · 2017-01-10T22:04:39Z

Remove hacky loop to check for Docker configs, and update to use
sd-notify instead to alert systemd when flannel is ready.

yuqi-zhang · 2017-01-10T22:05:34Z

Note that systemd will still prompt a daemon-reload for docker.

rhatdan · 2017-01-11T13:32:25Z

Nice, LGTM, if it works. :^)

giuseppe · 2017-01-11T13:42:40Z

great! The only disadvantage is that we will need runc 1.0. On my Fedora 25 I still see runc version 0.1.1

rhatdan · 2017-01-11T15:23:39Z

@lsm5 @runcom can we get the version of runc on Fedora 25 updated.

yuqi-zhang · 2017-01-11T17:17:52Z

I believe the functionality was added to runc 0.9.0. I did some testing in f25 and it seemed to work fine with 0.1.1 (I put in a delay and docker waited for the sd-notify --ready). Unless you mean something else @giuseppe

giuseppe · 2017-01-11T20:32:08Z

I cannot get it working on my machine.

I think I am hitting: systemd/systemd#2739

I can confirm that if I add a sleep() in systemd/src/notify/notify.c so to keep the process alive, then it works:

diff --git a/src/notify/notify.c b/src/notify/notify.c
index 70b6f86..3117277 100644
--- a/src/notify/notify.c
+++ b/src/notify/notify.c
@@ -190,6 +190,7 @@ int main(int argc, char* argv[]) {
         }
 
         r = sd_pid_notify(arg_pid ? arg_pid : getppid(), false, n);
+        sleep(5);
         if (r < 0) {
                 log_error_errno(r, "Failed to notify init system: %m");
                 goto finish;

If systemd-notify exits before systemd handles the message I have in the journald:

Jan 11 21:28:31 helium systemd[1]: Cannot find unit for notify message of PID 15778.

Also https://github.com/shishir-a412ed/runc-notify/blob/master/sd_notify.c works for me, as the programs doesn't exit after the call to sd_notify.

This was the reason I added the hacky loop in service.template :(

yuqi-zhang · 2017-01-11T20:41:09Z

Strange, I cannot replicate that locally on f25, will take a deeper look first.

Although locally, for a simple run script I set up, the service seems to notify ready before I hit sd_notify, somewhat of the opposite of what you're seeing. I suppose using the c command may be better if it can avoid the potential race condition. Will take a look.

yuqi-zhang · 2017-01-16T20:58:11Z

I still cannot replicate the systemd issue, but I changed it to use shishir's sd_notify and added the binary. Does this version work for you @giuseppe ?

Note: the service runs fine, but when I stop the service, the service goes into a failed state, namely:
Process: 17985 ExecStart=/bin/runc start flannel (code=exited, status=143)
This does not otherwise affect the service itself. Have you seen this error?

giuseppe · 2017-01-17T11:06:46Z

@yuqi-zhang might it depend from the retcode of sd-notify? What happens if it exits with success? It should do that on signals instead of "return 1" and "return 2".

yuqi-zhang · 2017-01-17T14:31:13Z

Ah right, sorry let me fix that

Edit: Upon further inspection it doesn't seem that handling SIGTERM changes anything. The current upstream image also displays said behaviour (i.e. /bin/runc start flannel fails with 143) without the notify. I'm still investigating, but do you think it would be ok to just mask that code as a success @giuseppe ?

yuqi-zhang · 2017-01-18T18:12:27Z

I opened a separate issue for the service failure for now.

As for systemd-notify, we currently have 3 ways to include it as per discussion with @giuseppe , either as a binary, as a .c to be built in the container from source, or we can include it as part of atomic itself. What do you think @rhatdan ?

rhatdan · 2017-01-18T19:14:41Z

I think adding this to atomic itself is the sanest thing. atomic-sdnotify. I was hoping we could do this in python, but I don's see any common bindings.

lsm5 · 2017-01-18T19:29:36Z

@rhatdan runc 1.0.0-rc3 has been in -testing for a week now. Has the CVE-2016-9962 fix too. Everyone, please add karma so we can get it into stable: https://bodhi.fedoraproject.org/updates/FEDORA-2017-0200646669

giuseppe · 2017-01-18T21:28:37Z

before we add another tool, @yuqi-zhang, could you please verify if there is any work in progress in systemd/systemd#2739? Fixing it in systemd would probably be the best solution and we can use directly systemd-notify

rhatdan · 2017-01-19T14:09:37Z

Looks like this stalled 9 months ago.

yuqi-zhang · 2017-01-19T21:45:45Z

Yeah there doesn't seem to be a fix in the foreseeable future. Interestingly this should really only happen in non-root cases (unless I'm mis-understanding the conversation in the original thread). I still haven't replicated the original issue locally either, so I'm not even sure I should open a new issue in systemd or something along those lines...

giuseppe · 2017-01-19T22:55:43Z

it is a race condition (which I can reproduce almost all the time :)). Another solution would be to pass the PID used by runc inside the container itself, so that systemd-notify can use this PID instead of letting systemd guessing it and failing when the process is already gone.

rhatdan · 2017-01-20T15:15:51Z

Isn't the way this is supposed to work is that PID 1 of the container is supposed to call sd_notify, which triggeres runc to call sd_notify which passes it on to systemd?

giuseppe · 2017-01-20T15:47:44Z

from what I can see, all that runc does is to setup a bind mount to NOTIFY_SOCKET without intercepting anything.

rhatdan · 2017-01-20T17:38:56Z

Is that a bug? Does systemd verify that the processes that is notifying the socket is the child of systemd?

giuseppe · 2017-01-20T17:59:26Z

It is controlled via NotifyAccess:

Controls access to the service status notification socket, as accessible via the sd_notify(3) call.Takes one of none (the default), main or all. If none, no daemon status updates are accepted from the service processes, all status update messages are ignored. If main, only service updates sent from the main process of the service are accepted. If all, all services updates from all members of the service's control group are accepted. This option should be set to open access to the notification socket when using Type=notify or WatchdogSec= (see above). If those options are used but NotifyAccess= is not configured, it will be implicitly set to main.

We can specify to systemd-notify the PID of the process that is ready, but inside the container we will have PID from the new PID namespace so it is not really helpful, we will need to pass this information from outside.

rhatdan · 2017-01-20T19:09:28Z

Well is PID1 of the container the same as runc's pid? runc just execs the pd and disappears, So as long as pid1 send the notify, it should work correctly.

giuseppe · 2017-01-20T19:56:22Z

well the issue with the race condition still exists. I am able to reproduce here, a sleep() in systemd-notify after sd_notify is called makes it work on my machine.
If we don't specify any PID then system probably tries to get the PID of the caller process via SO_PEERCRED, but by the time it does that then the process is already dead. We cannot specify the PID as the PID namespace is different so telling PID 1 doesn't work either. We will need to pass this from outside, or get it fixed in systemd.

@yuqi-zhang have you a chance to verify it with systemd guys?

The current support of systemd-notify has a race condition as the message send to the systemd notify socket might be dropped if the sender process is not running by the time systemd checks for the sender of the datagram. A proper fix of this in systemd would require changes to the kernel to maintain the cgroup of the sender process when it is dead (but it is not probably going to happen...) Generally, the solution to this issue is to specify the PID in the message itself so that systemd has not to guess the sender, but this wouldn't work when running in a PID namespace as the container will pass the PID known in its namespace (something like PID=1,2,3..) and systemd running on the host is not able to map it to the runc service. The proposed solution is to have a proxy in runc that forwards the messages to the host systemd. Example of this issue: projectatomic/atomic-system-containers#24 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

giuseppe · 2017-02-03T14:30:18Z

@yuqi-zhang could you try if it still work with opencontainers/runc#1308 for you? It solves the issue for me here, let's see if it will be accepted upstream :)

The current support of systemd-notify has a race condition as the message send to the systemd notify socket might be dropped if the sender process is not running by the time systemd checks for the sender of the datagram. A proper fix of this in systemd would require changes to the kernel to maintain the cgroup of the sender process when it is dead (but it is not probably going to happen...) Generally, the solution to this issue is to specify the PID in the message itself so that systemd has not to guess the sender, but this wouldn't work when running in a PID namespace as the container will pass the PID known in its namespace (something like PID=1,2,3..) and systemd running on the host is not able to map it to the runc service. The proposed solution is to have a proxy in runc that forwards the messages to the host systemd. Example of this issue: projectatomic/atomic-system-containers#24 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

yuqi-zhang · 2017-02-07T21:47:44Z

@giuseppe the patch works for me as well. If it gets accepted I'll revert back to the CLI notify. Thanks for the patch!

rhatdan · 2017-02-07T21:50:33Z

@yuqi-zhang Add a comment on the runc patch please.

The current support of systemd-notify has a race condition as the message send to the systemd notify socket might be dropped if the sender process is not running by the time systemd checks for the sender of the datagram. A proper fix of this in systemd would require changes to the kernel to maintain the cgroup of the sender process when it is dead (but it is not probably going to happen...) Generally, the solution to this issue is to specify the PID in the message itself so that systemd has not to guess the sender, but this wouldn't work when running in a PID namespace as the container will pass the PID known in its namespace (something like PID=1,2,3..) and systemd running on the host is not able to map it to the runc service. The proposed solution is to have a proxy in runc that forwards the messages to the host systemd. Example of this issue: projectatomic/atomic-system-containers#24 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

giuseppe · 2017-02-27T12:24:44Z

@yuqi-zhang the patch got merged into runc. We can use systemd-notify now.

Remove hacky loop to check for Docker configs, and update to use sd-notify instead to alert systemd when flannel is ready. Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>

yuqi-zhang · 2017-02-27T16:42:52Z

@giuseppe I've reverted it to use the CLI sd-notify. Thanks for the patch in runc!

giuseppe · 2017-02-27T18:20:25Z

the patch LGTM, thanks for working on it.

Let's wait until the patched runc gets into Fedora before we merge this PR

giuseppe · 2017-05-16T14:05:00Z

@rh-atomic-bot r+ 18d69b4

rh-atomic-bot · 2017-05-16T14:05:09Z

⌛ Testing commit 18d69b4 with merge fb82b8b...

rh-atomic-bot · 2017-05-16T14:07:08Z

☀️ Test successful - status-redhatci
Approved by: giuseppe
Pushing fb82b8b to master...

yuqi-zhang force-pushed the sd-notify branch from 34d6098 to cba7f48 Compare January 16, 2017 20:54

yuqi-zhang mentioned this pull request Jan 20, 2017

Sometimes systemd-notify(3)'s cgroup="/", causing message to get dropped systemd/systemd#2739

Closed

giuseppe mentioned this pull request Feb 3, 2017

fix systemd-notify when using a different PID namespace opencontainers/runc#1308

Merged

Update flannel to use sd-notify

18d69b4

Remove hacky loop to check for Docker configs, and update to use sd-notify instead to alert systemd when flannel is ready. Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>

yuqi-zhang force-pushed the sd-notify branch from cba7f48 to 18d69b4 Compare February 27, 2017 16:42

rh-atomic-bot closed this in fb82b8b May 16, 2017

yuqi-zhang deleted the sd-notify branch June 8, 2017 15:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update flannel to use sd-notify #24

Update flannel to use sd-notify #24

yuqi-zhang commented Jan 10, 2017

yuqi-zhang commented Jan 10, 2017

rhatdan commented Jan 11, 2017

giuseppe commented Jan 11, 2017

rhatdan commented Jan 11, 2017

yuqi-zhang commented Jan 11, 2017

giuseppe commented Jan 11, 2017 •

edited

yuqi-zhang commented Jan 11, 2017

yuqi-zhang commented Jan 16, 2017 •

edited

giuseppe commented Jan 17, 2017

yuqi-zhang commented Jan 17, 2017 •

edited

yuqi-zhang commented Jan 18, 2017

rhatdan commented Jan 18, 2017

lsm5 commented Jan 18, 2017

giuseppe commented Jan 18, 2017

rhatdan commented Jan 19, 2017

yuqi-zhang commented Jan 19, 2017

giuseppe commented Jan 19, 2017

rhatdan commented Jan 20, 2017

giuseppe commented Jan 20, 2017

rhatdan commented Jan 20, 2017

giuseppe commented Jan 20, 2017 •

edited

rhatdan commented Jan 20, 2017

giuseppe commented Jan 20, 2017

giuseppe commented Feb 3, 2017

yuqi-zhang commented Feb 7, 2017

rhatdan commented Feb 7, 2017

giuseppe commented Feb 27, 2017

yuqi-zhang commented Feb 27, 2017

giuseppe commented Feb 27, 2017

giuseppe commented May 16, 2017

rh-atomic-bot commented May 16, 2017

rh-atomic-bot commented May 16, 2017

Update flannel to use sd-notify #24

Update flannel to use sd-notify #24

Conversation

yuqi-zhang commented Jan 10, 2017

yuqi-zhang commented Jan 10, 2017

rhatdan commented Jan 11, 2017

giuseppe commented Jan 11, 2017

rhatdan commented Jan 11, 2017

yuqi-zhang commented Jan 11, 2017

giuseppe commented Jan 11, 2017 • edited

yuqi-zhang commented Jan 11, 2017

yuqi-zhang commented Jan 16, 2017 • edited

giuseppe commented Jan 17, 2017

yuqi-zhang commented Jan 17, 2017 • edited

yuqi-zhang commented Jan 18, 2017

rhatdan commented Jan 18, 2017

lsm5 commented Jan 18, 2017

giuseppe commented Jan 18, 2017

rhatdan commented Jan 19, 2017

yuqi-zhang commented Jan 19, 2017

giuseppe commented Jan 19, 2017

rhatdan commented Jan 20, 2017

giuseppe commented Jan 20, 2017

rhatdan commented Jan 20, 2017

giuseppe commented Jan 20, 2017 • edited

rhatdan commented Jan 20, 2017

giuseppe commented Jan 20, 2017

giuseppe commented Feb 3, 2017

yuqi-zhang commented Feb 7, 2017

rhatdan commented Feb 7, 2017

giuseppe commented Feb 27, 2017

yuqi-zhang commented Feb 27, 2017

giuseppe commented Feb 27, 2017

giuseppe commented May 16, 2017

rh-atomic-bot commented May 16, 2017

rh-atomic-bot commented May 16, 2017

giuseppe commented Jan 11, 2017 •

edited

yuqi-zhang commented Jan 16, 2017 •

edited

yuqi-zhang commented Jan 17, 2017 •

edited

giuseppe commented Jan 20, 2017 •

edited