Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update flannel to use sd-notify #24

Closed
wants to merge 1 commit into from

Conversation

yuqi-zhang
Copy link
Contributor

Remove hacky loop to check for Docker configs, and update to use
sd-notify instead to alert systemd when flannel is ready.

@yuqi-zhang
Copy link
Contributor Author

Note that systemd will still prompt a daemon-reload for docker.

@rhatdan
Copy link
Member

rhatdan commented Jan 11, 2017

Nice, LGTM, if it works. :^)

@giuseppe
Copy link
Collaborator

great! The only disadvantage is that we will need runc 1.0. On my Fedora 25 I still see runc version 0.1.1

@rhatdan
Copy link
Member

rhatdan commented Jan 11, 2017

@lsm5 @runcom can we get the version of runc on Fedora 25 updated.

@yuqi-zhang
Copy link
Contributor Author

I believe the functionality was added to runc 0.9.0. I did some testing in f25 and it seemed to work fine with 0.1.1 (I put in a delay and docker waited for the sd-notify --ready). Unless you mean something else @giuseppe

@giuseppe
Copy link
Collaborator

giuseppe commented Jan 11, 2017

I cannot get it working on my machine.

I think I am hitting: systemd/systemd#2739

I can confirm that if I add a sleep() in systemd/src/notify/notify.c so to keep the process alive, then it works:

diff --git a/src/notify/notify.c b/src/notify/notify.c
index 70b6f86..3117277 100644
--- a/src/notify/notify.c
+++ b/src/notify/notify.c
@@ -190,6 +190,7 @@ int main(int argc, char* argv[]) {
         }
 
         r = sd_pid_notify(arg_pid ? arg_pid : getppid(), false, n);
+        sleep(5);
         if (r < 0) {
                 log_error_errno(r, "Failed to notify init system: %m");
                 goto finish;

If systemd-notify exits before systemd handles the message I have in the journald:

Jan 11 21:28:31 helium systemd[1]: Cannot find unit for notify message of PID 15778.

Also https://github.com/shishir-a412ed/runc-notify/blob/master/sd_notify.c works for me, as the programs doesn't exit after the call to sd_notify.

This was the reason I added the hacky loop in service.template :(

@yuqi-zhang
Copy link
Contributor Author

Strange, I cannot replicate that locally on f25, will take a deeper look first.

Although locally, for a simple run script I set up, the service seems to notify ready before I hit sd_notify, somewhat of the opposite of what you're seeing. I suppose using the c command may be better if it can avoid the potential race condition. Will take a look.

@yuqi-zhang
Copy link
Contributor Author

yuqi-zhang commented Jan 16, 2017

I still cannot replicate the systemd issue, but I changed it to use shishir's sd_notify and added the binary. Does this version work for you @giuseppe ?

Note: the service runs fine, but when I stop the service, the service goes into a failed state, namely:
Process: 17985 ExecStart=/bin/runc start flannel (code=exited, status=143)
This does not otherwise affect the service itself. Have you seen this error?

@giuseppe
Copy link
Collaborator

@yuqi-zhang might it depend from the retcode of sd-notify? What happens if it exits with success? It should do that on signals instead of "return 1" and "return 2".

@yuqi-zhang
Copy link
Contributor Author

yuqi-zhang commented Jan 17, 2017

Ah right, sorry let me fix that

Edit: Upon further inspection it doesn't seem that handling SIGTERM changes anything. The current upstream image also displays said behaviour (i.e. /bin/runc start flannel fails with 143) without the notify. I'm still investigating, but do you think it would be ok to just mask that code as a success @giuseppe ?

@yuqi-zhang
Copy link
Contributor Author

I opened a separate issue for the service failure for now.

As for systemd-notify, we currently have 3 ways to include it as per discussion with @giuseppe , either as a binary, as a .c to be built in the container from source, or we can include it as part of atomic itself. What do you think @rhatdan ?

@rhatdan
Copy link
Member

rhatdan commented Jan 18, 2017

I think adding this to atomic itself is the sanest thing. atomic-sdnotify. I was hoping we could do this in python, but I don's see any common bindings.

@lsm5
Copy link

lsm5 commented Jan 18, 2017

@rhatdan runc 1.0.0-rc3 has been in -testing for a week now. Has the CVE-2016-9962 fix too. Everyone, please add karma so we can get it into stable: https://bodhi.fedoraproject.org/updates/FEDORA-2017-0200646669

@giuseppe
Copy link
Collaborator

before we add another tool, @yuqi-zhang, could you please verify if there is any work in progress in systemd/systemd#2739? Fixing it in systemd would probably be the best solution and we can use directly systemd-notify

@rhatdan
Copy link
Member

rhatdan commented Jan 19, 2017

Looks like this stalled 9 months ago.

@yuqi-zhang
Copy link
Contributor Author

Yeah there doesn't seem to be a fix in the foreseeable future. Interestingly this should really only happen in non-root cases (unless I'm mis-understanding the conversation in the original thread). I still haven't replicated the original issue locally either, so I'm not even sure I should open a new issue in systemd or something along those lines...

@giuseppe
Copy link
Collaborator

it is a race condition (which I can reproduce almost all the time :)). Another solution would be to pass the PID used by runc inside the container itself, so that systemd-notify can use this PID instead of letting systemd guessing it and failing when the process is already gone.

@rhatdan
Copy link
Member

rhatdan commented Jan 20, 2017

Isn't the way this is supposed to work is that PID 1 of the container is supposed to call sd_notify, which triggeres runc to call sd_notify which passes it on to systemd?

@giuseppe
Copy link
Collaborator

from what I can see, all that runc does is to setup a bind mount to NOTIFY_SOCKET without intercepting anything.

@rhatdan
Copy link
Member

rhatdan commented Jan 20, 2017

Is that a bug? Does systemd verify that the processes that is notifying the socket is the child of systemd?

@giuseppe
Copy link
Collaborator

giuseppe commented Jan 20, 2017

It is controlled via NotifyAccess:

Controls access to the service status notification socket, as accessible via the sd_notify(3) call.Takes one of none (the default), main or all. If none, no daemon status updates are accepted from the service processes, all status update messages are ignored. If main, only service updates sent from the main process of the service are accepted. If all, all services updates from all members of the service's control group are accepted. This option should be set to open access to the notification socket when using Type=notify or WatchdogSec= (see above). If those options are used but NotifyAccess= is not configured, it will be implicitly set to main.

We can specify to systemd-notify the PID of the process that is ready, but inside the container we will have PID from the new PID namespace so it is not really helpful, we will need to pass this information from outside.

@rhatdan
Copy link
Member

rhatdan commented Jan 20, 2017

Well is PID1 of the container the same as runc's pid? runc just execs the pd and disappears, So as long as pid1 send the notify, it should work correctly.

@giuseppe
Copy link
Collaborator

well the issue with the race condition still exists. I am able to reproduce here, a sleep() in systemd-notify after sd_notify is called makes it work on my machine.
If we don't specify any PID then system probably tries to get the PID of the caller process via SO_PEERCRED, but by the time it does that then the process is already dead. We cannot specify the PID as the PID namespace is different so telling PID 1 doesn't work either. We will need to pass this from outside, or get it fixed in systemd.

@yuqi-zhang have you a chance to verify it with systemd guys?

giuseppe added a commit to giuseppe/runc that referenced this pull request Feb 3, 2017
The current support of systemd-notify has a race condition as the
message send to the systemd notify socket might be dropped if the sender
process is not running by the time systemd checks for the sender of the
datagram.  A proper fix of this in systemd would require changes to the
kernel to maintain the cgroup of the sender process when it is dead (but
it is not probably going to happen...)
Generally, the solution to this issue is to specify the PID in the
message itself so that systemd has not to guess the sender, but this
wouldn't work when running in a PID namespace as the container will pass
the PID known in its namespace (something like PID=1,2,3..) and systemd
running on the host is not able to map it to the runc service.

The proposed solution is to have a proxy in runc that forwards the
messages to the host systemd.

Example of this issue:

projectatomic/atomic-system-containers#24

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Copy link
Collaborator

giuseppe commented Feb 3, 2017

@yuqi-zhang could you try if it still work with opencontainers/runc#1308 for you? It solves the issue for me here, let's see if it will be accepted upstream :)

giuseppe added a commit to giuseppe/runc that referenced this pull request Feb 3, 2017
The current support of systemd-notify has a race condition as the
message send to the systemd notify socket might be dropped if the sender
process is not running by the time systemd checks for the sender of the
datagram.  A proper fix of this in systemd would require changes to the
kernel to maintain the cgroup of the sender process when it is dead (but
it is not probably going to happen...)
Generally, the solution to this issue is to specify the PID in the
message itself so that systemd has not to guess the sender, but this
wouldn't work when running in a PID namespace as the container will pass
the PID known in its namespace (something like PID=1,2,3..) and systemd
running on the host is not able to map it to the runc service.

The proposed solution is to have a proxy in runc that forwards the
messages to the host systemd.

Example of this issue:

projectatomic/atomic-system-containers#24

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe added a commit to giuseppe/runc that referenced this pull request Feb 3, 2017
The current support of systemd-notify has a race condition as the
message send to the systemd notify socket might be dropped if the sender
process is not running by the time systemd checks for the sender of the
datagram.  A proper fix of this in systemd would require changes to the
kernel to maintain the cgroup of the sender process when it is dead (but
it is not probably going to happen...)
Generally, the solution to this issue is to specify the PID in the
message itself so that systemd has not to guess the sender, but this
wouldn't work when running in a PID namespace as the container will pass
the PID known in its namespace (something like PID=1,2,3..) and systemd
running on the host is not able to map it to the runc service.

The proposed solution is to have a proxy in runc that forwards the
messages to the host systemd.

Example of this issue:

projectatomic/atomic-system-containers#24

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe added a commit to giuseppe/runc that referenced this pull request Feb 7, 2017
The current support of systemd-notify has a race condition as the
message send to the systemd notify socket might be dropped if the sender
process is not running by the time systemd checks for the sender of the
datagram.  A proper fix of this in systemd would require changes to the
kernel to maintain the cgroup of the sender process when it is dead (but
it is not probably going to happen...)
Generally, the solution to this issue is to specify the PID in the
message itself so that systemd has not to guess the sender, but this
wouldn't work when running in a PID namespace as the container will pass
the PID known in its namespace (something like PID=1,2,3..) and systemd
running on the host is not able to map it to the runc service.

The proposed solution is to have a proxy in runc that forwards the
messages to the host systemd.

Example of this issue:

projectatomic/atomic-system-containers#24

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@yuqi-zhang
Copy link
Contributor Author

@giuseppe the patch works for me as well. If it gets accepted I'll revert back to the CLI notify. Thanks for the patch!

@rhatdan
Copy link
Member

rhatdan commented Feb 7, 2017

@yuqi-zhang Add a comment on the runc patch please.

giuseppe added a commit to giuseppe/runc that referenced this pull request Feb 15, 2017
The current support of systemd-notify has a race condition as the
message send to the systemd notify socket might be dropped if the sender
process is not running by the time systemd checks for the sender of the
datagram.  A proper fix of this in systemd would require changes to the
kernel to maintain the cgroup of the sender process when it is dead (but
it is not probably going to happen...)
Generally, the solution to this issue is to specify the PID in the
message itself so that systemd has not to guess the sender, but this
wouldn't work when running in a PID namespace as the container will pass
the PID known in its namespace (something like PID=1,2,3..) and systemd
running on the host is not able to map it to the runc service.

The proposed solution is to have a proxy in runc that forwards the
messages to the host systemd.

Example of this issue:

projectatomic/atomic-system-containers#24

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe added a commit to giuseppe/runc that referenced this pull request Feb 22, 2017
The current support of systemd-notify has a race condition as the
message send to the systemd notify socket might be dropped if the sender
process is not running by the time systemd checks for the sender of the
datagram.  A proper fix of this in systemd would require changes to the
kernel to maintain the cgroup of the sender process when it is dead (but
it is not probably going to happen...)
Generally, the solution to this issue is to specify the PID in the
message itself so that systemd has not to guess the sender, but this
wouldn't work when running in a PID namespace as the container will pass
the PID known in its namespace (something like PID=1,2,3..) and systemd
running on the host is not able to map it to the runc service.

The proposed solution is to have a proxy in runc that forwards the
messages to the host systemd.

Example of this issue:

projectatomic/atomic-system-containers#24

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe added a commit to giuseppe/runc that referenced this pull request Feb 22, 2017
The current support of systemd-notify has a race condition as the
message send to the systemd notify socket might be dropped if the sender
process is not running by the time systemd checks for the sender of the
datagram.  A proper fix of this in systemd would require changes to the
kernel to maintain the cgroup of the sender process when it is dead (but
it is not probably going to happen...)
Generally, the solution to this issue is to specify the PID in the
message itself so that systemd has not to guess the sender, but this
wouldn't work when running in a PID namespace as the container will pass
the PID known in its namespace (something like PID=1,2,3..) and systemd
running on the host is not able to map it to the runc service.

The proposed solution is to have a proxy in runc that forwards the
messages to the host systemd.

Example of this issue:

projectatomic/atomic-system-containers#24

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Copy link
Collaborator

@yuqi-zhang the patch got merged into runc. We can use systemd-notify now.

Remove hacky loop to check for Docker configs, and update to use
sd-notify instead to alert systemd when flannel is ready.

Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
@yuqi-zhang
Copy link
Contributor Author

@giuseppe I've reverted it to use the CLI sd-notify. Thanks for the patch in runc!

@giuseppe
Copy link
Collaborator

the patch LGTM, thanks for working on it.

Let's wait until the patched runc gets into Fedora before we merge this PR

@giuseppe
Copy link
Collaborator

@rh-atomic-bot r+ 18d69b4

@rh-atomic-bot
Copy link

⌛ Testing commit 18d69b4 with merge fb82b8b...

@rh-atomic-bot
Copy link

☀️ Test successful - status-redhatci
Approved by: giuseppe
Pushing fb82b8b to master...

@yuqi-zhang yuqi-zhang deleted the sd-notify branch June 8, 2017 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants