New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sometimes systemd-notify(3)
's cgroup="/", causing message to get dropped
#2739
Comments
I think that this is just a variant of #2737 . I'm not familiar enough with the relevant parts of the kernel to confirm, but I think what's happening is that systemd is managing to open the file in |
So |
About it not working properly for non-root users. So the next poor soul won't have to debug why it's not working. Alleviates systemd#2739
For those who are interested in this, please give https://github.com/zonque/systemd/commits/notify a spin. What it does (it lacks proper commit logs in its current state) is it turns the notification socket into a |
@LukeShu Thanks for suggestion. Problem was that we wanted some portable way to do this notification - without linking to some C library. But making binary of some other OS package SUID is even more hard to do in a way portable across distributions. So for now we've stopped on some scripting around |
@zonque Not totally a bad idea, but I think it way too dangerous. We may enter in a deadlock situation. Remember of #1505 (comment) But I think you may have a really good idea, we just have to create an extended version of sd_pid_notify_with_fds with the added parameter (bool async) and the old sd_pid_notify_with_fds function will be using the async socket (by default) |
@zonque i am a bit concerned about making the client side sync, @benjarobin has a point there... We had so much trouble with sd_notify becoming sync to due to full buffers, and that makes me really afraid of making it always sync... I fear systemd-notify isn't particularly useful in --user mode, but I have no idea how to fix that properly without getting SCM_CGROUP or so in the kernel... |
@poettering I think the idea of @zonque is a really good starting point, and as you said the only other solution is getting SCM_CGROUP in the kernel (which may not be for tomorrow)... |
- due to systemd/systemd#2739, the systemd-notify utility has issues when running as a non-root user. Dropping back to a simple service until a new solution can be figured out
@poettering @zonque is there any update to this issue? I believe we are hitting some variation of this at projectatomic/atomic-system-containers#24, where it seems that a race condition is causing the notification process to die before the system can process the ready signal. |
So systemd attempts to authenticate the sending PID of the message on the notify socket (not the MAINPID in the message) using the SCM_CREDENTIALS ancillary message. And if the sending process has terminated and its PID is no longer in the cgroup corresponding to the service identified by MAINPID, then systemd refuses to process the message. Correct? If so, it looks like the problem has been fixed if the systemd-notify process has CAP_SYS_ADMIN. I observe the following using strace of "systemd-notify --ready --pid=14529":
Note that systemd-notify has encoded 14529 in the message contents as MAINPID=14529. And it has also set the value of cmsg_data.pid in the SCM_CREDENTIALS ancillary message. So I am able to notify on behalf of PID 14529 and systemd cannot see the difference between PID 14529 sending it and the systemd-notify PID sending it. Is that correct? The reason I ask is that I have a system with an older version of systemd (Fedora 20, systemd-208-31) that is not sending SCM_CREDENTIALS. I see the following for "systemd-notify --ready --pid=14201" on that box:
@poettering It looks like the following commit introduced the client side behavior of sending credentials. If I write a client that sets credentials as you do in this commit in sd_pid_notify, will an older systemd authenticate my client? Or did you also add support on the systemd side for this? I don't want to write a client to make this work on the old systemd-208 box if systemd itself is not using this authentication mechanism in 208.
|
@poettering You can ignore my question in the prevous post. I wrote a replacement systemd-notify implementation as described and it works with the Fedora 20 systemd-208 version systemd. |
This adds the sd_notify_barrier function, to allow users to synchronize against the reception of sd_notify(3) status messages. It acts as a synchronization point, and a successful return gurantees that all previous messages have been consumed by the manager. This can be used to eliminate race conditions where the sending process exits too early for systemd to associate its PID to a cgroup and attribute the status message to a unit correctly. systemd-notify now uses this function for proper notification delivery and be useful for NotifyAccess=all units again in user mode, or in cases where it doesn't have a control process as parent. Fixes: systemd#2739
This adds the sd_notify_barrier function, to allow users to synchronize against the reception of sd_notify(3) status messages. It acts as a synchronization point, and a successful return gurantees that all previous messages have been consumed by the manager. This can be used to eliminate race conditions where the sending process exits too early for systemd to associate its PID to a cgroup and attribute the status message to a unit correctly. systemd-notify now uses this function for proper notification delivery and be useful for NotifyAccess=all units again in user mode, or in cases where it doesn't have a control process as parent. Fixes: systemd#2739
Submission type
[x] Bug report
[ ] Request for enhancement (RFE)
systemd version the issue has been seen with
229
Used distribution
Parabola GNU/Linux-libre (derivative of Arch Linux)
In case of bug report: Expected behaviour you didn't see
I expected
systemd-notify READY=1
to mark my unit as 'active' instead of 'activating'. Since a couple of weeks ago, it usually doesn't for certain units.In case of bug report: Unexpected behaviour you saw
Systemd doesn't know what to do with the message because when it opens
/proc/$pid/cgroup
, it contains a line1:name=systemd:/
(discovered by putting some debug statements insrc/basic/cgroup-util.c
). Which is weird. If I start a shell process in the unit, and verify that the line in/proc
makes sense, thenexec systemd-notify READY=1
(so that the PID is the same), it still happens; so something is causing the cgroup to revert to being empty.In case of bug report: Steps to reproduce the problem
I honestly don't have clear instructions, my attempts at creating a minimal failure case have left me scratching my head. It happens mostly-consistently with my WMII user-unit, but when I try cutting out things to help reproduce, it starts working. Or, because it's inconsistent, I think it's started working.
The text was updated successfully, but these errors were encountered: