Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sometimes systemd-notify(3)'s cgroup="/", causing message to get dropped #2739

Closed
LukeShu opened this issue Feb 25, 2016 · 11 comments · Fixed by #15547
Closed

Sometimes systemd-notify(3)'s cgroup="/", causing message to get dropped #2739

LukeShu opened this issue Feb 25, 2016 · 11 comments · Fixed by #15547
Labels

Comments

@LukeShu
Copy link
Contributor

LukeShu commented Feb 25, 2016

Submission type

[x] Bug report

[ ] Request for enhancement (RFE)

systemd version the issue has been seen with

229

Used distribution

Parabola GNU/Linux-libre (derivative of Arch Linux)

In case of bug report: Expected behaviour you didn't see

I expected systemd-notify READY=1 to mark my unit as 'active' instead of 'activating'. Since a couple of weeks ago, it usually doesn't for certain units.

In case of bug report: Unexpected behaviour you saw

Systemd doesn't know what to do with the message because when it opens /proc/$pid/cgroup, it contains a line 1:name=systemd:/ (discovered by putting some debug statements in src/basic/cgroup-util.c). Which is weird. If I start a shell process in the unit, and verify that the line in /proc makes sense, then exec systemd-notify READY=1 (so that the PID is the same), it still happens; so something is causing the cgroup to revert to being empty.

In case of bug report: Steps to reproduce the problem

I honestly don't have clear instructions, my attempts at creating a minimal failure case have left me scratching my head. It happens mostly-consistently with my WMII user-unit, but when I try cutting out things to help reproduce, it starts working. Or, because it's inconsistent, I think it's started working.

@LukeShu
Copy link
Contributor Author

LukeShu commented Feb 25, 2016

I think that this is just a variant of #2737 . I'm not familiar enough with the relevant parts of the kernel to confirm, but I think what's happening is that systemd is managing to open the file in /proc while it's still there, but between opening it and it reading it, the process exits/is cleaned up, and the kernel forgets its cgroups.

@binarin
Copy link

binarin commented Mar 10, 2016

So systemd-notify is useless for non-root users (or without CAP_SYS_ADMIN capability). There was a suggestion to add some sleep to systemd-notify 2 years ago - https://lists.freedesktop.org/archives/systemd-devel/2014-April/018876.html
But it was ignored, so probably we should wait for SCM_GROUPS mentioned there. Or wait for the termal death of the universe. Or wait for something else. And suffer during all this time.

binarin added a commit to binarin/systemd that referenced this issue Mar 11, 2016
About it not working properly for non-root users. So the next poor soul
won't have to debug why it's not working.

Alleviates systemd#2739
@zonque
Copy link
Member

zonque commented Mar 17, 2016

For those who are interested in this, please give https://github.com/zonque/systemd/commits/notify a spin. What it does (it lacks proper commit logs in its current state) is it turns the notification socket into a SOCK_SEQPACKET one, but leaves the old one around for compat reasons. By only sending the reply after the message has been dispatched on the daemon side, and by waiting for that reply on the client side, the race condition goes away. I'm happy to discuss the merits and the possible drawbacks of this approach.

@LukeShu
Copy link
Contributor Author

LukeShu commented Apr 5, 2016

@zonque Very cool. I'll check it out when I have a chance.

@binarin What about making systemd-notify SUID in the meantime?

@binarin
Copy link

binarin commented Apr 5, 2016

@LukeShu Thanks for suggestion. Problem was that we wanted some portable way to do this notification - without linking to some C library. But making binary of some other OS package SUID is even more hard to do in a way portable across distributions.

So for now we've stopped on some scripting around socat, and it works pretty well.

@benjarobin
Copy link
Contributor

@zonque Not totally a bad idea, but I think it way too dangerous. We may enter in a deadlock situation. Remember of #1505 (comment)
journald must no be blocking for any reason on the notify socket.

But I think you may have a really good idea, we just have to create an extended version of sd_pid_notify_with_fds with the added parameter (bool async) and the old sd_pid_notify_with_fds function will be using the async socket (by default)

@poettering
Copy link
Member

@zonque i am a bit concerned about making the client side sync, @benjarobin has a point there... We had so much trouble with sd_notify becoming sync to due to full buffers, and that makes me really afraid of making it always sync...

I fear systemd-notify isn't particularly useful in --user mode, but I have no idea how to fix that properly without getting SCM_CGROUP or so in the kernel...

@benjarobin
Copy link
Contributor

@poettering I think the idea of @zonque is a really good starting point, and as you said the only other solution is getting SCM_CGROUP in the kernel (which may not be for tomorrow)...
We just need to improve his idea and only use the sync interface for a really small number of elements/applications, and in the documentation of this new interface clearly explain the downside and the problem that can happen of using the sync interface...

acburdine added a commit to TryGhost/Ghost-CLI that referenced this issue Nov 18, 2016
- due to systemd/systemd#2739, the systemd-notify utility has issues when running as a non-root user. Dropping back to a simple service until a new solution can be figured out
@yuqi-zhang
Copy link

@poettering @zonque is there any update to this issue?

I believe we are hitting some variation of this at projectatomic/atomic-system-containers#24, where it seems that a race condition is causing the notification process to die before the system can process the ready signal.

@jharvell
Copy link

jharvell commented Apr 4, 2017

So systemd attempts to authenticate the sending PID of the message on the notify socket (not the MAINPID in the message) using the SCM_CREDENTIALS ancillary message. And if the sending process has terminated and its PID is no longer in the cgroup corresponding to the service identified by MAINPID, then systemd refuses to process the message. Correct?

If so, it looks like the problem has been fixed if the systemd-notify process has CAP_SYS_ADMIN. I observe the following using strace of "systemd-notify --ready --pid=14529":

socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 3
getsockopt(3, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = 0
getuid()                                = 0
getgid()                                = 0
sendmsg(3, {msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/notify"}, msg_namelen=21, msg_iov=[{iov_base="READY=1\nMAINPID=14529", iov_len=21}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, cmsg_data={pid=14529, uid=0, gid=0}}], msg_controllen=32, msg_flags=0}, MSG_NOSIGNAL) = 21
close(3)                                = 0

Note that systemd-notify has encoded 14529 in the message contents as MAINPID=14529. And it has also set the value of cmsg_data.pid in the SCM_CREDENTIALS ancillary message. So I am able to notify on behalf of PID 14529 and systemd cannot see the difference between PID 14529 sending it and the systemd-notify PID sending it. Is that correct?

The reason I ask is that I have a system with an older version of systemd (Fedora 20, systemd-208-31) that is not sending SCM_CREDENTIALS. I see the following for "systemd-notify --ready --pid=14201" on that box:

socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 3
sendmsg(3, {msg_name(35)={sa_family=AF_LOCAL, sun_path=@"/org/freedesktop/systemd1/notify"}, msg_iov(1)=[{"READY=1\nMAINPID=14201", 21}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 21
close(3)                                = 0

@poettering It looks like the following commit introduced the client side behavior of sending credentials. If I write a client that sets credentials as you do in this commit in sd_pid_notify, will an older systemd authenticate my client? Or did you also add support on the systemd side for this? I don't want to write a client to make this work on the old systemd-208 box if systemd itself is not using this authentication mechanism in 208.

commit be8f4e9e8eb3b0c34a49c2e80a5c5b7dc6d175f0
Author: Lennart Poettering <lennart@poettering.net>
Date:   Thu Jun 5 17:05:18 2014 +0200

    sd-daemon: introduce sd_pid_notify() and sd_pid_notifyf()
    
    sd_pid_notify() operates like sd_notify(), however operates on a
    different PID (for example the parent PID of a process).
    
    Make use of this in systemd-notify, so that message are sent from the
    PID specified with --pid= rather than the usually shortlived PID of
    systemd-notify itself.
    
    This should increase the likelyhood that PID 1 can identify the cgroup
    that the notification message was sent from properly.

M       src/libsystemd/sd-daemon/sd-daemon.c
M       src/notify/notify.c
M       src/shared/util.c
M       src/systemd/sd-daemon.h

@jharvell
Copy link

jharvell commented Apr 5, 2017

@poettering You can ignore my question in the prevous post. I wrote a replacement systemd-notify implementation as described and it works with the Fedora 20 systemd-208 version systemd.

kkdwivedi added a commit to kkdwivedi/systemd that referenced this issue Apr 30, 2020
This adds the sd_notify_barrier function, to allow users to synchronize against
the reception of sd_notify(3) status messages. It acts as a synchronization
point, and a successful return gurantees that all previous messages have been
consumed by the manager. This can be used to eliminate race conditions where
the sending process exits too early for systemd to associate its PID to a
cgroup and attribute the status message to a unit correctly.

systemd-notify now uses this function for proper notification delivery and be
useful for NotifyAccess=all units again in user mode, or in cases where it
doesn't have a control process as parent.

Fixes: systemd#2739
kkdwivedi added a commit to kkdwivedi/systemd that referenced this issue Apr 30, 2020
This adds the sd_notify_barrier function, to allow users to synchronize against
the reception of sd_notify(3) status messages. It acts as a synchronization
point, and a successful return gurantees that all previous messages have been
consumed by the manager. This can be used to eliminate race conditions where
the sending process exits too early for systemd to associate its PID to a
cgroup and attribute the status message to a unit correctly.

systemd-notify now uses this function for proper notification delivery and be
useful for NotifyAccess=all units again in user mode, or in cases where it
doesn't have a control process as parent.

Fixes: systemd#2739
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

7 participants