Race condition causing sd_notify messages to get dropped. #2737
Comments
Are they actually lost, or are they in the journal but lacking the keys you're using to look for them (like `-u` / `_SYSTEMD_UNIT`)? Try using `-o json-pretty` and looking for nearby messages. This is probably an artifact of a well-known bug: https://bugs.freedesktop.org/show_bug.cgi?id=50184
I'm not entirely sure whether or not it's getting totally lost because of this bug; I found it while trying to track down another bug with notify events getting lost. But they are getting lost :-) It's definitely related to that bug: it's at least conceptually the same, though I'm not sure if it's the same code. (I'd thought that everything had been migrated to GitHub. Shows what happens when I make assumptions...) Saying "sometimes things from short-lived processes get lost" would be a sort-of acceptable caveat, except that systemd ships with a short-lived process whose whole purpose is sending these messages. As Lennart notes, the correct fix is to get the kernel to send cgroup information. But, unlike the linked bug, a possible workaround exists here: create a separate socket for each unit. You could filter out messages from PIDs known to be in a different unit (although I suppose that introduces another race with PID reuse), but if the process has exited, just assume that it was in the right group, since it knew the correct value for
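The per-unit-socket policy proposed above can be sketched as follows. This is a hypothetical illustration, not systemd code: it assumes each unit gets its own notify socket, so the socket a message arrives on already names a candidate unit (`socket_unit`). The names `accept_on_unit_socket` and `cgroup_mentions_unit` are invented for this sketch, and the substring match on the cgroup file is deliberately crude.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: does any line of a /proc/<pid>/cgroup-style stream
 * mention `unit`? A substring match is crude; a real implementation would
 * parse the cgroup path and compare the unit component exactly. */
static bool cgroup_mentions_unit(FILE *f, const char *unit) {
    char line[1024];
    while (fgets(line, sizeof line, f))
        if (strstr(line, unit))
            return true;
    return false;
}

/* Hypothetical policy: a message that arrived on the socket dedicated to
 * `socket_unit` is accepted unless its sender is still alive and its
 * cgroup does not mention that unit. */
static bool accept_on_unit_socket(pid_t pid, const char *socket_unit) {
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/cgroup", (long)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        /* Sender already exited and was reaped: per the proposal, trust
         * the socket it used instead of dropping the message. */
        return errno == ENOENT;
    bool ok = cgroup_mentions_unit(f, socket_unit);
    fclose(f);
    return ok;
}
```

As the comment itself points out, the "filter live senders" half of this still races with PID reuse: a PID that was freed and handed to a process in another unit would make a legitimate message look foreign.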
If the process calling
That only works if the process is root (perhaps that should be noted in the man page).
Duplicate of #2739. Let's close this version.
There's a bug in `systemd-notify`: the command sends the message and then exits immediately. Since its lifetime is so short, systemd doesn't have time to do the housekeeping needed to figure out which service unit to associate the message with. The mitigation is to open the socket and send the message directly ourselves, since our process is long-lived. See systemd/systemd#2737. Fixes #16
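The mitigation described in that commit message (sending the datagram from the long-lived process itself) might look roughly like the sketch below. It is an illustration under assumptions, not that project's actual code: `notify_direct` is a hypothetical name, and it covers only the basic protocol: read `$NOTIFY_SOCKET`, treat a leading `@` as an abstract-namespace address, and send one `AF_UNIX` datagram. Unlike `sd_notify(3)` it does not unset the environment variable or pass file descriptors.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: send one notification such as "READY=1" to the
 * socket named by $NOTIFY_SOCKET. Loosely mirrors sd_notify(3)'s return
 * convention: 0 if NOTIFY_SOCKET is unset, positive on success, -errno on
 * failure. */
static int notify_direct(const char *state) {
    const char *path = getenv("NOTIFY_SOCKET");
    if (!path || !*path)
        return 0;

    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    size_t len = strlen(path);
    /* Only absolute paths and '@'-prefixed abstract names are handled. */
    if ((path[0] != '/' && path[0] != '@') || len >= sizeof sa.sun_path)
        return -EINVAL;

    memcpy(sa.sun_path, path, len);
    int abstract = path[0] == '@';
    if (abstract)
        sa.sun_path[0] = '\0';  /* '@' stands for the leading NUL byte */
    /* Abstract names are length-delimited; path names include their NUL. */
    socklen_t salen = (socklen_t)(offsetof(struct sockaddr_un, sun_path) +
                                  len + (abstract ? 0 : 1));

    int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -errno;

    ssize_t n = sendto(fd, state, strlen(state), MSG_NOSIGNAL,
                       (const struct sockaddr *)&sa, salen);
    int r = n < 0 ? -errno : 1;  /* capture errno before close() */
    close(fd);
    return r;
}
```

This only helps because the caller keeps running after the send. A process that calls it and then exits immediately is exposed to exactly the same race as `systemd-notify`.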
Submission type
[X] Bug report
[ ] Request for enhancement (RFE)
systemd version the issue has been seen with
229
Used distribution
Parabola GNU/Linux-libre (derivative of Arch Linux)
In case of bug report: Expected behaviour you didn't see
Call `sd_notify(3)` just before your process exits (as is done in `systemd-notify(1)`); I expect the message to always make it to where it's going (and show up in the journal if applicable).
In case of bug report: Unexpected behaviour you saw
Sometimes the message doesn't make it there, and the warning from `log_warning("Cannot find unit for notify message of PID "PID_FMT".", ucred->pid);` shows up in the journal instead.
In case of bug report: Steps to reproduce the problem
Call `systemd-notify(1)` repeatedly with something that will show up in the journal. A small percentage of the messages won't make it. This is because the manager decides which unit a message applies to based on the sender's cgroup string, and it determines that string by reading `/proc/${sending_pid}/cgroup`, which no longer exists if the sending process has been cleaned up before systemd gets around to handling the message.
It's tempting to say "well, the process is exiting anyway, so it probably doesn't matter if we lose its last words," but `systemd-notify(1)`.
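The race can be reproduced deterministically outside systemd. The sketch below is an illustration, not systemd's code, and `recv_sender_pid`, `probe_cgroup` and `demo_race` are hypothetical names. It assumes the manager learns the sender's PID the usual Linux way: the receiving socket has `SO_PASSCRED` enabled, so the kernel attaches a `struct ucred` (presumably where `ucred->pid` in the warning above comes from). The child sends `READY=1` and exits, the parent reaps it before reading the datagram, and only then tries to open `/proc/<pid>/cgroup`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Receive one datagram on `fd` and return the sender PID that the kernel
 * attached as SCM_CREDENTIALS (needs SO_PASSCRED on `fd`), or -1. */
static pid_t recv_sender_pid(int fd, char *buf, size_t buflen) {
    union {
        struct cmsghdr align;  /* forces correct alignment of the buffer */
        char space[CMSG_SPACE(sizeof(struct ucred))];
    } ctrl;
    struct iovec iov = { .iov_base = buf, .iov_len = buflen - 1 };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.space, .msg_controllen = sizeof ctrl.space,
    };
    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c))
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
            struct ucred cred;
            memcpy(&cred, CMSG_DATA(c), sizeof cred);
            return cred.pid;
        }
    return -1;
}

/* The lookup that fails in this bug: can /proc/<pid>/cgroup still be
 * opened? Returns 0 if so, -errno otherwise. */
static int probe_cgroup(pid_t pid) {
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/cgroup", (long)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -errno;
    fclose(f);
    return 0;
}

/* Force the losing interleaving: the sender is gone (exited and reaped)
 * before its message is handled. Returns probe_cgroup()'s result. */
static int demo_race(void) {
    int sv[2], on = 1, r;
    pid_t child = -1;
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -errno;
    if (setsockopt(sv[0], SOL_SOCKET, SO_PASSCRED, &on, sizeof on) < 0 ||
        (child = fork()) < 0) {
        r = -errno;
        close(sv[0]);
        close(sv[1]);
        return r;
    }
    if (child == 0) {
        send(sv[1], "READY=1", 7, 0);  /* like systemd-notify: send, then exit */
        _exit(0);
    }
    waitpid(child, NULL, 0);
    char buf[64];
    pid_t sender = recv_sender_pid(sv[0], buf, sizeof buf);
    close(sv[0]);
    close(sv[1]);
    if (sender != child || strcmp(buf, "READY=1") != 0)
        return -EPROTO;
    return probe_cgroup(sender);  /* the PID is known, the cgroup is not */
}
```

On a typical Linux system `demo_race()` should return `-ENOENT`: the datagram and the sender's PID survive, but the cgroup lookup keyed on that PID does not. In the real bug the ordering is not forced; it depends on whether the sender's parent reaps it before the manager's event loop reaches the message, which fits the report that only a small percentage of messages are lost.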