New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce sd_notify_barrier #15547
Introduce sd_notify_barrier #15547
Conversation
hmm, interesting concept. can you elaborate on the precise usecase for you? trying to wrap my head around why one would want this. (not saying it was a bad idea to have, just wondering) |
hmm, and I think there should be an explicit property that should be send here, just to keep our options open, and so that the host side can recognize this case explicitly, and no generate a warning about such messages (i mean pid1 has every right to log warnings about peers sending it arbitrary fds it doesn't expect, hence let's make sure this kind of message is recognizable properly so that the fd is expected) |
I believe the use of this call eliminates any race conditions involving lookup on the cgroup of the client by PID after it is already gone, because we wait until the fd is closed on the manager end (which would mean our previous messages have already been read and processed by then). The parent PID trick has been quite helpful in case of systemd-notify, but it's still not a complete fix (and only works with enough privileges). Using this call should mean NotifyAccess=all begins working reliably again (if you're not the child of a control process and you exit too quickly). The linked bug really shows one other corner case of systemd-notify being invoked in a PID namespace, but talking to the host NOTIFY_SOCKET. You may consider that unsupported, but for a container payload it seems like valid usage (and it would work with SCM_CGROUPS anyway). Ofcourse, anything that has systemd as its client or is long-lived should use the current async API and shouldn't need this. I'll be pushing with the suggested changes soon. |
8ea8cab
to
763fec3
Compare
looks pretty good. man page would be good to have for this too though... |
@poettering Thanks for the review. I've been wondering about one other thing: Currently the BARRIER=1 command is not allowed to be sent with anything else, but it is only problematic when sent with FDSTORE=1, as then we're not able to discern which one to close and which ones to keep (unless we decide on closing the first one in the fd array and let others be kept). Would it be better to only ignore in case it's been sent with FDSTORE=1 (and then everything else sent with them is also not processed)? This has the advantage that we're able to send READY=1 or potentially anything other than FDSTORE=1 with BARRIER=1 in one sendmsg call and determine when it's picked up, as the destructor for fd set is invoked after the notify_message callback is made, but this could be relying too much on the implementation details... |
I must say I kinda like the way you did it right now, i.e. that its a message of its own, and cannot be combined with anything else. Because it makes the ordering very clear, everything you issue before is processed before, everything you issue after is processed after. If we'd allow doing other stuff and the barrier in one msg that property is a bit lost, since suddenly the barrier itself also can carry other operations for which the order isn't as clear. Moreover, I think it kinda pushes people towards enqueuing stuff and then syncing on it, rather than possibly just syncing along with messages all the time, dunno. i.e. there's benefit in making the barrier an extra function you need to call instead of just making it a slightly different flavour of a call ou need to make anyway. But of course this is all just philosophical. Anyway, please leave this concept as it is, let's declare that there are barrier messages and payload messages, but not mixed messages. |
763fec3
to
c047b75
Compare
c047b75
to
b685b72
Compare
Pushed addressing everything but the --no-block behaviour. I think the new change is better, since it is surprising if systemd-notify --ready doesn't work when directly invoked from PID 1. I want to know if the attestation using ppid when --no-block is not used is a good idea still, given the only reason it was added was to avoid races when NotifyAccess=all was used, which should now be gone. Anyone using --no-block is then asking for the race to happen (they can use --pid without an argument to attest the message using their PID, but that sets MAINPID= too). So either we drop the whole thing altogether, or do it as a second best effort thing in case of --no-block (but that still means |
Prompted by the discussions on systemd#15547.
Prompted by the discussions on #15547.
Could you rebase now that #15642 is merged? |
BTW, you could add "Fixes #2739" to your commit msgs somewhere. |
b685b72
to
d208997
Compare
I rearranged the documentation again reflecting your --pid= additions. |
excellent! lgtm! |
Hmmm. Needs another rebase, sorry |
This adds the sd_notify_barrier function, to allow users to synchronize against the reception of sd_notify(3) status messages. It acts as a synchronization point, and a successful return gurantees that all previous messages have been consumed by the manager. This can be used to eliminate race conditions where the sending process exits too early for systemd to associate its PID to a cgroup and attribute the status message to a unit correctly. systemd-notify now uses this function for proper notification delivery and be useful for NotifyAccess=all units again in user mode, or in cases where it doesn't have a control process as parent. Fixes: systemd#2739
Add note for change of behaviour in systemd-notify, where parent pid trick is only used when --no-block is passed, and with enough privileges ofcourse. Also, fix a small error in systemd(1).
d208997
to
5ec7a99
Compare
There you go. |
- This function is called for each message, and almost all of them will not contain BARRIER=1. So 5 comparaisons are realized in previous code. Now only one strstr is used. - If BARRIER=1 is found, drop the message in all case. So always return true when BARRIER=1 is found at the beginning of a new line Follow up of systemd#15547
This is just a proposal I'm posting for review and feedback, it isn't really in its final form yet. It just outlines the basic idea.
sd_notify_barrier allows the caller to synchronize against the reception of messages by systemd, and ensure that all notification changes have been picked up (whether they have been processed successfully or not, e.g. in case of FDSTORE=1 cannot be determined by this) which were sent before the barrier call. This relies on a few guarantees provided by the notification socket and systemd, i.e. SOCK_DGRAM means messages will be processed in order, and systemd closes all foreign file descriptors it receives that are not sent with FDSTORE=1 (which is necessary anyway).
The write end of a pipe is sent to the manager over the notification socket, and client polls the read end for POLLHUP. Once poll returns successfully, we indicate success, otherwise, the timeout field allows for a graceful fallback in cases where systemd isn't responding (probably due to a deadlock).
There are a few questions I have:
The advantages over previous approaches are:
Fixes: #2739