
notify_socket.go: use sd_notify_barrier mechanism #3291

Merged
kolyshkin merged 2 commits into opencontainers:main from indyjo:fix/sd-notify-barrier on Dec 8, 2022

Conversation

@indyjo commented Nov 22, 2021

Add synchronization to how runc reports container readiness to systemd.

It was previously possible for runc to terminate immediately after sending the READY signal. That was sometimes too quick for systemd to determine the sender's cgroup, and as a consequence systemd would ignore the READY signal.

Fixes: #3293 (READY notification sometimes not accepted by systemd)
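For reference, the handshake this PR adopts is systemd's sd_notify_barrier mechanism: the notifying process sends a BARRIER=1 datagram with the write end of a pipe attached via SCM_RIGHTS, and the peer closes that fd once all earlier messages have been processed. A minimal sketch in Go of how such a handshake works (illustrative only; notifyBarrier and its client parameter are assumed names, not the exact code in this PR):

    package notifyexample

    import (
        "errors"
        "io"
        "net"
        "os"
        "time"

        "golang.org/x/sys/unix"
    )

    // notifyBarrier performs the sd_notify_barrier handshake: send
    // BARRIER=1 plus the write end of a pipe via SCM_RIGHTS, then wait
    // for the peer to close that fd, which we observe as EOF.
    func notifyBarrier(client *net.UnixConn, timeout time.Duration) error {
        pipeR, pipeW, err := os.Pipe()
        if err != nil {
            return err
        }
        defer pipeR.Close()

        // Attach the pipe's write end as ancillary data.
        oob := unix.UnixRights(int(pipeW.Fd()))
        _, _, err = client.WriteMsgUnix([]byte("BARRIER=1"), oob, nil)
        // Close our copy so the peer holds the only remaining write end.
        pipeW.Close()
        if err != nil {
            return err
        }

        // systemd closes the received fd once every message sent before
        // the barrier has been processed; until then this read blocks.
        if err := pipeR.SetReadDeadline(time.Now().Add(timeout)); err != nil {
            return err
        }
        var buf [1]byte
        _, err = pipeR.Read(buf[:])
        switch {
        case err == io.EOF:
            return nil // barrier passed: peer closed the pipe
        case err != nil:
            return err // wraps os.ErrDeadlineExceeded on timeout
        default:
            return errors.New("unexpected data on barrier pipe")
        }
    }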

@AkihiroSuda (Member)

Please squash commits

@kolyshkin (Contributor)

Overall this looks good to me. Is there any way to have a test for this, @indyjo?

@indyjo (Author) commented Nov 25, 2021

What I can imagine is writing a test against a mocked NOTIFY_SOCKET server, i.e. a mocked systemd, though I'd need some guidance on how to do that. I could try adding a notify_socket_test.go alongside notify_socket.go. I also noticed there are "bats tests" in tests/integration, but I'm completely unfamiliar with that framework. What do you think would be a good approach?
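For concreteness, a mocked NOTIFY_SOCKET server in a Go unit test might look roughly like this (all names are illustrative; this is a sketch, not the test that was eventually added):

    package notifyexample

    import (
        "net"
        "path/filepath"
        "strings"
        "testing"

        "golang.org/x/sys/unix"
    )

    // TestReadyBarrier runs a datagram server standing in for systemd
    // on NOTIFY_SOCKET. It answers BARRIER=1 by closing the received fds.
    func TestReadyBarrier(t *testing.T) {
        path := filepath.Join(t.TempDir(), "notify.sock")
        conn, err := net.ListenUnixgram("unixgram",
            &net.UnixAddr{Name: path, Net: "unixgram"})
        if err != nil {
            t.Fatal(err)
        }
        defer conn.Close()

        go func() {
            buf := make([]byte, 4096)
            oob := make([]byte, 4096)
            for {
                n, oobn, _, _, err := conn.ReadMsgUnix(buf, oob)
                if err != nil {
                    return // socket closed by the deferred Close
                }
                if !strings.Contains(string(buf[:n]), "BARRIER=1") {
                    continue // e.g. READY=1; record it in a real test
                }
                // Closing every fd passed via SCM_RIGHTS is what
                // signals the barrier back to the sender.
                scms, _ := unix.ParseSocketControlMessage(oob[:oobn])
                for _, scm := range scms {
                    fds, _ := unix.ParseUnixRights(&scm)
                    for _, fd := range fds {
                        unix.Close(fd)
                    }
                }
                return
            }
        }()

        // Point the code under test at the mock socket:
        t.Setenv("NOTIFY_SOCKET", path)
        // ... exercise runc's notify-socket forwarding here ...
    }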

Also, my code currently assumes that whoever listens on the other end of NOTIFY_SOCKET actually understands and honors the BARRIER=1 protocol. This is certainly the case with recent systemd versions (the feature seems to be about two years old), but other implementations might not; for example, sdnotify-proxy is too simple. Currently, this would lead to runc running into the 5-second timeout and failing to start a container.

Is this behavior (a) acceptable, should I (b) ignore the timeout case, or should there be (c) a command-line option to suppress the sd_notify_barrier feature?

@kolyshkin @AkihiroSuda

@indyjo force-pushed the fix/sd-notify-barrier branch 2 times, most recently from 0ebabd7 to f317259 on November 26, 2021
@indyjo (Author) commented Nov 26, 2021

I have added a test (notify_socket_test.go). The question of how to deal with sd_notify_barrier timeouts is still open. crun waits for 30 seconds and then ignores the error (see PR). systemd-notify waits for 5 seconds and then fails (source).
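In terms of the barrier sketch above, the crun-style policy would amount to treating only a timeout as non-fatal (a sketch, assuming a notifyBarrier helper like the one shown earlier; 30*time.Second mirrors crun's value):

    // waitBarrierBestEffort waits for the barrier but, like crun,
    // downgrades a timeout to a warning instead of failing the start.
    func waitBarrierBestEffort(client *net.UnixConn) error {
        err := notifyBarrier(client, 30*time.Second)
        if errors.Is(err, os.ErrDeadlineExceeded) {
            logrus.Warn("NOTIFY_SOCKET peer did not honor BARRIER=1; continuing")
            return nil
        }
        return err
    }

(logrus is what runc already uses for logging; the systemd-notify behavior would instead be to return the error unconditionally after a 5-second timeout.)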

notify_socket.go (outdated), review comment on lines 156 to 167:
    var out bytes.Buffer
    _, err := out.Write(ready)
    if err != nil {
        return err
    }

    _, err = out.Write([]byte{'\n'})
    if err != nil {
        return err
    }

    _, err = client.Write(out.Bytes())
    if err != nil {
        return err
    }

    // now we can inform systemd to use pid1 as the pid to monitor
    newPid := "MAINPID=" + strconv.Itoa(pid1)
    _, err = client.Write([]byte(newPid + "\n"))
    if err != nil {
        return err
    }
    return nil
Contributor:
Hmm, wonder why we avoid using bytes.Buffer at all and don't just do something like

_, err = client.WriteString(string(ready)+"\n")

Author (@indyjo):

There is no WriteString on client but of course we can change to:

_, err = io.WriteString(client, string(ready) + "\n")

or, more directly:

_, err = client.Write([]byte(string(ready) + "\n"))

or, more efficiently, though it mutates the array backing ready:

_, err = client.Write(append(ready, '\n'))

Any preference?

Contributor:

Hmm, wonder why we avoid using bytes.Buffer at all

Looking at git history, this was done in #1308 but I don't see why.

Contributor:

_, err = client.Write(append(ready, '\n'))

This one looks fine to me (most probably there's \n already anyway at that position), as we are not reusing the buffer anywhere.

Can you implement it (as a separate first commit, I guess)?

@kolyshkin (Contributor)

@indyjo thanks for the update! Can you please

  • reverse the order of commits (first remove bytes.Buffer usage, then add your changes)
  • add Signed-off-by to both commits

@kolyshkin (Contributor)

  • reverse the order of commits (first remove bytes.Buffer usage, then add your changes)

I'm asking for this because it makes the git log -p smaller (and, consequently, it will be easier to figure out what was changed, especially a few years from now).

@indyjo (Author) commented Dec 10, 2021

Thanks for clarifying, @kolyshkin! The two commits are now in the preferred order.

Should we discuss how to handle the case where the supervising process doesn't (correctly) handle the sd_notify_barrier mechanism and runc fails with a timeout? I'm leaning towards simply doing what crun does, although the 30s timeout seems extreme.

@kolyshkin (Contributor)

I'm leaning towards simply doing what crun does

I tend to agree.

although the 30s timeout seems extreme.

I guess on an overloaded system 5s might not be enough, so the timeout should be way above the reasonable expected reply time.

But let's ask @giuseppe -- why did you end up with a 30s timeout?

@giuseppe (Member)

But let's ask @giuseppe -- why did you end up with a 30s timeout?

I've copied the timeout you set here: https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/systemd/common.go#L345 😄

Although I've decided to ignore any error there, so it is equivalent to the version without sd_notify_barrier, which would not raise an error either; I am also not sure what a correct value for the timeout would be.

@kolyshkin (Contributor)

@AkihiroSuda PTAL

@kolyshkin kolyshkin added this to the 1.2.0 milestone Feb 16, 2022
@kolyshkin (Contributor)

@opencontainers/runc-maintainers PTAL

@kolyshkin (Contributor)

@opencontainers/runc-maintainers @AkihiroSuda PTAL

@MofX commented Mar 14, 2022

Does it really make sense to implement the barrier mechanism inside of runc?

  1. This will fail if systemd is too old (I guess this does not matter, because it only triggers a warning?)
  2. For non-detached containers, if systemd-notify or any other binary that uses sd_notify_barrier is used inside the container, it will fail, because the pipe and the barrier message are not relayed to systemd, leading to a timeout in sd_notify_barrier.
  3. For detached containers, sd_notify_barrier may or may not time out, depending on a race. If I'm not missing anything, runc terminates after it has received the ready message (and relayed it to systemd). After this patch it will also do the barrier synchronization. A process using sd_notify_barrier will most likely work with this PR if it is able to send the barrier message with its own pipe fd before runc terminates. I think the pipe handles are closed when runc terminates (or on read). But if runc terminates before it has read the BARRIER message and the pipe fd, sd_notify_barrier will time out.

My implementation idea would be to also relay the SCM_RIGHTS data and let the client and systemd handle the synchronization transparently.
I do not have a real idea how to implement this for detached containers, because runc would then have to keep running until the barrier pipe has been closed by systemd. runc can only know this if it intercepts the barrier message and injects its own pipe. But when can it terminate if no barrier message is sent by the client? Maybe after the timeout?
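For the record, the relay idea might look roughly like this (a sketch under the assumption that both ends are *net.UnixConn datagram sockets; relayDatagram is an illustrative name, not how runc's proxy loop is actually structured):

    // relayDatagram forwards one datagram including its SCM_RIGHTS
    // ancillary data (e.g. the BARRIER=1 pipe fd), so the in-container
    // client and systemd complete the handshake with each other.
    func relayDatagram(from, to *net.UnixConn) error {
        buf := make([]byte, 4096)
        oob := make([]byte, 4096)
        n, oobn, _, _, err := from.ReadMsgUnix(buf, oob)
        if err != nil {
            return err
        }
        // Re-sending the control messages duplicates the received fds
        // into the destination process.
        if _, _, err := to.WriteMsgUnix(buf[:n], oob[:oobn], nil); err != nil {
            return err
        }
        // Close our local copies so this process doesn't keep the
        // barrier pipe open after forwarding it.
        if scms, err := unix.ParseSocketControlMessage(oob[:oobn]); err == nil {
            for _, scm := range scms {
                if fds, err := unix.ParseUnixRights(&scm); err == nil {
                    for _, fd := range fds {
                        unix.Close(fd)
                    }
                }
            }
        }
        return nil
    }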

@indyjo (Author) commented Apr 8, 2022

Hi @MofX, let me give my point of view. And sorry about it taking so long.

First of all, whether this patch ends up in an official release or not is not absolutely critical for me. The company I work for, KUKA, maintains its own patched version of runc. Our experience with it so far has been good, but of course we only test a very limited scenario.

  1. This will fail if systemd is too old (I guess this does not matter, because it only triggers a warning?)

Probably true. The protocol basically mandates the use of the barrier, though. There is no way for a client to query whether it's supported.

Concerning your points 2 and 3:

I don't know enough about runc to understand the implications of detached vs. non-detached containers. It is true, however, that systemd-notify inside a container can still fail. That's because this patch doesn't address the barrier mechanism between runc and the container, but between runc and its supervising process.

Actually, we never observed any timeout in systemd-notify. The way systemd-notify may fail is that runc has sometimes already closed the connection directly after receiving the READY state, causing a "Connection refused" when systemd-notify tries to establish the barrier. For this reason, we usually disable the barrier mechanism by passing --no-block when running systemd-notify in a container. That's a separate issue, though.

@indyjo (Author) commented May 9, 2022

@kolyshkin May I ask if there's a chance of including this in v1.1.2?

@kolyshkin (Contributor)

@kolyshkin May I ask if there's a chance of including this in v1.1.2?

The target milestone is set for 1.2.0 (which I hope will be released some time in the summer). We can definitely do a backport to 1.1.x once this is merged, so it will also be included in the latest 1.1.x (whatever that will be).

We need more reviews on this, @opencontainers/runc-maintainers PTAL 🙏🏻

@kolyshkin (Contributor)

@indyjo can you please rebase it (this will also re-run the CI on golang 1.18)?

@indyjo (Author) commented May 12, 2022

@indyjo can you please rebase it (this will also re-run the CI on golang 1.18)?

@kolyshkin Technically, it's done. There's a failing check though: ci / centos-7.

@indyjo (Author) commented Nov 12, 2022

Friendly reminder that there's still a chance for this baby to be merged before its first birthday! 🎂
Is there anything I can help with? @kolyshkin @thaJeztah @cyphar @AkihiroSuda

AkihiroSuda previously approved these changes Nov 12, 2022

@AkihiroSuda (Member)

The last rebase was 6 months ago, could you rebase again to ensure CI is still green? Thanks

@AkihiroSuda AkihiroSuda added the status/merge-on-green Merge on CI green label Nov 12, 2022
Jonas Eschenburg added 2 commits November 14, 2022 10:41
Signed-off-by: Jonas Eschenburg <jonas.eschenburg@kuka.com>
Signed-off-by: Jonas Eschenburg <jonas.eschenburg@kuka.com>
@indyjo (Author) commented Nov 14, 2022

Rebased and green. @AkihiroSuda @kolyshkin

@indyjo (Author) commented Dec 8, 2022

@kolyshkin Just a friendly reminder

@kolyshkin (Contributor) left a comment:

LGTM

@kolyshkin kolyshkin merged commit e7461c8 into opencontainers:main Dec 8, 2022
@cyphar cyphar mentioned this pull request Mar 14, 2024