Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log output can be lost from services using log namespaces that only produce output immediately before exiting #32604

Closed
richardmaw-codethink opened this issue May 1, 2024 · 2 comments · Fixed by #32619
Labels
bug 🐛 Programming errors, that need preferential fixing journal pid1

Comments

@richardmaw-codethink
Copy link
Contributor

richardmaw-codethink commented May 1, 2024

systemd version the issue has been seen with

256~rc1-gf19076e^

Used distribution

Fedora 39

Linux kernel version used

kernel-6.8.7-200.fc39.x86_64

CPU architectures issue was seen on

x86_64

Component

systemd, systemd-journald

Expected behaviour you didn't see

$ systemd-run --wait -p LogNamespace=foobar echo 'hello world'
$ systemd-run --wait -p LogNamespace=foobaz echo 'hello world'
$ journalctl --list-namespaces
foobar
foobaz

Unexpected behaviour you saw

$ systemd-run --wait -p LogNamespace=foobar echo 'hello world'
$ systemd-run --wait -p LogNamespace=foobaz echo 'hello world'
$ journalctl --list-namespaces
foobar

Steps to reproduce the problem

Run the TEST-44-LOG-NAMESPACE test with the log level set to debug on moderately slow hardware a few times, or get unlucky in CI.

Additional program output to the terminal or log subsystem illustrating the issue

See https://gist.github.com/richardmaw-codethink/05cce7763ad30a6009053961359282c0 for full logs.

I think the relevant logs are:

Apr 30 18:32:39 H testsuite-44.sh[586]: + systemd-run --wait -p LogNamespace=foobaz echo 'hello world'
Apr 30 18:32:39 H systemd[1]: run-u9.service: Forked /usr/bin/echo as 596 (without CLONE_INTO_CGROUP)
Apr 30 18:32:39 H systemd[1]: Started run-u9.service - /usr/bin/echo "hello world".
Apr 30 18:32:39 H systemd[1]: Child 596 (echo) died (code=exited, status=0/SUCCESS)
Apr 30 18:32:39 H systemd[1]: run-u9.service: Child 596 belongs to run-u9.service.
Apr 30 18:32:39 H systemd[1]: run-u9.service: Changed running -> dead
Apr 30 18:32:39 H systemd[1]: systemd-journald@foobaz.socket: Unit is not needed anymore.
Apr 30 18:32:39 H systemd[1]: systemd-journald@foobaz.socket: Trying to enqueue job systemd-journald@foobaz.socket/stop/fail
Apr 30 18:32:39 H systemd[1]: systemd-journald@foobaz.socket: Installed new job systemd-journald@foobaz.socket/stop as 847
Apr 30 18:32:39 H systemd[1]: systemd-journald@foobaz.socket: Enqueued job systemd-journald@foobaz.socket/stop as 847
Apr 30 18:32:39 H systemd[1]: systemd-journald@foobaz.socket: Incoming traffic
Apr 30 18:32:39 H systemd[1]: systemd-journald@foobaz.socket: Suppressing connection request since unit stop is scheduled.

The systemd-journald@foobaz.socket "Unit is not needed anymore" is because systemd-journald@.service has StopWhenUnneeded=yes and before systemd-journald@foobaz.service is started the only service with Requires=systemd-journald@foobaz.socket is run-u9.service

There isn't an accompanying Apr 30 18:32:40 H systemd[1]: Received SIGCHLD from PID 596 ((echo)).,
for this exit, the closest is Apr 30 18:32:39 H systemd[1]: Received SIGCHLD from PID 581 (systemd-machine). which is from systemd-machine-id-commit.service, which is still running because there's no dependency on after firstboot.

This may imply something subtle with handling multiple SIGCHLD events in calls to manager_dispatch_sigchld when manager_dispatch_signal_fd returns a SIGCHLD event,
but I suspect it's just that systemd was kept too busy to handle the incoming IO on the journald stdio socket before run-u9.service exited, and since SIGCHLD has priority over incoming IO the socket is shut down instead of started.

It's possible that explicitly starting journald for namespaces (rather than leaving it to socket activation) would fix this, but I'm not familiar enough with the code to say whether journald could still exit for being idle if there's a long running service that only produces output immediately before exiting.

@richardmaw-codethink richardmaw-codethink added the bug 🐛 Programming errors, that need preferential fixing label May 1, 2024
@richardmaw-codethink
Copy link
Contributor Author

I think this is subtly different from #32539 since it's not that the log message is currently in-flight, it's that systemd never gives the journal the chance to see it.

yuwata added a commit to yuwata/systemd that referenced this issue May 1, 2024
…nits

Otherwise, if a service unit that requests LogNamespace= stopped before
systemd-journald@.service is started, logs generated by the service will be
lost, as systemd-journald@.socket is stopped and
systemd-journald@.service will never started.

To prevernt the issue, let's introduce another implicit dependency to
a oneshot service that explicity synchronizes a namespaced journal file
when the log namespace is not needed anymore.

Fixes systemd#32604.
yuwata added a commit to yuwata/systemd that referenced this issue May 1, 2024
…nits

Otherwise, if a service unit that requests LogNamespace= stopped before
systemd-journald@.service is started, logs generated by the service will be
lost, as systemd-journald@.socket is stopped and
systemd-journald@.service will never started.

To prevent the issue, let's introduce another implicit dependency to
a oneshot service that explicitly synchronizes a namespaced journal file
when the log namespace is not needed anymore.

Fixes systemd#32604.
@yuwata
Copy link
Member

yuwata commented May 1, 2024

@richardmaw-codethink Could you test #32619?

bluca pushed a commit that referenced this issue May 2, 2024
…nits

Otherwise, if a service unit that requests LogNamespace= stopped before
systemd-journald@.service is started, logs generated by the service will be
lost, as systemd-journald@.socket is stopped and
systemd-journald@.service will never started.

To prevent the issue, let's introduce another implicit dependency to
a oneshot service that explicitly synchronizes a namespaced journal file
when the log namespace is not needed anymore.

Fixes #32604.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Programming errors, that need preferential fixing journal pid1
Development

Successfully merging a pull request may close this issue.

2 participants