cgroup-util: make sure cg_read_pid() can deal sanely with unmapped PIDs (as in foreign pidns) #32534

Merged
merged 1 commit into from May 14, 2024


BtbN
Contributor

@BtbN BtbN commented Apr 28, 2024

In some environments, namely WSL, the cgroup.procs PID list for some reason contains a ton of zeros.
My suspicion is that those are from other instances under the same WSL Kernel, which at least always hosts the system instance with the X/Wayland/PA/Pipe server, so there is a bunch of zeros to be had.

Without this patch, whenever cg_read_pid encounters such a zero, it throws an error. This makes systemd near unusable inside of WSL.
Just skipping over any zeros in those lists makes systemd run without any issues for me.

On normal systems, where the list does not contain any zeros to begin with, this has no adverse effects.
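
A minimal sketch of the "skip the zeros" idea described above. It assumes a cgroup.procs-style stream with one decimal PID per line. The helper name `read_pid_skip_zeros` and its return convention are hypothetical and are not systemd's actual cg_read_pid():

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper, not systemd's cg_read_pid(): reads the next PID from
 * a cgroup.procs-style stream and silently skips "0" entries.
 * Returns 1 if a PID was stored in *ret, 0 on EOF, negative errno on error. */
static int read_pid_skip_zeros(FILE *f, pid_t *ret) {
        unsigned long ul;

        assert(f);
        assert(ret);

        for (;;) {
                errno = 0;
                if (fscanf(f, "%lu", &ul) != 1) {
                        if (feof(f))
                                return 0;
                        /* Parse or read failure: report errno, or EIO if unset. */
                        return errno > 0 ? -errno : -EIO;
                }

                if (ul == 0)
                        continue; /* entry that shows up as zero: skip it */

                *ret = (pid_t) ul;
                return 1;
        }
}
```

On a list without zeros the loop body runs exactly once per call, so behaviour there is unchanged.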

See also:
microsoft/WSL#8879

@github-actions github-actions bot added the util-lib and please-review labels Apr 28, 2024
Member

@YHNdnzj YHNdnzj left a comment

AFAICS WSL1 still uses cgroups v1? Support for that is already obsolete in systemd v256 onwards, and WSL1 is not the development target of systemd. Sorry.

See also: #29512 (comment). I think the same argument applies to the cgroupfs implementation.

@BtbN
Contributor Author

BtbN commented Apr 28, 2024

I'm not sure how WSL1 is involved in this, I haven't used that in ages.
This is an issue on the latest WSL.

@BtbN
Contributor Author

BtbN commented Apr 28, 2024

Just checked, latest WSL boots with cgroup_no_v1=all, so it's full cgroup v2.
The layout in /sys/fs/cgroup also looks correct for that to me.

@poettering
Member

where do these zeros come from? What precisely is WSL doing there?

Are these simply PIDs that live in a separate pidns that we cannot map?

before we add any code around this I'd really prefer to understand what's going on here.

@poettering
Member

I am not sure cg_read_pid() just skipping over these processes is really the right approach. For various purposes (e.g. "is this cgroup empty?") such logic would simply be wrong.

@BtbN
Contributor Author

BtbN commented Apr 29, 2024

I'm not sure where they come from. I can only speculate that they are processes from the WSL system distribution, that hosts the X server and friends.

Looks something like this:

$ cat /sys/fs/cgroup/cgroup.procs
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
5
8
9
10
0
0
24
0
49

I'm not sure if there is any other sane way to deal with those zeros than to ignore them.
A bit more involved of an approach would be to have cg_read_pid() return those zeros without error, and then have the code in various places further down the line deal with it.

Another approach, which I originally implemented and have been using for a while, specifically modifies append_cgroup() in src/core/dbus-unit.c to continue on EIO from cg_read_pid(ref).
This also seems to resolve most issues. At the very least it gets systemctl to work again and not bail with EIO every time.
But it's quite ugly to just ignore EIO and press on.

@trallnag

Comments in microsoft/WSL#8879 imply that WSL is the problem:

The WSL handling of cgroups: it's possible that systemd shouldn't observe zeros in its root cgroup. The systemd container interface suggests that systemd should be running in a new cgroup, but I don't know if that is required. [...]

source

[...] the same root cause @nullpo-head figured out for the problem when it was reported under distrod (nullpo-head/wsl-distrod#31 (comment)), with the same fix (put systemd in a new cgroup). [...]

source

@BtbN
Contributor Author

BtbN commented Apr 29, 2024

That would have to be fixed from Microsoft's side then, wouldn't it? Given it's WSL itself that's launching systemd here.
The issue was reported to them in 2022, and so far no sign of any interest in fixing it from their side.

@trallnag

That would have to be fixed from Microsoft's side then, wouldn't it?

If the suspicions prove correct, yes

@BtbN
Contributor Author

BtbN commented Apr 29, 2024

I tried replacing /sbin/init with a small shell script that runs exec /usr/bin/unshare -C /lib/systemd/systemd, in an attempt to spawn systemd inside of a new cgroup namespace.
However, that seems to have no effect whatsoever. There is still a bunch of zeros in the top level cgroup.procs.

@poettering
Member

poettering commented Apr 30, 2024

so i am pretty sure systemd should be fixed to be fine with processes with unmappable pids (which is what those zero PIDs are).

But skipping over them generically as in the proposed patch is problematic. As mentioned, for code that checks whether a cgroup is empty, it is very much relevant to know that there is a process, even if its pid cannot be mapped locally. I do have the suspicion that most of the time we want to skip those processes, just not always.

hence, I figure cg_read_pid() should be changed to take a flags param or so, with a flag CG_PID_SKIP_UNMAPPED or so, which must be explicitly specified to skip the unmapped processes as if they didn't exist, and then every single call site has to be looked at to decide if we should skip or not.

messy, and involved, but I think that's the only right fix.
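
The opt-in flag proposed above could look roughly like the sketch below. Only the flag name CG_PID_SKIP_UNMAPPED comes from the comment. Its value, the helper names `read_pid_flagged` and `count_entries`, and the choice to hand unmapped entries back as PID 0 are assumptions made for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Flag name taken from the comment above; the value is an assumption. */
enum {
        CG_PID_SKIP_UNMAPPED = 1 << 0,
};

/* Hypothetical reader: returns 1 and stores a PID in *ret (0 meaning
 * "unmapped"), 0 on EOF, or a negative errno on error. */
static int read_pid_flagged(FILE *f, unsigned flags, pid_t *ret) {
        unsigned long ul;

        assert(f);
        assert(ret);

        for (;;) {
                errno = 0;
                if (fscanf(f, "%lu", &ul) != 1) {
                        if (feof(f))
                                return 0;
                        return errno > 0 ? -errno : -EIO;
                }

                /* Only callers that explicitly opted in never see unmapped entries. */
                if (ul == 0 && (flags & CG_PID_SKIP_UNMAPPED))
                        continue;

                *ret = (pid_t) ul;
                return 1;
        }
}

/* Counts the entries a caller with the given flags would see;
 * returns a negative errno on error. */
static int count_entries(FILE *f, unsigned flags) {
        pid_t pid;
        int r, n = 0;

        while ((r = read_pid_flagged(f, flags, &pid)) > 0)
                n++;

        return r < 0 ? r : n;
}
```

With this shape every call site has to state whether unmapped processes matter to it, which is the per-caller audit asked for above.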

@poettering poettering added the reviewed/needs-rework 🔨 and cgroups labels and removed the please-review label Apr 30, 2024
@poettering poettering changed the title from "cgroup-util: make cg_read_pid skip zeros" to "cgroup-util: make sure cg_read_pid() can deal sanely with unmapped PIDs (ans in foreign pidns)" Apr 30, 2024
@BtbN
Contributor Author

BtbN commented Apr 30, 2024

There's not thaaat many callers from a quick glance, I'll have a look at it.

@github-actions github-actions bot added the cgtop and please-review labels and removed the reviewed/needs-rework 🔨 label Apr 30, 2024
@BtbN
Contributor Author

BtbN commented May 1, 2024

I've turned the flag around, since for the majority of callers, ignoring the zero-pids is the desired mode of operation.
I only found one single place where they are needed, and that's indeed in the empty-check function.
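
A rough sketch of the inverted polarity described above: skipping is the default, and only the emptiness check opts out. The flag name CGROUP_DONT_SKIP_UNMAPPED and its value are as quoted further down in this thread. The functions `read_pid` and `procs_is_empty` are simplified, hypothetical stand-ins and not the merged systemd code:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Flag name and value as quoted later in this thread. */
enum {
        CGROUP_DONT_SKIP_UNMAPPED = 1 << 3,
};

/* Hypothetical reader: skips zero entries unless the caller asks to keep them.
 * Returns 1 with a PID in *ret, 0 on EOF, negative errno on error. */
static int read_pid(FILE *f, pid_t *ret, unsigned flags) {
        unsigned long ul;

        assert(f);
        assert(ret);

        for (;;) {
                errno = 0;
                if (fscanf(f, "%lu", &ul) != 1) {
                        if (feof(f))
                                return 0;
                        return errno > 0 ? -errno : -EIO;
                }

                if (ul == 0 && !(flags & CGROUP_DONT_SKIP_UNMAPPED))
                        continue;

                *ret = (pid_t) ul;
                return 1;
        }
}

/* Hypothetical emptiness check: an unmapped process still counts as a process.
 * Returns 1 if nothing is listed, 0 if something is, negative errno on error. */
static int procs_is_empty(FILE *f) {
        pid_t pid;
        int r;

        r = read_pid(f, &pid, CGROUP_DONT_SKIP_UNMAPPED);
        if (r < 0)
                return r;

        return r == 0;
}
```

A list holding only zeros is therefore reported as non-empty, while a default caller passing no flags would iterate over nothing.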

Not sure what's up with the failing/stuck autopkgtests, I've trouble even identifying what the error is, and it looks unrelated.

Review thread on src/basic/cgroup-util.c (outdated, resolved)
Review thread on src/basic/cgroup-util.h (outdated, resolved)
Review thread on src/basic/cgroup-util.c (outdated, resolved)
Review thread on src/cgtop/cgtop.c (outdated, resolved)
Review thread on src/shared/cgroup-setup.c (outdated, resolved)
@yuwata yuwata added the reviewed/needs-rework 🔨 label and removed the please-review label May 1, 2024
Review thread on src/basic/cgroup-util.h (outdated, resolved)
@github-actions github-actions bot added the please-review label May 1, 2024
@BtbN BtbN force-pushed the wsl-fix branch 2 times, most recently from c6943a3 to 9d8f5e8 Compare May 9, 2024 10:19
Review thread on src/basic/cgroup-util.c (outdated, resolved)
Review thread on src/shared/cgroup-setup.c (outdated, resolved)
Review thread on src/cgtop/cgtop.c (outdated, resolved)
@BtbN
Contributor Author

BtbN commented May 9, 2024 via email

Member

@YHNdnzj YHNdnzj left a comment

LGTM. I do think this is the better approach, due to the aforementioned reasons. @keszybz please take a final look.

@YHNdnzj YHNdnzj added the good-to-merge/waiting-for-ci 👍 label and removed the please-review label May 10, 2024
@bluca
Member

bluca commented May 10, 2024

Is this a known issue @mrc0mmand ?

16:02:05 Core was generated by `/usr/lib/systemd/tests/unit-tests/test-bus-watch-bind'.
16:02:05 Program terminated with signal SIGABRT, Aborted.
16:02:05 #0  0x000074930b0a8e44 in ?? () from /usr/lib/libc.so.6
16:02:05 [Current thread is 1 (Thread 0x749309e006c0 (LWP 1874))]
16:02:05 (gdb) Load new symbol table from "/systemd-meson-build/test-bus-watch-bind"? (y or n) [answered Y; input not from terminal]
16:02:05 Reading symbols from /systemd-meson-build/test-bus-watch-bind...
16:02:05 (gdb) 
16:02:05 Thread 4 (Thread 0x74930b64b940 (LWP 1868)):
16:02:05 #0  0x000074930b0a34e9 in ?? () from /usr/lib/libc.so.6
16:02:05 #1  0x000074930b0a8bf3 in ?? () from /usr/lib/libc.so.6
16:02:05 #2  0x0000568edb5dea61 in main (argc=<optimized out>, argv=<optimized out>) at ../build/src/libsystemd/sd-bus/test-bus-watch-bind.c:220
16:02:05 
16:02:05 Thread 3 (Thread 0x7493094006c0 (LWP 1875)):
16:02:05 #0  0x000074930b12c47b in sendmsg () from /usr/lib/libc.so.6
16:02:05 #1  0x000074930b4240dd in write_to_journal (level=level@entry=31, error=error@entry=-2, file=file@entry=0x74930b537199 "src/libsystemd/sd-bus/sd-bus.c", line=line@entry=3675, func=func@entry=0x74930b57fc00 <__func__.37> "io_callback", object_field=object_field@entry=0x0, object=<optimized out>, extra_field=<optimized out>, extra=<optimized out>, buffer=<optimized out>) at ../build/src/basic/log.c:767
16:02:05 #2  0x000074930b423555 in log_dispatch_internal (level=31, level@entry=7, error=error@entry=-2, file=file@entry=0x74930b537199 "src/libsystemd/sd-bus/sd-bus.c", line=line@entry=3675, func=func@entry=0x74930b57fc00 <__func__.37> "io_callback", object_field=object_field@entry=0x0, object=0x0, extra_field=0x0, extra=0x0, buffer=0x7493093ff1b0 "Processing of bus failed, closing down: No such file or directory") at ../build/src/basic/log.c:813
16:02:05 #3  0x000074930b4238a9 in log_internalv (level=7, error=-2, file=0x74930b537199 "src/libsystemd/sd-bus/sd-bus.c", line=3675, func=0x74930b57fc00 <__func__.37> "io_callback", format=0x74930b537828 "Processing of bus failed, closing down: %m", ap=0x7493093ffa10) at ../build/src/basic/log.c:890
16:02:05 #4  0x000074930b423944 in log_internal (level=level@entry=7, error=error@entry=-2, file=file@entry=0x74930b537199 "src/libsystemd/sd-bus/sd-bus.c", line=line@entry=3675, func=func@entry=0x74930b57fc00 <__func__.37> "io_callback", format=format@entry=0x74930b537828 "Processing of bus failed, closing down: %m") at ../build/src/basic/log.c:905
16:02:05 #5  0x000074930b49bbee in io_callback (s=<optimized out>, fd=<optimized out>, revents=<optimized out>, userdata=0x749304000e60) at ../build/src/libsystemd/sd-bus/sd-bus.c:3675
16:02:05 #6  0x000074930b4fb58e in source_dispatch (s=s@entry=0x749304001ec0) at ../build/src/libsystemd/sd-event/sd-event.c:4222
16:02:05 #7  0x000074930b4fbb3c in sd_event_dispatch (e=e@entry=0x749304000b70) at ../build/src/libsystemd/sd-event/sd-event.c:4843
16:02:05 #8  0x000074930b4fbd73 in sd_event_run (e=e@entry=0x749304000b70, timeout=timeout@entry=18446744073709551615) at ../build/src/libsystemd/sd-event/sd-event.c:4904
16:02:05 #9  0x000074930b4fbdf0 in sd_event_loop (e=0x749304000b70) at ../build/src/libsystemd/sd-event/sd-event.c:4926
16:02:05 #10 0x0000568edb5dd761 in thread_client2 (p=<optimized out>) at ../build/src/libsystemd/sd-bus/test-bus-watch-bind.c:181
16:02:05 #11 0x000074930b0a6ded in ?? () from /usr/lib/libc.so.6
16:02:05 #12 0x000074930b12a0dc in ?? () from /usr/lib/libc.so.6
16:02:05 
16:02:05 Thread 2 (Thread 0x74930a8006c0 (LWP 1873)):
16:02:05 #0  0x000074930b0f2f43 in clock_nanosleep () from /usr/lib/libc.so.6
16:02:05 #1  0x0000568edb5ddfb3 in usleep_safe (usec=usec@entry=100000) at ../build/src/basic/time-util.h:228
16:02:05 #2  0x0000568edb5de0e8 in thread_server (p=0x7ffeef516a40) at ../build/src/libsystemd/sd-bus/test-bus-watch-bind.c:63
16:02:05 #3  0x000074930b0a6ded in ?? () from /usr/lib/libc.so.6
16:02:05 #4  0x000074930b12a0dc in ?? () from /usr/lib/libc.so.6
16:02:05 
16:02:05 Thread 1 (Thread 0x749309e006c0 (LWP 1874)):
16:02:05 #0  0x000074930b0a8e44 in ?? () from /usr/lib/libc.so.6
16:02:05 #1  0x000074930b050a30 in raise () from /usr/lib/libc.so.6
16:02:05 #2  0x000074930b0384c3 in abort () from /usr/lib/libc.so.6
16:02:05 #3  0x000074930b423a9e in log_assert_failed (text=text@entry=0x568edb5df7a6 "r >= 0", file=file@entry=0x568edb5df011 "src/libsystemd/sd-bus/test-bus-watch-bind.c", line=line@entry=149, func=func@entry=0x568edb5df910 <__func__.4> "thread_client1") at ../build/src/basic/log.c:992
16:02:05 #4  0x0000568edb5dddba in thread_client1 (p=<optimized out>) at ../build/src/libsystemd/sd-bus/test-bus-watch-bind.c:149
16:02:05 #5  0x000074930b0a6ded in ?? () from /usr/lib/libc.so.6
16:02:05 #6  0x000074930b12a0dc in ?? () from /usr/lib/libc.so.6
16:02:05 (gdb) (gdb) #0  0x000074930b0a8e44 in ?? () from /usr/lib/libc.so.6
16:02:05 No symbol table info available.
16:02:05 #1  0x000074930b050a30 in raise () from /usr/lib/libc.so.6
16:02:05 No symbol table info available.
16:02:05 #2  0x000074930b0384c3 in abort () from /usr/lib/libc.so.6
16:02:05 No symbol table info available.
16:02:05 #3  0x000074930b423a9e in log_assert_failed (
16:02:05     text=text@entry=0x568edb5df7a6 "r >= 0", 
16:02:05     file=file@entry=0x568edb5df011 "src/libsystemd/sd-bus/test-bus-watch-bind.c", line=line@entry=149, 
16:02:05     func=func@entry=0x568edb5df910 <__func__.4> "thread_client1")
16:02:05     at ../build/src/basic/log.c:992
16:02:05 No locals.
16:02:05 #4  0x0000568edb5dddba in thread_client1 (p=<optimized out>)
16:02:05     at ../build/src/libsystemd/sd-bus/test-bus-watch-bind.c:149
16:02:05         error = {
16:02:05           name = 0x74930b535488 "org.freedesktop.DBus.Error.FileNotFound",
16:02:05           message = 0x74930b1bf173 "No such file or directory",
16:02:05           _need_free = 0
16:02:05         }
16:02:05         bus = 0x7492fc000b70
16:02:05         path = <optimized out>
16:02:05         t = 0x749309dffc10 "unix:path=/dev/shm/systemd-watch-bind-vMfsUe/this/is/a/socket"
16:02:05         r = <optimized out>
16:02:05         __func__ = "thread_client1"
16:02:05 #5  0x000074930b0a6ded in ?? () from /usr/lib/libc.so.6
16:02:05 No symbol table info available.
16:02:05 #6  0x000074930b12a0dc in ?? () from /usr/lib/libc.so.6
16:02:05 No symbol table info available.

@poettering
Member

lgtm.

@yuwata yuwata added the good-to-merge/with-minor-suggestions label and removed the good-to-merge/waiting-for-ci 👍 label May 14, 2024
@YHNdnzj YHNdnzj added the good-to-merge/waiting-for-ci 👍 label and removed the good-to-merge/with-minor-suggestions label May 14, 2024
Member

@keszybz keszybz left a comment

LGTM.

@keszybz keszybz added the ci-failure-appears-unrelated label and removed the good-to-merge/waiting-for-ci 👍 label May 14, 2024
@keszybz
Member

keszybz commented May 14, 2024

testing-farm:fedora-rawhide-x86_64 seems to have failed due to some network problems.
mkosi / ci (debian, testing): systemd:integration-tests / TEST-70-TPM2

Both look unrelated.

@keszybz keszybz merged commit 00f1714 into systemd:main May 14, 2024
40 of 44 checks passed
@Werkov
Contributor

Werkov commented May 14, 2024

Suggestion 1:

#define PID_UNMAPPED 0 // use this macro in situation when checking for external pidns PIDs

Usage of the macro in conditions will be self-documenting without need to comment each of the guards.

Suggestion 2:

-       CGROUP_DONT_SKIP_UNMAPPED = 1 << 3,
+       CGROUP_INCLUDE_UNMAPPED = 1 << 3,

The double negative is confusing.

Suggestion 3:
s/UNMAPPED/NONMAPPED/ -- when I first saw the title of the issue I thought about pids that were mapped but aren't anymore -- these are simply not mapped/visible into the target pidns.

(Believe it or not, when I started typing this comment, the PR was still unmerged ;-)
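
For illustration only, suggestions 1 and 2 above might read like this at a call site. The macro and flag names are the ones suggested in this comment. The function `count_visible` and the idea of working on an already-parsed PID array are assumptions made for the sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Names follow suggestions 1 and 2 above; everything else is illustrative. */
#define PID_UNMAPPED 0

enum {
        CGROUP_INCLUDE_UNMAPPED = 1 << 3,
};

/* Hypothetical helper: counts the entries a caller would see in an
 * already-parsed PID list, depending on whether it asked for unmapped ones. */
static size_t count_visible(const pid_t *pids, size_t n, unsigned flags) {
        size_t visible = 0;

        assert(pids || n == 0);

        for (size_t i = 0; i < n; i++) {
                if (pids[i] == PID_UNMAPPED && !(flags & CGROUP_INCLUDE_UNMAPPED))
                        continue;

                visible++;
        }

        return visible;
}
```

The guard then names what the zero means without needing a comment at each check.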

@YHNdnzj
Member

YHNdnzj commented May 14, 2024

Suggestion 2:

-       CGROUP_DONT_SKIP_UNMAPPED = 1 << 3,
+       CGROUP_INCLUDE_UNMAPPED = 1 << 3,

This is already discussed in #32534 (comment).

@BtbN
Contributor Author

BtbN commented May 14, 2024

Could I PR this patch to the 255 branch of the stable repo? So that Ubuntu 24.04, which ships on WSL with systemd enabled by default, could hopefully pick it up at some point? Or is that process automatic?

@Werkov
Contributor

Werkov commented May 14, 2024

This is already discussed in #32534 (comment).

I didn't notice that -- I believe that proves the strength of confusion it causes. The error is IMO fine since you can't make pidref to a not mapped PID.

(Triple negative 🤯 )

BTW, can this situation (processes from an inaccessible pidns in managed cgroups) arise in any way other than by violating the single-writer rule?

@keszybz
Member

keszybz commented May 14, 2024

Could I PR this patch to the 255 branch of the stable repo? So that Ubuntu 24.04, which ships on WSL with systemd enabled by default, could hopefully pick it up at some point? Or is that process automatic?

Feel free to open a PR. Please use git cherry-pick -x.
