cgroup-util: make sure cg_read_pid() can deal sanely with unmapped PIDs (as in foreign pidns) #32534

BtbN · 2024-04-28T14:37:54Z

In some environments, namely WSL, the cgroup.procs PID list for some reason contains a large number of zeros.
My suspicion is that those are processes from other instances running under the same WSL kernel, which always hosts at least the system instance with the X/Wayland/PA/Pipe server, so there are plenty of zeros to be had.

Without this patch, whenever cg_read_pid() encounters such a zero, it returns -EIO. This makes systemd nearly unusable inside WSL.
Just skipping over any zeros in those lists makes systemd run without any issues for me.

On normal systems, where the list does not contain any zeros to begin with, this has no adverse effects.

See also:
microsoft/WSL#8879
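A minimal sketch of the skip-zeros idea described above, assuming a cgroup.procs-style stream of decimal PIDs, one per line. `read_pid_skip_zero()` is a hypothetical helper written for illustration, not systemd's `cg_read_pid()`; it also assumes `pid_t` is an `int`, as on Linux.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not systemd code): reads the next PID from a
 * cgroup.procs-style stream. Returns 1 and stores the PID in *ret, 0 at
 * end of file, or a negative errno-style value on a parse error.
 * Entries of 0 (processes the kernel cannot map into our PID namespace)
 * are skipped unconditionally, as in the original patch. */
static int read_pid_skip_zero(FILE *f, pid_t *ret) {
        for (;;) {
                unsigned long ul;

                errno = 0;
                if (fscanf(f, "%lu", &ul) != 1) {
                        if (feof(f))
                                return 0;            /* no more entries */
                        return errno > 0 ? -errno : -EIO;
                }

                if (ul == 0)
                        continue;                    /* unmapped: pretend it isn't there */

                if (ul > INT_MAX)                    /* assumes pid_t is int, as on Linux */
                        return -EIO;

                *ret = (pid_t) ul;
                return 1;
        }
}
```

Because the skip is unconditional, any caller using a helper like this to decide whether a cgroup is empty would treat a cgroup containing only unmapped processes as empty.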

YHNdnzj

AFAICS WSL1 still uses cgroups v1? Support for that is already obsolete as of systemd v256, and WSL1 is not the development target of systemd. Sorry.

See also: #29512 (comment). I think the same argument applies to the cgroupfs implementation.

BtbN · 2024-04-28T17:24:08Z

I'm not sure how WSL1 is involved in this, I haven't used that in ages.
This is an issue on the latest WSL.

BtbN · 2024-04-28T17:40:48Z

Just checked, latest WSL boots with cgroup_no_v1=all, so it's full cgroup v2.
The layout in /sys/fs/cgroup also looks correct for that to me.

poettering · 2024-04-29T09:31:18Z

where do these zeros come from? What precisely is WSL doing there?

Are these simply PIDs that live in a separate pidns that we cannot map?

before we add any code around this I'd really prefer to understand what's going on here.

poettering · 2024-04-29T09:39:05Z

I am not sure cg_read_pid() just skipping over these processes is really the right approach. For some purposes (e.g. "is this cgroup empty?") such logic would simply be wrong.
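To make the concern concrete, here is a sketch comparing two hypothetical emptiness checks over a cgroup.procs-style stream (neither is systemd code). The first counts every entry, including 0, as a process; the second skips zeros first and therefore misreports a cgroup that contains only unmapped processes as empty.

```c
#include <stdio.h>

/* Hypothetical check (not systemd code): returns 1 if the stream lists no
 * processes, 0 if it lists at least one, -1 on a parse error. A 0 entry is
 * a real process whose PID is not mapped into our namespace, so it counts. */
static int procs_is_empty(FILE *f) {
        unsigned long ul;
        int n = fscanf(f, "%lu", &ul);

        if (n == 1)
                return 0;
        if (n == EOF && feof(f))
                return 1;
        return -1;
}

/* The same check with zero entries skipped: wrong for zero-only lists. */
static int procs_is_empty_skipping_zeros(FILE *f) {
        unsigned long ul;
        int n;

        while ((n = fscanf(f, "%lu", &ul)) == 1)
                if (ul != 0)
                        return 0;
        if (n == EOF && feof(f))
                return 1;   /* reached even when only unmapped processes exist */
        return -1;
}
```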

BtbN · 2024-04-29T11:08:02Z

I'm not sure where they come from. I can only speculate that they are processes from the WSL system distribution, that hosts the X server and friends.

Looks something like this:

$ cat /sys/fs/cgroup/cgroup.procs
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
5
8
9
10
0
0
24
0
49

I'm not sure there is any sane way to deal with those zeros other than ignoring them.
A somewhat more involved approach would be to have cg_read_pid() return those zeros without error, and then have the code in various places further down the line deal with them.

Another approach, which I originally implemented and have been using for a while, specifically modifies append_cgroup() in src/core/dbus-unit.c to continue on EIO from cg_read_pid(ref).
This also seems to resolve most issues. At the very least it gets systemctl working again instead of bailing out with EIO every time.
But it's quite ugly to just ignore EIO and press on.

trallnag · 2024-04-29T16:30:13Z

Comments in microsoft/WSL#8879 imply that WSL is the problem:

The WSL handling of cgroups: it's possible that systemd shouldn't observe zeros in its root cgroup. The systemd container interface suggests that systemd should be running in a new cgroup, but I don't know if that is required. [...]

source

[...] the same root cause @nullpo-head figured out for the problem when it was reported under distrod (nullpo-head/wsl-distrod#31 (comment)), with the same fix (put systemd in a new cgroup). [...]

source

BtbN · 2024-04-29T17:11:58Z

That would have to be fixed from Microsoft's side then, wouldn't it? Given it's WSL itself that's launching systemd here.
The issue was reported to them in 2022, and so far there has been no sign of interest in fixing it on their side.

trallnag · 2024-04-29T17:26:32Z

That would have to be fixed from Microsoft's side then, wouldn't it?

If the suspicions prove correct, yes

BtbN · 2024-04-29T18:35:23Z

I tried replacing /sbin/init with a small shellscript that runs exec /usr/bin/unshare -C /lib/systemd/systemd, in an attempt to spawn systemd inside of a new cgroup namespace.
However, that seems to have no effect whatsoever. There is still a bunch of zeros in the top level cgroup.procs.

poettering · 2024-04-30T11:52:38Z

So I am pretty sure systemd should be fixed to be fine with processes with unmappable PIDs (which is what those zero PIDs are).

But skipping over them generically as in the proposed patch is problematic. As mentioned for code that checks if a cgroup is empty it is very much relevant to know that there is a process even if its pid cannot be mapped locally. I do have the suspicion that most of the time we want to skip those processes, but just not always.

hence, I figure cg_read_pid() should be changed to take a flags param or so, with a flag CG_PID_SKIP_UNMAPPED or so, which must be explicitly specified to skip the unmapped processes as if they didn't exist, and then every single call site has to be looked at to decide if we should skip or not.

messy, and involved, but I think that's the only right fix.

BtbN · 2024-04-30T11:54:59Z

There's not thaaat many callers from a quick glance, I'll have a look at it.

BtbN · 2024-05-01T12:37:29Z

I've turned the flag around, since for the majority of callers, ignoring the zero-pids is the desired mode of operation.
I only found one single place where they are needed, and that's indeed in the empty-check function.

Not sure what's up with the failing/stuck autopkgtests; I have trouble even identifying what the error is, and it looks unrelated.
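A sketch of the flag-based design as it ended up in this thread: skipping unmapped PIDs is the default, and a flag opts back in so that the emptiness check can still see them. The names `ReadPidFlags`, `READ_PID_DONT_SKIP_UNMAPPED`, `read_pid()` and `procs_is_empty()` are hypothetical, loosely modeled on the `CGROUP_DONT_SKIP_UNMAPPED` flag named later in the thread; this is not the merged systemd code, and it assumes `pid_t` is an `int`, as on Linux.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical flag set, modeled on (but not identical to) systemd's. */
typedef enum ReadPidFlags {
        READ_PID_DONT_SKIP_UNMAPPED = 1 << 0,  /* report unmapped processes as PID 0 */
} ReadPidFlags;

/* Returns 1 with *ret set, 0 at end of file, or a negative errno-style
 * value on error. By default, 0 entries (unmapped PIDs) are skipped. */
static int read_pid(FILE *f, pid_t *ret, ReadPidFlags flags) {
        for (;;) {
                unsigned long ul;

                errno = 0;
                if (fscanf(f, "%lu", &ul) != 1) {
                        if (feof(f))
                                return 0;
                        return errno > 0 ? -errno : -EIO;
                }

                if (ul > INT_MAX)   /* assumes pid_t is int, as on Linux */
                        return -EIO;

                if (ul == 0 && !(flags & READ_PID_DONT_SKIP_UNMAPPED))
                        continue;   /* most callers: act as if it didn't exist */

                *ret = (pid_t) ul;  /* may be 0 when the flag is set */
                return 1;
        }
}

/* The one caller that must not skip: a process we cannot map still makes
 * the cgroup non-empty. Returns 1 if empty, 0 if not, negative on error. */
static int procs_is_empty(FILE *f) {
        pid_t pid;
        int r = read_pid(f, &pid, READ_PID_DONT_SKIP_UNMAPPED);

        if (r < 0)
                return r;
        return r == 0;
}
```

Inverting the flag keeps the common call sites short (`read_pid(f, &pid, 0)`) and confines the special case to the caller that actually needs it.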

BtbN · 2024-05-09T14:18:29Z

On 09.05.2024 12:38, Mike Yuan wrote:
> In src/cgtop/cgtop.c <#32534 (comment)>:
> @@ -207,7 +207,7 @@ static int process(
>                  return r;
>          g->n_tasks = 0;
> -        while (cg_read_pid(f, &pid) > 0) {
> +        while (cg_read_pid(f, &pid, /* flags = */ 0) > 0) {
>
> Hmm, this is changed back? It should still set CGROUP_INCLUDE_UNMAPPABLE_PID?

I don't think it was ever set in this version of the patch set; fixed now.

YHNdnzj

LGTM. I do think this is the better approach, due to the aforementioned reasons. @keszybz please take a final look.

bluca · 2024-05-10T12:30:39Z

Is this a known issue @mrc0mmand ?

16:02:05 Core was generated by `/usr/lib/systemd/tests/unit-tests/test-bus-watch-bind'.
16:02:05 Program terminated with signal SIGABRT, Aborted.
16:02:05 #0  0x000074930b0a8e44 in ?? () from /usr/lib/libc.so.6
16:02:05 [Current thread is 1 (Thread 0x749309e006c0 (LWP 1874))]
16:02:05 (gdb) Load new symbol table from "/systemd-meson-build/test-bus-watch-bind"? (y or n) [answered Y; input not from terminal]
16:02:05 Reading symbols from /systemd-meson-build/test-bus-watch-bind...
16:02:05 (gdb) 
16:02:05 Thread 4 (Thread 0x74930b64b940 (LWP 1868)):
16:02:05 #0  0x000074930b0a34e9 in ?? () from /usr/lib/libc.so.6
16:02:05 #1  0x000074930b0a8bf3 in ?? () from /usr/lib/libc.so.6
16:02:05 #2  0x0000568edb5dea61 in main (argc=<optimized out>, argv=<optimized out>) at ../build/src/libsystemd/sd-bus/test-bus-watch-bind.c:220
16:02:05 
16:02:05 Thread 3 (Thread 0x7493094006c0 (LWP 1875)):
16:02:05 #0  0x000074930b12c47b in sendmsg () from /usr/lib/libc.so.6
16:02:05 #1  0x000074930b4240dd in write_to_journal (level=level@entry=31, error=error@entry=-2, file=file@entry=0x74930b537199 "src/libsystemd/sd-bus/sd-bus.c", line=line@entry=3675, func=func@entry=0x74930b57fc00 <__func__.37> "io_callback", object_field=object_field@entry=0x0, object=<optimized out>, extra_field=<optimized out>, extra=<optimized out>, buffer=<optimized out>) at ../build/src/basic/log.c:767
16:02:05 #2  0x000074930b423555 in log_dispatch_internal (level=31, level@entry=7, error=error@entry=-2, file=file@entry=0x74930b537199 "src/libsystemd/sd-bus/sd-bus.c", line=line@entry=3675, func=func@entry=0x74930b57fc00 <__func__.37> "io_callback", object_field=object_field@entry=0x0, object=0x0, extra_field=0x0, extra=0x0, buffer=0x7493093ff1b0 "Processing of bus failed, closing down: No such file or directory") at ../build/src/basic/log.c:813
16:02:05 #3  0x000074930b4238a9 in log_internalv (level=7, error=-2, file=0x74930b537199 "src/libsystemd/sd-bus/sd-bus.c", line=3675, func=0x74930b57fc00 <__func__.37> "io_callback", format=0x74930b537828 "Processing of bus failed, closing down: %m", ap=0x7493093ffa10) at ../build/src/basic/log.c:890
16:02:05 #4  0x000074930b423944 in log_internal (level=level@entry=7, error=error@entry=-2, file=file@entry=0x74930b537199 "src/libsystemd/sd-bus/sd-bus.c", line=line@entry=3675, func=func@entry=0x74930b57fc00 <__func__.37> "io_callback", format=format@entry=0x74930b537828 "Processing of bus failed, closing down: %m") at ../build/src/basic/log.c:905
16:02:05 #5  0x000074930b49bbee in io_callback (s=<optimized out>, fd=<optimized out>, revents=<optimized out>, userdata=0x749304000e60) at ../build/src/libsystemd/sd-bus/sd-bus.c:3675
16:02:05 #6  0x000074930b4fb58e in source_dispatch (s=s@entry=0x749304001ec0) at ../build/src/libsystemd/sd-event/sd-event.c:4222
16:02:05 #7  0x000074930b4fbb3c in sd_event_dispatch (e=e@entry=0x749304000b70) at ../build/src/libsystemd/sd-event/sd-event.c:4843
16:02:05 #8  0x000074930b4fbd73 in sd_event_run (e=e@entry=0x749304000b70, timeout=timeout@entry=18446744073709551615) at ../build/src/libsystemd/sd-event/sd-event.c:4904
16:02:05 #9  0x000074930b4fbdf0 in sd_event_loop (e=0x749304000b70) at ../build/src/libsystemd/sd-event/sd-event.c:4926
16:02:05 #10 0x0000568edb5dd761 in thread_client2 (p=<optimized out>) at ../build/src/libsystemd/sd-bus/test-bus-watch-bind.c:181
16:02:05 #11 0x000074930b0a6ded in ?? () from /usr/lib/libc.so.6
16:02:05 #12 0x000074930b12a0dc in ?? () from /usr/lib/libc.so.6
16:02:05 
16:02:05 Thread 2 (Thread 0x74930a8006c0 (LWP 1873)):
16:02:05 #0  0x000074930b0f2f43 in clock_nanosleep () from /usr/lib/libc.so.6
16:02:05 #1  0x0000568edb5ddfb3 in usleep_safe (usec=usec@entry=100000) at ../build/src/basic/time-util.h:228
16:02:05 #2  0x0000568edb5de0e8 in thread_server (p=0x7ffeef516a40) at ../build/src/libsystemd/sd-bus/test-bus-watch-bind.c:63
16:02:05 #3  0x000074930b0a6ded in ?? () from /usr/lib/libc.so.6
16:02:05 #4  0x000074930b12a0dc in ?? () from /usr/lib/libc.so.6
16:02:05 
16:02:05 Thread 1 (Thread 0x749309e006c0 (LWP 1874)):
16:02:05 #0  0x000074930b0a8e44 in ?? () from /usr/lib/libc.so.6
16:02:05 #1  0x000074930b050a30 in raise () from /usr/lib/libc.so.6
16:02:05 #2  0x000074930b0384c3 in abort () from /usr/lib/libc.so.6
16:02:05 #3  0x000074930b423a9e in log_assert_failed (text=text@entry=0x568edb5df7a6 "r >= 0", file=file@entry=0x568edb5df011 "src/libsystemd/sd-bus/test-bus-watch-bind.c", line=line@entry=149, func=func@entry=0x568edb5df910 <__func__.4> "thread_client1") at ../build/src/basic/log.c:992
16:02:05 #4  0x0000568edb5dddba in thread_client1 (p=<optimized out>) at ../build/src/libsystemd/sd-bus/test-bus-watch-bind.c:149
16:02:05 #5  0x000074930b0a6ded in ?? () from /usr/lib/libc.so.6
16:02:05 #6  0x000074930b12a0dc in ?? () from /usr/lib/libc.so.6
16:02:05 (gdb) (gdb) #0  0x000074930b0a8e44 in ?? () from /usr/lib/libc.so.6
16:02:05 No symbol table info available.
16:02:05 #1  0x000074930b050a30 in raise () from /usr/lib/libc.so.6
16:02:05 No symbol table info available.
16:02:05 #2  0x000074930b0384c3 in abort () from /usr/lib/libc.so.6
16:02:05 No symbol table info available.
16:02:05 #3  0x000074930b423a9e in log_assert_failed (
16:02:05     text=text@entry=0x568edb5df7a6 "r >= 0", 
16:02:05     file=file@entry=0x568edb5df011 "src/libsystemd/sd-bus/test-bus-watch-bind.c", line=line@entry=149, 
16:02:05     func=func@entry=0x568edb5df910 <__func__.4> "thread_client1")
16:02:05     at ../build/src/basic/log.c:992
16:02:05 No locals.
16:02:05 #4  0x0000568edb5dddba in thread_client1 (p=<optimized out>)
16:02:05     at ../build/src/libsystemd/sd-bus/test-bus-watch-bind.c:149
16:02:05         error = {
16:02:05           name = 0x74930b535488 "org.freedesktop.DBus.Error.FileNotFound",
16:02:05           message = 0x74930b1bf173 "No such file or directory",
16:02:05           _need_free = 0
16:02:05         }
16:02:05         bus = 0x7492fc000b70
16:02:05         path = <optimized out>
16:02:05         t = 0x749309dffc10 "unix:path=/dev/shm/systemd-watch-bind-vMfsUe/this/is/a/socket"
16:02:05         r = <optimized out>
16:02:05         __func__ = "thread_client1"
16:02:05 #5  0x000074930b0a6ded in ?? () from /usr/lib/libc.so.6
16:02:05 No symbol table info available.
16:02:05 #6  0x000074930b12a0dc in ?? () from /usr/lib/libc.so.6
16:02:05 No symbol table info available.

poettering · 2024-05-10T14:40:54Z

lgtm.

keszybz

LGTM.

keszybz · 2024-05-14T14:12:11Z

testing-farm:fedora-rawhide-x86_64 seems to have failed due to some network problems.
mkosi / ci (debian, testing): systemd:integration-tests / TEST-70-TPM2

Both look unrelated.

Werkov · 2024-05-14T14:17:34Z

Suggestion 1:

#define PID_UNMAPPED 0 // use this macro in situations where you check for PIDs from an external pidns

Usage of the macro in conditions will be self-documenting without need to comment each of the guards.
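A small sketch of what the suggested macro buys at a call site. `count_unmapped()` is a hypothetical helper written for illustration; `PID_UNMAPPED` is defined exactly as in the suggestion.

```c
#include <stddef.h>
#include <sys/types.h>

/* As suggested: the value the kernel reports in cgroup.procs for a process
 * whose PID cannot be mapped into the reader's PID namespace. */
#define PID_UNMAPPED 0

/* Hypothetical helper: counts the unmapped entries in a PID list. */
static size_t count_unmapped(const pid_t *pids, size_t n) {
        size_t k = 0;

        for (size_t i = 0; i < n; i++)
                if (pids[i] == PID_UNMAPPED)  /* reads as "is this PID unmapped?" */
                        k++;

        return k;
}
```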

Suggestion 2:

-       CGROUP_DONT_SKIP_UNMAPPED = 1 << 3,
+       CGROUP_INCLUDE_UNMAPPED = 1 << 3,

The double negative is confusing.

Suggestion 3:
s/UNMAPPED/NONMAPPED/: when I first saw the title of the issue I thought of PIDs that were mapped but aren't anymore; these are simply not mapped/visible in the target pidns.

(Believe it or not, when I started typing this comment, the PR was still unmerged ;-)

YHNdnzj · 2024-05-14T14:19:29Z

Suggestion 2:

-       CGROUP_DONT_SKIP_UNMAPPED = 1 << 3,
+       CGROUP_INCLUDE_UNMAPPED = 1 << 3,

This is already discussed in #32534 (comment).

BtbN · 2024-05-14T14:26:13Z

Could I PR this patch to the 255 branch of the stable repo? So that Ubuntu 24.04, which ships on WSL with systemd enabled by default, could hopefully pick it up at some point? Or is that process automatic?

Werkov · 2024-05-14T14:43:54Z

This is already discussed in #32534 (comment).

I didn't notice that; I believe that proves how confusing it is. The error is IMO fine, since you can't make a pidref to a non-mapped PID.

(Triple negative 🤯 )

BTW, can this situation (processes from an inaccessible pidns in managed cgroups) arise in any way other than by violating the single-writer rule?

keszybz · 2024-05-14T15:32:56Z

Could I PR this patch to the 255 branch of the stable repo? So that Ubuntu 24.04, which ships on WSL with systemd enabled by default, could hopefully pick it up at some point? Or is that process automatic?

Feel free to open a PR. Please use git cherry-pick -x.

github-actions bot added the util-lib and please-review labels Apr 28, 2024

YHNdnzj reviewed Apr 28, 2024

poettering added the reviewed/needs-rework 🔨 and cgroups labels and removed the please-review label Apr 30, 2024

poettering changed the title from "cgroup-util: make cg_read_pid skip zeros" to "cgroup-util: make sure cg_read_pid() can deal sanely with unmapped PIDs (ans in foreign pidns)" Apr 30, 2024

BtbN force-pushed the wsl-fix branch from ce2ac5d to 822e76b April 30, 2024 23:37

github-actions bot added the cgtop and please-review labels and removed the reviewed/needs-rework 🔨 label Apr 30, 2024

YHNdnzj reviewed May 1, 2024

yuwata requested changes May 1, 2024

yuwata added the reviewed/needs-rework 🔨 label and removed the please-review label May 1, 2024

yuwata reviewed May 1, 2024

BtbN force-pushed the wsl-fix branch from 822e76b to 598e395 May 1, 2024 15:46

github-actions bot added the please-review label May 1, 2024

BtbN force-pushed the wsl-fix branch 2 times, most recently from c6943a3 to 9d8f5e8 May 9, 2024 10:19

YHNdnzj reviewed May 9, 2024

BtbN force-pushed the wsl-fix branch from 9d8f5e8 to 59f07ea May 9, 2024 14:15

YHNdnzj approved these changes May 10, 2024

YHNdnzj added the good-to-merge/waiting-for-ci 👍 label and removed the please-review label May 10, 2024

yuwata reviewed May 13, 2024

yuwata added the good-to-merge/with-minor-suggestions label and removed the good-to-merge/waiting-for-ci 👍 label May 14, 2024

BtbN force-pushed the wsl-fix branch from 59f07ea to 41219b4 May 14, 2024 12:23

cgroup-util: allow cg_read_pid() to skip unmapped (zero) pids

41219b4

YHNdnzj added the good-to-merge/waiting-for-ci 👍 label and removed the good-to-merge/with-minor-suggestions label May 14, 2024

keszybz approved these changes May 14, 2024

keszybz added the ci-failure-appears-unrelated label and removed the good-to-merge/waiting-for-ci 👍 label May 14, 2024

keszybz merged commit 00f1714 into systemd:main May 14, 2024
40 of 44 checks passed

BtbN mentioned this pull request May 14, 2024

cgroup-util: allow cg_read_pid() to skip unmapped (zero) pids systemd/systemd-stable#401

Merged
