Flake: TestAppUserGroup #3477
The failure:
is due to systemd failing with |
This may be fixed by bumping systemd to 232. |
I did some cross-checking between test history, git log and dates, and I have some doubts related to #3432. In particular, I fear that as a side effect of 3c93df9#diff-99ef63dd92c607458ff979e0c391eb56L287 we are now mutating some of the fields related to users/groups/private-users that are now shared by pointer. That may explain the flake, as multiple applications with different settings could be racing against each other. @squeed @s-urbaniak can you please cross-check the logic around there? |
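To illustrate the concern about fields shared by pointer, here is a minimal sketch. rkt itself is written in Go; C is used here only to match the systemd patch quoted later in this thread. The `ids` and `app` structs and both copy helpers are hypothetical and are not taken from the rkt code base. The sketch only shows how a write through one copy becomes visible through another when they alias the same object, which is the precondition for the race suspected above.

```c
#include <stdlib.h>

/* Hypothetical stand-in for per-app user/group settings. */
struct ids {
        int uid;
        int gid;
};

/* Hypothetical per-app configuration that holds its settings by pointer. */
struct app {
        const char *name;
        struct ids *ids;
};

/* Shallow copy: the new app aliases the same `ids` object as `src`,
 * so a write through either app is visible through the other. */
static struct app app_copy_shallow(const struct app *src, const char *name) {
        struct app a = *src;
        a.name = name;
        return a;
}

/* Deep copy: the new app gets its own `ids` object. The caller frees
 * `ids`; it is NULL if the allocation failed. */
static struct app app_copy_deep(const struct app *src, const char *name) {
        struct app a = { .name = name, .ids = malloc(sizeof(struct ids)) };
        if (a.ids)
                *a.ids = *src->ids;
        return a;
}
```

With the shallow copy, setting `uid` for one app silently changes it for the other; with the deep copy, each app keeps its own settings.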
as far as I see |
As I was spamming this, I was able to get this error to happen in rkt v1.20, before #3432 was merged. So it's not that :-(. |
Also, the same error showed up at some point in |
This flake is hard to reproduce and happens ~once a day on my machine. What we know so far:
A patched systemd with more detailed journal log entries triaged the Here systemd parses That remount fails sporadically. The remount loop of mount entries fails at random places, but consistently at some |
So after discussing the above finding with @lucab we may have a candidate in Now since systemd does a bind-mount dance afterwards too with Recent changes in Marking |
The question then is: are there any applications that rely on |
A further theory about mount propagation also turned out to be wrong: I tried marking all mounts performed by |
@s-urbaniak, did you get something like this
? Which kernel version are you using? |
@evverx yes, that is exactly the failure I saw when investigating the issue. The kernel under test was 4.8.13. |
Well, I think this is a kernel issue.
@s-urbaniak, please try this patch:

diff --git a/src/basic/mount-util.c b/src/basic/mount-util.c
index f0bc9ca..bd6c99d 100644
--- a/src/basic/mount-util.c
+++ b/src/basic/mount-util.c
@@ -453,7 +453,14 @@ int bind_remount_recursive(const char *prefix, bool ro, char **blacklist) {
orig_flags &= ~MS_RDONLY;
if (mount(NULL, prefix, NULL, orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL) < 0)
+ log_error_errno(errno, "Failed to remount %s (first attempt): %m", prefix);
+
+ sleep(5);
+
+ if (mount(NULL, prefix, NULL, orig_flags|MS_BIND|MS_REMOUNT|(ro ? MS_RDONLY : 0), NULL) < 0) {
+ log_error_errno(errno, "Failed to remount %s (second attempt): %m", prefix);
return -errno;
+ }
log_debug("Made top-level directory %s a mount point.", prefix);

Does it "fix" the test? |
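The retry idea in the patch above can be sketched outside systemd as follows. This is an illustration under stated assumptions, not the systemd implementation: `bind_remount_flags`, `remount_op`, `retry_remount` and `fail_n_times` are hypothetical names, and the `EBUSY` used by the stand-in is arbitrary rather than the errno seen in this issue. Note one difference: as written, the quoted hunk sleeps and remounts a second time regardless of whether the first attempt succeeded, while this sketch retries only after a failure.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <unistd.h>

/* Same flag computation as in the quoted hunk: clear MS_RDONLY, request a
 * bind remount, and re-add MS_RDONLY only when `ro` is set. */
static unsigned long bind_remount_flags(unsigned long orig_flags, bool ro) {
        orig_flags &= ~(unsigned long) MS_RDONLY;
        return orig_flags | MS_BIND | MS_REMOUNT | (ro ? MS_RDONLY : 0);
}

/* Hypothetical operation type: returns 0 on success, or -1 with errno set,
 * like mount(2). A real caller would wrap the mount() call from the hunk. */
typedef int (*remount_op)(const char *path, void *userdata);

/* Try `op` up to `attempts` times, sleeping `delay_sec` seconds between
 * failed attempts. Returns 0 on success, or the negative errno of the last
 * failure (-EINVAL if `attempts` is not positive). */
static int retry_remount(remount_op op, void *userdata, const char *path,
                         int attempts, unsigned int delay_sec) {
        int last_errno = EINVAL;

        for (int i = 1; i <= attempts; i++) {
                if (op(path, userdata) == 0)
                        return 0;

                last_errno = errno; /* save before fprintf can clobber it */
                fprintf(stderr, "Failed to remount %s (attempt %d of %d): %s\n",
                        path, i, attempts, strerror(last_errno));

                if (i < attempts && delay_sec > 0)
                        sleep(delay_sec);
        }
        return -last_errno;
}

/* Stand-in for a flaky remount so the loop can be exercised without root:
 * fails with EBUSY until the counter behind `userdata` reaches zero. */
static int fail_n_times(const char *path, void *userdata) {
        int *remaining = userdata;

        (void) path;
        if (*remaining > 0) {
                (*remaining)--;
                errno = EBUSY;
                return -1;
        }
        return 0;
}
```

If the second attempt succeeds where the first failed, that points to a transient condition rather than a permanently wrong mount entry, which is what the diagnostic patch is trying to find out.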
Also, it would be great to see all |
@evverx I'll try to get an overnight test run on my machine with the above patch and will let you know about the result. |
This flake has the following output and has been seen two times already:
#3453
#3462