glfsheal coredump occasionally #4239

Open
GeorgeLjz opened this issue Oct 16, 2023 · 0 comments · May be fixed by #4240

Comments

@GeorgeLjz

Description of problem:
glustershd failed because glfsheal crashed with a core dump (SIGSEGV).

The exact command to reproduce the issue:
The daemon process glustershd failed on its own; there is no exact command to reproduce this issue.

The full output of the command that failed: N/A

Expected results:

Mandatory info:
- The output of the gluster volume info command:
Volume Name: log
Type: Replicate
Volume ID: 786a290a-28a7-4f4d-8930-450319b79c5c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 169.254.0.20:/mnt/bricks/log/brick
Brick2: 169.254.0.28:/mnt/bricks/log/brick
Options Reconfigured:
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
cluster.server-quorum-type: none
cluster.consistent-metadata: no
server.allow-insecure: on
network.ping-timeout: 42
cluster.favorite-child-policy: mtime
cluster.heal-timeout: 60
storage.health-check-interval: 0
performance.client-io-threads: off
diagnostics.brick-log-level: INFO
cluster.server-quorum-ratio: 51

- The output of the gluster volume status command:
Status of volume: log
Gluster process TCP Port RDMA Port Online Pid

Brick 169.254.0.20:/mnt/bricks/log/brick 53954 0 Y 1701
Brick 169.254.0.28:/mnt/bricks/log/brick 53955 0 Y 2489
Self-heal Daemon on localhost N/A N/A N N/A
Self-heal Daemon on 169.254.0.24 N/A N/A Y 1283
Self-heal Daemon on 169.254.0.28 N/A N/A N N/A

Task Status of Volume log

There are no active volume tasks

- The output of the gluster volume heal command:
Brick 169.254.0.20:/mnt/bricks/log/brick
/tmpdir1\test/sn.log
/ - Is in split-brain
/tmpdir1\test - Is in split-brain
Status: Connected
Number of entries: 3

Brick 169.254.0.28:/mnt/bricks/log/brick
/tmp9_hard2.log
/tmpdir4\test
/tmpdir4\test/1/2/3/4/5/6/7/8
/tmpdir4\test/1/2/3/4
/tmpdir4\test/1
/tmpdir4\test/1/2/3/4/5/6
/tmpdir1\test/tmp8.log
/tmp4 complex_test-0 &%~$=' ;\ .txt
/tmpdir1\test/tmp2.log
/tmpdir4\test/1/2/3/4/5
/tmpdir4\test/1/2/3
/tmpdir4\test/1/2/3/4/5/6/7
/tmpdir4\test/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17
/tmpdir4\test/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20
/tmpdir2\test
/tmpdir4\test/1/2/3/4/5/6/7/8/9/10/11/12/13/14
/tmpdir1\test - Is in split-brain
/tmpdir4\test/1/2/3/4/5/6/7/8/9/10/11/12/13
/tmpdir4\test/1/2/3/4/5/6/7/8/9/10/11
/tmpdir4\test/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15
/tmp3.log
/tmpdir4\test/1/2/3/4/5/6/7/8/9/10
/master/fsaudit/auth.log
/master/fsaudit/alarms
/tmpdir4\test/1/2/3/4/5/6/7/8/9
/tmpdir1\test/mgn.log
/master/syslog
/tmpdir4\test/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18
/tmpdir4\test/1/2/3/4/5/6/7/8/9/10/11/12
/tmpdir4\test/1/2
/ - Is in split-brain
/tmpdir1\test/tmp6.log
/tmpdir4\test/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/tmp_deep.log
/tmpdir1\test/tmp7.log
/tmpdir4\test/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19
/tmp5.log
/tmp9_soft2.log
/tmpdir4\test/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16
Status: Connected
Number of entries: 38

- Provide logs present at the following locations of client and server nodes:
/var/log/glusterfs/
Final graph:
+------------------------------------------------------------------------------+
1: volume services-client-0
2: type protocol/client
3: option opversion 70000
4: option clnt-lk-version 1
5: option volfile-checksum 0
6: option volfile-key shd/config
7: option client-version 7.0
8: option process-name glustershd
9: option process-uuid CTX_ID:1181f87e-c3c6-46d6-83fa-fcb5783a2a67-GRAPH_ID:5-PID:9852-HOST:SN-0-PC_NAME:services-client-0-RECON_NO:-0
10: option fops-version 1298437
11: option ping-timeout 42
12: option remote-host 169.254.0.20
13: option remote-subvolume /mnt/bricks/services/brick
14: option transport-type socket
15: option transport.address-family inet
16: option username b24f982a-5276-466f-b6a5-42a88e20a2a7
17: option password 72cc7d66-e66f-491e-8901-8629ae0960f0
18: option transport.socket.ssl-enabled off
19: option transport.tcp-user-timeout 9
20: option transport.socket.keepalive-time 20
21: option transport.socket.keepalive-interval 10
22: option transport.socket.keepalive-count 3
23: end-volume
24:
25: volume services-client-1
26: type protocol/client
27: option ping-timeout 42
[2023-09-26 06:50:29.273048] I [rpc-clnt.c:1969:rpc_clnt_reconfig] 5-services-client-1: changing port to 53957 (from 0)
[2023-09-26 06:50:29.273194] I [socket.c:864:__socket_shutdown] 5-services-client-1: intentional socket shutdown(18)
28: option remote-host 169.254.0.28
29: option remote-subvolume /mnt/bricks/services/brick
30: option transport-type socket
31: option transport.address-family inet
32: option username b24f982a-5276-466f-b6a5-42a88e20a2a7
33: option password 72cc7d66-e66f-491e-8901-8629ae0960f0
34: option transport.socket.ssl-enabled off
35: option transport.tcp-user-timeout 9
36: option transport.socket.keepalive-time 20
37: option transport.socket.keepalive-interval 10
38: option transport.socket.keepalive-count 3
39: end-volume
40:
41: volume services-replicate-0
42: type cluster/replicate
43: option node-uuid 88fd68ac-47d0-4cda-87af-4ffc54e4d8e5
44: option afr-pending-xattr services-client-0,services-client-1
45: option background-self-heal-count 0
46: option metadata-self-heal on
47: option data-self-heal on
48: option entry-self-heal on
49: option self-heal-daemon enable
50: option heal-timeout 60
51: option consistent-metadata no
52: option favorite-child-policy mtime
53: option use-compound-fops off
54: option iam-self-heal-daemon yes
55: subvolumes services-client-0 services-client-1
56: end-volume
57:
58: volume services
59: type debug/io-stats
60: option log-level INFO
61: option threads 16
62: subvolumes services-replicate-0
63: end-volume
64:
+------------------------------------------------------------------------------+
[2023-09-26 06:50:29.274719] I [MSGID: 100041] [glusterfsd-mgmt.c:1108:glusterfs_handle_svc_attach] 0-glusterfs: received attach request for volfile-id=shd/mstate
[2023-09-26 06:50:29.274776] I [MSGID: 100040] [glusterfsd-mgmt.c:105:mgmt_process_volfile] 0-glusterfs: No change in volfile, continuing
[2023-09-26 06:50:29.274811] I [MSGID: 100041] [glusterfsd-mgmt.c:1108:glusterfs_handle_svc_attach] 0-glusterfs: received attach request for volfile-id=shd/services
[2023-09-26 06:50:29.274828] I [MSGID: 100040] [glusterfsd-mgmt.c:105:mgmt_process_volfile] 0-glusterfs: No change in volfile, continuing
[2023-09-26 06:50:29.274850] I [MSGID: 108026] [afr-self-heald.c:424:afr_shd_index_heal] 4-mstate-replicate-0: got entry: b4286ea7-6cd8-4931-ac7b-b5a979dca17b from mstate-client-0
[2023-09-26 06:50:29.274895] I [MSGID: 108026] [afr-self-heald.c:424:afr_shd_index_heal] 3-log-replicate-0: got entry: 24972e03-d45b-4900-a147-2903780b5302 from log-client-0
[2023-09-26 06:50:29.274945] I [MSGID: 100040] [glusterfsd-mgmt.c:105:mgmt_process_volfile] 0-glusterfs: No change in volfile, continuing
[2023-09-26 06:50:29.276312] I [MSGID: 100040] [glusterfsd-mgmt.c:105:mgmt_process_volfile] 0-glusterfs: No change in volfile, continuing
[2023-09-26 06:50:29.276402] I [MSGID: 108026] [afr-self-heald.c:333:afr_shd_selfheal] 4-mstate-replicate-0: entry: path /tmpdir1/sn.log, gfid: b4286ea7-6cd8-4931-ac7b-b5a979dca17b
[2023-09-26 06:50:29.276514] I [MSGID: 108026] [afr-self-heald.c:333:afr_shd_selfheal] 3-log-replicate-0: entry: path gfid:24972e03-d45b-4900-a147-2903780b5302, gfid: 24972e03-d45b-4900-a147-2903780b5302
[2023-09-26 06:50:29.277013] I [MSGID: 100040] [glusterfsd-mgmt.c:105:mgmt_process_volfile] 0-glusterfs: No change in volfile, continuing
[2023-09-26 06:50:29.277265] I [MSGID: 114057] [client-handshake.c:1373:select_server_supported_programs] 5-services-client-1: Using Program GlusterFS 4.x v1, Num (1298437), Version (400)
[2023-09-26 06:50:29.277832] I [MSGID: 114046] [client-handshake.c:1104:client_setvolume_cbk] 5-services-client-1: Connected to services-client-1, attached to remote volume '/mnt/bricks/services/brick'.
[2023-09-26 06:50:29.281028] I [MSGID: 108026] [afr-self-heald.c:424:afr_shd_index_heal] 3-log-replicate-0: got entry: a91e162b-e352-451e-8b40-2d15982e4748 from log-client-0
[2023-09-26 06:50:29.281074] I [MSGID: 108026] [afr-self-heal-data.c:327:afr_selfheal_data_do] 4-mstate-replicate-0: performing data selfheal on b4286ea7-6cd8-4931-ac7b-b5a979dca17b
[2023-09-26 06:50:29.284335] I [MSGID: 108026] [afr-self-heald.c:333:afr_shd_selfheal] 3-log-replicate-0: entry: path gfid:a91e162b-e352-451e-8b40-2d15982e4748, gfid: a91e162b-e352-451e-8b40-2d15982e4748
[2023-09-26 06:50:29.289615] I [MSGID: 108026] [afr-self-heald.c:424:afr_shd_index_heal] 3-log-replicate-0: got entry: 00000000-0000-0000-0000-000000000001 from log-client-0
[2023-09-26 06:50:29.289942] I [MSGID: 108026] [afr-self-heald.c:333:afr_shd_selfheal] 3-log-replicate-0: entry: path /, gfid: 00000000-0000-0000-0000-000000000001
[2023-09-26 06:50:29.292540] I [MSGID: 108026] [afr-self-heal-entry.c:905:afr_selfheal_entry_do] 3-log-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2023-09-26 06:50:29.298955] I [MSGID: 108026] [afr-self-heal-common.c:1748:afr_log_selfheal] 4-mstate-replicate-0: Completed data selfheal on b4286ea7-6cd8-4931-ac7b-b5a979dca17b. sources=[0] sinks=1
[2023-09-26 06:50:29.300452] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 4-mstate-replicate-0: performing metadata selfheal on b4286ea7-6cd8-4931-ac7b-b5a979dca17b
[2023-09-26 06:50:29.307964] I [MSGID: 108026] [afr-self-heal-common.c:1748:afr_log_selfheal] 4-mstate-replicate-0: Completed metadata selfheal on b4286ea7-6cd8-4931-ac7b-b5a979dca17b. sources=[0] sinks=1
[2023-09-26 06:50:29.308050] I [MSGID: 108026] [afr-self-heald.c:424:afr_shd_index_heal] 4-mstate-replicate-0: got entry: 00000000-0000-0000-0000-000000000001 from mstate-client-0
[2023-09-26 06:50:29.308807] I [MSGID: 108026] [afr-self-heald.c:333:afr_shd_selfheal] 4-mstate-replicate-0: entry: path /, gfid: 00000000-0000-0000-0000-000000000001
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2023-09-26 06:50:29
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 7.0
/lib64/libglusterfs.so.0(+0x2c254)[0x7f0912169254]
/lib64/libglusterfs.so.0(gf_print_trace+0x34a)[0x7f0912173f2a]
/lib64/libc.so.6(+0x3db70)[0x7f0911f18b70]
/lib64/libc.so.6(+0xd0f1f)[0x7f0911fabf1f]
/lib64/libc.so.6(__strftime_l+0x2d)[0x7f0911fae6ed]
/usr/lib64/glusterfs/7.0/xlator/cluster/replicate.so(+0x43fda)[0x7f090cafffda]
/usr/lib64/glusterfs/7.0/xlator/cluster/replicate.so(+0x45968)[0x7f090cb01968]
/usr/lib64/glusterfs/7.0/xlator/cluster/replicate.so(+0x52ded)[0x7f090cb0eded]
/usr/lib64/glusterfs/7.0/xlator/cluster/replicate.so(+0x53723)[0x7f090cb0f723]
/usr/lib64/glusterfs/7.0/xlator/cluster/replicate.so(+0x53a70)[0x7f090cb0fa70]
/usr/lib64/glusterfs/7.0/xlator/cluster/replicate.so(+0x4b705)[0x7f090cb07705]
/usr/lib64/glusterfs/7.0/xlator/cluster/replicate.so(+0x4b89c)[0x7f090cb0789c]
/usr/lib64/glusterfs/7.0/xlator/cluster/replicate.so(+0x549d7)[0x7f090cb109d7]
/usr/lib64/glusterfs/7.0/xlator/cluster/replicate.so(+0x54c7b)[0x7f090cb10c7b]
/lib64/libglusterfs.so.0(+0x932b1)[0x7f09121d02b1]
/lib64/libglusterfs.so.0(+0x69cbc)[0x7f09121a6cbc]
/lib64/libc.so.6(+0x54e50)[0x7f0911f2fe50]

- Is there any crash? Provide the backtrace and coredump:
YES. The coredump file is attached, and the backtrace is listed below:
backtrace:
Stack trace of thread 40893:
#0 0x00007f1fb9e4ef1f __strftime_internal (libc.so.6 + 0xd0f1f)
#1 0x00007f1fb9e516ed __strftime_l (libc.so.6 + 0xd36ed)
#2 0x00007f1fb44c2fda afr_mark_split_brain_source_sinks_by_policy (replicate.so + 0x43fda)
#3 0x00007f1fb44c4968 afr_mark_split_brain_source_sinks (replicate.so + 0x45968)
#4 0x00007f1fb44d1ded __afr_selfheal_metadata_finalize_source (replicate.so + 0x52ded)
#5 0x00007f1fb44d2723 __afr_selfheal_metadata_prepare (replicate.so + 0x53723)
#6 0x00007f1fb44eb1ea afr_selfheal_locked_metadata_inspect (replicate.so + 0x6c1ea)
#7 0x00007f1fb44ebc06 afr_selfheal_locked_inspect (replicate.so + 0x6cc06)
#8 0x00007f1fb44ebd7e afr_get_heal_info (replicate.so + 0x6cd7e)
#9 0x00007f1fb449e6ff afr_getxattr (replicate.so + 0x1f6ff)
#10 0x00007f1fba17b5ba syncop_getxattr (libglusterfs.so.0 + 0x725ba)
#11 0x000055606791d786 glfsh_process_entries (glfsheal + 0x5786)
#12 0x000055606791e772 glfsh_crawl_directory.isra.0 (glfsheal + 0x6772)
#13 0x000055606791ea97 glfsh_print_pending_heals_type (glfsheal + 0x6a97)
#14 0x000055606791ecff glfsh_print_pending_heals (glfsheal + 0x6cff)
#15 0x000055606791ee79 glfsh_gather_heal_info (glfsheal + 0x6e79)
#16 0x000055606791c372 main (glfsheal + 0x4372)
#17 0x00007f1fb9da5b4a __libc_start_call_main (libc.so.6 + 0x27b4a)
#18 0x00007f1fb9da5c0b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x27c0b)
#19 0x000055606791c3b5 _start (glfsheal + 0x43b5)

            Stack trace of thread 40894:
            #0  0x00007f1fb9e54293 clock_nanosleep@GLIBC_2.2.5 (libc.so.6 + 0xd6293)
            #1  0x00007f1fb9e58d37 __nanosleep (libc.so.6 + 0xdad37)
            #2  0x00007f1fb9e58c63 sleep (libc.so.6 + 0xdac63)
            #3  0x00007f1fba15f63b pool_sweeper (libglusterfs.so.0 + 0x5663b)
            #4  0x00007f1fb9e0a886 start_thread (libc.so.6 + 0x8c886)
            #5  0x00007f1fb9e906e0 __clone3 (libc.so.6 + 0x1126e0)

            Stack trace of thread 40895:
            #0  0x00007f1fb9e07189 __futex_abstimed_wait_common (libc.so.6 + 0x89189)
            #1  0x00007f1fb9e09e62 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8be62)
            #2  0x00007f1fba1759c8 syncenv_task (libglusterfs.so.0 + 0x6c9c8)
            #3  0x00007f1fba176845 syncenv_processor (libglusterfs.so.0 + 0x6d845)
            #4  0x00007f1fb9e0a886 start_thread (libc.so.6 + 0x8c886)
            #5  0x00007f1fb9e906e0 __clone3 (libc.so.6 + 0x1126e0)

            Stack trace of thread 40896:
            #0  0x00007f1fb9e07189 __futex_abstimed_wait_common (libc.so.6 + 0x89189)
            #1  0x00007f1fb9e09e62 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8be62)
            #2  0x00007f1fba1759c8 syncenv_task (libglusterfs.so.0 + 0x6c9c8)
            #3  0x00007f1fba176845 syncenv_processor (libglusterfs.so.0 + 0x6d845)
            #4  0x00007f1fb9e0a886 start_thread (libc.so.6 + 0x8c886)
            #5  0x00007f1fb9e906e0 __clone3 (libc.so.6 + 0x1126e0)

            Stack trace of thread 40898:
            #0  0x00007f1fb9e07189 __futex_abstimed_wait_common (libc.so.6 + 0x89189)
            #1  0x00007f1fb9e09e62 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8be62)

Additional info:

- The operating system / glusterfs version: 7.0.1
After checking the latest source code, the issue should also exist in the latest version.

Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration

core.glfsheal.0.d3696c8cc7b54594aa8d2c0e0a347230.51734.1695707401000000.zip

GeorgeLjz added a commit to GeorgeLjz/glusterfs that referenced this issue Oct 16, 2023
glfsheal encounters a SIGSEGV in __strftime_internal, called from afr_mark_split_brain_source_sinks_by_policy.

Root cause: a mis-compare between an int and an unsigned int.
Solution: convert it into a comparison between two ints.

Fixes: gluster#4239
Change-Id: If6a356db60298da39a48c7979abdfbac03521aa7
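A minimal, hypothetical illustration of that class of bug (not the actual afr code): when a negative signed timestamp is compared against an unsigned one, the signed value is converted to unsigned and compares as a huge number, so a replica with a bogus ctime can "win" the favorite-child-policy comparison.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of a signed/unsigned mis-compare, not the real afr
 * code: a negative ctime reported by one replica is promoted to unsigned
 * in the comparison and ends up looking like the "newest" timestamp. */
int main(void)
{
    int64_t  bad_ctime  = -1;          /* bogus ctime from one brick      */
    uint64_t good_ctime = 1695707401;  /* sane ctime from the other brick */

    /* Mixed-signedness compare: the usual arithmetic conversions turn
     * bad_ctime into UINT64_MAX, so the bogus value wrongly wins. */
    if (bad_ctime > good_ctime)
        printf("mixed compare: bogus ctime wins (%llu)\n",
               (unsigned long long)(uint64_t)bad_ctime);

    /* Comparing both values as signed picks the valid replica instead. */
    if (bad_ctime < (int64_t)good_ctime)
        printf("signed compare: valid ctime wins\n");

    return 0;
}
```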
GeorgeLjz added a commit to GeorgeLjz/glusterfs that referenced this issue Oct 16, 2023
glfsheal encounters a SIGSEGV in __strftime_internal, called from afr_mark_split_brain_source_sinks_by_policy.

Root cause: ctime is negative.
Solution: set ctime to 0 when it is negative, before calling strftime.

Fixes: gluster#4239
Change-Id: If6a356db60298da39a48c7979abdfbac03521aa7
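A minimal sketch of the guard this second version of the patch describes (function and variable names here are illustrative, not from the glusterfs tree): clamp a negative ctime to 0 and check the localtime_r() result before handing the struct tm to strftime(), so a bad timestamp cannot reach the strftime() call shown in the backtrace.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper, not the actual patch: format a replica's ctime
 * defensively. A negative value is clamped to 0 (the epoch), and strftime()
 * is only called if localtime_r() succeeded. */
static void format_ctime_safe(time_t ctime_val, char *buf, size_t len)
{
    struct tm tm_buf;

    if (ctime_val < 0)      /* the fix described above: negative -> 0 */
        ctime_val = 0;

    if (localtime_r(&ctime_val, &tm_buf) == NULL) {
        snprintf(buf, len, "unknown");
        return;
    }

    strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm_buf);
}

int main(void)
{
    char buf[64];

    format_ctime_safe((time_t)-1, buf, sizeof(buf));
    printf("%s\n", buf);    /* prints the epoch date instead of crashing */

    return 0;
}
```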