Bug 209217

Summary: gfs2: cpu soft lockup on filesystem write in 4.19 kernel
Product: File System
Reporter: Daniel Craig (daniel.craig)
Component: Other
Assignee: fs_other
Status: NEW
Severity: normal
CC: carnil
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.19.132
Subsystem:
Regression: Yes
Bisected commit-id:

Description Daniel Craig 2020-09-10 04:38:07 UTC
Writing to a gfs2 filesystem fails and results in a soft lockup of the machine on kernel version 4.19.132. This is on a Debian system, and the previous kernel (4.19.118) was working fine, so a regression appears to have been introduced in one of the intermediate kernel versions. I have also verified that the regression still exists in 4.19.144.

After reverting the gfs2 patches applied between 4.19.118 and 4.19.132, it appears that the responsible commit is c91cffd0fd010c06d67f3a9a528b858ce28c60fb, which was introduced in 4.19.130 and is referenced here:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c91cffd0fd010c06d67f3a9a528b858ce28c60fb

This problem appears to be independent of the underlying block device: I can reproduce it both on a single node (on a local block device) and on multiple nodes (on clustered LVM over DRBD).

The minimal series of commands that I can use to reproduce this is:

WILLS-# lvcreate -n gfs2test -L 1GB wills-vg
WILLS-# mkfs.gfs2 -j2 -p lock_dlm -t pacemaker:gfs2test /dev/wills-vg/gfs2test
WILLS-# mount -t gfs2 /dev/wills-vg/gfs2test /mnt/tmp
WILLS-# echo "test" > /mnt/tmp/test.txt
Killed

This results in the following kernel trace:

Sep 10 12:00:56 wills kernel: [ 8279.874188] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Sep 10 12:00:56 wills kernel: [ 8279.874266] PGD 0 P4D 0
Sep 10 12:00:56 wills kernel: [ 8279.874296] Oops: 0002 [#1] SMP NOPTI
Sep 10 12:00:56 wills kernel: [ 8279.874333] CPU: 16 PID: 7018 Comm: bash Not tainted 4.19.0-10-amd64 #1 Debian 4.19.132-1
Sep 10 12:00:56 wills kernel: [ 8279.874404] Hardware name: Dell Inc. PowerEdge R740xd/0WXD1Y, BIOS 2.6.4 04/09/2020
Sep 10 12:00:56 wills kernel: [ 8279.874493] RIP: 0010:gfs2_log_commit+0x104/0x400 [gfs2]
Sep 10 12:00:56 wills kernel: [ 8279.874542] Code: 60 4c 8d b3 dc 08 00 00 4c 89 f7 e8 f6 a0 8b dc 48 8b 55 70 48 8d 45 70 48 39 d0 74 29 49 8b 4c 24 78 48 8b 75 70 48 8b 55 78 <48> 89 4e 08 48 89 31 49 8d 4c 24 70 48 89 0a 49 89 54 24 78 48 89
Sep 10 12:00:56 wills kernel: [ 8279.874699] RSP: 0018:ffff9c798e6a3b30 EFLAGS: 00010282
Sep 10 12:00:56 wills kernel: [ 8279.874747] RAX: ffff8b14b842c070 RBX: ffff8b44b6ba5000 RCX: 0000000000000000
Sep 10 12:00:56 wills kernel: [ 8279.874809] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8b44b6ba58dc
Sep 10 12:00:56 wills kernel: [ 8279.874871] RBP: ffff8b14b842c000 R08: ffff8b14b34e73e0 R09: ffff8b147805d540
Sep 10 12:00:56 wills kernel: [ 8279.874933] R10: ffff8b44852f0000 R11: 0000000000000000 R12: ffff8b14b842c300
Sep 10 12:00:56 wills kernel: [ 8279.874994] R13: ffff8b44b6ba57c8 R14: ffff8b44b6ba58dc R15: ffff8b44b6ba5000
Sep 10 12:00:56 wills kernel: [ 8279.875057] FS:  00007ff38f5f0740(0000) GS:ffff8b14c0800000(0000) knlGS:0000000000000000
Sep 10 12:00:56 wills kernel: [ 8279.875127] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 10 12:00:56 wills kernel: [ 8279.875178] CR2: 0000000000000008 CR3: 0000002f08a98005 CR4: 00000000007606e0
Sep 10 12:00:56 wills kernel: [ 8279.875240] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 10 12:00:56 wills kernel: [ 8279.875302] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 10 12:00:56 wills kernel: [ 8279.875363] PKRU: 55555554
Sep 10 12:00:56 wills kernel: [ 8279.875389] Call Trace:
Sep 10 12:00:56 wills kernel: [ 8279.875444]  gfs2_trans_end+0x7d/0x160 [gfs2]
Sep 10 12:00:56 wills kernel: [ 8279.875505]  gfs2_create_inode+0x742/0x1360 [gfs2]
Sep 10 12:00:56 wills kernel: [ 8279.875567]  ? gfs2_create_inode+0x101/0x1360 [gfs2]
Sep 10 12:00:56 wills kernel: [ 8279.875626]  ? __gfs2_lookup+0x12c/0x140 [gfs2]
Sep 10 12:00:56 wills kernel: [ 8279.875682]  gfs2_atomic_open+0x51/0xd0 [gfs2]
Sep 10 12:00:56 wills kernel: [ 8279.875730]  path_openat+0xebf/0x1480
Sep 10 12:00:56 wills kernel: [ 8279.877676]  ? flush_to_ldisc+0x20/0xc0
Sep 10 12:00:56 wills kernel: [ 8279.879593]  ? tty_mode_ioctl+0xf6/0x4c0
Sep 10 12:00:56 wills kernel: [ 8279.881496]  do_filp_open+0x93/0x100
Sep 10 12:00:56 wills kernel: [ 8279.883365]  ? __handle_mm_fault+0xb8b/0x1270
Sep 10 12:00:56 wills kernel: [ 8279.884682]  ? __check_object_size+0x162/0x173
Sep 10 12:00:56 wills kernel: [ 8279.885519]  do_sys_open+0x186/0x210
Sep 10 12:00:56 wills kernel: [ 8279.886354]  do_syscall_64+0x53/0x110
Sep 10 12:00:56 wills kernel: [ 8279.887179]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep 10 12:00:56 wills kernel: [ 8279.888002] RIP: 0033:0x7ff38f6dd1ae
Sep 10 12:00:56 wills kernel: [ 8279.888811] Code: 25 00 00 41 00 3d 00 00 41 00 74 48 48 8d 05 59 65 0d 00 8b 00 85 c0 75 69 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a6 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
Sep 10 12:00:56 wills kernel: [ 8279.890493] RSP: 002b:00007ffe34c674f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Sep 10 12:00:56 wills kernel: [ 8279.891354] RAX: ffffffffffffffda RBX: 0000555b60555ff0 RCX: 00007ff38f6dd1ae
Sep 10 12:00:56 wills kernel: [ 8279.892224] RDX: 0000000000000241 RSI: 0000555b60545020 RDI: 00000000ffffff9c
Sep 10 12:00:56 wills kernel: [ 8279.893104] RBP: 00007ffe34c675f0 R08: 0000000000000000 R09: 0000000000000020
Sep 10 12:00:56 wills kernel: [ 8279.893962] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000000
Sep 10 12:00:56 wills kernel: [ 8279.894645] R13: 0000000000000003 R14: 0000000000000001 R15: 0000555b60545020
Sep 10 12:00:56 wills kernel: [ 8279.895104] Modules linked in: gfs2 dlm nf_tables nfnetlink sch_ingress mpt3sas raid_class scsi_transport_sas mptctl mptbase bonding intel_rapl openvswitch nsh nf_nat_ipv6 dell_rbu nf_nat_ipv4 nf_conncount nf_nat nf_conntrack skx_edac nfit nf_defrag_ipv6 libnvdimm nf_defrag_ipv4 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass mgag200 crct10dif_pclmul dell_smbios nls_ascii crc32_pclmul ttm nls_cp437 vfat ghash_clmulni_intel fat intel_cstate wmi_bmof dell_wmi_descriptor dcdbas drm_kms_helper efi_pstore intel_uncore mei_me iTCO_wdt drm efivars pcspkr sg intel_rapl_perf iTCO_vendor_support mei wmi evdev acpi_power_meter button ipmi_si ipmi_devintf ipmi_msghandler vhost_net tun vhost tap drbd lru_cache libcrc32c configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache
Sep 10 12:00:56 wills kernel: [ 8279.898761]  jbd2 crc32c_generic fscrypto ecb dm_mod mlx5_ib ib_uverbs sd_mod ib_core crc32c_intel mlx5_core aesni_intel aes_x86_64 ahci mlxfw crypto_simd cryptd glue_helper libahci devlink xhci_pci i40e xhci_hcd libata megaraid_sas usbcore igb scsi_mod lpc_ich i2c_algo_bit dca i2c_i801 mfd_core usb_common
Sep 10 12:00:56 wills kernel: [ 8279.900469] CR2: 0000000000000008
Sep 10 12:00:56 wills kernel: [ 8279.901020] ---[ end trace 5791762b34dbaf5b ]---

Comment 1 Salvatore Bonaccorso 2020-09-17 19:47:04 UTC
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=df4ffe7cb0f3a3fee591f93ac085f4b6e7e59694 has been queued up for v4.19.y (and the other affected stable series).