Bug 201119

Summary: btrfs: unable to handle kernel NULL pointer dereference at (null); Workqueue: btrfs-delalloc btrfs_delalloc_helper
Product: File System
Reporter: Tomas Thiemel (thiemel)
Component: btrfs
Assignee: BTRFS virtual assignee (fs_btrfs)
Status: RESOLVED CODE_FIX
Severity: low
CC: dsterba
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.14.65-gentoo
Subsystem:
Regression: No
Bisected commit-id:
Attachments: kernel build config

Description Tomas Thiemel 2018-09-13 23:07:40 UTC
- Not sure whether this is a hardware error (a bit flipped by cosmic rays or a bad capacitor) or a kernel bug.
- From time to time (roughly every 3 months) I have an issue with the HDD(s) (not sure whether it is the HDDs or the disk controller), and I have to power the machine off completely to fix it (a plain reboot does not help).
- It is a storage/backup/archive server built from desktop components, with a Z77MA-G45 motherboard (6x SATA) and an Adaptec PM8018 SAS HBA (8x SATA).
- Not sure whether this is already fixed in 4.14.67 (upstream commit 665d4953cde6d9e75c62a07ec8f4f8fd7d396ade) or in 4.14.68 (upstream commit 3c4276936f6fbe52884b4ea4e6cc120b890a0f9f).

#### 
The error happened during a rebalance (SINGLE->RAID1) of the array with scrub errors (devices sdk1+sdl1+sdg1+sdc1), while an rsync to another RAID1 array (sdj2+sdd2) was running.

###
Status of the last scrub of that array (the SINGLE->RAID1 balance was running because of the errors the scrub found):
scrub device /dev/sdc1 (id 1) history
        scrub started at Sat Sep  1 05:30:01 2018 and finished after 07:14:14
        total bytes scrubbed: 2.53TiB with 0 errors
scrub device /dev/sdg1 (id 2) history
        scrub started at Sat Sep  1 05:30:01 2018 and finished after 08:47:31
        total bytes scrubbed: 2.62TiB with 3870 errors
        error details: csum=3870
        corrected errors: 0, uncorrectable errors: 3870, unverified errors: 0
scrub device /dev/sdk1 (id 3)
        no stats available
scrub device /dev/sdl1 (id 4)
        no stats available

####
dmesg ([50430.147857] ~ "Sep 13 22:37:19 GMT+02"):
...
[50430.147857] BTRFS info (device sdk1): found 18 extents
[50432.606050] BTRFS info (device sdk1): found 18 extents
[50433.327590] BTRFS info (device sdk1): relocating block group 2453529427968 flags data
[50452.813703] BTRFS info (device sdk1): found 6 extents
[50454.391633] BTRFS info (device sdk1): found 6 extents
[50455.092628] BTRFS info (device sdk1): relocating block group 2452455686144 flags data
[50472.275049] BTRFS error (device sdj2): error inheriting props for ino 1148883 (root 422): -28
[50474.949589] BTRFS info (device sdk1): found 10 extents
[50476.671731] BTRFS error (device sdj2): error inheriting props for ino 1151102 (root 422): -28
[50476.707708] BTRFS error (device sdj2): error inheriting props for ino 1151124 (root 422): -28
[50477.315080] BTRFS error (device sdj2): error inheriting props for ino 1151543 (root 422): -28
[50477.488868] BTRFS info (device sdk1): found 10 extents
[50478.315102] BTRFS info (device sdk1): relocating block group 2451381944320 flags data
[50480.000302] BTRFS error (device sdj2): error inheriting props for ino 1152972 (root 422): -28
[50497.843425] BTRFS error (device sdj2): error inheriting props for ino 1156653 (root 422): -28
[50497.843487] BTRFS error (device sdj2): error inheriting props for ino 1156654 (root 422): -28
[50497.919928] BTRFS error (device sdj2): error inheriting props for ino 1156659 (root 422): -28
[50497.986041] BTRFS info (device sdk1): found 7 extents
[50500.445610] BTRFS info (device sdk1): found 7 extents
[50501.258398] BTRFS info (device sdk1): relocating block group 2450308202496 flags data
[50519.997881] BTRFS info (device sdk1): found 9 extents
[50522.341004] BTRFS info (device sdk1): found 9 extents
[50523.075431] BTRFS info (device sdk1): relocating block group 2449234460672 flags data
[50542.299346] BTRFS info (device sdk1): found 9 extents
[50544.843798] BTRFS info (device sdk1): found 9 extents
[50545.578961] BTRFS info (device sdk1): relocating block group 2448160718848 flags data
[50565.642902] BTRFS info (device sdk1): found 4 extents
[50568.017562] BTRFS info (device sdk1): found 4 extents
[50568.700943] BTRFS info (device sdk1): relocating block group 2447086977024 flags data
[50588.049332] BTRFS info (device sdk1): found 6 extents
[50589.864583] BTRFS info (device sdk1): found 6 extents
[50590.766450] BTRFS info (device sdk1): relocating block group 2446013235200 flags data
[50610.352386] BTRFS info (device sdk1): found 9 extents
[50612.630374] BTRFS error (device sdj2): error inheriting props for ino 1172792 (root 422): -28
[50612.641376] BTRFS error (device sdj2): error inheriting props for ino 1172810 (root 422): -28
[50612.730649] BTRFS info (device sdk1): found 9 extents
[50613.410105] BTRFS info (device sdk1): relocating block group 2444939493376 flags data
[50632.707013] BTRFS info (device sdk1): found 7 extents
[50633.363423] BTRFS error (device sdj2): error inheriting props for ino 1175549 (root 422): -28
[50634.563046] BTRFS info (device sdk1): found 7 extents
[50635.442431] BTRFS info (device sdk1): relocating block group 2443865751552 flags data
[50654.761730] BTRFS info (device sdk1): found 6 extents
[50656.951044] BTRFS info (device sdk1): found 6 extents
[50657.641735] BTRFS info (device sdk1): relocating block group 2442792009728 flags data
...
[51679.229784] BTRFS info (device sdk1): found 17 extents
[51680.990832] BTRFS info (device sdk1): found 17 extents
[51681.381163] BTRFS info (device sdk1): relocating block group 2391252402176 flags data
[51701.086051] BTRFS info (device sdk1): found 14 extents
[51703.012439] BTRFS info (device sdk1): found 14 extents
[51703.519054] BTRFS info (device sdk1): relocating block group 2390178660352 flags data
[51721.640384] BTRFS error (device sdj2): error inheriting props for ino 1233654 (root 422): -28
[51722.471260] BTRFS info (device sdk1): found 6 extents
[51724.256136] BTRFS info (device sdk1): found 6 extents
[51724.773847] BTRFS info (device sdk1): relocating block group 2389104918528 flags data
...
[53134.936723] BTRFS info (device sdk1): found 7 extents
[53136.736990] BTRFS info (device sdk1): found 7 extents
[53137.382786] BTRFS info (device sdk1): relocating block group 2319311699968 flags data
[53156.993860] BTRFS info (device sdk1): found 7 extents
[53159.125566] BTRFS info (device sdk1): found 7 extents
[53159.683308] BTRFS info (device sdk1): relocating block group 2318237958144 flags data
[53169.931362] BUG: unable to handle kernel NULL pointer dereference at           (null)
[53169.931440] IP: compress_file_range.constprop.27+0x687/0x730
[53169.931485] PGD 0 P4D 0
[53169.931513] Oops: 0000 [#1] SMP
[53169.931542] Modules linked in: xt_conntrack xt_tcpudp iptable_filter vhost_net vhost tap ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables tun nfsd auth_rpcgss oid_registry lockd grace sunrpc binfmt_misc nls_iso8859_1 vfat fat f71882fg x86_pkg_temp_thermal coretemp dummy kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel i915 ghash_clmulni_intel iTCO_wdt iTCO_vendor_support pcbc intel_gtt aesni_intel i2c_algo_bit aes_x86_64 drm_kms_helper crypto_simd syscopyarea glue_helper sysfillrect cryptd sysimgblt fb_sys_fops video xhci_pci drm ehci_pci i2c_i801 pm80xx xhci_hcd ehci_hcd r8169 intel_smartconnect thermal backlight usbcore fan mei_me mii evdev lpc_ich i2c_core mei ie31200_edac mfd_core usb_common
[53169.937633] CPU: 6 PID: 20319 Comm: kworker/u16:7 Not tainted 4.14.65-gentoo-xeon #2
[53169.937635] Hardware name: MSI MS-7759/Z77MA-G45 (MS-7759), BIOS V1.9 03/01/2013
[53169.937640] Workqueue: btrfs-delalloc btrfs_delalloc_helper
[53169.937642] task: ffff8803b70b9940 task.stack: ffffc900033bc000
[53169.937647] RIP: 0010:compress_file_range.constprop.27+0x687/0x730
[53169.937648] RSP: 0018:ffffc900033bfd38 EFLAGS: 00010202
[53169.937651] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000003
[53169.937652] RDX: 000000000000000f RSI: 0000000000000003 RDI: ffff88045c8a5c00
[53169.937653] RBP: 0000000000000577 R08: 0000000000000001 R09: 0000000000000001
[53169.937654] R10: ffffc900033bfcc0 R11: 0000000000000000 R12: ffff88006ef14bc0
[53169.937655] R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000002
[53169.937657] FS:  0000000000000000(0000) GS:ffff8807fe800000(0000) knlGS:0000000000000000
[53169.937658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[53169.937660] CR2: 0000000000000000 CR3: 0000000006011006 CR4: 00000000001626e0
[53169.937660] Call Trace:
[53169.937668]  async_cow_start+0x3e/0x80
[53169.937670]  btrfs_worker_helper+0xc8/0x1e0
[53169.937676]  process_one_work+0x1d1/0x450
[53169.937679]  ? process_one_work+0x16e/0x450
[53169.937683]  worker_thread+0x35/0x380
[53169.937686]  ? process_one_work+0x450/0x450
[53169.937689]  kthread+0x11c/0x140
[53169.937691]  ? kthread_create_on_node+0x60/0x60
[53169.937695]  ret_from_fork+0x1f/0x30
[53169.937699] Code: 48 8b 54 24 38 31 f6 4c 89 e7 e8 25 d2 fd ff 5a 31 db 31 c0 48 83 7c 24 60 00 75 0f eb 34 83 c3 01 48 63 c3 48 3b 44 24 60 73 27 <49> 8b 7c c5 00 48 83 7f 08 00 75 28 48 8b 47 20 48 8d 50 ff a8
[53169.937751] RIP: compress_file_range.constprop.27+0x687/0x730 RSP: ffffc900033bfd38
[53169.937752] CR2: 0000000000000000
[53169.937754] ---[ end trace 95e0b30a92e3f83c ]---
[53169.937756] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
[53169.937757] in_atomic(): 0, irqs_disabled(): 1, pid: 20319, name: kworker/u16:7
[53169.937758] INFO: lockdep is turned off.
[53169.937760] CPU: 6 PID: 20319 Comm: kworker/u16:7 Tainted: G      D         4.14.65-gentoo-xeon #2
[53169.937761] Hardware name: MSI MS-7759/Z77MA-G45 (MS-7759), BIOS V1.9 03/01/2013
[53169.937764] Workqueue: btrfs-delalloc btrfs_delalloc_helper
[53169.937765] Call Trace:
[53169.937771]  dump_stack+0x46/0x65
[53169.937774]  ___might_sleep+0xcf/0x110
[53169.937777]  exit_signals+0x2b/0x210
[53169.937780]  do_exit+0xab/0xb50
[53169.937783]  ? process_one_work+0x450/0x450
[53169.937785]  ? kthread+0x11c/0x140
[53169.937788]  rewind_stack_do_exit+0x17/0x20
[55402.743959] BTRFS info (device sdk1): found 8 extents
[55404.655113] BTRFS info (device sdk1): found 8 extents
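
Two values in the log above can be decoded mechanically: the `-28` in the "error inheriting props" lines and the faulting instruction marked `<49> 8b 7c c5 00` in the `Code:` line. The following is a minimal sketch, not a general disassembler. The helper name `decode_mov_load` is hypothetical, it handles only this one encoding (REX.W prefix, opcode 0x8B, ModRM with a SIB byte and an 8-bit displacement), and it assumes the host running Python uses the same errno numbering as the Linux kernel. The register values are copied from the oops.

```python
import errno

# x86-64 general-purpose registers, indexed by their 4-bit encoding.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(code: bytes) -> dict:
    """Decode 'mov r64, [base + index*scale + disp8]' (hypothetical helper)."""
    if len(code) < 5:
        raise ValueError("need at least 5 bytes")
    rex, opcode, modrm, sib, disp8 = code[:5]
    if rex & 0xF0 != 0x40 or not rex & 0x08:
        raise ValueError("expected a REX prefix with W=1")
    if opcode != 0x8B:
        raise ValueError("expected opcode 0x8B (mov r64, r/m64)")
    if modrm >> 6 != 1 or modrm & 7 != 4:
        raise ValueError("expected mod=01 and rm=100 (SIB + disp8)")
    # REX.R, REX.X and REX.B extend the 3-bit register fields to 4 bits.
    dest = ((modrm >> 3) & 7) | (((rex >> 2) & 1) << 3)
    index = ((sib >> 3) & 7) | (((rex >> 1) & 1) << 3)
    base = (sib & 7) | ((rex & 1) << 3)
    if index == 4:
        raise ValueError("no index register encoded; not handled here")
    return {
        "dest": REG64[dest],
        "base": REG64[base],
        "index": REG64[index],
        "scale": 1 << (sib >> 6),
        "disp": disp8 - 256 if disp8 >= 128 else disp8,  # sign-extend
    }


# Faulting bytes from the "Code:" line, starting at the <49> marker.
insn = decode_mov_load(bytes.fromhex("498b7cc500"))

# Register values copied from the oops register dump.
regs = {"r13": 0x0, "rax": 0x0}
fault_addr = regs[insn["base"]] + regs[insn["index"]] * insn["scale"] + insn["disp"]

# Error code from the "error inheriting props ... : -28" lines.
err_name = errno.errorcode[28]

print(f"mov {insn['dest']}, [{insn['base']} + {insn['index']}*{insn['scale']}"
      f" + {insn['disp']}] -> address {fault_addr:#x}")
print(f"-28 -> {err_name}")
```

The computed address can be compared with the CR2 value in the oops. A load of the form `[base + index*8]` with a zero base looks like an index into an array of pointers whose base pointer is NULL; that is a reading of the register dump only, not a statement about the btrfs source.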
Comment 1 Tomas Thiemel 2018-09-13 23:11:35 UTC
Created attachment 278521
kernel build config

The config I used to build the kernel.
Comment 2 Tomas Thiemel 2018-09-13 23:16:53 UTC
Mount options (/etc/fstab), identical for both arrays:
defaults,noatime,compress=lzo,skip_balance,commit=120,autodefrag,nofail,noauto
Comment 3 David Sterba 2019-05-21 12:25:07 UTC
Thanks for the report. Fixed by 3527a018c00e5d "Btrfs: fix null pointer dereference on compressed write path error", in 4.20.
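
A rough way to tell whether a given kernel release string is at or past the mainline release named above is sketched here. The function names are hypothetical, and the check compares only the major.minor prefix against 4.20. It cannot tell whether an older stable or distribution kernel carries a backport of the fix, so a "no" answer means "not known from the version alone".

```python
import re

# Mainline release that contains the fix, per the comment above.
FIXED_IN = (4, 20)


def parse_kernel_release(release: str) -> tuple:
    """Return (major, minor) from a string such as '4.14.65-gentoo'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_mainline_fix(release: str) -> bool:
    """True if the release is 4.20 or newer; backports are not detected."""
    return parse_kernel_release(release) >= FIXED_IN


# The kernel from this report predates 4.20.
reported = "4.14.65-gentoo"
print(reported, "->", has_mainline_fix(reported))
```

On a running system the release string could be taken from `platform.release()` in the Python standard library, or from `uname -r`.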