Bug 216908

Summary: General Protection Fault in btrfs_file_llseek
Product: File System
Reporter: Matthias Schoepfer (matthias.schoepfer)
Component: btrfs
Assignee: BTRFS virtual assignee (fs_btrfs)
Status: RESOLVED CODE_FIX
Severity: high
CC: dianlujitao, dsterba, regressions
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 6.1.4
Subsystem:
Regression: No
Bisected commit-id:

Description Matthias Schoepfer 2023-01-10 13:58:06 UTC
When creating a large file (e.g. running mkfs.ext4 within a Yocto embedded Linux task, which means an 8+ GB file), mkfs.ext4 reports a segfault and I get a general protection fault. The system becomes more or less unstable after this.

I can reproduce this 100% of the time. When I do the same with the 6.0.6 kernel, it works fine.

Here is the kernel dump:
Jan 10 14:12:38 michelle kernel: BTRFS warning (device nvme0n1p5): bad eb member end: ptr 0x3fea start 2704543268864 member offset 16383 size 8
Jan 10 14:12:38 michelle kernel: general protection fault, probably for non-canonical address 0x85d8740000000: 0000 [#1] PREEMPT SMP
Jan 10 14:12:38 michelle kernel: CPU: 21 PID: 2143606 Comm: mkfs.ext4.real Tainted: P           O    T  6.1.4-gentoo #2
Jan 10 14:12:38 michelle kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C37/X570-A PRO (MS-7C37), BIOS H.70 01/09/2020
Jan 10 14:12:38 michelle kernel: RIP: 0010:btrfs_get_64+0xe7/0x100
Jan 10 14:12:38 michelle kernel: Code: 40 08 48 2b 15 b2 3a 15 01 48 8d 0c 04 48 c1 fa 06 48 c1 e2 0c 48 03 15 af 3a 15 01 81 eb f8 0f 00 00 74 12 31 c0 89 c6 ff c0 <0f> b6 3c 32 40 88 3c 3>
Jan 10 14:12:38 michelle kernel: RSP: 0018:ffffb2d4ca4c3dd0 EFLAGS: 00010202
Jan 10 14:12:38 michelle kernel: RAX: 0000000000000001 RBX: 0000000000000007 RCX: ffffb2d4ca4c3dd9
Jan 10 14:12:38 michelle kernel: RDX: 00085d8740000000 RSI: 0000000000000000 RDI: 000000000000000a
Jan 10 14:12:38 michelle kernel: RBP: ffff96fbbfd9c600 R08: 0000000000000001 R09: 00000000ffffdfff
Jan 10 14:12:38 michelle kernel: R10: ffffffff94a3a700 R11: ffffffff94aea700 R12: 0000000000000003
Jan 10 14:12:38 michelle kernel: R13: ffff96fbbfd9c600 R14: 0000000000000003 R15: 0000000000003fea
Jan 10 14:12:38 michelle kernel: FS:  00007f6ac90e5780(0000) GS:ffff97071ed40000(0000) knlGS:0000000000000000
Jan 10 14:12:38 michelle kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 10 14:12:38 michelle kernel: CR2: 0000557c6f245760 CR3: 000000019a64a000 CR4: 0000000000350ee0
Jan 10 14:12:38 michelle kernel: Call Trace:
Jan 10 14:12:38 michelle kernel:  <TASK>
Jan 10 14:12:38 michelle kernel:  btrfs_file_llseek+0x269/0x670
Jan 10 14:12:38 michelle kernel:  ksys_lseek+0x61/0xa0
Jan 10 14:12:38 michelle kernel:  do_syscall_64+0x56/0x80
Jan 10 14:12:38 michelle kernel:  entry_SYSCALL_64_after_hwframe+0x46/0xb0
Jan 10 14:12:38 michelle kernel: RIP: 0033:0x7f6ac91e4d3b
Jan 10 14:12:38 michelle kernel: Code: ff ff c3 0f 1f 40 00 48 8b 15 e1 90 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 f3 0f 1e fa b8 08 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0>
Jan 10 14:12:38 michelle kernel: RSP: 002b:00007fff68e04038 EFLAGS: 00000297 ORIG_RAX: 0000000000000008
Jan 10 14:12:38 michelle kernel: RAX: ffffffffffffffda RBX: 000056367d4ffea0 RCX: 00007f6ac91e4d3b
Jan 10 14:12:38 michelle kernel: RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000004
Jan 10 14:12:38 michelle kernel: RBP: 0000000000000004 R08: 00007f6ac92bef90 R09: 000056367d4f6160
Jan 10 14:12:38 michelle kernel: R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000000
Jan 10 14:12:38 michelle kernel: R13: 000056367d4c0eb0 R14: 000056367d50d620 R15: 000056367d4ebab0
Jan 10 14:12:38 michelle kernel:  </TASK>
Jan 10 14:12:38 michelle kernel: Modules linked in: xt_CHECKSUM xt_MASQUERADE nvidia_drm(PO) nvidia_modeset(PO) ip6table_nat iptable_nat bpfilter nvidia(PO) uvcvideo videobuf2_vmalloc video>
Jan 10 14:12:38 michelle kernel: ---[ end trace 0000000000000000 ]---
Jan 10 14:12:38 michelle kernel: RIP: 0010:btrfs_get_64+0xe7/0x100
Jan 10 14:12:38 michelle kernel: Code: 40 08 48 2b 15 b2 3a 15 01 48 8d 0c 04 48 c1 fa 06 48 c1 e2 0c 48 03 15 af 3a 15 01 81 eb f8 0f 00 00 74 12 31 c0 89 c6 ff c0 <0f> b6 3c 32 40 88 3c 3>
Jan 10 14:12:38 michelle kernel: RSP: 0018:ffffb2d4ca4c3dd0 EFLAGS: 00010202
Jan 10 14:12:38 michelle kernel: RAX: 0000000000000001 RBX: 0000000000000007 RCX: ffffb2d4ca4c3dd9
Jan 10 14:12:38 michelle kernel: RDX: 00085d8740000000 RSI: 0000000000000000 RDI: 000000000000000a
Jan 10 14:12:38 michelle kernel: RBP: ffff96fbbfd9c600 R08: 0000000000000001 R09: 00000000ffffdfff
Jan 10 14:12:38 michelle kernel: R10: ffffffff94a3a700 R11: ffffffff94aea700 R12: 0000000000000003
Jan 10 14:12:38 michelle kernel: R13: ffff96fbbfd9c600 R14: 0000000000000003 R15: 0000000000003fea
Jan 10 14:12:38 michelle kernel: FS:  00007f6ac90e5780(0000) GS:ffff97071ed40000(0000) knlGS:0000000000000000
Jan 10 14:12:38 michelle kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 10 14:12:38 michelle kernel: CR2: 0000557c6f245760 CR3: 000000019a64a000 CR4: 0000000000350ee0
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-01-11 10:59:56 UTC
Just to ensure nothing went sideways during testing: did you really see this with 6.1.4? A patch for a problem that looks quite similar to yours was merged for that version: 
https://lore.kernel.org/all/20230104160512.620453792@linuxfoundation.org/
Comment 2 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-01-11 11:02:11 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #1)
> Just to ensure nothing went sideways during testing

[And yes, I see that "6.1.4-gentoo" in the backtrace, but that "gentoo" made me wonder if it's patched or something]
Comment 3 Matthias Schoepfer 2023-01-12 07:27:24 UTC
Well, I did not check each and every line of the kernel, but it should be 6.1.4 with very minor patches to kconfig. 

Judging just from the stack traces, I do not think the patch addresses the same problem. But I might be mistaken.

I can try with a vanilla kernel directly from git. Can you tell me which version you want me to test? 

I will try to do a minimal test case as well.
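
For anyone attempting such a test case: the user-space registers in the trace above (ORIG_RAX 8, RDI 4, RSI 0, RDX 3) correspond, on x86-64 Linux, to lseek(4, 0, SEEK_DATA). The following is a minimal sketch of a probe that issues the same kind of call on a sparse file. It is not a confirmed reproducer; the file name, the 1 MiB hole, and the helper names probe_seek_data and run_probe are assumptions made for illustration.

```python
import os
import tempfile

DATA_START = 1 << 20  # hypothetical layout: leave a 1 MiB hole before the data


def probe_seek_data(path):
    """Return (first data offset, next hole offset, file size) for a sparse file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Writing past offset 0 leaves a hole at the start of the file.
        os.pwrite(fd, b"x" * 4096, DATA_START)
        size = os.fstat(fd).st_size
        # Same kind of call as in the trace: offset 0 (RSI), whence SEEK_DATA (RDX).
        data_off = os.lseek(fd, 0, os.SEEK_DATA)
        hole_off = os.lseek(fd, data_off, os.SEEK_HOLE)
        return data_off, hole_off, size
    finally:
        os.close(fd)


def run_probe(directory=None):
    """Run the probe in a temporary directory, optionally under `directory`."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        return probe_seek_data(os.path.join(tmp, "sparse.img"))


if __name__ == "__main__":
    print(run_probe())
```

Pointing run_probe at a directory on the affected btrfs filesystem would exercise btrfs_file_llseek; whether it hits the fault presumably depends on the file's extent layout.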
Comment 4 dianlujitao 2023-01-17 07:17:20 UTC
Hello, I have a similar problem when compiling AOSP on Arch with the 6.1.6-zen kernel:

Jan 17 12:35:47 arch-pc kernel: perf: interrupt took too long (2514 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
Jan 17 12:49:42 arch-pc kernel: BTRFS warning (device nvme0n1p1): bad eb member end: ptr 0x3fe9 start 791064428544 member offset 16382 size 8
Jan 17 12:49:42 arch-pc kernel: general protection fault, probably for non-canonical address 0x3292e80000000: 0000 [#1] PREEMPT SMP PTI
Jan 17 12:49:42 arch-pc kernel: CPU: 2 PID: 393424 Comm: e2fsdroid Tainted: G        W  OE      6.1.6-zen1-1-zen #1 a9f1d40d38a4e5cc84569c1bc8cda8fa4a251102
Jan 17 12:49:42 arch-pc kernel: Hardware name: Dell Inc. OptiPlex 7050/062KRH, BIOS 1.22.1 09/15/2022
Jan 17 12:49:42 arch-pc kernel: RIP: 0010:btrfs_get_64+0xdc/0x120 [btrfs]
Jan 17 12:49:42 arch-pc kernel: Code: 4a 8b 44 e5 78 48 2b 05 f2 4d fe e4 48 c1 f8 06 48 c1 e0 0c 48 03 05 f3 4d fe e4 81 eb f8 0f 00 00 74 13 31 d2 89 d6 83 c2 01 <0f> b6 3c 30 40 88 3c 31 39 da 72 ef 48 8b 44 24 08 48 8b 54 24 10
Jan 17 12:49:42 arch-pc kernel: RSP: 0018:ffffa37297dc7d60 EFLAGS: 00010202
Jan 17 12:49:42 arch-pc kernel: RAX: 0003292e80000000 RBX: 0000000000000006 RCX: ffffa37297dc7d6a
Jan 17 12:49:42 arch-pc kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
Jan 17 12:49:42 arch-pc kernel: RBP: ffff8c0f888cf800 R08: 0000000000000002 R09: 00000000ffffffea
Jan 17 12:49:42 arch-pc kernel: R10: ffffffffa785b840 R11: 00000000fffff000 R12: 0000000000000003
Jan 17 12:49:42 arch-pc kernel: R13: 0000000000003fe9 R14: 0000000000001000 R15: 0000000000000000
Jan 17 12:49:42 arch-pc kernel: FS:  00007f06fbc6c740(0000) GS:ffff8c1a8dc80000(0000) knlGS:0000000000000000
Jan 17 12:49:42 arch-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 17 12:49:42 arch-pc kernel: CR2: 00007f06fbf7ee98 CR3: 000000043a35a002 CR4: 00000000003706e0
Jan 17 12:49:42 arch-pc kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 17 12:49:42 arch-pc kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 17 12:49:42 arch-pc kernel: Call Trace:
Jan 17 12:49:42 arch-pc kernel:  <TASK>
Jan 17 12:49:42 arch-pc kernel:  btrfs_file_llseek+0x36c/0x830 [btrfs 5f77724550ea3d487f82dd40a49fdd783c0cb897]
Jan 17 12:49:42 arch-pc kernel:  ? __x64_sys_newfstat+0x16f/0x1c0
Jan 17 12:49:42 arch-pc kernel:  __x64_sys_lseek+0x76/0xc0
Jan 17 12:49:42 arch-pc kernel:  do_syscall_64+0x5c/0x90
Jan 17 12:49:42 arch-pc kernel:  ? syscall_exit_to_user_mode+0x2c/0x1d0
Jan 17 12:49:42 arch-pc kernel:  ? do_syscall_64+0x6b/0x90
Jan 17 12:49:42 arch-pc kernel:  ? do_syscall_64+0x6b/0x90
Jan 17 12:49:42 arch-pc kernel:  ? exc_page_fault+0x74/0x170
Jan 17 12:49:42 arch-pc kernel:  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Jan 17 12:49:42 arch-pc kernel: RIP: 0033:0x7f06fbd6614b
Jan 17 12:49:42 arch-pc kernel: Code: ff ff c3 0f 1f 40 00 48 8b 15 39 0c 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 f3 0f 1e fa b8 08 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 09 0c 0e 00 f7 d8
Jan 17 12:49:42 arch-pc kernel: RSP: 002b:00007ffcf9a756e8 EFLAGS: 00000293 ORIG_RAX: 0000000000000008
Jan 17 12:49:42 arch-pc kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f06fbd6614b
Jan 17 12:49:42 arch-pc kernel: RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000004
Jan 17 12:49:42 arch-pc kernel: RBP: 00007ffcf9a75800 R08: 0000000000001000 R09: 00007f06fbe48220
Jan 17 12:49:42 arch-pc kernel: R10: 00005584072bb390 R11: 0000000000000293 R12: 00005584072c1e50
Jan 17 12:49:42 arch-pc kernel: R13: 0000000000000004 R14: 000000007f2bb746 R15: 00005584072ae510
Jan 17 12:49:42 arch-pc kernel:  </TASK>
Jan 17 12:49:42 arch-pc kernel: Modules linked in: tcp_diag inet_diag xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc overlay snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfkill vmnet(OE) intel_rapl_msr intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_ctl_led irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic polyval_clmulni polyval_generic snd_hda_codec_hdmi gf128mul ghash_clmulni_intel sha512_ssse3 snd_hda_intel snd_intel_dspcfg aesni_intel snd_intel_sdw_acpi crypto_simd cryptd snd_hda_codec iTCO_wdt mei_wdt snd_hda_core mei_hdcp intel_pmc_bxt mei_pxp dell_wmi vfat snd_hwdep rapl iTCO_vendor_support ee1004 ledtrig_audio dell_smbios dell_wmi_aio e1000e intel_cstate fat snd_pcm mei_me intel_wmi_thunderbolt dcdbas dell_wmi_descriptor wmi_bmof mei pcspkr sparse_keymap snd_timer intel_lpss_pci
Jan 17 12:49:42 arch-pc kernel:  intel_uncore i2c_i801 snd i2c_smbus intel_lpss idma64 soundcore mousedev joydev acpi_pad mac_hid vmmon(OE) vmw_vmci v4l2loopback(OE) videodev mc dm_multipath dm_mod i2c_dev sg crypto_user fuse ip_tables x_tables usbhid btrfs i915 blake2b_generic libcrc32c crc32c_generic nvme xhci_pci sr_mod xor nvme_core nvme_common crc32c_intel intel_gtt xhci_pci_renesas cdrom raid6_pq amdgpu gpu_sched drm_buddy video wmi drm_ttm_helper ttm drm_display_helper cec
Jan 17 12:49:42 arch-pc kernel: ---[ end trace 0000000000000000 ]---
Jan 17 12:49:42 arch-pc kernel: RIP: 0010:btrfs_get_64+0xdc/0x120 [btrfs]
Jan 17 12:49:42 arch-pc kernel: Code: 4a 8b 44 e5 78 48 2b 05 f2 4d fe e4 48 c1 f8 06 48 c1 e0 0c 48 03 05 f3 4d fe e4 81 eb f8 0f 00 00 74 13 31 d2 89 d6 83 c2 01 <0f> b6 3c 30 40 88 3c 31 39 da 72 ef 48 8b 44 24 08 48 8b 54 24 10
Jan 17 12:49:42 arch-pc kernel: RSP: 0018:ffffa37297dc7d60 EFLAGS: 00010202
Jan 17 12:49:42 arch-pc kernel: RAX: 0003292e80000000 RBX: 0000000000000006 RCX: ffffa37297dc7d6a
Jan 17 12:49:42 arch-pc kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
Jan 17 12:49:42 arch-pc kernel: RBP: ffff8c0f888cf800 R08: 0000000000000002 R09: 00000000ffffffea
Jan 17 12:49:42 arch-pc kernel: R10: ffffffffa785b840 R11: 00000000fffff000 R12: 0000000000000003
Jan 17 12:49:42 arch-pc kernel: R13: 0000000000003fe9 R14: 0000000000001000 R15: 0000000000000000
Jan 17 12:49:42 arch-pc kernel: FS:  00007f06fbc6c740(0000) GS:ffff8c1a8dc80000(0000) knlGS:0000000000000000
Jan 17 12:49:42 arch-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 17 12:49:42 arch-pc kernel: CR2: 00007f06fbf7ee98 CR3: 000000043a35a002 CR4: 00000000003706e0
Jan 17 12:49:42 arch-pc kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 17 12:49:42 arch-pc kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

According to "Comm: e2fsdroid", this happened while creating the system image.

It was definitely working well in the past, but I can't tell which exact kernel update introduced the issue.
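
The two "bad eb member end" warnings in this report can be compared numerically. The sketch below parses both lines and computes how far the 8-byte access would run past the end of the extent buffer. The buffer length of 16384 bytes is an assumption (the default btrfs nodesize; it is not stated in the logs), and the names NODESIZE and decode are made up for illustration.

```python
import re

NODESIZE = 16384  # assumption: default btrfs nodesize (16 KiB), not stated in the logs

# Warning lines copied from the two kernel logs in this report.
WARNINGS = [
    "bad eb member end: ptr 0x3fea start 2704543268864 member offset 16383 size 8",
    "bad eb member end: ptr 0x3fe9 start 791064428544 member offset 16382 size 8",
]

PATTERN = re.compile(
    r"ptr (0x[0-9a-f]+) start (\d+) member offset (\d+) size (\d+)"
)


def decode(line, nodesize=NODESIZE):
    """Extract the numbers from one warning and derive the overrun in bytes."""
    m = PATTERN.search(line)
    ptr = int(m.group(1), 16)
    member_offset = int(m.group(3))
    size = int(m.group(4))
    return {
        "ptr": ptr,
        # Distance between the item pointer and the member being read.
        "field_offset": member_offset - ptr,
        # Bytes of the access that fall beyond the assumed buffer length.
        "overrun": member_offset + size - nodesize,
    }


for line in WARNINGS:
    print(decode(line))
```

Under that assumption, both warnings describe the same member (21 bytes past the item pointer) being read 7 and 6 bytes beyond the buffer end respectively, which is consistent with the two reports sharing a cause.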
Comment 5 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-01-17 07:36:19 UTC
Not my area of expertise, hence I can't tell you if it's the same or a different problem. But FWIW, a fix for the initial problem was posted here:

https://lore.kernel.org/all/CAL3q7H5XUr2=kLEV192yU6cZakX_diS5+WRLq7LHkGPUOAZZZw@mail.gmail.com/

You might want to try that and, if that doesn't help, submit a separate report.
Comment 6 dianlujitao 2023-01-17 09:30:22 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #5)
> Not my area of expertise, hence I can't tell you if it's the same or a
> different problem. But FWIW, a fix for the initial problem was posted here:
> 
> https://lore.kernel.org/all/CAL3q7H5XUr2=kLEV192yU6cZakX_diS5+WRLq7LHkGPUOAZZZw@mail.gmail.com/
> 
> You might want to try that and, if that doesn't help, submit a separate
> report.

Thanks for the info. That patch seems to fix my issue.
Comment 7 David Sterba 2023-01-20 12:10:27 UTC
Thanks for the report and testing. The fix is in Linus' tree and will appear in stable soon.