Bug 216908 - General Protection Fault in btrfs_file_llseek
Summary: General Protection Fault in btrfs_file_llseek
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All Linux
Importance: P1 high
Assignee: BTRFS virtual assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-10 13:58 UTC by Matthias Schoepfer
Modified: 2023-01-20 12:10 UTC
CC List: 3 users

See Also:
Kernel Version: 6.1.4
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Matthias Schoepfer 2023-01-10 13:58:06 UTC
When creating a large file (e.g. running mkfs.ext4 within a Yocto embedded Linux build task, which means an 8+ GB file), mkfs.ext4 reports a segfault and I get a general protection fault; the system becomes more or less unstable after this.

I can reproduce this 100% of the time. When I do the same with the 6.0.6 kernel, it works fine.

Here is the kernel dump:
Jan 10 14:12:38 michelle kernel: BTRFS warning (device nvme0n1p5): bad eb member end: ptr 0x3fea start 2704543268864 member offset 16383 size 8
Jan 10 14:12:38 michelle kernel: general protection fault, probably for non-canonical address 0x85d8740000000: 0000 [#1] PREEMPT SMP
Jan 10 14:12:38 michelle kernel: CPU: 21 PID: 2143606 Comm: mkfs.ext4.real Tainted: P           O    T  6.1.4-gentoo #2
Jan 10 14:12:38 michelle kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C37/X570-A PRO (MS-7C37), BIOS H.70 01/09/2020
Jan 10 14:12:38 michelle kernel: RIP: 0010:btrfs_get_64+0xe7/0x100
Jan 10 14:12:38 michelle kernel: Code: 40 08 48 2b 15 b2 3a 15 01 48 8d 0c 04 48 c1 fa 06 48 c1 e2 0c 48 03 15 af 3a 15 01 81 eb f8 0f 00 00 74 12 31 c0 89 c6 ff c0 <0f> b6 3c 32 40 88 3c 3>
Jan 10 14:12:38 michelle kernel: RSP: 0018:ffffb2d4ca4c3dd0 EFLAGS: 00010202
Jan 10 14:12:38 michelle kernel: RAX: 0000000000000001 RBX: 0000000000000007 RCX: ffffb2d4ca4c3dd9
Jan 10 14:12:38 michelle kernel: RDX: 00085d8740000000 RSI: 0000000000000000 RDI: 000000000000000a
Jan 10 14:12:38 michelle kernel: RBP: ffff96fbbfd9c600 R08: 0000000000000001 R09: 00000000ffffdfff
Jan 10 14:12:38 michelle kernel: R10: ffffffff94a3a700 R11: ffffffff94aea700 R12: 0000000000000003
Jan 10 14:12:38 michelle kernel: R13: ffff96fbbfd9c600 R14: 0000000000000003 R15: 0000000000003fea
Jan 10 14:12:38 michelle kernel: FS:  00007f6ac90e5780(0000) GS:ffff97071ed40000(0000) knlGS:0000000000000000
Jan 10 14:12:38 michelle kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 10 14:12:38 michelle kernel: CR2: 0000557c6f245760 CR3: 000000019a64a000 CR4: 0000000000350ee0
Jan 10 14:12:38 michelle kernel: Call Trace:
Jan 10 14:12:38 michelle kernel:  <TASK>
Jan 10 14:12:38 michelle kernel:  btrfs_file_llseek+0x269/0x670
Jan 10 14:12:38 michelle kernel:  ksys_lseek+0x61/0xa0
Jan 10 14:12:38 michelle kernel:  do_syscall_64+0x56/0x80
Jan 10 14:12:38 michelle kernel:  entry_SYSCALL_64_after_hwframe+0x46/0xb0
Jan 10 14:12:38 michelle kernel: RIP: 0033:0x7f6ac91e4d3b
Jan 10 14:12:38 michelle kernel: Code: ff ff c3 0f 1f 40 00 48 8b 15 e1 90 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 f3 0f 1e fa b8 08 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0>
Jan 10 14:12:38 michelle kernel: RSP: 002b:00007fff68e04038 EFLAGS: 00000297 ORIG_RAX: 0000000000000008
Jan 10 14:12:38 michelle kernel: RAX: ffffffffffffffda RBX: 000056367d4ffea0 RCX: 00007f6ac91e4d3b
Jan 10 14:12:38 michelle kernel: RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000004
Jan 10 14:12:38 michelle kernel: RBP: 0000000000000004 R08: 00007f6ac92bef90 R09: 000056367d4f6160
Jan 10 14:12:38 michelle kernel: R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000000
Jan 10 14:12:38 michelle kernel: R13: 000056367d4c0eb0 R14: 000056367d50d620 R15: 000056367d4ebab0
Jan 10 14:12:38 michelle kernel:  </TASK>
Jan 10 14:12:38 michelle kernel: Modules linked in: xt_CHECKSUM xt_MASQUERADE nvidia_drm(PO) nvidia_modeset(PO) ip6table_nat iptable_nat bpfilter nvidia(PO) uvcvideo videobuf2_vmalloc video>
Jan 10 14:12:38 michelle kernel: ---[ end trace 0000000000000000 ]---
Jan 10 14:12:38 michelle kernel: RIP: 0010:btrfs_get_64+0xe7/0x100
Jan 10 14:12:38 michelle kernel: Code: 40 08 48 2b 15 b2 3a 15 01 48 8d 0c 04 48 c1 fa 06 48 c1 e2 0c 48 03 15 af 3a 15 01 81 eb f8 0f 00 00 74 12 31 c0 89 c6 ff c0 <0f> b6 3c 32 40 88 3c 3>
Jan 10 14:12:38 michelle kernel: RSP: 0018:ffffb2d4ca4c3dd0 EFLAGS: 00010202
Jan 10 14:12:38 michelle kernel: RAX: 0000000000000001 RBX: 0000000000000007 RCX: ffffb2d4ca4c3dd9
Jan 10 14:12:38 michelle kernel: RDX: 00085d8740000000 RSI: 0000000000000000 RDI: 000000000000000a
Jan 10 14:12:38 michelle kernel: RBP: ffff96fbbfd9c600 R08: 0000000000000001 R09: 00000000ffffdfff
Jan 10 14:12:38 michelle kernel: R10: ffffffff94a3a700 R11: ffffffff94aea700 R12: 0000000000000003
Jan 10 14:12:38 michelle kernel: R13: ffff96fbbfd9c600 R14: 0000000000000003 R15: 0000000000003fea
Jan 10 14:12:38 michelle kernel: FS:  00007f6ac90e5780(0000) GS:ffff97071ed40000(0000) knlGS:0000000000000000
Jan 10 14:12:38 michelle kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 10 14:12:38 michelle kernel: CR2: 0000557c6f245760 CR3: 000000019a64a000 CR4: 0000000000350ee0
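The user-space half of the dump records which call faulted. On x86-64, ORIG_RAX holds the syscall number (8 is lseek) and the first three arguments travel in RDI, RSI and RDX; on Linux, whence value 3 is SEEK_DATA. The minimal Python sketch below only applies those two lookup tables to the register values (decode_lseek is a hypothetical helper, not an existing tool) and shows that mkfs.ext4 was asking for the first data offset at or after byte 0 of file descriptor 4. The e2fsdroid dump in Comment 4 carries the same RDI/RSI/RDX values.

```python
# x86-64 Linux syscall number for lseek (arch/x86/entry/syscalls/syscall_64.tbl).
NR_LSEEK = 8

# Linux whence values (include/uapi/linux/fs.h).
WHENCE_NAMES = {
    0: "SEEK_SET",
    1: "SEEK_CUR",
    2: "SEEK_END",
    3: "SEEK_DATA",
    4: "SEEK_HOLE",
}


def decode_lseek(orig_rax: int, rdi: int, rsi: int, rdx: int) -> str:
    """Render the syscall recorded in a user-space register dump (hypothetical helper)."""
    if orig_rax != NR_LSEEK:
        raise ValueError(f"syscall {orig_rax} is not lseek on x86-64")
    whence = WHENCE_NAMES.get(rdx, f"unknown({rdx})")
    return f"lseek(fd={rdi}, offset={rsi}, whence={whence})"


# Values copied from the "RIP: 0033" register block above.
print(decode_lseek(orig_rax=0x8, rdi=0x4, rsi=0x0, rdx=0x3))
# prints: lseek(fd=4, offset=0, whence=SEEK_DATA)
```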
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-01-11 10:59:56 UTC
Just to ensure nothing went sideways during testing: did you really see this with 6.1.4? A patch for a problem that looks quite similar to yours was merged for that version: 
https://lore.kernel.org/all/20230104160512.620453792@linuxfoundation.org/
Comment 2 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-01-11 11:02:11 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #1)
> Just to ensure nothing went sideways during testing

[And yes, I see that "6.1.4-gentoo" in the backtrace, but that "gentoo" made me wonder whether it carries extra patches or something]
Comment 3 Matthias Schoepfer 2023-01-12 07:27:24 UTC
Well, I did not check each and every line of the kernel, but it should be 6.1.4 with only very minor Kconfig patches.

Judging just from the stack traces, I do not think the patch addresses the same problem, but I might be mistaken.

I can try with a vanilla kernel directly from git. Can you tell me which version you want me to test? 

I will try to do a minimal test case as well.
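A possible starting point for such a minimal test case, offered as an untested sketch: the fault is raised inside lseek() called with whence SEEK_DATA (ORIG_RAX 8 and RDX 3 in the user-space registers of the dump above), so a loop that walks a large sparse file's data segments with SEEK_DATA/SEEK_HOLE exercises the same syscall path. It is an assumption that this alone reproduces the fault; whether it triggers may also depend on how the file's extents end up laid out on the btrfs filesystem. The function names, the command-line usage and the file layout below are all illustrative.

```python
import errno
import os
import sys
import tempfile
from typing import List, Tuple


def data_segments(fd: int) -> List[Tuple[int, int]]:
    """Return the (start, end) byte ranges the filesystem reports as data.

    Walks the file with lseek(SEEK_DATA) / lseek(SEEK_HOLE), the call that was
    in progress when the fault hit. os.SEEK_DATA/os.SEEK_HOLE need Linux.
    """
    size = os.fstat(fd).st_size
    segments = []
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # no data at or after pos
                break
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)  # EOF counts as a hole
        segments.append((start, end))
        pos = end
    return segments


def make_sparse_file(path: str, size: int, offsets: List[int], chunk_len: int = 4096) -> None:
    """Create a sparse file of `size` bytes, writing `chunk_len` bytes at each offset."""
    with open(path, "wb") as f:
        f.truncate(size)
        for off in offsets:
            f.seek(off)
            f.write(b"\xaa" * chunk_len)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical): python3 seek_walk.py /path/on/the/btrfs/filesystem
    # Illustrative layout: an 8 GiB image with a few scattered written blocks.
    with tempfile.TemporaryDirectory(dir=sys.argv[1]) as tmp:
        image = os.path.join(tmp, "image.img")
        make_sparse_file(image, 8 << 30, [0, 1 << 20, 4 << 30])
        fd = os.open(image, os.O_RDONLY)
        try:
            for start, end in data_segments(fd):
                print(f"data {start:#x}-{end:#x}")
        finally:
            os.close(fd)
```

On a filesystem without native hole tracking, the kernel's generic fallback reports the whole file as a single data segment, so the loop still terminates; only on the affected btrfs kernels would the same calls be expected to hit the fault.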
Comment 4 dianlujitao 2023-01-17 07:17:20 UTC
Hello, I have a similar problem when compiling AOSP on Arch with the 6.1.6-zen kernel:

Jan 17 12:35:47 arch-pc kernel: perf: interrupt took too long (2514 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
Jan 17 12:49:42 arch-pc kernel: BTRFS warning (device nvme0n1p1): bad eb member end: ptr 0x3fe9 start 791064428544 member offset 16382 size 8
Jan 17 12:49:42 arch-pc kernel: general protection fault, probably for non-canonical address 0x3292e80000000: 0000 [#1] PREEMPT SMP PTI
Jan 17 12:49:42 arch-pc kernel: CPU: 2 PID: 393424 Comm: e2fsdroid Tainted: G        W  OE      6.1.6-zen1-1-zen #1 a9f1d40d38a4e5cc84569c1bc8cda8fa4a251102
Jan 17 12:49:42 arch-pc kernel: Hardware name: Dell Inc. OptiPlex 7050/062KRH, BIOS 1.22.1 09/15/2022
Jan 17 12:49:42 arch-pc kernel: RIP: 0010:btrfs_get_64+0xdc/0x120 [btrfs]
Jan 17 12:49:42 arch-pc kernel: Code: 4a 8b 44 e5 78 48 2b 05 f2 4d fe e4 48 c1 f8 06 48 c1 e0 0c 48 03 05 f3 4d fe e4 81 eb f8 0f 00 00 74 13 31 d2 89 d6 83 c2 01 <0f> b6 3c 30 40 88 3c 31 39 da 72 ef 48 8b 44 24 08 48 8b 54 24 10
Jan 17 12:49:42 arch-pc kernel: RSP: 0018:ffffa37297dc7d60 EFLAGS: 00010202
Jan 17 12:49:42 arch-pc kernel: RAX: 0003292e80000000 RBX: 0000000000000006 RCX: ffffa37297dc7d6a
Jan 17 12:49:42 arch-pc kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
Jan 17 12:49:42 arch-pc kernel: RBP: ffff8c0f888cf800 R08: 0000000000000002 R09: 00000000ffffffea
Jan 17 12:49:42 arch-pc kernel: R10: ffffffffa785b840 R11: 00000000fffff000 R12: 0000000000000003
Jan 17 12:49:42 arch-pc kernel: R13: 0000000000003fe9 R14: 0000000000001000 R15: 0000000000000000
Jan 17 12:49:42 arch-pc kernel: FS:  00007f06fbc6c740(0000) GS:ffff8c1a8dc80000(0000) knlGS:0000000000000000
Jan 17 12:49:42 arch-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 17 12:49:42 arch-pc kernel: CR2: 00007f06fbf7ee98 CR3: 000000043a35a002 CR4: 00000000003706e0
Jan 17 12:49:42 arch-pc kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 17 12:49:42 arch-pc kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 17 12:49:42 arch-pc kernel: Call Trace:
Jan 17 12:49:42 arch-pc kernel:  <TASK>
Jan 17 12:49:42 arch-pc kernel:  btrfs_file_llseek+0x36c/0x830 [btrfs 5f77724550ea3d487f82dd40a49fdd783c0cb897]
Jan 17 12:49:42 arch-pc kernel:  ? __x64_sys_newfstat+0x16f/0x1c0
Jan 17 12:49:42 arch-pc kernel:  __x64_sys_lseek+0x76/0xc0
Jan 17 12:49:42 arch-pc kernel:  do_syscall_64+0x5c/0x90
Jan 17 12:49:42 arch-pc kernel:  ? syscall_exit_to_user_mode+0x2c/0x1d0
Jan 17 12:49:42 arch-pc kernel:  ? do_syscall_64+0x6b/0x90
Jan 17 12:49:42 arch-pc kernel:  ? do_syscall_64+0x6b/0x90
Jan 17 12:49:42 arch-pc kernel:  ? exc_page_fault+0x74/0x170
Jan 17 12:49:42 arch-pc kernel:  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Jan 17 12:49:42 arch-pc kernel: RIP: 0033:0x7f06fbd6614b
Jan 17 12:49:42 arch-pc kernel: Code: ff ff c3 0f 1f 40 00 48 8b 15 39 0c 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 f3 0f 1e fa b8 08 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 09 0c 0e 00 f7 d8
Jan 17 12:49:42 arch-pc kernel: RSP: 002b:00007ffcf9a756e8 EFLAGS: 00000293 ORIG_RAX: 0000000000000008
Jan 17 12:49:42 arch-pc kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f06fbd6614b
Jan 17 12:49:42 arch-pc kernel: RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000004
Jan 17 12:49:42 arch-pc kernel: RBP: 00007ffcf9a75800 R08: 0000000000001000 R09: 00007f06fbe48220
Jan 17 12:49:42 arch-pc kernel: R10: 00005584072bb390 R11: 0000000000000293 R12: 00005584072c1e50
Jan 17 12:49:42 arch-pc kernel: R13: 0000000000000004 R14: 000000007f2bb746 R15: 00005584072ae510
Jan 17 12:49:42 arch-pc kernel:  </TASK>
Jan 17 12:49:42 arch-pc kernel: Modules linked in: tcp_diag inet_diag xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc overlay snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfkill vmnet(OE) intel_rapl_msr intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_ctl_led irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic polyval_clmulni polyval_generic snd_hda_codec_hdmi gf128mul ghash_clmulni_intel sha512_ssse3 snd_hda_intel snd_intel_dspcfg aesni_intel snd_intel_sdw_acpi crypto_simd cryptd snd_hda_codec iTCO_wdt mei_wdt snd_hda_core mei_hdcp intel_pmc_bxt mei_pxp dell_wmi vfat snd_hwdep rapl iTCO_vendor_support ee1004 ledtrig_audio dell_smbios dell_wmi_aio e1000e intel_cstate fat snd_pcm mei_me intel_wmi_thunderbolt dcdbas dell_wmi_descriptor wmi_bmof mei pcspkr sparse_keymap snd_timer intel_lpss_pci
Jan 17 12:49:42 arch-pc kernel:  intel_uncore i2c_i801 snd i2c_smbus intel_lpss idma64 soundcore mousedev joydev acpi_pad mac_hid vmmon(OE) vmw_vmci v4l2loopback(OE) videodev mc dm_multipath dm_mod i2c_dev sg crypto_user fuse ip_tables x_tables usbhid btrfs i915 blake2b_generic libcrc32c crc32c_generic nvme xhci_pci sr_mod xor nvme_core nvme_common crc32c_intel intel_gtt xhci_pci_renesas cdrom raid6_pq amdgpu gpu_sched drm_buddy video wmi drm_ttm_helper ttm drm_display_helper cec
Jan 17 12:49:42 arch-pc kernel: ---[ end trace 0000000000000000 ]---
Jan 17 12:49:42 arch-pc kernel: RIP: 0010:btrfs_get_64+0xdc/0x120 [btrfs]
Jan 17 12:49:42 arch-pc kernel: Code: 4a 8b 44 e5 78 48 2b 05 f2 4d fe e4 48 c1 f8 06 48 c1 e0 0c 48 03 05 f3 4d fe e4 81 eb f8 0f 00 00 74 13 31 d2 89 d6 83 c2 01 <0f> b6 3c 30 40 88 3c 31 39 da 72 ef 48 8b 44 24 08 48 8b 54 24 10
Jan 17 12:49:42 arch-pc kernel: RSP: 0018:ffffa37297dc7d60 EFLAGS: 00010202
Jan 17 12:49:42 arch-pc kernel: RAX: 0003292e80000000 RBX: 0000000000000006 RCX: ffffa37297dc7d6a
Jan 17 12:49:42 arch-pc kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
Jan 17 12:49:42 arch-pc kernel: RBP: ffff8c0f888cf800 R08: 0000000000000002 R09: 00000000ffffffea
Jan 17 12:49:42 arch-pc kernel: R10: ffffffffa785b840 R11: 00000000fffff000 R12: 0000000000000003
Jan 17 12:49:42 arch-pc kernel: R13: 0000000000003fe9 R14: 0000000000001000 R15: 0000000000000000
Jan 17 12:49:42 arch-pc kernel: FS:  00007f06fbc6c740(0000) GS:ffff8c1a8dc80000(0000) knlGS:0000000000000000
Jan 17 12:49:42 arch-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 17 12:49:42 arch-pc kernel: CR2: 00007f06fbf7ee98 CR3: 000000043a35a002 CR4: 00000000003706e0
Jan 17 12:49:42 arch-pc kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 17 12:49:42 arch-pc kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Judging by "Comm: e2fsdroid", it was creating the system image at the time.

It was definitely working well in the past, but I can't tell which exact kernel update introduced the issue.
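Both reports begin with the same "bad eb member end" warning, and a little arithmetic shows what it describes. The sketch below is an illustrative parser, not a btrfs tool. It assumes that btrfs prints member offset = ptr + offset of the field being read and warns when member offset + size exceeds the extent buffer length, and that the leaf is 16384 bytes (the mkfs.btrfs default nodesize; the "end" variant with member offset 16383 or 16382 and an 8-byte read is only consistent with a power-of-two buffer of that size). The names WARN_RE, NODESIZE and decode_warning are hypothetical.

```python
import re

# Matches the warning format seen in both reports (hypothetical parser).
WARN_RE = re.compile(
    r"bad eb member (?P<kind>start|end): ptr (?P<ptr>0x[0-9a-f]+) "
    r"start (?P<start>\d+) member offset (?P<moff>\d+) size (?P<size>\d+)"
)

NODESIZE = 16384  # assumption: default mkfs.btrfs nodesize (16 KiB)


def decode_warning(line: str, nodesize: int = NODESIZE) -> dict:
    """Split a 'bad eb member' warning into field offset and overrun (sketch)."""
    m = WARN_RE.search(line)
    if m is None:
        raise ValueError("not a 'bad eb member' warning")
    ptr = int(m["ptr"], 16)  # offset of the item inside the leaf
    moff = int(m["moff"])    # ptr + offset of the field being read
    size = int(m["size"])    # width of the read in bytes
    return {
        "item_offset": ptr,
        "field_offset": moff - ptr,
        "overrun": moff + size - nodesize,  # bytes read past the leaf end
        "leaf_start": int(m["start"]),
    }


for line in (
    "BTRFS warning (device nvme0n1p5): bad eb member end: ptr 0x3fea "
    "start 2704543268864 member offset 16383 size 8",
    "BTRFS warning (device nvme0n1p1): bad eb member end: ptr 0x3fe9 "
    "start 791064428544 member offset 16382 size 8",
):
    print(decode_warning(line))
```

Under those assumptions both lines decode to the same field offset, 21, with the 8-byte read running 7 and 6 bytes past the end of the leaf respectively. Offset 21 is where disk_bytenr follows generation, ram_bytes, compression, encryption, other_encoding and type (8+8+1+1+2+1 bytes) in struct btrfs_file_extent_item, which suggests that llseek was reading that field from a file extent item sitting at the very end of a leaf.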
Comment 5 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-01-17 07:36:19 UTC
Not my area of expertise, hence I can't tell you if it's the same or a different problem. But FWIW, a fix for the initial problem was posted here:

https://lore.kernel.org/all/CAL3q7H5XUr2=kLEV192yU6cZakX_diS5+WRLq7LHkGPUOAZZZw@mail.gmail.com/

You might want to try that, and if it doesn't help, submit a separate report.
Comment 6 dianlujitao 2023-01-17 09:30:22 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #5)
> Not my area of expertise, hence I can't tell you if it's the same or a
> different problem. But FWIW, a fix for the initial problem was posted here:
> 
> https://lore.kernel.org/all/CAL3q7H5XUr2=kLEV192yU6cZakX_diS5+WRLq7LHkGPUOAZZZw@mail.gmail.com/
> 
> You might want to try that, and if it doesn't help, submit a separate
> report.

Thanks for the info. That patch seems to fix my issue.
Comment 7 David Sterba 2023-01-20 12:10:27 UTC
Thanks for the report and testing. The fix is in Linus' tree and will appear in stable soon.
