Created attachment 303112 [details]
dmesg error

After updating the kernel past 5.17 (checked in 5.19, 6.06), the deluge torrent client began to hang after 1-4 hours of runtime when under heavy load (thousands of files mmapped and read at 20+MB/s), with the following message in dmesg:

BUG: kernel NULL pointer dereference, address: 0000000000000096
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 15 PID: 8263 Comm: Disk Not tainted 5.17.0-rc4_ap-00165-g56a4d67c264e-dirty #36
Hardware name: Micro-Star International Co., Ltd. MS-7C35/MEG X570 UNIFY (MS-7C35), BIOS A.C3 03/15/2022
RIP: 0010:__filemap_get_folio+0x9e/0x350
Code: 10 e8 46 06 68 00 48 89 c3 48 3d 02 04 00 00 74 e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b 54 24 28
RSP: 0000:ffffbe1044ad3cb0 EFLAGS: 00010246
RAX: 0000000000000062 RBX: 0000000000000062 RCX: 0000000000000002
RDX: 000000000000001c RSI: ffffbe1044ad3cc0 RDI: ffff9fca83239ff0
RBP: 0000000000000000 R08: ffffbe1044ad3d40 R09: 0000000000000000
R10: ffffffffffffffc0 R11: 0000000000000000 R12: 0000000000000000
R13: ffff9fcbee9efa78 R14: 000000000004285e R15: fff000003fffffff
FS: 00007f0a763fc640(0000) GS:ffff9fd23edc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000096 CR3: 0000000122c60000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
<TASK>
filemap_fault+0x63/0x820
__do_fault+0x2f/0x80
__handle_mm_fault+0xe46/0x15c0
? __hrtimer_init+0xd0/0xd0
handle_mm_fault+0xbc/0x280
do_user_addr_fault+0x1bc/0x640
exc_page_fault+0x60/0x140
? asm_exc_page_fault+0x8/0x30
asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x7f0aae557789
Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 83 fa 20 72 27 <c5> fe 6f 06 48 83 fa 40 0f 87 a9 00 00 00 c5 fe 6f 4c 16 e0 c5 fe
RSP: 002b:00007f0a763fb7e8 EFLAGS: 00010202
RAX: 00007f0a5c070bc0 RBX: 0000000000000000 RCX: 00007f0a763fb990
RDX: 0000000000004000 RSI: 00007ef87d85e4d7 RDI: 00007f0a5c070bc0
RBP: 00007f0a763fb808 R08: 0000000000000006 R09: 0000000000000000
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f09dccf30f0 R14: 000000000000001d R15: 00007f0a5c001480
</TASK>
Modules linked in: overlay xt_addrtype amdgpu drm_ttm_helper ttm gpu_sched drm_kms_helper backlight iwlmvm syscopyarea mac80211 libarc4 sysfillrect sysimgblt fb_sys_fops iwlwifi i2c_piix4 cfg80211 k10temp fuse configfs efivarfs
CR2: 0000000000000096
---[ end trace 0000000000000000 ]---
RIP: 0010:__filemap_get_folio+0x9e/0x350
Code: 10 e8 46 06 68 00 48 89 c3 48 3d 02 04 00 00 74 e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b 54 24 28
RSP: 0000:ffffbe1044ad3cb0 EFLAGS: 00010246
RAX: 0000000000000062 RBX: 0000000000000062 RCX: 0000000000000002
RDX: 000000000000001c RSI: ffffbe1044ad3cc0 RDI: ffff9fca83239ff0
RBP: 0000000000000000 R08: ffffbe1044ad3d40 R09: 0000000000000000
R10: ffffffffffffffc0 R11: 0000000000000000 R12: 0000000000000000
R13: ffff9fcbee9efa78 R14: 000000000004285e R15: fff000003fffffff
FS: 00007f0a763fc640(0000) GS:ffff9fd23edc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000096 CR3: 0000000122c60000 CR4: 0000000000750ee0
PKRU: 55555554

I did a bisect:

git bisect start
git bisect bad be1a63daffdd152ba4c7b71ab9fec2e39259b42b
git bisect good 8bb7eca972ad531c9b149c0a51ab43a417385813
git bisect good fee62ea772040a6b7d5d07d285dcf68f989fc81c
git bisect bad dbe946287e0825f0e9cd4cbeacfcde9d9b2dd168
git bisect bad 25fd2d41b505d0640bdfe67aa77c549de2d3c18a
git bisect bad 182966e1cd74ec0e326cd376de241803ee79741b
git bisect good b080cee72ef355669cbc52ff55dc513d37433600
git bisect good 3fe2f7446f1e029b220f7f650df6d138f91651f2
git bisect bad d51b1b33c51d147b757f042b4d336603b699f362
git bisect good 3bf03b9a0839c9fb06927ae53ebd0f960b19d408
git bisect bad 6b1f86f8e9c7f9de7ca1cb987b2cf25e99b1ae3a
git bisect good 4aed23a2f8aaaafad0232d3392afcf493c3c3df3
git bisect good ebf55c886eb7fc3c54d02ba1046f0ee38b81fc10
git bisect good d68eccad370665830e16e5c77611fde78cd749b3
git bisect good 3a3bae50af5d73fab5da20484029de77ca67bb2e
git bisect good 1854bc6e2420472676c5c90d3d6b15f6cd640e40
git bisect bad 72e725887413f031fa72d27fea5795450bab1940
git bisect bad 4687fdbb805a92ce5a9f23042c436dc64fef8b77
git bisect bad 56a4d67c264e37014b8392cba9869c7fe904ed1e

which identified commit 56a4d67 as the culprit:

--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3027,7 +3027,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 	ra->size = ra->ra_pages;
 	ra->async_size = ra->ra_pages / 4;
 	ractl._index = ra->start;
-	do_page_cache_ra(&ractl, ra->size, ra->async_size);
+	page_cache_ra_order(&ractl, ra, 0);
 	return fpin;
 }

I took a look at page_cache_ra_order and saw that its behavior depends on MAX_PAGECACHE_ORDER and, consequently, on CONFIG_TRANSPARENT_HUGEPAGE. I then tried disabling CONFIG_TRANSPARENT_HUGEPAGE and found that it does indeed work around the issue for now.

Hardware: https://linux-hardware.org/?probe=1a88842782
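As a cross-check on the oops above, the faulting address can be reproduced from the register dump. The trapping bytes 8b 40 34 decode to mov 0x34(%rax),%eax, so the address read is RAX + 0x34. The sketch below redoes that sum in plain POSIX shell arithmetic. The last two helpers rest on an assumption that is not stated in the report: they apply the XArray internal-entry encoding from include/linux/xarray.h (low two bits equal to 2, payload in the remaining bits) to the same RAX value. If that reading is right, the page-cache lookup returned an internal entry rather than a folio pointer. All function names are made up for illustration.

```sh
#!/bin/sh
# Address touched by "mov DISP(%reg),..." given the register value and DISP.
fault_addr() {
    printf '0x%x\n' $(( $1 + $2 ))
}

# Low two bits of an XArray entry. Assumption (include/linux/xarray.h):
# internal entries are encoded as (value << 2) | 2, so 2 marks "internal".
xa_low_bits() {
    printf '%d\n' $(( $1 & 3 ))
}

# Payload of an internal entry under the same assumed encoding.
xa_internal_value() {
    printf '%d\n' $(( $1 >> 2 ))
}

fault_addr 0x62 0x34      # RAX from the oops plus the 0x34 displacement
xa_low_bits 0x62          # 2 would mean "internal entry", not a pointer
xa_internal_value 0x62    # payload carried by that entry
```

The first call prints 0x96, which matches both the reported address and CR2. That makes a small non-pointer value in RAX the more likely cause than a plain NULL folio.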
Created attachment 303113 [details] kernel config
Created attachment 303114 [details] same error in 6.0.2
Mikhail, what's the status here? Have you seen this reply: https://lore.kernel.org/lkml/Y2Pqhr1DFgHP1dsg@casper.infradead.org/
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #3)
> Mikhail, what's the status here? Have you seen this reply:
> https://lore.kernel.org/lkml/Y2Pqhr1DFgHP1dsg@casper.infradead.org/

Yes, I did see Matthew's reply and applied his suggestion to my bisection, but it takes ~25 hours to verify each step (I do use bisect replay and stuff, I just don't have much free machine time for this at the moment, and the bug isn't exactly deterministic).

Here's the decode_stacktrace.sh output from the last iteration of the bug; it differs from what Matthew posted:

[93875.323165] Call Trace:
[93875.323166] <TASK>
[93875.323169] filemap_fault+0x62/0x8f0
[93875.323172] __do_fault+0x32/0x90
[93875.323174] __handle_mm_fault+0xe3d/0x1590
[93875.323177] handle_mm_fault+0xc0/0x290
[93875.323179] do_user_addr_fault+0x1c3/0x650
[93875.323182] exc_page_fault+0x62/0x140
[93875.323184] ? asm_exc_page_fault+0x8/0x30
[93875.323188] asm_exc_page_fault+0x1e/0x30
[93875.323189] RIP: 0033:0x7effd7157789
[93875.323190] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 83 fa 20 72 27 <c5> fe 6f 06 48 83 fa 40 0f 87 a9 00 00 00 c5 fe 6f 4c 16 e0 c5 fe
All code
========
   0: 66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
   7: 00 00 00 00
   b: 66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
  12: 00 00 00 00
  16: 66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
  1d: 00 00 00 00
  21: 48 89 f8                mov %rdi,%rax
  24: 48 83 fa 20             cmp $0x20,%rdx
  28: 72 27                   jb 0x51
  2a:* c5 fe 6f 06            vmovdqu (%rsi),%ymm0 <-- trapping instruction
  2e: 48 83 fa 40             cmp $0x40,%rdx
  32: 0f 87 a9 00 00 00       ja 0xe1
  38: c5 fe 6f 4c 16 e0       vmovdqu -0x20(%rsi,%rdx,1),%ymm1
  3e: c5                      .byte 0xc5
  3f: fe                      .byte 0xfe

Code starting with the faulting instruction
===========================================
   0: c5 fe 6f 06             vmovdqu (%rsi),%ymm0
   4: 48 83 fa 40             cmp $0x40,%rdx
   8: 0f 87 a9 00 00 00       ja 0xb7
   e: c5 fe 6f 4c 16 e0       vmovdqu -0x20(%rsi,%rdx,1),%ymm1
  14: c5                      .byte 0xc5
  15: fe                      .byte 0xfe

But I lost track of the commit I caught it on. I will report back in a couple of days, once I'm certain that I've finished the bisection.
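Since each bisection step costs about a day here, re-running from a saved log is much cheaper than starting over when one verdict turns out to be wrong. The following is only a sketch of that workflow. The helper name set_verdict and the file names are hypothetical; git bisect log, git bisect reset and git bisect replay are the standard git subcommands. The helper rewrites the verdict for one commit and drops everything recorded after it, because later steps were chosen on the basis of the wrong answer.

```sh
#!/bin/sh
# Rewrite the verdict recorded for one commit in a saved bisect log and
# discard all later entries (they depended on the old verdict).
# Usage: set_verdict LOGFILE COMMIT good|bad   (edited log goes to stdout)
set_verdict() {
    awk -v c="$2" -v v="$3" '
        $1 == "git" && $2 == "bisect" && $4 == c {
            print "git bisect " v " " c
            exit
        }
        { print }
    ' "$1"
}

# Typical use inside the kernel tree (file names are placeholders):
#   git bisect log > bisect.log
#   set_verdict bisect.log COMMIT bad > bisect-fixed.log
#   git bisect reset
#   git bisect replay bisect-fixed.log
```

After the replay, git checks out the next commit to test, so only the steps after the corrected verdict need to be repeated.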
It is not necessary to recompile the kernel to disable transparent huge pages (CONFIG_TRANSPARENT_HUGEPAGE); they can be switched off at runtime:

https://docs.kernel.org/admin-guide/mm/transhuge.html
https://access.redhat.com/solutions/46111
https://wiki.archlinux.org/title/Redis#Warning_about_Transparent_Huge_Pages_(THP)

My machine: https://linux-hardware.org/?probe=ecafbb7acb

I had the same problem with qBittorrent + libtorrent 2.x. As a temporary solution I recompiled qBittorrent with libtorrent 1.x, but for some time now I have been running a custom kernel with "/sys/kernel/mm/transparent_hugepage/enabled" = "madvise" and "/sys/kernel/mm/transparent_hugepage/defrag" = "madvise". I will try again with qBittorrent + libtorrent 2.x.
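For reference, a minimal sketch of inspecting those two sysfs files from a shell. The paths are the ones quoted above; the helper name thp_active_mode is made up for illustration. The kernel marks the active mode with square brackets, for example "always [madvise] never". Whether the runtime setting also avoids this particular page-cache crash is not confirmed anywhere in this thread, so treat it as something to test rather than a known workaround.

```sh
#!/bin/sh
# Extract the bracketed (active) mode from a line such as
# "always [madvise] never".
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z+_-]*\)\].*/\1/p'
}

thp_sysfs=/sys/kernel/mm/transparent_hugepage

# Report the current settings if this kernel exposes them.
if [ -r "$thp_sysfs/enabled" ] && [ -r "$thp_sysfs/defrag" ]; then
    echo "enabled: $(thp_active_mode "$(cat "$thp_sysfs/enabled")")"
    echo "defrag:  $(thp_active_mode "$(cat "$thp_sysfs/defrag")")"
fi

# Changing the mode needs root and lasts until the next reboot:
#   echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
#   echo never   > /sys/kernel/mm/transparent_hugepage/enabled
```

On a kernel built without CONFIG_TRANSPARENT_HUGEPAGE the directory does not exist and the script prints nothing.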
i can't figure out how to use git send-email, so i will post this here instead > > > git bisect good 1854bc6e2420472676c5c90d3d6b15f6cd640e40 > I suspect this is where your bisection went astray. This should have > been bad and it led you to the wrong commit. so i've applied your suggestion and did some more bisecting and arrived at this: git bisect start git bisect bad be1a63daffdd152ba4c7b71ab9fec2e39259b42b git bisect good 8bb7eca972ad531c9b149c0a51ab43a417385813 git bisect good fee62ea772040a6b7d5d07d285dcf68f989fc81c git bisect bad dbe946287e0825f0e9cd4cbeacfcde9d9b2dd168 git bisect bad 25fd2d41b505d0640bdfe67aa77c549de2d3c18a git bisect bad 182966e1cd74ec0e326cd376de241803ee79741b git bisect good b080cee72ef355669cbc52ff55dc513d37433600 git bisect good 3fe2f7446f1e029b220f7f650df6d138f91651f2 git bisect bad d51b1b33c51d147b757f042b4d336603b699f362 git bisect good 3bf03b9a0839c9fb06927ae53ebd0f960b19d408 git bisect bad 6b1f86f8e9c7f9de7ca1cb987b2cf25e99b1ae3a git bisect good 4aed23a2f8aaaafad0232d3392afcf493c3c3df3 git bisect good ebf55c886eb7fc3c54d02ba1046f0ee38b81fc10 git bisect good d68eccad370665830e16e5c77611fde78cd749b3 git bisect good 3a3bae50af5d73fab5da20484029de77ca67bb2e git bisect bad 1854bc6e2420472676c5c90d3d6b15f6cd640e40 git bisect good 421f1ab48452af48b64e205de1caca3d1ba415f4 git bisect bad 793917d997df2e432f3e9ac126e4482d68256d01 git bisect good 18788cfa236967741b83db1035ab24539e2a21bb # first bad commit: [793917d997df2e432f3e9ac126e4482d68256d01] mm/readahead: Add large folio readahead i verified last two commits over a couple of days to be sure here's the output of scripts/decode_stacktrace.sh: RIP: 0010:__filemap_get_folio (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 
/home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1897 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1949) Code: 10 e8 a6 05 68 00 48 89 c3 48 3d 02 04 00 00 74 e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b 54 24 28 All code ======== 0: 10 e8 adc %ch,%al 2: a6 cmpsb %es:(%rdi),%ds:(%rsi) 3: 05 68 00 48 89 add $0x89480068,%eax 8: c3 ret 9: 48 3d 02 04 00 00 cmp $0x402,%rax f: 74 e2 je 0xfffffffffffffff3 11: 48 3d 06 04 00 00 cmp $0x406,%rax 17: 74 da je 0xfffffffffffffff3 19: 48 85 c0 test %rax,%rax 1c: 0f 84 3e 02 00 00 je 0x260 22: a8 01 test $0x1,%al 24: 0f 85 40 02 00 00 jne 0x26a 2a:* 8b 40 34 mov 0x34(%rax),%eax <-- trapping instruction 2d: 85 c0 test %eax,%eax 2f: 74 c2 je 0xfffffffffffffff3 31: 8d 50 01 lea 0x1(%rax),%edx 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) 39: 75 f2 jne 0x2d 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx Code starting with the faulting instruction =========================================== 0: 8b 40 34 mov 0x34(%rax),%eax 3: 85 c0 test %eax,%eax 5: 74 c2 je 0xffffffffffffffc9 7: 8d 50 01 lea 0x1(%rax),%edx a: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) f: 75 f2 jne 0x3 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx RSP: 0000:ffffa39e84c2bcb0 EFLAGS: 00010246 RAX: 00000000000000c2 RBX: 00000000000000c2 RCX: 0000000000000002 RDX: 0000000000000034 RSI: ffffa39e84c2bcc0 RDI: ffff8c676acf8920 RBP: 0000000000000000 R08: ffffa39e84c2bd40 R09: 0000000000000000 R10: ffffffffffffffc0 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c675f8ceb78 R14: 00000000005a15b7 R15: fff000003fffffff FS: 
00007f8628ffc640(0000) GS:ffff8c6dbea80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000f6 CR3: 0000000166dda000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: <TASK> filemap_fault (/home/reinhardt/dev-apps/kernel/linux/./include/linux/pagemap.h:531 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:3105) __do_fault (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:3852) __handle_mm_fault (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4169 /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4297 /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4555 /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4690) handle_mm_fault (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4788) do_user_addr_fault (/home/reinhardt/dev-apps/kernel/linux/./include/linux/sched/signal.h:404 /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1399) exc_page_fault (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:40 /home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:75 /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1492 /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1540) ? 
asm_exc_page_fault (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568) asm_exc_page_fault (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568) RIP: 0033:0x7f863519b789 Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 83 fa 20 72 27 <c5> fe 6f 06 48 83 fa 40 0f 87 a9 00 00 00 c5 fe 6f 4c 16 e0 c5 fe All code ======== 0: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 7: 00 00 00 00 b: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 12: 00 00 00 00 16: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 1d: 00 00 00 00 21: 48 89 f8 mov %rdi,%rax 24: 48 83 fa 20 cmp $0x20,%rdx 28: 72 27 jb 0x51 2a:* c5 fe 6f 06 vmovdqu (%rsi),%ymm0 <-- trapping instruction 2e: 48 83 fa 40 cmp $0x40,%rdx 32: 0f 87 a9 00 00 00 ja 0xe1 38: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1 3e: c5 .byte 0xc5 3f: fe .byte 0xfe Code starting with the faulting instruction =========================================== 0: c5 fe 6f 06 vmovdqu (%rsi),%ymm0 4: 48 83 fa 40 cmp $0x40,%rdx 8: 0f 87 a9 00 00 00 ja 0xb7 e: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1 14: c5 .byte 0xc5 15: fe .byte 0xfe RSP: 002b:00007f8628ffb888 EFLAGS: 00010202 RAX: 00007f85f00405f0 RBX: 0000000000000000 RCX: 00007f8628ffba10 RDX: 0000000000004000 RSI: 00007f6b803572d5 RDI: 00007f85f00405f0 RBP: 00007f8628ffb8a8 R08: 000000006375fb7a R09: 0000000000000000 R10: 0000000000000008 R11: 0000000000000246 R12: 00007f85f000bc00 R13: 00007f8624002130 R14: 00000005a15b72d5 R15: 0000000000004000 </TASK> Modules linked in: overlay xt_addrtype amdgpu iwlmvm mac80211 libarc4 drm_ttm_helper ttm gpu_sched drm_kms_helper backlight syscopyarea sysfillrect iwlwifi sysimgblt fb_sys_fops i2c_piix4 cfg80211 k10temp fuse configfs efivarfs CR2: 00000000000000f6 ---[ end trace 0000000000000000 ]--- RIP: 0010:__filemap_get_folio (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608
/home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1897 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1949) Code: 10 e8 a6 05 68 00 48 89 c3 48 3d 02 04 00 00 74 e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b 54 24 28 All code ======== 0: 10 e8 adc %ch,%al 2: a6 cmpsb %es:(%rdi),%ds:(%rsi) 3: 05 68 00 48 89 add $0x89480068,%eax 8: c3 ret 9: 48 3d 02 04 00 00 cmp $0x402,%rax f: 74 e2 je 0xfffffffffffffff3 11: 48 3d 06 04 00 00 cmp $0x406,%rax 17: 74 da je 0xfffffffffffffff3 19: 48 85 c0 test %rax,%rax 1c: 0f 84 3e 02 00 00 je 0x260 22: a8 01 test $0x1,%al 24: 0f 85 40 02 00 00 jne 0x26a 2a:* 8b 40 34 mov 0x34(%rax),%eax <-- trapping instruction 2d: 85 c0 test %eax,%eax 2f: 74 c2 je 0xfffffffffffffff3 31: 8d 50 01 lea 0x1(%rax),%edx 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) 39: 75 f2 jne 0x2d 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx Code starting with the faulting instruction =========================================== 0: 8b 40 34 mov 0x34(%rax),%eax 3: 85 c0 test %eax,%eax 5: 74 c2 je 0xffffffffffffffc9 7: 8d 50 01 lea 0x1(%rax),%edx a: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) f: 75 f2 jne 0x3 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx RSP: 0000:ffffa39e84c2bcb0 EFLAGS: 00010246 RAX: 00000000000000c2 RBX: 00000000000000c2 RCX: 0000000000000002 RDX: 0000000000000034 RSI: ffffa39e84c2bcc0 RDI: ffff8c676acf8920 RBP: 0000000000000000 R08: ffffa39e84c2bd40 R09: 0000000000000000 R10: ffffffffffffffc0 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c675f8ceb78 R14: 00000000005a15b7 R15: fff000003fffffff FS: 00007f8628ffc640(0000) GS:ffff8c6dbea80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 
0000 CR0: 0000000080050033 CR2: 00000000000000f6 CR3: 0000000166dda000 CR4: 0000000000750ee0 PKRU: 55555554
On Mon, 21 Nov 2022 18:34:34 +0000 bugzilla-daemon@kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=216646
>
> --- Comment #6 from Mikhail Pletnev (mmp.dux@gmail.com) ---
> i can't figure out how to use git send-email, so i will post this here
> instead
>
> > > > git bisect good 1854bc6e2420472676c5c90d3d6b15f6cd640e40
>
> > I suspect this is where your bisection went astray. This should have
> > been bad and it led you to the wrong commit.
>
> so i've applied your suggestion and did some more bisecting and arrived at
> this:

Folks, thanks for continuing to work on this. But can I please please implore you to stop using bugzilla for this issue? We just don't use it and nobody looks at it so nobody knows what's going on.

So please, let's continue to work this via email instead? A reply-to-all to this email will work just nicely.

Thanks.
On 21.11.22 21:54, Andrew Morton wrote:
> On Mon, 21 Nov 2022 18:34:34 +0000 bugzilla-daemon@kernel.org wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=216646
>>
>> --- Comment #6 from Mikhail Pletnev (mmp.dux@gmail.com) ---
>> i can't figure out how to use git send-email, so i will post this here
>> instead
>>
>>>>> git bisect good 1854bc6e2420472676c5c90d3d6b15f6cd640e40
>>
>>> I suspect this is where your bisection went astray. This should have
>>> been bad and it led you to the wrong commit.
>>
>> so i've applied your suggestion and did some more bisecting and arrived at
>> this:

That was 793917d997df ("mm/readahead: Add large folio readahead") which
is also from Matthew.

> Folks, thanks for continuing to work on this.

Well, since then nothing happened. :-/ Or have I missed something?

Matthew, do you still have this on your radar?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.

#regzbot poke
On Fri, Dec 02, 2022 at 12:56:40PM +0100, Thorsten Leemhuis wrote:
> On 21.11.22 21:54, Andrew Morton wrote:
> > On Mon, 21 Nov 2022 18:34:34 +0000 bugzilla-daemon@kernel.org wrote:
> >
> >> https://bugzilla.kernel.org/show_bug.cgi?id=216646
> >>
> >> --- Comment #6 from Mikhail Pletnev (mmp.dux@gmail.com) ---
> >> i can't figure out how to use git send-email, so i will post this here
> >> instead
> >>
> >>>>> git bisect good 1854bc6e2420472676c5c90d3d6b15f6cd640e40
> >>
> >>> I suspect this is where your bisection went astray. This should have
> >>> been bad and it led you to the wrong commit.
> >>
> >> so i've applied your suggestion and did some more bisecting and arrived at
> >> this:
>
> That was 793917d997df ("mm/readahead: Add large folio readahead") which
> is also from Matthew.
>
> > Folks, thanks for continuing to work on this.
>
> Well, since then nothing happened. :-/ Or have I missed something?
>
> Matthew, do you still have this on your radar?

No, things tend to fall off my radar after a week or two of inactivity. Particularly since I went on holiday. Since I wasn't cc'd on the bug, the activity was completely invisible to me.

Landing on 793917d997df makes a lot more sense. That's where we actually start using large folios. It doesn't really help narrow down the problem. I have an idea for what it might be; patch to try will follow. But I'll need feedback by email.
On Fri, 2 Dec 2022 16:58:42 +0000 Matthew Wilcox <willy@infradead.org> wrote:
> On Fri, Dec 02, 2022 at 12:56:40PM +0100, Thorsten Leemhuis wrote:
> > On 21.11.22 21:54, Andrew Morton wrote:
> > > On Mon, 21 Nov 2022 18:34:34 +0000 bugzilla-daemon@kernel.org wrote:
> > >
> > >> https://bugzilla.kernel.org/show_bug.cgi?id=216646
> > >>
> > >> --- Comment #6 from Mikhail Pletnev (mmp.dux@gmail.com) ---
> > >> i can't figure out how to use git send-email, so i will post this here
> > >> instead
> > >>
> > >>>>> git bisect good 1854bc6e2420472676c5c90d3d6b15f6cd640e40
> > >>
> > >>> I suspect this is where your bisection went astray. This should have
> > >>> been bad and it led you to the wrong commit.
> > >>
> > >> so i've applied your suggestion and did some more bisecting and arrived at
> > >> this:
> >
> > That was 793917d997df ("mm/readahead: Add large folio readahead") which
> > is also from Matthew.
> >
> > > Folks, thanks for continuing to work on this.
> >
> > Well, since then nothing happened. :-/ Or have I missed something?
> >
> > Matthew, do you still have this on your radar?
>
> No, things tend to fall off my radar after a week or two of inactivity.
> Particularly since I went on holiday. Since I wasn't cc'd on the bug,
> the activity was completely invisible to me.
>
> Landing on 793917d997df makes a lot more sense. That's where we
> actually start using large folios. It doesn't really help narrow
> down the problem. I have an idea for what it might be; patch to
> try will follow. But I'll need feedback by email.

Sorry for the long absence, here's what I've ended up with (that commit you suspected):
(also seems to fix amdgpu crashes I've been having with newer 5.18+ kernels)

> > > git bisect good 1854bc6e2420472676c5c90d3d6b15f6cd640e40
> I suspect this is where your bisection went astray. This should have
> been bad and it led you to the wrong commit.
so i've applied your suggestion and did some more bisecting and arrived at this: git bisect start git bisect bad be1a63daffdd152ba4c7b71ab9fec2e39259b42b git bisect good 8bb7eca972ad531c9b149c0a51ab43a417385813 git bisect good fee62ea772040a6b7d5d07d285dcf68f989fc81c git bisect bad dbe946287e0825f0e9cd4cbeacfcde9d9b2dd168 git bisect bad 25fd2d41b505d0640bdfe67aa77c549de2d3c18a git bisect bad 182966e1cd74ec0e326cd376de241803ee79741b git bisect good b080cee72ef355669cbc52ff55dc513d37433600 git bisect good 3fe2f7446f1e029b220f7f650df6d138f91651f2 git bisect bad d51b1b33c51d147b757f042b4d336603b699f362 git bisect good 3bf03b9a0839c9fb06927ae53ebd0f960b19d408 git bisect bad 6b1f86f8e9c7f9de7ca1cb987b2cf25e99b1ae3a git bisect good 4aed23a2f8aaaafad0232d3392afcf493c3c3df3 git bisect good ebf55c886eb7fc3c54d02ba1046f0ee38b81fc10 git bisect good d68eccad370665830e16e5c77611fde78cd749b3 git bisect good 3a3bae50af5d73fab5da20484029de77ca67bb2e git bisect bad 1854bc6e2420472676c5c90d3d6b15f6cd640e40 git bisect good 421f1ab48452af48b64e205de1caca3d1ba415f4 git bisect bad 793917d997df2e432f3e9ac126e4482d68256d01 git bisect good 18788cfa236967741b83db1035ab24539e2a21bb # first bad commit: [793917d997df2e432f3e9ac126e4482d68256d01] mm/readahead: Add large folio readahead i verified last two commits over a couple of days to be sure here's the output of scripts/decode_stacktrace.sh: RIP: 0010:__filemap_get_folio (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 
/home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1897 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1949) Code: 10 e8 a6 05 68 00 48 89 c3 48 3d 02 04 00 00 74 e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b 54 24 28 All code ======== 0: 10 e8 adc %ch,%al 2: a6 cmpsb %es:(%rdi),%ds:(%rsi) 3: 05 68 00 48 89 add $0x89480068,%eax 8: c3 ret 9: 48 3d 02 04 00 00 cmp $0x402,%rax f: 74 e2 je 0xfffffffffffffff3 11: 48 3d 06 04 00 00 cmp $0x406,%rax 17: 74 da je 0xfffffffffffffff3 19: 48 85 c0 test %rax,%rax 1c: 0f 84 3e 02 00 00 je 0x260 22: a8 01 test $0x1,%al 24: 0f 85 40 02 00 00 jne 0x26a 2a:* 8b 40 34 mov 0x34(%rax),%eax <-- trapping instruction 2d: 85 c0 test %eax,%eax 2f: 74 c2 je 0xfffffffffffffff3 31: 8d 50 01 lea 0x1(%rax),%edx 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) 39: 75 f2 jne 0x2d 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx Code starting with the faulting instruction =========================================== 0: 8b 40 34 mov 0x34(%rax),%eax 3: 85 c0 test %eax,%eax 5: 74 c2 je 0xffffffffffffffc9 7: 8d 50 01 lea 0x1(%rax),%edx a: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) f: 75 f2 jne 0x3 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx RSP: 0000:ffffa39e84c2bcb0 EFLAGS: 00010246 RAX: 00000000000000c2 RBX: 00000000000000c2 RCX: 0000000000000002 RDX: 0000000000000034 RSI: ffffa39e84c2bcc0 RDI: ffff8c676acf8920 RBP: 0000000000000000 R08: ffffa39e84c2bd40 R09: 0000000000000000 R10: ffffffffffffffc0 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c675f8ceb78 R14: 00000000005a15b7 R15: fff000003fffffff FS: 00007f8628ffc640(0000) GS:ffff8c6dbea80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000f6 CR3: 0000000166dda000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: <TASK> filemap_fault 
(/home/reinhardt/dev-apps/kernel/linux/./include/linux/pagemap.h:531 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:3105) __do_fault (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:3852) __handle_mm_fault (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4169 /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4297 /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4555 /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4690) handle_mm_fault (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4788) do_user_addr_fault (/home/reinhardt/dev-apps/kernel/linux/./include/linux/sched/signal.h:404 /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1399) exc_page_fault (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:40 /home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:75 /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1492 /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1540) ? asm_exc_page_fault (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568) asm_exc_page_fault (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568) RIP: 0033:0x7f863519b789 Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 83 fa 20 72 27 <c5> fe 6f 06 48 83 fa 40 0f 87 a9 00 00 00 c5 fe 6f 4c 16 e0 c5 fe All code ======== 0: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 7: 00 00 00 00 b: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 12: 00 00 00 00 16: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 1d: 00 00 00 00 21: 48 89 f8 mov %rdi,%rax 24: 48 83 fa 20 cmp $0x20,%rdx 28: 72 27 jb 0x51 2a:* c5 fe 6f 06 vmovdqu (%rsi),%ymm0 <-- trapping instruction 2e: 48 83 fa 40 cmp $0x40,%rdx 32: 0f 87 a9 00 00 00 ja 0xe1 38: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1 3e: c5 .byte 0xc5 3f: fe .byte 0xfe Code starting with the faulting instruction =========================================== 0: c5 fe 
6f 06 vmovdqu (%rsi),%ymm0 4: 48 83 fa 40 cmp $0x40,%rdx 8: 0f 87 a9 00 00 00 ja 0xb7 e: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1 14: c5 .byte 0xc5 15: fe .byte 0xfe RSP: 002b:00007f8628ffb888 EFLAGS: 00010202 RAX: 00007f85f00405f0 RBX: 0000000000000000 RCX: 00007f8628ffba10 RDX: 0000000000004000 RSI: 00007f6b803572d5 RDI: 00007f85f00405f0 RBP: 00007f8628ffb8a8 R08: 000000006375fb7a R09: 0000000000000000 R10: 0000000000000008 R11: 0000000000000246 R12: 00007f85f000bc00 R13: 00007f8624002130 R14: 00000005a15b72d5 R15: 0000000000004000 </TASK> Modules linked in: overlay xt_addrtype amdgpu iwlmvm mac80211 libarc4 drm_ttm_helper ttm gpu_sched drm_kms_helper backlight syscopyarea sysfillrect iwlwifi sysimgblt fb_sys_fops i2c_piix4 cfg80211 k10temp fuse configfs efivarfs CR2: 00000000000000f6 ---[ end trace 0000000000000000 ]--- RIP: 0010:__filemap_get_folio (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1897 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1949) Code: 10 e8 a6 05 68 00 48 89 c3 48 3d 02 04 00 00 74 e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b 54 24 28 All code ======== 0: 10 e8 adc %ch,%al 2: a6 cmpsb %es:(%rdi),%ds:(%rsi) 3: 05 68 00 48 89 add $0x89480068,%eax 8: c3 ret 9: 48 3d 02 04 00 00 cmp $0x402,%rax f: 74 e2 je 0xfffffffffffffff3 11: 48 
3d 06 04 00 00 cmp $0x406,%rax 17: 74 da je 0xfffffffffffffff3 19: 48 85 c0 test %rax,%rax 1c: 0f 84 3e 02 00 00 je 0x260 22: a8 01 test $0x1,%al 24: 0f 85 40 02 00 00 jne 0x26a 2a:* 8b 40 34 mov 0x34(%rax),%eax <-- trapping instruction 2d: 85 c0 test %eax,%eax 2f: 74 c2 je 0xfffffffffffffff3 31: 8d 50 01 lea 0x1(%rax),%edx 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) 39: 75 f2 jne 0x2d 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx Code starting with the faulting instruction =========================================== 0: 8b 40 34 mov 0x34(%rax),%eax 3: 85 c0 test %eax,%eax 5: 74 c2 je 0xffffffffffffffc9 7: 8d 50 01 lea 0x1(%rax),%edx a: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) f: 75 f2 jne 0x3 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx RSP: 0000:ffffa39e84c2bcb0 EFLAGS: 00010246 RAX: 00000000000000c2 RBX: 00000000000000c2 RCX: 0000000000000002 RDX: 0000000000000034 RSI: ffffa39e84c2bcc0 RDI: ffff8c676acf8920 RBP: 0000000000000000 R08: ffffa39e84c2bd40 R09: 0000000000000000 R10: ffffffffffffffc0 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8c675f8ceb78 R14: 00000000005a15b7 R15: fff000003fffffff FS: 00007f8628ffc640(0000) GS:ffff8c6dbea80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000f6 CR3: 0000000166dda000 CR4: 0000000000750ee0 PKRU: 55555554
On Fri, Dec 02, 2022 at 04:58:42PM +0000, Matthew Wilcox wrote:
> Landing on 793917d997df makes a lot more sense. That's where we
> actually start using large folios. It doesn't really help narrow
> down the problem. I have an idea for what it might be; patch to
> try will follow. But I'll need feedback by email.

This will give us a bit more information when it does happen.
Further patch to catch it earlier will come "soon".

diff --git a/lib/xarray.c b/lib/xarray.c
index 6f47f6375808..b358b4e1dac6 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -6,6 +6,7 @@
  * Author: Matthew Wilcox <willy@infradead.org>
  */
 
+#define XA_DEBUG
 #include <linux/bitmap.h>
 #include <linux/export.h>
 #include <linux/list.h>
@@ -207,6 +208,12 @@ static void *xas_descend(struct xa_state *xas, struct xa_node *node)
 	if (xa_is_sibling(entry)) {
 		offset = xa_to_sibling(entry);
 		entry = xa_entry(xas->xa, node, offset);
+
+		if (xa_is_sibling(entry)) {
+			printk("***BAD SIBLING*** index %ld offset %d\n",
+				xas->xa_index, offset);
+			xa_dump_node(node);
+		}
 	}
 
 	xas->xa_offset = offset;
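For readers unfamiliar with sibling entries, here is a user-space sketch of the condition this debug patch reports. It is a toy model, not kernel code: the `toy_*` names, the flat 64-slot node and the `uintptr_t` slot type are made up for illustration, and it assumes XA_CHUNK_SHIFT is 6 and that sibling entries are encoded as `(offset << 2) | 2`, which is my reading of include/linux/xarray.h. A sibling slot is supposed to point at the canonical slot holding the real entry; the patch fires when that canonical slot itself holds another sibling entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CHUNK_SIZE 64u	/* assumption: XA_CHUNK_SHIFT == 6 */

/* Assumed encoding: internal entries have low bits 10, payload above them. */
static inline uintptr_t toy_mk_sibling(unsigned int offset)
{
	return ((uintptr_t)offset << 2) | 2;
}

static inline bool toy_is_sibling(uintptr_t entry)
{
	return (entry & 3) == 2 && entry < toy_mk_sibling(TOY_CHUNK_SIZE - 1);
}

static inline unsigned int toy_to_sibling(uintptr_t entry)
{
	return (unsigned int)(entry >> 2);
}

/* Hypothetical stand-in for one bottom-level node of the tree. */
struct toy_node {
	uintptr_t slots[TOY_CHUNK_SIZE];
};

/*
 * Resolve one slot the way the patched code does: follow a sibling entry
 * once to its canonical slot. Returns true if the result is a real entry,
 * false if the canonical slot holds yet another sibling entry (the
 * "BAD SIBLING" case). *entry_out always receives what was found.
 */
static bool toy_descend(const struct toy_node *node, unsigned int offset,
			uintptr_t *entry_out)
{
	uintptr_t entry;

	assert(offset < TOY_CHUNK_SIZE);
	entry = node->slots[offset];

	if (toy_is_sibling(entry)) {
		offset = toy_to_sibling(entry);
		entry = node->slots[offset];
		if (toy_is_sibling(entry)) {
			*entry_out = entry;
			return false;
		}
	}

	*entry_out = entry;
	return true;
}
```

With slot 46 pointing at slot 40 and slot 40 pointing at slot 32, `toy_descend(&node, 46, &e)` returns false and leaves the raw sibling value in `e`, which is the kind of small integer a caller would then mistake for a folio pointer.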
Created attachment 303344 [details] full_dmesg On Fri, 2 Dec 2022 21:57:09 +0000 Matthew Wilcox <willy@infradead.org> wrote: > On Fri, Dec 02, 2022 at 04:58:42PM +0000, Matthew Wilcox wrote: > > Landing on 793917d997df makes a lot more sense. That's where we > > actually start using large folios. It doesn't really help narrow > > down the problem. I have an idea for what it might be; patch to > > try will follow. But I'll need feedback by email. > > This will give us a bit more information when it does happen. > Further patch to catch it earlier will come "soon". > > diff --git a/lib/xarray.c b/lib/xarray.c > index 6f47f6375808..b358b4e1dac6 100644 > --- a/lib/xarray.c > +++ b/lib/xarray.c > @@ -6,6 +6,7 @@ > * Author: Matthew Wilcox <willy@infradead.org> > */ > > +#define XA_DEBUG > #include <linux/bitmap.h> > #include <linux/export.h> > #include <linux/list.h> > @@ -207,6 +208,12 @@ static void *xas_descend(struct xa_state *xas, struct > xa_node *node) > if (xa_is_sibling(entry)) { > offset = xa_to_sibling(entry); > entry = xa_entry(xas->xa, node, offset); > + > + if (xa_is_sibling(entry)) { > + printk("***BAD SIBLING*** index %ld offset %d\n", > + xas->xa_index, offset); > + xa_dump_node(node); > + } > } > > xas->xa_offset = offset; Here is the crash with your patch applied (full dmesg in the attachment): [ 876.422920] ***BAD SIBLING*** index 104046 offset 40 [ 876.422922] node ffffa0f3a6367b50 offset 25 parent ffffa0f37bd7a480 shift 0 count 64 values 8 array ffffa0f1fab08dc0 list ffffa0f3a6367b68 ffffa0f3a6367b68 marks 0 0 0 [ 876.422926] BUG: kernel NULL pointer dereference, address: 0000000000000082 [ 876.422928] #PF: supervisor read access in kernel mode [ 876.422929] #PF: error_code(0x0000) - not-present page [ 876.422930] PGD 0 P4D 0 [ 876.422931] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 876.422933] CPU: 19 PID: 8313 Comm: deluge-gtk Not tainted 5.17.0-rc4_ap_test-00163-g793917d997df-dirty #3 [ 876.422934] Hardware name: Micro-Star International Co., Ltd.
MS-7C35/MEG X570 UNIFY (MS-7C35), BIOS A.C3 03/15/2022 [ 876.422935] RIP: 0010:next_uptodate_page+0x40/0x1e0 [ 876.422939] Code: 0f 84 2f 01 00 00 48 81 ff 06 04 00 00 0f 84 a3 00 00 00 48 81 ff 02 04 00 00 0f 84 22 01 00 00 40 f6 c7 01 0f 85 8c 00 00 00 <48> 8b 07 a8 01 0f 85 81 00 00 00 8b 47 34 85 c0 74 7a 8d 50 01 4c [ 876.422940] RSP: 0000:ffffbf0704aefce8 EFLAGS: 00010246 [ 876.422942] RAX: 0000000000000082 RBX: ffffbf0704aefd40 RCX: 000000000001967d [ 876.422942] RDX: ffffbf0704aefd40 RSI: ffffa0f1fab08db8 RDI: 0000000000000082 [ 876.422943] RBP: ffffa0f1fab08db8 R08: 00000000ffffdfff R09: 00000000ffffdfff [ 876.422944] R10: ffffffff9ee72dc0 R11: ffffffff9ee72dc0 R12: 000000000001967d [ 876.422944] R13: ffffa0f3cc1c06c0 R14: ffffa0f1fab08db8 R15: 000000000001966e [ 876.422945] FS: 00007f8971ffb6c0(0000) GS:ffffa0f87eec0000(0000) knlGS:0000000000000000 [ 876.422946] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 876.422947] CR2: 0000000000000082 CR3: 000000010fada000 CR4: 0000000000750ee0 [ 876.422947] PKRU: 55555554 [ 876.422948] Call Trace: [ 876.422949] <TASK> [ 876.422951] filemap_map_pages+0xa3/0x570 [ 876.422954] xfs_filemap_map_pages+0x3f/0x60 [ 876.422957] __handle_mm_fault+0xfbe/0x15c0 [ 876.422959] ? __hrtimer_init+0xd0/0xd0 [ 876.422963] handle_mm_fault+0xbc/0x280 [ 876.422964] do_user_addr_fault+0x1bc/0x640 [ 876.422968] exc_page_fault+0x60/0x140 [ 876.422971] ? 
asm_exc_page_fault+0x8/0x30 [ 876.422973] asm_exc_page_fault+0x1e/0x30 [ 876.422975] RIP: 0033:0x7f89ab02e409 [ 876.422977] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 83 fa 20 72 27 <c5> fe 6f 06 48 83 fa 40 0f 87 a9 00 00 00 c5 fe 6f 4c 16 e0 c5 fe [ 876.422978] RSP: 002b:00007f8971ffa908 EFLAGS: 00010202 [ 876.422979] RAX: 00007f895c008900 RBX: 0000000000000000 RCX: 00007f8971ffaa90 [ 876.422980] RDX: 0000000000004000 RSI: 00007f60e9fd2b7d RDI: 00007f895c008900 [ 876.422981] RBP: 00007f8971ffa928 R08: 00000000638a7d7a R09: 0000000000000000 [ 876.422981] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f895c094a10 [ 876.422982] R13: 00007f89640016d0 R14: 0000000019670b7d R15: 0000000000004000 [ 876.422984] </TASK> [ 876.422984] Modules linked in: overlay xt_addrtype amdgpu drm_ttm_helper ttm gpu_sched drm_kms_helper backlight iwlmvm syscopyarea mac80211 sysfillrect libarc4 sysimgblt fb_sys_fops iwlwifi i2c_piix4 cfg80211 k10temp fuse configfs efivarfs [ 876.422994] CR2: 0000000000000082 [ 876.422995] ---[ end trace 0000000000000000 ]--- [ 876.422996] RIP: 0010:next_uptodate_page+0x40/0x1e0 [ 876.422998] Code: 0f 84 2f 01 00 00 48 81 ff 06 04 00 00 0f 84 a3 00 00 00 48 81 ff 02 04 00 00 0f 84 22 01 00 00 40 f6 c7 01 0f 85 8c 00 00 00 <48> 8b 07 a8 01 0f 85 81 00 00 00 8b 47 34 85 c0 74 7a 8d 50 01 4c [ 876.422999] RSP: 0000:ffffbf0704aefce8 EFLAGS: 00010246 [ 876.423000] RAX: 0000000000000082 RBX: ffffbf0704aefd40 RCX: 000000000001967d [ 876.423001] RDX: ffffbf0704aefd40 RSI: ffffa0f1fab08db8 RDI: 0000000000000082 [ 876.423002] RBP: ffffa0f1fab08db8 R08: 00000000ffffdfff R09: 00000000ffffdfff [ 876.423003] R10: ffffffff9ee72dc0 R11: ffffffff9ee72dc0 R12: 000000000001967d [ 876.423003] R13: ffffa0f3cc1c06c0 R14: ffffa0f1fab08db8 R15: 000000000001966e [ 876.423004] FS: 00007f8971ffb6c0(0000) GS:ffffa0f87eec0000(0000) knlGS:0000000000000000 [ 876.423005] CS: 0010 DS: 0000 ES: 0000 CR0: 
0000000080050033 [ 876.423006] CR2: 0000000000000082 CR3: 000000010fada000 CR4: 0000000000750ee0 [ 876.423007] PKRU: 55555554 i will, of course, test further patches to help with this issue
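A note on reading the numbers in these oopses, offered as a hedged sketch rather than a diagnosis. Assuming the sibling encoding from include/linux/xarray.h (low two bits `10`, slot offset in the bits above, XA_CHUNK_SHIFT of 6), the small register values in the reports decode as sibling entries. The fault address is then that value plus the displacement of the trapping instruction: 0x34 for `mov 0x34(%rax),%eax` in __filemap_get_folio, and 0 in the next_uptodate_page case, where CR2 equals RDI. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SHIFT 6u				/* assumption: 64 slots per node */
#define CHUNK_MASK  ((1ull << CHUNK_SHIFT) - 1)

/*
 * If value looks like a sibling entry, return the slot it points at;
 * otherwise return -1. Mirrors the assumed xa_is_sibling()/xa_to_sibling()
 * logic: internal entry (low bits 10) below the encoding of the last slot.
 */
static int sibling_target(uint64_t value)
{
	if ((value & 3) != 2)
		return -1;
	if (value >= ((CHUNK_MASK << 2) | 2))
		return -1;
	return (int)(value >> 2);
}

/* Slot that a page-cache index occupies in its bottom-level (shift 0) node. */
static unsigned int slot_of_index(uint64_t index)
{
	return (unsigned int)(index & CHUNK_MASK);
}

/* Address touched when value is used as a pointer with a displacement. */
static uint64_t faulting_address(uint64_t value, uint64_t displacement)
{
	assert(value <= UINT64_MAX - displacement);
	return value + displacement;
}
```

With values from this thread: RAX 0x62 and 0xc2 decode to sibling offsets 24 and 48, and adding 0x34 gives the reported CR2 values 0x96 and 0xf6. In the report above, index 104046 falls in slot 46 of its node, the logged offset 40 is the slot that entry pointed at, and the value 0x82 found there decodes to a sibling entry pointing at slot 32.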
On Sat, Dec 03, 2022 at 01:44:20AM +0300, Mikhail Pletnev wrote: > On Fri, 2 Dec 2022 21:57:09 +0000 > Matthew Wilcox <willy@infradead.org> wrote: > > > On Fri, Dec 02, 2022 at 04:58:42PM +0000, Matthew Wilcox wrote: > > > Landing on 793917d997df makes a lot more sense. That's where we > > > actually start using large folios. It doesn't really help narrow > > > down the problem. I have an idea for what it might be; patch to > > > try will follow. But I'll need feedback by email. > > > > This will give us a bit more information when it does happen. > > Further patch to catch it earlier will come "soon". > > > > diff --git a/lib/xarray.c b/lib/xarray.c > > index 6f47f6375808..b358b4e1dac6 100644 > > --- a/lib/xarray.c > > +++ b/lib/xarray.c > > @@ -6,6 +6,7 @@ > > * Author: Matthew Wilcox <willy@infradead.org> > > */ > > > > +#define XA_DEBUG > > #include <linux/bitmap.h> > > #include <linux/export.h> > > #include <linux/list.h> > > @@ -207,6 +208,12 @@ static void *xas_descend(struct xa_state *xas, struct > xa_node *node) > > if (xa_is_sibling(entry)) { > > offset = xa_to_sibling(entry); > > entry = xa_entry(xas->xa, node, offset); > > + > > + if (xa_is_sibling(entry)) { > > + printk("***BAD SIBLING*** index %ld offset %d\n", > > + xas->xa_index, offset); > > + xa_dump_node(node); > > + } > > } > > > > xas->xa_offset = offset; > > here is the crash with your patch (full dmesg in attachement): Thanks! I think this may be the problem ... 
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 44dd6d6e01bc..cc1fd1f849a7 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1617,6 +1617,12 @@ static inline void xas_advance(struct xa_state *xas, unsigned long index)
 	xas->xa_offset = (index >> shift) & XA_CHUNK_MASK;
 }
 
+static inline void xas_adjust_order(struct xa_state *xas, unsigned int order)
+{
+	xas->xa_shift = order - (order % XA_CHUNK_SHIFT);
+	xas->xa_sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1;
+}
+
 /**
  * xas_set_order() - Set up XArray operation state for a multislot entry.
  * @xas: XArray operation state.
@@ -1628,8 +1634,7 @@ static inline void xas_set_order(struct xa_state *xas, unsigned long index,
 {
 #ifdef CONFIG_XARRAY_MULTI
 	xas->xa_index = order < BITS_PER_LONG ? (index >> order) << order : 0;
-	xas->xa_shift = order - (order % XA_CHUNK_SHIFT);
-	xas->xa_sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1;
+	xas_adjust_order(xas, order);
 	xas->xa_node = XAS_RESTART;
 #else
 	BUG_ON(order > 0);
diff --git a/mm/filemap.c b/mm/filemap.c
index 08341616ae7a..6e3f486131e4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -305,11 +305,13 @@ static void page_cache_delete_batch(struct address_space *mapping,
 
 		WARN_ON_ONCE(!folio_test_locked(folio));
 
+		if (!folio_test_hugetlb(folio))
+			xas_adjust_order(&xas, folio_order(folio));
+		xas_store(&xas, NULL);
 		folio->mapping = NULL;
 		/* Leave folio->index set: truncation lookup relies on it */
 
 		i++;
-		xas_store(&xas, NULL);
 		total_pages += folio_nr_pages(folio);
 	}
 	mapping->nrpages -= total_pages;
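To make the arithmetic in `xas_adjust_order()` concrete, here is a stand-alone sketch of the same two expressions. It assumes XA_CHUNK_SHIFT is 6 (64 slots per node); `struct order_state` and `adjust_order()` are hypothetical stand-ins for the `xa_shift` and `xa_sibs` fields of `struct xa_state`, not kernel interfaces.

```c
#include <assert.h>

#define CHUNK_SHIFT 6u	/* assumption: matches XA_CHUNK_SHIFT on this config */

/* Hypothetical holder for the two fields the helper in the patch sets. */
struct order_state {
	unsigned int shift;	/* tree level at which the entry is stored */
	unsigned int sibs;	/* number of sibling slots following the entry */
};

/*
 * Same expressions as in the patch: round the order down to a multiple of
 * the chunk shift to pick the level, and use the remainder to work out how
 * many extra slots the entry covers at that level.
 */
static struct order_state adjust_order(unsigned int order)
{
	struct order_state s;

	assert(order < 64);
	s.shift = order - (order % CHUNK_SHIFT);
	s.sibs = (1u << (order % CHUNK_SHIFT)) - 1;
	return s;
}
```

For example, an order-4 folio (16 pages) gives shift 0 and 15 sibling slots in a bottom-level node, while an order-9 folio gives shift 6 and 7 sibling slots one level up. If the store that removes a folio runs with shift 0 and no siblings, only one of those slots would be cleared, which appears to be the situation the filemap.c hunk is trying to avoid.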
Created attachment 303365 [details] full_dmesg_new On Mon, 5 Dec 2022 20:25:11 +0000 Matthew Wilcox <willy@infradead.org> wrote: > > Thanks! I think this may be the problem ... > Hi Matthew, thanks for the swift response. I've applied your last patch and run my stress test a couple of times. It is still crashing consistently (although apparently in a different place): [ 1975.257126] ***BAD SIBLING*** index 912583 offset 4 [ 1975.257128] node ffff9fc817e01ff0 offset 51 parent ffff9fc5c7a31ff0 shift 0 count 64 values 48 array ffff9fc521173e80 list ffff9fc817e02008 ffff9fc817e02008 marks 0 0 0 [ 1975.257133] BUG: kernel NULL pointer dereference, address: 0000000000000036 [ 1975.257135] #PF: supervisor read access in kernel mode [ 1975.257137] #PF: error_code(0x0000) - not-present page [ 1975.257138] PGD 0 P4D 0 [ 1975.257139] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 1975.257141] CPU: 5 PID: 8303 Comm: deluge-gtk Not tainted 5.17.0-rc4_ap_test-00163-g793917d997df-dirty #6 [ 1975.257144] Hardware name: Micro-Star International Co., Ltd.
MS-7C35/MEG X570 UNIFY (MS-7C35), BIOS A.C3 03/15/2022 [ 1975.257146] RIP: 0010:__filemap_get_folio (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1899 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1951) [ 1975.257152] Code: 10 e8 56 fd 67 00 48 89 c3 48 3d 02 04 00 00 74 e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b 54 24 28 All code ======== 0: 10 e8 adc %ch,%al 2: 56 push %rsi 3: fd std 4: 67 00 48 89 add %cl,-0x77(%eax) 8: c3 ret 9: 48 3d 02 04 00 00 cmp $0x402,%rax f: 74 e2 je 0xfffffffffffffff3 11: 48 3d 06 04 00 00 cmp $0x406,%rax 17: 74 da je 0xfffffffffffffff3 19: 48 85 c0 test %rax,%rax 1c: 0f 84 3e 02 00 00 je 0x260 22: a8 01 test $0x1,%al 24: 0f 85 40 02 00 00 jne 0x26a 2a:* 8b 40 34 mov 0x34(%rax),%eax <-- trapping instruction 2d: 85 c0 test %eax,%eax 2f: 74 c2 je 0xfffffffffffffff3 31: 8d 50 01 lea 0x1(%rax),%edx 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) 39: 75 f2 jne 0x2d 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx Code starting with the faulting instruction =========================================== 0: 8b 40 34 mov 0x34(%rax),%eax 3: 85 c0 test %eax,%eax 5: 74 c2 je 0xffffffffffffffc9 7: 8d 50 01 lea 0x1(%rax),%edx a: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) f: 75 f2 jne 0x3 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx [ 1975.257154] RSP: 
0000:ffffc2d744c37cb0 EFLAGS: 00010246 [ 1975.257155] RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000000 [ 1975.257156] RDX: 0000000000000000 RSI: ffffffffbb117459 RDI: 00000000ffffffff [ 1975.257157] RBP: 0000000000000000 R08: 00000000ffffdfff R09: 00000000ffffdfff [ 1975.257158] R10: ffffffffbb472dc0 R11: ffffffffbb472dc0 R12: 0000000000000000 [ 1975.257159] R13: ffff9fc521173e78 R14: 00000000000decc7 R15: fff000003fffffff [ 1975.257160] FS: 00007fb2137fe6c0(0000) GS:ffff9fcb7eb40000(0000) knlGS:0000000000000000 [ 1975.257161] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1975.257162] CR2: 0000000000000036 CR3: 0000000164114000 CR4: 0000000000750ee0 [ 1975.257163] PKRU: 55555554 [ 1975.257163] Call Trace: [ 1975.257164] <TASK> [ 1975.257166] ? page_add_file_rmap (/home/reinhardt/dev-apps/kernel/linux/./include/linux/page-flags.h:195 /home/reinhardt/dev-apps/kernel/linux/mm/internal.h:440 /home/reinhardt/dev-apps/kernel/linux/mm/rmap.c:1270) [ 1975.257169] filemap_fault (/home/reinhardt/dev-apps/kernel/linux/./include/linux/pagemap.h:531 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:3107) [ 1975.257172] __do_fault (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:3852) [ 1975.257174] __handle_mm_fault (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4169 /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4297 /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4555 /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4690) [ 1975.257176] handle_mm_fault (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4788) [ 1975.257178] do_user_addr_fault (/home/reinhardt/dev-apps/kernel/linux/./include/linux/sched/signal.h:404 /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1399) [ 1975.257181] exc_page_fault (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:40 /home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:75 /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1492 
/home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1540) [ 1975.257184] ? asm_exc_page_fault (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568) [ 1975.257186] asm_exc_page_fault (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568) [ 1975.257188] RIP: 0033:0x7fb265b88409 [ 1975.257189] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 83 fa 20 72 27 <c5> fe 6f 06 48 83 fa 40 0f 87 a9 00 00 00 c5 fe 6f 4c 16 e0 c5 fe All code ======== 0: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 7: 00 00 00 00 b: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 12: 00 00 00 00 16: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 1d: 00 00 00 00 21: 48 89 f8 mov %rdi,%rax 24: 48 83 fa 20 cmp $0x20,%rdx 28: 72 27 jb 0x51 2a:* c5 fe 6f 06 vmovdqu (%rsi),%ymm0 <-- trapping instruction 2e: 48 83 fa 40 cmp $0x40,%rdx 32: 0f 87 a9 00 00 00 ja 0xe1 38: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1 3e: c5 .byte 0xc5 3f: fe .byte 0xfe Code starting with the faulting instruction =========================================== 0: c5 fe 6f 06 vmovdqu (%rsi),%ymm0 4: 48 83 fa 40 cmp $0x40,%rdx 8: 0f 87 a9 00 00 00 ja 0xb7 e: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1 14: c5 .byte 0xc5 15: fe .byte 0xfe [ 1975.257190] RSP: 002b:00007fb2137fd908 EFLAGS: 00010202 [ 1975.257191] RAX: 00007fb204012a80 RBX: 0000000000000000 RCX: 00007fb2137fda90 [ 1975.257192] RDX: 0000000000004000 RSI: 00007f9fddbb51c3 RDI: 00007fb204012a80 [ 1975.257193] RBP: 00007fb2137fd928 R08: 00000000638ea1ab R09: 0000000000000000 [ 1975.257193] R10: 0000000000000008 R11: 0000000000000246 R12: 00007fb204000bb0 [ 1975.257194] R13: 00007fb21809a5a0 R14: 00000000decc71c3 R15: 0000000000004000 [ 1975.257196] </TASK> [ 1975.257196] Modules linked in: overlay xt_addrtype amdgpu drm_ttm_helper ttm gpu_sched drm_kms_helper iwlmvm backlight syscopyarea mac80211 sysfillrect sysimgblt 
libarc4 fb_sys_fops iwlwifi cfg80211 i2c_piix4 k10temp fuse configfs efivarfs [ 1975.257207] CR2: 0000000000000036 [ 1975.257208] ---[ end trace 0000000000000000 ]--- [ 1975.257209] RIP: 0010:__filemap_get_folio (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1899 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1951) [ 1975.257211] Code: 10 e8 56 fd 67 00 48 89 c3 48 3d 02 04 00 00 74 e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b 54 24 28 All code ======== 0: 10 e8 adc %ch,%al 2: 56 push %rsi 3: fd std 4: 67 00 48 89 add %cl,-0x77(%eax) 8: c3 ret 9: 48 3d 02 04 00 00 cmp $0x402,%rax f: 74 e2 je 0xfffffffffffffff3 11: 48 3d 06 04 00 00 cmp $0x406,%rax 17: 74 da je 0xfffffffffffffff3 19: 48 85 c0 test %rax,%rax 1c: 0f 84 3e 02 00 00 je 0x260 22: a8 01 test $0x1,%al 24: 0f 85 40 02 00 00 jne 0x26a 2a:* 8b 40 34 mov 0x34(%rax),%eax <-- trapping instruction 2d: 85 c0 test %eax,%eax 2f: 74 c2 je 0xfffffffffffffff3 31: 8d 50 01 lea 0x1(%rax),%edx 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) 39: 75 f2 jne 0x2d 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx Code starting with the faulting instruction =========================================== 0: 8b 40 34 mov 0x34(%rax),%eax 3: 85 c0 test %eax,%eax 5: 74 c2 je 0xffffffffffffffc9 7: 8d 50 01 lea 0x1(%rax),%edx a: f0 0f b1 53 34 
lock cmpxchg %edx,0x34(%rbx) f: 75 f2 jne 0x3 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx [ 1975.257212] RSP: 0000:ffffc2d744c37cb0 EFLAGS: 00010246 [ 1975.257213] RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000000 [ 1975.257214] RDX: 0000000000000000 RSI: ffffffffbb117459 RDI: 00000000ffffffff [ 1975.257215] RBP: 0000000000000000 R08: 00000000ffffdfff R09: 00000000ffffdfff [ 1975.257215] R10: ffffffffbb472dc0 R11: ffffffffbb472dc0 R12: 0000000000000000 [ 1975.257216] R13: ffff9fc521173e78 R14: 00000000000decc7 R15: fff000003fffffff [ 1975.257217] FS: 00007fb2137fe6c0(0000) GS:ffff9fcb7eb40000(0000) knlGS:0000000000000000 [ 1975.257218] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1975.257219] CR2: 0000000000000036 CR3: 0000000164114000 CR4: 0000000000750ee0 [ 1975.257220] PKRU: 55555554 (full dmesg and my local changeset in attachments for your reference)
Created attachment 303366 [details] changeset
Hi, this is your Linux kernel regression tracker. Top-posting for once, to make this easily accessible to everyone. Was some progress made to get this regression resolved? From here it looks kinda stalled, that's why I'm asking -- but maybe I just missed something. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight. On 06.12.22 03:08, Mikhail Pletnev wrote: > On Mon, 5 Dec 2022 20:25:11 +0000 > Matthew Wilcox <willy@infradead.org> wrote: >> >> Thanks! I think this may be the problem ... >> > > Hi Matthew, thanks for swift response, i've applied your last patch and ran > my stress test a couple of times. It's still constistently crashing (albeit > it seems in a different place): > > [ 1975.257126] ***BAD SIBLING*** index 912583 offset 4 > [ 1975.257128] node ffff9fc817e01ff0 offset 51 parent ffff9fc5c7a31ff0 shift > 0 count 64 values 48 array ffff9fc521173e80 list ffff9fc817e02008 > ffff9fc817e02008 marks 0 0 0 > [ 1975.257133] BUG: kernel NULL pointer dereference, address: > 0000000000000036 > [ 1975.257135] #PF: supervisor read access in kernel mode > [ 1975.257137] #PF: error_code(0x0000) - not-present page > [ 1975.257138] PGD 0 P4D 0 > [ 1975.257139] Oops: 0000 [#1] PREEMPT SMP NOPTI > [ 1975.257141] CPU: 5 PID: 8303 Comm: deluge-gtk Not tainted > 5.17.0-rc4_ap_test-00163-g793917d997df-dirty #6 > [ 1975.257144] Hardware name: Micro-Star International Co., Ltd. 
MS-7C35/MEG > X570 UNIFY (MS-7C35), BIOS A.C3 03/15/2022 > [ 1975.257146] RIP: 0010:__filemap_get_folio > (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 > /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1899 > /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1951) > [ 1975.257152] Code: 10 e8 56 fd 67 00 48 89 c3 48 3d 02 04 00 00 74 e2 48 3d > 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 02 00 00 <8b> 40 > 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b 54 24 28 > All code > ======== > 0: 10 e8 adc %ch,%al > 2: 56 push %rsi > 3: fd std > 4: 67 00 48 89 add %cl,-0x77(%eax) > 8: c3 ret > 9: 48 3d 02 04 00 00 cmp $0x402,%rax > f: 74 e2 je 0xfffffffffffffff3 > 11: 48 3d 06 04 00 00 cmp $0x406,%rax > 17: 74 da je 0xfffffffffffffff3 > 19: 48 85 c0 test %rax,%rax > 1c: 0f 84 3e 02 00 00 je 0x260 > 22: a8 01 test $0x1,%al > 24: 0f 85 40 02 00 00 jne 0x26a > 2a:* 8b 40 34 mov 0x34(%rax),%eax <-- > trapping instruction > 2d: 85 c0 test %eax,%eax > 2f: 74 c2 je 0xfffffffffffffff3 > 31: 8d 50 01 lea 0x1(%rax),%edx > 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) > 39: 75 f2 jne 0x2d > 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx > > Code starting with the faulting instruction > =========================================== > 0: 8b 40 34 mov 0x34(%rax),%eax > 3: 85 c0 test %eax,%eax > 5: 74 c2 je 0xffffffffffffffc9 > 7: 8d 50 01 lea 0x1(%rax),%edx > a: f0 0f b1 53 34 lock cmpxchg 
%edx,0x34(%rbx) > f: 75 f2 jne 0x3 > 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx > [ 1975.257154] RSP: 0000:ffffc2d744c37cb0 EFLAGS: 00010246 > [ 1975.257155] RAX: 0000000000000002 RBX: 0000000000000002 RCX: > 0000000000000000 > [ 1975.257156] RDX: 0000000000000000 RSI: ffffffffbb117459 RDI: > 00000000ffffffff > [ 1975.257157] RBP: 0000000000000000 R08: 00000000ffffdfff R09: > 00000000ffffdfff > [ 1975.257158] R10: ffffffffbb472dc0 R11: ffffffffbb472dc0 R12: > 0000000000000000 > [ 1975.257159] R13: ffff9fc521173e78 R14: 00000000000decc7 R15: > fff000003fffffff > [ 1975.257160] FS: 00007fb2137fe6c0(0000) GS:ffff9fcb7eb40000(0000) > knlGS:0000000000000000 > [ 1975.257161] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1975.257162] CR2: 0000000000000036 CR3: 0000000164114000 CR4: > 0000000000750ee0 > [ 1975.257163] PKRU: 55555554 > [ 1975.257163] Call Trace: > [ 1975.257164] <TASK> > [ 1975.257166] ? page_add_file_rmap > (/home/reinhardt/dev-apps/kernel/linux/./include/linux/page-flags.h:195 > /home/reinhardt/dev-apps/kernel/linux/mm/internal.h:440 > /home/reinhardt/dev-apps/kernel/linux/mm/rmap.c:1270) > [ 1975.257169] filemap_fault > (/home/reinhardt/dev-apps/kernel/linux/./include/linux/pagemap.h:531 > /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:3107) > [ 1975.257172] __do_fault > (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:3852) > [ 1975.257174] __handle_mm_fault > (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4169 > /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4297 > /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4555 > /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4690) > [ 1975.257176] handle_mm_fault > (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4788) > [ 1975.257178] do_user_addr_fault > (/home/reinhardt/dev-apps/kernel/linux/./include/linux/sched/signal.h:404 > /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1399) > [ 1975.257181] exc_page_fault > 
(/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:40 > /home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:75 > /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1492 > /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1540) > [ 1975.257184] ? asm_exc_page_fault > (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568) > [ 1975.257186] asm_exc_page_fault > (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568) > [ 1975.257188] RIP: 0033:0x7fb265b88409 > [ 1975.257189] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 > 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 83 fa 20 72 27 <c5> fe > 6f 06 48 83 fa 40 0f 87 a9 00 00 00 c5 fe 6f 4c 16 e0 c5 fe > All code > ======== > 0: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) > 7: 00 00 00 00 > b: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) > 12: 00 00 00 00 > 16: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) > 1d: 00 00 00 00 > 21: 48 89 f8 mov %rdi,%rax > 24: 48 83 fa 20 cmp $0x20,%rdx > 28: 72 27 jb 0x51 > 2a:* c5 fe 6f 06 vmovdqu (%rsi),%ymm0 <-- > trapping instruction > 2e: 48 83 fa 40 cmp $0x40,%rdx > 32: 0f 87 a9 00 00 00 ja 0xe1 > 38: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1 > 3e: c5 .byte 0xc5 > 3f: fe .byte 0xfe > > Code starting with the faulting instruction > =========================================== > 0: c5 fe 6f 06 vmovdqu (%rsi),%ymm0 > 4: 48 83 fa 40 cmp $0x40,%rdx > 8: 0f 87 a9 00 00 00 ja 0xb7 > e: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1 > 14: c5 .byte 0xc5 > 15: fe .byte 0xfe > [ 1975.257190] RSP: 002b:00007fb2137fd908 EFLAGS: 00010202 > [ 1975.257191] RAX: 00007fb204012a80 RBX: 0000000000000000 RCX: > 00007fb2137fda90 > [ 1975.257192] RDX: 0000000000004000 RSI: 00007f9fddbb51c3 RDI: > 00007fb204012a80 > [ 1975.257193] RBP: 00007fb2137fd928 R08: 00000000638ea1ab R09: > 0000000000000000 > [ 1975.257193] R10: 0000000000000008 R11: 
0000000000000246 R12: > 00007fb204000bb0 > [ 1975.257194] R13: 00007fb21809a5a0 R14: 00000000decc71c3 R15: > 0000000000004000 > [ 1975.257196] </TASK> > [ 1975.257196] Modules linked in: overlay xt_addrtype amdgpu drm_ttm_helper > ttm gpu_sched drm_kms_helper iwlmvm backlight syscopyarea mac80211 > sysfillrect sysimgblt libarc4 fb_sys_fops iwlwifi cfg80211 i2c_piix4 k10temp > fuse configfs efivarfs > [ 1975.257207] CR2: 0000000000000036 > [ 1975.257208] ---[ end trace 0000000000000000 ]--- > [ 1975.257209] RIP: 0010:__filemap_get_folio > (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 > /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 > /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1899 > /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1951) > [ 1975.257211] Code: 10 e8 56 fd 67 00 48 89 c3 48 3d 02 04 00 00 74 e2 48 3d > 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 02 00 00 <8b> 40 > 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b 54 24 28 > All code > ======== > 0: 10 e8 adc %ch,%al > 2: 56 push %rsi > 3: fd std > 4: 67 00 48 89 add %cl,-0x77(%eax) > 8: c3 ret > 9: 48 3d 02 04 00 00 cmp $0x402,%rax > f: 74 e2 je 0xfffffffffffffff3 > 11: 48 3d 06 04 00 00 cmp $0x406,%rax > 17: 74 da je 0xfffffffffffffff3 > 19: 48 85 c0 test %rax,%rax > 1c: 0f 84 3e 02 00 00 je 0x260 > 22: a8 01 test $0x1,%al > 24: 0f 85 40 02 00 00 jne 0x26a > 2a:* 8b 40 34 mov 0x34(%rax),%eax <-- > trapping instruction > 2d: 85 c0 test %eax,%eax > 2f: 
74 c2 je 0xfffffffffffffff3 > 31: 8d 50 01 lea 0x1(%rax),%edx > 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) > 39: 75 f2 jne 0x2d > 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx > > Code starting with the faulting instruction > =========================================== > 0: 8b 40 34 mov 0x34(%rax),%eax > 3: 85 c0 test %eax,%eax > 5: 74 c2 je 0xffffffffffffffc9 > 7: 8d 50 01 lea 0x1(%rax),%edx > a: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) > f: 75 f2 jne 0x3 > 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx > [ 1975.257212] RSP: 0000:ffffc2d744c37cb0 EFLAGS: 00010246 > [ 1975.257213] RAX: 0000000000000002 RBX: 0000000000000002 RCX: > 0000000000000000 > [ 1975.257214] RDX: 0000000000000000 RSI: ffffffffbb117459 RDI: > 00000000ffffffff > [ 1975.257215] RBP: 0000000000000000 R08: 00000000ffffdfff R09: > 00000000ffffdfff > [ 1975.257215] R10: ffffffffbb472dc0 R11: ffffffffbb472dc0 R12: > 0000000000000000 > [ 1975.257216] R13: ffff9fc521173e78 R14: 00000000000decc7 R15: > fff000003fffffff > [ 1975.257217] FS: 00007fb2137fe6c0(0000) GS:ffff9fcb7eb40000(0000) > knlGS:0000000000000000 > [ 1975.257218] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1975.257219] CR2: 0000000000000036 CR3: 0000000164114000 CR4: > 0000000000750ee0 > [ 1975.257220] PKRU: 55555554 > > (full dmesg and my local changeset in attachments for your reference) > #regzbot poke
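A note for readers decoding the register dump above: the trapping instruction is `mov 0x34(%rax),%eax` with RAX = 0x2, and CR2 = 0x36 is simply RAX + 0x34 (the first report in this bug has RAX = 0x62 and CR2 = 0x96, the same relation). The constants 0x402 and 0x406 in the preceding compares, and the `test $0x1,%al`, look like XArray entry checks. The sketch below is a userspace illustration in Python, assuming the entry tagging used in include/linux/xarray.h (internal entries are `(v << 2) | 2`, value entries have bit 0 set, small internal entries are sibling entries). The `xa_*` names mirror the kernel helpers; `describe` and `fault_address` are hypothetical names introduced here, and reading RAX as a sibling entry is an interpretation, not something confirmed in this thread.

```python
# Userspace illustration only. The bit layout is an assumption based on
# include/linux/xarray.h; nothing here is kernel code.

XA_CHUNK_SIZE = 64    # assumed: XA_CHUNK_SHIFT == 6 on this configuration
REFCOUNT_DISP = 0x34  # displacement in the trapping `mov 0x34(%rax),%eax`


def xa_mk_internal(value: int) -> int:
    """Encode an internal entry: payload shifted left by two, tag 0b10."""
    return (value << 2) | 2


def xa_is_internal(entry: int) -> bool:
    """Internal entries have the low two bits equal to 0b10."""
    return (entry & 3) == 2


def xa_to_internal(entry: int) -> int:
    """Recover the payload of an internal entry."""
    return entry >> 2


def xa_is_value(entry: int) -> bool:
    """Value entries have bit 0 set (what `test $0x1,%al` checks)."""
    return bool(entry & 1)


def xa_is_sibling(entry: int) -> bool:
    """Small internal entries refer to another slot in the same node."""
    return xa_is_internal(entry) and entry < xa_mk_internal(XA_CHUNK_SIZE - 1)


def fault_address(rax: int) -> int:
    """Address read by the trapping instruction for a given RAX."""
    return rax + REFCOUNT_DISP


def describe(entry: int) -> str:
    """Classify a raw slot value the way the checks above would."""
    if entry == 0:
        return "NULL"
    if xa_is_value(entry):
        return "value entry"
    if xa_is_sibling(entry):
        return f"sibling entry -> slot {xa_to_internal(entry)}"
    if xa_is_internal(entry):
        return f"internal entry {xa_to_internal(entry)}"
    return "pointer"


# (RAX, CR2) pairs from the first report and from the oops quoted above.
for rax, cr2 in ((0x62, 0x96), (0x2, 0x36)):
    print(hex(rax), describe(rax), hex(fault_address(rax)), fault_address(rax) == cr2)
```

If this reading is right, the page cache lookup handed back a sibling entry where a folio pointer was expected, and the refcount read at offset 0x34 (the symbolized trace points into page_ref.h) then faulted on a near-NULL address. That would fit the `***BAD SIBLING***` debug line, but it remains an inference from the dump.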
On 16.12.22 06:23, Thorsten Leemhuis wrote: > Hi, this is your Linux kernel regression tracker. Top-posting for once, > to make this easily accessible to everyone. /me again > Was some progress made to get this regression resolved? From here it > looks kinda stalled, that's why I'm asking -- but maybe I just missed > something. Did anything happen to get this regression resolved? Doesn't look like it, but maybe I missed some progress. Willy, Mikhail confirmed off-list to me that the problem still exists. He also tried your patch and reported back. Is there something else you need? Side note: I lost sight of this during the festive season and should have asked earlier, but better late than never. :-D Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. #regzbot poke
Created attachment 303783 [details] bug-latest I did some more testing on v6.1.12 and reproduced the issue. But I have a new bit of information: since the last time I saw this issue I've migrated most of my storage from XFS to BTRFS, and I couldn't reproduce the issue again today until I switched the source volume in the test back to XFS. So it seems the bug is either in the way XFS talks to mm/folios or is just triggered by it. Anyway, I attached a report from v6.1.2 (it seems to be happening in the same place).
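For anyone wanting to generate a comparable load without Deluge, here is a minimal sketch in Python. It is not the reporter's stress test (which is not shown in this thread); it only assumes what the report states, namely many files being mmapped and read, with the source files on the filesystem under test (XFS in the failing case). The 16 KiB chunk size is a guess based on RDX = 0x4000 in the userspace register dump, and `read_mapped_chunks` and `stress` are hypothetical names.

```python
import mmap
import os
import random
import threading

CHUNK = 0x4000  # assumed copy size, guessed from RDX in the userspace registers


def read_mapped_chunks(path, reads, seed=0):
    """Map `path` read-only and copy `reads` random chunks out of the mapping.

    Returns the number of bytes copied. Each slice touches mapped pages, so
    pages that are not resident are brought in through the page fault path.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0  # mmap refuses to map an empty file
    rng = random.Random(seed)
    copied = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            for _ in range(reads):
                offset = rng.randrange(max(1, size - CHUNK + 1))
                copied += len(mapped[offset:offset + CHUNK])
    return copied


def stress(directory, threads=8, reads_per_file=1000):
    """Walk `directory` and read every file through mmap from several threads."""
    paths = sorted(
        os.path.join(root, name)
        for root, _dirs, names in os.walk(directory)
        for name in names
    )
    totals = [0] * threads

    def worker(index):
        # Each thread takes every `threads`-th file, so no file is read twice.
        for path in paths[index::threads]:
            totals[index] += read_mapped_chunks(path, reads_per_file, seed=index)

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return sum(totals)
```

Calling `stress("/path/to/files/on/xfs")` (the path is a placeholder) returns the total number of bytes copied. Python threads share the interpreter lock, so the faults are mostly serialized; running several copies of the script at once is closer to a concurrent load. Whether this triggers the oops at all is untested.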
On 24.02.23 19:08, Mikhail Pletnev wrote: > I did some more testing on v6.1.12 and reproduced the issue. But i have > new bit of information: since the last time i've seen this issue i've > migrated most of my storage from XFS to BTRFS and i couldn't reproduce > the issue again today until i switched the source volume in the test > back to XFS. So it seems bug is either in the way that XFS talks to > mm/folios or is just triggered by it. > > anyway, i attached a report from v6.1.2 (seems to be happening in the > same place) Hi Willy! I'd like to bring this back onto your radar, as this regression is still unsolved afaics -- the patch you provided only partially helped. Or was progress on a fix made in a different thread and I just missed it? Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. #regzbot poke > On 2/24/23 13:21, Linux regression tracking (Thorsten Leemhuis) wrote: >> On 16.12.22 06:23, Thorsten Leemhuis wrote: >>> Hi, this is your Linux kernel regression tracker. Top-posting for once, >>> to make this easily accessible to everyone. >> /me again >> >>> Was some progress made to get this regression resolved? From here it >>> looks kinda stalled, that's why I'm asking -- but maybe I just missed >>> something. >> Did anything happen to get this regression resolved? Doesn't look like >> it, but maybe I missed some progress. >> >> Willy, Mikhail confirmed off-list to me that the problem still exists. >> He also tried you patch and reported back. Is there something else you >> need? >> >> Side note: I lost this out of sight during the festive season and should >> have asked this earlier, but better late than never.
:-D >> >> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) >> -- >> Everything you wanna know about Linux kernel regression tracking: >> https://linux-regtracking.leemhuis.info/about/#tldr >> If I did something stupid, please tell me, as explained on that page. >> >> #regzbot poke >> >>> On 06.12.22 03:08, Mikhail Pletnev wrote: >>>> On Mon, 5 Dec 2022 20:25:11 +0000 >>>> Matthew Wilcox <willy@infradead.org> wrote: >>>>> Thanks! I think this may be the problem ... >>>>> >>>> Hi Matthew, thanks for swift response, i've applied your last patch >>>> and ran my stress test a couple of times. It's still constistently >>>> crashing (albeit it seems in a different place): >>>> >>>> [ 1975.257126] ***BAD SIBLING*** index 912583 offset 4 >>>> [ 1975.257128] node ffff9fc817e01ff0 offset 51 parent >>>> ffff9fc5c7a31ff0 shift 0 count 64 values 48 array ffff9fc521173e80 >>>> list ffff9fc817e02008 ffff9fc817e02008 marks 0 0 0 >>>> [ 1975.257133] BUG: kernel NULL pointer dereference, address: >>>> 0000000000000036 >>>> [ 1975.257135] #PF: supervisor read access in kernel mode >>>> [ 1975.257137] #PF: error_code(0x0000) - not-present page >>>> [ 1975.257138] PGD 0 P4D 0 >>>> [ 1975.257139] Oops: 0000 [#1] PREEMPT SMP NOPTI >>>> [ 1975.257141] CPU: 5 PID: 8303 Comm: deluge-gtk Not tainted >>>> 5.17.0-rc4_ap_test-00163-g793917d997df-dirty #6 >>>> [ 1975.257144] Hardware name: Micro-Star International Co., Ltd. 
>>>> MS-7C35/MEG X570 UNIFY (MS-7C35), BIOS A.C3 03/15/2022 >>>> [ 1975.257146] RIP: 0010:__filemap_get_folio >>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 >>>> /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1899 >>>> /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1951) >>>> [ 1975.257152] Code: 10 e8 56 fd 67 00 48 89 c3 48 3d 02 04 00 00 74 >>>> e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 >>>> 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b >>>> 54 24 28 >>>> All code >>>> ======== >>>> 0: 10 e8 adc %ch,%al >>>> 2: 56 push %rsi >>>> 3: fd std >>>> 4: 67 00 48 89 add %cl,-0x77(%eax) >>>> 8: c3 ret >>>> 9: 48 3d 02 04 00 00 cmp $0x402,%rax >>>> f: 74 e2 je 0xfffffffffffffff3 >>>> 11: 48 3d 06 04 00 00 cmp $0x406,%rax >>>> 17: 74 da je 0xfffffffffffffff3 >>>> 19: 48 85 c0 test %rax,%rax >>>> 1c: 0f 84 3e 02 00 00 je 0x260 >>>> 22: a8 01 test $0x1,%al >>>> 24: 0f 85 40 02 00 00 jne 0x26a >>>> 2a:* 8b 40 34 mov 0x34(%rax),%eax >>>> <-- trapping instruction >>>> 2d: 85 c0 test %eax,%eax >>>> 2f: 74 c2 je 0xfffffffffffffff3 >>>> 31: 8d 50 01 lea 0x1(%rax),%edx >>>> 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) >>>> 39: 75 f2 jne 0x2d >>>> 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx >>>> >>>> Code starting with the faulting instruction >>>> =========================================== >>>> 0: 8b 40 34 mov 0x34(%rax),%eax >>>> 
3: 85 c0 test %eax,%eax >>>> 5: 74 c2 je 0xffffffffffffffc9 >>>> 7: 8d 50 01 lea 0x1(%rax),%edx >>>> a: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) >>>> f: 75 f2 jne 0x3 >>>> 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx >>>> [ 1975.257154] RSP: 0000:ffffc2d744c37cb0 EFLAGS: 00010246 >>>> [ 1975.257155] RAX: 0000000000000002 RBX: 0000000000000002 RCX: >>>> 0000000000000000 >>>> [ 1975.257156] RDX: 0000000000000000 RSI: ffffffffbb117459 RDI: >>>> 00000000ffffffff >>>> [ 1975.257157] RBP: 0000000000000000 R08: 00000000ffffdfff R09: >>>> 00000000ffffdfff >>>> [ 1975.257158] R10: ffffffffbb472dc0 R11: ffffffffbb472dc0 R12: >>>> 0000000000000000 >>>> [ 1975.257159] R13: ffff9fc521173e78 R14: 00000000000decc7 R15: >>>> fff000003fffffff >>>> [ 1975.257160] FS: 00007fb2137fe6c0(0000) GS:ffff9fcb7eb40000(0000) >>>> knlGS:0000000000000000 >>>> [ 1975.257161] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> [ 1975.257162] CR2: 0000000000000036 CR3: 0000000164114000 CR4: >>>> 0000000000750ee0 >>>> [ 1975.257163] PKRU: 55555554 >>>> [ 1975.257163] Call Trace: >>>> [ 1975.257164] <TASK> >>>> [ 1975.257166] ? 
page_add_file_rmap >>>> (/home/reinhardt/dev-apps/kernel/linux/./include/linux/page-flags.h:195 >>>> /home/reinhardt/dev-apps/kernel/linux/mm/internal.h:440 >>>> /home/reinhardt/dev-apps/kernel/linux/mm/rmap.c:1270) >>>> [ 1975.257169] filemap_fault >>>> (/home/reinhardt/dev-apps/kernel/linux/./include/linux/pagemap.h:531 >>>> /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:3107) >>>> [ 1975.257172] __do_fault >>>> (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:3852) >>>> [ 1975.257174] __handle_mm_fault >>>> (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4169 >>>> /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4297 >>>> /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4555 >>>> /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4690) >>>> [ 1975.257176] handle_mm_fault >>>> (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4788) >>>> [ 1975.257178] do_user_addr_fault >>>> (/home/reinhardt/dev-apps/kernel/linux/./include/linux/sched/signal.h:404 >>>> /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1399) >>>> [ 1975.257181] exc_page_fault >>>> >>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:40 >>>> /home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:75 >>>> /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1492 >>>> /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1540) >>>> [ 1975.257184] ? 
asm_exc_page_fault >>>> >>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568) >>>> [ 1975.257186] asm_exc_page_fault >>>> >>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568) >>>> [ 1975.257188] RIP: 0033:0x7fb265b88409 >>>> [ 1975.257189] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f >>>> 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 83 fa >>>> 20 72 27 <c5> fe 6f 06 48 83 fa 40 0f 87 a9 00 00 00 c5 fe 6f 4c 16 >>>> e0 c5 fe >>>> All code >>>> ======== >>>> 0: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) >>>> 7: 00 00 00 00 >>>> b: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) >>>> 12: 00 00 00 00 >>>> 16: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) >>>> 1d: 00 00 00 00 >>>> 21: 48 89 f8 mov %rdi,%rax >>>> 24: 48 83 fa 20 cmp $0x20,%rdx >>>> 28: 72 27 jb 0x51 >>>> 2a:* c5 fe 6f 06 vmovdqu (%rsi),%ymm0 <-- >>>> trapping instruction >>>> 2e: 48 83 fa 40 cmp $0x40,%rdx >>>> 32: 0f 87 a9 00 00 00 ja 0xe1 >>>> 38: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1 >>>> 3e: c5 .byte 0xc5 >>>> 3f: fe .byte 0xfe >>>> >>>> Code starting with the faulting instruction >>>> =========================================== >>>> 0: c5 fe 6f 06 vmovdqu (%rsi),%ymm0 >>>> 4: 48 83 fa 40 cmp $0x40,%rdx >>>> 8: 0f 87 a9 00 00 00 ja 0xb7 >>>> e: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1 >>>> 14: c5 .byte 0xc5 >>>> 15: fe .byte 0xfe >>>> [ 1975.257190] RSP: 002b:00007fb2137fd908 EFLAGS: 00010202 >>>> [ 1975.257191] RAX: 00007fb204012a80 RBX: 0000000000000000 RCX: >>>> 00007fb2137fda90 >>>> [ 1975.257192] RDX: 0000000000004000 RSI: 00007f9fddbb51c3 RDI: >>>> 00007fb204012a80 >>>> [ 1975.257193] RBP: 00007fb2137fd928 R08: 00000000638ea1ab R09: >>>> 0000000000000000 >>>> [ 1975.257193] R10: 0000000000000008 R11: 0000000000000246 R12: >>>> 00007fb204000bb0 >>>> [ 1975.257194] R13: 00007fb21809a5a0 R14: 00000000decc71c3 R15: >>>> 0000000000004000 >>>> [ 1975.257196] </TASK> >>>> 
[ 1975.257196] Modules linked in: overlay xt_addrtype amdgpu >>>> drm_ttm_helper ttm gpu_sched drm_kms_helper iwlmvm backlight >>>> syscopyarea mac80211 sysfillrect sysimgblt libarc4 fb_sys_fops >>>> iwlwifi cfg80211 i2c_piix4 k10temp fuse configfs efivarfs >>>> [ 1975.257207] CR2: 0000000000000036 >>>> [ 1975.257208] ---[ end trace 0000000000000000 ]--- >>>> [ 1975.257209] RIP: 0010:__filemap_get_folio >>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 >>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 >>>> /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1899 >>>> /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1951) >>>> [ 1975.257211] Code: 10 e8 56 fd 67 00 48 89 c3 48 3d 02 04 00 00 74 >>>> e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 >>>> 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b >>>> 54 24 28 >>>> All code >>>> ======== >>>> 0: 10 e8 adc %ch,%al >>>> 2: 56 push %rsi >>>> 3: fd std >>>> 4: 67 00 48 89 add %cl,-0x77(%eax) >>>> 8: c3 ret >>>> 9: 48 3d 02 04 00 00 cmp $0x402,%rax >>>> f: 74 e2 je 0xfffffffffffffff3 >>>> 11: 48 3d 06 04 00 00 cmp $0x406,%rax >>>> 17: 74 da je 0xfffffffffffffff3 >>>> 19: 48 85 c0 test %rax,%rax >>>> 1c: 0f 84 3e 02 00 00 je 0x260 >>>> 22: a8 01 test $0x1,%al >>>> 24: 0f 85 40 02 00 00 jne 0x26a >>>> 2a:* 8b 40 34 mov 0x34(%rax),%eax >>>> <-- trapping instruction >>>> 2d: 85 c0 test %eax,%eax >>>> 2f: 74 c2 je 0xfffffffffffffff3 >>>> 
31: 8d 50 01 lea 0x1(%rax),%edx >>>> 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) >>>> 39: 75 f2 jne 0x2d >>>> 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx >>>> >>>> Code starting with the faulting instruction >>>> =========================================== >>>> 0: 8b 40 34 mov 0x34(%rax),%eax >>>> 3: 85 c0 test %eax,%eax >>>> 5: 74 c2 je 0xffffffffffffffc9 >>>> 7: 8d 50 01 lea 0x1(%rax),%edx >>>> a: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) >>>> f: 75 f2 jne 0x3 >>>> 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx >>>> [ 1975.257212] RSP: 0000:ffffc2d744c37cb0 EFLAGS: 00010246 >>>> [ 1975.257213] RAX: 0000000000000002 RBX: 0000000000000002 RCX: >>>> 0000000000000000 >>>> [ 1975.257214] RDX: 0000000000000000 RSI: ffffffffbb117459 RDI: >>>> 00000000ffffffff >>>> [ 1975.257215] RBP: 0000000000000000 R08: 00000000ffffdfff R09: >>>> 00000000ffffdfff >>>> [ 1975.257215] R10: ffffffffbb472dc0 R11: ffffffffbb472dc0 R12: >>>> 0000000000000000 >>>> [ 1975.257216] R13: ffff9fc521173e78 R14: 00000000000decc7 R15: >>>> fff000003fffffff >>>> [ 1975.257217] FS: 00007fb2137fe6c0(0000) GS:ffff9fcb7eb40000(0000) >>>> knlGS:0000000000000000 >>>> [ 1975.257218] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> [ 1975.257219] CR2: 0000000000000036 CR3: 0000000164114000 CR4: >>>> 0000000000750ee0 >>>> [ 1975.257220] PKRU: 55555554 >>>> >>>> (full dmesg and my local changeset in attachments for your reference) >>>> >>> #regzbot poke >>>
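For anyone cross-checking the register dump quoted above, here is a minimal sketch of the arithmetic. The helper names are hypothetical. The tagged-entry encoding is assumed to mirror the XArray macros in include/linux/xarray.h, and the 0x34 displacement is read as the refcount field only because the RIP annotation points at page_ref.h. It is an illustration, not a diagnosis.

```python
# Sketch only: helper names are hypothetical; the encodings are assumed to
# mirror the XArray macros in include/linux/xarray.h.

REFCOUNT_DISPLACEMENT = 0x34  # from "mov 0x34(%rax),%eax" in the quoted oops
XA_CHUNK_SIZE = 64            # assumed default (XA_CHUNK_SHIFT == 6)


def xa_mk_internal(value: int) -> int:
    """Encode an internal XArray entry: low two bits are 0b10."""
    return (value << 2) | 2


def xa_is_internal(entry: int) -> bool:
    """True if the low two bits mark the entry as internal."""
    return (entry & 3) == 2


# The two constants the disassembly compares RAX against before the fault.
XA_RETRY_ENTRY = xa_mk_internal(256)
XA_ZERO_ENTRY = xa_mk_internal(257)


def decode_oops(rax: int, cr2: int) -> dict:
    """Relate the faulting address to RAX and classify RAX as an entry."""
    is_sibling = xa_is_internal(rax) and rax < xa_mk_internal(XA_CHUNK_SIZE - 1)
    return {
        "cr2_is_rax_plus_0x34": rax + REFCOUNT_DISPLACEMENT == cr2,
        "looks_like_internal_entry": xa_is_internal(rax),
        "sibling_slot": (rax >> 2) if is_sibling else None,
    }


if __name__ == "__main__":
    print(decode_oops(0x2, 0x36))                    # values from the quoted oops
    print(hex(XA_RETRY_ENTRY), hex(XA_ZERO_ENTRY))   # cmp $0x402 / cmp $0x406
    print(0xDECC7)                                   # R14, compare with "index 912583"
```

Under these assumptions, RAX holds a small tagged value rather than a pointer, CR2 is that value plus 0x34, and R14 equals the index printed in the BAD SIBLING line. The same arithmetic applied to the original report (RAX 0x62, CR2 0x96) gives the same pattern.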
On 14.03.23 11:17, Linux regression tracking (Thorsten Leemhuis) wrote: > On 24.02.23 19:08, Mikhail Pletnev wrote: >> I did some more testing on v6.1.12 and reproduced the issue. But i have >> new bit of information: since the last time i've seen this issue i've >> migrated most of my storage from XFS to BTRFS and i couldn't reproduce >> the issue again today until i switched the source volume in the test >> back to XFS. So it seems bug is either in the way that XFS talks to >> mm/folios or is just triggered by it. >> >> anyway, i attached a report from v6.1.2 (seems to be happening in the >> same place) > > Hi Willy! I'd like to bring this back onto your radar, as this > regression is still unsolved afaics -- the patch you provided only > partially helped. Or was progress to fix this made in a different thread > and I just missed it? Willy, I know, I'm kinda annoying, but it's part of my job, hence please allow me to ask: Do you still have this regression on your todo list somewhere? The problem is now known and bisected since November. I understand that this is not something that can be fixed quickly, but at the same time it's quite a while already. Or has progress to fix this been made and I just missed it? Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. >> On 2/24/23 13:21, Linux regression tracking (Thorsten Leemhuis) wrote: >>> On 16.12.22 06:23, Thorsten Leemhuis wrote: >>>> Hi, this is your Linux kernel regression tracker. Top-posting for once, >>>> to make this easily accessible to everyone. >>> /me again >>> >>>> Was some progress made to get this regression resolved? From here it >>>> looks kinda stalled, that's why I'm asking -- but maybe I just missed >>>> something. >>> Did anything happen to get this regression resolved? 
On 17.04.23 13:12, Linux regression tracking (Thorsten Leemhuis) wrote: > On 14.03.23 11:17, Linux regression tracking (Thorsten Leemhuis) wrote: >> On 24.02.23 19:08, Mikhail Pletnev wrote: >>> I did some more testing on v6.1.12 and reproduced the issue. But i have >>> new bit of information: since the last time i've seen this issue i've >>> migrated most of my storage from XFS to BTRFS and i couldn't reproduce >>> the issue again today until i switched the source volume in the test >>> back to XFS. So it seems bug is either in the way that XFS talks to >>> mm/folios or is just triggered by it. >>> >>> anyway, i attached a report from v6.1.2 (seems to be happening in the >>> same place) >> >> Hi Willy! I'd like to bring this back onto your radar, as this >> regression is still unsolved afaics -- the patch you provided only >> partially helped. Or was progress to fix this made in a different thread >> and I just missed it? > > Willy, I know, I'm kinda annoying, but it's part of my job, hence please > allow me to ask: > > Do you still have this regression on your todo list somewhere? The > problem is now known and bisected since November. I understand that this > is not something that can be fixed quickly, but at the same time it's > quite a while already. > > Or has progress to fix this been made and I just missed it? Hmm, no reply. Does nobody care anymore or was this resolved and I just missed it? Mikhail Pletnev: is the problem still happening with latest mainline? Or did you stop caring after you migrated your storage to btrfs? Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. 
>>> On 2/24/23 13:21, Linux regression tracking (Thorsten Leemhuis) wrote: >>>> On 16.12.22 06:23, Thorsten Leemhuis wrote: >>>>> Hi, this is your Linux kernel regression tracker. 
Top-posting for once, >>>>> to make this easily accessible to everyone. >>>> /me again >>>> >>>>> Was some progress made to get this regression resolved? From here it >>>>> looks kinda stalled, that's why I'm asking -- but maybe I just missed >>>>> something. >>>> Did anything happen to get this regression resolved? Doesn't look like >>>> it, but maybe I missed some progress. >>>> >>>> Willy, Mikhail confirmed off-list to me that the problem still exists. >>>> He also tried you patch and reported back. Is there something else you >>>> need? >>>> >>>> Side note: I lost this out of sight during the festive season and should >>>> have asked this earlier, but better late than never. :-D >>>> >>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) >>>> -- >>>> Everything you wanna know about Linux kernel regression tracking: >>>> https://linux-regtracking.leemhuis.info/about/#tldr >>>> If I did something stupid, please tell me, as explained on that page. >>>> >>>> #regzbot poke >>>> >>>>> On 06.12.22 03:08, Mikhail Pletnev wrote: >>>>>> On Mon, 5 Dec 2022 20:25:11 +0000 >>>>>> Matthew Wilcox <willy@infradead.org> wrote: >>>>>>> Thanks! I think this may be the problem ... >>>>>>> >>>>>> Hi Matthew, thanks for swift response, i've applied your last patch >>>>>> and ran my stress test a couple of times. 
It's still constistently >>>>>> crashing (albeit it seems in a different place): >>>>>> >>>>>> [ 1975.257126] ***BAD SIBLING*** index 912583 offset 4 >>>>>> [ 1975.257128] node ffff9fc817e01ff0 offset 51 parent >>>>>> ffff9fc5c7a31ff0 shift 0 count 64 values 48 array ffff9fc521173e80 >>>>>> list ffff9fc817e02008 ffff9fc817e02008 marks 0 0 0 >>>>>> [ 1975.257133] BUG: kernel NULL pointer dereference, address: >>>>>> 0000000000000036 >>>>>> [ 1975.257135] #PF: supervisor read access in kernel mode >>>>>> [ 1975.257137] #PF: error_code(0x0000) - not-present page >>>>>> [ 1975.257138] PGD 0 P4D 0 >>>>>> [ 1975.257139] Oops: 0000 [#1] PREEMPT SMP NOPTI >>>>>> [ 1975.257141] CPU: 5 PID: 8303 Comm: deluge-gtk Not tainted >>>>>> 5.17.0-rc4_ap_test-00163-g793917d997df-dirty #6 >>>>>> [ 1975.257144] Hardware name: Micro-Star International Co., Ltd. >>>>>> MS-7C35/MEG X570 UNIFY (MS-7C35), BIOS A.C3 03/15/2022 >>>>>> [ 1975.257146] RIP: 0010:__filemap_get_folio >>>>>> >>>>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 >>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1899 >>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1951) >>>>>> [ 1975.257152] Code: 10 e8 56 fd 67 00 48 89 c3 48 3d 02 04 00 00 74 >>>>>> e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 >>>>>> 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b 
>>>>>> 54 24 28 >>>>>> All code >>>>>> ======== >>>>>> 0: 10 e8 adc %ch,%al >>>>>> 2: 56 push %rsi >>>>>> 3: fd std >>>>>> 4: 67 00 48 89 add %cl,-0x77(%eax) >>>>>> 8: c3 ret >>>>>> 9: 48 3d 02 04 00 00 cmp $0x402,%rax >>>>>> f: 74 e2 je 0xfffffffffffffff3 >>>>>> 11: 48 3d 06 04 00 00 cmp $0x406,%rax >>>>>> 17: 74 da je 0xfffffffffffffff3 >>>>>> 19: 48 85 c0 test %rax,%rax >>>>>> 1c: 0f 84 3e 02 00 00 je 0x260 >>>>>> 22: a8 01 test $0x1,%al >>>>>> 24: 0f 85 40 02 00 00 jne 0x26a >>>>>> 2a:* 8b 40 34 mov 0x34(%rax),%eax >>>>>> <-- trapping instruction >>>>>> 2d: 85 c0 test %eax,%eax >>>>>> 2f: 74 c2 je 0xfffffffffffffff3 >>>>>> 31: 8d 50 01 lea 0x1(%rax),%edx >>>>>> 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) >>>>>> 39: 75 f2 jne 0x2d >>>>>> 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx >>>>>> >>>>>> Code starting with the faulting instruction >>>>>> =========================================== >>>>>> 0: 8b 40 34 mov 0x34(%rax),%eax >>>>>> 3: 85 c0 test %eax,%eax >>>>>> 5: 74 c2 je 0xffffffffffffffc9 >>>>>> 7: 8d 50 01 lea 0x1(%rax),%edx >>>>>> a: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) >>>>>> f: 75 f2 jne 0x3 >>>>>> 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx >>>>>> [ 1975.257154] RSP: 0000:ffffc2d744c37cb0 EFLAGS: 00010246 >>>>>> [ 1975.257155] RAX: 0000000000000002 RBX: 0000000000000002 RCX: >>>>>> 0000000000000000 >>>>>> [ 1975.257156] RDX: 0000000000000000 RSI: ffffffffbb117459 RDI: >>>>>> 00000000ffffffff >>>>>> [ 1975.257157] RBP: 0000000000000000 R08: 00000000ffffdfff R09: >>>>>> 00000000ffffdfff >>>>>> [ 1975.257158] R10: ffffffffbb472dc0 R11: ffffffffbb472dc0 R12: >>>>>> 0000000000000000 >>>>>> [ 1975.257159] R13: ffff9fc521173e78 R14: 00000000000decc7 R15: >>>>>> fff000003fffffff >>>>>> [ 1975.257160] FS: 00007fb2137fe6c0(0000) GS:ffff9fcb7eb40000(0000) >>>>>> knlGS:0000000000000000 >>>>>> [ 1975.257161] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>> [ 1975.257162] CR2: 0000000000000036 CR3: 0000000164114000 CR4: >>>>>> 0000000000750ee0 >>>>>> [ 
1975.257163] PKRU: 55555554 >>>>>> [ 1975.257163] Call Trace: >>>>>> [ 1975.257164] <TASK> >>>>>> [ 1975.257166] ? page_add_file_rmap >>>>>> (/home/reinhardt/dev-apps/kernel/linux/./include/linux/page-flags.h:195 >>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/internal.h:440 >>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/rmap.c:1270) >>>>>> [ 1975.257169] filemap_fault >>>>>> (/home/reinhardt/dev-apps/kernel/linux/./include/linux/pagemap.h:531 >>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:3107) >>>>>> [ 1975.257172] __do_fault >>>>>> (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:3852) >>>>>> [ 1975.257174] __handle_mm_fault >>>>>> (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4169 >>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4297 >>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4555 >>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4690) >>>>>> [ 1975.257176] handle_mm_fault >>>>>> (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4788) >>>>>> [ 1975.257178] do_user_addr_fault >>>>>> >>>>>> (/home/reinhardt/dev-apps/kernel/linux/./include/linux/sched/signal.h:404 >>>>>> /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1399) >>>>>> [ 1975.257181] exc_page_fault >>>>>> >>>>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:40 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:75 >>>>>> /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1492 >>>>>> /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1540) >>>>>> [ 1975.257184] ? 
asm_exc_page_fault >>>>>> >>>>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568) >>>>>> [ 1975.257186] asm_exc_page_fault >>>>>> >>>>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568) >>>>>> [ 1975.257188] RIP: 0033:0x7fb265b88409 >>>>>> [ 1975.257189] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f >>>>>> 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 83 fa >>>>>> 20 72 27 <c5> fe 6f 06 48 83 fa 40 0f 87 a9 00 00 00 c5 fe 6f 4c 16 >>>>>> e0 c5 fe >>>>>> All code >>>>>> ======== >>>>>> 0: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) >>>>>> 7: 00 00 00 00 >>>>>> b: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) >>>>>> 12: 00 00 00 00 >>>>>> 16: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) >>>>>> 1d: 00 00 00 00 >>>>>> 21: 48 89 f8 mov %rdi,%rax >>>>>> 24: 48 83 fa 20 cmp $0x20,%rdx >>>>>> 28: 72 27 jb 0x51 >>>>>> 2a:* c5 fe 6f 06 vmovdqu (%rsi),%ymm0 <-- >>>>>> trapping instruction >>>>>> 2e: 48 83 fa 40 cmp $0x40,%rdx >>>>>> 32: 0f 87 a9 00 00 00 ja 0xe1 >>>>>> 38: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1 >>>>>> 3e: c5 .byte 0xc5 >>>>>> 3f: fe .byte 0xfe >>>>>> >>>>>> Code starting with the faulting instruction >>>>>> =========================================== >>>>>> 0: c5 fe 6f 06 vmovdqu (%rsi),%ymm0 >>>>>> 4: 48 83 fa 40 cmp $0x40,%rdx >>>>>> 8: 0f 87 a9 00 00 00 ja 0xb7 >>>>>> e: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1 >>>>>> 14: c5 .byte 0xc5 >>>>>> 15: fe .byte 0xfe >>>>>> [ 1975.257190] RSP: 002b:00007fb2137fd908 EFLAGS: 00010202 >>>>>> [ 1975.257191] RAX: 00007fb204012a80 RBX: 0000000000000000 RCX: >>>>>> 00007fb2137fda90 >>>>>> [ 1975.257192] RDX: 0000000000004000 RSI: 00007f9fddbb51c3 RDI: >>>>>> 00007fb204012a80 >>>>>> [ 1975.257193] RBP: 00007fb2137fd928 R08: 00000000638ea1ab R09: >>>>>> 0000000000000000 >>>>>> [ 1975.257193] R10: 0000000000000008 R11: 0000000000000246 R12: >>>>>> 00007fb204000bb0 >>>>>> [ 1975.257194] R13: 
00007fb21809a5a0 R14: 00000000decc71c3 R15: >>>>>> 0000000000004000 >>>>>> [ 1975.257196] </TASK> >>>>>> [ 1975.257196] Modules linked in: overlay xt_addrtype amdgpu >>>>>> drm_ttm_helper ttm gpu_sched drm_kms_helper iwlmvm backlight >>>>>> syscopyarea mac80211 sysfillrect sysimgblt libarc4 fb_sys_fops >>>>>> iwlwifi cfg80211 i2c_piix4 k10temp fuse configfs efivarfs >>>>>> [ 1975.257207] CR2: 0000000000000036 >>>>>> [ 1975.257208] ---[ end trace 0000000000000000 ]--- >>>>>> [ 1975.257209] RIP: 0010:__filemap_get_folio >>>>>> >>>>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 >>>>>> /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 >>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1899 >>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1951) >>>>>> [ 1975.257211] Code: 10 e8 56 fd 67 00 48 89 c3 48 3d 02 04 00 00 74 >>>>>> e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40 >>>>>> 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b >>>>>> 54 24 28 >>>>>> All code >>>>>> ======== >>>>>> 0: 10 e8 adc %ch,%al >>>>>> 2: 56 push %rsi >>>>>> 3: fd std >>>>>> 4: 67 00 48 89 add %cl,-0x77(%eax) >>>>>> 8: c3 ret >>>>>> 9: 48 3d 02 04 00 00 cmp $0x402,%rax >>>>>> f: 74 e2 je 0xfffffffffffffff3 >>>>>> 11: 48 3d 06 04 00 00 cmp $0x406,%rax >>>>>> 17: 74 da je 0xfffffffffffffff3 >>>>>> 19: 48 85 c0 test %rax,%rax >>>>>> 1c: 0f 84 3e 02 00 00 je 0x260 >>>>>> 22: a8 01 test 
$0x1,%al >>>>>> 24: 0f 85 40 02 00 00 jne 0x26a >>>>>> 2a:* 8b 40 34 mov 0x34(%rax),%eax >>>>>> <-- trapping instruction >>>>>> 2d: 85 c0 test %eax,%eax >>>>>> 2f: 74 c2 je 0xfffffffffffffff3 >>>>>> 31: 8d 50 01 lea 0x1(%rax),%edx >>>>>> 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) >>>>>> 39: 75 f2 jne 0x2d >>>>>> 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx >>>>>> >>>>>> Code starting with the faulting instruction >>>>>> =========================================== >>>>>> 0: 8b 40 34 mov 0x34(%rax),%eax >>>>>> 3: 85 c0 test %eax,%eax >>>>>> 5: 74 c2 je 0xffffffffffffffc9 >>>>>> 7: 8d 50 01 lea 0x1(%rax),%edx >>>>>> a: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx) >>>>>> f: 75 f2 jne 0x3 >>>>>> 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx >>>>>> [ 1975.257212] RSP: 0000:ffffc2d744c37cb0 EFLAGS: 00010246 >>>>>> [ 1975.257213] RAX: 0000000000000002 RBX: 0000000000000002 RCX: >>>>>> 0000000000000000 >>>>>> [ 1975.257214] RDX: 0000000000000000 RSI: ffffffffbb117459 RDI: >>>>>> 00000000ffffffff >>>>>> [ 1975.257215] RBP: 0000000000000000 R08: 00000000ffffdfff R09: >>>>>> 00000000ffffdfff >>>>>> [ 1975.257215] R10: ffffffffbb472dc0 R11: ffffffffbb472dc0 R12: >>>>>> 0000000000000000 >>>>>> [ 1975.257216] R13: ffff9fc521173e78 R14: 00000000000decc7 R15: >>>>>> fff000003fffffff >>>>>> [ 1975.257217] FS: 00007fb2137fe6c0(0000) GS:ffff9fcb7eb40000(0000) >>>>>> knlGS:0000000000000000 >>>>>> [ 1975.257218] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>> [ 1975.257219] CR2: 0000000000000036 CR3: 0000000164114000 CR4: >>>>>> 0000000000750ee0 >>>>>> [ 1975.257220] PKRU: 55555554 >>>>>> >>>>>> (full dmesg and my local changeset in attachments for your reference) >>>>>> >>>>> #regzbot poke >>>>> >> >>
Created attachment 304378 [details] Retry mapping search if result is xarray internal entry (In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #21) > Does nobody care anymore or was this resolved and I just > missed it? I am using this workaround patch, which replaces the fault with a warning.
[Mon Jun 5 21:26:35 2023] ------------[ cut here ]------------ WARNING: CPU: 3 PID: 40057 at mm/filemap.c:1861 __filemap_get_folio+0x29b/0x390 Modules linked in: usbhid reiserfs wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha snd_seq_dummy snd_hrtimer snd_seq snd_seq_device dm_crypt encrypted_keys algif_skcipher uvcvideo uvc videobuf2_vmalloc videobuf2_memops uas videobuf2_v4l2 videobuf2_common usb_storage videodev mc btusb btintel sch_fq_codel snd_hda_codec_hdmi snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic nvidia_drm(PO) nvidia_modeset(PO) iwlmvm mac80211 libarc4 intel_tcc_cooling x86_pkg_temp_thermal intel_xhci_usb_role_switch kvm_intel iwlwifi mei_hdcp mei_pxp nvidia(PO) kvm snd_hda_intel irqbypass crc32_pclmul snd_intel_dspcfg xhci_pci crc32c_intel snd_hda_codec polyval_clmulni polyval_generic ghash_clmulni_intel snd_hwdep sha512_ssse3 xhci_hcd aesni_intel snd_hda_core crypto_simd cfg80211 cryptd snd_pcm e1000e snd_timer mei_me usbcore ucsi_acpi typec_ucsi mei usb_common typec intel_pch_thermal roles tpm_crb thinkpad_acpi ledtrig_audio platform_profile snd soundcore tpm_tis i2c_hid_acpi tpm_tis_core i2c_hid tpm intel_wmi_thunderbolt wmi_bmof think_lmi firmware_attributes_class i915 i2c_algo_bit cec drm_buddy video wmi zram drm_display_helper ttm zsmalloc drm_kms_helper syscopyarea sysfillrect sysimgblt msr fuse dm_mod configfs efivarfs dmi_sysfs CPU: 3 PID: 40057 Comm: qbittorrent Tainted: P U O 6.3.5-gentoo #1 Hardware name: LENOVO 20H9CTO1WW/20H9CTO1WW, BIOS N1VET63W (1.53 ) 12/20/2022 RIP: 0010:__filemap_get_folio+0x29b/0x390 Code: ff e8 c9 81 00 00 e9 33 ff ff ff 49 8b 7d 00 e8 6b 77 02 00 48 89 c2 44 89 e0 80 cc 10 f6 42 44 01 44 0f 45 e0 e9 a3 fe ff ff <0f> 0b e9 d3 fd ff ff f0 41 80 27 fe 0f 89 02 ff ff ff 31 f6 4c 89 RSP: 0000:ffffa2d5c214fc90 EFLAGS: 00010246 RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff9bdbf15ca6b0 R14: 000000000002da46 R15: fff000003fffffff FS: 00007f2602ffd6c0(0000) GS:ffff9bdf10780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f19f72466a7 CR3: 0000000223d5e004 CR4: 00000000003706e0 Call Trace: <TASK> ? __filemap_get_folio+0x29b/0x390 ? __warn+0x7d/0x130 ? __filemap_get_folio+0x29b/0x390 ? report_bug+0x19e/0x1d0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x13/0x70 ? asm_exc_invalid_op+0x16/0x20 ? __filemap_get_folio+0x29b/0x390 filemap_fault+0x65/0x910 ? preempt_count_add+0x4f/0xb0 ? up_read+0x37/0x80 __do_fault+0x2e/0xb0 do_fault+0x1ee/0x5a0 __handle_mm_fault+0x5ac/0xc70 handle_mm_fault+0xed/0x2d0 exc_page_fault+0x1bb/0x690 ? do_syscall_64+0x67/0x90 asm_exc_page_fault+0x22/0x30 RIP: 0033:0x7f264a377709 Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 83 fa 20 72 27 <c5> fe 6f 06 48 83 fa 40 0f 87 a9 00 00 00 c5 fe 6f 4c 16 e0 c5 fe RSP: 002b:00007f2602ffb428 EFLAGS: 00010202 RAX: 00007f25fc01a740 RBX: 0000000000000000 RCX: 00007f2602ffb5b0 RDX: 0000000000004000 RSI: 00007f19f72466a7 RDI: 00007f25fc01a740 RBP: 00007f2602ffb448 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000008 R11: 0000000000000246 R12: 00007f25fc000c10 R13: 00007f25fc0158d0 R14: 000000002da466a7 R15: 0000000000004000 </TASK> [Mon Jun 5 21:26:35 2023] ---[ end trace 0000000000000000 ]--- Sorry for the tainted kernel trace, but the result is the same without the nvidia driver.
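For readers following along, the idea in the attachment title (retry the lookup when it returns an xarray internal entry, and warn instead of faulting) can be modelled in userspace. This is only a sketch and not the attached patch: `lookup_with_retry`, `load`, `fake_load` and `max_tries` are hypothetical names, the index and folio address are made up, and the only kernel behaviour assumed is that xarray internal entries carry binary `10` in their two low bits (the `xa_is_internal()` test).

```python
import warnings


def xa_is_internal(entry: int) -> bool:
    """Userspace mirror of xa_is_internal(): the two low bits are 0b10."""
    return (entry & 3) == 2


def lookup_with_retry(load, index, max_tries=3):
    """Hypothetical helper: call load(index) until it yields a non-internal entry.

    Returns the entry, or None if every attempt produced an internal entry.
    """
    for attempt in range(1, max_tries + 1):
        entry = load(index)
        if not xa_is_internal(entry):
            return entry
        # Warn instead of dereferencing the bogus "pointer".
        warnings.warn(
            f"internal entry {entry:#x} at index {index:#x} (attempt {attempt})"
        )
    return None


# Simulated mapping: the first lookup pretends to see 0x2 (the RAX value in
# the warning trace), the second one returns a made-up, pointer-like value.
_results = iter([0x2, 0xFFFF9BDBF15C0000])


def fake_load(index):
    return next(_results)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    found = lookup_with_retry(fake_load, 0x1000)  # arbitrary index
```

In this model the first attempt only produces a warning and the second attempt returns the usable entry, which is the behaviour the workaround aims for: a noisy log line rather than a NULL pointer dereference.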
#regzbot link: https://github.com/arvidn/libtorrent/issues/6952 #regzbot link: https://bugs.gentoo.org/909511
#regzbot monitor https://lore.kernel.org/all/20230310070023.GA13563@lst.de/
(In reply to Sam James from comment #25) > #regzbot monitor https://lore.kernel.org/all/20230310070023.GA13563@lst.de/ Sorry, one more to fix syntax: #regzbot monitor: https://lore.kernel.org/all/20230310070023.GA13563@lst.de/
(In reply to recent replies from Sam James) Thanks for the links, but regzbot commands don't work in Bugzilla for now; hopefully that will change soon, if indirectly. FWIW, does this mean this problem is fixed by https://lore.kernel.org/all/20230310070023.GA13563@lst.de/ (and thus in 6.4)?
Ah, apologies. Is it worth adding that to https://www.kernel.org/doc/html/next/process/handling-regressions.html for now? As for the fix: https://github.com/arvidn/libtorrent/issues/6952#issuecomment-1506783186 and https://github.com/arvidn/libtorrent/issues/6952#issuecomment-1506873609 suggest it's a partial fix.
It seems this regression (and the one reported in Bug 217441, which looks somewhat similar) is not really being handled appropriately by the developers, hence I plan to escalate this slightly. But before I do, could anyone still affected by this please check whether 6.5-rc1 still shows the problem?
No, it is not fixed in 6.5-rc3. The failure point has changed, but the problem is still there. Similarly to Bug 217441, the error point is now shown as filemap_get_entry: [54413.772267] BUG: kernel NULL pointer dereference, address: 00000000000000f6 [54413.772275] #PF: supervisor read access in kernel mode [54413.772278] #PF: error_code(0x0000) - not-present page [54413.772281] PGD 0 P4D 0 [54413.772286] Oops: 0000 [#1] PREEMPT SMP PTI [54413.772290] CPU: 1 PID: 49627 Comm: qbittorrent Not tainted 6.5.0-060500rc3-generic #202307232333 [54413.772294] Hardware name: Dell (...) [54413.772297] RIP: 0010:filemap_get_entry+0x87/0x160 [54413.772306] Code: a8 48 c7 45 c0 03 00 00 00 e8 85 6f cf 00 48 89 c3 48 3d 02 04 00 00 74 e4 48 3d 06 04 00 00 74 dc 48 85 c0 74 5d a8 01 75 59 <8b> 40 34 85 c0 74 cc 8d 50 01 f0 0f b1 53 34 75 f2 4c 8b 6d c0 4d [54413.772310] RSP: 0000:ffffba08c7313b88 EFLAGS: 00010246 [54413.772314] RAX: 00000000000000c2 RBX: 00000000000000c2 RCX: 0000000000000000 [54413.772317] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [54413.772319] RBP: ffffba08c7313be8 R08: 0000000000000000 R09: 0000000000000000 [54413.772322] R10: 0000000000000000 R11: 0000000000000000 R12: ffff96735fa08ab0 [54413.772325] R13: 0000000000020f77 R14: 0000000000000000 R15: 0000000000000000 [54413.772328] FS: 00007f18e4bfb6c0(0000) GS:ffff967776080000(0000) knlGS:0000000000000000 [54413.772331] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [54413.772334] CR2: 00000000000000f6 CR3: 000000021e734002 CR4: 00000000003706e0 [54413.772337] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [54413.772340] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [54413.772342] Call Trace: [54413.772345] <TASK> [54413.772349] ? show_regs+0x6d/0x80 [54413.772355] ? __die+0x24/0x80 [54413.772360] ? page_fault_oops+0x99/0x1b0 [54413.772367] ? do_user_addr_fault+0x316/0x6b0 [54413.772371] ? exc_page_fault+0x83/0x1b0 [54413.772378] ? 
asm_exc_page_fault+0x27/0x30 [54413.772386] ? filemap_get_entry+0x87/0x160 [54413.772390] ? filemap_get_entry+0x6b/0x160 [54413.772395] __filemap_get_folio+0x2d/0x230 [54413.772401] filemap_fault+0x68/0x790 [54413.772406] ? next_uptodate_page+0x169/0x270 [54413.772410] ? xas_find+0x16d/0x1e0 [54413.772418] __xfs_filemap_fault+0x61/0x2b0 [xfs] [54413.772657] xfs_filemap_fault+0x3a/0x50 [xfs] [54413.772795] __do_fault+0x36/0x150 [54413.772799] do_read_fault+0x11d/0x170 [54413.772802] do_fault+0xec/0x170 [54413.772804] handle_pte_fault+0x74/0x170 [54413.772807] __handle_mm_fault+0x658/0x720 [54413.772813] handle_mm_fault+0x164/0x360 [54413.772816] do_user_addr_fault+0x212/0x6b0 [54413.772819] ? _copy_to_user+0x25/0x70 [54413.772823] exc_page_fault+0x83/0x1b0 [54413.772827] asm_exc_page_fault+0x27/0x30 [54413.772831] RIP: 0033:0x7f1927171589 [54413.772861] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 83 fa 20 72 27 <c5> fe 6f 06 48 83 fa 40 0f 87 a9 00 00 00 c5 fe 6f 4c 16 e0 c5 fe [54413.772864] RSP: 002b:00007f18e4bf94f8 EFLAGS: 00010202 [54413.772866] RAX: 00007f18b000cb30 RBX: 0000000000000000 RCX: 00007f18e4bf9650 [54413.772868] RDX: 0000000000004000 RSI: 00007f00ae17727a RDI: 00007f18b000cb30 [54413.772870] RBP: 00007f18e4bf9518 R08: 0000000000000000 R09: 0000000000000000 [54413.772872] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f18b0000fa0 [54413.772873] R13: 0000000020f7727a R14: 00007f18f4019f58 R15: 0000000000004000 [54413.772877] </TASK> [54413.772879] Modules linked in: cpuid nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc ccm rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet qcserial usb_wwan mii usbserial snd_hda_codec_hdmi snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic qrtr overlay cmac algif_hash algif_skcipher af_alg bnep ip6t_REJECT nf_reject_ipv6 
xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink snd_sof_pci_intel_skl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence binfmt_misc snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils soundwire_generic_allocation soundwire_bus snd_soc_avs snd_soc_hda_codec snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi uvcvideo snd_soc_core videobuf2_vmalloc btusb uvc btrtl snd_compress videobuf2_memops [54413.772946] intel_tcc_cooling iwlmvm videobuf2_v4l2 ac97_bus btbcm x86_pkg_temp_thermal btintel intel_powerclamp btmtk videodev snd_pcm_dmaengine mac80211 bluetooth videobuf2_common coretemp ecdh_generic snd_hda_intel hid_multitouch mc ecc libarc4 mei_hdcp mei_pxp intel_rapl_msr kvm_intel i915 dell_laptop iwlwifi snd_intel_dspcfg snd_intel_sdw_acpi dell_smm_hwmon dell_wmi snd_hda_codec drm_buddy kvm ttm snd_hda_core drm_display_helper irqbypass dell_smbios snd_hwdep dcdbas rapl snd_pcm cec snd_timer processor_thermal_device_pci_legacy processor_thermal_device ledtrig_audio intel_cstate processor_thermal_rfim dell_wmi_descriptor rc_core ee1004 sparse_keymap wmi_bmof pcspkr snd cfg80211 processor_thermal_mbox mei_me processor_thermal_rapl soundcore drm_kms_helper intel_rapl_common mei intel_soc_dts_iosf i2c_algo_bit intel_pch_thermal intel_xhci_usb_role_switch int3403_thermal int340x_thermal_zone int3400_thermal acpi_pad dell_rbtn acpi_thermal_rel input_leds joydev serio_raw mac_hid msr parport_pc ppdev lp parport drm [54413.773035] efi_pstore dmi_sysfs ip_tables x_tables autofs4 xfs libcrc32c ses enclosure scsi_transport_sas dm_crypt hid_generic usbhid uas hid usb_storage crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel rtsx_pci_sdmmc crypto_simd e1000e cryptd psmouse i2c_i801 ahci rtsx_pci libahci i2c_smbus xhci_pci 
xhci_pci_renesas video wmi [last unloaded: evbug] [54413.773066] CR2: 00000000000000f6 [54413.773068] ---[ end trace 0000000000000000 ]--- [54413.773070] RIP: 0010:filemap_get_entry+0x87/0x160 [54413.773074] Code: a8 48 c7 45 c0 03 00 00 00 e8 85 6f cf 00 48 89 c3 48 3d 02 04 00 00 74 e4 48 3d 06 04 00 00 74 dc 48 85 c0 74 5d a8 01 75 59 <8b> 40 34 85 c0 74 cc 8d 50 01 f0 0f b1 53 34 75 f2 4c 8b 6d c0 4d [54413.773076] RSP: 0000:ffffba08c7313b88 EFLAGS: 00010246 [54413.773080] RAX: 00000000000000c2 RBX: 00000000000000c2 RCX: 0000000000000000 [54413.773082] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [54413.773084] RBP: ffffba08c7313be8 R08: 0000000000000000 R09: 0000000000000000 [54413.773086] R10: 0000000000000000 R11: 0000000000000000 R12: ffff96735fa08ab0 [54413.773089] R13: 0000000000020f77 R14: 0000000000000000 R15: 0000000000000000 [54413.773091] FS: 00007f18e4bfb6c0(0000) GS:ffff967776080000(0000) knlGS:0000000000000000 [54413.773094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [54413.773097] CR2: 00000000000000f6 CR3: 000000021e734002 CR4: 00000000003706e0 [54413.773100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [54413.773102] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [54413.773104] note: qbittorrent[49627] exited with irqs disabled
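The register values in these oopses are consistent with the page-cache lookup having returned an xarray internal entry rather than a folio pointer. Below is a small sketch of the arithmetic. It assumes the xarray tagging rules (two low bits of binary `10` mark an internal entry, and the payload is `entry >> 2`) and the default 64-slot chunk size, and it takes the `0x34` displacement from the trapping instruction `mov 0x34(%rax),%eax` (bytes `8b 40 34`). The helper names are illustrative only.

```python
XA_CHUNK_SIZE = 64    # assumption: default XA_CHUNK_SHIFT of 6
FIELD_OFFSET = 0x34   # displacement in the trapping "mov 0x34(%rax),%eax"


def xa_is_internal(entry: int) -> bool:
    """Userspace mirror of xa_is_internal(): the two low bits are 0b10."""
    return (entry & 3) == 2


def decode(rax: int) -> dict:
    """Interpret a would-be folio pointer taken from RAX in an oops."""
    internal = xa_is_internal(rax)
    payload = rax >> 2 if internal else None
    return {
        "internal": internal,
        "payload": payload,
        # Internal entries with a small payload fall in the sibling range.
        "sibling_range": internal and payload < XA_CHUNK_SIZE - 1,
        # Address the CPU touches when it reads the field at +0x34.
        "fault_address": rax + FIELD_OFFSET,
    }


# (RAX, CR2) pairs copied from the two oopses quoted in this thread.
reports = [(0x02, 0x36), (0xC2, 0xF6)]

for rax, cr2 in reports:
    info = decode(rax)
    print(
        f"RAX={rax:#x}: internal={info['internal']} payload={info['payload']} "
        f"fault={info['fault_address']:#x} matches_cr2={info['fault_address'] == cr2}"
    )
```

Under those assumptions both RAX values decode as internal entries in the sibling range, and RAX + 0x34 reproduces the reported CR2 in each case, which fits the `***BAD SIBLING***` debug message earlier in the thread.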
Created attachment 304704 [details] proposed fix + testcase I have managed to create a test-case that reproduces the symptoms, and also a fix for that test-case. Maybe you're producing the symptoms in some excitingly different way, so this may not solve your problems.
(In reply to Matthew Wilcox from comment #31) > Created attachment 304704 [details] > proposed fix + testcase Has anybody checked whether this helps? That would be really helpful. BTW, could somebody involved in https://github.com/arvidn/libtorrent/issues/6952 and https://bugs.gentoo.org/909511 maybe ask people there to test this?
I have applied the patch on top of 6.5-rc5 and left it running overnight: no issues yet. I'll keep it running and will update you tomorrow (it has never lasted more than a day without an "Oops"). Thanks!
> I'll keep it running and will update you tomorrow (it never lasted more than > a day without an "Oops"). No issues - I'd say the patch is good to go.
This landed in 6.6 as cbc02854331edc6dc22d8b77b6e22e38ebc7dd51 and then 3095dd99dd759a5cab8bb81674bc133b1365fb6b.