Bug 218818
Summary: | BUG: unable to handle page fault for address: 00000000000a0955 | | |
---|---|---|---|
Product: | Memory Management | Reporter: | Jean-Louis Dupond (jean-louis) |
Component: | Page Allocator | Assignee: | Andrew Morton (akpm) |
Status: | NEW --- | | |
Severity: | normal | CC: | alekseymorar, jean-louis, nvaert1986, regressions, vojema9981 |
Priority: | P3 | | |
Hardware: | All | | |
OS: | Linux | | |
Kernel Version: | | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Description
Jean-Louis Dupond
2024-05-08 06:58:16 UTC
Yesterday it was:
mei 07 16:38:54 xxx kernel: BUG: unable to handle page fault for address: 00000000000f2bf3
mei 07 16:38:54 xxx kernel: #PF: supervisor read access in kernel mode
mei 07 16:38:54 xxx kernel: #PF: error_code(0x0000) - not-present page
Again when closing Firefox.

Please run memtest86 or memtest86+ for at least an hour.

(In reply to Artem S. Tashkinov from comment #2)
> Please run memtest86 or memtest86+ for at least an hour.
I ran it for 9+ hours today, no errors. So it seems the memory itself is fine.

Is going back to some earlier kernel series (6.6.y?) for a few days an option to rule out problems of the hardware or some other software?

And if it happens again, could you save the whole error please? That would allow checking if the backtraces are similar or differ each time.

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #4)
> Is going back to some earlier kernel series (6.6.y?) for a few days an option
> to rule out problems of the hardware or some other software?
I doubt it. The thing is that this happens about twice a month, so there is no good reproducer. If I run 6.6.x for a week or two and it doesn't occur, can we say it's not there, or was I just lucky? :)

> And if it happens again, could you save the whole error please? That would
> allow checking if the backtraces are similar or differ each time.
I'll do that. The last one, on the 7th of May, just didn't log the whole stack trace, so I don't have it.

BUT I think we have some more info in https://bbs.archlinux.org/viewtopic.php?id=294475
The stack trace there looks similar, and somebody reports that it hasn't occurred anymore since he disabled zswap.
A similar one here also: https://www.reddit.com/r/linux_gaming/comments/1b7qxjp/whenever_i_play_rdr2_for_a_while_and_exit_the/

But I also stumbled upon https://forums.developer.nvidia.com/t/series-550-freezes-laptop/284772
A whole lot of people are reporting similar crashes, but all with the Nvidia driver installed (just like me).
Check your hard drive/SSD. You may have run out of storage space, because it fails on swap.

> all with Nvidia driver installed (just like me).
Well, then chances are slim that a developer will look into this; and the zswap developers are unlikely to see this report here anyway. For details see: https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kernel-bug-reports-are-ignored/

I'm experiencing the exact same issue on a Dell Precision 3581. It also happens on kernel 6.6.32, but it's rare. I'm currently running kernel 6.9.3, where it happens frequently, and when it occurs, it always occurs when shutting down the laptop.

(In reply to nvaert1986 from comment #8)
> I'm experiencing the exact same issue on a Dell Precision 3581. It also
> happens on kernel 6.6.32, but it's rare. I'm currently running kernel 6.9.3
> where it happens frequently and when it occurs, it always occurs when
> shutting down the laptop.
Are you also running the nvidia module or not? For me it seems to be fixed (at least it hasn't occurred anymore) since the switch to the nvidia-open drivers.

(In reply to nvaert1986 from comment #8)
> I'm experiencing the exact same issue on a Dell Precision 3581. It also
> happens on kernel 6.6.32, but it's rare. I'm currently running kernel 6.9.3
> where it happens frequently and when it occurs, it always occurs when
> shutting down the laptop.
I've updated to the latest 550 series nvidia-drivers in ~amd64 for now to see what that does. Hopefully it'll fix the annoying bug.

Hi. I have the same problem on Arch Linux. This host is used as a hypervisor; however, all the VMs are shut down. My network is set up as a bridge. Interestingly, today I hit this error after adding a second IP address with a different CIDR and starting to ping devices on that network.
[root@archlinux ~]# [ 6186.630109] BUG: unable to handle page fault for address: 0000000000204fa0
[ 6186.630491] #PF: supervisor write access in kernel mode
[ 6186.630566] #PF: error_code(0x0002) - not-present page
[ 6186.630640] PGD 0 P4D 0
[ 6186.630685] Oops: Oops: 0002 [#1] PREEMPT SMP PTI
[ 6186.630760] CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.11.7-arch1-1 #1 1400000003000000474e5500ee13b5ab63fad4da
[ 6186.630931] Hardware name: HP HP EliteDesk 800 G3 SFF/8299, BIOS P01 Ver. 02.50 07/17/2024
[ 6186.631044] RIP: 0010:_raw_spin_lock+0x17/0x30
[ 6186.631115] Code: 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 a8 c2 1b 5c 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e8 f7 01 00 00 90 c3 cc cc
[ 6186.631350] RSP: 0018:ffffba6400204f98 EFLAGS: 00010046
[ 6186.631427] RAX: 0000000000000000 RBX: ffff8f6cc104cc00 RCX: 00000001001af23f
[ 6186.631524] RDX: 0000000000000001 RSI: 76ffffffa3e69dc6 RDI: 0000000000204fa0
[ 6186.631622] RBP: 0000000000000001 R08: 0000160bcfc9116a R09: 30c7a363b7c70caf
[ 6186.632020] R10: 0000000000000000 R11: ffffba6400204ff8 R12: ffff8f6cc104cce4
[ 6186.632114] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 6186.632211] FS: 0000000000000000(0000) GS:ffff8f71dbb80000(0000) knlGS:0000000000000000
[ 6186.632322] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6186.632402] CR2: 0000000000204fa0 CR3: 0000000122476006 CR4: 00000000003706f0
[ 6186.632498] Call Trace:
[ 6186.632539] <IRQ>
[ 6186.632575] ? __die_body.cold+0x19/0x27
[ 6186.632637] ? page_fault_oops+0x15a/0x2d0
[ 6186.632705] ? exc_page_fault+0x81/0x190
[ 6186.632765] ? asm_exc_page_fault+0x26/0x30
[ 6186.632831] ? _raw_spin_lock+0x17/0x30
[ 6186.632894] handle_irq_event+0x56/0x90
[ 6186.632958] handle_edge_irq+0x9a/0x260
[ 6186.633019] __common_interrupt+0x3e/0xa0
[ 6186.633084] common_interrupt+0x80/0xa0
[ 6186.633196] </IRQ>
[ 6186.633233] <TASK>
[ 6186.633266] asm_common_interrupt+0x26/0x40
[ 6186.633693] RIP: 0010:cpuidle_enter_state+0xc6/0x420
[ 6186.633767] Code: 00 00 e8 7d 53 2d ff e8 28 f1 ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 39 34 2c ff 45 84 ff 0f 85 aa 01 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 84 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[ 6186.634011] RSP: 0018:ffffba6400147e80 EFLAGS: 00000246
[ 6186.634091] RAX: ffff8f71dbb80000 RBX: 0000000000000002 RCX: 0000000000000000
[ 6186.634196] RDX: 000005a06fe37981 RSI: fffffffdbfe789de RDI: 0000000000000000
[ 6186.634294] RBP: ffff8f71dbbc12c8 R08: 0000000000000004 R09: 000000000000004e
[ 6186.634390] R10: 0000000000000018 R11: ffff8f71dbbb4be4 R12: ffffffffa5152b80
[ 6186.634487] R13: 000005a06fe37981 R14: 0000000000000002 R15: 0000000000000000
[ 6186.634593] cpuidle_enter+0x2d/0x40
[ 6186.634654] do_idle+0x1b0/0x210
[ 6186.634711] cpu_startup_entry+0x29/0x30
[ 6186.634771] start_secondary+0x11c/0x140
[ 6186.634832] common_startup_64+0x13e/0x141
[ 6186.634904] </TASK>
[ 6186.635314] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo ip6table_nat ip6table_filter ip6_tables iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype iptable_filter overlay rpcrdma rdma_cm iw_cm ib_cm ib_core bridge stp llc vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_pmc_core_pltdrv intel_pmc_core intel_vsec pmt_telemetry pmt_class intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec_hdmi kvm snd_hda_codec_conexant snd_soc_core snd_hda_codec_generic crct10dif_pclmul crc32_pclmul snd_compress polyval_clmulni ac97_bus polyval_generic snd_pcm_dmaengine ghash_clmulni_intel snd_hda_intel snd_usb_audio sha512_ssse3 snd_intel_dspcfg sha256_ssse3 snd_intel_sdw_acpi snd_usbmidi_lib sha1_ssse3 snd_ump snd_hda_codec snd_rawmidi aesni_intel iwlwifi snd_hda_core gf128mul snd_seq_device crypto_simd
[ 6186.635473] snd_hwdep cryptd mc mei_wdt mei_hdcp mei_pxp r8169 snd_pcm rapl cfg80211 realtek intel_cstate snd_timer hp_wmi mdio_devres platform_profile intel_uncore mei_me sparse_keymap psmouse wmi_bmof snd libphy pcspkr rfkill mei soundcore acpi_pad mousedev joydev mac_hid nfsd auth_rpcgss nfs_acl lockd grace crypto_user loop dm_mod sunrpc nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid nouveau i915 drm_ttm_helper gpu_sched serio_raw drm_gpuvm atkbd drm_exec libps2 mxm_wmi vivaldi_fmap drm_buddy i2c_algo_bit nvme crc32c_intel intel_gtt ttm nvme_core drm_display_helper nvme_auth xhci_pci cec xhci_pci_renesas video i8042 serio wmi vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd
[ 6186.637799] CR2: 0000000000204fa0
[ 6186.637851] ---[ end trace 0000000000000000 ]---
[ 6186.637921] RIP: 0010:_raw_spin_lock+0x17/0x30
[ 6186.637989] Code: 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 a8 c2 1b 5c 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e8 f7 01 00 00 90 c3 cc cc
[ 6186.638453] RSP: 0018:ffffba6400204f98 EFLAGS: 00010046
[ 6186.638528] RAX: 0000000000000000 RBX: ffff8f6cc104cc00 RCX: 00000001001af23f
[ 6186.638628] RDX: 0000000000000001 RSI: 76ffffffa3e69dc6 RDI: 0000000000204fa0
[ 6186.638724] RBP: 0000000000000001 R08: 0000160bcfc9116a R09: 30c7a363b7c70caf
[ 6186.638818] R10: 0000000000000000 R11: ffffba6400204ff8 R12: ffff8f6cc104cce4
[ 6186.638914] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 6186.639011] FS: 0000000000000000(0000) GS:ffff8f71dbb80000(0000) knlGS:0000000000000000
[ 6186.639121] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6186.639201] CR2: 0000000000204fa0 CR3: 0000000122476006 CR4: 00000000003706f0
[ 6186.639299] Kernel panic - not syncing: Fatal exception in interrupt
[ 6186.639445] Kernel Offset: 0x22000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 6186.639706] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
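
For anyone comparing the reports in this thread: the two "#PF:" lines in each oops are derived from the x86 page-fault error code, so the codes seen here (0x0000 in the first report, 0x0002 in the Arch Linux one) can be decoded by hand. The following is only a small sketch of that decoding, assuming the standard x86 bit layout (bit 0 = protection violation, bit 1 = write, bit 2 = user mode, bit 3 = reserved bit set, bit 4 = instruction fetch); the function name decode_pf_error_code is made up for this illustration and is not a kernel API.

```python
# Sketch: decode an x86 page-fault error code as printed in "#PF: error_code(0x....)".
# Bit meanings follow the standard x86 layout; the helper name is hypothetical.
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: supervisor (kernel) access, 1: user access
PF_RSVD = 1 << 3   # reserved bit set in a page-table entry
PF_INSTR = 1 << 4  # fault happened on an instruction fetch


def decode_pf_error_code(code: int) -> dict:
    """Return a readable breakdown of a page-fault error code."""
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write"
    else:
        access = "read"
    return {
        "cause": "protection violation" if code & PF_PROT else "not-present page",
        "access": access,
        "mode": "user" if code & PF_USER else "supervisor",
        "reserved_bit_set": bool(code & PF_RSVD),
    }


if __name__ == "__main__":
    # The two error codes quoted in this bug report.
    for code in (0x0000, 0x0002):
        print(hex(code), decode_pf_error_code(code))
```

Both codes decode to a supervisor access to a not-present page; they differ only in read versus write. In the Arch Linux trace the faulting address in CR2 (0000000000204fa0) equals RDI, the first argument register, at the faulting instruction in _raw_spin_lock, which suggests the lock pointer itself was bad rather than the lock contents.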