Created attachment 287693 [details]
dmesg (kernel 5.5.6, Shuttle XPC FS51, Pentium 4)

Happens every time on this machine when I build a large project (e.g. boost-1.72.0):

[...]
[ 1079.810216] BUG: kernel NULL pointer dereference, address: 00000000
[ 1079.810430] #PF: supervisor read access in kernel mode
[ 1079.810583] #PF: error_code(0x0000) - not-present page
[ 1079.810736] *pde = 00000000
[ 1079.810825] Oops: 0000 [#1] SMP
[ 1079.810921] CPU: 0 PID: 53 Comm: kswapd0 Not tainted 5.5.6-gentoo-Pentium4 #6
[ 1079.811134] Hardware name: /FS51, BIOS 6.00 PG 12/02/2003
[ 1079.811304] EIP: __cpa_process_fault+0x205/0x226
[ 1079.811443] Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 <ff> 30 53 68 54 fb c8 ca e8 84 7a 00 00 0f 0b 83 c4 0c be f2 ff ff
[ 1079.811979] EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
[ 1079.812167] ESI: 00000001 EDI: f5e6fd44 EBP: f5e6fcbc ESP: f5e6fc98
[ 1079.812355] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213
[ 1079.812557] CR0: 80050033 CR2: 00000000 CR3: 052af000 CR4: 000006d0
[ 1079.812740] Call Trace:
[ 1079.812820] ? _raw_spin_lock+0x22/0x2a
[ 1079.812936] ? lookup_address+0x1d/0x20
[ 1079.813047] __change_page_attr_set_clr+0x85/0x551
[ 1079.813189] ? __mutex_unlock_slowpath+0x20/0x1b6
[ 1079.813326] ? mutex_unlock+0xb/0xd
[ 1079.813432] ? _vm_unmap_aliases.part.0+0x11f/0x127
[ 1079.813575] change_page_attr_set_clr+0xdc/0x1af
[ 1079.813715] set_pages_array_wb+0x20/0x7b
[ 1079.813848] ttm_pages_put+0x22/0x71 [ttm]
[ 1079.813975] ttm_page_pool_free+0xf6/0x111 [ttm]
[ 1079.814116] ttm_pool_shrink_scan+0x9c/0xd1 [ttm]
[ 1079.814261] shrink_slab.constprop.0+0x248/0x38f
[ 1079.814398] shrink_node+0x54a/0x70c
[ 1079.814505] kswapd+0x4b9/0x62d
[ 1079.814601] ? kswapd+0x4b9/0x62d
[ 1079.814705] kthread+0xd1/0xd3
[ 1079.814797] ? try_to_free_pages+0x3d4/0x3d4
[ 1079.814925] ? kthread_delayed_work_timer_fn+0x6a/0x6a
[ 1079.815076] ret_from_fork+0x2e/0x38
[ 1079.815182] Modules linked in: auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc ctr aes_generic libaes ccm hid_generic usbhid hid rt2500pci eeprom_93cx6 rt2x00pci rt2x00mmio rt2x00lib led_class mac80211 radeon evdev hwmon i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb font firewire_ohci fbdev ttm firewire_core 8139too crc_itu_t mii sr_mod cfg80211 cdrom drm fan thermal drm_panel_orientation_quirks ohci_pci snd_intel8x0 backlight 8250 snd_ac97_codec 8250_base serial_core ehci_pci ohci_hcd rfkill ehci_hcd ac97_bus libarc4 snd_pcm usbcore sis_agp snd_timer usb_common snd agpgart button processor soundcore i2c_sis96x
[ 1079.835024] CR2: 0000000000000000
[ 1079.844798] ---[ end trace 2acb3661952bc786 ]---
[ 1079.854557] EIP: __cpa_process_fault+0x205/0x226
[ 1079.864444] Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 <ff> 30 53 68 54 fb c8 ca e8 84 7a 00 00 0f 0b 83 c4 0c be f2 ff ff
[ 1079.874942] EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
[ 1079.885370] ESI: 00000001 EDI: f5e6fd44 EBP: f5e6fcbc ESP: f5e6fc98
[ 1079.895776] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213
[ 1079.906304] CR0: 80050033 CR2: 00000000 CR3: 052af000 CR4: 000006d0

Don't think it has much to do with ttm strictly speaking, as I am running this machine without X via ssh.

 # inxi -b
System:    Kernel: 5.5.6-gentoo-Pentium4 i686 bits: 32 Console: tty 0 Distro: Gentoo Base System release 2.6
Machine:   Type: Desktop Mobo: Shuttle model: FS51 serial: N/A BIOS: Phoenix v: 6.00 PG date: 12/02/2003
CPU:       Single Core: Intel Pentium 4 type: MCP speed: 2796 MHz
Graphics:  Device-1: AMD RV350 [Radeon 9550/9600/X1050 Series] driver: radeon v: kernel
           Display: server: X.org 1.20.6 driver: ati,radeon unloaded: fbdev,modesetting tty: 104x53
           Message: Advanced graphics data unavailable in console for root.
Network:   Device-1: Ralink RT2500 Wireless 802.11bg driver: rt2500pci
           Device-2: Realtek RTL-8100/8101L/8139 PCI Fast Ethernet Adapter driver: 8139too
Drives:    Local Storage: total: 76.69 GiB used: 2.77 GiB (3.6%)
Info:      Processes: 95 Uptime: 18m Memory: 1.97 GiB used: 159.6 MiB (7.9%) Init: systemd Shell: bash inxi: 3.0.36
Created attachment 287695 [details] kernel .config (kernel 5.5.6, Shuttle XPC FS51, Pentium 4)
Created attachment 287765 [details]
dmesg (kernel 5.6-rc4, Shuttle XPC FS51, Pentium 4)

Same on kernel 5.6-rc4:

[...]
[ 908.356444] BUG: kernel NULL pointer dereference, address: 00000000
[ 908.356670] #PF: supervisor read access in kernel mode
[ 908.356823] #PF: error_code(0x0000) - not-present page
[ 908.356974] *pde = 00000000
[ 908.357064] Oops: 0000 [#1] SMP
[ 908.357163] CPU: 0 PID: 53 Comm: kswapd0 Not tainted 5.6.0-rc4-Pentium4 #1
[ 908.357367] Hardware name: /FS51, BIOS 6.00 PG 12/02/2003
[ 908.357535] EIP: __cpa_process_fault+0x205/0x226
[ 908.357674] Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 <ff> 30 53 68 56 ba 89 c9 e8 f8 68 00 00 0f 0b 83 c4 0c be f2 ff ff
[ 908.358228] EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
[ 908.358411] ESI: 00000001 EDI: f5e6fd4c EBP: f5e6fcc4 ESP: f5e6fca0
[ 908.358598] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213
[ 908.358798] CR0: 80050033 CR2: 00000000 CR3: 00333000 CR4: 000006d0
[ 908.358981] Call Trace:
[ 908.359062] ? _raw_spin_lock+0x22/0x2a
[ 908.359176] ? lookup_address+0x1d/0x20
[ 908.359289] __change_page_attr_set_clr+0x85/0x551
[ 908.359436] ? __mutex_unlock_slowpath+0x20/0x1b6
[ 908.368244] ? mutex_unlock+0xb/0xd
[ 908.377037] ? _vm_unmap_aliases.part.0+0x11f/0x127
[ 908.385944] change_page_attr_set_clr+0xdc/0x1af
[ 908.394889] set_pages_array_wb+0x20/0x7b
[ 908.403630] ttm_pages_put+0x22/0x71 [ttm]
[ 908.412159] ttm_page_pool_free+0xf6/0x111 [ttm]
[ 908.420492] ttm_pool_shrink_scan+0x9c/0xd1 [ttm]
[ 908.428885] shrink_slab.constprop.0+0x248/0x38f
[ 908.437241] shrink_node+0x533/0x6f2
[ 908.445492] kswapd+0x4b9/0x62d
[ 908.453714] ? kswapd+0x4b9/0x62d
[ 908.461937] kthread+0xd1/0xd3
[ 908.470055] ? try_to_free_pages+0x3d4/0x3d4
[ 908.478143] ? kthread_delayed_work_timer_fn+0x6a/0x6a
[ 908.486242] ret_from_fork+0x2e/0x38
[ 908.494256] Modules linked in: auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc ctr aes_generic libaes ccm hid_generic usbhid hid rt2500pci eeprom_93cx6 rt2x00pci rt2x00mmio rt2x00lib led_class mac80211 radeon evdev hwmon i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfg80211 cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea firewire_ohci firewire_core fb rfkill font crc_itu_t 8139too libarc4 mii fbdev sr_mod cdrom ttm thermal fan ohci_pci drm 8250 snd_intel8x0 8250_base serial_core snd_ac97_codec ac97_bus ehci_pci ohci_hcd snd_pcm drm_panel_orientation_quirks ehci_hcd button backlight snd_timer usbcore sis_agp agpgart snd i2c_sis96x usb_common processor soundcore zstd zram zsmalloc
[ 908.522107] CR2: 0000000000000000
[ 908.531646] ---[ end trace f8cc5b63e4c76d19 ]---
[ 908.541190] EIP: __cpa_process_fault+0x205/0x226
[ 908.550708] Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 <ff> 30 53 68 56 ba 89 c9 e8 f8 68 00 00 0f 0b 83 c4 0c be f2 ff ff
[ 908.560958] EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
[ 908.571156] ESI: 00000001 EDI: f5e6fd4c EBP: f5e6fcc4 ESP: f5e6fca0
[ 908.581412] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213
[ 908.591710] CR0: 80050033 CR2: 00000000 CR3: 00333000 CR4: 000006d0
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Mon, 02 Mar 2020 21:55:10 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=206697
>
> --- Comment #2 from Erhard F. (erhard_f@mailbox.org) ---
> Created attachment 287765 [details]
>   --> https://bugzilla.kernel.org/attachment.cgi?id=287765&action=edit
> dmesg (kernel 5.6-rc4, Shuttle XPC FS51, Pentium 4)
>
> Same on kernel 5.6-rc4:

Thanks. This looks like a regression in the DRM code. I've added suitable Ccs.

> [...]
On Mon, 02 Mar 2020 23:03:31 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=206697
>
> --- Comment #3 from Andrew Morton (akpm@linux-foundation.org) ---
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Mon, 02 Mar 2020 21:55:10 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=206697
> >
> > --- Comment #2 from Erhard F. (erhard_f@mailbox.org) ---
> > Created attachment 287765 [details]
> >   --> https://bugzilla.kernel.org/attachment.cgi?id=287765&action=edit
> > dmesg (kernel 5.6-rc4, Shuttle XPC FS51, Pentium 4)
> >
> > Same on kernel 5.6-rc4:
>
> Thanks. This looks like a regression in the DRM code. I've added
> suitable Ccs.

I have been running the box without the ttm, drm and radeon modules loaded (kernel 5.6-rc4) for two days now, building stuff via ssh, and it worked flawlessly. With ttm, drm and radeon loaded I hit this bug within half an hour. So it really does seem that the DRM code is causing this bug.
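The comparison above only means something if it is certain which of the three modules were loaded during each run. A minimal Python sketch of such a check follows; it assumes the usual /proc/modules layout with the module name in the first column, and the helper names (`loaded_modules`, `suspects_loaded`) are made up for illustration, not taken from any tool used in this report.

```python
from pathlib import Path

# Modules named in the comment above; adjust for other setups.
SUSPECT_MODULES = ("ttm", "drm", "radeon")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # Assumption: the first whitespace-separated field is the name.
            names.add(fields[0])
    return names


def suspects_loaded(proc_modules_text, suspects=SUSPECT_MODULES):
    """Return the suspect modules that are currently loaded, in suspect order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in suspects if name in loaded]


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        print("loaded suspects:", suspects_loaded(path.read_text()))
```

Recording this output next to each build run would make the "with modules" and "without modules" results easier to compare later.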
Looks mostly the same on kernel 5.7-rc1. The line after kthread+0xd1/0xd3 is different. It was "? try_to_free_pages+0x3d4/0x3d4" on 5.5.6 and 5.6-rc4, but is "? shrink_node+0x6f2/0x6f2" on 5.7-rc1 now.

[...]
Apr 17 00:28:40 BUG: kernel NULL pointer dereference, address: 00000000
Apr 17 00:28:40 #PF: supervisor read access in kernel mode
Apr 17 00:28:40 #PF: error_code(0x0000) - not-present page
Apr 17 00:28:40 *pde = 00000000
Apr 17 00:28:40 Oops: 0000 [#1] SMP
Apr 17 00:28:40 CPU: 0 PID: 53 Comm: kswapd0 Not tainted 5.7.0-rc1-Pentium4 #1
Apr 17 00:28:40 Hardware name: /FS51, BIOS 6.00 PG 12/02/2003
Apr 17 00:28:40 EIP: __cpa_process_fault+0x205/0x226
Apr 17 00:28:40 Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 <ff> 30 53 68 d5 c7 cb d8 e8 cb 68 00 00 0f 0b>
Apr 17 00:28:40 EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
Apr 17 00:28:40 ESI: 00000001 EDI: f5e5bd4c EBP: f5e5bcc4 ESP: f5e5bca0
Apr 17 00:28:40 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213
Apr 17 00:28:40 CR0: 80050033 CR2: 00000000 CR3: 05ab3000 CR4: 000006d0
Apr 17 00:28:40 Call Trace:
Apr 17 00:28:40 ? _raw_spin_lock+0x22/0x2a
Apr 17 00:28:40 ? lookup_address+0x1d/0x20
Apr 17 00:28:40 __change_page_attr_set_clr+0x85/0x551
Apr 17 00:28:40 ? __mutex_unlock_slowpath+0x20/0x1b6
Apr 17 00:28:40 ? mutex_unlock+0xb/0xd
Apr 17 00:28:40 ? _vm_unmap_aliases.part.0+0x11f/0x127
Apr 17 00:28:40 change_page_attr_set_clr+0xdc/0x1af
Apr 17 00:28:40 set_pages_array_wb+0x20/0x7b
Apr 17 00:28:40 ttm_pages_put+0x22/0x71 [ttm]
Apr 17 00:28:40 ttm_page_pool_free+0xa1/0x111 [ttm]
Apr 17 00:28:40 ttm_pool_shrink_scan+0x9c/0xd1 [ttm]
Apr 17 00:28:40 shrink_slab.constprop.0+0x248/0x38f
Apr 17 00:28:40 shrink_node+0x533/0x6f2
Apr 17 00:28:40 kswapd+0x4b6/0x628
Apr 17 00:28:40 ? kswapd+0x4b6/0x628
Apr 17 00:28:40 kthread+0xd1/0xd3
Apr 17 00:28:40 ? shrink_node+0x6f2/0x6f2
Apr 17 00:28:40 ? kthread_delayed_work_timer_fn+0x6a/0x6a
Apr 17 00:28:40 ret_from_fork+0x2e/0x38
Apr 17 00:28:40 Modules linked in: fuse auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc rt2500pci eeprom_93cx6 rt2x00pci rt2x00mmio rt2x00lib led_class mac80211 radeon hwmon i2c_algo_bit d>
Apr 17 00:28:40 CR2: 0000000000000000
Apr 17 00:28:40 ---[ end trace 49fbdfbb6e459a06 ]---
Apr 17 00:28:40 EIP: __cpa_process_fault+0x205/0x226
Apr 17 00:28:40 Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 <ff> 30 53 68 d5 c7 cb d8 e8 cb 68 00 00 0f 0b>
Apr 17 00:28:40 EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
Apr 17 00:28:40 ESI: 00000001 EDI: f5e5bd4c EBP: f5e5bcc4 ESP: f5e5bca0
Apr 17 00:28:40 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213
Apr 17 00:28:40 CR0: 80050033 CR2: 00000000 CR3: 05ab3000 CR4: 000006d0
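Spotting which frame changed between two oopses is easier with a small script than by eye. The following Python sketch pulls the symbol names out of the "Call Trace:" section of two logs and lists the positions where they differ. It compares names only and ignores offsets, so a shift such as kswapd+0x4b9 to kswapd+0x4b6 is not reported. The function names and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re
from itertools import zip_longest

# "symbol+0xoff/0xsize", optionally preceded by "? " (a frame the unwinder
# is unsure about) and optionally followed by " [module]".
FRAME_RE = re.compile(
    r"(\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def call_trace_symbols(log_text):
    """Return the symbol names between 'Call Trace:' and 'Modules linked in:'."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        if "Modules linked in:" in line:
            break
        match = FRAME_RE.search(line)
        if match:
            prefix = "? " if match.group(1) else ""
            frames.append(prefix + match.group(2))
    return frames


def changed_frames(old_log, new_log):
    """Return (old, new) symbol pairs for trace positions that differ."""
    old = call_trace_symbols(old_log)
    new = call_trace_symbols(new_log)
    return [(a, b) for a, b in zip_longest(old, new) if a != b]
```

Fed the 5.6-rc4 and 5.7-rc1 logs, `changed_frames` should single out the frame mentioned above, provided both logs contain the complete trace.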
On Mon, 2 Mar 2020 15:03:29 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:

> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Mon, 02 Mar 2020 21:55:10 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=206697
> >
> > --- Comment #2 from Erhard F. (erhard_f@mailbox.org) ---
> > Created attachment 287765 [details]
> >   --> https://bugzilla.kernel.org/attachment.cgi?id=287765&action=edit
> > dmesg (kernel 5.6-rc4, Shuttle XPC FS51, Pentium 4)
> >
> > Same on kernel 5.6-rc4:
>
> Thanks. This looks like a regression in the DRM code. I've added
> suitable Ccs.

Erhard, please let's handle this issue via email, not via the bugzilla interface.

This does appear to be a DRM issue, and it has been reproduced in 5.7-rc1.

Latest update below:

From: bugzilla-daemon@bugzilla.kernel.org
To: akpm@linux-foundation.org
Subject: [Bug 206697] #PF: supervisor read access in kernel mode, #PF: error_code(0x0000) - not-present page while building a large project
Date: Fri, 17 Apr 2020 07:45:23 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=206697

--- Comment #5 from Erhard F. (erhard_f@mailbox.org) ---
Looks mostly the same on kernel 5.7-rc1. The line after kthread+0xd1/0xd3 is different. It was "? try_to_free_pages+0x3d4/0x3d4" on 5.5.6 and 5.6-rc4, but is "? shrink_node+0x6f2/0x6f2" on 5.7-rc1 now.

[...]
Can you bisect? If you are not actually using the GPU, it shouldn't really be allocating any memory other than what it allocates for the console framebuffer and any driver structures. I think what may be happening is that you are hitting memory pressure, and ttm hits some failure while attempting to return memory.
On Wed, 22 Apr 2020 20:20:24 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=206697
>
> Alex Deucher (alexdeucher@gmail.com) changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |alexdeucher@gmail.com
>
> --- Comment #7 from Alex Deucher (alexdeucher@gmail.com) ---
> Can you bisect? If you are not actually using the GPU, it shouldn't really
> be allocating any memory other than what it allocates for the console
> framebuffer and any driver structures. I think what may be happening is that
> you are hitting memory pressure and ttm is attempting to return memory and
> hits some failure.

I will try to, but it will take a considerable amount of time. It's an old i686 box of course, but it's new to me. The oldest kernel I ran this machine on is 5.5.6, the one for which I reported this bug. Older kernels won't boot its btrfs root partition (xxhash, zstd compressed). So I have to do another install on an ext4 partition, replicate the setup and find a suitable starting point for bisecting.
Created attachment 289027 [details]
bisect01.log

I finally finished bisecting. The criterion for a 'good' bisect run was building llvm successfully 5 times in a row. In 'bad' bisect runs the bug showed up during the 1st llvm build most of the time, and during the 2nd llvm build at the latest.

It turned out the offending commit is also the cause of this xorg bug: https://gitlab.freedesktop.org/xorg/driver/xf86-video-ati/-/issues/191#note_489802

 # git bisect good | tee -a ~/bisect01.log
33b3ad3788aba846fc8b9a065fe2685a0b64f713 is the first bad commit
commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Aug 15 09:27:00 2019 +0200

    drm/radeon: handle PCIe root ports with addressing limitations

    radeon uses a need_dma32 flag to indicate to the drm core that some
    allocations need to be done using GFP_DMA32, but it only checks the
    device addressing capabilities to make that decision. Unfortunately
    PCIe root ports that have limited addressing exist as well. Use the
    dma_addressing_limited instead to also take those into account.

    Reviewed-by: Christian König <christian.koenig@amd.com>
    Reported-by: Atish Patra <Atish.Patra@wdc.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/radeon/radeon.h        |  1 -
 drivers/gpu/drm/radeon/radeon_device.c | 12 +++++-------
 drivers/gpu/drm/radeon/radeon_ttm.c    |  2 +-
 3 files changed, 6 insertions(+), 9 deletions(-)
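The good/bad rule used for this bisect can be written down as a small decision function, which helps keep the verdicts consistent across many reboots. The Python sketch below is an illustration under stated assumptions, not the procedure actually used: it treats any failed build or any oops line in the kernel log as 'bad', needs five clean builds for 'good', and otherwise returns 'undecided'. The name `bisect_verdict` and the marker string check are assumptions made for this example.

```python
# Marker taken from the oops reports in this bug; other bugs need another one.
OOPS_MARKER = "BUG: kernel NULL pointer dereference"


def bisect_verdict(build_ok, dmesg_text, required_passes=5):
    """Classify one bisect step.

    build_ok   -- list of booleans, one per llvm build attempted so far
    dmesg_text -- kernel log captured after the builds
    Returns "bad", "good" or "undecided".
    """
    # Assumption: a failed build counts as a hit even without an oops line,
    # because the log may be lost when the machine goes down.
    if OOPS_MARKER in dmesg_text or not all(build_ok):
        return "bad"
    if len(build_ok) >= required_passes:
        return "good"
    return "undecided"


if __name__ == "__main__":
    # Hypothetical results: three clean builds so far, empty kernel log.
    print(bisect_verdict([True, True, True], ""))
```

A build that fails for an unrelated reason would be misclassified as 'bad' by this rule, so such steps are better repeated by hand before being recorded with git bisect.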
Created attachment 290953 [details]
revert

Does this revert fix the issue?
Does it work correctly with 5.9-rc1 or newer?
Another thing to try, does setting radeon.agpmode=-1 fix the issue?
On Wed, 16 Sep 2020 20:50:21 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=206697
>
> --- Comment #11 from Alex Deucher (alexdeucher@gmail.com) ---
> Does it work correctly with 5.9-rc1 or newer?

Yes, I had the machine building with 5.9-rc2 for 2 days without issues. Though in my 5.9-rc .config I have not set AGP nor AGP_SIS 'cause it is no longer needed.

I did not try building stuff with an affected kernel and radeon.agpmode=-1 yet. But I shall try and report back as soon as I have.
On Wed, 16 Sep 2020 22:06:25 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=206697
>
> --- Comment #12 from Alex Deucher (alexdeucher@gmail.com) ---
> Another thing to try, does setting radeon.agpmode=-1 fix the issue?

Yes, booting an affected kernel like 5.8.1 with radeon.agpmode=-1 also seems to fix the issue. Successfully built llvm 10 times in a row, which would normally crash the machine on the 2nd or 3rd build at the latest.
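When testing a workaround like this it is worth confirming that the parameter really reached the kernel for the boot being tested. A minimal Python sketch, assuming the parameter was passed on the kernel command line (options set through modprobe configuration would not show up here) and keeping the last occurrence if the option appears more than once; `module_param_from_cmdline` is a name made up for this example.

```python
from pathlib import Path


def module_param_from_cmdline(cmdline, module, param):
    """Return the value of module.param from a kernel command line, or None."""
    prefix = f"{module}.{param}="
    value = None
    for token in cmdline.split():
        if token.startswith(prefix):
            # Choice made for this sketch: keep the last occurrence seen.
            value = token[len(prefix):]
    return value


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        mode = module_param_from_cmdline(path.read_text(), "radeon", "agpmode")
        print("radeon.agpmode on command line:", mode)
```

Noting the returned value together with each build result keeps runs with and without the workaround from getting mixed up.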
I'm hitting the same bug with kernel 5.9.14, with the same stack as in the original comment. However, I'm using the nouveau driver (so not radeon).

Hardware name: Acer Aspire R3610/FMCP7A-ION-LE, BIOS P01-A4 11/03/2009

Modules linked in: dm_mod rpcsec_gss_krb5 md4 cmac nls_utf8 cifs libdes dns_resolver fscache fuse hwmon_vid nouveau snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec ath5k mxm_wmi snd_hda_core ttm ath mac80211 wmi_bmof drm_kms_helper snd_hwdep coretemp cec snd_pcm cfg80211 input_leds rc_core syscopyarea mousedev snd_timer sysfillrect sysimgblt snd rfkill fb_sys_fops pcspkr i2c_algo_bit libarc4 forcedeth soundcore nv_tco i2c_nforce2 wmi evdev tcp_bbr nfsd drm auth_rpcgss nfs_acl sg lockd grace sunrpc nfs_ssc agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid uas usb_storage ohci_pci ehci_pci ohci_hcd ehci_hcd