Bug 206697 - #PF: supervisor read access in kernel mode, #PF: error_code(0x0000) - not-present page while building a large project
Summary: #PF: supervisor read access in kernel mode, #PF: error_code(0x0000) - not-pre...
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Page Allocator
Hardware: IA-32 Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-28 00:34 UTC by Erhard F.
Modified: 2020-12-27 14:18 UTC
CC List: 3 users

See Also:
Kernel Version: 5.5.6
Tree: Mainline
Regression: No


Attachments
dmesg (kernel 5.5.6, Shuttle XPC FS51, Pentium 4) (42.74 KB, text/plain)
2020-02-28 00:34 UTC, Erhard F.
kernel .config (kernel 5.5.6, Shuttle XPC FS51, Pentium 4) (98.73 KB, text/plain)
2020-02-28 00:34 UTC, Erhard F.
dmesg (kernel 5.6-rc4, Shuttle XPC FS51, Pentium 4) (43.77 KB, text/plain)
2020-03-02 21:55 UTC, Erhard F.
bisect01.log (4.37 KB, text/x-log)
2020-05-09 17:29 UTC, Erhard F.
revert (2.97 KB, patch)
2020-08-17 15:13 UTC, Alex Deucher

Description Erhard F. 2020-02-28 00:34:02 UTC
Created attachment 287693 [details]
dmesg (kernel 5.5.6, Shuttle XPC FS51, Pentium 4)

Happens every time on this machine when I build a large project (e.g. boost-1.72.0):

[...]
[ 1079.810216] BUG: kernel NULL pointer dereference, address: 00000000
[ 1079.810430] #PF: supervisor read access in kernel mode
[ 1079.810583] #PF: error_code(0x0000) - not-present page
[ 1079.810736] *pde = 00000000 
[ 1079.810825] Oops: 0000 [#1] SMP
[ 1079.810921] CPU: 0 PID: 53 Comm: kswapd0 Not tainted 5.5.6-gentoo-Pentium4 #6
[ 1079.811134] Hardware name:  /FS51, BIOS 6.00 PG 12/02/2003
[ 1079.811304] EIP: __cpa_process_fault+0x205/0x226
[ 1079.811443] Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 <ff> 30 53 68 54 fb c8 ca e8 84 7a 00 00 0f 0b 83 c4 0c be f2 ff ff
[ 1079.811979] EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
[ 1079.812167] ESI: 00000001 EDI: f5e6fd44 EBP: f5e6fcbc ESP: f5e6fc98
[ 1079.812355] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213
[ 1079.812557] CR0: 80050033 CR2: 00000000 CR3: 052af000 CR4: 000006d0
[ 1079.812740] Call Trace:
[ 1079.812820]  ? _raw_spin_lock+0x22/0x2a
[ 1079.812936]  ? lookup_address+0x1d/0x20
[ 1079.813047]  __change_page_attr_set_clr+0x85/0x551
[ 1079.813189]  ? __mutex_unlock_slowpath+0x20/0x1b6
[ 1079.813326]  ? mutex_unlock+0xb/0xd
[ 1079.813432]  ? _vm_unmap_aliases.part.0+0x11f/0x127
[ 1079.813575]  change_page_attr_set_clr+0xdc/0x1af
[ 1079.813715]  set_pages_array_wb+0x20/0x7b
[ 1079.813848]  ttm_pages_put+0x22/0x71 [ttm]
[ 1079.813975]  ttm_page_pool_free+0xf6/0x111 [ttm]
[ 1079.814116]  ttm_pool_shrink_scan+0x9c/0xd1 [ttm]
[ 1079.814261]  shrink_slab.constprop.0+0x248/0x38f
[ 1079.814398]  shrink_node+0x54a/0x70c
[ 1079.814505]  kswapd+0x4b9/0x62d
[ 1079.814601]  ? kswapd+0x4b9/0x62d
[ 1079.814705]  kthread+0xd1/0xd3
[ 1079.814797]  ? try_to_free_pages+0x3d4/0x3d4
[ 1079.814925]  ? kthread_delayed_work_timer_fn+0x6a/0x6a
[ 1079.815076]  ret_from_fork+0x2e/0x38
[ 1079.815182] Modules linked in: auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc ctr aes_generic libaes ccm hid_generic usbhid hid rt2500pci eeprom_93cx6 rt2x00pci rt2x00mmio rt2x00lib led_class mac80211 radeon evdev hwmon i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb font firewire_ohci fbdev ttm firewire_core 8139too crc_itu_t mii sr_mod cfg80211 cdrom drm fan thermal drm_panel_orientation_quirks ohci_pci snd_intel8x0 backlight 8250 snd_ac97_codec 8250_base serial_core ehci_pci ohci_hcd rfkill ehci_hcd ac97_bus libarc4 snd_pcm usbcore sis_agp snd_timer usb_common snd agpgart button processor soundcore i2c_sis96x
[ 1079.835024] CR2: 0000000000000000
[ 1079.844798] ---[ end trace 2acb3661952bc786 ]---
[ 1079.854557] EIP: __cpa_process_fault+0x205/0x226
[ 1079.864444] Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 <ff> 30 53 68 54 fb c8 ca e8 84 7a 00 00 0f 0b 83 c4 0c be f2 ff ff
[ 1079.874942] EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
[ 1079.885370] ESI: 00000001 EDI: f5e6fd44 EBP: f5e6fcbc ESP: f5e6fc98
[ 1079.895776] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213
[ 1079.906304] CR0: 80050033 CR2: 00000000 CR3: 052af000 CR4: 000006d0
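
For readers unfamiliar with the two "#PF:" lines in the oops above: they are derived from the x86 page-fault error code. The following is a minimal sketch of that decoding, assuming the standard error-code bit layout; the function name decode_pf_error_code is made up for illustration and is not a kernel API.

```python
# x86 page-fault error-code bits (named X86_PF_* in the kernel sources).
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: supervisor-mode access, 1: user-mode access
PF_RSVD = 1 << 3   # 1: reserved bit set in a paging structure
PF_INSTR = 1 << 4  # 1: fault was caused by an instruction fetch


def decode_pf_error_code(code):
    """Return (privilege, access, reason) for a page-fault error code."""
    privilege = "user" if code & PF_USER else "supervisor"
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write access"
    else:
        access = "read access"
    if not code & PF_PROT:
        reason = "not-present page"
    elif code & PF_RSVD:
        reason = "reserved bit violation"
    else:
        reason = "permissions violation"
    return privilege, access, reason


# error_code(0x0000) from the oops above
print(decode_pf_error_code(0x0000))
# -> ('supervisor', 'read access', 'not-present page')
```

With all bits clear, the fault is a kernel-mode read of an unmapped page, which matches the faulting address 00000000 reported in CR2.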

Don't think it has much to do with ttm strictly speaking, as I am running this machine without X via ssh.

# inxi -b
System:    Kernel: 5.5.6-gentoo-Pentium4 i686 bits: 32 Console: tty 0 
           Distro: Gentoo Base System release 2.6 
Machine:   Type: Desktop Mobo: Shuttle model: FS51 serial: N/A BIOS: Phoenix v: 6.00 PG 
           date: 12/02/2003 
CPU:       Single Core: Intel Pentium 4 type: MCP speed: 2796 MHz 
Graphics:  Device-1: AMD RV350 [Radeon 9550/9600/X1050 Series] driver: radeon v: kernel 
           Display: server: X.org 1.20.6 driver: ati,radeon unloaded: fbdev,modesetting tty: 104x53 
           Message: Advanced graphics data unavailable in console for root. 
Network:   Device-1: Ralink RT2500 Wireless 802.11bg driver: rt2500pci 
           Device-2: Realtek RTL-8100/8101L/8139 PCI Fast Ethernet Adapter driver: 8139too 
Drives:    Local Storage: total: 76.69 GiB used: 2.77 GiB (3.6%) 
Info:      Processes: 95 Uptime: 18m Memory: 1.97 GiB used: 159.6 MiB (7.9%) Init: systemd 
           Shell: bash inxi: 3.0.36
Comment 1 Erhard F. 2020-02-28 00:34:42 UTC
Created attachment 287695 [details]
kernel .config (kernel 5.5.6, Shuttle XPC FS51, Pentium 4)
Comment 2 Erhard F. 2020-03-02 21:55:10 UTC
Created attachment 287765 [details]
dmesg (kernel 5.6-rc4, Shuttle XPC FS51, Pentium 4)

Same on kernel 5.6-rc4:

[...]
[  908.356444] BUG: kernel NULL pointer dereference, address: 00000000
[  908.356670] #PF: supervisor read access in kernel mode
[  908.356823] #PF: error_code(0x0000) - not-present page
[  908.356974] *pde = 00000000 
[  908.357064] Oops: 0000 [#1] SMP
[  908.357163] CPU: 0 PID: 53 Comm: kswapd0 Not tainted 5.6.0-rc4-Pentium4 #1
[  908.357367] Hardware name:  /FS51, BIOS 6.00 PG 12/02/2003
[  908.357535] EIP: __cpa_process_fault+0x205/0x226
[  908.357674] Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 <ff> 30 53 68 56 ba 89 c9 e8 f8 68 00 00 0f 0b 83 c4 0c be f2 ff ff
[  908.358228] EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
[  908.358411] ESI: 00000001 EDI: f5e6fd4c EBP: f5e6fcc4 ESP: f5e6fca0
[  908.358598] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213
[  908.358798] CR0: 80050033 CR2: 00000000 CR3: 00333000 CR4: 000006d0
[  908.358981] Call Trace:
[  908.359062]  ? _raw_spin_lock+0x22/0x2a
[  908.359176]  ? lookup_address+0x1d/0x20
[  908.359289]  __change_page_attr_set_clr+0x85/0x551
[  908.359436]  ? __mutex_unlock_slowpath+0x20/0x1b6
[  908.368244]  ? mutex_unlock+0xb/0xd
[  908.377037]  ? _vm_unmap_aliases.part.0+0x11f/0x127
[  908.385944]  change_page_attr_set_clr+0xdc/0x1af
[  908.394889]  set_pages_array_wb+0x20/0x7b
[  908.403630]  ttm_pages_put+0x22/0x71 [ttm]
[  908.412159]  ttm_page_pool_free+0xf6/0x111 [ttm]
[  908.420492]  ttm_pool_shrink_scan+0x9c/0xd1 [ttm]
[  908.428885]  shrink_slab.constprop.0+0x248/0x38f
[  908.437241]  shrink_node+0x533/0x6f2
[  908.445492]  kswapd+0x4b9/0x62d
[  908.453714]  ? kswapd+0x4b9/0x62d
[  908.461937]  kthread+0xd1/0xd3
[  908.470055]  ? try_to_free_pages+0x3d4/0x3d4
[  908.478143]  ? kthread_delayed_work_timer_fn+0x6a/0x6a
[  908.486242]  ret_from_fork+0x2e/0x38
[  908.494256] Modules linked in: auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc ctr aes_generic libaes ccm hid_generic usbhid hid rt2500pci eeprom_93cx6 rt2x00pci rt2x00mmio rt2x00lib led_class mac80211 radeon evdev hwmon i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfg80211 cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea firewire_ohci firewire_core fb rfkill font crc_itu_t 8139too libarc4 mii fbdev sr_mod cdrom ttm thermal fan ohci_pci drm 8250 snd_intel8x0 8250_base serial_core snd_ac97_codec ac97_bus ehci_pci ohci_hcd snd_pcm drm_panel_orientation_quirks ehci_hcd button backlight snd_timer usbcore sis_agp agpgart snd i2c_sis96x usb_common processor soundcore zstd zram zsmalloc
[  908.522107] CR2: 0000000000000000
[  908.531646] ---[ end trace f8cc5b63e4c76d19 ]---
[  908.541190] EIP: __cpa_process_fault+0x205/0x226
[  908.550708] Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 <ff> 30 53 68 56 ba 89 c9 e8 f8 68 00 00 0f 0b 83 c4 0c be f2 ff ff
[  908.560958] EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
[  908.571156] ESI: 00000001 EDI: f5e6fd4c EBP: f5e6fcc4 ESP: f5e6fca0
[  908.581412] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213
[  908.591710] CR0: 80050033 CR2: 00000000 CR3: 00333000 CR4: 000006d0
Comment 3 Andrew Morton 2020-03-02 23:03:31 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon, 02 Mar 2020 21:55:10 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=206697
> 
> --- Comment #2 from Erhard F. (erhard_f@mailbox.org) ---
> Created attachment 287765 [details]
>   --> https://bugzilla.kernel.org/attachment.cgi?id=287765&action=edit
> dmesg (kernel 5.6-rc4, Shuttle XPC FS51, Pentium 4)
> 
> Same on kernel 5.6-rc4:

Thanks.  This looks like a regression in the DRM code.  I've added
suitable Ccs.


Comment 4 Erhard F. 2020-03-06 00:44:11 UTC
On Mon, 02 Mar 2020 23:03:31 +0000
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=206697
> 
> --- Comment #3 from Andrew Morton (akpm@linux-foundation.org) ---
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Mon, 02 Mar 2020 21:55:10 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=206697
> > 
> > --- Comment #2 from Erhard F. (erhard_f@mailbox.org) ---
> > Created attachment 287765 [details]  
> >   --> https://bugzilla.kernel.org/attachment.cgi?id=287765&action=edit  
> > dmesg (kernel 5.6-rc4, Shuttle XPC FS51, Pentium 4)
> > 
> > Same on kernel 5.6-rc4:  
> 
> Thanks.  This looks like a regression in the DRM code.  I've added
> suitable Ccs.

Been running the box via ssh without the ttm, drm, and radeon modules loaded (kernel 5.6-rc4) for two days now building stuff, which worked flawlessly. With ttm, drm, and radeon loaded I hit this bug within half an hour. So it really seems the drm code is causing this bug.
Comment 5 Erhard F. 2020-04-17 07:45:23 UTC
Looks mostly the same on kernel 5.7-rc1. The line after kthread+0xd1/0xd3 is different. It was "? try_to_free_pages+0x3d4/0x3d4" on 5.5.6 and 5.6-rc4, but is "? shrink_node+0x6f2/0x6f2" on 5.7-rc1 now.

[...]
Apr 17 00:28:40 BUG: kernel NULL pointer dereference, address: 00000000
Apr 17 00:28:40 #PF: supervisor read access in kernel mode
Apr 17 00:28:40 #PF: error_code(0x0000) - not-present page
Apr 17 00:28:40 *pde = 00000000 
Apr 17 00:28:40 Oops: 0000 [#1] SMP
Apr 17 00:28:40 CPU: 0 PID: 53 Comm: kswapd0 Not tainted 5.7.0-rc1-Pentium4 #1
Apr 17 00:28:40 Hardware name:  /FS51, BIOS 6.00 PG 12/02/2003
Apr 17 00:28:40 EIP: __cpa_process_fault+0x205/0x226
Apr 17 00:28:40 Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 <ff> 30 53 68 d5 c7 cb d8 e8 cb 68 00 00 0f 0b>
Apr 17 00:28:40 EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
Apr 17 00:28:40 ESI: 00000001 EDI: f5e5bd4c EBP: f5e5bcc4 ESP: f5e5bca0
Apr 17 00:28:40 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213
Apr 17 00:28:40 CR0: 80050033 CR2: 00000000 CR3: 05ab3000 CR4: 000006d0
Apr 17 00:28:40 Call Trace:
Apr 17 00:28:40  ? _raw_spin_lock+0x22/0x2a
Apr 17 00:28:40  ? lookup_address+0x1d/0x20
Apr 17 00:28:40  __change_page_attr_set_clr+0x85/0x551
Apr 17 00:28:40  ? __mutex_unlock_slowpath+0x20/0x1b6
Apr 17 00:28:40  ? mutex_unlock+0xb/0xd
Apr 17 00:28:40  ? _vm_unmap_aliases.part.0+0x11f/0x127
Apr 17 00:28:40  change_page_attr_set_clr+0xdc/0x1af
Apr 17 00:28:40  set_pages_array_wb+0x20/0x7b
Apr 17 00:28:40  ttm_pages_put+0x22/0x71 [ttm]
Apr 17 00:28:40  ttm_page_pool_free+0xa1/0x111 [ttm]
Apr 17 00:28:40  ttm_pool_shrink_scan+0x9c/0xd1 [ttm]
Apr 17 00:28:40  shrink_slab.constprop.0+0x248/0x38f
Apr 17 00:28:40  shrink_node+0x533/0x6f2
Apr 17 00:28:40  kswapd+0x4b6/0x628
Apr 17 00:28:40  ? kswapd+0x4b6/0x628
Apr 17 00:28:40  kthread+0xd1/0xd3
Apr 17 00:28:40  ? shrink_node+0x6f2/0x6f2
Apr 17 00:28:40  ? kthread_delayed_work_timer_fn+0x6a/0x6a
Apr 17 00:28:40  ret_from_fork+0x2e/0x38
Apr 17 00:28:40 Modules linked in: fuse auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc rt2500pci eeprom_93cx6 rt2x00pci rt2x00mmio rt2x00lib led_class mac80211 radeon hwmon i2c_algo_bit d>
Apr 17 00:28:40 CR2: 0000000000000000
Apr 17 00:28:40 ---[ end trace 49fbdfbb6e459a06 ]---
Apr 17 00:28:40 EIP: __cpa_process_fault+0x205/0x226
Apr 17 00:28:40 Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 <ff> 30 53 68 d5 c7 cb d8 e8 cb 68 00 00 0f 0b>
Apr 17 00:28:40 EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
Apr 17 00:28:40 ESI: 00000001 EDI: f5e5bd4c EBP: f5e5bcc4 ESP: f5e5bca0
Apr 17 00:28:40 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213
Apr 17 00:28:40 CR0: 80050033 CR2: 00000000 CR3: 05ab3000 CR4: 000006d0
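
A note on the comparison made in this comment: call-trace entries prefixed with "?" are addresses the kernel found on the stack but could not confirm as part of the call chain, so a difference in such a line does not necessarily mean the code path changed. A minimal sketch that keeps only the reliable frames, assuming the "symbol+0xoffset/0xsize" format seen in these logs (the helper name reliable_frames is made up for illustration):

```python
import re

# Matches "symbol+0xoff/0xsize", optionally preceded by the "? " marker.
FRAME_RE = re.compile(r"(\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(trace_lines):
    """Return the symbols of call-trace frames that are not marked with '?'."""
    frames = []
    for line in trace_lines:
        match = FRAME_RE.search(line)
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames


# Tail of the 5.6-rc4 trace and of the 5.7-rc1 trace from this report.
trace_56 = [
    "[  908.445492]  kswapd+0x4b9/0x62d",
    "[  908.453714]  ? kswapd+0x4b9/0x62d",
    "[  908.461937]  kthread+0xd1/0xd3",
    "[  908.470055]  ? try_to_free_pages+0x3d4/0x3d4",
    "[  908.486242]  ret_from_fork+0x2e/0x38",
]
trace_57 = [
    "Apr 17 00:28:40  kswapd+0x4b6/0x628",
    "Apr 17 00:28:40  ? kswapd+0x4b6/0x628",
    "Apr 17 00:28:40  kthread+0xd1/0xd3",
    "Apr 17 00:28:40  ? shrink_node+0x6f2/0x6f2",
    "Apr 17 00:28:40  ret_from_fork+0x2e/0x38",
]

print(reliable_frames(trace_56) == reliable_frames(trace_57))  # -> True
```

Compared this way, the reliable frames of the two traces name the same functions; only the "?" entries and a few offsets differ.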
Comment 6 Andrew Morton 2020-04-17 21:58:16 UTC
On Mon, 2 Mar 2020 15:03:29 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:

> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Mon, 02 Mar 2020 21:55:10 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=206697
> > 
> > --- Comment #2 from Erhard F. (erhard_f@mailbox.org) ---
> > Created attachment 287765 [details]
> >   --> https://bugzilla.kernel.org/attachment.cgi?id=287765&action=edit
> > dmesg (kernel 5.6-rc4, Shuttle XPC FS51, Pentium 4)
> > 
> > Same on kernel 5.6-rc4:
> 
> Thanks.  This looks like a regression in the DRM code.  I've added
> suitable Ccs.

Erhard, please let's handle this issue via email, not via the bugzilla
interface.

This does appear to be a DRM issue, and it has been reproduced in
5.7-rc1.

Comment 7 Alex Deucher 2020-04-22 20:20:24 UTC
Can you bisect?  If you are not actually using the GPU, it shouldn't really be allocating any memory other than what it allocates for the console framebuffer and any driver structures.  I think what may be happening is that you are hitting memory pressure and ttm is attempting to return memory and hits some failure.
Comment 8 Erhard F. 2020-04-23 00:24:57 UTC
On Wed, 22 Apr 2020 20:20:24 +0000
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=206697
> 
> Alex Deucher (alexdeucher@gmail.com) changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |alexdeucher@gmail.com
> 
> --- Comment #7 from Alex Deucher (alexdeucher@gmail.com) ---
> Can you bisect?  If you are not actually using the GPU, it shouldn't really
> be allocating any memory other than what it allocates for the console
> framebuffer and any driver structures.  I think what may be happening is that
> you are hitting memory pressure and ttm is attempting to return memory and
> hits some failure.

I will try to, but it will take a considerable amount of time.

It's an old i686 box of course, but it's new to me. The oldest kernel I ran this machine on is 5.5.6, the one for which I reported this bug. Older kernels won't boot its btrfs root partition (xxhash, zstd compressed). So I have to do another install on an ext4 partition, replicate the setup and find a suitable starting point for bisecting.
Comment 9 Erhard F. 2020-05-09 17:29:19 UTC
Created attachment 289027 [details]
bisect01.log

Finally I finished bisecting. The criterion for a 'good' bisect run was successfully building llvm 5 times in a row. On 'bad' bisect runs the bug showed up during the 1st llvm build most of the time, and during the 2nd llvm build at the latest.
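
The good/bad criterion described above could be scripted roughly as follows. This is only a sketch under assumptions: the helper names are made up, the build command is whatever the tester uses, and since the oops may hang the machine rather than fail the build, the final verdict for a given kernel may still have to be entered by hand.

```python
import subprocess

# Strings taken from the oops in this report.
OOPS_MARKERS = ("BUG: kernel NULL pointer dereference", "Oops:")


def oops_in_log(log_text):
    """Return True if the given kernel log text contains an oops marker."""
    return any(marker in log_text for marker in OOPS_MARKERS)


def build_survives(build_cmd, runs=5):
    """Run build_cmd up to `runs` times; return False at the first failure."""
    for _ in range(runs):
        if subprocess.run(build_cmd).returncode != 0:
            return False
    return True
```

A kernel would then be marked 'good' only if build_survives(...) returns True for the chosen build command and oops_in_log(...) is False for the kernel log collected afterwards.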

It turned out the offending commit is also the cause of this X.org bug:
https://gitlab.freedesktop.org/xorg/driver/xf86-video-ati/-/issues/191#note_489802

# git bisect good | tee -a ~/bisect01.log
33b3ad3788aba846fc8b9a065fe2685a0b64f713 is the first bad commit
commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Aug 15 09:27:00 2019 +0200

    drm/radeon: handle PCIe root ports with addressing limitations
    
    radeon uses a need_dma32 flag to indicate to the drm core that some
    allocations need to be done using GFP_DMA32, but it only checks the
    device addressing capabilities to make that decision.  Unfortunately
    PCIe root ports that have limited addressing exist as well.  Use the
    dma_addressing_limited instead to also take those into account.
    
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Reported-by: Atish Patra <Atish.Patra@wdc.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/radeon/radeon.h        |  1 -
 drivers/gpu/drm/radeon/radeon_device.c | 12 +++++-------
 drivers/gpu/drm/radeon/radeon_ttm.c    |  2 +-
 3 files changed, 6 insertions(+), 9 deletions(-)
Comment 10 Alex Deucher 2020-08-17 15:13:22 UTC
Created attachment 290953 [details]
revert

does this revert fix the issue?
Comment 11 Alex Deucher 2020-09-16 20:50:21 UTC
Does it work correctly with 5.9-rc1 or newer?
Comment 12 Alex Deucher 2020-09-16 22:06:25 UTC
Another thing to try, does setting radeon.agpmode=-1 fix the issue?
Comment 13 Erhard F. 2020-09-17 16:24:03 UTC
On Wed, 16 Sep 2020 20:50:21 +0000
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=206697
> 
> --- Comment #11 from Alex Deucher (alexdeucher@gmail.com) ---
> Does it work correctly with 5.9-rc1 or newer?

Yes, I had the machine building with 5.9-rc2 for 2 days without issues. Though in my 5.9-rc .config I have not set AGP nor AGP_SIS 'cause it is no longer needed.

I did not try building stuff with an affected kernel and radeon.agpmode=-1 yet. But I shall try and report back as soon as I have.
Comment 14 Erhard F. 2020-09-18 16:10:47 UTC
On Wed, 16 Sep 2020 22:06:25 +0000
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=206697
> 
> --- Comment #12 from Alex Deucher (alexdeucher@gmail.com) ---
> Another thing to try, does setting radeon.agpmode=-1 fix the issue?

Yes, booting an affected kernel like 5.8.1 with radeon.agpmode=-1 also seems to fix the issue. Successfully built llvm 10 times in a row, which would normally crash the machine on the 2nd or 3rd build at the latest.
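
For anyone repeating this test: radeon.agpmode=-1 is passed on the kernel command line and makes the radeon driver treat an AGP card as PCI. A small sketch for confirming that the option was actually passed at boot, assuming the usual "module.parameter=value" syntax; the helper name and the sample command line are made up for illustration.

```python
def module_params(cmdline, module):
    """Collect 'module.name=value' options from a kernel command line string."""
    prefix = module + "."
    params = {}
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            name, value = token[len(prefix):].split("=", 1)
            params[name] = value
    return params


# Hypothetical command line; on a running system read /proc/cmdline instead.
sample = "root=/dev/sda2 ro radeon.agpmode=-1"
print(module_params(sample, "radeon"))  # -> {'agpmode': '-1'}
```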
Comment 15 taz.007 2020-12-18 20:22:24 UTC
I'm hitting the same bug with kernel 5.9.14, same stack as the original comment. However I'm using the nouveau driver (so not radeon).

Hardware name: Acer Aspire R3610/FMCP7A-ION-LE, BIOS P01-A4 11/03/2009

Modules linked in: dm_mod rpcsec_gss_krb5 md4 cmac nls_utf8 cifs libdes dns_resolver fscache fuse hwmon_vid nouveau snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec ath5k mxm_wmi snd_hda_core ttm ath mac80211 wmi_bmof drm_kms_helper snd_hwdep coretemp cec snd_pcm cfg80211 input_leds rc_core syscopyarea mousedev snd_timer sysfillrect sysimgblt snd rfkill fb_sys_fops pcspkr i2c_algo_bit libarc4 forcedeth soundcore nv_tco i2c_nforce2 wmi evdev tcp_bbr nfsd drm auth_rpcgss nfs_acl sg lockd grace sunrpc nfs_ssc agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid uas usb_storage ohci_pci ehci_pci ohci_hcd ehci_hcd
