Bug 204407

Summary: Bad page state in process Xorg
Product: Memory Management Reporter: Petr Vandrovec (petr)
Component: Page AllocatorAssignee: Andrew Morton (akpm)
Status: NEW ---    
Severity: normal    
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.3.0-rc2-00013 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg
attachment-24570-0.html
0001-mm-page_owner-record-page-owner-for-each-subpage.patch
0002-mm-page_owner-keep-owner-info-when-freeing-the-page.patch
0003-mm-page_owner-debug_pagealloc-save-freeing-stack-tra.patch

Description Petr Vandrovec 2019-08-01 22:34:16 UTC
Created attachment 284081 [details]
dmesg

I've upgraded from 5.3-rc1 to 5.3-rc2, and when I started X server, system became unhappy:

[259701.387365] BUG: Bad page state in process Xorg  pfn:2a300
[259701.393593] page:ffffea0000a8c000 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
[259701.402832] flags: 0x2000000000000000()
[259701.407426] raw: 2000000000000000 ffffffff822ab778 ffffea0000a8f208 0000000000000000
[259701.415900] raw: 0000000000000000 0000000000000003 00000000ffffff7f 0000000000000000
[259701.424373] page dumped because: nonzero mapcount
[259701.429847] Modules linked in: af_packet xt_REDIRECT nft_compat x_tables nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv4 nf_tables nfnetlink ppdev parport fuse autofs4 binfmt_misc uinput twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic cmac xcbc rmd160 af_key xfrm_algo rpcsec_gss_krb5 nfsv4 nls_iso8859_2 cifs libarc4 nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipv6 crc_ccitt nf_defrag_ipv6 snd_hda_codec_hdmi pktcdvd coretemp hwmon intel_rapl_common iosf_mbi x86_pkg_temp_thermal snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul snd_hwdep crc32c_intel snd_hda_core ghash_clmulni_intel snd_pcm_oss uas iTCO_wdt aesni_intel e1000e
[259701.429873]  snd_mixer_oss iTCO_vendor_support aes_x86_64 snd_pcm crypto_simd ptp mei_me dcdbas lpc_ich sr_mod cryptd usb_storage snd_timer glue_helper mfd_core input_leds pps_core tpm_tis cdrom i2c_i801 snd mei tpm_tis_core sg tpm [last unloaded: parport_pc]
[259701.539387] CPU: 10 PID: 4860 Comm: Xorg Tainted: G                T 5.3.0-rc2-64-00013-g03f05a670a3d #69
[259701.549382] Hardware name: Dell Inc. Precision T3610/09M8Y8, BIOS A16 02/05/2018
[259701.549382] Call Trace:
[259701.549382]  dump_stack+0x46/0x60
[259701.549382]  bad_page.cold.28+0x81/0xb4
[259701.549382]  __free_pages_ok+0x236/0x240
[259701.549382]  __ttm_dma_free_page+0x2f/0x40
[259701.549382]  ttm_dma_unpopulate+0x29b/0x370
[259701.549382]  ttm_tt_destroy.part.6+0x44/0x50
[259701.549382]  ttm_bo_cleanup_memtype_use+0x29/0x70
[259701.549382]  ttm_bo_put+0x225/0x280
[259701.549382]  ttm_bo_vm_close+0x10/0x20
[259701.549382]  remove_vma+0x20/0x40
[259701.549382]  __do_munmap+0x2da/0x420
[259701.549382]  __vm_munmap+0x66/0xc0
[259701.549382]  __x64_sys_munmap+0x22/0x30
[259701.549382]  do_syscall_64+0x5e/0x1a0
[259701.549382]  ? prepare_exit_to_usermode+0x75/0xa0
[259701.549382]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[259701.549382] RIP: 0033:0x7f504d0ec1d7
[259701.549382] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 b1 6c 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 6c 0c 00 f7 d8 64 89 01 48
[259701.549382] RSP: 002b:00007ffe529db138 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
[259701.549382] RAX: ffffffffffffffda RBX: 0000564a5eabce70 RCX: 00007f504d0ec1d7
[259701.549382] RDX: 00007ffe529db140 RSI: 0000000000400000 RDI: 00007f5044b65000
[259701.549382] RBP: 0000564a5eafe460 R08: 000000000000000b R09: 000000010283e000
[259701.549382] R10: 0000000000000001 R11: 0000000000000206 R12: 0000564a5e475b08
[259701.549382] R13: 0000564a5e475c80 R14: 00007ffe529db190 R15: 0000000000000c80
[259701.707238] Disabling lock debugging due to kernel taint


Also - maybe related, maybe not - I've got three userspace crashes earlier on this kernel (but never before):

[77154.886836] iscons.py[12441]: segfault at 2c ip 00000000080cf0b5 sp 00000000f773fb60 error 4 in python[8048000+11a000]
[77154.898376] Code: 02 0f 84 4a 2e 00 00 8b 4d 08 8b bd 04 ff ff ff 8b 59 38 8b 57 20 8b 7b 10 85 ff 0f 84 ee 22 00 00 8b 8d 04 ff ff ff 8b 59 08 <8b> 43 2c 85 c0 0f 84 3c e3 ff ff 8b 51 34 8b 71 38 8b 79 3c ff 00
[119529.983163] in.telnetd[616]: segfault at 0 ip 0000555fdfa09a05 sp 00007ffd8fc05380 error 4 in in.telnetd[555fdfa06000+b000]
[119529.995783] Code: 8d 3d c2 76 00 00 e8 9a 2b 00 00 e9 25 fe ff ff be f2 00 00 00 48 8d 3d ac 76 00 00 e8 84 2b 00 00 eb 96 48 8b 05 63 ee 00 00 <0f> b6 00 e9 fa fe ff ff 44 89 ee 48 8d 3d 8c 76 00 00 e8 64 2b 00
[120884.183003] iscons.py[10779]: segfault at 2c ip 00000000080d0dc2 sp 00000000f7702b60 error 4 in python[8048000+11a000]
[120884.195182] Code: 4c 24 08 89 44 24 04 89 3c 24 e8 d9 0b 01 00 8b 55 f0 8b 7d e8 8b 4d ec 8b 85 04 ff ff ff 89 55 84 89 7d 8c 89 4d 88 8b 50 08 <8b> 7a 2c 85 ff 0f 84 3c 10 00 00 8b bd 04 ff ff ff 89 f8 8b 57 34

I've investigated in.telnetd in detail as I was worried if there is some 0-day being used on my system - and as far as I could tell, problem is that part of the .bss turned to zeroes after process did fork/exec - there was NULL in the variable that cannot have NULL as variable is set to non-NULL value during in.telnetd initialization.

iscons crashes are from NULL pointer dereference too, and iscons does lot of fork/exec as well.
Comment 1 Andrew Morton 2019-08-02 20:23:14 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 01 Aug 2019 22:34:16 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=204407
> 
>             Bug ID: 204407
>            Summary: Bad page state in process Xorg
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 5.3.0-rc2-00013
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@linux-foundation.org
>           Reporter: petr@vandrovec.name
>         Regression: No
> 
> Created attachment 284081 [details]
>   --> https://bugzilla.kernel.org/attachment.cgi?id=284081&action=edit
> dmesg
> 
> I've upgraded from 5.3-rc1 to 5.3-rc2, and when I started X server, system
> became unhappy:
> 
> [259701.387365] BUG: Bad page state in process Xorg  pfn:2a300
> [259701.393593] page:ffffea0000a8c000 refcount:0 mapcount:-128
> mapping:0000000000000000 index:0x0
> [259701.402832] flags: 0x2000000000000000()
> [259701.407426] raw: 2000000000000000 ffffffff822ab778 ffffea0000a8f208
> 0000000000000000
> [259701.415900] raw: 0000000000000000 0000000000000003 00000000ffffff7f
> 0000000000000000
> [259701.424373] page dumped because: nonzero mapcount
> [259701.429847] Modules linked in: af_packet xt_REDIRECT nft_compat x_tables
> nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv4 nf_tables
> nfnetlink ppdev parport fuse autofs4 binfmt_misc uinput twofish_generic
> twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common
> camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 serpent_avx_x86_64
> serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64
> blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic cmac
> xcbc rmd160 af_key xfrm_algo rpcsec_gss_krb5 nfsv4 nls_iso8859_2 cifs libarc4
> nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipv6 crc_ccitt
> nf_defrag_ipv6 snd_hda_codec_hdmi pktcdvd coretemp hwmon intel_rapl_common
> iosf_mbi x86_pkg_temp_thermal snd_hda_codec_realtek snd_hda_codec_generic
> ledtrig_audio snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul
> snd_hwdep crc32c_intel snd_hda_core ghash_clmulni_intel snd_pcm_oss uas
> iTCO_wdt aesni_intel e1000e
> [259701.429873]  snd_mixer_oss iTCO_vendor_support aes_x86_64 snd_pcm
> crypto_simd ptp mei_me dcdbas lpc_ich sr_mod cryptd usb_storage snd_timer
> glue_helper mfd_core input_leds pps_core tpm_tis cdrom i2c_i801 snd mei
> tpm_tis_core sg tpm [last unloaded: parport_pc]
> [259701.539387] CPU: 10 PID: 4860 Comm: Xorg Tainted: G                T
> 5.3.0-rc2-64-00013-g03f05a670a3d #69
> [259701.549382] Hardware name: Dell Inc. Precision T3610/09M8Y8, BIOS A16
> 02/05/2018
> [259701.549382] Call Trace:
> [259701.549382]  dump_stack+0x46/0x60
> [259701.549382]  bad_page.cold.28+0x81/0xb4
> [259701.549382]  __free_pages_ok+0x236/0x240
> [259701.549382]  __ttm_dma_free_page+0x2f/0x40
> [259701.549382]  ttm_dma_unpopulate+0x29b/0x370
> [259701.549382]  ttm_tt_destroy.part.6+0x44/0x50
> [259701.549382]  ttm_bo_cleanup_memtype_use+0x29/0x70
> [259701.549382]  ttm_bo_put+0x225/0x280
> [259701.549382]  ttm_bo_vm_close+0x10/0x20
> [259701.549382]  remove_vma+0x20/0x40
> [259701.549382]  __do_munmap+0x2da/0x420
> [259701.549382]  __vm_munmap+0x66/0xc0
> [259701.549382]  __x64_sys_munmap+0x22/0x30
> [259701.549382]  do_syscall_64+0x5e/0x1a0
> [259701.549382]  ? prepare_exit_to_usermode+0x75/0xa0
> [259701.549382]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [259701.549382] RIP: 0033:0x7f504d0ec1d7
> [259701.549382] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 b1 6c 0c 00
> f7
> d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48> 3d
> 01
> f0 ff ff 73 01 c3 48 8b 0d 89 6c 0c 00 f7 d8 64 89 01 48
> [259701.549382] RSP: 002b:00007ffe529db138 EFLAGS: 00000206 ORIG_RAX:
> 000000000000000b
> [259701.549382] RAX: ffffffffffffffda RBX: 0000564a5eabce70 RCX:
> 00007f504d0ec1d7
> [259701.549382] RDX: 00007ffe529db140 RSI: 0000000000400000 RDI:
> 00007f5044b65000
> [259701.549382] RBP: 0000564a5eafe460 R08: 000000000000000b R09:
> 000000010283e000
> [259701.549382] R10: 0000000000000001 R11: 0000000000000206 R12:
> 0000564a5e475b08
> [259701.549382] R13: 0000564a5e475c80 R14: 00007ffe529db190 R15:
> 0000000000000c80
> [259701.707238] Disabling lock debugging due to kernel taint

I assume the above is misbehaviour in the DRM code?

> 
> Also - maybe related, maybe not - I've got three userspace crashes earlier on
> this kernel (but never before):
> 
> [77154.886836] iscons.py[12441]: segfault at 2c ip 00000000080cf0b5 sp
> 00000000f773fb60 error 4 in python[8048000+11a000]
> [77154.898376] Code: 02 0f 84 4a 2e 00 00 8b 4d 08 8b bd 04 ff ff ff 8b 59 38
> 8b 57 20 8b 7b 10 85 ff 0f 84 ee 22 00 00 8b 8d 04 ff ff ff 8b 59 08 <8b> 43
> 2c
> 85 c0 0f 84 3c e3 ff ff 8b 51 34 8b 71 38 8b 79 3c ff 00
> [119529.983163] in.telnetd[616]: segfault at 0 ip 0000555fdfa09a05 sp
> 00007ffd8fc05380 error 4 in in.telnetd[555fdfa06000+b000]
> [119529.995783] Code: 8d 3d c2 76 00 00 e8 9a 2b 00 00 e9 25 fe ff ff be f2
> 00
> 00 00 48 8d 3d ac 76 00 00 e8 84 2b 00 00 eb 96 48 8b 05 63 ee 00 00 <0f> b6
> 00
> e9 fa fe ff ff 44 89 ee 48 8d 3d 8c 76 00 00 e8 64 2b 00
> [120884.183003] iscons.py[10779]: segfault at 2c ip 00000000080d0dc2 sp
> 00000000f7702b60 error 4 in python[8048000+11a000]
> [120884.195182] Code: 4c 24 08 89 44 24 04 89 3c 24 e8 d9 0b 01 00 8b 55 f0
> 8b
> 7d e8 8b 4d ec 8b 85 04 ff ff ff 89 55 84 89 7d 8c 89 4d 88 8b 50 08 <8b> 7a
> 2c
> 85 ff 0f 84 3c 10 00 00 8b bd 04 ff ff ff 89 f8 8b 57 34
> 
> I've investigated in.telnetd in detail as I was worried if there is some
> 0-day
> being used on my system - and as far as I could tell, problem is that part of
> the .bss turned to zeroes after process did fork/exec - there was NULL in the
> variable that cannot have NULL as variable is set to non-NULL value during
> in.telnetd initialization.
> 
> iscons crashes are from NULL pointer dereference too, and iscons does lot of
> fork/exec as well.

hm, that does sound unrelated.
Comment 2 willy 2019-08-02 20:33:48 UTC
On Fri, Aug 02, 2019 at 01:23:06PM -0700, Andrew Morton wrote:
> > [259701.387365] BUG: Bad page state in process Xorg  pfn:2a300
> > [259701.393593] page:ffffea0000a8c000 refcount:0 mapcount:-128
> > mapping:0000000000000000 index:0x0

mapcount -128 is PAGE_MAPCOUNT_RESERVE, aka PageBuddy.  I think somebody
called put_page() once more than they should have.  The one before this
caused it to be freed to the page allocator, which set PageBuddy.  Then
this one happened and we got a complaint.

> > [259701.402832] flags: 0x2000000000000000()
> > [259701.407426] raw: 2000000000000000 ffffffff822ab778 ffffea0000a8f208
> > 0000000000000000
> > [259701.415900] raw: 0000000000000000 0000000000000003 00000000ffffff7f
> > 0000000000000000
> > [259701.424373] page dumped because: nonzero mapcount

It occurs to me that when a page is freed, we could record some useful bits
of information in the page from the stack trace to help debug double-free 
situations.  Even just stashing __builtin_return_address in page->mapping
would be helpful, I think.

> > [259701.549382] Call Trace:
> > [259701.549382]  dump_stack+0x46/0x60
> > [259701.549382]  bad_page.cold.28+0x81/0xb4
> > [259701.549382]  __free_pages_ok+0x236/0x240
> > [259701.549382]  __ttm_dma_free_page+0x2f/0x40
> > [259701.549382]  ttm_dma_unpopulate+0x29b/0x370
> > [259701.549382]  ttm_tt_destroy.part.6+0x44/0x50
> > [259701.549382]  ttm_bo_cleanup_memtype_use+0x29/0x70
> > [259701.549382]  ttm_bo_put+0x225/0x280
> > [259701.549382]  ttm_bo_vm_close+0x10/0x20
> > [259701.549382]  remove_vma+0x20/0x40
> > [259701.549382]  __do_munmap+0x2da/0x420
> > [259701.549382]  __vm_munmap+0x66/0xc0
> > [259701.549382]  __x64_sys_munmap+0x22/0x30
> > [259701.549382]  do_syscall_64+0x5e/0x1a0
> > [259701.549382]  ? prepare_exit_to_usermode+0x75/0xa0
> > [259701.549382]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [259701.549382] RIP: 0033:0x7f504d0ec1d7
> > [259701.549382] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 b1 6c 0c 00
> f7
> > d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48>
> 3d 01
> > f0 ff ff 73 01 c3 48 8b 0d 89 6c 0c 00 f7 d8 64 89 01 48
> > [259701.549382] RSP: 002b:00007ffe529db138 EFLAGS: 00000206 ORIG_RAX:
> > 000000000000000b
> > [259701.549382] RAX: ffffffffffffffda RBX: 0000564a5eabce70 RCX:
> > 00007f504d0ec1d7
> > [259701.549382] RDX: 00007ffe529db140 RSI: 0000000000400000 RDI:
> > 00007f5044b65000
> > [259701.549382] RBP: 0000564a5eafe460 R08: 000000000000000b R09:
> > 000000010283e000
> > [259701.549382] R10: 0000000000000001 R11: 0000000000000206 R12:
> > 0000564a5e475b08
> > [259701.549382] R13: 0000564a5e475c80 R14: 00007ffe529db190 R15:
> > 0000000000000c80
> > [259701.707238] Disabling lock debugging due to kernel taint
> 
> I assume the above is misbehaviour in the DRM code?
Comment 3 cai 2019-08-02 21:17:36 UTC
On Fri, 2019-08-02 at 13:33 -0700, Matthew Wilcox wrote:
> On Fri, Aug 02, 2019 at 01:23:06PM -0700, Andrew Morton wrote:
> > > [259701.387365] BUG: Bad page state in process Xorg  pfn:2a300
> > > [259701.393593] page:ffffea0000a8c000 refcount:0 mapcount:-128
> > > mapping:0000000000000000 index:0x0
> 
> mapcount -128 is PAGE_MAPCOUNT_RESERVE, aka PageBuddy.  I think somebody
> called put_page() once more than they should have.  The one before this
> caused it to be freed to the page allocator, which set PageBuddy.  Then
> this one happened and we got a complaint.
> 
> > > [259701.402832] flags: 0x2000000000000000()
> > > [259701.407426] raw: 2000000000000000 ffffffff822ab778 ffffea0000a8f208
> > > 0000000000000000
> > > [259701.415900] raw: 0000000000000000 0000000000000003 00000000ffffff7f
> > > 0000000000000000
> > > [259701.424373] page dumped because: nonzero mapcount
> 
> It occurs to me that when a page is freed, we could record some useful bits
> of information in the page from the stack trace to help debug double-free 
> situations.  Even just stashing __builtin_return_address in page->mapping
> would be helpful, I think.

Sounds like need to enable "page_owner", so it will do  __dump_page_owner().

> 
> > > [259701.549382] Call Trace:
> > > [259701.549382]  dump_stack+0x46/0x60
> > > [259701.549382]  bad_page.cold.28+0x81/0xb4
> > > [259701.549382]  __free_pages_ok+0x236/0x240
> > > [259701.549382]  __ttm_dma_free_page+0x2f/0x40
> > > [259701.549382]  ttm_dma_unpopulate+0x29b/0x370
> > > [259701.549382]  ttm_tt_destroy.part.6+0x44/0x50
> > > [259701.549382]  ttm_bo_cleanup_memtype_use+0x29/0x70
> > > [259701.549382]  ttm_bo_put+0x225/0x280
> > > [259701.549382]  ttm_bo_vm_close+0x10/0x20
> > > [259701.549382]  remove_vma+0x20/0x40
> > > [259701.549382]  __do_munmap+0x2da/0x420
> > > [259701.549382]  __vm_munmap+0x66/0xc0
> > > [259701.549382]  __x64_sys_munmap+0x22/0x30
> > > [259701.549382]  do_syscall_64+0x5e/0x1a0
> > > [259701.549382]  ? prepare_exit_to_usermode+0x75/0xa0
> > > [259701.549382]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > [259701.549382] RIP: 0033:0x7f504d0ec1d7
> > > [259701.549382] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 b1 6c 0c
> > > 00 f7
> > > d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48>
> > > 3d 01
> > > f0 ff ff 73 01 c3 48 8b 0d 89 6c 0c 00 f7 d8 64 89 01 48
> > > [259701.549382] RSP: 002b:00007ffe529db138 EFLAGS: 00000206 ORIG_RAX:
> > > 000000000000000b
> > > [259701.549382] RAX: ffffffffffffffda RBX: 0000564a5eabce70 RCX:
> > > 00007f504d0ec1d7
> > > [259701.549382] RDX: 00007ffe529db140 RSI: 0000000000400000 RDI:
> > > 00007f5044b65000
> > > [259701.549382] RBP: 0000564a5eafe460 R08: 000000000000000b R09:
> > > 000000010283e000
> > > [259701.549382] R10: 0000000000000001 R11: 0000000000000206 R12:
> > > 0000564a5e475b08
> > > [259701.549382] R13: 0000564a5e475c80 R14: 00007ffe529db190 R15:
> > > 0000000000000c80
> > > [259701.707238] Disabling lock debugging due to kernel taint
> > 
> > I assume the above is misbehaviour in the DRM code?
> 
>
Comment 4 Petr Vandrovec 2019-08-02 21:25:21 UTC
Andrew Morton wrote on 8/2/2019 1:23 PM:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Thu, 01 Aug 2019 22:34:16 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> 
>> [259701.549382] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 b1 6c 0c 00
>> f7
>> d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48> 3d
>> 01
>> f0 ff ff 73 01 c3 48 8b 0d 89 6c 0c 00 f7 d8 64 89 01 48
>> [259701.549382] RSP: 002b:00007ffe529db138 EFLAGS: 00000206 ORIG_RAX:
>> 000000000000000b
>> [259701.549382] RAX: ffffffffffffffda RBX: 0000564a5eabce70 RCX:
>> 00007f504d0ec1d7
>> [259701.549382] RDX: 00007ffe529db140 RSI: 0000000000400000 RDI:
>> 00007f5044b65000
>> [259701.549382] RBP: 0000564a5eafe460 R08: 000000000000000b R09:
>> 000000010283e000
>> [259701.549382] R10: 0000000000000001 R11: 0000000000000206 R12:
>> 0000564a5e475b08
>> [259701.549382] R13: 0000564a5e475c80 R14: 00007ffe529db190 R15:
>> 0000000000000c80
>> [259701.707238] Disabling lock debugging due to kernel taint
> 
> I assume the above is misbehaviour in the DRM code?

Most probably.

When I switched back to 5.2, crashes were gone, but log was filled with 
11 millions of messages complaining about alloc_contig_range problems:

[22042.108043] alloc_contig_range: [106f118, 106f119) PFNs busy
[22042.114400] alloc_contig_range: [106f11c, 106f11d) PFNs busy
[22042.120787] alloc_contig_range: [106f11d, 106f11e) PFNs busy
[22047.093057] alloc_contig_range: 47963 callbacks suppressed
[22047.093058] alloc_contig_range: [106f117, 106f118) PFNs busy
[22047.105576] alloc_contig_range: [106f118, 106f119) PFNs busy
[22047.111937] alloc_contig_range: [106f11c, 106f11d) PFNs busy
[22047.118329] alloc_contig_range: [106f11d, 106f11e) PFNs busy

In total there is 9735 messages logged individually, and 11 million 
suppressed:

petr-dev3:~$ dmesg | grep alloc_contig.*PFNs | wc -l
9735
petr-dev3:~$ expr `echo \`dmesg | grep alloc_contig.*callbacks | cut -d' 
' -f3\` | sed -e 's/ / + /'g`
11333722

So it could be my problems are caused by new Xorg driver.

After I disabled CMA system is stable on 5.2.  I did not try 5.3-rc2 yet.

Petr
Comment 5 willy 2019-08-02 22:59:44 UTC
On Fri, Aug 02, 2019 at 05:17:30PM -0400, Qian Cai wrote:
> On Fri, 2019-08-02 at 13:33 -0700, Matthew Wilcox wrote:
> > It occurs to me that when a page is freed, we could record some useful bits
> > of information in the page from the stack trace to help debug double-free 
> > situations.  Even just stashing __builtin_return_address in page->mapping
> > would be helpful, I think.
> 
> Sounds like need to enable "page_owner", so it will do  __dump_page_owner().

That doesn't help because we call reset_page_owner() in the free page path.

We could turn on tracing because we call trace_mm_page_free() in this
path.  That requires the reporter to be able to reproduce the problem,
and it's not clear to me whether this is a "happened once" or "every
time I do this, it happens" problem.
Comment 6 Petr Vandrovec 2019-08-02 23:29:35 UTC
Created attachment 284103 [details]
attachment-24570-0.html

On Fri, Aug 2, 2019, 3:59 PM Matthew Wilcox <willy@infradead.org> wrote:

> That doesn't help because we call reset_page_owner() in the free page path.
>
> We could turn on tracing because we call trace_mm_page_free() in this
> path.  That requires the reporter to be able to reproduce the problem,
> and it's not clear to me whether this is a "happened once" or "every
> time I do this, it happens" problem.
>

It happened on 3 of the boots with that kernel.  4th time box either
spontaneously rebooted when X started, or watchdog restarted box shortly
after starting X server.

So I believe I should be able to reproduce it with additional patches or
extra flags enabled.

Petr

>
Comment 7 Vlastimil Babka 2019-08-15 14:43:49 UTC
Created attachment 284435 [details]
0001-mm-page_owner-record-page-owner-for-each-subpage.patch

On 8/3/19 1:29 AM, Petr Vandrovec wrote:
> On Fri, Aug 2, 2019, 3:59 PM Matthew Wilcox <willy@infradead.org
> <mailto:willy@infradead.org>> wrote:
> 
>     That doesn't help because we call reset_page_owner() in the free
>     page path.
> 
>     We could turn on tracing because we call trace_mm_page_free() in this
>     path.  That requires the reporter to be able to reproduce the problem,
>     and it's not clear to me whether this is a "happened once" or "every
>     time I do this, it happens" problem.
> 
> 
> It happened on 3 of the boots with that kernel.  4th time box either
> spontaneously rebooted when X started, or watchdog restarted box shortly
> after starting X server.
> 
> So I believe I should be able to reproduce it with additional patches or
> extra flags enabled.

Does the issue still happen with rc4? Could you apply the 3 attached
patches (work in progress), configure-enable CONFIG_DEBUG_PAGEALLOC and
CONFIG_PAGE_OWNER and boot kernel with debug_pagealloc=on page_owner=on
parameters? That should print stacktraces of allocation and first
freeing (assuming this is a double free).

Vlastimil
Comment 8 Vlastimil Babka 2019-08-15 14:43:50 UTC
Created attachment 284437 [details]
0002-mm-page_owner-keep-owner-info-when-freeing-the-page.patch
Comment 9 Vlastimil Babka 2019-08-15 14:43:50 UTC
Created attachment 284439 [details]
0003-mm-page_owner-debug_pagealloc-save-freeing-stack-tra.patch
Comment 10 Petr Vandrovec 2019-08-15 19:13:59 UTC
Vlastimil Babka wrote on 8/15/2019 7:32 AM:
>
> Does the issue still happen with rc4? Could you apply the 3 attached
> patches (work in progress), configure-enable CONFIG_DEBUG_PAGEALLOC and
> CONFIG_PAGE_OWNER and boot kernel with debug_pagealloc=on page_owner=on
> parameters? That should print stacktraces of allocation and first
> freeing (assuming this is a double free).

Unfortunately -rc4 does not find any my SATA disks due to some 
misunderstanding between AHCI driver and HPT642L adapter (there is no 
device 07:00.1, HPT is single-function device at 07:00.0):

[   18.003015] scsi host6: ahci
[   18.006605] DMAR: DRHD: handling fault status reg 2
[   18.006619] DMAR: [DMA Write] Request device [07:00.1] fault addr 
fffe0000 [fault reason 02] Present bit in context entry is clear
[   18.076616] DMAR: DRHD: handling fault status reg 102
[   18.085910] DMAR: [DMA Write] Request device [07:00.1] fault addr 
fffa0000 [fault reason 02] Present bit in context entry is clear
[   18.100989] DMAR: DRHD: handling fault status reg 202
[   18.110985] DMAR: [DMA Write] Request device [07:00.1] fault addr 
fffe0000 [fault reason 02] Present bit in context entry is clear

With iommu=off disks are visible, but USB keyboard (and other USB 
devices) does not work:

[   18.174802] ehci-pci 0000:00:1a.0: swiotlb buffer is full (sz: 8 
bytes), total 0 (slots), used 0 (slots)
[   18.174804] ehci-pci 0000:00:1a.0: overflow 0x0000000ffdc75ae8+8 of 
DMA mask ffffffff bus mask 0
[   18.174815] WARNING: CPU: 2 PID: 508 at kernel/dma/direct.c:35 
report_addr+0x2e/0x50
[   18.174816] Modules linked in:
[   18.174818] CPU: 2 PID: 508 Comm: kworker/2:1 Tainted: G 
   T 5.3.0-rc4-64-00058-gd717b092e0b2 #77
[   18.174819] Hardware name: Dell Inc. Precision T3610/09M8Y8, BIOS A16 
02/05/2018
[   18.174822] Workqueue: usb_hub_wq hub_event

I'll try to find -rc4 configuration that has enabled debugging and can boot. 


Petr
Comment 11 Vlastimil Babka 2019-08-16 12:48:00 UTC
On 8/15/19 9:13 PM, Petr Vandrovec wrote:
> Vlastimil Babka wrote on 8/15/2019 7:32 AM:
>>
>> Does the issue still happen with rc4? Could you apply the 3 attached
>> patches (work in progress), configure-enable CONFIG_DEBUG_PAGEALLOC and
>> CONFIG_PAGE_OWNER and boot kernel with debug_pagealloc=on page_owner=on
>> parameters? That should print stacktraces of allocation and first
>> freeing (assuming this is a double free).
> 
> Unfortunately -rc4 does not find any my SATA disks due to some 
> misunderstanding between AHCI driver and HPT642L adapter (there is no 
> device 07:00.1, HPT is single-function device at 07:00.0):
> 
> [   18.003015] scsi host6: ahci
> [   18.006605] DMAR: DRHD: handling fault status reg 2
> [   18.006619] DMAR: [DMA Write] Request device [07:00.1] fault addr 
> fffe0000 [fault reason 02] Present bit in context entry is clear
> [   18.076616] DMAR: DRHD: handling fault status reg 102
> [   18.085910] DMAR: [DMA Write] Request device [07:00.1] fault addr 
> fffa0000 [fault reason 02] Present bit in context entry is clear
> [   18.100989] DMAR: DRHD: handling fault status reg 202
> [   18.110985] DMAR: [DMA Write] Request device [07:00.1] fault addr 
> fffe0000 [fault reason 02] Present bit in context entry is clear

Worth reporting as well, not nice regression.

> With iommu=off disks are visible, but USB keyboard (and other USB 
> devices) does not work:

I've been told iommu=soft might help.

> [   18.174802] ehci-pci 0000:00:1a.0: swiotlb buffer is full (sz: 8 
> bytes), total 0 (slots), used 0 (slots)
> [   18.174804] ehci-pci 0000:00:1a.0: overflow 0x0000000ffdc75ae8+8 of 
> DMA mask ffffffff bus mask 0
> [   18.174815] WARNING: CPU: 2 PID: 508 at kernel/dma/direct.c:35 
> report_addr+0x2e/0x50
> [   18.174816] Modules linked in:
> [   18.174818] CPU: 2 PID: 508 Comm: kworker/2:1 Tainted: G 
>    T 5.3.0-rc4-64-00058-gd717b092e0b2 #77
> [   18.174819] Hardware name: Dell Inc. Precision T3610/09M8Y8, BIOS A16 
> 02/05/2018
> [   18.174822] Workqueue: usb_hub_wq hub_event
> 
> I'll try to find -rc4 configuration that has enabled debugging and can boot. 
> 
> 
> Petr
> 
>
Comment 12 jroedel 2019-08-16 13:03:50 UTC
On Fri, Aug 16, 2019 at 02:47:53PM +0200, Vlastimil Babka wrote:
> On 8/15/19 9:13 PM, Petr Vandrovec wrote:
> > [   18.110985] DMAR: [DMA Write] Request device [07:00.1] fault addr 
> > fffe0000 [fault reason 02] Present bit in context entry is clear
> 
> Worth reporting as well, not nice regression.

Is that a regression between 5.3-rc3 and 5.3-rc4 or is it already broken
since -rc1? The 5.3-rc5 kernel will contains some VT-d fixes that are
worth a try here too. If you can test latest linus/master branch that
would be great, otherwise -rc5 is fine too.


Regards,

	Joerg
Comment 13 Petr Vandrovec 2019-08-17 00:20:52 UTC
Vlastimil Babka wrote on 8/16/2019 5:47 AM:
> On 8/15/19 9:13 PM, Petr Vandrovec wrote:
>> With iommu=off disks are visible, but USB keyboard (and other USB
>> devices) does not work:
> 
> I've been told iommu=soft might help.

Thanks.  I've rebuilt kernel without IOMMU.

Unfortunately I was not able to reproduce original problem with latest 
kernel - neither with CMA nor without CMA.  System seems stable as a rock.

This is the change on which I've tried to reproduce it with your 
debugging patches:

commit 41de59634046b19cd53a1983594a95135c656997 (HEAD -> master, 
origin/master, origin/HEAD)
Merge: e22a97a2a85d 1ee1119d184b
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Aug 14 15:29:53 2019 -0700

     Merge tag 'Wimplicit-fallthrough-5.3-rc5' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

						Petr
Comment 14 Vlastimil Babka 2019-08-19 14:44:21 UTC
On 8/17/19 2:20 AM, Petr Vandrovec wrote:
> Vlastimil Babka wrote on 8/16/2019 5:47 AM:
>> On 8/15/19 9:13 PM, Petr Vandrovec wrote:
>>> With iommu=off disks are visible, but USB keyboard (and other USB
>>> devices) does not work:
>>
>> I've been told iommu=soft might help.
> 
> Thanks.  I've rebuilt kernel without IOMMU.
> 
> Unfortunately I was not able to reproduce original problem with latest 
> kernel - neither with CMA nor without CMA.  System seems stable as a rock.

That's rather fortunate I would say :) But if you have any unrelated
regressions in 5.3, please report them accordingly.

> This is the change on which I've tried to reproduce it with your 
> debugging patches:
> 
> commit 41de59634046b19cd53a1983594a95135c656997 (HEAD -> master, 
> origin/master, origin/HEAD)
> Merge: e22a97a2a85d 1ee1119d184b
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Wed Aug 14 15:29:53 2019 -0700
> 
>      Merge tag 'Wimplicit-fallthrough-5.3-rc5' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
> 
>                                               Petr
>