Bug 203849

Summary: 5.1.7: Oops unable to handle kernel paging request RIP: 0010:compaction_alloc+0x53b/0x890
Product: Memory Management Reporter: GYt2bW (howaboutsynergy)
Component: Page Allocator Assignee: Andrew Morton (akpm)
Status: RESOLVED CODE_FIX    
Severity: high CC: akpm, aryabinin, howaboutsynergy, mgorman, vbabka
Priority: P1    
Hardware: x86-64   
OS: Linux   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=203735
Kernel Version: 5.1.7-g2f7d9d47575e Tree: Mainline
Regression: No
Attachments: .config

Description GYt2bW 2019-06-08 07:44:25 UTC
This seems to happen sometimes when compiling Rust - possibly caused by rustc process?

```
$ crash_kernel_read 
Not already root, re-executing myself as root by using sudo(required!)...

/usr/bin/makedumpfile is owned by makedumpfile 1.6.5-1

crash 7.2.6
Copyright (C) 2002-2019  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [720MB]: patching 75391 gdb minimal_symbol values

      KERNEL: /usr/lib/modules/5.1.7-g2f7d9d47575e/build/vmlinux       
    DUMPFILE: /var/crash/crashdump-2019-06-08-09:08:23  [PARTIAL DUMP]
        CPUS: 6
        DATE: Sat Jun  8 09:07:50 2019
      UPTIME: 00:40:33
LOAD AVERAGE: 4.74, 5.80, 5.45
       TASKS: 728
    NODENAME: i87k
     RELEASE: 5.1.7-g2f7d9d47575e
     VERSION: #31 SMP Fri Jun 7 00:10:52 CEST 2019
     MACHINE: x86_64  (3700 Mhz)
      MEMORY: 31.9 GB
       PANIC: "Oops: 0000 [#1] SMP PTI" (check log for details)
         PID: 25124
     COMMAND: "rustc"
        TASK: ffff968413869e40  [THREAD_INFO: ffff968413869e40]
         CPU: 2
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 25124  TASK: ffff968413869e40  CPU: 2   COMMAND: "rustc"
 #0 [ffffb6eecf1a7660] machine_kexec at ffffffffae03400b
 #1 [ffffb6eecf1a76a8] __crash_kexec at ffffffffae124bd8
 #2 [ffffb6eecf1a7770] crash_kexec at ffffffffae125a08
 #3 [ffffb6eecf1a7788] oops_end at ffffffffae011866
 #4 [ffffb6eecf1a77a8] no_context at ffffffffae03bdb7
 #5 [ffffb6eecf1a7848] do_page_fault at ffffffffae03c7bb
 #6 [ffffb6eecf1a7870] page_fault at ffffffffae800dfe
    [exception RIP: compaction_alloc+1339]
    RIP: ffffffffae1b47eb  RSP: ffffb6eecf1a7928  RFLAGS: 00010286
    RAX: 0000000000000001  RBX: ffffb6eecf1a7b00  RCX: 0000000000000001
    RDX: 80000000000ffe00  RSI: 0000000000000000  RDI: 000000000000003c
    RBP: 80000000000ffe00   R8: 0000000000000000   R9: 0000000000000034
    R10: ffffe632c3ff8000  R11: ffffb6eecf1a7980  R12: 8000000000100000
    R13: 0000000000000001  R14: ffffe632c3ff8000  R15: ffff96850dfded00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #7 [ffffb6eecf1a79c8] migrate_pages at ffffffffae1fa9a2
 #8 [ffffb6eecf1a7a48] compact_zone at ffffffffae1b62f0
 #9 [ffffb6eecf1a7ae8] compact_zone_order at ffffffffae1b67c3
#10 [ffffb6eecf1a7ba8] try_to_compact_pages at ffffffffae1b6fa4
#11 [ffffb6eecf1a7bf8] __alloc_pages_direct_compact at ffffffffae193f72
#12 [ffffb6eecf1a7c50] __alloc_pages_slowpath at ffffffffae1945f0
#13 [ffffb6eecf1a7d40] __alloc_pages_nodemask at ffffffffae194fa8
#14 [ffffb6eecf1a7da0] do_huge_pmd_anonymous_page at ffffffffae1fd37c
#15 [ffffb6eecf1a7df0] __handle_mm_fault at ffffffffae1c0e0d
#16 [ffffb6eecf1a7ea0] handle_mm_fault at ffffffffae1c152f
#17 [ffffb6eecf1a7ec8] __do_page_fault at ffffffffae03c522
#18 [ffffb6eecf1a7f28] do_page_fault at ffffffffae03c7bb
#19 [ffffb6eecf1a7f50] page_fault at ffffffffae800dfe
    RIP: 00007cd640afabbc  RSP: 00007cd633fedc50  RFLAGS: 00010206
    RAX: 00007cd5b02fe020  RBX: 00007cd602d01020  RCX: 00007cd5b00fe010
    RDX: 00000000001fffff  RSI: 0000000000d02000  RDI: 0000000000000010
    RBP: 000000000000ffff   R8: 0000000000000067   R9: 00007cd601fff010
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007cd5fd1f81b0
    R13: 00007cd602d01020  R14: 00000000fffffdd2  R15: 00007cd601f67120
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
crash> sys
      KERNEL: /usr/lib/modules/5.1.7-g2f7d9d47575e/build/vmlinux
    DUMPFILE: /var/crash/crashdump-2019-06-08-09:08:23  [PARTIAL DUMP]
        CPUS: 6
        DATE: Sat Jun  8 09:07:50 2019
      UPTIME: 00:40:33
LOAD AVERAGE: 4.74, 5.80, 5.45
       TASKS: 728
    NODENAME: i87k
     RELEASE: 5.1.7-g2f7d9d47575e
     VERSION: #31 SMP Fri Jun 7 00:10:52 CEST 2019
     MACHINE: x86_64  (3700 Mhz)
      MEMORY: 31.9 GB
       PANIC: "Oops: 0000 [#1] SMP PTI" (check log for details)
crash> log
[    0.000000] microcode: microcode updated early to revision 0xb4, date = 2019-04-01
[    0.000000] Linux version 5.1.7-g2f7d9d47575e (user@i87k) (gcc version 8.3.0 (GCC)) #31 SMP Fri Jun 7 00:10:52 CEST 2019
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux-stable root=UUID=2b8b9ab8-7ac5-4586-aa42-d7ffb12de92a rw root_trim=yes rd.luks.allow-discards rd.luks.options=discard ipv6.disable=1 ipv6.disable_ipv6=1 ipv6.autoconf=0 loglevel=15 log_buf_len=16M ignore_loglevel printk.always_kmsg_dump=y printk.time=y printk.devkmsg=on mminit_loglevel=4 memory_corruption_check=1 fbcon=scrollback:4096k fbcon=font:ProFont6x11 net.ifnames=0 nolvm dobtrfs console=tty1 earlyprintk=vga audit=0 systemd.log_target=kmsg systemd.journald.forward_to_console=1 enforcing=0 udev.children-max=1256 rd.udev.children-max=1256 nohz=on oops=panic crashkernel=256M panic=0 psi=1 sysrq_always_enabled random.trust_cpu=off logo.nologo lpj=0 mce=bootlog reboot=force,cold noexec=on nohibernate scsi_mod.use_blk_mq=1 consoleblank=120 mitigations=auto,nosmt l1tf=full,force spec_store_bypass_disable=auto spectre_v2=auto spectre_v2_user=auto mds=full,nosmt rd.log=all noefi cpuidle.governor=teo zram.num_devices=3 zswap.enabled=0
...
...skipping...
[ 2389.133343] i2c i2c-2: NAK from device addr 0x50 msg #0
[ 2389.137459] i2c i2c-1: NAK from device addr 0x50 msg #0
[ 2389.141642] i2c i2c-3: NAK from device addr 0x50 msg #0
[ 2432.660116] gpg-agent[1125]: handler 0x74e5c195d700 for fd 10 started
[ 2432.730809] gpg-agent[1125]: handler 0x74e5c195d700 for fd 10 terminated
[ 2434.126414] BUG: unable to handle kernel paging request at ffffe632c3ff8030
[ 2434.126415] #PF error: [normal kernel read fault]
[ 2434.126416] PGD 82dfd5067 P4D 82dfd5067 PUD 82dfd4067 PMD 0 
[ 2434.126418] Oops: 0000 [#1] SMP PTI
[ 2434.126419] CPU: 2 PID: 25124 Comm: rustc Kdump: loaded Tainted: G     U            5.1.7-g2f7d9d47575e #31
[ 2434.126420] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 1002 07/02/2018
[ 2434.126422] RIP: 0010:compaction_alloc+0x53b/0x890
[ 2434.126423] Code: 1f 41 83 c5 01 4c 39 f5 0f 82 5e 01 00 00 4c 89 34 24 eb 76 49 89 ea 49 c1 e2 06 4c 03 15 75 f3 d0 00 4d 89 d6 4d 85 f6 74 44 <41> 8b 46 30 25 80 00 00 f0 3d 00 00 00 f0 0f 84 ff 00 00 00 80 7b
[ 2434.126423] RSP: 0000:ffffb6eecf1a7928 EFLAGS: 00010286
[ 2434.126424] RAX: 0000000000000001 RBX: ffffb6eecf1a7b00 RCX: 0000000000000001
[ 2434.126425] RDX: 80000000000ffe00 RSI: 0000000000000000 RDI: 000000000000003c
[ 2434.126425] RBP: 80000000000ffe00 R08: 0000000000000000 R09: 0000000000000034
[ 2434.126426] R10: ffffe632c3ff8000 R11: ffffb6eecf1a7980 R12: 8000000000100000
[ 2434.126426] R13: 0000000000000001 R14: ffffe632c3ff8000 R15: ffff96850dfded00
[ 2434.126427] FS:  00007cd633fff700(0000) GS:ffff9684eda80000(0000) knlGS:0000000000000000
[ 2434.126428] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2434.126428] CR2: ffffe632c3ff8030 CR3: 00000007ae5a8005 CR4: 00000000003606e0
[ 2434.126429] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2434.126429] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2434.126430] Call Trace:
[ 2434.126432]  migrate_pages+0x112/0xa00
[ 2434.126433]  ? isolate_freepages_block+0x330/0x330
[ 2434.126434]  ? move_freelist_tail+0xd0/0xd0
[ 2434.126435]  compact_zone+0x6b0/0xab0
[ 2434.126436]  compact_zone_order+0xd3/0x110
[ 2434.126437]  ? psi_task_change+0xe2/0x210
[ 2434.126438]  try_to_compact_pages+0x164/0x220
[ 2434.126439]  __alloc_pages_direct_compact+0x82/0x170
[ 2434.126440]  __alloc_pages_slowpath+0x430/0xb70
[ 2434.126441]  __alloc_pages_nodemask+0x278/0x2c0
[ 2434.126442]  do_huge_pmd_anonymous_page+0x12c/0x5e0
[ 2434.126444]  __handle_mm_fault+0xbed/0x1250
[ 2434.126445]  handle_mm_fault+0xbf/0x1e0
[ 2434.126446]  __do_page_fault+0x242/0x490
[ 2434.126448]  ? page_fault+0x8/0x30
[ 2434.126449]  do_page_fault+0x1b/0x5e
[ 2434.126449]  page_fault+0x1e/0x30
[ 2434.126450] RIP: 0033:0x7cd640afabbc
[ 2434.126451] Code: 74 dc 41 f7 d6 eb 31 49 c1 e8 39 48 8d 46 f0 48 21 d0 44 88 04 31 44 88 44 01 10 48 8b 44 24 40 48 c1 e6 05 c4 c1 7e 6f 45 00 <c5> fe 7f 04 30 4c 89 cd 66 45 85 f6 74 a6 48 8b 7c 24 60 c5 f8 77
[ 2434.126452] RSP: 002b:00007cd633fedc50 EFLAGS: 00010206
[ 2434.126452] RAX: 00007cd5b02fe020 RBX: 00007cd602d01020 RCX: 00007cd5b00fe010
[ 2434.126453] RDX: 00000000001fffff RSI: 0000000000d02000 RDI: 0000000000000010
[ 2434.126453] RBP: 000000000000ffff R08: 0000000000000067 R09: 00007cd601fff010
[ 2434.126454] R10: 0000000000000000 R11: 0000000000000246 R12: 00007cd5fd1f81b0
[ 2434.126454] R13: 00007cd602d01020 R14: 00000000fffffdd2 R15: 00007cd601f67120
[ 2434.126455] Modules linked in: xt_comment msr xt_TCPMSS iptable_mangle iptable_security iptable_nat nf_nat iptable_raw nf_log_ipv4 nf_log_common xt_owner xt_LOG xt_connlimit nf_conncount xt_conntrack nf_conntrack nf_defrag_ipv4 xt_hashlimit xt_multiport xt_addrtype snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp i915 crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit drm_kms_helper snd_hda_intel snd_hda_codec syscopyarea sysfillrect sysimgblt ghash_clmulni_intel snd_hwdep fb_sys_fops iTCO_wdt intel_cstate snd_hda_core drm iTCO_vendor_support intel_uncore snd_pcm intel_rapl_perf snd_timer pcspkr mei_me mq_deadline snd drm_panel_orientation_quirks e1000e soundcore i2c_i801 mei xhci_pci xhci_hcd
[ 2434.126466] CR2: ffffe632c3ff8030
```
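
As a cross-check of the register dump above, the arithmetic behind the faulting address can be replayed. This is only a sketch: it assumes the bytes before the marked instruction in the `Code:` line decode to `mov %rbp,%r10; shl $0x6,%r10; add <rip-relative>,%r10; mov %r10,%r14`, that the faulting instruction `41 8b 46 30` is `mov 0x30(%r14),%eax`, and that the shift by 6 means `sizeof(struct page) == 64`. The name `inferred_base` is hypothetical (it would correspond to whatever the rip-relative operand held, presumably the vmemmap base).

```python
# Register values copied from the oops above.
MASK64 = (1 << 64) - 1
RBP = 0x80000000000FFE00  # value fed into `shl $0x6` (assumed to be a pfn)
R14 = 0xFFFFE632C3FF8000  # resulting struct page pointer
CR2 = 0xFFFFE632C3FF8030  # faulting address reported by the oops

STRUCT_PAGE_SHIFT = 6     # assumption: sizeof(struct page) == 64

# Offset of the 32-bit load relative to the struct page pointer.
fault_offset = CR2 - R14

# 64-bit wraparound silently drops the top bit of RBP during the shift.
scaled = (RBP << STRUCT_PAGE_SHIFT) & MASK64

# Base that must have been added to `scaled` to arrive at R14.
inferred_base = (R14 - scaled) & MASK64

print(hex(fault_offset), hex(scaled), hex(inferred_base))
```

Under these assumptions the load is at offset 0x30 of a struct page whose address was derived from the value in RBP, and `PMD 0` in the oops says that this part of the page array is simply not mapped.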

I have the crash dump, so I can run other `crash` commands if more info is wanted. Please let me know.
I'm on Arch Linux.

```
$ lscpu
Architecture:                     x86_64
CPU op-mode(s):                   32-bit, 64-bit
Byte Order:                       Little Endian
Address sizes:                    39 bits physical, 48 bits virtual
CPU(s):                           6
On-line CPU(s) list:              0-5
Thread(s) per core:               1
Core(s) per socket:               6
Socket(s):                        1
NUMA node(s):                     1
Vendor ID:                        GenuineIntel
CPU family:                       6
Model:                            158
Model name:                       Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
Stepping:                         10
CPU MHz:                          800.016
CPU max MHz:                      4700.0000
CPU min MHz:                      800.0000
BogoMIPS:                         7392.00
Virtualization:                   VT-x
L1d cache:                        192 KiB
L1i cache:                        192 KiB
L2 cache:                         1.5 MiB
L3 cache:                         12 MiB
NUMA node0 CPU(s):                0-5
Vulnerability L1tf:               Mitigation; PTE Inversion
Vulnerability Mds:                Mitigation; Clear CPU buffers; SMT disabled
Vulnerability Meltdown:           Mitigation; PTI
Vulnerability Spec store bypass:  Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:         Mitigation; __user pointer sanitization
Vulnerability Spectre v2:         Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, RSB filling
Flags:                            fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
```
Comment 1 GYt2bW 2019-06-08 07:54:48 UTC
Initially hit and documented here: https://gist.github.com/howaboutsynergy/c69f4a44ad10f7cce48c1544266e43f6
and here: https://bugzilla.kernel.org/show_bug.cgi?id=203735

but I didn't have a crash dump then, so I thought it was caused by something else!

The patches that I had in the kernel are listed (i.e. by file name) here:
https://github.com/howaboutsynergy/q1q/blob/fb691dfbf4d56065bcee061d25d90ccf498485ed/OSes/archlinux/home/user/build/1packages/4used/kernel/linuxgit/PKGBUILD#L80-L148
(and located in the same directory as that PKGBUILD)

However, the following PKGBUILD was the one actually used (this was the linux-stable kernel 5.1.7, whose PKGBUILD and patch files are based on those of the linuxgit package mentioned above): https://github.com/howaboutsynergy/q1q/blob/fb691dfbf4d56065bcee061d25d90ccf498485ed/OSes/archlinux/home/user/build/1packages/4used/kernel/linux-stable/PKGBUILD
Comment 2 GYt2bW 2019-06-08 08:08:45 UTC
This issue has likely been present since at least kernel [5.1.5-g835365932f0d](https://gist.github.com/howaboutsynergy/c69f4a44ad10f7cce48c1544266e43f6#gistcomment-2927872), assuming the same crash is what happened the first time I encountered this issue, when I wasn't yet able to capture a crash dump. The current crash, in the OP, is the second occurrence since around 10 May 2019, when I installed Arch Linux on this system.
Comment 3 GYt2bW 2019-06-08 09:23:43 UTC
Created attachment 283151 [details]
.config
Comment 4 GYt2bW 2019-06-08 09:33:45 UTC
That was the .config used for kernel 5.1.7 (obtained via `zcat /proc/config.gz`).
Looks like there are some leftovers, like
CONFIG_BUILD_SALT="4.19.15-300.fc29.x86_64"
because I "imported" it from Qubes OS Fedora 29 a while ago.
Comment 5 GYt2bW 2019-06-08 09:44:01 UTC
I wonder if this fixes it:
```
commit e577c8b64d58fe307ea4d5149d31615df2d90861
Date:   Fri May 31 22:30:59 2019 -0700
```

aka https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e577c8b64d58fe307ea4d5149d31615df2d90861

because I did not have that commit in [5.1.7](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tag/?h=v5.1.7)
Comment 6 GYt2bW 2019-06-08 09:47:26 UTC
I have swap enabled, in zram:
```
$ swapon
NAME       TYPE      SIZE USED PRIO
/dev/zram0 partition  64G   0B   -2

$ swapon -s
Filename				Type		Size	Used	Priority
/dev/zram0                             	partition	67108860	0	-2

$ zramctl /dev/zram0
NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd           64G   4K   63B    4K       6 [SWAP]

```

I'm going to apply that patch on top of 5.1.7 ... since it's so simple:

```patch
diff --git a/mm/compaction.c b/mm/compaction.c
index 9febc8cc84e7..9e1b9acb116b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
        page = pfn_to_page(highest);
        cc->free_pfn = highest;
      } else {
-       if (cc->direct_compaction) {
+       if (cc->direct_compaction && pfn_valid(min_pfn)) {
          page = pfn_to_page(min_pfn);
          cc->free_pfn = min_pfn;
        }
```
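
The effect of the one-line change above can be illustrated with a toy model. This is not the kernel's code: `VALID_RANGES`, `fallback_pfn`, and the `patched` flag are hypothetical names, the memory map is made up, and `0xffe00` is merely the low part of RBP from the oops, used as an illustrative value rather than a confirmed `min_pfn`.

```python
from typing import Optional

# Hypothetical memory map: pfn ranges that have a backing struct page.
VALID_RANGES = [(0x00000, 0x0C000), (0x100000, 0x840000)]


def pfn_valid(pfn: int) -> bool:
    """Toy stand-in for the kernel's pfn_valid(): is this pfn backed by a struct page?"""
    return any(start <= pfn < end for start, end in VALID_RANGES)


def fallback_pfn(min_pfn: int, direct_compaction: bool, patched: bool) -> Optional[int]:
    """Return the pfn whose struct page the fallback branch would touch, or None."""
    guard = direct_compaction and (pfn_valid(min_pfn) if patched else True)
    return min_pfn if guard else None


# Unpatched: a pfn inside a hole is still handed to pfn_to_page().
print(fallback_pfn(0xFFE00, direct_compaction=True, patched=False))
# Patched: the same pfn is rejected before any struct page is dereferenced.
print(fallback_pfn(0xFFE00, direct_compaction=True, patched=True))
```

In this model a valid pfn behaves identically before and after the patch; only pfns without a backing struct page are filtered out.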
Comment 7 GYt2bW 2019-06-08 09:50:05 UTC
ok something is messed up on bugzilla, the above two comments are definitely not what I posted, their contents got messed up! what teh!!!!
Comment 8 GYt2bW 2019-06-08 10:20:00 UTC
made https://bugzilla.mozilla.org/show_bug.cgi?id=1557932 about that.
Comment 9 GYt2bW 2019-06-08 11:07:31 UTC
crash> sym ffffffffae1b47eb
ffffffffae1b47eb (t) compaction_alloc+1339 /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/./include/linux/page-flags.h: 735

shows it's this line:
  /*
   * PageBuddy() indicates that the page is free and in the buddy system
   * (see mm/page_alloc.c).
   */
  PAGE_TYPE_OPS(Buddy, buddy)  //this line

Anyway, I applied the patch mentioned in Comment 5; if this happens again I'll update.
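
The `PageBuddy()` attribution is consistent with the bytes right after the faulting load in the `Code:` line, `25 80 00 00 f0 3d 00 00 00 f0`, which read as `and $0xf0000080,%eax; cmp $0xf0000000,%eax`. A minimal sketch of that test follows. The two constants come from those immediates; the names `PAGE_TYPE_BASE` and `PG_BUDDY` follow my recollection of `include/linux/page-flags.h` and should be treated as assumptions, as should the sample `page_type` values.

```python
PAGE_TYPE_BASE = 0xF0000000  # immediate of the `cmp` in the Code: line
PG_BUDDY = 0x00000080        # the extra bit in the `and` mask 0xf0000080


def page_buddy(page_type: int) -> bool:
    """Mirror of the and/cmp pair: base bits set and the buddy bit clear."""
    return (page_type & (PAGE_TYPE_BASE | PG_BUDDY)) == PAGE_TYPE_BASE


# Illustrative page_type values (assumed, not read from the dump).
print(page_buddy(0xFFFFFF7F))  # buddy bit cleared while base bits are set
print(page_buddy(0xFFFFFFFF))  # no page type set
print(page_buddy(0x00000000))  # field used for something else
```

The check itself is harmless; the oops happens one step earlier, when the 32-bit field at offset 0x30 is read from an unmapped struct page.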
Comment 10 GYt2bW 2019-06-08 20:19:17 UTC
Recompiling the kernel with
CONFIG_PAGE_POISONING_ZERO changed from =y to =n
(maybe that would help?) and with `page_poison=1` added to the kernel command line.

Note that I'm still able to run `crash` commands on the crash dump because I saved the debugging kernel image (and everything else too), so if anyone wants more info related to the OP crash, just ask.
Comment 11 GYt2bW 2019-06-09 07:27:25 UTC
I'm switching to kernel [5.1.8](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tag/?h=v5.1.8)

It was released just 7 minutes ago.
Comment 12 GYt2bW 2019-06-11 13:49:07 UTC
I'm switching to [5.1.9](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tag/?h=v5.1.9) which was released 3 hours ago.

Since the previous comment I've tried recompiling rustc multiple times, almost for fun, but the issue hasn't triggered yet, unless of course it was already fixed by [commit](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e577c8b64d58fe307ea4d5149d31615df2d90861), which has been present since 5.1.8.

But then again, it took 7 days to trigger the second time (between kernel 5.1.5 and 5.1.7), so it may simply be that it's not easy to hit.
Comment 13 GYt2bW 2019-06-11 13:59:47 UTC
hey now, I've just looked at log for stable kernel 5.1.y branch for `mm/compaction.c`:  
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/mm/compaction.c?h=linux-5.1.y
compared to the same log for the git kernel:  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/mm/compaction.c

and I notice that at least one commit isn't present in that stable kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/compaction.c?id=dd7ef7bd14640f11763b54f55131000165f48321

so how am I to know it isn't already fixed in git kernel, but still not yet fixed in the stable kernel ?

oh well, ignorance is bliss :D
Comment 14 mgorman 2019-06-12 13:08:38 UTC
On Tue, Jun 11, 2019 at 01:59:47PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> and I notice that at least one commit isn't present in that stable kernel:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/compaction.c?id=dd7ef7bd14640f11763b54f55131000165f48321
> 
> so how am I to know it isn't already fixed in git kernel, but still not yet
> fixed in the stable kernel ?
> 
> oh well, ignorance is bliss :D
> 

Don't worry about that one. It's warning that the shift may not have
meaning because the shift value is too large. However, in this specific
case, the result is 0 which is valid behaviour for this code path. The
warning is meant to catch things like a large type being accidentally
cast to a small type and shifted by a large value. The consequences can
be that the upper bits are unexpectedly lost. In this particular code
path, we don't care. It's a cosmetic fix for the most part, no
functional impact.
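
The kind of bit loss described above can be shown with a small model. This is a sketch and not the kernel code touched by that commit: `shl_fixed` is a hypothetical helper that emulates an unsigned fixed-width left shift by masking, and the sample values are arbitrary. In C itself, shifting by at least the width of the type is undefined behaviour, which is what such a warning flags.

```python
def shl_fixed(value: int, shift: int, width: int) -> int:
    """Left shift as an unsigned `width`-bit type would store it (higher bits are dropped)."""
    return (value << shift) & ((1 << width) - 1)


wide = shl_fixed(0x80000001, 4, 64)    # the top bit survives in a 64-bit type
narrow = shl_fixed(0x80000001, 4, 32)  # the same shift in 32 bits loses it
all_gone = shl_fixed(0x1, 32, 32)      # shifting by the full width leaves nothing

print(hex(wide), hex(narrow), hex(all_gone))
```

The last case corresponds to the situation described in the comment: the result is 0, which that code path treats as a valid value.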
Comment 15 GYt2bW 2019-06-15 16:08:47 UTC
Thanks! That commit is in [5.1.10](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tag/?h=v5.1.10), released 6 hours ago, I'm switching to it in a few mins!
Comment 16 GYt2bW 2019-06-17 19:25:03 UTC
OK, I'm closing this; I will reopen it if it really happens again.
Meanwhile I'll keep switching to the latest stable kernel (5.1.11 was released 85 minutes ago).

I'm assuming it got fixed by this a while ago: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e577c8b64d58fe307ea4d5149d31615df2d90861