Bug 203849
| Summary: | 5.1.7: Oops unable to handle kernel paging request RIP: 0010:compaction_alloc+0x53b/0x890 | | |
|---|---|---|---|
| Product: | Memory Management | Reporter: | GYt2bW (howaboutsynergy) |
| Component: | Page Allocator | Assignee: | Andrew Morton (akpm) |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | high | CC: | akpm, aryabinin, howaboutsynergy, mgorman, vbabka |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| See Also: | https://bugzilla.kernel.org/show_bug.cgi?id=203735 | | |
| Kernel Version: | 5.1.7-g2f7d9d47575e | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | .config | | |
Description
GYt2bW
2019-06-08 07:44:25 UTC
Initially hit and documented here: https://gist.github.com/howaboutsynergy/c69f4a44ad10f7cce48c1544266e43f6 and here: https://bugzilla.kernel.org/show_bug.cgi?id=203735 but I didn't have a crash dump then, so I thought it was caused by something else!

The patches that I had in the kernel are listed (i.e. their file names) here: https://github.com/howaboutsynergy/q1q/blob/fb691dfbf4d56065bcee061d25d90ccf498485ed/OSes/archlinux/home/user/build/1packages/4used/kernel/linuxgit/PKGBUILD#L80-L148 (and located in the same dir as that PKGBUILD).

But the following PKGBUILD was used (because this was linux-stable kernel 5.1.7, based on the PKGBUILD/patch files for the linuxgit mentioned above): https://github.com/howaboutsynergy/q1q/blob/fb691dfbf4d56065bcee061d25d90ccf498485ed/OSes/archlinux/home/user/build/1packages/4used/kernel/linux-stable/PKGBUILD

This issue has likely been present since at least kernel [5.1.5-g835365932f0d](https://gist.github.com/howaboutsynergy/c69f4a44ad10f7cce48c1544266e43f6#gistcomment-2927872) (assuming the same crash happened the first time I encountered this issue, when I didn't have the ability to get a crash dump; the current crash, in the OP, is the second time it has happened since around 10 May 2019, when I installed archlinux on this system).

Created attachment 283151 [details]
.config
That was the .config used for kernel 5.1.7 (got via `zcat /proc/config.gz`). Looks like there was some leftover like CONFIG_BUILD_SALT="4.19.15-300.fc29.x86_64" because I "imported" it from Qubes OS Fedora 29 a while ago.

I wonder if this fixes it:

```
commit e577c8b64d58fe307ea4d5149d31615df2d90861
Date:   Fri May 31 22:30:59 2019 -0700
```

aka https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e577c8b64d58fe307ea4d5149d31615df2d90861 because I did not have that commit in [5.1.7](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tag/?h=v5.1.7)

I have swap enabled, in zram:

```
$ swapon
NAME       TYPE      SIZE USED PRIO
/dev/zram0 partition  64G   0B   -2
$ swapon -s
Filename    Type       Size      Used  Priority
/dev/zram0  partition  67108860  0     -2
$ zramctl /dev/zram0
NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd           64G   4K   63B    4K       6 [SWAP]
```

I'm going to apply that patch on top of 5.1.7 ... since it's so simple:

```patch
diff --git a/mm/compaction.c b/mm/compaction.c
index 9febc8cc84e7..9e1b9acb116b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
 					page = pfn_to_page(highest);
 					cc->free_pfn = highest;
 				} else {
-					if (cc->direct_compaction) {
+					if (cc->direct_compaction && pfn_valid(min_pfn)) {
 						page = pfn_to_page(min_pfn);
 						cc->free_pfn = min_pfn;
 					}
```

ok something is messed up on bugzilla, the above two comments are definitely not what I posted, their contents got messed up! what the!!!! Made https://bugzilla.mozilla.org/show_bug.cgi?id=1557932 about that.

    crash> sym ffffffffae1b47eb
    ffffffffae1b47eb (t) compaction_alloc+1339 /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/./include/linux/page-flags.h: 735

shows it's this line:

    /*
     * PageBuddy() indicates that the page is free and in the buddy system
     * (see mm/page_alloc.c).
     */
    PAGE_TYPE_OPS(Buddy, buddy) //this line

Anyway, I applied the patch mentioned in Comment 5 and if this happens again I'll update.

Recompiling the kernel with CONFIG_PAGE_POISONING_ZERO changed from =y to =n, maybe that would help?!?? Also added `page_poison=1` to the kernel command line (/proc/cmdline).

Note that I'm still able to execute `crash` commands on the crash dump because I saved the debugging kernel image (and all the other stuff too), so if anyone wants more info related to the OP crash, just ask.

I'm switching to kernel [5.1.8](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tag/?h=v5.1.8), which was released just 7 mins ago.

I'm switching to [5.1.9](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tag/?h=v5.1.9), which was released 3 hours ago.

Since the prev. comment, I've tried recompiling rustc multiple times, almost for fun, but the issue hasn't triggered yet... assuming it didn't already get fixed by [commit](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e577c8b64d58fe307ea4d5149d31615df2d90861), which has been present since 5.1.8. But then again, it took 7 days to trigger (for the second time) last time (from kernel 5.1.5 to 5.1.7), so this might be why: it's not easy to hit.

hey now, I've just looked at the log for the stable kernel 5.1.y branch for `mm/compaction.c`: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/mm/compaction.c?h=linux-5.1.y compared to the same log for the git kernel: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/mm/compaction.c and I notice that at least one commit isn't present in that stable kernel:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/compaction.c?id=dd7ef7bd14640f11763b54f55131000165f48321

so how am I to know it isn't already fixed in the git kernel, but still not yet fixed in the stable kernel?
oh well, ignorance is bliss :D

On Tue, Jun 11, 2019 at 01:59:47PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> and I notice that at least one commit isn't present in that stable kernel:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/compaction.c?id=dd7ef7bd14640f11763b54f55131000165f48321
>
> so how am I to know it isn't already fixed in git kernel, but still not yet
> fixed in the stable kernel ?
>
> oh well, ignorance is bliss :D
>

Don't worry about that one. It's warning that the shift may not have meaning because the shift value is too large. However, in this specific case, the result is 0 which is valid behaviour for this code path. The warning is meant to catch things like a large type being accidentally cast to a small type and shifted by a large value. The consequences can be that the upper bits are unexpectedly lost. In this particular code path, we don't care. It's a cosmetic fix for the most part, no functional impact.

Thanks! That commit is in [5.1.10](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tag/?h=v5.1.10), released 6 hours ago, I'm switching to it in a few mins!

ok I'm closing this, will reopen if it really happens again. Meanwhile I'll keep switching to the latest stable kernel (5.1.11 was released 85 mins ago).

I'm assuming it got fixed by this a while ago: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e577c8b64d58fe307ea4d5149d31615df2d90861
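
For anyone reading this later, here is a rough user-space sketch of what the one-line guard in commit e577c8b64d58 does: it checks `pfn_valid(min_pfn)` before calling `pfn_to_page(min_pfn)`, so the fallback is skipped when the minimum pfn has no backing `struct page`. This is not kernel code. Every `toy_` name, the section size, and the memory layout are made up for illustration, and the `assert` merely stands in for the oops (the real `pfn_to_page()` performs no such check).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct page (hypothetical, heavily simplified). */
struct toy_page {
    bool buddy; /* loosely plays the role of the PageBuddy() flag */
};

#define TOY_SECTION_PAGES 4UL /* pages per toy memory section (assumption) */
#define TOY_NR_SECTIONS   4UL /* number of toy sections (assumption) */

/* Sections 0 and 2 have a memmap; sections 1 and 3 are holes. */
static struct toy_page toy_section0[TOY_SECTION_PAGES];
static struct toy_page toy_section2[TOY_SECTION_PAGES];
static struct toy_page *toy_memmap[TOY_NR_SECTIONS] = {
    toy_section0, NULL, toy_section2, NULL
};

/* True only if the pfn falls inside a section that has a memmap. */
static bool toy_pfn_valid(unsigned long pfn)
{
    unsigned long section = pfn / TOY_SECTION_PAGES;

    return section < TOY_NR_SECTIONS && toy_memmap[section] != NULL;
}

/*
 * Translate a pfn to its toy page. The assert models the crash: the real
 * pfn_to_page() does no checking and would hand back a bogus pointer.
 */
static struct toy_page *toy_pfn_to_page(unsigned long pfn)
{
    assert(toy_pfn_valid(pfn));
    return &toy_memmap[pfn / TOY_SECTION_PAGES][pfn % TOY_SECTION_PAGES];
}

/* Same shape as the patched branch: fall back to min_pfn only when it is valid. */
static struct toy_page *toy_pick_min_pfn(bool direct_compaction,
                                         unsigned long min_pfn)
{
    if (direct_compaction && toy_pfn_valid(min_pfn))
        return toy_pfn_to_page(min_pfn);
    return NULL;
}
```

With this layout, `toy_pick_min_pfn(true, 5)` returns NULL because pfn 5 sits in a hole, while dropping the `toy_pfn_valid()` check from that function would make the same call trip the assert, which is the toy equivalent of the paging-request oops in the OP.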