Bug 201377
Summary: | Kernel BUG under memory pressure: unable to handle kernel NULL pointer dereference at 00000000000000f0 | ||
---|---|---|---|
Product: | Memory Management | Reporter: | leozinho29_eu |
Component: | Other | Assignee: | Andrew Morton (akpm) |
Status: | NEW --- | ||
Severity: | normal | ||
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.19-rc7 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | dmesg and kernel config |
Description
leozinho29_eu
2018-10-11 18:13:31 UTC
(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

Vlastimil, it looks like your August 21 smaps changes are failing.
This one is pretty urgent, please.

Leonardo (yes?): thanks for reporting. Very helpful.

On Thu, 11 Oct 2018 18:13:31 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=201377
>
>             Bug ID: 201377
>            Summary: Kernel BUG under memory pressure: unable to handle
>                     kernel NULL pointer dereference at 00000000000000f0
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 4.19-rc7
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: akpm@linux-foundation.org
>           Reporter: leozinho29_eu@hotmail.com
>         Regression: No
>
> Created attachment 278997 [details]
>   --> https://bugzilla.kernel.org/attachment.cgi?id=278997&action=edit
> dmesg and kernel config
>
> I'm using Xubuntu 18.04 and I noticed that under memory pressure the script
> from https://github.com/pixelb/ps_mem.git (HEAD
> 1ed0bc5519d889d58235f2c35db01e4ede0d8231) is causing a kernel BUG and locking
> a CPU. In dmesg the following appears:
>
> BUG: unable to handle kernel NULL pointer dereference at 00000000000000f0
>
> After this BUG the computer's performance becomes greatly degraded: some
> programs do not close, some fail to open, some fail to work properly. As an
> example, bash fails to autocomplete.
>
> Steps to reproduce:
>
> 1) Be under memory pressure. Using dd to write a large file at /dev/shm works
> for this;
> 2) Run the script from https://github.com/pixelb/ps_mem.git
>
> Expected result: script will print information and system will keep working
> normally;
>
> Observed result: script is killed, kernel BUG happens, a CPU gets stuck and
> the computer presents problems.
>
> I did not observe this with 4.17.19. I'll bisect and see if I can find which
> commit is causing this.
>
> I'm sorry if I'm reporting to the wrong product and component.
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.

(cc linux-mm, argh)

On Fri, 12 Oct 2018 15:55:33 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> Vlastimil, it looks like your August 21 smaps changes are failing.
> This one is pretty urgent, please.
>
> Leonardo (yes?): thanks for reporting. Very helpful.

On 10/13/18 12:56 AM, Andrew Morton wrote:
> (cc linux-mm, argh)
>
> On Fri, 12 Oct 2018 15:55:33 -0700 Andrew Morton <akpm@linux-foundation.org>
> wrote:
>
>> Vlastimil, it looks like your August 21 smaps changes are failing.
>> This one is pretty urgent, please.

Thanks, will look in a few hours. Glad that there will be rc8...

On 10/13/18 2:57 PM, Vlastimil Babka wrote:
> On 10/13/18 12:56 AM, Andrew Morton wrote:
>>> Vlastimil, it looks like your August 21 smaps changes are failing.
>>> This one is pretty urgent, please.
>
> Thanks, will look in a few hours. Glad that there will be rc8...

I think I found it, and it seems the bug was there all the time for
smaps_rollup. Dunno why it was hit only now. Please test?

----8<----
From 948be25ee1bdddca8244d1a055fbf812022571e7 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Sun, 14 Oct 2018 08:59:44 +0200
Subject: [PATCH] mm: /proc/pid/smaps_rollup: fix NULL pointer deref in
 smaps_pte_range

Leonardo reports an apparent regression in 4.19-rc7:

 BUG: unable to handle kernel NULL pointer dereference at 00000000000000f0
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 3 PID: 6032 Comm: python Not tainted 4.19.0-041900rc7-lowlatency #201810071631
 Hardware name: LENOVO 80UG/Toronto 4A2, BIOS 0XCN45WW 08/09/2018
 RIP: 0010:smaps_pte_range+0x32d/0x540
 Code: 80 00 00 00 00 74 a9 48 89 de 41 f6 40 52 40 0f 85 04 02 00 00 49 2b 30 48 c1 ee 0c 49 03 b0 98 00 00 00 49 8b 80 a0 00 00 00 <48> 8b b8 f0 00 00 00 e8 b7 ef ec ff 48 85 c0 0f 84 71 ff ff ff a8
 RSP: 0018:ffffb0cbc484fb88 EFLAGS: 00010202
 RAX: 0000000000000000 RBX: 0000560ddb9e9000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000560ddb9e9 RDI: 0000000000000001
 RBP: ffffb0cbc484fbc0 R08: ffff94a5a227a578 R09: ffff94a5a227a578
 R10: 0000000000000000 R11: 0000560ddbbe7000 R12: ffffe903098ba728
 R13: ffffb0cbc484fc78 R14: ffffb0cbc484fcf8 R15: ffff94a5a2e9cf48
 FS:  00007f6dfb683740(0000) GS:ffff94a5aaf80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000000f0 CR3: 000000011c118001 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  __walk_page_range+0x3c2/0x6f0
  walk_page_vma+0x42/0x60
  smap_gather_stats+0x79/0xe0
  ? gather_pte_stats+0x320/0x320
  ? gather_hugetlb_stats+0x70/0x70
  show_smaps_rollup+0xcd/0x1c0
  seq_read+0x157/0x400
  __vfs_read+0x3a/0x180
  ? security_file_permission+0x93/0xc0
  ? security_file_permission+0x93/0xc0
  vfs_read+0x8f/0x140
  ksys_read+0x55/0xc0
  __x64_sys_read+0x1a/0x20
  do_syscall_64+0x5a/0x110
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Decoded code matched to local compilation+disassembly points to
smaps_pte_entry():

	} else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
						&& pte_none(*pte))) {
		page = find_get_entry(vma->vm_file->f_mapping,
						linear_page_index(vma, addr));

Here, vma->vm_file is NULL. mss->check_shmem_swap should be false in that
case, however for smaps_rollup, smap_gather_stats() can set the flag true
for one vma and leave it true for subsequent vma's where it should be false.

To fix, reset the check_shmem_swap flag to false. There's also a related bug
which sets mss->swap to shmem_swapped, which in the context of smaps_rollup
overwrites any value accumulated from previous vma's. Fix that as well.

Note that the report suggests a regression between 4.17.19 and 4.19-rc7,
which makes the 4.19 series ending with commit 258f669e7e88 ("mm:
/proc/pid/smaps_rollup: convert to single value seq_file") suspicious. But
the mss was reused for rollup since 493b0e9d945f ("mm: add
/proc/pid/smaps_rollup") so let's play it safe with the stable backport.

Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201377
Reported-by: Leonardo Mueller <leozinho29_eu@hotmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
---
 fs/proc/task_mmu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5ea1d64cb0b4..a027473561c6 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -713,6 +713,8 @@ static void smap_gather_stats(struct vm_area_struct *vma,
 	smaps_walk.private = mss;
 
 #ifdef CONFIG_SHMEM
+	/* In case of smaps_rollup, reset the value from previous vma */
+	mss->check_shmem_swap = false;
 	if (vma->vm_file && shmem_mapping(vma->vm_file->f_mapping)) {
 		/*
 		 * For shared or readonly shmem mappings we know that all
@@ -728,7 +730,7 @@ static void smap_gather_stats(struct vm_area_struct *vma,
 
 		if (!shmem_swapped || (vma->vm_flags & VM_SHARED) ||
 					!(vma->vm_flags & VM_WRITE)) {
-			mss->swap = shmem_swapped;
+			mss->swap += shmem_swapped;
 		} else {
 			mss->check_shmem_swap = true;
 			smaps_walk.pte_hole = smaps_pte_hole;

On 10/14/18 8:07 PM, Leonardo Soares Müller wrote:
> This patch applied on 4.19-rc7 corrected the problem for me and the
> script is no longer triggering the kernel bug.

Great! Can we add your Tested-by: then?

> I completely skipped 4.18 because there were multiple regressions
> affecting my computer. 4.19-rc6 and 4.19-rc7 have most regressions fixed
> but then this issue appeared.
>
> The first released kernel version in which I found this problem is 4.18-rc4,

OK, that confirms the smaps_rollup problem is indeed older than my
rewrite. Unless it's a typo and you mean 4.19-rc4 since you "skipped 4.18".

> but bisecting between 4.18-rc3 and 4.18-rc4 failed: on boot there was
> one message starting with [UNSUPP] and with something about "Arbitrary
> File System".

I meant eighteen, this is right. While I skipped 4.18 for normal use, to
do tests when this issue appeared I tested with 4.18 too and noticed
that since 4.18-rc4 the issue exists.
Yes, you can add me to Tested-by, as this patch solved the issue for me:
no problems with the kernel and the script runs normally. Thank you.
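
The two defects described in the patch's commit message (a flag left set from an earlier vma, and a running total that is overwritten instead of accumulated) can be illustrated outside the kernel. What follows is only a user-space sketch: `toy_vma`, `toy_mss`, `toy_gather()` and `toy_rollup()` are hypothetical stand-ins invented for illustration, not kernel APIs, and the page-table walk is reduced to a single check plus a per-vma `walk_swap` counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_area_struct: only what the bug needs. */
struct toy_vma {
	bool has_file;               /* models vma->vm_file != NULL */
	bool is_shmem;               /* models shmem_mapping(vma->vm_file->f_mapping) */
	bool private_writable;       /* models !VM_SHARED && VM_WRITE */
	unsigned long shmem_swapped; /* models shmem_swap_usage(vma) */
	unsigned long walk_swap;     /* swap the page-table walk would count */
};

/* Hypothetical stand-in for struct mem_size_stats. */
struct toy_mss {
	unsigned long swap;
	bool check_shmem_swap;
	unsigned long null_derefs;   /* times the walk would have used a NULL vm_file */
};

/* Models smap_gather_stats() for one vma; `fixed` selects the patched logic. */
static void toy_gather(const struct toy_vma *vma, struct toy_mss *mss, bool fixed)
{
	assert(vma != NULL && mss != NULL);

	if (fixed)
		mss->check_shmem_swap = false; /* reset the value from the previous vma */

	if (vma->has_file && vma->is_shmem) {
		if (!vma->shmem_swapped || !vma->private_writable) {
			if (fixed)
				mss->swap += vma->shmem_swapped; /* accumulate */
			else
				mss->swap = vma->shmem_swapped;  /* overwrites the running total */
		} else {
			mss->check_shmem_swap = true;
		}
	}

	/*
	 * Models the walk: with the flag set, an empty pte leads to
	 * vma->vm_file->f_mapping, which faults when there is no file.
	 */
	if (mss->check_shmem_swap && !vma->has_file)
		mss->null_derefs++;
	mss->swap += vma->walk_swap;
}

/* Models show_smaps_rollup(): one mss is shared across every vma. */
static struct toy_mss toy_rollup(const struct toy_vma *vmas, size_t n, bool fixed)
{
	struct toy_mss mss = {0};

	for (size_t i = 0; i < n; i++)
		toy_gather(&vmas[i], &mss, fixed);
	return mss;
}
```

Feeding `toy_rollup()` a private writable shmem vma, then an anonymous vma, then a shared shmem vma shows both problems when `fixed` is false: the anonymous vma is walked with the stale flag still set, and the last vma's `shmem_swapped` replaces the total gathered so far. With `fixed` set to true neither happens.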
This patch applied on 4.19-rc7 corrected the problem for me and the
script is no longer triggering the kernel bug.
I completely skipped 4.18 because there were multiple regressions
affecting my computer. 4.19-rc6 and 4.19-rc7 have most regressions fixed
but then this issue appeared.
The first released kernel version in which I found this problem is 4.18-rc4,
but bisecting between 4.18-rc3 and 4.18-rc4 failed: on boot there was
one message starting with [UNSUPP] and with something about "Arbitrary
File System".
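
For readers who want to relate the oops in the patch to its conclusion that vma->vm_file was NULL: in the `Code:` line the byte in angle brackets starts the faulting instruction, here `48 8b b8 f0 00 00 00`. The sketch below decodes just that one x86-64 encoding (REX.W, opcode 8B, register base plus 32-bit displacement). `decode_mov_load()` and `effective_address()` are hypothetical helpers written for this note and handle no other instruction forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of decoding "REX.W 8B /r" with mod=10 (base register + disp32). */
struct mov_load {
	int valid;          /* 1 if the bytes matched the supported form */
	unsigned dst_reg;   /* destination register number (7 = RDI) */
	unsigned base_reg;  /* base register number (0 = RAX) */
	int32_t disp;       /* signed 32-bit displacement */
};

/* Hypothetical helper: decodes only the single form described above. */
static struct mov_load decode_mov_load(const uint8_t *code, size_t len)
{
	struct mov_load out = {0};

	assert(code != NULL);
	if (len < 7)
		return out;

	uint8_t rex = code[0], opcode = code[1], modrm = code[2];

	/* REX is 0100WRXB; require W=1 (64-bit operand) and opcode MOV r64, r/m64. */
	if ((rex & 0xf8) != 0x48 || opcode != 0x8b)
		return out;
	/* mod=10 selects base + disp32; rm=100 would need a SIB byte (unsupported). */
	if ((modrm >> 6) != 2 || (modrm & 7) == 4)
		return out;

	out.dst_reg = ((modrm >> 3) & 7) | ((rex & 0x04) ? 8u : 0u);
	out.base_reg = (modrm & 7) | ((rex & 0x01) ? 8u : 0u);

	/* The displacement is stored little-endian. */
	uint32_t d = (uint32_t)code[3] | ((uint32_t)code[4] << 8) |
		     ((uint32_t)code[5] << 16) | ((uint32_t)code[6] << 24);
	out.disp = (int32_t)d;
	out.valid = 1;
	return out;
}

/* Address the load touches: base register value plus sign-extended disp. */
static uint64_t effective_address(uint64_t base, int32_t disp)
{
	return base + (uint64_t)(int64_t)disp;
}
```

Decoding the bytes from the report gives a load into RDI from RAX + 0xf0. The register dump shows RAX as 0, so the access lands at 0xf0, which matches both CR2 and the address in the BUG line. That is consistent with a NULL vma->vm_file being dereferenced to reach f_mapping, although the 0xf0 field offset is specific to the reporter's kernel build.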