Bug 204793 - [SME] crash: `kmem -s` reported "kmem: dma-kmalloc-512: slab: ffffe192c0001000 invalid freepointer: e5ffef4e9a040b7e" on a dumped vmcore
Status: RESOLVED CODE_FIX
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 high
Assignee: Andrew Morton
Reported: 2019-09-08 08:38 UTC by lijiang
Modified: 2019-12-18 07:06 UTC (History)
CC List: 0 users

See Also:
Kernel Version: v5.3-rc7
Subsystem:
Regression: No
Bisected commit-id:


Description lijiang 2019-09-08 08:38:45 UTC
The issue was found on the following AMD ROME machine:
Serial Number 	diesel-sys9079-0001
Vendor 	AuthenticAMD
Model Name 	AMD EPYC 7601 32-Core Processor

On a dumped vmcore, the crash command `kmem -s` reported "kmem: dma-kmalloc-512: slab: ffffe192c0001000 invalid freepointer: e5ffef4e9a040b7e".

How reproducible:
About 70%

Steps to Reproduce:
1. Install the latest kernel, for example:
commit 089cf7f6ecb266b6a4164919a2e69bd2f938374a (HEAD -> v5.3-rc7, tag: v5.3-rc7)
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Sep 2 09:57:40 2019 -0700

    Linux 5.3-rc7

2. Enable SME by setting "mem_encrypt=on" on the kernel command line

3. Trigger a sysrq panic

4. Run `kmem -s` in crash to check the vmcore

Actual results:

#crash vmlinux vmcore
......
crash> kmem -s | grep -i invalid
kmem: dma-kmalloc-512: slab: ffffe192c0001000 invalid freepointer: e5ffef4e9a040b7e
kmem: dma-kmalloc-512: slab: ffffe192c0001000 invalid freepointer: e5ffef4e9a040b7e
Comment 1 lijiang 2019-09-08 08:39:45 UTC
As we know, the kdump kernel reuses the first 640K of memory for some reasons, so the old content of that area is copied to a backup area, which is done in purgatory(). When dumping the vmcore, the kdump kernel reads the old content of the first 640K from the backup area.

The main cause also seems clear: the kernel does not handle the first 640K region correctly when SME is enabled, so the old memory content is not properly copied to the backup area in purgatory(). As a result, the kdump kernel reads incorrect content from the backup region when dumping the vmcore.
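
To illustrate why a backup copy can come out corrupted under memory encryption, here is a toy model in Python. It is only a sketch under stated assumptions: it assumes the encryption depends on the physical address, and it uses XOR with an address-derived keystream as a stand-in for the real cipher. The key, the addresses, and all function names are hypothetical; none of this is kernel or purgatory code.

```python
import hashlib

KEY = b"toy-sme-key"  # hypothetical key, for illustration only


def keystream(phys_addr, length):
    """Derive a keystream that depends on the physical address (toy stand-in)."""
    out = b""
    block = 0
    while len(out) < length:
        seed = KEY + phys_addr.to_bytes(8, "little") + block.to_bytes(4, "little")
        out += hashlib.sha256(seed).digest()
        block += 1
    return out[:length]


def crypt(data, phys_addr):
    """XOR with the address-dependent keystream; encrypts and decrypts alike."""
    return bytes(d ^ k for d, k in zip(data, keystream(phys_addr, len(data))))


def backup_raw(dram, src, dst):
    """Copy through an unencrypted mapping: the ciphertext is moved as-is."""
    dram[dst] = dram[src]


def backup_reencrypted(dram, src, dst):
    """Copy through encrypted mappings: decrypt at src, re-encrypt at dst."""
    dram[dst] = crypt(crypt(dram[src], src), dst)


LOW_MEM = 0x1000      # hypothetical address inside the first 640K
BACKUP = 0x7F000000   # hypothetical address of the backup area
plain = b"freepointer=0x1234"

# The first kernel stored this data encrypted at LOW_MEM.
dram = {LOW_MEM: crypt(plain, LOW_MEM)}

# Case 1: raw copy, then the dump reads the backup through an encrypted mapping.
backup_raw(dram, LOW_MEM, BACKUP)
raw_result = crypt(dram[BACKUP], BACKUP)

# Case 2: the copy itself goes through encrypted mappings on both sides.
backup_reencrypted(dram, LOW_MEM, BACKUP)
good_result = crypt(dram[BACKUP], BACKUP)

print(raw_result == plain)
print(good_result == plain)
```

In this model the raw copy does not read back as the original data, while the re-encrypted copy does. That is consistent with the garbage freepointer seen in `kmem -s`, but it is only an illustration of the suspected mechanism, not a confirmed analysis of the purgatory() code path.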

This bug is definitely related to memory encryption. Any ideas about this? Thanks.
Comment 2 lijiang 2019-12-18 07:06:20 UTC
Fixed in v5.5-rc1. Thanks.
