Bug 204793

Summary: [SME] crash: `kmem -s` reported "kmem: dma-kmalloc-512: slab: ffffe192c0001000 invalid freepointer: e5ffef4e9a040b7e" on a dumped vmcore
Product: Memory Management
Reporter: lijiang
Component: Other
Assignee: Andrew Morton (akpm)
Status: RESOLVED CODE_FIX    
Severity: high    
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: v5.3-rc7
Subsystem:
Regression: No
Bisected commit-id:

Description lijiang 2019-09-08 08:38:45 UTC
The issue was found on the following AMD ROME machine:
Serial Number 	diesel-sys9079-0001
Vendor 	AuthenticAMD
Model Name 	AMD EPYC 7601 32-Core Processor

The crash command `kmem -s` reported "kmem: dma-kmalloc-512: slab: ffffe192c0001000 invalid freepointer: e5ffef4e9a040b7e" on a dumped vmcore.

How reproducible:
About 70%

Steps to Reproduce:
1. Install the latest kernel, for example:
commit 089cf7f6ecb266b6a4164919a2e69bd2f938374a (HEAD -> v5.3-rc7, tag: v5.3-rc7)
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Sep 2 09:57:40 2019 -0700

    Linux 5.3-rc7

2. Enable SME by setting "mem_encrypt=on" on the kernel command line

3. Trigger a sysrq panic

4. Run `kmem -s` in crash to check the vmcore
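
Steps 2-4 can be sketched as a few shell helpers. This is only an illustration, not part of the original report: the function names, the optional file argument of sme_requested, and the file name kmem.out are assumptions made for the example. Defining the functions has no side effects; trigger_sysrq_panic crashes the machine when called, so it should only be run as root on a test system with kdump configured.

```shell
#!/bin/sh
# Hypothetical helpers for steps 2-4; nothing runs until a function is called.

# Step 2: succeed if the kernel command line contains mem_encrypt=on.
# The optional file argument (default: /proc/cmdline) is an assumption
# added so the check can be tried on a saved command line.
sme_requested() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'mem_encrypt=on'
}

# Step 3: trigger a sysrq panic. WARNING: this crashes the running kernel.
trigger_sysrq_panic() {
    echo 1 > /proc/sys/kernel/sysrq   # enable all SysRq functions
    echo c > /proc/sysrq-trigger      # 'c' forces a kernel crash
}

# Step 4: read saved `kmem -s` output on stdin and print each distinct
# "invalid freepointer" line once.
distinct_invalid_freepointers() {
    grep 'invalid freepointer' | sort -u
}
```

For example, assuming the `kmem -s` output was saved to a file named kmem.out, `distinct_invalid_freepointers < kmem.out` lists the affected slabs without the repeated lines.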

Actual results:

#crash vmlinux vmcore
......
crash> kmem -s | grep -i invalid
kmem: dma-kmalloc-512: slab: ffffe192c0001000 invalid freepointer: e5ffef4e9a040b7e
kmem: dma-kmalloc-512: slab: ffffe192c0001000 invalid freepointer: e5ffef4e9a040b7e
Comment 1 lijiang 2019-09-08 08:39:45 UTC
As we know, the kdump kernel reuses the first 640k area for some reason, so the old content of that area is copied to a backup area, which is done in purgatory(). When dumping the vmcore, the kdump kernel reads the old content of the first 640k area from the backup area.

The main cause seems clear: the kernel does not handle the first 640k region correctly when SME is enabled, so the old memory content is not copied properly to the backup area in purgatory(). As a result, the kdump kernel reads incorrect content from the backup region when dumping the vmcore.

This bug is definitely related to memory encryption. Any ideas about this? Thanks.
Comment 2 lijiang 2019-12-18 07:06:20 UTC
Fixed in v5.5-rc1. Thanks.