Bug 200429 - ARM64: Kernel panic on executing simple devmem 0x0 command
Summary: ARM64: Kernel panic on executing simple devmem 0x0 command
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: ARM
Hardware: ARM Linux
Importance: P1 normal
Assignee: linux-arm-kernel@lists.arm.linux.org.uk
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-06 11:24 UTC by Hari Kishore Vyas
Modified: 2018-07-06 11:30 UTC
CC List: 1 user

See Also:
Kernel Version: 4.17,4.18
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Proposed fix to avoid issue. (1.08 KB, patch)
2018-07-06 11:30 UTC, Hari Kishore Vyas

Description Hari Kishore Vyas 2018-07-06 11:24:29 UTC
The kernel panics when a simple devmem 0x0 command is run from the command prompt.
The issue did not occur on 4.14 at least; I started observing it after we recently moved to 4.17.
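For readers unfamiliar with devmem: it reads a physical address by mapping the page of /dev/mem that contains it and loading from that mapping. The C sketch below is a minimal, read-only approximation written for this report, assuming a busybox-style devmem that performs a 32-bit read by default. devmem_split() and devmem_read32() are hypothetical names, and the /sbin/devmem on the board may differ in details. For 0x0 the mapped page starts at physical address 0 and the load is at offset 0 within it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: split a physical address into the page-aligned
 * offset passed to mmap() on /dev/mem and the target's offset in that page. */
struct devmem_window {
    uint64_t page_base;   /* mmap() offset, a multiple of page_size */
    uint64_t page_offset; /* byte offset of the target inside the mapping */
};

struct devmem_window devmem_split(uint64_t phys, uint64_t page_size)
{
    struct devmem_window w;

    /* mmap() offsets must be page aligned; page sizes are powers of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    w.page_base = phys & ~(page_size - 1);
    w.page_offset = phys & (page_size - 1);
    return w;
}

/* Hypothetical read-only approximation of "devmem <phys>" (32-bit read).
 * Needs root; returns 0 on success and -1 on any failure. On the board in
 * this report, phys == 0 is the access that ends in the SError panic. */
int devmem_read32(uint64_t phys, uint32_t *out)
{
    long page_size = sysconf(_SC_PAGESIZE);
    struct devmem_window w;
    void *map;
    int fd;

    if (page_size <= 0)
        return -1;
    w = devmem_split(phys, (uint64_t)page_size);
    /* Keep the 4-byte access inside the single mapped page. */
    if (w.page_offset > (uint64_t)page_size - sizeof(uint32_t))
        return -1;

    fd = open("/dev/mem", O_RDONLY | O_SYNC);
    if (fd < 0)
        return -1;
    map = mmap(NULL, (size_t)page_size, PROT_READ, MAP_SHARED, fd,
               (off_t)w.page_base);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    /* volatile: force exactly one 32-bit load from the /dev/mem mapping. */
    *out = *(volatile uint32_t *)((uint8_t *)map + w.page_offset);
    munmap(map, (size_t)page_size);
    return 0;
}
```

Only devmem_split() is safe to exercise off-target; on this board, calling devmem_read32(0x0, &value) as root would presumably end in the same SError as the log below.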

bcm958742k login: root (automatic login)

bcm958742k:~#  /sbin/devmem 0x0
[  192.082901] SError Interrupt on CPU4, code 0xbf000000 -- SError
[  192.082904] CPU: 4 PID: 2424 Comm: devmem Not tainted 4.17.0-02102-gdcfa25a-dirty #109
[  192.082905] Hardware name: Stingray Combo SVK (BCM958742K) (DT)
[  192.082906] pstate: 60000000 (nZCv daif -PAN -UAO)
[  192.082907] pc : 0000ffffa14600e8
[  192.082907] lr : 000000000040cdec
[  192.082908] sp : 0000fffff6b7fce0
[  192.082909] x29: 0000fffff6b7fee0 x28: 0000000000000000 
[  192.082910] x27: 0000000000000000 x26: 0000000000000000 
[  192.082912] x25: 0000fffff6b80010 x24: 0000000000000003 
[  192.082913] x23: 0000000000001000 x22: 0000fffff6b80018 
[  192.082914] x21: 0000000000000000 x20: 0000ffffa1477000 
[  192.082915] x19: 0000000000000020 x18: 00000000000004ca 
[  192.082916] x17: 0000ffffa14600cc x16: 00000000004b6ff8 
[  192.082917] x15: 0000ffffa123fde0 x14: 0000ffffa124d2c8 
[  192.082918] x13: 0000ffffa138eac8 x12: 0000000000000000 
[  192.082919] x11: 0000000000000000 x10: 0101010101010101 
[  192.082921] x9 : 0000ffffa134e300 x8 : 00000000000000de 
[  192.082922] x7 : 0000ffffa134ec00 x6 : 0000000000000000 
[  192.082923] x5 : 0000000000000000 x4 : 0000000000000003 
[  192.082924] x3 : 0000000000000001 x2 : 0000000000000000 
[  192.082925] x1 : 0000000000000008 x0 : 000000000048c4e8 
[  192.082927] Kernel panic - not syncing: Asynchronous SError Interrupt
[  192.082928] CPU: 4 PID: 2424 Comm: devmem Not tainted 4.17.0-02102-gdcfa25a-dirty #109
[  192.082929] Hardware name: Stingray Combo SVK (BCM958742K) (DT)
[  192.082930] Call trace:
[  192.082939]  dump_backtrace+0x0/0x1b8
[  192.082941]  show_stack+0x14/0x1c
[  192.082943]  dump_stack+0x90/0xb0
[  192.082945]  panic+0x140/0x2a8
[  192.082946]  __stack_chk_fail+0x0/0x18
[  192.082947]  arm64_serror_panic+0x74/0x80
[  192.082948]  do_serror+0x48/0xa0
[  192.082949]  el0_error_naked+0x10/0x18
[  192.082954] SMP: stopping secondary CPUs
[  192.082957] Kernel Offset: disabled
[  192.082959] CPU features: 0x21806008
[  192.082959] Memory Limit: none
[  192.267536] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---

The last commit that modified the earlier bad_mode framework says do_serror() should be updated to handle correctable errors, but this case is very basic and should be handled gracefully rather than with a panic.
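The ESR value in the log shows why do_serror() takes the panic path. The call trace (el0_error_naked) shows the SError was taken while the CPU was at EL0. The sketch below decodes ESR_ELx using the layout from the kernel's esr.h (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], with ISS bit 24 being IDS). It reproduces only the first check of arm64_is_ras_serror(): when IDS is set, the syndrome is implementation defined and the error is not treated as a RAS error, so the 4.17 do_serror() (the "-" lines in the diff below) calls arm64_serror_panic() even though the SError was taken from EL0. The macro and function names are illustrative, not the kernel's own, and has_ras_extn stands in for the ARM64_HAS_RAS_EXTN capability check.

```c
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx field layout; values match arch/arm64/include/asm/esr.h. */
#define ESR_EC_SHIFT  26
#define ESR_EC_MASK   0x3fu
#define ESR_EC_SERROR 0x2fu            /* Exception Class: SError interrupt */
#define ESR_IL_BIT    (1u << 25)       /* Instruction Length */
#define ESR_IDS_BIT   (1u << 24)       /* Implementation Defined Syndrome */
#define ESR_ISS_MASK  ((1u << 25) - 1) /* ISS, bits [24:0] */

static inline uint32_t esr_ec(uint32_t esr)
{
    return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

static inline uint32_t esr_iss(uint32_t esr)
{
    return esr & ESR_ISS_MASK;
}

/* Illustrative copy of the first test in arm64_is_ras_serror(): an
 * implementation-defined syndrome is never treated as a RAS error. */
static inline bool is_ras_serror(uint32_t esr, bool has_ras_extn)
{
    if (esr & ESR_IDS_BIT)
        return false;
    return has_ras_extn;
}
```

Applied to the logged code 0xbf000000: EC = 0x2f (SError), IL = 1 and ISS = 0x1000000, so IDS is set and is_ras_serror() returns false even on a CPU with the RAS extension.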

Probable fix:
hv930220@hariv-server:~/ns2_master1/kernel$ git diff arch/arm64/kernel/traps.c
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 8bbdc17..fc265dd 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -728,13 +728,19 @@ bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsign
 
 asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
 {
-       nmi_enter();
-
-       /* non-RAS errors are not containable */
-       if (!arm64_is_ras_serror(esr) || arm64_is_fatal_ras_serror(regs, esr))
-               arm64_serror_panic(regs, esr);
-
-       nmi_exit();
+       if (user_mode(regs)) {
+               pr_crit("USER SError Interrupt on CPU%d, code 0x%08x -- %s\n",
+                       smp_processor_id(), esr, esr_get_class_string(esr));
+               die("Oops - user mode ", regs, 0);
+       } else {
+               pr_crit("KERNEL SError Interrupt on CPU%d, code 0x%08x -- %s\n",
+                       smp_processor_id(), esr, esr_get_class_string(esr));
+               nmi_enter();
+               /* non-RAS errors are not containable */
+               if (!arm64_is_ras_serror(esr) || arm64_is_fatal_ras_serror(regs, esr))
+                       arm64_serror_panic(regs, esr);
+               nmi_exit();
+       }
 }
 
 void __pte_error(const char *file, int line, unsigned long val)
hv930220@hariv-server:~/ns2_master1/kernel$
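To compare the two versions of do_serror() at a glance, the decision logic can be modeled as pure functions. This is an illustrative model written for this report, not kernel code: the enum and function names are hypothetical, from_user stands for user_mode(regs), and the die() branch is summarised as killing the task, assuming panic_on_oops is not set (arm64 die() oopses and then ends the current task unless it is in interrupt context or panic_on_oops is set).

```c
#include <stdbool.h>

/* Hypothetical summary of what do_serror() ends up doing. */
enum serror_action {
    SERROR_PANIC,     /* arm64_serror_panic() */
    SERROR_KILL_TASK, /* die(): oops, then do_exit() unless panic_on_oops */
    SERROR_CONTINUE,  /* non-fatal RAS error: nmi_exit() and return */
};

/* 4.17 behaviour (the "-" lines of the diff). */
enum serror_action serror_policy_4_17(bool is_ras, bool is_fatal_ras)
{
    if (!is_ras || is_fatal_ras)
        return SERROR_PANIC;
    return SERROR_CONTINUE;
}

/* Behaviour with the proposed patch (the "+" lines of the diff). */
enum serror_action serror_policy_patched(bool from_user, bool is_ras,
                                         bool is_fatal_ras)
{
    if (from_user)
        return SERROR_KILL_TASK;
    return serror_policy_4_17(is_ras, is_fatal_ras);
}
```

For the reported case (from_user = true, IDS set so is_ras = false), the 4.17 policy yields SERROR_PANIC while the patched policy yields SERROR_KILL_TASK. One caveat the model makes visible: because SError is asynchronous, user_mode(regs) describes the context that was interrupted when the SError was delivered, which is not guaranteed to be the context that caused it.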
Comment 1 Hari Kishore Vyas 2018-07-06 11:30:56 UTC
Created attachment 277219
Proposed fix to avoid issue.

This is just a simple fix to avoid the issue. I will submit a formal patch soon.
