Bug 203125

Summary: Kernel 5.1-rc1 fails to boot on a PowerMac G4 3,6: Caused by (from SRR1=141020): Transfer error ack signal
Product: Platform Specific/Hardware Reporter: Erhard F. (erhard_f)
Component: PPC-32Assignee: platform_ppc-32
Status: RESOLVED CODE_FIX    
Severity: normal CC: christophe.leroy
Priority: P1    
Hardware: PPC-32   
OS: Linux   
Kernel Version: 5.1-rc1 Subsystem:
Regression: No Bisected commit-id:
Attachments: kernel .config
kernel panic (5.1-rc3, PowerMac G4 3,6)
bisect.log

Description Erhard F. 2019-04-01 16:32:53 UTC
Created attachment 282083 [details]
kernel .config

Kernel 5.1-rc3 fails to boot on my PowerMac G4 3,6. -rc2 fails too with the same error message. Did not try -rc1 on this machine. Kernel 5.0.5 runs fine (apart from bug #202017).

On my PowerMac G5 however 5.1 runs fine from -rc1 on. Kernel .config on both machines is pretty similar.

Sorry for the 'screenshot kernel log' but I got no other means of debugging on my machines in this early stage of the boot process.
Comment 1 Erhard F. 2019-04-01 16:33:54 UTC
Created attachment 282085 [details]
kernel panic (5.1-rc3, PowerMac G4 3,6)
Comment 2 Erhard F. 2019-04-15 12:52:34 UTC
Tried now -rc5 as I was hoping for Merge tag 'powerpc-5.1-5'. But this bug is still around.

Trying to bisect, but got quite a few bisect skip already (113 commits left)...
Comment 3 Erhard F. 2019-04-23 15:37:43 UTC
Narrowed it down a bit. 5.1-rc1 is affected in the same way, 5.0.x is still fine. Still bisecting...
Comment 4 Erhard F. 2019-05-08 22:38:09 UTC
Created attachment 282679 [details]
bisect.log

Finally, the results of the bisect...

f7354ccac844da7b1af8cc4f09da330fa3e960e4 is the first bad commit
commit f7354ccac844da7b1af8cc4f09da330fa3e960e4
Author: Christophe Leroy <christophe.leroy@c-s.fr>
Date:   Thu Jan 31 10:09:04 2019 +0000

    powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
    
    Now that thread_info is similar to task_struct, its address is in r2
    so CURRENT_THREAD_INFO() macro is useless. This patch removes it.
    
    This patch also moves the 'tovirt(r2, r2)' down just before the
    reactivation of MMU translation, so that we keep the physical address
    of 'current' in r2 until then. It avoids a few calls to tophys().
    
    At the same time, as the 'cpu' field is not anymore in thread_info,
    TI_CPU is renamed TASK_CPU by this patch.
    
    It also allows to get rid of a couple of
    '#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE' as ACCOUNT_CPU_USER_ENTRY()
    and ACCOUNT_CPU_USER_EXIT() are empty when
    CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not defined.
    
    Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
    [mpe: Fix a missed conversion of TI_CPU idle_6xx.S]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

:040000 040000 c7a4d4ea7c2864b693e82a075f62e26dfcb82c84 a15e8516fdadda87981702d86b38b0e07672220b M      arch
Comment 5 Christophe Leroy 2019-05-09 10:05:44 UTC
Apparently, MSR DR is not sent and DAR has value 0xc0090654, meaning that a virt address tries to get accessed while in real mode.
Comment 6 Christophe Leroy 2019-05-09 10:31:47 UTC
Try followin change:

diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S
index e27792d0b744..8366c2abeafc 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -539,7 +539,8 @@ _GLOBAL(flush_hash_pages)
 #ifdef CONFIG_SMP
     lis    r9, (mmu_hash_lock - PAGE_OFFSET)@ha
     addi    r9, r9, (mmu_hash_lock - PAGE_OFFSET)@l
-    lwz    r8,TASK_CPU(r2)
+    tophys    (r8, r2)
+    lwz    r8, TASK_CPU(r8)
     oris    r8,r8,9
 10:    lwarx    r0,0,r9
     cmpi    0,r0,0
Comment 7 Christophe Leroy 2019-05-09 10:50:43 UTC
On 05/09/2019 10:05 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=203125
> 
> Christophe Leroy (christophe.leroy@c-s.fr) changed:
> 
>             What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                   CC|                            |christophe.leroy@c-s.fr
> 
> --- Comment #5 from Christophe Leroy (christophe.leroy@c-s.fr) ---
> Apparently, MSR DR is not sent and DAR has value 0xc0090654, meaning that a
> virt address tries to get accessed while in real mode.
> 

Could you try the patch below:

diff --git a/arch/powerpc/mm/book3s32/hash_low.S 
b/arch/powerpc/mm/book3s32/hash_low.S
index e27792d0b744..8366c2abeafc 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -539,7 +539,8 @@ _GLOBAL(flush_hash_pages)
  #ifdef CONFIG_SMP
  	lis	r9, (mmu_hash_lock - PAGE_OFFSET)@ha
  	addi	r9, r9, (mmu_hash_lock - PAGE_OFFSET)@l
-	lwz	r8,TASK_CPU(r2)
+	tophys	(r8, r2)
+	lwz	r8, TASK_CPU(r8)
  	oris	r8,r8,9
  10:	lwarx	r0,0,r9
  	cmpi	0,r0,0


Christophe
Comment 8 Erhard F. 2019-05-09 11:36:35 UTC
I tried to apply the patch on top of 5.1.0 but it did not work out 'cause there is no arch/powerpc/mm/book3s32/hash_low.S. The correct file on my system seemed arch/powerpc/mm/hash_low_32.S, but changing the path in the patch did not work either.

What actually did work was manually applying your proposed change in arch/powerpc/mm/book3s32/hash_low.S! Now the G4 boots up fine with 5.1.0 as it did with 5.0.x.

Many thanks for the fix!
Comment 9 Erhard F. 2019-05-09 11:42:21 UTC
@Christophe: Oops, accidentally I trashed your last comment. Sorry! This patch didn't apply either. There is no book3s32 directory in my linux-stable tree, the hash_*.S files are directly in mm/

But does not matter, your code change works!
Comment 10 Christophe Leroy 2019-05-10 05:36:30 UTC
Thanks for testing.

Proposed patch at https://patchwork.ozlabs.org/patch/1097492/

Note that the backport to stable should be straight forward as the directory and file name changes are in following commit:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=17312f258cf6eb584f276ad592972ade7e16e318
Comment 11 Erhard F. 2019-05-22 20:18:50 UTC
Your fix landed in 5.1.4 stable now, the G4 boots fine again. Thanks!