Bug 16573

Summary: Oops in trace_hardirqs_on (powerpc)
Product: Platform Specific/Hardware
Reporter: Maciej Rutecki (maciej.rutecki)
Component: PPC-32
Assignee: platform_ppc-32
Severity: normal
CC: alan, joerg, maciej.rutecki, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35
Tree: Mainline
Regression: No
Attachments: test patch for ppc32

Description Maciej Rutecki 2010-08-12 20:25:33 UTC
Subject    : Oops in trace_hardirqs_on (powerpc)
Submitter  : Jörg Sommer <joerg@alea.gnuu.de>
Date       : 2010-08-06 23:31
Message-ID : 20100806233157.GA7117@alea.gnuu.de
References : http://marc.info/?l=linux-kernel&m=128114139412842&w=2

This entry is being used for tracking a regression from 2.6.34. Please don't
close it until the problem is fixed in the mainline.
Comment 1 Jörg Sommer 2010-08-15 11:21:27 UTC
I don't think it's a regression. I've tested 2.6.32 and 2.6.33, and they have this problem, too.
Comment 2 Jörg Sommer 2010-08-16 17:48:25 UTC
I think the problem is in the second line:

void trace_hardirqs_on(void)
	if (!preempt_trace() && irq_trace())
		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);

The call trace shows only two stack frames, but CALLER_ADDR1 asks for the caller of the previous stack frame, i.e. the owner of a third stack frame. That frame does not exist, so the lookup fails.

[   55.394527] Call Trace:
[   55.401878] [e3211f20] [10019c58] 0x10019c58 (unreliable)
[   55.409437] [e3211f40] [c001771c] restore+0x10/0x6c
[   55.417065] --- Exception: c00 at 0xff23c88

The bad code is this:

   0xc00aad90 <+84>:    lwz     r8,0(r31)
   0xc00aad94 <+88>:    lwz     r8,0(r8)
   0xc00aad98 <+92>:    lwz     r27,4(r8)

r31 is the pointer to the current stack frame. The first instruction loads the pointer to the previous stack frame (stored at the beginning of the current stack frame), the second loads the pointer to the frame before that, and the third tries to load the saved return address from that frame. This is where it fails: the previous stack frame has no predecessor, so the address loaded by the second instruction is invalid, and dereferencing it in the third instruction causes the fault.

The gcc internals info page says in section 5.45 __builtin_return_address(): “when the top of the stack has been reached, this function will return `0' or a random value.”
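
To illustrate the three loads above, here is a minimal userspace sketch of the same back-chain walk. It is only a model, not kernel code and not a proposed fix: struct frame and caller_addr_checked() are hypothetical names, and the layout (back chain in word 0, saved return address in word 1) is a simplification of the ppc32 frame header. Unlike the generated code, it checks the back chain before following it and returns 0 when the chain ends.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a ppc32 stack frame header. */
struct frame {
	const struct frame *back_chain; /* word 0: caller's frame, NULL at the top */
	uintptr_t lr_save;              /* word 1: saved return address */
};

/*
 * Model of __builtin_return_address(level) for level 0 or 1.
 * The real code does the equivalent of
 *     fp->back_chain->back_chain->lr_save
 * for level 1 without any check; this version stops when the
 * chain ends and returns 0 instead of dereferencing a bad pointer.
 */
static uintptr_t caller_addr_checked(const struct frame *fp, unsigned int level)
{
	/* First load: step from the current frame to the previous one. */
	const struct frame *f = fp ? fp->back_chain : NULL;
	unsigned int i;

	/* One further load per requested level, guarded against NULL. */
	for (i = 0; i < level && f != NULL; i++)
		f = f->back_chain;

	return f ? f->lr_save : 0;
}
```

With a chain of only two frames, level 0 yields the saved return address of the previous frame, while level 1 runs off the end of the chain and yields 0 here. In the real trace the back chain is followed unchecked, which is the failing third load.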

The call comes from this code in arch/powerpc/kernel/entry_32.S:880:

(gdb) disassemble restore
Dump of assembler code for function resume_kernel:
   0xc001770c <+0>:     lwz     r9,148(r1)
   0xc0017710 <+4>:     andi.   r10,r9,32768
   0xc0017714 <+8>:     beq     0xc0017720 <resume_kernel+20>
   0xc0017718 <+12>:    bl      0xc00aad3c <trace_hardirqs_on>
   0xc001771c <+16>:    lwz     r9,148(r1)
   0xc0017720 <+20>:    lwz     r0,16(r1)
   0xc0017724 <+24>:    lwz     r2,24(r1)

The question is: why is there no previous stack frame? Is the stack corrupt, or is the assumption wrong that trace_hardirqs_on() is always called on a deeper stack?
Comment 3 Rafael J. Wysocki 2010-08-16 18:51:57 UTC
Dropping from the list of recent regressions as per comment #1.
Comment 4 Steven Rostedt 2010-12-23 02:40:15 UTC
Created attachment 41412 [details]
test patch for ppc32

Could someone test this patch? I have a similar fix for ppc64 which I'll post to LKML soon. I have only compile-tested this for ppc32, as my kids have confiscated my only ppc32 box ;-)
Comment 5 Jörg Sommer 2011-01-29 01:03:27 UTC
This bug was resolved by commit 06ca2188eccbd7932636ac5bde2837297800480e »powerpc/ppc32/tracing: Add stack frame to calls of trace_hardirqs_on/off«