Subject : Oops in trace_hardirqs_on (powerpc)
Submitter : Jörg Sommer <email@example.com>
Date : 2010-08-06 23:31
Message-ID : 20100806233157.GA7117@alea.gnuu.de
References : http://marc.info/?l=linux-kernel&m=128114139412842&w=2
This entry is being used for tracking a regression from 2.6.34. Please don't
close it until the problem is fixed in the mainline.
I don't think it's a regression. I've tested 2.6.32 and 2.6.33, and they have this problem, too.
I think the problem is in the second line:
if (!preempt_trace() && irq_trace())
The call trace shows only two stack frames, but CALLER_ADDR1 asks for the caller of the previous stack frame, i.e. the owner of a third stack frame. That frame does not exist, so the lookup fails.
[ 55.394527] Call Trace:
[ 55.401878] [e3211f20] [10019c58] 0x10019c58 (unreliable)
[ 55.409437] [e3211f40] [c001771c] restore+0x10/0x6c
[ 55.417065] --- Exception: c00 at 0xff23c88
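For anyone not familiar with the macros: as far as I know, kernels of this era define CALLER_ADDR0 and CALLER_ADDR1 in terms of __builtin_return_address(0) and __builtin_return_address(1) when frame pointers are enabled. Treat that as an assumption and check include/linux/ftrace.h in your tree. The user-space sketch below only illustrates the idea; the SKETCH_* names and who_called_me() are made up for this comment, and the noinline attribute is GCC/clang specific.

```c
#include <stdint.h>

/* Hypothetical stand-ins, NOT the kernel's definitions. */
#define SKETCH_CALLER_ADDR0 ((uintptr_t)__builtin_return_address(0))
/* Level 1 needs one more stack frame above the caller. It is defined
 * here for illustration only and deliberately never used, because it is
 * the unsafe case discussed in this report. */
#define SKETCH_CALLER_ADDR1 ((uintptr_t)__builtin_return_address(1))

/* Returns the address this function will return to, i.e. a location
 * inside whoever called it. noinline keeps the call a real call. */
__attribute__((noinline)) uintptr_t who_called_me(void)
{
    return SKETCH_CALLER_ADDR0;
}
```

Level 0 only needs the current function's own return address, so it is always available. Level 1 has to follow the frame chain one step further, which is where things go wrong when that frame is missing.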
The bad code is this:
0xc00aad90 <+84>: lwz r8,0(r31)
0xc00aad94 <+88>: lwz r8,0(r8)
0xc00aad98 <+92>: lwz r27,4(r8)
r31 is the pointer to the current stack frame. The first instruction loads the pointer to the previous stack frame (stored at the beginning of the current stack frame), the second loads the pointer to the last-but-one stack frame, and the third tries to load the saved return address (at offset 4) from that last-but-one stack frame. This is what fails: the last stack frame has no predecessor, so the address loaded by the second instruction is invalid, and dereferencing it in the third instruction causes the fault.
The gcc internals info page says in section 5.45 __builtin_return_address(): “when the top of the stack has been reached, this function will return `0' or a random value.”
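To make the three loads concrete, here is a small user-space model of the frame walk. It is only a sketch under assumptions: struct frame and caller_addr1_checked() are names invented for this comment, the end of the chain is modelled as NULL (in the real oops the value was merely invalid), and on a 64-bit host the second field is not at offset 4 as it is on ppc32. The checked version returns 0 at the top of the stack instead of dereferencing a bad pointer, which is one of the outcomes the gcc text above allows.

```c
#include <assert.h>  /* static_assert */
#include <stddef.h>  /* offsetof, NULL */

/* Simplified, hypothetical model of a stack frame header: the back
 * chain pointer comes first, followed by the saved return address
 * (on ppc32 these sit at offsets 0 and 4). */
struct frame {
    struct frame *back_chain; /* previous frame, NULL at the top of the stack */
    unsigned long lr_save;    /* saved return address */
};

static_assert(offsetof(struct frame, back_chain) == 0,
              "back chain must be the first word of the frame");

/* Mirrors the three loads from the disassembly, but stops and returns 0
 * when the chain ends instead of dereferencing an invalid pointer. */
unsigned long caller_addr1_checked(const struct frame *cur)
{
    const struct frame *prev;
    const struct frame *prev2;

    prev = cur->back_chain;      /* lwz r8,0(r31) */
    if (prev == NULL)
        return 0;

    prev2 = prev->back_chain;    /* lwz r8,0(r8) */
    if (prev2 == NULL)
        return 0;                /* the case in this report: no third frame */

    return prev2->lr_save;       /* lwz r27,4(r8) */
}
```

With a chain of three frames the function returns the saved return address of the outermost one; with only two frames, as in the call trace above, it returns 0 where the unchecked code faults.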
The call comes from this code in arch/powerpc/kernel/entry_32.S:880:
(gdb) disassemble restore
Dump of assembler code for function resume_kernel:
0xc001770c <+0>: lwz r9,148(r1)
0xc0017710 <+4>: andi. r10,r9,32768
0xc0017714 <+8>: beq 0xc0017720 <resume_kernel+20>
0xc0017718 <+12>: bl 0xc00aad3c <trace_hardirqs_on>
0xc001771c <+16>: lwz r9,148(r1)
0xc0017720 <+20>: lwz r0,16(r1)
0xc0017724 <+24>: lwz r2,24(r1)
The question is: why is there no previous stack frame? Is the stack corrupt, or is the assumption wrong that trace_hardirqs_on() is always called on a deeper stack?
Dropping from the list of recent regressions as per comment #1.
Created attachment 41412
test patch for ppc32
Could someone test this patch? I have a similar fix for ppc64, which I'll post to LKML soon. I have only compile-tested this for ppc32, as my kids have confiscated my only ppc32 box ;-)
This bug was resolved by commit 06ca2188eccbd7932636ac5bde2837297800480e »powerpc/ppc32/tracing: Add stack frame to calls of trace_hardirqs_on/off«