Bug 15076

Summary: System panic under load with clockevents_program_event
Product: Platform Specific/Hardware
Reporter: David Heidelberg (okias) (david)
Component: i386
Assignee: platform_i386
Status: CLOSED INSUFFICIENT_DATA
Severity: high
CC: florian, john.stultz, rjw, ruan.zhengwang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.33-rc4-git4
Tree: Mainline
Regression: Yes
Bug Depends on:
Bug Blocks: 14885
Attachments: screen_part1.jpg, screen_part2.jpg, dmesg, kernel_config

Description David Heidelberg (okias) 2010-01-17 13:03:38 UTC
Created attachment 24603
screen_part1.jpg

This issue can be triggered, for example, by compiling boost. Screenshots of the panic are attached.
I can test patches if needed.
Comment 1 David Heidelberg (okias) 2010-01-17 13:04:36 UTC
Created attachment 24604
screen_part2.jpg
Comment 2 David Heidelberg (okias) 2010-01-17 13:08:38 UTC
Created attachment 24605
dmesg

dmesg, without panic
Comment 3 David Heidelberg (okias) 2010-01-17 13:09:24 UTC
Created attachment 24606
kernel_config

Kernel configuration
Comment 4 David Heidelberg (okias) 2010-01-22 10:17:25 UTC
And it's a regression. I'm now running 2.6.32.3 without problems.
Comment 5 Thomas Gleixner 2010-01-25 08:46:14 UTC
Switched to email. Please reply to all instead of using the bugzilla
interface.

> --- Comment #4 from okias <d.okias@gmail.com>  2010-01-22 10:17:25 ---
> and it's regression. Now I work on 2.6.32.3 and no problem.

That's a really weird one. The system has been up and running for 50 min
and out of the blue it crashes in clockevents_program_event(). This
function has been called a couple of thousand times before that point.

The only way to crash there is when *dev is pointing into nirvana. dev
comes from

int tick_program_event(ktime_t expires, int force)
{
        struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;

according to the callchain. At this point nothing fiddles with
tick_cpu_device.evtdev, so I suspect some really nasty memory
corruption going on.

okias, can you please disable highmem support and verify whether the
problem persists?

Thanks,

	tglx
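
The reasoning in the comment above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the simplified struct layouts, every `model_` name, and the sanity check itself are hypothetical and are not kernel code. It shows that programming an event dereferences the per-CPU `evtdev` pointer, so a pointer that was overwritten after setup is the one thing that can make an otherwise routine call fail.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2  /* hypothetical CPU count for this model */

/* Simplified stand-in for the kernel's struct clock_event_device. */
struct clock_event_device {
    int (*set_next_event)(unsigned long delta, struct clock_event_device *dev);
    unsigned long last_delta;   /* records the last programmed delta */
};

/* Simplified stand-in for the per-CPU struct tick_device. */
struct tick_device {
    struct clock_event_device *evtdev;
};

struct clock_event_device model_devices[MODEL_NR_CPUS];
struct tick_device model_tick_cpu_device[MODEL_NR_CPUS];

/* Writes through dev; this is the access that faults on a stray pointer. */
int model_set_next_event(unsigned long delta, struct clock_event_device *dev)
{
    dev->last_delta = delta;
    return 0;
}

/* Registers one device per modelled CPU. */
void model_init(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        model_devices[cpu].set_next_event = model_set_next_event;
        model_devices[cpu].last_delta = 0;
        model_tick_cpu_device[cpu].evtdev = &model_devices[cpu];
    }
}

/* Returns 1 when the CPU's evtdev still points at its registered device. */
int model_evtdev_is_sane(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return model_tick_cpu_device[cpu].evtdev == &model_devices[cpu];
}

/* Returns 0 on success, -1 when evtdev no longer matches the registration. */
int model_tick_program_event(int cpu, unsigned long delta)
{
    struct clock_event_device *dev;

    if (!model_evtdev_is_sane(cpu))
        return -1;
    dev = model_tick_cpu_device[cpu].evtdev;
    return dev->set_next_event(delta, dev);
}
```

After `model_init()`, a call such as `model_tick_program_event(0, 100)` programs the model device for CPU 0. If `model_tick_cpu_device[0].evtdev` is overwritten afterwards, the same call returns -1 instead of following the stray pointer.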
Comment 6 David Heidelberg (okias) 2010-01-25 11:33:12 UTC
Okay, I'll try without highmem soon.

Comment 7 David Heidelberg (okias) 2010-01-26 14:07:47 UTC
Latest git without HIGHMEM looks good. If no problems occur, I will
try a kernel with HIGHMEM again and see what changes...

Comment 8 David Heidelberg (okias) 2010-01-28 23:55:44 UTC
I'm afraid it froze even without HIGHMEM. I left the computer running
(compiling KDE 4.3.95) and when I came back, the LCD was in DPMS mode
and the LED was blinking. So it looks like the same problem. Maybe
without HIGHMEM it just takes longer until the problem appears.

Comment 9 Rafael J. Wysocki 2010-02-02 20:44:11 UTC
On Monday 01 February 2010, okias wrote:
> Still valid. Without HIGHMEM it reproduces only rarely, but it is still annoying.
> 
> 2010/2/1, Rafael J. Wysocki <rjw@sisk.pl>:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32.  Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=15076
> > Subject             : System panic under load with clockevents_program_event
> > Submitter   : okias <d.okias@gmail.com>
> > Date                : 2010-01-17 13:03 (15 days old)
Comment 10 Florian Mickler 2012-03-13 23:06:48 UTC
Is this still a problem on the current mainline kernel (3.2)?
Comment 11 David Heidelberg (okias) 2012-03-14 10:01:12 UTC
Can't test, no longer using that machine.