Bug 15076 - System panic under load with clockevents_program_event
Summary: System panic under load with clockevents_program_event
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386
Hardware: All
OS: Linux
Importance: P1 high
Assignee: platform_i386
URL:
Keywords:
Depends on:
Blocks: 14885
Reported: 2010-01-17 13:03 UTC by David Heidelberg (okias)
Modified: 2012-03-15 00:09 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.33-rc4-git4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
screen_part1.jpg (562.49 KB, image/jpeg)
2010-01-17 13:03 UTC, David Heidelberg (okias)
screen_part2.jpg (591.62 KB, image/jpeg)
2010-01-17 13:04 UTC, David Heidelberg (okias)
dmesg (44.79 KB, text/plain)
2010-01-17 13:08 UTC, David Heidelberg (okias)
kernel_config (61.66 KB, text/plain)
2010-01-17 13:09 UTC, David Heidelberg (okias)

Description David Heidelberg (okias) 2010-01-17 13:03:38 UTC
Created attachment 24603
screen_part1.jpg

This issue can be triggered, for example, by compiling Boost. Screenshots of the panic are attached.
If needed, I can test patches.
Comment 1 David Heidelberg (okias) 2010-01-17 13:04:36 UTC
Created attachment 24604
screen_part2.jpg
Comment 2 David Heidelberg (okias) 2010-01-17 13:08:38 UTC
Created attachment 24605
dmesg

dmesg output captured during normal operation (it does not contain the panic)
Comment 3 David Heidelberg (okias) 2010-01-17 13:09:24 UTC
Created attachment 24606
kernel_config

Kernel configuration
Comment 4 David Heidelberg (okias) 2010-01-22 10:17:25 UTC
And it's a regression: I'm now running 2.6.32.3 with no problems.
Comment 5 Thomas Gleixner 2010-01-25 08:46:14 UTC
Switched to email. Please reply to all instead of using the bugzilla
interface.

> --- Comment #4 from okias <d.okias@gmail.com>  2010-01-22 10:17:25 ---
> and it's regression. Now I work on 2.6.32.3 and no problem.

That's a really weird one. The system is 50 min up and running and out
of the blue it crashes in clockevents_program_event(). This function
has been called a couple of thousand times before that point.

The only way to crash there is when dev is pointing into nirvana. dev
comes from

int tick_program_event(ktime_t expires, int force)
{
        struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;

according to the callchain. At this point nothing fiddles with
tick_cpu_device.evtdev, so I suspect some really nasty memory
corruption going on.

okias, can you please disable highmem support and verify whether the
problem persists?

Thanks,

	tglx
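To make the failure mode in the analysis above concrete, here is a minimal user-space sketch in C. It is a model, not kernel code: every sim_* name, the device registry, and the simplified time type are hypothetical stand-ins that only mimic the shape of tick_program_event() and clockevents_program_event(). It shows the device pointer being read from a per-CPU slot and then dereferenced without validation, so if that slot is corrupted, the fault lands in the programming function rather than where the corruption happened. The registry check in sim_tick_program_event() is a hypothetical debugging aid; the real code has no such check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the tick -> clockevents call chain.
 * The sim_* names only mimic the shape of the kernel code; they are not
 * the kernel API.
 */
typedef int64_t sim_ktime_t;	/* nanoseconds; stands in for ktime_t */

enum { SIM_OK = 0, SIM_ETIME = -1, SIM_EBADDEV = -2 };

struct sim_clock_event_device {
	const char *name;
	sim_ktime_t next_event;
	int (*set_next_event)(sim_ktime_t delta,
			      struct sim_clock_event_device *dev);
};

struct sim_tick_device {
	struct sim_clock_event_device *evtdev;
};

#define SIM_NR_CPUS  2
#define SIM_MAX_DEVS 4

/* Stand-in for the per-CPU variable tick_cpu_device. */
static struct sim_tick_device sim_tick_cpu_device[SIM_NR_CPUS];

/* Hypothetical registry of devices that were really registered. */
static struct sim_clock_event_device *sim_registered[SIM_MAX_DEVS];
static size_t sim_nr_registered;

static int sim_lapic_set_next_event(sim_ktime_t delta,
				    struct sim_clock_event_device *dev)
{
	(void)delta;	/* a real driver would write this to hardware */
	(void)dev;
	return SIM_OK;
}

static struct sim_clock_event_device sim_lapic = {
	.name = "sim-lapic",
	.set_next_event = sim_lapic_set_next_event,
};

/* Never registered: models whatever a corrupted evtdev points at. */
static struct sim_clock_event_device sim_stale_dev = { .name = "stale" };

static void sim_register(struct sim_clock_event_device *dev)
{
	assert(sim_nr_registered < SIM_MAX_DEVS);
	sim_registered[sim_nr_registered++] = dev;
}

static int sim_is_registered(const struct sim_clock_event_device *dev)
{
	for (size_t i = 0; i < sim_nr_registered; i++)
		if (sim_registered[i] == dev)
			return 1;
	return 0;
}

/* Roughly mirrors clockevents_program_event(): everything after the
 * delta check dereferences dev, which is where a garbage pointer faults. */
static int sim_clockevents_program_event(struct sim_clock_event_device *dev,
					 sim_ktime_t expires, sim_ktime_t now)
{
	sim_ktime_t delta = expires - now;

	if (delta <= 0)
		return SIM_ETIME;	/* expiry already in the past */
	dev->next_event = expires;
	return dev->set_next_event(delta, dev);
}

/* Roughly mirrors tick_program_event(): read this CPU's device and
 * program it. The registry check is hypothetical; the real code trusts
 * evtdev unconditionally. */
static int sim_tick_program_event(unsigned int cpu, sim_ktime_t expires,
				  sim_ktime_t now)
{
	struct sim_clock_event_device *dev;

	assert(cpu < SIM_NR_CPUS);
	dev = sim_tick_cpu_device[cpu].evtdev;
	if (!sim_is_registered(dev))
		return SIM_EBADDEV;	/* the kernel would dereference it */
	return sim_clockevents_program_event(dev, expires, now);
}

/* Register one device and install it on every simulated CPU. */
static void sim_setup(void)
{
	sim_register(&sim_lapic);
	for (unsigned int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		sim_tick_cpu_device[cpu].evtdev = &sim_lapic;
}
```

After sim_setup(), sim_tick_program_event(0, 2000, 1000) programs the simulated device (its next_event becomes 2000), while an expiry earlier than now returns SIM_ETIME. Pointing sim_tick_cpu_device[1].evtdev at sim_stale_dev, or at NULL, makes the hypothetical check return SIM_EBADDEV; without that check the model would call through a device whose set_next_event is NULL, which is the kind of crash described above.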
Comment 6 David Heidelberg (okias) 2010-01-25 11:33:12 UTC
Okay, I'll try without highmem soon.

Comment 7 David Heidelberg (okias) 2010-01-26 14:07:47 UTC
Latest git without HIGHMEM looks good. If no problems occur, I'll then
try a kernel with HIGHMEM and see what changes.

Comment 8 David Heidelberg (okias) 2010-01-28 23:55:44 UTC
I'm afraid it froze even without HIGHMEM. I left the computer running
(compiling KDE 4.3.95), and when I came back the LCD was in DPMS
power-saving mode and the LED was blinking. So it looks like the same
problem; maybe without HIGHMEM it just takes longer to appear.

Comment 9 Rafael J. Wysocki 2010-02-02 20:44:11 UTC
On Monday 01 February 2010, okias wrote:
> Still valid. Without HIGHMEM it reproduces only rarely, but it is still annoying.
> 
> 2010/2/1, Rafael J. Wysocki <rjw@sisk.pl>:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.32.  Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15076
> > Subject   : System panic under load with clockevents_program_event
> > Submitter : okias <d.okias@gmail.com>
> > Date      : 2010-01-17 13:03 (15 days old)
Comment 10 Florian Mickler 2012-03-13 23:06:48 UTC
Is this still a problem on the current mainline kernel (3.2)?
Comment 11 David Heidelberg (okias) 2012-03-14 10:01:12 UTC
Can't test; I'm no longer using that machine.
