Bug 15005

Summary: Segmentation fault when shutting down
Product: ACPI Reporter: Martin Bammer (mrb74)
Component: Power-Off    Assignee: acpi_power-off
Status: CLOSED PATCH_ALREADY_AVAILABLE    
Severity: blocking CC: hellvis69, lenb, n6150
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.32.3 Tree: Mainline
Regression: Yes
Attachments: Screenshot of segmentation fault
Another screenshot. Appears after a few minutes.

Description Martin Bammer 2010-01-07 20:59:01 UTC
Created attachment 24479 [details]
Screenshot of segmentation fault

Compiled the newest kernel, 2.6.32.3, for an Atom-based netbook (datacask Jupiter 1014a): one build optimized for Atom CPUs and one optimized for i586.
When shutting down, the kernel produces a segmentation fault at the end of the shutdown process and powering off fails. Both kernels fail with the same error.
Last working kernel was 2.6.32.2.
Comment 1 Martin Bammer 2010-01-07 21:01:08 UTC
Created attachment 24480 [details]
Another screenshot. Appears after a few minutes.
Comment 2 Andrew Morton 2010-01-12 22:07:57 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 7 Jan 2010 20:59:20 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=15005
> 
>            Summary: Segmentation fault when shutting down
>            Product: ACPI
>            Version: 2.5
>     Kernel Version: 2.6.32.3
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: blocking
>           Priority: P1
>          Component: Power-Off
>         AssignedTo: acpi_power-off@kernel-bugs.osdl.org
>         ReportedBy: mrb74@gmx.at
>         Regression: Yes
> 
> 
> Created an attachment (id=24479)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=24479)
> Screenshot of segmentation fault
> 
> Currently compiled newest kernel 2.6.32.3 for Atom based netbook (datacask
> Jupiter 1014a). Compiled the kernel with optimizations for Atom CPUs and also
> with optimizations for i586.
> When shutting down the system the kernel produces a segmentation fault at the
> end of the shutdown process. Powering off fails. Both kernel fail with the
> same
> error.
> Last working kernel was 2.6.32.2.
> 

It's a shutdown-time oops in clockevents_notify().  A 2.6.32.2 ->
2.6.32.3 regression.

2.6.32.3 included this prime suspect:

: commit fa3f5a5c1c8e6a2cbc7e21755ea7c215f8cf0577
: Author: Thomas Gleixner <tglx@linutronix.de>
: Date:   Thu Dec 10 15:35:10 2009 +0100
: 
:     clockevents: Prevent clockevent_devices list corruption on cpu hotplug
:     
:     commit bb6eddf7676e1c1f3e637aa93c5224488d99036f upstream.
: 
: which

So we may well have the same regression in 2.6.33-rcX.

Martin, can you please check whether the below revert fixes things up?

Thanks.


From: Andrew Morton <akpm@linux-foundation.org>

Revert

: commit bb6eddf7676e1c1f3e637aa93c5224488d99036f
: Author:     Thomas Gleixner <tglx@linutronix.de>
: AuthorDate: Thu Dec 10 15:35:10 2009 +0100
: Commit:     Thomas Gleixner <tglx@linutronix.de>
: CommitDate: Fri Dec 11 10:28:08 2009 +0100
: 
:     clockevents: Prevent clockevent_devices list corruption on cpu hotplug

due to the regression reported in
http://bugzilla.kernel.org/show_bug.cgi?id=15005

Cc: Xiaotian Feng <dfeng@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Reported-by: Martin Bammer <mrb74@gmx.at>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/time/clockevents.c |   18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff -puN kernel/time/clockevents.c~revert-clockevents-prevent-clockevent_devices-list-corruption-on-cpu-hotplug kernel/time/clockevents.c
--- a/kernel/time/clockevents.c~revert-clockevents-prevent-clockevent_devices-list-corruption-on-cpu-hotplug
+++ a/kernel/time/clockevents.c
@@ -238,9 +238,8 @@ void clockevents_exchange_device(struct 
  */
 void clockevents_notify(unsigned long reason, void *arg)
 {
-	struct clock_event_device *dev, *tmp;
+	struct list_head *node, *tmp;
 	unsigned long flags;
-	int cpu;
 
 	raw_spin_lock_irqsave(&clockevents_lock, flags);
 	clockevents_do_notify(reason, arg);
@@ -251,19 +250,8 @@ void clockevents_notify(unsigned long re
 		 * Unregister the clock event devices which were
 		 * released from the users in the notify chain.
 		 */
-		list_for_each_entry_safe(dev, tmp, &clockevents_released, list)
-			list_del(&dev->list);
-		/*
-		 * Now check whether the CPU has left unused per cpu devices
-		 */
-		cpu = *((int *)arg);
-		list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
-			if (cpumask_test_cpu(cpu, dev->cpumask) &&
-			    cpumask_weight(dev->cpumask) == 1) {
-				BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
-				list_del(&dev->list);
-			}
-		}
+		list_for_each_safe(node, tmp, &clockevents_released)
+			list_del(node);
 		break;
 	default:
 		break;
_
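The loop removed by this revert walks `clockevent_devices` and unlinks every device whose cpumask contains only the CPU that just went down, hitting `BUG_ON()` if such a device is not in `CLOCK_EVT_MODE_UNUSED`. The following is a userspace approximation of that logic, not kernel code: `struct list_head`, `list_del()` and `container_of()` are minimal re-implementations, while `fake_clockevent`, `prune_dead_cpu_devices()`, the two-value mode enum and the single-word cpumask are hypothetical simplifications. It shows the "safe" iteration pattern (next node cached before the current one is unlinked) and the selection predicate, with `assert()` standing in for `BUG_ON()`.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's intrusive list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void list_init(struct list_head *h) { h->next = h->prev = h; }

static inline void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static inline void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL; /* the kernel writes LIST_POISON values here */
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified model of struct clock_event_device. */
enum fake_evt_mode { FAKE_MODE_UNUSED, FAKE_MODE_ONESHOT };

struct fake_clockevent {
	const char *name;
	unsigned long cpumask;	/* bit n set => device serves CPU n */
	enum fake_evt_mode mode;
	struct list_head list;
};

static inline int popcount_ul(unsigned long m)
{
	int n = 0;

	for (; m; m &= m - 1)	/* clear lowest set bit each round */
		n++;
	return n;
}

static inline int list_count(const struct list_head *h)
{
	int n = 0;

	for (const struct list_head *p = h->next; p != h; p = p->next)
		n++;
	return n;
}

/*
 * Hypothetical helper mirroring the removed loop: unlink every device
 * bound exclusively to the dead CPU. Same shape as list_for_each_safe():
 * tmp holds the next node, so unlinking node does not break the walk.
 */
static inline int prune_dead_cpu_devices(struct list_head *devices, int cpu)
{
	struct list_head *node, *tmp;
	int removed = 0;

	for (node = devices->next, tmp = node->next; node != devices;
	     node = tmp, tmp = node->next) {
		struct fake_clockevent *dev =
			container_of(node, struct fake_clockevent, list);

		if ((dev->cpumask & (1UL << cpu)) &&
		    popcount_ul(dev->cpumask) == 1) {
			/* Stand-in for BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED) */
			assert(dev->mode == FAKE_MODE_UNUSED);
			list_del(node);
			removed++;
		}
	}
	return removed;
}
```

With one device per CPU plus a device shared by CPUs 0 and 1, `prune_dead_cpu_devices(&devs, 1)` unlinks only the CPU 1 device and leaves the shared one alone. If that per-CPU device were still in a non-UNUSED mode, the `assert()` would abort, much as `BUG_ON()` would oops the kernel.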
Comment 3 Xiaotian Feng 2010-01-13 02:07:17 UTC
I'm not sure, but I can't see the start of the segfault. Is it a kernel NULL pointer dereference, or did it trigger the BUG_ON()?
Would you mind trying the patch in http://bugzilla.kernel.org/show_bug.cgi?id=15037?
Comment 4 Zhang Rui 2010-01-13 08:43:43 UTC
*** Bug 15028 has been marked as a duplicate of this bug. ***
Comment 5 Martin Bammer 2010-01-13 09:27:16 UTC
Yes, this patch resolved the problem. Thanks!
Comment 6 Zhang Rui 2010-01-26 05:38:36 UTC
*** Bug 15028 has been marked as a duplicate of this bug. ***
Comment 7 Elvio Basello 2010-01-26 14:51:11 UTC
This issue is still affecting me on 2.6.32.5
Comment 8 Len Brown 2010-02-16 09:01:06 UTC
Per the comments in bug 15028, this is fixed as of 2.6.32.7.

Closed.
Comment 9 john stultz 2011-05-02 20:49:06 UTC
*** Bug 15155 has been marked as a duplicate of this bug. ***