Bug 10123

Summary: No power-off / reboot with 2.6.25-rcX (up to -rc3) kernels
Product: Drivers
Reporter: Guennadi Liakhovetski (g.liakhovetski)
Component: PCI
Assignee: Greg Kroah-Hartman (greg)
Status: CLOSED CODE_FIX
Severity: high
CC: acpi-bugzilla, akpm, bunk, greg, rui.zhang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.25-rc3
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9832, 56331    
Attachments: Serial console log of a complete boot, "halt", several sysrq-outputs
Kernel configuration for the console log

Description Guennadi Liakhovetski 2008-02-27 08:15:07 UTC
Latest working kernel version: 2.6.24-rc2
Earliest failing kernel version: 2.6.25-rc0
Distribution: Debian etch
Hardware Environment: Compaq AP400 2 x Pentium-II @ 400MHz
Software Environment: Kernel
Problem Description: Power off / reboot blink the keyboard LEDs and leave the system stuck at

[sdb] synchronizing SCSI cache
[sda] synchronizing SCSI cache

Note: this machine is known for ACPI bugs, but up to 2.6.24.2 power-off / reboot worked.

Steps to reproduce: issue "halt" or "reboot"
Comment 1 Guennadi Liakhovetski 2008-02-27 08:19:38 UTC
Interestingly, sysrq-b does reboot the machine, whereas sysrq-o doesn't power it off.
Comment 2 Rafael J. Wysocki 2008-02-27 11:36:09 UTC
Is it possible to get a serial console log from this box?
Comment 3 Guennadi Liakhovetski 2008-02-27 12:08:35 UTC
(In reply to comment #2)
> Is it possible to get a serial console log from this box?

Rafael, log of what? For the startup I can provide the complete dmesg without a serial console. For power-off... Well, as I said, there isn't much, everything looks quite usual, just normal shutdown messages. In the reboot case the last one is something like "going to reset now", then come the two "synchronizing SCSI cache" lines, and there it just stays. This is what I see on the monitor. Or do you think there would be more on the serial console? I could produce sysrq dumps at that point, yes. Do you want them? The serial console would be a little bit difficult.
Comment 4 Guennadi Liakhovetski 2008-02-28 03:12:37 UTC
Created attachment 15057 [details]
Serial console log of a complete boot, "halt", several sysrq-outputs

The log is of a 2.6.25-rc3 kernel with the http://lkml.org/lkml/2008/2/27/150 patch applied
Comment 5 Guennadi Liakhovetski 2008-02-28 03:14:36 UTC
Created attachment 15058 [details]
Kernel configuration for the console log
Comment 6 Zhang Rui 2008-02-28 22:19:22 UTC
As this is a regression, it would be great if you can use git bisect to narrow down the problem.
Comment 7 Guennadi Liakhovetski 2008-02-29 11:30:11 UTC
git-bisect pointed to this one:

commit fd7d1ced29e5beb88c9068801da7a362606d8273
Author: Greg Kroah-Hartman <gregkh@suse.de>
Date:   Tue May 22 22:47:54 2007 -0400

    PCI: make pci_bus a struct device


It also introduces these two errors:

kobject (c7c70500): tried to init an initialized object, something is seriously wrong.
Pid: 1, comm: swapper Not tainted 2.6.24-testpm #21
 [<c01e78d9>] kobject_init+0x89/0x90
 [<c024dc0e>] device_initialize+0x1e/0x90
 [<c024e50b>] device_register+0xb/0x20
 [<c01f14e8>] pci_bus_add_devices+0x98/0x140
 [<c03a6fd7>] ? pcibios_scan_root+0x27/0xa0
 [<c04d9aa0>] pci_legacy_init+0x50/0xf0
 [<c04bd5d2>] kernel_init+0x142/0x320
 [<c01031ba>] ? ret_from_fork+0x6/0x1c
 [<c04bd490>] ? kernel_init+0x0/0x320
 [<c04bd490>] ? kernel_init+0x0/0x320
 [<c0103ebb>] kernel_thread_helper+0x7/0x1c
 =======================
sysfs: duplicate filename '0000:01' can not be created
------------[ cut here ]------------
WARNING: at fs/sysfs/dir.c:424 sysfs_add_one+0x80/0xa0()
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-testpm #21
 [<c0120804>] warn_on_slowpath+0x54/0x70
 [<c0120eac>] ? __call_console_drivers+0x5c/0x70
 [<c03b0001>] ? _spin_unlock_irqrestore+0x11/0x30
 [<c0121169>] ? release_console_sem+0x1d9/0x1f0
 [<c01215ed>] ? vprintk+0x2ad/0x3b0
 [<c01e6f99>] ? ida_get_new_above+0x89/0x180
 [<c01a8610>] ? sysfs_ilookup_test+0x0/0x20
 [<c03aff5f>] ? _spin_unlock+0xf/0x30
 [<c017fbfa>] ? ifind+0x8a/0x90
 [<c012170b>] ? printk+0x1b/0x20
 [<c01a89a0>] sysfs_add_one+0x80/0xa0
 [<c01a8ed9>] create_dir+0x49/0x90
 [<c01a8f4b>] sysfs_create_dir+0x2b/0x50
 [<c01e7bce>] kobject_add_internal+0xae/0x190
 [<c01e7d2d>] ? kobject_set_name_vargs+0x2d/0x40
 [<c01e7d2d>] ? kobject_set_name_vargs+0x2d/0x40
 [<c01e7d8f>] kobject_add_varg+0x4f/0x60
 [<c01e807f>] kobject_add+0x2f/0x60
 [<c01e7a92>] ? kobject_get+0x12/0x20
 [<c024e09c>] device_add+0x8c/0x4f0
 [<c01e87ad>] ? kref_init+0xd/0x10
 [<c01e787b>] ? kobject_init+0x2b/0x90
 [<c024e512>] device_register+0x12/0x20
 [<c01f14e8>] pci_bus_add_devices+0x98/0x140
 [<c03a6fd7>] ? pcibios_scan_root+0x27/0xa0
 [<c04d9aa0>] pci_legacy_init+0x50/0xf0
 [<c04bd5d2>] kernel_init+0x142/0x320
 [<c01031ba>] ? ret_from_fork+0x6/0x1c
 [<c04bd490>] ? kernel_init+0x0/0x320
 [<c04bd490>] ? kernel_init+0x0/0x320
 [<c0103ebb>] kernel_thread_helper+0x7/0x1c
 =======================
---[ end trace ca143223eefdc828 ]---
Comment 8 Rafael J. Wysocki 2008-03-10 15:34:17 UTC
Handled-By : Gautham R Shenoy <ego@in.ibm.com>
Patch : http://lkml.org/lkml/2008/3/10/91
References : http://lkml.org/lkml/2008/3/10/340

Guennadi, can you check if this patch fixes the issue for you?
Comment 9 Guennadi Liakhovetski 2008-03-10 16:45:13 UTC
(In reply to comment #8)
> Handled-By : Gautham R Shenoy <ego@in.ibm.com>
> Patch : http://lkml.org/lkml/2008/3/10/91
> References : http://lkml.org/lkml/2008/3/10/340
> 
> Guennadi, can you check if this patch fixes the issue for you?

Sorry, firstly, neither CPU hotplug nor -rt is used on that system. Secondly, as you can see from my comment on the bisection results, reverting the problematic commit fixes the power-off / reboot problem AND the startup warnings, which occur long before root mount and INIT start, i.e., at a time when relatively little scheduling happens. Thirdly, the comment in sched.c next to your fix and your patch description do not seem relevant either. So, if you still think your patch might fix this regression, I'll test it; otherwise, I fail to see how it can be relevant. Please let me know if I'm missing something.

Thanks
Guennadi
Comment 10 Rafael J. Wysocki 2008-03-11 15:30:34 UTC
Okay, so we seem to have two different poweroff-related regressions, one of which is fixed by the patch in Comment #8 and the other by reverting the commit identified in Comment #7.
Comment 11 Rafael J. Wysocki 2008-03-13 14:07:25 UTC
Handled-By : Greg KH <greg@kroah.com>
Patch : http://lkml.org/lkml/2008/3/13/267