Since I switched to 2.6.30 (around -rc4), I have been experiencing hard system hangs (even BIOS-driven keys stop working) during suspend to RAM. The hangs are more or less nondeterministic: I can't provide a series of steps that reproduces them.
With PM debugging turned on, I observed that the system hangs when, or very soon after, the following messages are printed:
ieee80211 phy0: legacy class suspend
PM: Removing info for No Bus:hw_random
ssb ssb1:1: legacy suspend
b44 ssb1:0: legacy suspend
I can confirm that running rmmod on b44 and b43 before doing echo mem >/sys/power/state fixes the issue completely: I haven't had a single problem since I deployed this workaround. I remove both because I observed one hang just after something like "b44: disabling PHY", but I then moved to another place and debugged without an Ethernet connection, so the other hangs were (probably) caused by b43 alone. If neither b44 nor b43 has been active during system uptime (active meaning an established connection; merely having the modules loaded does not count), no hangs occur. That is the only reproduction hint I have.
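The workaround described above could be wrapped in a small script along these lines. This is a minimal sketch, not the reporter's actual suspend script: the DRY_RUN switch, the run helper, the function name, and reloading the modules after resume are all assumptions added for illustration. By default it only prints the commands it would execute.

```sh
#!/bin/sh
# Sketch of the workaround: unload b44 and b43 before suspend to RAM.
# DRY_RUN, run() and suspend_without_broadcom() are illustrative names.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

suspend_without_broadcom() {
    # Unload the wired (b44) and wireless (b43) drivers first.
    run rmmod b44
    run rmmod b43
    # Enter suspend to RAM; the write returns once the system has resumed.
    run sh -c 'echo mem > /sys/power/state'
    # Assumption: reload the drivers after resume to get networking back.
    run modprobe b43
    run modprobe b44
}

suspend_without_broadcom
```

Running it as root with DRY_RUN=0 would perform the real suspend; without that variable it is safe to run anywhere to inspect the command sequence.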
I have an HP Compaq nx7300 notebook. lspci follows:
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA AHCI Controller (rev 01)
02:06.0 CardBus bridge: Texas Instruments PCIxx12 Cardbus Controller
02:06.1 FireWire (IEEE 1394): Texas Instruments PCIxx12 OHCI Compliant IEEE 1394 Host Controller
02:0e.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)
10:00.0 Network controller: Broadcom Corporation BCM4311 802.11b/g WLAN (rev 01)
I will try to attach dmesg output from one of the suspend/resume cycles that didn't go wrong, so that the order of actions is known.
Created attachment 21406
It looks like the problem is related to ssb.
Since you're able to reproduce the problem readily, is there a chance to carry out a bisection of commits between 2.6.29 and the first known bad kernel?
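For reference, a bisection of that kind is usually started as sketched below. The script only prints the git commands rather than running them. The default bad revision v2.6.30-rc4 is an assumption taken from the "around -rc4" remark in the report, and bisect_plan and BAD_REV are illustrative names.

```sh
#!/bin/sh
# Print the commands that would start a bisection in a kernel git tree.
# bisect_plan and BAD_REV are illustrative names, not existing tools.
bisect_plan() {
    good="$1"
    bad="$2"
    printf '%s\n' "git bisect start"
    printf '%s\n' "git bisect bad $bad"
    printf '%s\n' "git bisect good $good"
    # After each step: build and boot the kernel git checked out, test
    # suspend, then run "git bisect good" or "git bisect bad".
    # When finished, "git bisect reset" returns to the original branch.
}

bisect_plan v2.6.29 "${BAD_REV:-v2.6.30-rc4}"
```

With a hang that does not trigger on every suspend, a step should be marked good only after a number of successful suspend cycles, since a single clean cycle proves little.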
That is a small misunderstanding: I'm not able to reproduce it reliably yet. I may try to find a reproducer and do a bisection, but I can't promise that very soon, sorry :/.
Sorry, I should have read your report more carefully.
At the moment I have no idea what the root cause of this problem is.
1. Can you please double check that removing the b43 and b44 modules makes the problem go away?
2. Please attach /proc/iomem and /proc/ioports from your system.
1. I'm pretty sure. I modified my suspend script to rmmod b44 and b43 and have not had a single suspend failure since.
[tomi@notes ~]$ cat /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000e0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-9f7cffff : System RAM
  00100000-004b40dd : Kernel code
  004b40de-0063ab37 : Kernel data
  0069f000-0072a72b : Kernel bss
  01000000-08ffffff : Crash kernel
9f7d0000-9f7e55ff : reserved
9f7e5600-9f7f7fff : ACPI Non-volatile Storage
9f7f8000-9f7fffff : reserved
a0000000-a3ffffff : PCI Bus 0000:02
  a0000000-a3ffffff : PCI CardBus 0000:03
a4000000-a7ffffff : PCI CardBus 0000:03
e0000000-efffffff : 0000:00:02.0
f4000000-f40fffff : PCI Bus 0000:10
  f4000000-f4003fff : 0000:10:00.0
    f4000000-f4003fff : 0000:10:00.0
f4100000-f43fffff : PCI Bus 0000:02
  f4100000-f4100fff : 0000:02:06.0
    f4100000-f4100fff : yenta_socket
  f4101000-f41017ff : 0000:02:06.1
    f4101000-f41017ff : ohci1394
  f4104000-f4107fff : 0000:02:06.1
  f4108000-f4109fff : 0000:02:0e.0
    f4108000-f4109fff : 0000:02:0e.0
f4400000-f447ffff : 0000:00:02.0
f4480000-f44bffff : 0000:00:02.0
f4500000-f457ffff : 0000:00:02.1
f4580000-f4583fff : 0000:00:1b.0
  f4580000-f4583fff : ICH HD audio
f4584000-f45843ff : 0000:00:1d.7
  f4584000-f45843ff : ehci_hcd
f4585000-f45853ff : 0000:00:1f.2
  f4585000-f45853ff : ahci
f8000000-fbffffff : PCI MMCONFIG 0 [00-3f]
  f8000000-fbffffff : pnp 00:0a
fec00000-fec00fff : IOAPIC 0
  fec00000-fec00fff : reserved
    fec00000-fec000ff : pnp 00:0a
fed00000-fed003ff : HPET 0
fed20000-fed9afff : reserved
  fed20000-fed3ffff : pnp 00:0a
  fed45000-fed8ffff : pnp 00:0a
  fed90000-fed9afff : pnp 00:0a
feda0000-fedbffff : reserved
  feda0000-fedbffff : pnp 00:0b
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : reserved
    fee00000-fee00fff : pnp 00:0b
ffb00000-ffbfffff : reserved
  ffb00000-ffbfffff : pnp 00:09
fff00000-ffffffff : reserved
  fff00000-ffffffff : pnp 00:09
[tomi@notes ~]$ cat /proc/ioports
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-0060 : keyboard
0064-0064 : keyboard
0070-0071 : rtc0
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : 0000:00:1f.1
  0170-0177 : ata_piix
01f0-01f7 : 0000:00:1f.1
  01f0-01f7 : ata_piix
0376-0376 : 0000:00:1f.1
  0376-0376 : ata_piix
03c0-03df : vga+
03f6-03f6 : 0000:00:1f.1
  03f6-03f6 : ata_piix
04d0-04d1 : pnp 00:0a
0500-057f : pnp 00:09
0800-080f : pnp 00:09
0cf8-0cff : PCI conf1
1000-107f : 0000:00:1f.0
  1000-107f : pnp 00:0a
    1000-1003 : ACPI PM1a_EVT_BLK
    1004-1005 : ACPI PM1a_CNT_BLK
    1008-100b : ACPI PM_TMR
    1010-1015 : ACPI CPU throttle
    1020-1020 : ACPI PM2_CNT_BLK
    1028-102f : ACPI GPE0_BLK
1100-113f : 0000:00:1f.0
  1100-113f : pnp 00:0a
1200-121f : pnp 00:0a
1370-1377 : 0000:00:1f.2
  1370-1377 : ahci
13f0-13f7 : 0000:00:1f.2
  13f0-13f7 : ahci
1574-1577 : 0000:00:1f.2
  1574-1577 : ahci
15f4-15f7 : 0000:00:1f.2
  15f4-15f7 : ahci
2000-2fff : PCI Bus 0000:02
  2000-20ff : PCI CardBus 0000:03
  2400-24ff : PCI CardBus 0000:03
4000-4007 : 0000:00:02.0
4020-403f : 0000:00:1d.0
  4020-403f : uhci_hcd
4040-405f : 0000:00:1d.1
  4040-405f : uhci_hcd
4060-407f : 0000:00:1d.2
  4060-407f : uhci_hcd
4080-409f : 0000:00:1d.3
  4080-409f : uhci_hcd
40a0-40af : 0000:00:1f.1
  40a0-40af : ata_piix
40d0-40df : 0000:00:1f.2
  40d0-40df : ahci
Do you mean that suspend/resume works well when the b43 and b44 modules are unloaded?
Please enable CONFIG_PM_DEBUG in the kernel configuration and run the following test to confirm whether the system can resume from the suspended state:
a. Kill the process that is using /proc/acpi/event.
b. echo devices > /sys/power/pm_test
c. echo mem > /sys/power/state; dmesg >dmesg_after_device;
d. Write core, processors, and platform to /sys/power/pm_test in turn, repeating step c after each.
(After echo mem > /sys/power/state, the system enters the test sleep state. There is no need to press the power button: it waits five seconds and then resumes on its own.)
BTW: please make sure that the b43 and b44 drivers are loaded while doing the above test.
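The test steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the level names are the strings listed in the kernel's basic PM debugging documentation (devices, platform, processors, core), and cat /sys/power/pm_test shows what a given kernel actually accepts. DRY_RUN, run, pm_test_cycle, and the per-level dmesg file names are illustrative. By default the script only prints the commands.

```sh
#!/bin/sh
# Sketch: one suspend test cycle per pm_test level, saving dmesg after each.
# DRY_RUN, run() and pm_test_cycle() are illustrative names.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default)
PM_TEST_LEVELS="devices platform processors core"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

pm_test_cycle() {
    level="$1"
    # Select how far the suspend sequence goes before it is rolled back.
    run sh -c "echo $level > /sys/power/pm_test"
    # Start the test cycle; the kernel waits a few seconds, then resumes.
    run sh -c "echo mem > /sys/power/state"
    # Keep the kernel log of this cycle for the bug report.
    run sh -c "dmesg > dmesg_after_$level"
}

for level in $PM_TEST_LEVELS; do
    pm_test_cycle "$level"
done

# Return to normal suspend behaviour when done.
run sh -c "echo none > /sys/power/pm_test"
```

Running it for real (DRY_RUN=0) needs root and should be done with b43 and b44 loaded, as requested above.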
Is this still relevant given what was said in this thread: http://thread.gmane.org/gmane.linux.power-management.general/15119/focus=5694 ?
Thanks for the info.
From the info mentioned in comment #7 it seems that this issue can be fixed by the patch.
So IMO this bug can be marked as resolved.
From the info in comment #7 it seems that this issue is related to the b43/b44 drivers and that it is fixed by the patch mentioned in comment #7.
So this bug will be marked as resolved.
Handled-By : Johannes Berg <email@example.com>
Patch : http://patchwork.kernel.org/patch/37837/
Fixed by commit 89c3a8aca28e6d57f2ae945d97858a372d624b81.
Will this be included in the next .30 stable?
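One way to check this later is to search a stable tree's history for the upstream commit ID. The sketch below assumes the usual stable convention of quoting the upstream commit ID in the backport's commit message. backports_of_fix is an illustrative name, and the revision range in the usage comment is only an example.

```sh
#!/bin/sh
# Upstream commit that fixed this bug (from the comment above).
FIX=89c3a8aca28e6d57f2ae945d97858a372d624b81

# List commits in revision range $1 whose message mentions the upstream fix.
# Optional $2 is the repository path (defaults to the current directory).
# backports_of_fix is an illustrative name, not an existing tool.
backports_of_fix() {
    git -C "${2:-.}" log --oneline --grep="$FIX" "$1"
}

# Example, run inside a checkout of the stable tree (range is illustrative):
#   backports_of_fix v2.6.30..HEAD
```

An empty result means no commit in the given range cites the fix, in which case it has not been backported there yet, or was backported without the reference.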