Bug 13337

Summary: [post 2.6.29 regression] hang during suspend of b44/b43 modules
Product: Power Management
Reporter: Tomas Janousek (tomi)
Component: Hibernation/Suspend
Assignee: ykzhao (yakui.zhao)
Severity: normal
CC: alan, ozan, rjw, schwab, yakui.zhao
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.30
Tree: Mainline
Regression: Yes
Bug Depends on:
Bug Blocks: 7216, 13070
Attachments: dmesg

Description Tomas Janousek 2009-05-18 10:59:54 UTC
Since I switched to 2.6.30 (around -rc4), I have been experiencing hard system hangs (even the BIOS-driven keys stop working) during suspend to RAM. They are more or less nondeterministic; I can't provide a series of steps that reproduces them.

With PM debugging turned on, I observed that the system hangs when, or very soon after, the following messages are printed:

ieee80211 phy0: legacy class suspend
PM: Removing info for No Bus:hw_random
ssb ssb1:1: legacy suspend
b44 ssb1:0: legacy suspend

I can confirm that unloading b44 and b43 (rmmod) before running echo mem > /sys/power/state fixes the issue completely: I haven't had a single problem since I deployed this workaround. I remove both modules because I once observed a hang just after a message like "b44: disabling PHY". Later I moved elsewhere and debugged without an Ethernet connection, so the other hangs were probably caused by b43 alone. If neither b44 nor b43 has been active during the uptime (active meaning it had a connection; merely having the modules loaded is fine), no hangs occur. That is the only reproduction hint I have.
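The workaround described above can be sketched as a small shell script. This is a minimal sketch, not the reporter's actual suspend script: it assumes it runs as root and that the modules are named b44 and b43 as in this report, and the function names and the DRY_RUN switch (on by default, so the script only prints what it would do) are hypothetical additions. Writing to /sys/power/state blocks until the machine has resumed, so the reload step runs after resume.

```sh
#!/bin/sh
# Hypothetical sketch of the workaround from this report: unload b44/b43,
# suspend to RAM, then reload the modules after resume.
# Assumptions: run as root; module names as in this report; DRY_RUN is a
# knob invented for this sketch (default 1 = only print the commands).

MODULES="b44 b43"
POWER_STATE=${POWER_STATE:-/sys/power/state}
DRY_RUN=${DRY_RUN:-1}

# Run a command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# usage: write_sysfs VALUE FILE (prints the write in dry-run mode)
write_sysfs() {
    if [ "$DRY_RUN" = 1 ]; then
        printf 'echo %s > %s\n' "$1" "$2"
    else
        echo "$1" > "$2"
    fi
}

suspend_without_broadcom() {
    for m in $MODULES; do
        run rmmod "$m"
    done
    # The write to the state file returns only after the system has resumed.
    write_sysfs mem "$POWER_STATE"
    for m in $MODULES; do
        run modprobe "$m"
    done
}

suspend_without_broadcom
```

Running it with DRY_RUN=0 actually unloads the drivers and suspends; the dry-run default is only a safety choice of this sketch.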

I have an HP Compaq nx7300 notebook. lspci output follows:
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA AHCI Controller (rev 01)
02:06.0 CardBus bridge: Texas Instruments PCIxx12 Cardbus Controller
02:06.1 FireWire (IEEE 1394): Texas Instruments PCIxx12 OHCI Compliant IEEE 1394 Host Controller
02:0e.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)
10:00.0 Network controller: Broadcom Corporation BCM4311 802.11b/g WLAN (rev 01)

I will try to attach dmesg output from one of the suspend/resume cycles that didn't go wrong, so that the order of actions is known.
Comment 1 Tomas Janousek 2009-05-18 11:16:39 UTC
Created attachment 21406
Comment 2 Rafael J. Wysocki 2009-05-18 16:47:13 UTC
It looks like the problem is related to ssb.

Since you're able to reproduce the problem readily, is there a chance to carry out a bisection of commits between 2.6.29 and the first known bad kernel?
Comment 3 Tomas Janousek 2009-05-18 19:53:33 UTC
That is a small misunderstanding: I'm not able to reproduce it reliably yet. I may try to find a reproducer and do a bisection, but I can't promise it will be soon, sorry :/.
Comment 4 Rafael J. Wysocki 2009-05-19 22:12:32 UTC
Sorry, I should have read your report more carefully.

At the moment I have no idea what the root cause of this problem is.

1. Can you please double-check that removing the b43 and b44 modules makes the problem go away?

2. Please attach /proc/iomem and /proc/ioports from your system.
Comment 5 Tomas Janousek 2009-05-19 23:24:26 UTC
1. I'm pretty sure. I modified my suspend script to rmmod b44 and b43 and haven't had a single suspend failure since.
2. Here they are:

[tomi@notes ~]$ cat /proc/iomem 
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000e0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-9f7cffff : System RAM
  00100000-004b40dd : Kernel code
  004b40de-0063ab37 : Kernel data
  0069f000-0072a72b : Kernel bss
  01000000-08ffffff : Crash kernel
9f7d0000-9f7e55ff : reserved
9f7e5600-9f7f7fff : ACPI Non-volatile Storage
9f7f8000-9f7fffff : reserved
a0000000-a3ffffff : PCI Bus 0000:02
  a0000000-a3ffffff : PCI CardBus 0000:03
a4000000-a7ffffff : PCI CardBus 0000:03
e0000000-efffffff : 0000:00:02.0
f4000000-f40fffff : PCI Bus 0000:10
  f4000000-f4003fff : 0000:10:00.0
    f4000000-f4003fff : 0000:10:00.0
f4100000-f43fffff : PCI Bus 0000:02
  f4100000-f4100fff : 0000:02:06.0
    f4100000-f4100fff : yenta_socket
  f4101000-f41017ff : 0000:02:06.1
    f4101000-f41017ff : ohci1394
  f4104000-f4107fff : 0000:02:06.1
  f4108000-f4109fff : 0000:02:0e.0
    f4108000-f4109fff : 0000:02:0e.0
f4400000-f447ffff : 0000:00:02.0
f4480000-f44bffff : 0000:00:02.0
f4500000-f457ffff : 0000:00:02.1
f4580000-f4583fff : 0000:00:1b.0
  f4580000-f4583fff : ICH HD audio
f4584000-f45843ff : 0000:00:1d.7
  f4584000-f45843ff : ehci_hcd
f4585000-f45853ff : 0000:00:1f.2
  f4585000-f45853ff : ahci
f8000000-fbffffff : PCI MMCONFIG 0 [00-3f]
  f8000000-fbffffff : pnp 00:0a
fec00000-fec00fff : IOAPIC 0
  fec00000-fec00fff : reserved
    fec00000-fec000ff : pnp 00:0a
fed00000-fed003ff : HPET 0
fed20000-fed9afff : reserved
  fed20000-fed3ffff : pnp 00:0a
  fed45000-fed8ffff : pnp 00:0a
  fed90000-fed9afff : pnp 00:0a
feda0000-fedbffff : reserved
  feda0000-fedbffff : pnp 00:0b
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : reserved
    fee00000-fee00fff : pnp 00:0b
ffb00000-ffbfffff : reserved
  ffb00000-ffbfffff : pnp 00:09
fff00000-ffffffff : reserved
  fff00000-ffffffff : pnp 00:09

[tomi@notes ~]$ cat /proc/ioports 
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-0060 : keyboard
0064-0064 : keyboard
0070-0071 : rtc0
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : 0000:00:1f.1
  0170-0177 : ata_piix
01f0-01f7 : 0000:00:1f.1
  01f0-01f7 : ata_piix
0376-0376 : 0000:00:1f.1
  0376-0376 : ata_piix
03c0-03df : vga+
03f6-03f6 : 0000:00:1f.1
  03f6-03f6 : ata_piix
04d0-04d1 : pnp 00:0a
0500-057f : pnp 00:09
0800-080f : pnp 00:09
0cf8-0cff : PCI conf1
1000-107f : 0000:00:1f.0
  1000-107f : pnp 00:0a
    1000-1003 : ACPI PM1a_EVT_BLK
    1004-1005 : ACPI PM1a_CNT_BLK
    1008-100b : ACPI PM_TMR
    1010-1015 : ACPI CPU throttle
    1020-1020 : ACPI PM2_CNT_BLK
    1028-102f : ACPI GPE0_BLK
1100-113f : 0000:00:1f.0
  1100-113f : pnp 00:0a
1200-121f : pnp 00:0a
1370-1377 : 0000:00:1f.2
  1370-1377 : ahci
13f0-13f7 : 0000:00:1f.2
  13f0-13f7 : ahci
1574-1577 : 0000:00:1f.2
  1574-1577 : ahci
15f4-15f7 : 0000:00:1f.2
  15f4-15f7 : ahci
2000-2fff : PCI Bus 0000:02
  2000-20ff : PCI CardBus 0000:03
  2400-24ff : PCI CardBus 0000:03
4000-4007 : 0000:00:02.0
4020-403f : 0000:00:1d.0
  4020-403f : uhci_hcd
4040-405f : 0000:00:1d.1
  4040-405f : uhci_hcd
4060-407f : 0000:00:1d.2
  4060-407f : uhci_hcd
4080-409f : 0000:00:1d.3
  4080-409f : uhci_hcd
40a0-40af : 0000:00:1f.1
  40a0-40af : ata_piix
40d0-40df : 0000:00:1f.2
  40d0-40df : ahci
Comment 6 ykzhao 2009-07-21 09:06:01 UTC
Hi, Tomas
    Do you mean that suspend/resume works well if the b43/b44 modules are unloaded?
    Could you please enable CONFIG_PM_DEBUG in the kernel configuration and run the following test to check whether the system can resume from the suspended state?
    a. Kill the process that is using /proc/acpi/event.
    b. echo devices > /sys/power/pm_test
    c. echo mem > /sys/power/state; dmesg > dmesg_after_devices
    d. Repeat steps b and c with each of the other test levels (core, processors, platform) written to /sys/power/pm_test, saving dmesg under a matching file name each time.
(After echo mem > /sys/power/state, the system starts to suspend, stops at the selected test level, waits for five seconds and then resumes by itself. There is no need to press the power button.)

BTW: please make sure that the b43/b44 drivers are loaded while doing the above test.
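The test procedure above can be scripted roughly as follows. This is a minimal sketch, not something requested in the comment: it uses the level names accepted by /sys/power/pm_test (devices, core, processors, platform), assumes root, CONFIG_PM_DEBUG=y, the process reading /proc/acpi/event already stopped and b43/b44 loaded, and the function names and DRY_RUN switch (default 1, print only) are hypothetical. Each write to /sys/power/state returns once the test cycle has resumed, so dmesg is saved right after it.

```sh
#!/bin/sh
# Hypothetical sketch of the pm_test procedure. For each level the kernel
# runs the suspend sequence up to that point, waits about five seconds and
# resumes by itself. Assumptions: root, CONFIG_PM_DEBUG=y, acpid stopped,
# b43/b44 loaded. DRY_RUN is invented for this sketch (1 = print only).

PM_TEST=${PM_TEST:-/sys/power/pm_test}
POWER_STATE=${POWER_STATE:-/sys/power/state}
DRY_RUN=${DRY_RUN:-1}
LEVELS="devices core processors platform"

# usage: write_sysfs VALUE FILE
write_sysfs() {
    if [ "$DRY_RUN" = 1 ]; then
        printf 'echo %s > %s\n' "$1" "$2"
    else
        echo "$1" > "$2"
    fi
}

# usage: save_dmesg FILE
save_dmesg() {
    if [ "$DRY_RUN" = 1 ]; then
        printf 'dmesg > %s\n' "$1"
    else
        dmesg > "$1"
    fi
}

run_pm_tests() {
    for level in $LEVELS; do
        write_sysfs "$level" "$PM_TEST"
        # Returns after the test cycle at this level has resumed.
        write_sysfs mem "$POWER_STATE"
        save_dmesg "dmesg_after_$level"
    done
    # Switch test mode off so the next suspend is a real one.
    write_sysfs none "$PM_TEST"
}

run_pm_tests
```

Resetting pm_test to none at the end is a design choice of this sketch; without it, later suspend attempts would keep stopping at the last test level.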

Comment 7 Tomas Janousek 2009-07-21 11:02:30 UTC
Is this still relevant given what was said in this thread: http://thread.gmane.org/gmane.linux.power-management.general/15119/focus=5694 ?
Comment 8 ykzhao 2009-07-22 01:22:32 UTC
Hi, Tomas
    Thanks for the info.
    From the thread linked in comment #7, it seems that this issue is fixed by the patch discussed there.

    So IMO this bug can be marked as resolved.
Comment 9 ykzhao 2009-07-22 01:24:47 UTC
From the info in comment #7, it seems that this issue is related to the b43/b44 drivers and is fixed by the patch mentioned there.
    So this bug will be marked as resolved.
Comment 10 Rafael J. Wysocki 2009-07-28 21:13:34 UTC
Handled-By : Johannes Berg <johannes@sipsolutions.net>
Patch : http://patchwork.kernel.org/patch/37837/
Comment 11 Rafael J. Wysocki 2009-08-09 23:47:30 UTC
Fixed by commit 89c3a8aca28e6d57f2ae945d97858a372d624b81.
Comment 12 Ozan Caglayan 2009-08-27 06:59:09 UTC
Will this be included in the next 2.6.30 stable release?