Bug 13092
JFTR: regression still exists in 2.6.30-rc2

Is this an eeepc or netbook of some sort? Sounds like a PCI hotplug issue. Can you post lspci?

No, it's a fairly old laptop, a Dell Latitude LS, built early 2001. Suspend-to-disk has worked flawlessly for many years, though.

lspci -v

00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 03)
    Flags: bus master, medium devsel, latency 64
    Memory at f8000000 (32-bit, prefetchable) [size=64M]
    Capabilities: [a0] AGP version 1.0
    Kernel driver in use: agpgart-intel
    Kernel modules: intel-agp

00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 03) (prog-if 00 [Normal decode])
    Flags: bus master, 66MHz, medium devsel, latency 128
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
    Memory behind bridge: fe400000-febfffff
    Prefetchable memory behind bridge: f6000000-f7bfffff
    Kernel modules: shpchp

00:07.0 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 02)
    Flags: bus master, medium devsel, latency 0

00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01) (prog-if 80 [Master])
    Flags: bus master, medium devsel, latency 64
    [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
    [virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1]
    [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8]
    [virtual] Memory at 00000370 (type 3, non-prefetchable) [size=1]
    I/O ports at fcd0 [size=16]
    Kernel driver in use: PIIX_IDE
    Kernel modules: ata_piix, piix

00:07.2 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB (rev 01) (prog-if 00 [UHCI])
    Flags: bus master, medium devsel, latency 64, IRQ 10
    I/O ports at fce0 [size=32]
    Kernel driver in use: uhci_hcd
    Kernel modules: uhci-hcd

00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
    Flags: medium devsel, IRQ 9
    Kernel driver in use: piix4_smbus
    Kernel modules: i2c-piix4

00:0a.0 CardBus bridge: Texas Instruments PCI1211
    Subsystem: Dell Device 1225
    Flags: bus master, medium devsel, latency 168, IRQ 10
    Memory at 28020000 (32-bit, non-prefetchable) [size=4K]
    Bus: primary=00, secondary=02, subordinate=05, sec-latency=176
    Memory window 0: 20000000-23fff000 (prefetchable)
    Memory window 1: 24000000-27fff000
    I/O window 0: 00001000-000010ff
    I/O window 1: 00001400-000014ff
    16-bit legacy interface ports at 0001
    Kernel driver in use: yenta_cardbus
    Kernel modules: yenta_socket

00:0d.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
    Subsystem: Dell Device 00a5
    Flags: bus master, medium devsel, latency 80, IRQ 10
    I/O ports at fc00 [size=128]
    Memory at fedfec00 (32-bit, non-prefetchable) [size=128]
    [virtual] Expansion ROM at 28000000 [disabled] [size=128K]
    Capabilities: [dc] Power Management version 2
    Kernel driver in use: 3c59x
    Kernel modules: 3c59x

00:10.0 Communication controller: Agere Systems WinModem 56k (rev 01)
    Subsystem: Actiontec Electronics Inc Device 2500
    Flags: medium devsel, IRQ 3
    Memory at fedfe800 (32-bit, non-prefetchable) [size=256]
    I/O ports at fcc8 [size=8]
    I/O ports at f800 [size=256]
    Capabilities: [f8] Power Management version 2

01:00.0 VGA compatible controller: Neomagic Corporation NM2200 [MagicGraph 256AV] (rev 20) (prog-if 00 [VGA controller])
    Subsystem: Dell Device 0005
    Flags: bus master, fast Back2Back, medium devsel, latency 128, IRQ 10
    Memory at f6000000 (32-bit, prefetchable) [size=16M]
    Memory at fe400000 (32-bit, non-prefetchable) [size=4M]
    Memory at feb00000 (32-bit, non-prefetchable) [size=1M]
    Expansion ROM at <unassigned> [disabled]
    Capabilities: [dc] Power Management version 1
    Kernel modules: neofb

01:00.1 Multimedia audio controller: Neomagic Corporation NM2200 [MagicMedia 256AV Audio] (rev 20)
    Subsystem: Dell Device 0080
    Flags: fast Back2Back, medium devsel, IRQ 10
    Memory at f7800000 (32-bit, prefetchable) [size=4M]
    Memory at fea00000 (32-bit, non-prefetchable) [size=1M]
    Capabilities: [dc] Power Management version 1
    Kernel driver in use: NeoMagic 256
    Kernel modules: snd-nm256

02:00.0 Ethernet controller: Atheros Communications Inc. AR2413 802.11bg NIC (rev 01)
    Subsystem: Atheros Communications Inc. TP-Link TL-WN510G Wireless CardBus Adapter
    Flags: bus master, medium devsel, latency 168, IRQ 10
    Memory at 24000000 (32-bit, non-prefetchable) [size=64K]
    Capabilities: [44] Power Management version 2
    Kernel driver in use: ath5k
    Kernel modules: ath5k

I hate to suggest it, but if you can try bisecting it, that would help a lot. I suspect it is something in the platform code as opposed to the driver itself. Suspend/resume works fine here. The only major changes on the driver were:

2.6.28:
    8bdd5b9c6bd53add260756b6673a0545fbdbba21
    bc1b32d6bdd2d6f3fbee9a7c01c9b099f11c579c
(none that I know of in 2.6.29)
2.6.30:
    665af4fc8979734d8f73c9a6732be07e545ce4cc
    bb2becac91f13e862d4601a8c5364bc758c35b8e

ok, after some initial problems I'm happily bisecting, only I cannot test more than 2-3 versions per day.

currently, known-good is still 2.6.28, ie 4a6908a3a050aacc9c3a2f36b276b46c0629ad91
latest known-bad is 672cf3cefe5f686637dec72b9f3d21fe1cdc8c94

-- that is, unless I'm wrong, none of the major changes you named are still in the picture?

I told git-bisect to only look at kernel/power/, drivers/net/wireless/adm8211.*, drivers/net/wireless/mac80211_hwsim.c and drivers/net/wireless/ath5k/ -- do you think that's sensible, anything I should add?

Why adm8211 and hwsim? And you're certainly missing net/wireless/ and net/mac80211/. Also, drivers/platform/x86 (especially if using hp-wmi) and drivers/pci/hotplug are good candidates.

thanks, I'm including those.
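For readers following along, the path-limited bisection discussed above can be sketched roughly as follows. This is only an illustration: the good/bad revisions and the path list are taken from the comments above, the function name bisect_start_cmd is made up for this sketch, and the script prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: prints the bisect command instead of running it.
GOOD=4a6908a3a050aacc9c3a2f36b276b46c0629ad91   # 2.6.28, known-good per the thread
BAD=672cf3cefe5f686637dec72b9f3d21fe1cdc8c94    # latest known-bad per the thread
# Path list as suggested in the thread (an assumption about what was finally used).
PATHS="kernel/power drivers/net/wireless/ath5k net/wireless net/mac80211 drivers/platform/x86 drivers/pci/hotplug"

bisect_start_cmd() {
    # General form: git bisect start <bad> <good> -- <pathspec>...
    echo "git bisect start $BAD $GOOD -- $PATHS"
}

bisect_start_cmd

# After each test boot, the result is recorded with one of:
#   git bisect good
#   git bisect bad
# A revision that cannot be tested at all can be set aside with:
#   git bisect skip
```

Limiting the pathspec reduces the number of kernels to build, but it can also hide the culprit if the offending change lives outside the listed directories, so the list is worth widening when the remaining range looks implausible.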
known-good is 6dd014808f91ad99d4d794cf7c7c69610c10f904
known-bad is e808e586b77a10949e209f8a00cb8bf27e51df12

there are just two ath5k commits in between, namely 4fb7404e0eaf574c00d01d2b1ce2615229b350cd and 71ef99c8b79ab07e1c79794085481464f9870d62, but I probably won't be able to test those, as builds in the neighbourhood refuse to suspend (freeze in "snapshotting system", requiring cold reboot), namely:

33f1d7ecc6cffff3c618a02295de969ebbacd95d
f1dd2b23badfe8a28910a78be24452c627c4b6f2

do you happen to know anything about this problem, or which range I'd need to exclude? I'm now hand-picking versions by staring at 'git bisect visualize'

I'm done bisecting. The won't-suspend thing is a separate issue, fixed in 2ea5521022ac8f4f528dcbae02668e02a3501a5a and fixable in all earlier affected revisions by applying that patch.

So the one that breaks resume of my WLAN card is 355a72d75b3b4f4877db4c9070c798238028ecb5, a patch to drivers/pci/pci-driver.c:

355a72d75b3b4f4877db4c9070c798238028ecb5 is first bad commit
commit 355a72d75b3b4f4877db4c9070c798238028ecb5
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date: Mon Dec 8 00:34:57 2008 +0100

    PCI: Rework default handling of suspend and resume

    Rework the handling of suspend and resume of PCI devices which have no drivers or the drivers of which do not provide any suspend-resume callbacks in such a way that their standard PCI configuration registers will be saved and restored with interrupts disabled. This should prevent such devices, including PCI bridges, from being resumed too late to be able to function correctly during the resume of the other PCI devices that may depend on them.

    Also, to remove one possible source of future confusion, drop the default handling of suspend and resume for PCI devices with drivers providing the 'pm' object introduced by the new suspend-resume framework (there are no such PCI drivers at the moment).

    This patch addresses the regression from 2.6.26 tracked as http://bugzilla.kernel.org/show_bug.cgi?id=12121 .

    Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

please tell me what to do next, reassign the bug as appropriate etc. I'm also pinging the author of the patch.

no comments or ideas in a month? this resume regression is still an issue for 2.6.30-rc7

I don't really understand what the offending commit is doing, or how it affects ath5k. As it's a commit to drivers/pci/ I'd reassign the bug to that component, but I'm not allowed since only the assignee can do that...

As was to be expected, in 2.6.30 the wireless won't resume successfully either.

Does anyone have an idea what else I could do to provide you with clues as to the roots of this problem? Does it make sense to printk-test what paths through the code touched by 355a72d75b3b4f4877db4c9070c798238028ecb5 are taken, before and after applying that commit? Or will the problem likely be elsewhere, as the commit seems to be just part of a larger reworking of suspend/resume since 2.6.28?

I think it's a question for Rafael and the PM experts. 2.6.30 did have a rewrite of suspend/resume for all wireless devices, but it clearly works for a lot of people (including me), so there's likely some difference in your platform. That said, there were some questions recently about whether or not ath5k should free PCI resources at suspend time; however, that hasn't changed in some time.

Doesn't work for me, either. With the latest Ubuntu kernel from kernel-ppa, resuming is not possible:

Aug 25 01:13:02 ups kernel: [ 764.760853] ath5k phy0: failed to wakeup the MAC Chip
Aug 25 01:13:02 ups kernel: [ 764.760863] ath5k phy0: can't reset hardware (-5)
Aug 25 01:13:02 ups NetworkManager: <info> (wlan0): deactivating device (reason: 2).

lspci -v:

03:00.0 Ethernet controller: Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter (rev 01)
    Subsystem: Netgear Device 4b00
    Flags: bus master, medium devsel, latency 168, IRQ 16
    Memory at 34000000 (32-bit, non-prefetchable) [size=64K]
    Capabilities: <access denied>
    Kernel driver in use: ath5k
    Kernel modules: ath5k

Maybe the same as http://bugzilla.kernel.org/show_bug.cgi?id=13948

I tried last week's 2.6.31-rc8 with and without the suspend-to-ram patch suggested in comment #14 / bug 13948, but there's no difference, there's still no successful resume of ath5k after suspend-to-*disk*

I'd really love to get a comment from the Power Management gurus. This bug is almost six months old now, and I could have done a lot of debugging if given a few directions and the feeling that someone in the know actually cares (I've been too busy to just start at random)

Any ideas Rafael?

Not really. Alek's commit c82f63e411f1b58427c103bd95af2863b1c96dd1 (PCI: check saved state before restore) could have helped here in theory, but I guess I'll need to have a look at the driver. Please ping me about it in a day or two.

Created attachment 23133 [details]
ath5k: Rework suspend and resume
Florian, please check if this patch changes anything.
Rafael, I wasn't very successful. I applied your patch to the remote master branch (ie 43c1266ce4d), but the resulting kernel is unable to properly associate with my AP in the first place ("ath timeout", iwconfig shows my AP's MAC address but I only get a link-local IP) - please tell me if you need more details, or what branch / commit I should be using to get your current state of work.

Anyway, I tried suspend/resume, but I'm still getting the same error messages ("..failed to wake up the mac chip...")

It applies to the current Linus' tree. I'll prepare a patch against 2.6.31 for you.

Created attachment 23143 [details]
ath5k: Rework suspend and resume (against 2.6.31)
This is a version of the previous patch that applies to 2.6.31. I couldn't compile it, so please let me know if it doesn't compile and I'll fix it.
BTW, if the patch from comment #18 doesn't make any difference, it is highly unlikely that resume doesn't work because of commit 355a72d75b3b4f4877db4c9070c798238028ecb5. That could have been the case when the commit was done, but not right now.

BTW2, please attach the output of dmesg from 2.6.31 with the patch from comment #21 applied (assuming it compiles), including at least one suspend-resume cycle. Also please attach the contents of /proc/iomem and /proc/ioports.

I _suspect_ the problem is not with the ath5k, which is actually OK with or without my patch, but with a bridge it is connected to.

One more question, is your ath5k compiled statically into the kernel?

the patch from comment #21 doesn't compile:

...
  CC [M]  drivers/net/wireless/ath/ath5k/base.o
drivers/net/wireless/ath/ath5k/base.c: In function 'ath5k_pci_resume':
drivers/net/wireless/ath/ath5k/base.c:696: error: 'err' undeclared (first use in this function)
drivers/net/wireless/ath/ath5k/base.c:696: error: (Each undeclared identifier is reported only once
drivers/net/wireless/ath/ath5k/base.c:696: error: for each function it appears in.)
drivers/net/wireless/ath/ath5k/base.c:694: warning: label 'err_no_irq' defined but not used
make[6]: *** [drivers/net/wireless/ath/ath5k/base.o] Fehler 1
...

I compile ath5k as a module, always. I will attach dmesg/iomem/ioports tonight.

Created attachment 23150 [details]
ath5k: Rework suspend and resume (against 2.6.31)
New version, verified to compile.
Please test this one.
Created attachment 23151 [details]
PCMCIA: Rework suspend and resume (against 2.6.31)
Please also apply this patch, in addition to the previous one, and see if that helps.
Regardless of whether it helps or not, please attach the output of dmesg from 2.6.31 including at least one suspend-resume cycle with this patch applied.
Created attachment 23156 [details]
dmesg output after suspend/resume
with both patches applied to 2.6.31, still no change
Created attachment 23157 [details]
/proc/iomem after unsuccessful resume
Created attachment 23158 [details]
/proc/ioports after unsuccessful resume
Created attachment 23159 [details]
dmesg output, suspend/resume with "pcmcia: EjectCards yes"
I thought I'd mention that the wireless is upped after resume by an "UpInterfaces" directive in /etc/hibernate/common.conf
Looking at that file, I notice there's a pcmcia section with an "EjectCards" directive. With that enabled, the wireless properly resumes - only I'm not getting an IP address for the interface until dhcpcd is restarted, but that's a different matter. Alas, that's a workaround for my issue, even though I don't think it's a solution for this bug as physically reinjecting the card always worked, right?
But I thought it might be interesting info, so attached the dmesg output.
OK, thanks.

So, it looks like the driver is unloaded before the hibernation and reloaded after the restore (actually, after we've returned to the user space), so the patch from comment #26 doesn't really matter (although I wonder what happens if the driver is _not_ unloaded before hibernation).

This means that indeed the device is handled by the default PCI suspend/resume, which apparently doesn't work correctly with it, so I'll need to find the reason why. Hopefully, the information you've already provided will be sufficient for this purpose.

That said, I wonder why there's the difference between the first and the second probing of the device:

(before hibernation):
[ 12.957047] ath5k 0000:02:00.0: enabling device (0000 -> 0002)
[ 12.957187] ath5k 0000:02:00.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, low) -> IRQ 10
[ 12.957361] ath5k 0000:02:00.0: enabling bus mastering
[ 12.957543] ath5k 0000:02:00.0: registered as 'phy0'

(after restore):
[ 98.042630] ath5k 0000:02:00.0: enabling device (0000 -> 0002)
[ 98.042763] ath5k 0000:02:00.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, low) -> IRQ 10
[ 98.042926] ath5k 0000:02:00.0: enabling bus mastering
[ 98.043111] ath5k 0000:02:00.0: registered as 'phy1'

Why is it phy1 and what's happened to phy0? Bob?

For clarity, comment #32 refers to the dmesg output from comment #28.

(In reply to comment #31)
> Created an attachment (id=23159) [details]
> dmesg output, suspend/resume with "pcmcia: EjectCards yes"
>
> I thought I'd mention that the wireless is upped after resume by an
> "UpInterfaces" directive in /etc/hibernate/common.conf
>
> Looking at that file, I notice there's a pcmcia section with an "EjectCards"
> directive. With that enabled, the wireless properly resumes - only I'm not
> getting an IP address for the interface until dhcpcd is restarted, but that's
> a different matter. Alas, that's a workaround for my issue, even though I
> don't think it's a solution for this bug as physically reinjecting the card
> always worked, right?
>
> But I thought it might be interesting info, so attached the dmesg output.

It is interesting in that it kind of confirms the previous observation that the default PCI suspend/resume handling doesn't really work for your device. With the card ejected there's nothing to handle. :-)

Created attachment 23161 [details]
PCMCIA: Rework suspend and resume (against 2.6.31) - variant 2
Please try this patch instead of the patch from comment #27 and report back.

(In reply to comment #32)
> Why is it phy1 and what's happened to phy0? Bob?

So phy%d is the common wireless device from the upper layer cfg80211 (see net/wireless/core.c, and 'iw phy'). Theoretically, the device driver can register more than one of these per physical device if there are multiple radios or virtual radios, but for ath5k there's only one per physical device.

cfg80211 allocates them and maintains the counter for these, so the following will create a phy1 (I believe it frees them without decrementing the usage count):

$ rmmod ath5k
$ modprobe ath5k

On the other hand, this will reload ath5k without creating phy1, since cfg80211 is also reloaded:

$ modprobe -r ath5k
$ modprobe ath5k

Thanks for the info. So this is perfectly normal.

I'd still like to know if the problem is reproducible with the patch from comment #35.

Created attachment 23173 [details]
dmesg after successful suspend/resume with patches from comments 26 AND 35
HURRAY, it works! see dmesg attached
NB I'll be AFK from now until Monday
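On the phy0/phy1 question answered earlier in this thread: the numbering behaviour described there can be pictured with a small toy model. This is not kernel code; the counter, register_phy and reload_allocator are hypothetical names used only to illustrate "an index that keeps counting until the allocator itself is reloaded", which is how the explanation above describes cfg80211.

```shell
#!/bin/sh
# Toy model, not kernel code: a monotonically increasing index that is only
# reset when the allocator itself (standing in for cfg80211) is reloaded.
phy_counter=0

register_phy() {      # hypothetical helper: hand out the next phy name
    phy_name="phy$phy_counter"
    phy_counter=$((phy_counter + 1))
}

reload_allocator() {  # stands in for unloading and reloading cfg80211
    phy_counter=0
}

register_phy; first=$phy_name      # initial driver load
register_phy; second=$phy_name     # driver reloaded, allocator kept (rmmod/modprobe)
reload_allocator
register_phy; third=$phy_name      # driver and allocator both reloaded (modprobe -r)

echo "$first $second $third"
```

In this model the second registration gets a new name because nothing resets the counter, which matches the 'phy1' seen in the dmesg output after the restore.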
(In reply to comment #38)
> Created an attachment (id=23173) [details]
> dmesg after successful suspend/resume with patches from comments 26 AND 35
>
> HURRAY, it works! see dmesg attached

Great, thanks for testing.

So, we need to run full yenta resume before starting to resume dependent devices. I guess the other PCMCIA drivers need similar fixes.

> NB I'll be AFK from now until Monday

No problem.

Handled-By : Rafael J. Wysocki <rjw@sisk.pl>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=23161

Please don't change the "Blocks" field, I need it.

Created attachment 23180 [details]
PM / PCMCIA: Drop second argument of pcmcia_socket_dev_suspend()
Clean-up patch needed for the final fix patch.
Created attachment 23181 [details]
PM / yenta: Fix cardbus suspend/resume regression
Cleaned-up yenta socket suspend/resume rework patch.
Florian, please verify that this patch on top of the one from the previous comment still fixes the problem for you (both patches against 2.6.31).
Ignore-Patch : http://bugzilla.kernel.org/attachment.cgi?id=23161
Patch : http://bugzilla.kernel.org/attachment.cgi?id=23180
Patch : http://bugzilla.kernel.org/attachment.cgi?id=23181

I can confirm that 2.6.31 patched with the patches from comments 42 and 43 still fixes the problem for me.

Also, I can confirm that the patches you posted to LKML, applied on top of tonight's master branch, also fix my problem (and master no longer exhibits last week's association problems, which is also good).

Rafael, thanks a lot for your work!

Florian, can you please check if the patch from http://bugzilla.kernel.org/attachment.cgi?id=23341 on top of 2.6.32-rc5 reintroduces the problem for you?

Unfortunately, the fix for your system broke resume on some other systems (see bug #14334).

Please also check if resume works on your system with the patch from http://bugzilla.kernel.org/attachment.cgi?id=23523

*** Bug 14105 has been marked as a duplicate of this bug. ***

Rafael, I'm sorry but neither of them works for me, ie both patches (#47 and #48) reintroduce the problem!

Thanks for testing.

The systems affected by bug #14334 seem to have a power resources problem that shows up when the yenta socket is powered up "too early".

I've got the same problem (also acer aspire) with 2.6.33:
https://bugzilla.kernel.org/show_bug.cgi?id=ath5k-wakeup
(I've created another bug, because at the beginning it seemed to be destroying EEPROM...)
Created attachment 21001 [details]
annotated kern.log

my wireless is an Atheros AR2413 on a pccard by tp-link (TL-WN510G). It suspends/resumes flawlessly on the 2.6.28.9 kernel, yet on 2.6.29, while the wireless is ok after the initial boot, resume fails with

ath5k phy0: failed to wakeup the MAC Chip
ath5k 0000:02:00.0: PCI INT A disabled
ath5k: probe of 0000:02:00.0 failed with error -5

doing 'modprobe -r ath5k; modprobe ath5k' just repeats these messages; yet when I manually remove and reinject the pccard, everything's fine again.

I'm attaching an annotated kern.log of a 2.6.29 boot/suspend/resume/module reinsertion/pccard reinsertion, followed by 2.6.28 boot/suspend/successful resume.

Please tell me where to go from here, what other info might be useful etc. I'm happy to try patches or specific git checkouts, but am a bit at a loss what way to start.

Florian
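When testing many kernels in a row, it can help to check the kernel log for the failure message quoted in this report instead of reading it by eye. The following is a minimal sketch under that assumption; resume_failed is a hypothetical helper name, and the sample log line is copied from the report above rather than read from a real system log.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given log file contains the ath5k
# wakeup failure quoted in this report.
resume_failed() {
    grep -q 'ath5k phy[0-9]*: failed to wakeup the MAC Chip' "$1"
}

# Example run against a sample line taken from this report.
log=$(mktemp)
echo 'ath5k phy0: failed to wakeup the MAC Chip' > "$log"
if resume_failed "$log"; then
    status="resume failed"
else
    status="no failure found"
fi
echo "$status"
rm -f "$log"
```

On a real system the argument would be whatever file holds the kernel messages for the resume in question, for example a saved copy of the dmesg output.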