Bug 12656

Summary: iwl3945 broken after hibernation: Wait for START_ALIVE timeout after 2000ms.
Product: Networking Reporter: Alex Riesen (raa.lkml)
Component: WirelessAssignee: networking_wireless (networking_wireless)
Status: CLOSED CODE_FIX    
Severity: normal CC: drago01, helmut.schaa, joel, reinette.chatre, rjw, zdenek.kabelac
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: v2.6.29-rc3-742-ge83102c Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7216, 12398    
Attachments: dmesg after resume
lspci -vvvx _before_ hybernation
the kernel .config
dmesg after reverting the commit found by bisect (to compare with the bug case)

Description Alex Riesen 2009-02-08 01:52:46 UTC
Latest working kernel version: v2.6.28, but may be even some previous -rcs worked (not sure)
Earliest failing kernel version: v2.6.29-rc3-742-ge83102c
Distribution: Ubuntu 8.10
Hardware Environment: Dell XPS M1330 (64bit)
Software Environment: iwlwifi-3945-2.ucode (firmware version 15.28.2.8)
Problem Description:
After resuming from hybernation the network card is disabled
(the activity LED is off) and these messages can appear in kernel log:
iwl3945 0000:0c:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwl3945 0000:0c:00.0: irq 30 for MSI/MSI-X
iwl3945: Wait for START_ALIVE timeout after 2000ms.
iwl3945 0000:0c:00.0: PCI INT A disabled
iwl3945 0000:0c:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwl3945 0000:0c:00.0: irq 30 for MSI/MSI-X
iwl3945: Wait for START_ALIVE timeout after 2000ms.
iwl3945 0000:0c:00.0: PCI INT A disabled

Switching the card on and off with killswitch (the laptop has a hardware
killswitch) does not help.

Steps to reproduce:
 hybernate and resume
Comment 1 Alex Riesen 2009-02-08 01:56:36 UTC
Created attachment 20161 [details]
dmesg after resume
Comment 2 Alex Riesen 2009-02-08 01:57:20 UTC
Created attachment 20162 [details]
lspci -vvvx _before_ hybernation
Comment 3 Alex Riesen 2009-02-08 01:57:56 UTC
Created attachment 20163 [details]
the kernel .config
Comment 4 Anonymous Emailer 2009-02-08 06:42:53 UTC
Reply-To: fork0@users.sourceforge.net

Bisection points at:

commit c4e061ace75513aee227090486cc46dec7810c00
Author: Reinette Chatre <reinette.chatre@intel.com>
Date:   Tue Feb 3 10:20:03 2009 -0800

    iwlwifi: save PCI state before suspend, restore after resume

    This is the right thing to do and fixes the following warning:

diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
index 95d0198..5b44d32 100644
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -8143,6 +8143,7 @@ static int iwl3945_pci_suspend(struct pci_dev *pdev, pm_message_t state)
 		priv->is_open = 1;
 	}
 
+	pci_save_state(pdev);
 	pci_set_power_state(pdev, PCI_D3hot);
 
 	return 0;
@@ -8153,6 +8154,7 @@ static int iwl3945_pci_resume(struct pci_dev *pdev)
 	struct iwl3945_priv *priv = pci_get_drvdata(pdev);
 
 	pci_set_power_state(pdev, PCI_D0);
+	pci_restore_state(pdev);
 
 	if (priv->is_open)
 		iwl3945_mac_start(priv->hw);

The change makes sense to me, but it obviously makes problems here.
Reverting the commit allows the wireless network to restore its
functionality after resume from hibernation.

bugme-daemon@bugzilla.kernel.org, Sun, Feb 08, 2009 11:48:30 +0100:
> http://bugzilla.kernel.org/show_bug.cgi?id=12656
Comment 5 Alex Riesen 2009-02-08 06:45:51 UTC
Created attachment 20166 [details]
dmesg after reverting the commit found by bisect (to compare with the bug case)
Comment 6 Rafael J. Wysocki 2009-02-08 10:48:43 UTC
On Sunday 08 February 2009, Alex Riesen wrote:
> Bisection points at:
> 
> commit c4e061ace75513aee227090486cc46dec7810c00
> Author: Reinette Chatre <reinette.chatre@intel.com>
> Date:   Tue Feb 3 10:20:03 2009 -0800
> 
>     iwlwifi: save PCI state before suspend, restore after resume
> 
>     This is the right thing to do and fixes the following warning:
> 
> diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c
> b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> index 95d0198..5b44d32 100644
> --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
> +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> @@ -8143,6 +8143,7 @@ static int iwl3945_pci_suspend(struct pci_dev *pdev,
> pm_message_t state)
>               priv->is_open = 1;
>       }
>  
> +     pci_save_state(pdev);
>       pci_set_power_state(pdev, PCI_D3hot);
>  
>       return 0;
> @@ -8153,6 +8154,7 @@ static int iwl3945_pci_resume(struct pci_dev *pdev)
>       struct iwl3945_priv *priv = pci_get_drvdata(pdev);
>  
>       pci_set_power_state(pdev, PCI_D0);
> +     pci_restore_state(pdev);
>  
>       if (priv->is_open)
>               iwl3945_mac_start(priv->hw);
> 
> The change makes sense to me, but it obviously makes problems here.
> Reverting the commit allows the wireless network to restore its
> functionality after resume from hibernation.

What happens if you only remove the pci_restore_state(pdev) from
iwl3945_pci_resume?

Rafael
Comment 7 Anonymous Emailer 2009-02-08 15:37:32 UTC
Reply-To: fork0@users.sourceforge.net

Rafael J. Wysocki, Sun, Feb 08, 2009 19:48:02 +0100:
> On Sunday 08 February 2009, Alex Riesen wrote:
> > @@ -8153,6 +8154,7 @@ static int iwl3945_pci_resume(struct pci_dev *pdev)
> >     struct iwl3945_priv *priv = pci_get_drvdata(pdev);
> >  
> >     pci_set_power_state(pdev, PCI_D0);
> > +   pci_restore_state(pdev);
> >  
> >     if (priv->is_open)
> >             iwl3945_mac_start(priv->hw);
> > 
> 
> What happens if you only remove the pci_restore_state(pdev) from
> iwl3945_pci_resume?
> 

Bad. Same as before reverting the Reinette Chatre's commit.
Comment 8 Reinette Chatre 2009-02-08 22:29:29 UTC
Alex,

On Sun, 2009-02-08 at 15:37 -0800, Alex Riesen wrote:
> Rafael J. Wysocki, Sun, Feb 08, 2009 19:48:02 +0100:
> > On Sunday 08 February 2009, Alex Riesen wrote:
> > > @@ -8153,6 +8154,7 @@ static int iwl3945_pci_resume(struct pci_dev *pdev)
> > >   struct iwl3945_priv *priv = pci_get_drvdata(pdev);
> > >  
> > >   pci_set_power_state(pdev, PCI_D0);
> > > + pci_restore_state(pdev);
> > >  
> > >   if (priv->is_open)
> > >           iwl3945_mac_start(priv->hw);
> > > 
> > 
> > What happens if you only remove the pci_restore_state(pdev) from
> > iwl3945_pci_resume?
> > 
> 
> Bad. Same as before reverting the Reinette Chatre's commit.

Sorry about this ...

Could you please try the patch below on top of the bad commit?

diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
index 5b44d32..1f44344 100644
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -8144,6 +8144,7 @@ static int iwl3945_pci_suspend(struct pci_dev *pdev, pm_message_t state)
 	}
 
 	pci_save_state(pdev);
+	pci_disable_device(pdev);
 	pci_set_power_state(pdev, PCI_D3hot);
 
 	return 0;
@@ -8152,8 +8153,12 @@ static int iwl3945_pci_suspend(struct pci_dev *pdev, pm_message_t state)
 static int iwl3945_pci_resume(struct pci_dev *pdev)
 {
 	struct iwl3945_priv *priv = pci_get_drvdata(pdev);
+	int ret;
 
 	pci_set_power_state(pdev, PCI_D0);
+	ret = pci_enable_device(pdev);
+	if (ret)
+		return ret;
 	pci_restore_state(pdev);
 
 	if (priv->is_open)
Comment 9 Anonymous Emailer 2009-02-08 23:23:56 UTC
Reply-To: fork0@users.sourceforge.net

reinette chatre, Mon, Feb 09, 2009 07:32:38 +0100:
> > Rafael J. Wysocki, Sun, Feb 08, 2009 19:48:02 +0100:
> > > 
> > > What happens if you only remove the pci_restore_state(pdev) from
> > > iwl3945_pci_resume?
> > > 
> > 
> > Bad. Same as before reverting the Reinette Chatre's commit.
> 
> Sorry about this ...
> 
> Could you please try the patch below on top of the bad commit?
> 

Does not help. It is still broken.
Comment 10 Reinette Chatre 2009-02-09 00:16:39 UTC
Alex,

On Sun, 2009-02-08 at 15:37 -0800, Alex Riesen wrote:
> Rafael J. Wysocki, Sun, Feb 08, 2009 19:48:02 +0100:
> > On Sunday 08 February 2009, Alex Riesen wrote:
> > > @@ -8153,6 +8154,7 @@ static int iwl3945_pci_resume(struct pci_dev *pdev)
> > >   struct iwl3945_priv *priv = pci_get_drvdata(pdev);
> > >  
> > >   pci_set_power_state(pdev, PCI_D0);
> > > + pci_restore_state(pdev);
> > >  
> > >   if (priv->is_open)
> > >           iwl3945_mac_start(priv->hw);
> > > 
> > 
> > What happens if you only remove the pci_restore_state(pdev) from
> > iwl3945_pci_resume?
> > 
> 
> Bad. Same as before reverting the Reinette Chatre's commit.
> 

Sorry again ... please ignore previous patch.

pci_legacy_suspend assumes that the driver's suspend function should
call pci_save_state. In the driver this function is called as part of
the suspend process, but not always directly from the suspend function -
instead when the mac80211 stack signals the interface goes down this
call will be skipped as part of the suspend process ... because it was
already done when interface was brought down. So pci state will be
saved ... but not always in the suspend function.

With the bad patch what we did was save the pci state again ... but at
this time the device was already disabled so when device was resumed we
resumed a disabled device.

The patch below should solve this problem. Yes - it does look like a
hack, but I do not know how we can indicate the pci state is already
saved at this time. If we save the state at this point it is of a
disabled device and you have reported what that is like on resume. This
should work also in the case when the interface was not open when the
device is suspended because in this case the device is disabled also.

Could you please try this out?

diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
index 95d0198..29bb46c 100644
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -8143,6 +8143,14 @@ static int iwl3945_pci_suspend(struct pci_dev *pdev, pm_message_t state)
 		priv->is_open = 1;
 	}
 
+	/* it is assumed state will be saved in this function
+	 * - but the interface could be stopped by mac80211 stack before
+	 *   suspend is called. If so, the pci state will be saved and
+	 *   device will be disabled when this function is entered. If we
+	 *   save the state now we will save a disabled device. Just
+	 *   indicate that the current saved state is valid. */
+	pdev->state_saved = true;
+
 	pci_set_power_state(pdev, PCI_D3hot);
 
 	return 0;
Comment 11 Alex Riesen 2009-02-09 04:56:38 UTC
(In reply to comment #10)
> 
> +       /* it is assumed state will be saved in this function
> +        * - but the interface could be stopped by mac80211 stack before
> +        *   suspend is called. If so, the pci state will be saved and
> +        *   device will be disabled when this function is entered. If we
> +        *   save the state now we will save a disabled device. Just
> +        *   indicate that the current saved state is valid. */
> +       pdev->state_saved = true;
> +
>         pci_set_power_state(pdev, PCI_D3hot);
> 
>         return 0;
> 

Are you sure it is to be applied on the tree with c4e061ace7551 reverted?
Because hibernation obviously works this way (that was the thing the earlier
bisection has found, and yes, I retested). Or should I test for something
else besides the hibernation? What does this change if the problematic
commit is reverted?
Comment 12 Reinette Chatre 2009-02-09 07:34:16 UTC
Alex,

Yes - this is to be applied with that patch reverted. As you state this will solve hibernation for you - at the same time it also addresses the WARN you currently get when hibernation does work.

Would it be possible to test hibernation with the interface down also?

Thank you very much
Comment 13 Alex Riesen 2009-02-09 08:44:17 UTC
(In reply to comment #12)
> Yes - this is to be applied with that patch reverted. As you state this will
> solve hibernation for you - at the same time it also addresses the WARN you
> currently get when hibernation does work.

Oh, now I understand your what you meant by saying "it does look like a
hack". For reference, this is the WARN (I think):

WARNING: at drivers/pci/pci-driver.c:374 pci_legacy_suspend+0xa3/0xe6()
Hardware name: XPS M1330                       
PCI PM: Device state not saved by iwl3945_pci_suspend+0x0/0x51
Modules linked in: binfmt_misc cpufreq_powersave cpufreq_conservative cpufreq_userspace iptable_filter dm_crypt dm_mod sbp2 fuse serio_raw uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 ohci1394 ieee1394 video
Pid: 3860, comm: pm-hibernate Not tainted 2.6.29-rc3-t #211
Call Trace:
 [<ffffffff8023c3c0>] warn_slowpath+0xd8/0x114
 [<ffffffff8025e38a>] ? get_lock_stats+0x2d/0x5c
 [<ffffffff803c66e1>] ? acpi_bus_data_handler+0x0/0xb
 [<ffffffff802604c5>] ? mark_held_locks+0x67/0x82
 [<ffffffff805f825d>] ? _spin_unlock_irqrestore+0x44/0x6d
 [<ffffffff80260686>] ? trace_hardirqs_on_caller+0x114/0x138
 [<ffffffff802606b7>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff805f826a>] ? _spin_unlock_irqrestore+0x51/0x6d
 [<ffffffff80254a32>] ? up+0x39/0x3e
 [<ffffffff803c48ce>] ? acpi_ut_release_mutex+0x62/0x66
 [<ffffffff803ba63b>] ? acpi_get_data+0x63/0x75
 [<ffffffff80442648>] ? iwl3945_pci_suspend+0x0/0x51
 [<ffffffff803c5de8>] ? acpi_bus_get_device+0x2a/0x3e
 [<ffffffff803c5eb6>] ? acpi_bus_power_manageable+0x16/0x2e
 [<ffffffff80398b07>] ? acpi_pci_power_manageable+0x1c/0x1e
 [<ffffffff80381b15>] ? pci_set_power_state+0xd1/0x106
 [<ffffffff80384267>] pci_legacy_suspend+0xa3/0xe6
 [<ffffffff805f825d>] ? _spin_unlock_irqrestore+0x44/0x6d
 [<ffffffff803843d6>] pci_pm_freeze+0x48/0xab
 [<ffffffff8041f9f1>] pm_op+0x80/0xea
 [<ffffffff804201f2>] device_suspend+0x416/0x552
 [<ffffffff8026ee0f>] hibernation_snapshot+0x49/0x197
 [<ffffffff8026f00d>] hibernate+0xb0/0x1f3
 [<ffffffff8026da44>] state_store+0x5e/0xdc
 [<ffffffff80371e67>] kobj_attr_store+0x17/0x19
 [<ffffffff802fc348>] sysfs_write_file+0xe9/0x11e
 [<ffffffff802b2e3c>] vfs_write+0xb3/0x129
 [<ffffffff802b2f80>] sys_write+0x4c/0x75
 [<ffffffff8020b8cb>] system_call_fastpath+0x16/0x1b

> Would it be possible to test hibernation with the interface down also?

No problem. Tested: it stays down :) No warnings.
Comment 14 Zdenek Kabelac 2009-02-09 08:59:52 UTC
Ok - I think I'm seeing now the same problem with my T61 - I'm not sure if it's related with this bug -  http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1834

but now - when I run suspend and resume - I end up with unworkable wifi connection - and I've checked with the patch from Comment #11 or without - I think this is somewhat new behavior as now it happens relatively always when I want to test - on the other hand - there is simple workaround -  modprobe -r iwl3945 ; modprobe iwl3945

This makes my wifi working again - see the log:

iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 1.2.26ks
iwl3945: Copyright(c) 2003-2008 Intel Corporation
iwl3945 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwl3945 0000:03:00.0: setting latency timer to 64
iwl3945: Tunable channels: 13 802.11bg, 23 802.11a channels
iwl3945: Detected Intel Wireless WiFi Link 3945ABG
iwl3945 0000:03:00.0: PCI INT A disabled
....

iwl3945: Wait for START_ALIVE timeout after 2000ms.
iwl3945 0000:03:00.0: PCI INT A disabled
iwl3945 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwl3945 0000:03:00.0: irq 32 for MSI/MSI-X
iwl3945: Wait for START_ALIVE timeout after 2000ms.
iwl3945 0000:03:00.0: PCI INT A disabled
PM: Removing info for No Bus:rfkill1
PM: Removing info for No Bus:wlan0
PM: Removing info for No Bus:wmaster0
PM: Removing info for No Bus:phy0
....

lib80211_crypt: unregistered algorithm 'NULL'
lib80211: common routines for IEEE802.11 drivers
lib80211_crypt: registered algorithm 'NULL'
PM: Removing info for platform:regulatory.0
mac80211: Unknown symbol regulatory_hint_11d
mac80211: Unknown symbol cfg80211_wext_giwname
mac80211: Unknown symbol cfg80211_wext_giwmode
mac80211: Unknown symbol wiphy_register
mac80211: Unknown symbol wiphy_new
mac80211: Unknown symbol wiphy_unregister
mac80211: Unknown symbol ieee80211_radiotap_iterator_init
mac80211: Unknown symbol __ieee80211_get_channel
mac80211: Unknown symbol cfg80211_wext_siwmode
mac80211: Unknown symbol ieee80211_radiotap_iterator_next
mac80211: Unknown symbol ieee80211_channel_to_frequency
mac80211: Unknown symbol ieee80211_frequency_to_channel
mac80211: Unknown symbol wiphy_free
iwl3945: Unknown symbol ieee80211_free_hw
iwl3945: Unknown symbol ieee80211_alloc_hw
iwl3945: Unknown symbol ieee80211_register_hw
iwl3945: Unknown symbol ieee80211_rate_control_unregister
iwl3945: Unknown symbol ieee80211_wake_queue
iwl3945: Unknown symbol ieee80211_find_sta
iwl3945: Unknown symbol ieee80211_tx_status_irqsafe
iwl3945: Unknown symbol ieee80211_wake_queues
iwl3945: Unknown symbol ieee80211_rate_control_register
iwl3945: Unknown symbol ieee80211_stop_queue
iwl3945: Unknown symbol ieee80211_stop_queues
iwl3945: Unknown symbol ieee80211_scan_completed
iwl3945: Unknown symbol ieee80211_channel_to_frequency
iwl3945: Unknown symbol ieee80211_unregister_hw
iwl3945: Unknown symbol ieee80211_hdrlen
iwl3945: Unknown symbol ieee80211_beacon_get
iwl3945: Unknown symbol ieee80211_rx_irqsafe
PM: Adding info for platform:regulatory.0
cfg80211: Calling CRDA to update world regulatory domain
iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 1.2.26ks
iwl3945: Copyright(c) 2003-2008 Intel Corporation
iwl3945 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwl3945 0000:03:00.0: setting latency timer to 64
iwl3945: Tunable channels: 13 802.11bg, 23 802.11a channels
iwl3945: Detected Intel Wireless WiFi Link 3945ABG
PM: Adding info for No Bus:phy0
wmaster0 (iwl3945): not using net_device_ops yet
PM: Adding info for No Bus:wmaster0
phy0: Selected rate control algorithm 'iwl-3945-rs'
wlan0 (iwl3945): not using net_device_ops yet
PM: Adding info for No Bus:wlan0
PM: Adding info for No Bus:rfkill2
iwl3945 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwl3945 0000:03:00.0: irq 32 for MSI/MSI-X
iwl3945 0000:03:00.0: firmware: requesting iwlwifi-3945-2.ucode
PM: Adding info for No Bus:0000:03:00.0
PM: Removing info for No Bus:0000:03:00.0
iwl3945 loaded firmware version 15.28.2.8
ADDRCONF(NETDEV_UP): wlan0: link is not ready
Comment 15 Reinette Chatre 2009-02-10 11:46:56 UTC
(In reply to comment #14)
> Ok - I think I'm seeing now the same problem with my T61 - I'm not sure if
> it's
> related with this bug - 
> http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1834
> 
> but now - when I run suspend and resume - I end up with unworkable wifi
> connection - and I've checked with the patch from Comment #11 or without - I

this looks like the same problem. did you revert c4e061ace75513aee227090486cc46dec7810c00 before applying the patch in comment #11?

Also, your log contains many "unknown symbol" errors. Can you please fix that?
Comment 16 Zdenek Kabelac 2009-02-10 12:27:20 UTC
So after my short tests:

With reverted commit c4e061ace75513aee227090486cc46dec7810c00 and patch from comment #11, there is really some improvement.

Wifi seems to survive 4 suspend/resume cycles in row.

I'm currently running this HEAD 4c098bcd55fad34dcf224bf8343db6a9ac58fc68
(kernel 2.6.29-rc4)

And about those missing symbols - these are just results of modprobe -r iwl3945
try yourself... so I cannot fix them.
Comment 17 Reinette Chatre 2009-02-10 13:14:30 UTC
(In reply to comment #16)
> So after my short tests:
> 
> With reverted commit c4e061ace75513aee227090486cc46dec7810c00 and patch from
> comment #11, there is really some improvement.
> 
> Wifi seems to survive 4 suspend/resume cycles in row.

Sounds like it fixed it for you as confirmed by comment #4. You probably will notice some WARN messages in your log when suspending - the patch in comment #10 addresses that.