Bug 15551

Summary: WARNING: at net/mac80211/work.c:811 ieee80211_work_work+0x7f/0xde8 [mac80211]()
Product: Networking Reporter: Alex Zhavnerchik (alex.vizor)
Component: WirelessAssignee: Rafael J. Wysocki (rjw)
Status: CLOSED CODE_FIX    
Severity: normal CC: linville, reinette.chatre, rjw, yi.zhu
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.34-rc1 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7216, 15310    
Attachments: dmesg
fix the problem from the iwlwifi side
fix the problem from the iwlwifi side (rebased against Linus' tip)
Kernel log (2.6.34-rc3)
dmesg 2.6.34-rc4

Description Alex Zhavnerchik 2010-03-16 22:03:47 UTC
Created attachment 25561 [details]
dmesg

Hi. I caught this after several resumes from s2ram, from s2disk it works ok.

I attached my dmesg. Please inform me if I can do the report more useful.
Comment 1 John W. Linville 2010-03-17 15:39:37 UTC
[22456.252166] s2ram: page allocation failure. order:4, mode:0x4010
[22456.252172] Pid: 23558, comm: s2ram Not tainted 2.6.34-rc1 #1
[22456.252175] Call Trace:
[22456.252187]  [<ffffffff810b8dc9>] __alloc_pages_nodemask+0x565/0x60c
[22456.252195]  [<ffffffff810de879>] alloc_pages_current+0x90/0x99
[22456.252201]  [<ffffffff810b7a9e>] __get_free_pages+0x9/0x46
[22456.252221]  [<ffffffffa03b4bdd>] iwl_tx_queue_init+0xf2/0x2a1 [iwlcore]
[22456.252227]  [<ffffffffa03b4f12>] iwl_txq_ctx_reset+0x186/0x20c [iwlcore]
[22456.252233]  [<ffffffffa03ae41c>] iwl_hw_nic_init+0x124/0x139 [iwlcore]

That looks like the initial part of the failure.  Just curious, are you using SLAB or SLUB?
Comment 2 Alex Zhavnerchik 2010-03-17 15:43:29 UTC
I have CONFIG_SLUB=y in kernel config
Comment 3 Alex Zhavnerchik 2010-03-17 15:44:01 UTC
I have CONFIG_SLUB=y in kernel config, should I try SLAB?
Comment 4 John W. Linville 2010-03-17 15:46:04 UTC
If you don't mind, it might be helpful to know if that avoids the problem.  The Intel drivers have had several "run-ins" w/ SLUB... :-(
Comment 5 Alex Zhavnerchik 2010-03-17 15:48:20 UTC
I'll try it with 2.6.34-rc2 when it out
Comment 6 Andrew Morton 2010-03-22 19:57:35 UTC
On Tue, 16 Mar 2010 22:03:53 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=15551

This'll be a post-2.6.33 regression.

>            Summary: wifi doesn't work after resume from s2ram on ThinkPad
>                     T61
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.34-rc1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Wireless
>         AssignedTo: networking_wireless@kernel-bugs.osdl.org
>         ReportedBy: alex.vizor@gmail.com
>         Regression: Yes
> 
> 
> Created an attachment (id=25561)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=25561)
> dmesg
> 
> Hi. I caught this after several resumes from s2ram, from s2disk it works ok.
> 
> I attached my dmesg. Please inform me if I can do the report more useful.
> 

This:

: [22456.252166] s2ram: page allocation failure. order:4, mode:0x4010
: [22456.252172] Pid: 23558, comm: s2ram Not tainted 2.6.34-rc1 #1
: [22456.252175] Call Trace:
: [22456.252187]  [<ffffffff810b8dc9>] __alloc_pages_nodemask+0x565/0x60c
: [22456.252195]  [<ffffffff810de879>] alloc_pages_current+0x90/0x99
: [22456.252201]  [<ffffffff810b7a9e>] __get_free_pages+0x9/0x46
: [22456.252221]  [<ffffffffa03b4bdd>] iwl_tx_queue_init+0xf2/0x2a1 [iwlcore]
: [22456.252227]  [<ffffffffa03b4f12>] iwl_txq_ctx_reset+0x186/0x20c [iwlcore]
: [22456.252233]  [<ffffffffa03ae41c>] iwl_hw_nic_init+0x124/0x139 [iwlcore]

Rafael, this is fallout from 452aa6999e6703ffbddd7f6ea124d3968915f3e3
("mm/pm: force GFP_NOIO during suspend/hibernation and resume"). 
Wireless was previously doing this stupidly-large allocation with
GFP_KERNEL.  But I think the above change is what flipped it to
__GFP_WAIT, which caused the page allocator to go into lame-and-sucky
mode, so the allocation failed.

(btw, that was a bug - 452aa6999e6703ffbddd7f6ea124d3968915f3e3 should
have cleared the __GFP_WAIT bit too).
Comment 7 Rafael J. Wysocki 2010-03-22 21:02:38 UTC
On Monday 22 March 2010, Andrew Morton wrote:
> On Tue, 16 Mar 2010 22:03:53 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=15551
> 
> This'll be a post-2.6.33 regression.
> 
> >            Summary: wifi doesn't work after resume from s2ram on ThinkPad
> >                     T61
> >            Product: Networking
> >            Version: 2.5
> >     Kernel Version: 2.6.34-rc1
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Wireless
> >         AssignedTo: networking_wireless@kernel-bugs.osdl.org
> >         ReportedBy: alex.vizor@gmail.com
> >         Regression: Yes
> > 
> > 
> > Created an attachment (id=25561)
> >  --> (http://bugzilla.kernel.org/attachment.cgi?id=25561)
> > dmesg
> > 
> > Hi. I caught this after several resumes from s2ram, from s2disk it works
> ok.
> > 
> > I attached my dmesg. Please inform me if I can do the report more useful.
> > 
> 
> This:
> 
> : [22456.252166] s2ram: page allocation failure. order:4, mode:0x4010
> : [22456.252172] Pid: 23558, comm: s2ram Not tainted 2.6.34-rc1 #1
> : [22456.252175] Call Trace:
> : [22456.252187]  [<ffffffff810b8dc9>] __alloc_pages_nodemask+0x565/0x60c
> : [22456.252195]  [<ffffffff810de879>] alloc_pages_current+0x90/0x99
> : [22456.252201]  [<ffffffff810b7a9e>] __get_free_pages+0x9/0x46
> : [22456.252221]  [<ffffffffa03b4bdd>] iwl_tx_queue_init+0xf2/0x2a1 [iwlcore]
> : [22456.252227]  [<ffffffffa03b4f12>] iwl_txq_ctx_reset+0x186/0x20c
> [iwlcore]
> : [22456.252233]  [<ffffffffa03ae41c>] iwl_hw_nic_init+0x124/0x139 [iwlcore]
> 
> Rafael, this is fallout from 452aa6999e6703ffbddd7f6ea124d3968915f3e3
> ("mm/pm: force GFP_NOIO during suspend/hibernation and resume"). 
> Wireless was previously doing this stupidly-large allocation with
> GFP_KERNEL.  But I think the above change is what flipped it to
> __GFP_WAIT, which caused the page allocator to go into lame-and-sucky
> mode, so the allocation failed.

Well, technically the problem is in the driver.  It shouldn't do anything like
this during suspend/resume.  I'll try to figure out how to fix this at minimal cost.

> (btw, that was a bug - 452aa6999e6703ffbddd7f6ea124d3968915f3e3 should
> have cleared the __GFP_WAIT bit too).

Hmm, should it?  That would make all allocations GFP_ATOMIC and is this what
we really want?

Rafael
Comment 8 Rafael J. Wysocki 2010-03-22 22:02:28 UTC
On Monday 22 March 2010, Andrew Morton wrote:
> On Tue, 16 Mar 2010 22:03:53 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=15551
> 
> This'll be a post-2.6.33 regression.
> 
> >            Summary: wifi doesn't work after resume from s2ram on ThinkPad
> >                     T61
> >            Product: Networking
> >            Version: 2.5
> >     Kernel Version: 2.6.34-rc1
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Wireless
> >         AssignedTo: networking_wireless@kernel-bugs.osdl.org
> >         ReportedBy: alex.vizor@gmail.com
> >         Regression: Yes
> > 
> > 
> > Created an attachment (id=25561)
> >  --> (http://bugzilla.kernel.org/attachment.cgi?id=25561)
> > dmesg
> > 
> > Hi. I caught this after several resumes from s2ram, from s2disk it works
> ok.
> > 
> > I attached my dmesg. Please inform me if I can do the report more useful.
> > 
> 
> This:
> 
> : [22456.252166] s2ram: page allocation failure. order:4, mode:0x4010
> : [22456.252172] Pid: 23558, comm: s2ram Not tainted 2.6.34-rc1 #1
> : [22456.252175] Call Trace:
> : [22456.252187]  [<ffffffff810b8dc9>] __alloc_pages_nodemask+0x565/0x60c
> : [22456.252195]  [<ffffffff810de879>] alloc_pages_current+0x90/0x99
> : [22456.252201]  [<ffffffff810b7a9e>] __get_free_pages+0x9/0x46
> : [22456.252221]  [<ffffffffa03b4bdd>] iwl_tx_queue_init+0xf2/0x2a1 [iwlcore]
> : [22456.252227]  [<ffffffffa03b4f12>] iwl_txq_ctx_reset+0x186/0x20c
> [iwlcore]
> : [22456.252233]  [<ffffffffa03ae41c>] iwl_hw_nic_init+0x124/0x139 [iwlcore]
> 
> Rafael, this is fallout from 452aa6999e6703ffbddd7f6ea124d3968915f3e3
> ("mm/pm: force GFP_NOIO during suspend/hibernation and resume"). 
> Wireless was previously doing this stupidly-large allocation with
> GFP_KERNEL.  But I think the above change is what flipped it to
> __GFP_WAIT, which caused the page allocator to go into lame-and-sucky
> mode, so the allocation failed.

Well, technically the problem is in the driver.  It shouldn't do anything like
this during suspend/resume.  I'll try to figure out how to fix this at minimal cost.

> (btw, that was a bug - 452aa6999e6703ffbddd7f6ea124d3968915f3e3 should
> have cleared the __GFP_WAIT bit too).

Hmm, should it?  That would make all allocations GFP_ATOMIC and is this what
we really want?

Rafael
Comment 9 Zhu Yi 2010-03-23 07:42:26 UTC
Created attachment 25651 [details]
fix the problem from the iwlwifi side
Comment 10 Rafael J. Wysocki 2010-03-23 21:20:49 UTC
Thanks!

Alex, can you test if the patch in comment #9 helps, please?
Comment 11 Alex Zhavnerchik 2010-03-23 22:02:53 UTC
Mmmm. It failed when I tried to apply this patch
Comment 12 Alex Zhavnerchik 2010-03-23 22:19:12 UTC
Forgot to mention I tried it on 2.6.34-rc2
Comment 13 Rafael J. Wysocki 2010-03-23 22:21:12 UTC
Do you mean the patch didn't apply?
Comment 14 Alex Zhavnerchik 2010-03-23 22:23:34 UTC
I tried to apply it like: patch -p1<../iwl.patch

On Wed, Mar 24, 2010 at 12:21 AM, <bugzilla-daemon@bugzilla.kernel.org>wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=15551
>
>
>
>
>
> --- Comment #13 from Rafael J. Wysocki <rjw@sisk.pl>  2010-03-23 22:21:12
> ---
> Do you mean the patch didn't apply?
>
> --
> Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You reported the bug.
>
Comment 15 Zhu Yi 2010-03-24 01:51:54 UTC
Created attachment 25668 [details]
fix the problem from the iwlwifi side (rebased against Linus' tip)

The previous patch was against iwlwifi git tip. This one was against Linus' tip.
Comment 16 Alex Zhavnerchik 2010-03-24 22:25:11 UTC
Looks like it works with this patch. I did s2ram several times and wifi works good after resume
Comment 17 Alex Zhavnerchik 2010-04-08 17:49:58 UTC
Created attachment 25916 [details]
Kernel log (2.6.34-rc3)

It is happening on 2.6.34-rc3 without patch proposed in this bug report
Comment 18 Reinette Chatre 2010-04-16 22:51:29 UTC
The patch proposed in comment #15 and verified to work in comment #16 can now be found in linux-2.6. Can this bug be closed?
Comment 19 Alex Zhavnerchik 2010-04-17 09:20:06 UTC
Yeah now it works well with 2.6.34-rc4
Comment 20 Alex Zhavnerchik 2010-04-19 20:03:01 UTC
Caught it on 2.6.34-rc4

Apr 19 22:53:23 loki kernel: [70023.276109] ------------[ cut here ]------------
Apr 19 22:53:23 loki kernel: [70023.276157] WARNING: at net/mac80211/work.c:812 ieee80211_work_work+0x7f/0xdee [mac80211]()
Apr 19 22:53:23 loki kernel: [70023.276164] Hardware name: 64608SG
Apr 19 22:53:23 loki kernel: [70023.276169] work scheduled while going to suspend
Apr 19 22:53:23 loki kernel: [70023.276174] Modules linked in: usbhid hid cryptd aes_x86_64 aes_generic vboxnetadp vboxnetflt autofs4 acpi_cpufreq cpufr
eq_conservative cpufreq_userspace cpufreq_powersave cpufreq_stats uinput microcode fuse vboxdrv firewire_sbp2 loop snd_hda_codec_analog snd_hda_intel sn
d_hda_codec arc4 snd_hwdep ecb snd_pcm_oss snd_mixer_oss snd_pcm thinkpad_acpi snd_seq_midi snd_rawmidi iwlagn snd_seq_midi_event iwlcore pcmcia snd_seq
 mac80211 snd_timer yenta_socket joydev snd_seq_device rsrc_nonstatic tpm_tis cfg80211 snd psmouse video intel_agp i2c_i801 tpm soundcore pcmcia_core rf
kill snd_page_alloc battery tpm_bios wmi serio_raw ac output nvram processor evdev pcspkr ext4 mbcache jbd2 crc16 sd_mod crc_t10dif ide_cd_mod cdrom ata
_generic ata_piix ahci sdhci_pci uhci_hcd sdhci libata firewire_ohci piix ehci_hcd firewire_core crc_itu_t mmc_core led_class scsi_mod usbcore e1000e nl
s_base ide_core thermal thermal_sys [last unloaded: scsi_wait_scan]
Apr 19 22:53:23 loki kernel: [70023.276397] Pid: 25371, comm: ifconfig Tainted: G        W  2.6.34-rc4 #2
Apr 19 22:53:23 loki kernel: [70023.276403] Call Trace:
Apr 19 22:53:23 loki kernel: [70023.276423]  [<ffffffff81045f2d>] warn_slowpath_common+0x77/0x8f
Apr 19 22:53:23 loki kernel: [70023.276434]  [<ffffffff81045f92>] warn_slowpath_fmt+0x3c/0x3e
Apr 19 22:53:23 loki kernel: [70023.276464]  [<ffffffffa0335166>] ieee80211_work_work+0x7f/0xdee [mac80211]
Apr 19 22:53:23 loki kernel: [70023.276475]  [<ffffffff810710fc>] ? put_lock_stats+0x1c/0x27
Apr 19 22:53:23 loki kernel: [70023.276484]  [<ffffffff810711c5>] ? lock_release_holdtime+0xbe/0xc3
Apr 19 22:53:23 loki kernel: [70023.276512]  [<ffffffffa0335f46>] ieee80211_work_purge+0x71/0xbe [mac80211]
Apr 19 22:53:23 loki kernel: [70023.276540]  [<ffffffffa0336a53>] ieee80211_stop+0x4f/0x535 [mac80211]
Apr 19 22:53:23 loki kernel: [70023.276552]  [<ffffffff8104bb0f>] ? local_bh_enable_ip+0x9/0xb
Apr 19 22:53:23 loki kernel: [70023.276563]  [<ffffffff81381967>] ? _raw_spin_unlock_bh+0x3e/0x42
Apr 19 22:53:23 loki kernel: [70023.276577]  [<ffffffff812cf09d>] __dev_close+0x73/0x86
Apr 19 22:53:23 loki kernel: [70023.276586]  [<ffffffff812cce3b>] __dev_change_flags+0xb9/0x13c
Apr 19 22:53:23 loki kernel: [70023.276595]  [<ffffffff812cef3d>] dev_change_flags+0x1c/0x52
Apr 19 22:53:23 loki kernel: [70023.276606]  [<ffffffff81318df1>] devinet_ioctl+0x2b1/0x564
Apr 19 22:53:23 loki kernel: [70023.276616]  [<ffffffff812d0ca6>] ? dev_ioctl+0x5f9/0x65c
Apr 19 22:53:23 loki kernel: [70023.276627]  [<ffffffff8131a8d6>] inet_ioctl+0x8a/0xa2
Apr 19 22:53:23 loki kernel: [70023.276637]  [<ffffffff812bc569>] sock_do_ioctl+0x24/0x43
Apr 19 22:53:23 loki kernel: [70023.276646]  [<ffffffff812bc9a8>] sock_ioctl+0x20e/0x21d
Apr 19 22:53:23 loki kernel: [70023.276658]  [<ffffffff8106605a>] ? sched_clock_cpu+0xc3/0xce
Apr 19 22:53:23 loki kernel: [70023.276670]  [<ffffffff8110502a>] vfs_ioctl+0x2d/0xa1
Apr 19 22:53:23 loki kernel: [70023.276680]  [<ffffffff811055a0>] do_vfs_ioctl+0x48b/0x4d1
Apr 19 22:53:23 loki kernel: [70023.276690]  [<ffffffff81064877>] ? up_read+0x1e/0x37
Apr 19 22:53:23 loki kernel: [70023.276699]  [<ffffffff81105637>] sys_ioctl+0x51/0x74
Apr 19 22:53:23 loki kernel: [70023.276709]  [<ffffffff8106605a>] ? sched_clock_cpu+0xc3/0xce
Apr 19 22:53:23 loki kernel: [70023.276721]  [<ffffffff81008b82>] system_call_fastpath+0x16/0x1b
Apr 19 22:53:23 loki kernel: [70023.276729] ---[ end trace 4151dc69780539f3 ]---
Comment 21 Reinette Chatre 2010-04-19 20:41:22 UTC
(In reply to comment #20)
> Caught it on 2.6.34-rc4
> 
> Apr 19 22:53:23 loki kernel: [70023.276109] ------------[ cut here
> ]------------
> Apr 19 22:53:23 loki kernel: [70023.276157] WARNING: at
> net/mac80211/work.c:812
> ieee80211_work_work+0x7f/0xdee [mac80211]()
> Apr 19 22:53:23 loki kernel: [70023.276164] Hardware name: 64608SG
> Apr 19 22:53:23 loki kernel: [70023.276169] work scheduled while going to
> suspend

It is not clear if this is the same issue since the errors do not contain the memory allocation failure that triggered the problem in your initial report. Even so, the patch proposed in this bug report and tested by you already is not yet in an rc release. It can be found in linux-2.6 so it should be in rc5. You thus ran without the patch addressing this issue.
Comment 22 Alex Zhavnerchik 2010-04-19 20:51:12 UTC
Created attachment 26054 [details]
dmesg 2.6.34-rc4

Here is dmesg after resuming and reloading iwlagn module
Comment 23 Reinette Chatre 2010-04-19 20:56:48 UTC
(In reply to comment #22)
> Created an attachment (id=26054) [details]
> dmesg 2.6.34-rc4
> 
> Here is dmesg after resuming and reloading iwlagn module

Yes - this appears to be the same problem since trouble starts with an allocation failure in iwl_tx_queue_init() ... so please apply the fix on top of 2.6.34-rc4 or run with the latest from linux-2.6 ... or wait for 2.6.34-rc5.
Comment 24 Rafael J. Wysocki 2010-05-04 21:22:22 UTC
Fixed by commit de0f60ea94e132c858caa64a44b2012e1e8580b0 .