Bug 15551
Summary: | WARNING: at net/mac80211/work.c:811 ieee80211_work_work+0x7f/0xde8 [mac80211]() | ||
---|---|---|---|
Product: | Networking | Reporter: | Alex Zhavnerchik (alex.vizor) |
Component: | Wireless | Assignee: | Rafael J. Wysocki (rjw) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | linville, reinette.chatre, rjw, yi.zhu |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.34-rc1 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 7216, 15310 | ||
Attachments: |
dmesg
fix the problem from the iwlwifi side fix the problem from the iwlwifi side (rebased against Linus' tip) Kernel log (2.6.34-rc3) dmesg 2.6.34-rc4 |
[22456.252166] s2ram: page allocation failure. order:4, mode:0x4010 [22456.252172] Pid: 23558, comm: s2ram Not tainted 2.6.34-rc1 #1 [22456.252175] Call Trace: [22456.252187] [<ffffffff810b8dc9>] __alloc_pages_nodemask+0x565/0x60c [22456.252195] [<ffffffff810de879>] alloc_pages_current+0x90/0x99 [22456.252201] [<ffffffff810b7a9e>] __get_free_pages+0x9/0x46 [22456.252221] [<ffffffffa03b4bdd>] iwl_tx_queue_init+0xf2/0x2a1 [iwlcore] [22456.252227] [<ffffffffa03b4f12>] iwl_txq_ctx_reset+0x186/0x20c [iwlcore] [22456.252233] [<ffffffffa03ae41c>] iwl_hw_nic_init+0x124/0x139 [iwlcore] That looks like the initial part of the failure. Just curious, are you using SLAB or SLUB? I have CONFIG_SLUB=y in kernel config I have CONFIG_SLUB=y in kernel config, should I try SLAB? If you don't mind, it might be helpful to know if that avoids the problem. The Intel drivers have had several "run-ins" w/ SLUB... :-( I'll try it with 2.6.34-rc2 when it out On Tue, 16 Mar 2010 22:03:53 GMT bugzilla-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=15551 This'll be a post-2.6.33 regression. > Summary: wifi doesn't work after resume from s2ram on ThinkPad > T61 > Product: Networking > Version: 2.5 > Kernel Version: 2.6.34-rc1 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Wireless > AssignedTo: networking_wireless@kernel-bugs.osdl.org > ReportedBy: alex.vizor@gmail.com > Regression: Yes > > > Created an attachment (id=25561) > --> (http://bugzilla.kernel.org/attachment.cgi?id=25561) > dmesg > > Hi. I caught this after several resumes from s2ram, from s2disk it works ok. > > I attached my dmesg. Please inform me if I can do the report more useful. > This: : [22456.252166] s2ram: page allocation failure. order:4, mode:0x4010 : [22456.252172] Pid: 23558, comm: s2ram Not tainted 2.6.34-rc1 #1 : [22456.252175] Call Trace: : [22456.252187] [<ffffffff810b8dc9>] __alloc_pages_nodemask+0x565/0x60c : [22456.252195] [<ffffffff810de879>] alloc_pages_current+0x90/0x99 : [22456.252201] [<ffffffff810b7a9e>] __get_free_pages+0x9/0x46 : [22456.252221] [<ffffffffa03b4bdd>] iwl_tx_queue_init+0xf2/0x2a1 [iwlcore] : [22456.252227] [<ffffffffa03b4f12>] iwl_txq_ctx_reset+0x186/0x20c [iwlcore] : [22456.252233] [<ffffffffa03ae41c>] iwl_hw_nic_init+0x124/0x139 [iwlcore] Rafael, this is fallout from 452aa6999e6703ffbddd7f6ea124d3968915f3e3 ("mm/pm: force GFP_NOIO during suspend/hibernation and resume"). Wireless was previously doing this stupidly-large allocation with GFP_KERNEL. But I think the above change is what flipped it to __GFP_WAIT, which caused the page allocator to go into lame-and-sucky mode, so the allocation failed. (btw, that was a bug - 452aa6999e6703ffbddd7f6ea124d3968915f3e3 should have cleared the __GFP_WAIT bit too). On Monday 22 March 2010, Andrew Morton wrote: > On Tue, 16 Mar 2010 22:03:53 GMT > bugzilla-daemon@bugzilla.kernel.org wrote: > > > http://bugzilla.kernel.org/show_bug.cgi?id=15551 > > This'll be a post-2.6.33 regression. > > > Summary: wifi doesn't work after resume from s2ram on ThinkPad > > T61 > > Product: Networking > > Version: 2.5 > > Kernel Version: 2.6.34-rc1 > > Platform: All > > OS/Version: Linux > > Tree: Mainline > > Status: NEW > > Severity: normal > > Priority: P1 > > Component: Wireless > > AssignedTo: networking_wireless@kernel-bugs.osdl.org > > ReportedBy: alex.vizor@gmail.com > > Regression: Yes > > > > > > Created an attachment (id=25561) > > --> (http://bugzilla.kernel.org/attachment.cgi?id=25561) > > dmesg > > > > Hi. I caught this after several resumes from s2ram, from s2disk it works > ok. > > > > I attached my dmesg. Please inform me if I can do the report more useful. > > > > This: > > : [22456.252166] s2ram: page allocation failure. order:4, mode:0x4010 > : [22456.252172] Pid: 23558, comm: s2ram Not tainted 2.6.34-rc1 #1 > : [22456.252175] Call Trace: > : [22456.252187] [<ffffffff810b8dc9>] __alloc_pages_nodemask+0x565/0x60c > : [22456.252195] [<ffffffff810de879>] alloc_pages_current+0x90/0x99 > : [22456.252201] [<ffffffff810b7a9e>] __get_free_pages+0x9/0x46 > : [22456.252221] [<ffffffffa03b4bdd>] iwl_tx_queue_init+0xf2/0x2a1 [iwlcore] > : [22456.252227] [<ffffffffa03b4f12>] iwl_txq_ctx_reset+0x186/0x20c > [iwlcore] > : [22456.252233] [<ffffffffa03ae41c>] iwl_hw_nic_init+0x124/0x139 [iwlcore] > > Rafael, this is fallout from 452aa6999e6703ffbddd7f6ea124d3968915f3e3 > ("mm/pm: force GFP_NOIO during suspend/hibernation and resume"). > Wireless was previously doing this stupidly-large allocation with > GFP_KERNEL. But I think the above change is what flipped it to > __GFP_WAIT, which caused the page allocator to go into lame-and-sucky > mode, so the allocation failed. Well, technically the problem is in the driver. It shouldn't do anything like this during suspend/resume. I'll try to figure out how to fix this at minimal cost. > (btw, that was a bug - 452aa6999e6703ffbddd7f6ea124d3968915f3e3 should > have cleared the __GFP_WAIT bit too). Hmm, should it? That would make all allocations GFP_ATOMIC and is this what we really want? Rafael On Monday 22 March 2010, Andrew Morton wrote: > On Tue, 16 Mar 2010 22:03:53 GMT > bugzilla-daemon@bugzilla.kernel.org wrote: > > > http://bugzilla.kernel.org/show_bug.cgi?id=15551 > > This'll be a post-2.6.33 regression. > > > Summary: wifi doesn't work after resume from s2ram on ThinkPad > > T61 > > Product: Networking > > Version: 2.5 > > Kernel Version: 2.6.34-rc1 > > Platform: All > > OS/Version: Linux > > Tree: Mainline > > Status: NEW > > Severity: normal > > Priority: P1 > > Component: Wireless > > AssignedTo: networking_wireless@kernel-bugs.osdl.org > > ReportedBy: alex.vizor@gmail.com > > Regression: Yes > > > > > > Created an attachment (id=25561) > > --> (http://bugzilla.kernel.org/attachment.cgi?id=25561) > > dmesg > > > > Hi. I caught this after several resumes from s2ram, from s2disk it works > ok. > > > > I attached my dmesg. Please inform me if I can do the report more useful. > > > > This: > > : [22456.252166] s2ram: page allocation failure. order:4, mode:0x4010 > : [22456.252172] Pid: 23558, comm: s2ram Not tainted 2.6.34-rc1 #1 > : [22456.252175] Call Trace: > : [22456.252187] [<ffffffff810b8dc9>] __alloc_pages_nodemask+0x565/0x60c > : [22456.252195] [<ffffffff810de879>] alloc_pages_current+0x90/0x99 > : [22456.252201] [<ffffffff810b7a9e>] __get_free_pages+0x9/0x46 > : [22456.252221] [<ffffffffa03b4bdd>] iwl_tx_queue_init+0xf2/0x2a1 [iwlcore] > : [22456.252227] [<ffffffffa03b4f12>] iwl_txq_ctx_reset+0x186/0x20c > [iwlcore] > : [22456.252233] [<ffffffffa03ae41c>] iwl_hw_nic_init+0x124/0x139 [iwlcore] > > Rafael, this is fallout from 452aa6999e6703ffbddd7f6ea124d3968915f3e3 > ("mm/pm: force GFP_NOIO during suspend/hibernation and resume"). > Wireless was previously doing this stupidly-large allocation with > GFP_KERNEL. But I think the above change is what flipped it to > __GFP_WAIT, which caused the page allocator to go into lame-and-sucky > mode, so the allocation failed. Well, technically the problem is in the driver. It shouldn't do anything like this during suspend/resume. I'll try to figure out how to fix this at minimal cost. > (btw, that was a bug - 452aa6999e6703ffbddd7f6ea124d3968915f3e3 should > have cleared the __GFP_WAIT bit too). Hmm, should it? That would make all allocations GFP_ATOMIC and is this what we really want? Rafael Created attachment 25651 [details]
fix the problem from the iwlwifi side
Thanks! Alex, can you test if the patch in comment #9 helps, please? Mmmm. It failed when I tried to apply this patch Forgot to mention I tried it on 2.6.34-rc2 Do you mean the patch didn't apply? I tried to apply it like: patch -p1<../iwl.patch On Wed, Mar 24, 2010 at 12:21 AM, <bugzilla-daemon@bugzilla.kernel.org>wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=15551 > > > > > > --- Comment #13 from Rafael J. Wysocki <rjw@sisk.pl> 2010-03-23 22:21:12 > --- > Do you mean the patch didn't apply? > > -- > Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You reported the bug. > Created attachment 25668 [details]
fix the problem from the iwlwifi side (rebased against Linus' tip)
The previous patch was against iwlwifi git tip. This one was against Linus' tip.
Looks like it works with this patch. I did s2ram several times and wifi works good after resume Created attachment 25916 [details]
Kernel log (2.6.34-rc3)
It is happening on 2.6.34-rc3 without patch proposed in this bug report
The patch proposed in comment #15 and verified to work in comment #16 can now be found in linux-2.6. Can this bug be closed? Yeah now it works well with 2.6.34-rc4 Caught it on 2.6.34-rc4 Apr 19 22:53:23 loki kernel: [70023.276109] ------------[ cut here ]------------ Apr 19 22:53:23 loki kernel: [70023.276157] WARNING: at net/mac80211/work.c:812 ieee80211_work_work+0x7f/0xdee [mac80211]() Apr 19 22:53:23 loki kernel: [70023.276164] Hardware name: 64608SG Apr 19 22:53:23 loki kernel: [70023.276169] work scheduled while going to suspend Apr 19 22:53:23 loki kernel: [70023.276174] Modules linked in: usbhid hid cryptd aes_x86_64 aes_generic vboxnetadp vboxnetflt autofs4 acpi_cpufreq cpufr eq_conservative cpufreq_userspace cpufreq_powersave cpufreq_stats uinput microcode fuse vboxdrv firewire_sbp2 loop snd_hda_codec_analog snd_hda_intel sn d_hda_codec arc4 snd_hwdep ecb snd_pcm_oss snd_mixer_oss snd_pcm thinkpad_acpi snd_seq_midi snd_rawmidi iwlagn snd_seq_midi_event iwlcore pcmcia snd_seq mac80211 snd_timer yenta_socket joydev snd_seq_device rsrc_nonstatic tpm_tis cfg80211 snd psmouse video intel_agp i2c_i801 tpm soundcore pcmcia_core rf kill snd_page_alloc battery tpm_bios wmi serio_raw ac output nvram processor evdev pcspkr ext4 mbcache jbd2 crc16 sd_mod crc_t10dif ide_cd_mod cdrom ata _generic ata_piix ahci sdhci_pci uhci_hcd sdhci libata firewire_ohci piix ehci_hcd firewire_core crc_itu_t mmc_core led_class scsi_mod usbcore e1000e nl s_base ide_core thermal thermal_sys [last unloaded: scsi_wait_scan] Apr 19 22:53:23 loki kernel: [70023.276397] Pid: 25371, comm: ifconfig Tainted: G W 2.6.34-rc4 #2 Apr 19 22:53:23 loki kernel: [70023.276403] Call Trace: Apr 19 22:53:23 loki kernel: [70023.276423] [<ffffffff81045f2d>] warn_slowpath_common+0x77/0x8f Apr 19 22:53:23 loki kernel: [70023.276434] [<ffffffff81045f92>] warn_slowpath_fmt+0x3c/0x3e Apr 19 22:53:23 loki kernel: [70023.276464] [<ffffffffa0335166>] ieee80211_work_work+0x7f/0xdee [mac80211] Apr 19 22:53:23 loki kernel: [70023.276475] [<ffffffff810710fc>] ? put_lock_stats+0x1c/0x27 Apr 19 22:53:23 loki kernel: [70023.276484] [<ffffffff810711c5>] ? lock_release_holdtime+0xbe/0xc3 Apr 19 22:53:23 loki kernel: [70023.276512] [<ffffffffa0335f46>] ieee80211_work_purge+0x71/0xbe [mac80211] Apr 19 22:53:23 loki kernel: [70023.276540] [<ffffffffa0336a53>] ieee80211_stop+0x4f/0x535 [mac80211] Apr 19 22:53:23 loki kernel: [70023.276552] [<ffffffff8104bb0f>] ? local_bh_enable_ip+0x9/0xb Apr 19 22:53:23 loki kernel: [70023.276563] [<ffffffff81381967>] ? _raw_spin_unlock_bh+0x3e/0x42 Apr 19 22:53:23 loki kernel: [70023.276577] [<ffffffff812cf09d>] __dev_close+0x73/0x86 Apr 19 22:53:23 loki kernel: [70023.276586] [<ffffffff812cce3b>] __dev_change_flags+0xb9/0x13c Apr 19 22:53:23 loki kernel: [70023.276595] [<ffffffff812cef3d>] dev_change_flags+0x1c/0x52 Apr 19 22:53:23 loki kernel: [70023.276606] [<ffffffff81318df1>] devinet_ioctl+0x2b1/0x564 Apr 19 22:53:23 loki kernel: [70023.276616] [<ffffffff812d0ca6>] ? dev_ioctl+0x5f9/0x65c Apr 19 22:53:23 loki kernel: [70023.276627] [<ffffffff8131a8d6>] inet_ioctl+0x8a/0xa2 Apr 19 22:53:23 loki kernel: [70023.276637] [<ffffffff812bc569>] sock_do_ioctl+0x24/0x43 Apr 19 22:53:23 loki kernel: [70023.276646] [<ffffffff812bc9a8>] sock_ioctl+0x20e/0x21d Apr 19 22:53:23 loki kernel: [70023.276658] [<ffffffff8106605a>] ? sched_clock_cpu+0xc3/0xce Apr 19 22:53:23 loki kernel: [70023.276670] [<ffffffff8110502a>] vfs_ioctl+0x2d/0xa1 Apr 19 22:53:23 loki kernel: [70023.276680] [<ffffffff811055a0>] do_vfs_ioctl+0x48b/0x4d1 Apr 19 22:53:23 loki kernel: [70023.276690] [<ffffffff81064877>] ? up_read+0x1e/0x37 Apr 19 22:53:23 loki kernel: [70023.276699] [<ffffffff81105637>] sys_ioctl+0x51/0x74 Apr 19 22:53:23 loki kernel: [70023.276709] [<ffffffff8106605a>] ? sched_clock_cpu+0xc3/0xce Apr 19 22:53:23 loki kernel: [70023.276721] [<ffffffff81008b82>] system_call_fastpath+0x16/0x1b Apr 19 22:53:23 loki kernel: [70023.276729] ---[ end trace 4151dc69780539f3 ]--- (In reply to comment #20) > Caught it on 2.6.34-rc4 > > Apr 19 22:53:23 loki kernel: [70023.276109] ------------[ cut here > ]------------ > Apr 19 22:53:23 loki kernel: [70023.276157] WARNING: at > net/mac80211/work.c:812 > ieee80211_work_work+0x7f/0xdee [mac80211]() > Apr 19 22:53:23 loki kernel: [70023.276164] Hardware name: 64608SG > Apr 19 22:53:23 loki kernel: [70023.276169] work scheduled while going to > suspend It is not clear if this is the same issue since the errors do not contain the memory allocation failure that triggered the problem in your initial report. Even so, the patch proposed in this bug report and tested by you already is not yet in an rc release. It can be found in linux-2.6 so it should be in rc5. You thus ran without the patch addressing this issue. Created attachment 26054 [details]
dmesg 2.6.34-rc4
Here is dmesg after resuming and reloading iwlagn module
(In reply to comment #22) > Created an attachment (id=26054) [details] > dmesg 2.6.34-rc4 > > Here is dmesg after resuming and reloading iwlagn module Yes - this appears to be the same problem since trouble starts with an allocation failure in iwl_tx_queue_init() ... so please apply the fix on top of 2.6.34-rc4 or run with the latest from linux-2.6 ... or wait for 2.6.34-rc5. Fixed by commit de0f60ea94e132c858caa64a44b2012e1e8580b0 . |
Created attachment 25561 [details] dmesg Hi. I caught this after several resumes from s2ram, from s2disk it works ok. I attached my dmesg. Please inform me if I can do the report more useful.