Bug 93041
| Summary: | iwl3945: Stuck queue causes crash | | |
|---|---|---|---|
| Product: | Networking | Reporter: | Edward Hart (edward.dan.hart) |
| Component: | Wireless | Assignee: | networking_wireless (networking_wireless) |
| Status: | RESOLVED DOCUMENTED | | |
| Severity: | normal | CC: | brianh, dikiy_evrej, edward.dan.hart, mark_k, nathan.collins, scruffythinking, stf_xl, szg00000, tomi.kyostila |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 3.16.0-031600-generic | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | Relevant portion of dmesg; Output of lspci -v.; Dmesg output with a backtraces. | | |
Description
Edward Hart
2015-02-10 16:49:25 UTC
Created attachment 166381
Relevant portion of dmesg
Firstly, I am not sure what causes this bug (sorry). It appears to occur randomly, roughly every two weeks.
However, the same things happen every time:
1. a queue becomes stuck
2. a hardware restart is requested
3. iwl3945 fails to set up bootstrap microcode
4. several call traces are output to dmesg
5. wireless connection is lost and the laptop has to be rebooted to reconnect.
Created attachment 166391
Output of lspci -v.
(In reply to Edward Hart from comment #1)
> Firstly, I am not sure what causes this bug (sorry).
To clarify: I do not know what causes the stuck queue that is part of this bug.

This bug had stopped occurring until I upgraded to Ubuntu 15.04, and now it's happening with the same regularity as before.

Does anyone have any idea how I can get more info on this or debug it?

Some other people (myself included) are having this problem too: https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1408963

(In reply to Mark from comment #5)
> Some other people (myself included) are having this problem too:
> https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1408963
Thanks for the link.

As reported on that launchpad page, the bug has not gone away since the update to Ubuntu 16.10. It may have gotten worse -- people who hadn't experienced the bug in a version or two, and at least one who never had a problem, are now dealing with it. (As am I.)

Same problem here (with kernel 2.6* there was no problem, until I upgraded to Mint 18 :D). dmesg attached.

I tried to downgrade the firmware to the old one: iwlwifi-3945-1.ucode. It seems to be a workaround. I haven't seen the bug for more than 6 hours now.

Created attachment 244271
Dmesg output with a backtraces.
Not sure if this will help, but I had this issue for about 18 months on Ubuntu Unity and Ubuntu Gnome, with multiple wifi drops a day requiring a reboot. I learned that Mate manages the network stack differently, so I installed Ubuntu Mate 16.10 over Ubuntu Gnome 16.10, and have had only 1 crash in over a week -- and I don't think that was related to the same issue (a number of things crashed at once).
So that may help someone narrow down the issue -- whatever is happening in Unity and Gnome 3 to crash iwl3945, it's not happening on Mate.

Downgrading the firmware really is a workaround: for 5 days there have not been any problems with wifi.

The celebration was a bit premature :) Yesterday the queue got stuck a few times. Still, it happens much, much less often than it did with the current firmware iwl-3945-2.ucode.

(In reply to J from comment #10)
> Not sure if this will help, but I had this issue for about 18 months on
> Ubuntu Unity and Ubuntu Gnome, multiple wifi drops a day requiring a reboot.
> Learned that Mate manages the network stack differently, so I installed
> Ubuntu Mate 16.10 over Ubuntu Gnome 16.10, and have had only 1 crash in over
> a week -- and I don't think that was related to the same issue (a number of
> things crashed at once).
>
> So that may help someone narrow down the issue -- whatever is happening in
> Unity and Gnome 3 to crash iwl3945, it's not happening on Mate.
Scratch that -- iwl3945 still crashes on MATE and requires a restart, just not as often. In my case, it occurs about every three days on MATE, as opposed to every three hours on Unity or Gnome 3. I hear that going back to iwl-3945-1.ucode fixes the problem for some.

Downgrading to 3.13.* makes things much better...

I'm getting this same problem on an HP/Compaq 6710b running Linux Mint 18.1. I've tried updating to the latest kernel build I could find, 4.9.11 (amd64) from kernel.ubuntu.org, but it still happens.

> iwl3945 0000:03:00.0: BSM uCode verification failed at addr 0x00003800+0 (of 900), is 0xa5a5a5a2, s/b 0xf802020
This means that we cannot communicate with the hardware via the PCI bus. It's hard to tell what causes that: it can be a hardware problem, an iwl3945 firmware problem, a kernel PCI sublayer problem, or a problem in the iwl3945/mac80211/cfg80211 drivers.
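To check how often this failure shows up in a saved log, the fields of the message quoted above can be pulled out with sed. This is only a minimal sketch: the function name parse_bsm_failures is made up for illustration, and the pattern is modelled on the single line quoted in this report, so other variants of the message may not match.

```shell
#!/bin/sh
# Sketch only: parse_bsm_failures is a hypothetical helper name, and the
# pattern is based on the one BSM message quoted in this bug report.

# Read kernel log text on stdin and print "address actual expected"
# for every "BSM uCode verification failed" line found.
parse_bsm_failures() {
    sed -n 's|.*BSM uCode verification failed at addr \(0x[0-9a-fA-F]*\)+[0-9]* (of [0-9]*), is \(0x[0-9a-fA-F]*\), s/b \(0x[0-9a-fA-F]*\).*|\1 \2 \3|p'
}
```

It could be fed from the kernel log, for example `dmesg | parse_bsm_failures`, or from one of the dmesg attachments saved to a file.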
(In reply to Ilya from comment #14)
> downgrade to 3.13.* makes the things much better...
These are the changes in the iwl3945 drivers since 3.13:

    git log v3.13..HEAD --no-merges --oneline -- drivers/net/wireless/iwlegacy drivers/net/wireless/intel/iwlegacy/

    ae3cf47 iwlegacy: make il3945_mac_ops __ro_after_init
    6bdf1e0 Makefile: drop -D__CHECK_ENDIAN__ from cflags
    1dc8079 iwlegacy: constify local structures
    4c73195 iwlegacy: use IS_ENABLED() instead of checking for built-in or module
    7947d3e mac80211: Add support for beacon report radio measurement
    2cce76c iwlegacy: avoid warning about missing braces
    57fbcce cfg80211: remove enum ieee80211_band
    84d17a2 iwl4965: Fix more memory leaks in __il4965_up()
    c2fd344 iwl4965: Fix a memory leak in error handling code of __il4965_up
    fe9b479 iwl4965: Fix a null pointer dereference in il_tx_queue_free and il_cmd_queue_free
    fb9693f iwlegacy: Return directly if allocation fails in il_eeprom_init()
    50ea05e mac80211: pass block ack session timeout to to driver
    9ec855c iwlegacy: 4965-mac: constify il_sensitivity_ranges structure
    31ced24 iwlegacy: mark il_adjust_beacon_interval as noinline
    3f0267f iwlegacy: cleanup end of il_send_add_sta()
    7ac9a36 iwlegacy: move under intel directory
    621a5f7 debugfs: Pass bool pointer to debugfs_create_bool()
    e3abc8f mac80211: allow to transmit A-MSDU within A-MPDU
    8358491 iwlegacy: convert hex_dump_to_buffer() to %*ph
    30686bf mac80211: convert HW flags to unsigned long bitmap
    df14046 mac80211: remove support for IFF_PROMISC
    93803b3 wireless: Use eth_<foo>_addr instead of memset
    ff5e568 iwlegacy: 4965-rs: Remove bogus colon after newline from debug message
    1a94ace iwl3945: Use setup_timer
    af68b87 iwl4965: Use setup_timer
    0f791eb4 mac80211: allow channel switch with multiple channel contexts
    595a23f iwl4965: fix %d confusingly prefixed with 0x in format string
    0d8614b mac80211: replace SMPS hw flags with wiphy feature bits
    9baa3c3 PCI: Remove DEFINE_PCI_DEVICE_TABLE macro use
    9f0b4cb iwlegacy: use correct structure type name in sizeof
    c56ef67 mac80211: support more than one band in scan request
    45eeeaf iwlegacy: Convert /n to \n
    77be2c5 mac80211: add vif to flush call
    997bc71 iwl4965: disable 8K A-MSDU by default
    dbdac2b iwlegacy: properly enable power saving
    8e67427 iwlegacy: merge reclaim check
    59f0118 iwl3945: fix wakeup interrupt
    cc01f9b mac80211: remove module handling from rate control ops
    631ad70 mac80211: make rate control ops const
    c8bf40a wireless: delete non-required instances of include <linux/init.h>
    ccbac29 iwlegacy: use ether_addr_equal_64bits
    c8aa5ab drivers: net: Mark functions as static in debug.c
    0e06b09 drivers: net: Mark functions as static in 4965-debug.c
    03a71e0 drivers: net: Mark functions as static in 3945-debug.c
    5f5deff iwl3945: do not print RFKILL message
    a2f73b6 cfg80211: move regulatory flags to their own variable
    8fe02e1 cfg80211: consolidate passive-scan and no-ibss flags

Most of them are cosmetic patches or adjustments to mac80211/cfg80211 interface changes; I do not see anything suspicious there. If this is a regression from 3.13, most likely it was caused by some other kernel change, for example in the PCI sublayer. If the bug is somehow reproducible it could eventually be bisected with git-bisect, but this can be hard, especially if the bug is not easy to reproduce.

Actually, this commit is suspicious:

    dbdac2b iwlegacy: properly enable power saving

However, power save should not be used. Check its state with:

    iw dev wlan0 get power_save

and if it is enabled, disable it with:

    iw dev wlan0 set power_save off

and check whether that helps with the problem.

Yes, my laptop shows power save is on after a fresh boot. I'll turn it off and leave it running for a while to see if that change fixes it.

So far so good. Nearly 8 hours uptime with no failure. Looks like power saving is the likely culprit. Before, it was lucky to last 1 hour.

(In reply to Brian Havard from comment #20)
> So far so good. Nearly 8 hours uptime with no failure. Looks like power
> saving is the likely culprit. Before it was lucky to last 1 hour.
Any news? Are there problems after this workaround? If not, how can we fix it in code?

This is a firmware problem, which will not be fixed. Power save was allowed in the "dbdac2b iwlegacy: properly enable power saving" commit to let users save power (for some people PS does not crash the firmware). What can eventually be done is to add a warning when PS is enabled, as it seems some users enable PS and do not connect it with the problems.

(In reply to Stanislaw Gruszka from comment #22)
> What can eventually be done is add warning when PS is enabled, as seems same
> users enable PS and do not relate it with the problems.
Or distributions enable it without any user intervention, instead of staying with a sane default.

After forcing power save off (using an if-up.d script) I've had no more problems with the wifi.

A patch adding the warning was posted: http://marc.info/?l=linux-wireless&m=149483519701866&w=2
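For anyone who wants to try the if-up.d workaround mentioned above, a hook along these lines might do it. This is a sketch under assumptions, not the script the commenter actually used: the function names are hypothetical, it assumes the Debian/Ubuntu ifupdown convention of exporting IFACE to hook scripts, and it assumes wireless interfaces can be recognised by a phy80211 entry under /sys/class/net. The two iw commands are the ones given earlier in this thread.

```shell
#!/bin/sh
# Sketch of an /etc/network/if-up.d/ hook that forces wifi power save off.
# Assumptions: ifupdown exports IFACE; function names are hypothetical.

# Succeed if the text in $1 (output of `iw dev <iface> get power_save`)
# reports that power save is on.
power_save_is_on() {
    case "$1" in
        *"Power save: on"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Query power save on interface $1 and switch it off if it is enabled.
# If iw is missing or the query fails, do nothing and report success so
# that bringing the interface up is not disturbed.
disable_power_save() {
    iface="$1"
    state=$(iw dev "$iface" get power_save 2>/dev/null) || return 0
    if power_save_is_on "$state"; then
        iw dev "$iface" set power_save off
    fi
}

# Only act when IFACE is set and looks like a wireless interface.
if [ -n "${IFACE:-}" ] && [ -e "/sys/class/net/$IFACE/phy80211" ]; then
    disable_power_save "$IFACE"
fi
```

The file would need to be executable, and its name should avoid dots, since run-parts skips such names by default. Whether a given distribution runs if-up.d hooks for interfaces managed by NetworkManager varies, so checking the state afterwards with iw dev wlan0 get power_save is worthwhile.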