Bug 15843 (ath5k-wakeup)

Summary: ath5k can't properly resume/reset AR5004X (minipci) after sleep
Product: Drivers
Component: network-wireless
Reporter: Tomas Mudrunka (harviecz)
Assignee: drivers_network-wireless (drivers_network-wireless)
Status: CLOSED INSUFFICIENT_DATA
Severity: blocking
Priority: P1
CC: bugzilla.kernel.org, giulio.genovese, harviecz, linville, me, mickflemm
Hardware: All
OS: Linux
Kernel Version: 2.6.33 - ArchLinux
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  grep ath /var/log/messages.log
  Use pci save/restore
  mugshot of crashed machine

Description Tomas Mudrunka 2010-04-24 14:32:17 UTC
Hello, I bought a NeWeb Wistron CM9 (Atheros AR5001X+) a few days ago and it had been working since then. Today I booted up my PC and pidgin was connected when I left to get breakfast. When I returned, I found my laptop completely frozen, so I rebooted it. After that I was not able to connect to the internet, and in everything.log I saw something like this:

...yesterday...
Apr 24 14:09:10 harvie-ntb kernel: ath5k 0000:00:0b.0: restoring config space at offset 0xf (was 0x1c0a0100, writing 0x1c0a0104)
Apr 24 14:09:10 harvie-ntb kernel: ath5k 0000:00:0b.0: restoring config space at offset 0x4 (was 0x0, writing 0xe2000000)
Apr 24 14:09:10 harvie-ntb kernel: ath5k 0000:00:0b.0: restoring config space at offset 0x3 (was 0x0, writing 0xa810)
Apr 24 14:09:10 harvie-ntb kernel: ath5k 0000:00:0b.0: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900016)
...breakfast+reboot...
Apr 24 14:36:59 harvie-ntb kernel: ath5k 0000:00:0b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Apr 24 14:36:59 harvie-ntb kernel: ath5k 0000:00:0b.0: registered as 'phy0'
Apr 24 14:36:59 harvie-ntb kernel: ath5k phy0: Invalid EEPROM checksum: 0xdb11 eep_max: 0x0340 (default size)
Apr 24 14:36:59 harvie-ntb kernel: ath5k phy0: unable to init EEPROM
Apr 24 14:36:59 harvie-ntb kernel: ath5k 0000:00:0b.0: PCI INT A disabled
Apr 24 14:36:59 harvie-ntb kernel: ath5k: probe of 0000:00:0b.0 failed with error -5
...now...

I tried another reboot, but after it I am getting messages like these:

Apr 24 14:36:59 harvie-ntb kernel: ath5k 0000:00:0b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Apr 24 14:36:59 harvie-ntb kernel: ath5k 0000:00:0b.0: registered as 'phy0'
Apr 24 14:36:59 harvie-ntb kernel: ath5k phy0: Invalid EEPROM checksum: 0xdb11 eep_max: 0x0340 (default size)
Apr 24 14:36:59 harvie-ntb kernel: ath5k phy0: unable to init EEPROM
Apr 24 14:36:59 harvie-ntb kernel: ath5k 0000:00:0b.0: PCI INT A disabled
Apr 24 14:36:59 harvie-ntb kernel: ath5k: probe of 0000:00:0b.0 failed with error -5
Apr 24 14:45:00 harvie-ntb kernel: ath5k 0000:00:0b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Apr 24 14:45:00 harvie-ntb kernel: ath5k 0000:00:0b.0: setting latency timer to 64
Apr 24 14:45:00 harvie-ntb kernel: ath5k 0000:00:0b.0: registered as 'phy1'
Apr 24 14:45:00 harvie-ntb kernel: Modules linked in: ath5k(+) nls_cp437 vfat fat ipv6 ext4 jbd2 crc16 cpufreq_ondemand cpufreq_stats cpufreq_conservative cpufreq_userspace cpufreq_powersave powernow_k8 freq_table tun snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_intel8x0m snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc pcmcia sis_agp mac80211 yenta_socket rsrc_nonstatic i2c_sis96x amd64_agp shpchp ath battery thermal ac processor pcmcia_core i2c_core agpgart pci_hotplug usb_storage cfg80211 rfkill button joydev k8temp led_class sis900 mii sr_mod cdrom psmouse evdev loop serio_raw sg pcspkr autofs4 fuse rtc_cmos rtc_core rtc_lib usbhid hid ext3 jbd mbcache ohci_hcd ehci_hcd usbcore sisfb ata_generic sd_mod pata_sis pata_acpi libata scsi_mod [last unloaded: ath5k]
Apr 24 14:45:00 harvie-ntb kernel: [<fa232d30>] ? ath5k_intr+0x0/0x2a0 [ath5k]
Apr 24 14:45:00 harvie-ntb kernel: [<fa23389a>] ath5k_pci_probe+0x2a1/0x11ff [ath5k]
Apr 24 14:45:00 harvie-ntb kernel: [<f808901b>] init_ath5k_pci+0x1b/0x33 [ath5k]
Apr 24 14:45:00 harvie-ntb kernel: [<f8089000>] ? init_ath5k_pci+0x0/0x33 [ath5k]

in lspci now i see:
00:0b.0 Ethernet controller: CastleNet Technology Inc. Device 0013 (rev 01)

I am not sure, but I guess that before this it was identifying itself as NeWeb Wistron CM9.

So I have another piece of hardware bricked by the Linux kernel (and it is my second minipci wifi card).
Comment 1 John W. Linville 2010-04-25 13:42:35 UTC
"ath5k phy0: Invalid EEPROM checksum: 0xdb11 eep_max: 0x0340 (default size)" -- surely this is the issue.

Bob & Nick -- any suggestions to help Thomas correct this?  Is it reasonably possible that ath5k corrupted the EEPROM?
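
For context on the "Invalid EEPROM checksum" message: it comes from a consistency check over the words read back from the EEPROM. Below is a minimal sketch of that kind of check, assuming the scheme is an XOR over 16-bit words that should come out to 0xffff (which is what ath5k appears to do). The helper names and the sample words are made up for illustration; the real driver reads the words from the card.

```python
from functools import reduce
from operator import xor

EXPECTED_CKSUM = 0xFFFF  # assumed target value of the XOR over all words


def eeprom_checksum(words):
    """XOR together 16-bit EEPROM words (hypothetical helper)."""
    return reduce(xor, (w & 0xFFFF for w in words), 0)


def eeprom_looks_valid(words):
    """True when the XOR over all words matches the assumed target."""
    return eeprom_checksum(words) == EXPECTED_CKSUM


# Made-up image: the last word is chosen so the XOR comes out to 0xffff.
good = [0x1234, 0xABCD, 0x0F0F]
good.append(eeprom_checksum(good) ^ EXPECTED_CKSUM)

# Same image with a single bit read back wrong.
bad = list(good)
bad[1] ^= 0x0004

print(eeprom_looks_valid(good))  # True
print(eeprom_looks_valid(bad))   # False
```

A single bit that reads back wrong is enough to fail a check like this, so a bad checksum on its own does not prove that the EEPROM contents were rewritten.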
Comment 2 Bob Copeland 2010-04-25 18:39:44 UTC
(In reply to comment #0)
> Hello,  i've bought NeWeb Wistron CM9 (Atheros AR5001X+) few days ago and it
> was working since that day, but today i booted up my pc and i saw that pidgin
> was connected when i was leaving to get a breakfast, when i returned, i found
> my laptop completely freezed up, so i've rebooted it and after that i was not

This is quite unfortunate...

> in lspci now i see:
> 00:0b.0 Ethernet controller: CastleNet Technology Inc. Device 0013 (rev 01)
> 
> i am not sure, but i guess that before it was identifying itself as NeWeb
> Wistron CM9

Can you show lspci -vnn?

> so i have another piece of hardware bricked by Linux kernel (and second
> minipci
> wifi card).

What was the first bricked hardware?  In the same system?

Hmm, this seems to be a common problem on these cards, with madwifi as well although this is the first I heard of it:

http://www.mobilnews.cz/blog/?p=36

(In reply to comment #1)
> Bob & Nick -- any suggestions to help Thomas correct this?  Is it reasonably
> possible that ath5k corrupted the EEPROM?

Well, I guess we can reload an EEPROM image with ath_info, it's a bit dangerous though.

Whether the driver can write the eeprom depends on the version of hardware - (5001X is a 5212+?)  on 5210 (which works for virtually no one anyway) any write to the eeprom space basically works; for 5211+ you have to set the proper bit in an address register first, and we never do that in the driver.

The pages are mapped writable in mmio space, so some errant kernel bug can conceivably set the bit and then scribble in it.  ath_info manages to change the eeprom by mmap on /dev/mem as root.
Comment 3 Tomas Mudrunka 2010-04-25 19:25:23 UTC
> lspci -vnn
---CUT----

00:0b.0 Ethernet controller [0200]: CastleNet Technology Inc. Device [1688:0013] (rev 01)
	Subsystem: Wistron NeWeb Corp. Device [185f:1012]
	Flags: bus master, fast Back2Back, medium devsel, latency 0
	Memory at e2010000 (32-bit, non-prefetchable) [size=64K]
	Memory at 88020000 (64-bit, non-prefetchable) [size=64K]
	Memory at <unassigned> (64-bit, non-prefetchable)
	Memory at <invalid-64bit-slot> (64-bit, non-prefetchable)
	Capabilities: [44] MSI: Enable- Count=16/2 Maskable+ 64bit+

---CUT----

NOTE: in fact it's NOT a "CastleNet Technology Inc. Device". There was a different ID when the EEPROM was OK, and instead of "Memory at e2010000" there was "Memory at e2000000" or something similar...

> Hmm, this seems to be a common problem on these cards, with madwifi as well
> although this is the first I heard of it:
> http://www.mobilnews.cz/blog/?p=36

I know that page, but I am not able to build such obsolete drivers even with the oldest kernel I have available :-( and it's quite impractical to do this every few hours just to get my internet connection back (and sometimes the kernel freezes up totally during the ath5k problems, so I can't get any log of what's going on).

> Well, I guess we can reload an EEPROM image with ath_info, it's a bit
> dangerous though.

It would be cool if we had a way to restore the card...
BTW the EEPROM contains some information about the MAC address and fine performance tuning which is different for each card, so this should be saved.

Isn't there some way to lock the EEPROM memory of this card so it is read-only and nothing is able to brick it?

I found another strange thing... when I left my laptop powered off (not suspended) for a few hours, the EEPROM was back, but after a few minutes of operation it was gone again and lspci -vvv was again showing "CastleNet Technology Inc. Device", so I am a bit confused about what is happening. The logs are also a bit different each time. I will try to leave my computer powered off again when I have some time to do so. BTW I have the same problem with the 2.6.27 kernel + ath5k driver.


> What was the first bricked hardware?  In the same system?

I bricked my Broadcom 4318 (b43 driver), but that was caused by my own stupidity: I accidentally called two rmmods and two modprobes of b43 at almost the same time in my script...

Right now I am using a few years old PCMCIA D-Link DWL-G650 (wifi b/g) with the ath5k driver and everything works as it should (even with madwifi).

The Wistron CM9 is a bit newer (and supports wifi a/b/g). It should work with madwifi, but there are some issues. It's very popular and it's said to be the "best linux minipci wifi card"; it's widely used in embedded Linux (OpenWRT) routers and Mikrotik routerboards. The chipset used is AR5001X+, which contains a few different Atheros chips...
Comment 4 Tomas Mudrunka 2010-04-25 22:36:53 UTC
I left the computer off for ~30 minutes and the miracle happened again:

...
00:0b.0 Ethernet controller [0200]: Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter [168c:0013] (rev 01)
	Subsystem: Wistron NeWeb Corp. CM9 Wireless a/b/g MiniPCI Adapter [185f:1012]
	Flags: bus master, medium devsel, latency 168, IRQ 17
	Memory at e2000000 (32-bit, non-prefetchable) [size=64K]
	Capabilities: [44] Power Management version 2
	Kernel driver in use: ath5k
	Kernel modules: ath_pci, ath5k
...


OT: BTW when I use the rfkill button to disable wifi and then enable it again, I need to mess around for a while to get network manager to use the card again (reloading the module was not enough...)

1 ;( root@harvie-ntb ~ # rmmod ath5k
0 ;) root@harvie-ntb ~ # modprobe ath5k
0 ;) root@harvie-ntb ~ # ifconfig wlan0 up
SIOCSIFFLAGS: Unknown error 132
255 ;( root@harvie-ntb ~ # iwconfig wlan0 txpower auto
0 ;) root@harvie-ntb ~ # ifconfig wlan0 up

but it's unrelated.


It seems that the EEPROM can be fixed by leaving the computer turned off for a while (1 minute is not enough), but it's not great to have the kernel crashing with ath5k and to have to turn the computer off for several minutes when I want to do something. (The first time I experienced this issue was during a programming lecture when I needed to write code, which of course means using Google.) Anyway, I am posting this update using the affected Wistron CM9 card.
Comment 5 Tomas Mudrunka 2010-04-25 22:45:59 UTC
This is what i see now when card and driver are properly loaded:

ath5k 0000:00:0b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ath5k 0000:00:0b.0: registered as 'phy1'
ath: EEPROM regdomain: 0x0
ath: EEPROM indicates default country code should be used
ath: doing EEPROM country->regdmn map search
ath: country maps to regdmn code: 0x3a
ath: Country alpha2 being used: US
ath: Regpair used: 0x3a
ath5k phy1: Atheros AR5213A chip found (MAC: 0x59, PHY: 0x43)
ath5k phy1: RF5112B multiband radio found (0x36)



Here are a few errors which I have seen in the past days (in many different combinations; they are not sorted chronologically!):

kernel: ath5k phy3: failed to warm reset the MAC Chip
kernel: ath5k phy3: failed to resume the MAC Chip
kernel: ath5k phy3: failed to wakeup the MAC Chip
kernel: ath5k phy3: can't reset hardware (-5)
kernel: ath5k phy0: POST Failed !!!
kernel: ath5k: probe of 0000:00:0b.0 failed with error -5
kernel: ath5k: probe of 0000:00:0b.0 failed with error -11
kernel: ath5k phy3: noise floor calibration failed (2427MHz)
kernel: ath5k phy3: noise floor calibration failed (2412MHz)
kernel: ath5k phy3: ath5k_chan_set: unable to reset channel (2412 Mhz)
kernel: ath5k phy3: ath5k_chan_set: unable to reset channel (2417 Mhz)
kernel: ath5k phy3: ath5k_chan_set: unable to reset channel (2422 Mhz)

hope this will help
Comment 6 Tomas Mudrunka 2010-04-26 10:20:07 UTC
Another fail just now!

Note that the card looks OK now, but I guess that after the next reboot it will look like a corrupted card...
lspci -vvvnn:

00:0b.0 Ethernet controller [0200]: Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter [168c:0013] (rev 01)
	Subsystem: Wistron NeWeb Corp. CM9 Wireless a/b/g MiniPCI Adapter [185f:1012]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B+ DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR+ INTx-
	Latency: 96 (2500ns min, 7000ns max), Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at e2000000 (32-bit, non-prefetchable) [size=64K]
	Capabilities: [44] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-
	Kernel modules: ath_pci, ath5k


dmesg | grep ath:

ath5k 0000:00:0b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ath5k 0000:00:0b.0: registered as 'phy0'
ath: EEPROM regdomain: 0x0
ath: EEPROM indicates default country code should be used
ath: doing EEPROM country->regdmn map search
ath: country maps to regdmn code: 0x3a
ath: Country alpha2 being used: US
ath: Regpair used: 0x3a
ath5k phy0: Atheros AR5213A chip found (MAC: 0x59, PHY: 0x43)
ath5k phy0: RF5112B multiband radio found (0x36)
ath5k 0000:00:0b.0: PCI INT A disabled
ath5k 0000:00:0b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ath5k 0000:00:0b.0: registered as 'phy0'
ath: EEPROM regdomain: 0x0
ath: EEPROM indicates default country code should be used
ath: doing EEPROM country->regdmn map search
ath: country maps to regdmn code: 0x3a
ath: Country alpha2 being used: US
ath: Regpair used: 0x3a
ath5k phy0: Atheros AR5213A chip found (MAC: 0x59, PHY: 0x43)
ath5k phy0: RF5112B multiband radio found (0x36)
ath5k 0000:00:0b.0: PCI INT A disabled
ath5k 0000:00:0b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ath5k 0000:00:0b.0: registered as 'phy1'
ath: EEPROM regdomain: 0x0
ath: EEPROM indicates default country code should be used
ath: doing EEPROM country->regdmn map search
ath: country maps to regdmn code: 0x3a
ath: Country alpha2 being used: US
ath: Regpair used: 0x3a

ath5k phy1: Atheros AR5213A chip found (MAC: 0x59, PHY: 0x43)
ath5k phy1: RF5112B multiband radio found (0x36)
WARNING: at drivers/net/wireless/ath/ath5k/base.c:1166 ath5k_tasklet_rx+0x52c/0x560 [ath5k]()
Modules linked in: ath5k led_class mac80211 ath cfg80211 rfkill ipv6 ext4 jbd2 crc16 cpufreq_ondemand cpufreq_stats cpufreq_conservative cpufreq_userspace cpufreq_powersave snd_seq_dummy arc4 ecb powernow_k8 freq_table tun snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device joydev snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm snd_timer thermal ac button snd pcmcia soundcore sis_agp battery snd_page_alloc processor ac97_bus usbhid yenta_socket amd64_agp i2c_sis96x sr_mod rsrc_nonstatic shpchp hid psmouse agpgart cdrom i2c_core sis900 mii sg pcmcia_core pci_hotplug k8temp loop serio_raw pcspkr evdev autofs4 tpm_tis tpm tpm_bios fuse rtc_cmos rtc_core rtc_lib ext3 jbd mbcache ohci_hcd ehci_hcd usbcore sisfb ata_generic sd_mod pata_sis pata_acpi libata scsi_mod [last unloaded: led_class]
 [<c104314d>] warn_slowpath_common+0x6d/0xa0
 [<fa66ecfc>] ? ath5k_tasklet_rx+0x52c/0x560 [ath5k]
 [<fa66ecfc>] ? ath5k_tasklet_rx+0x52c/0x560 [ath5k]
 [<c10431c6>] warn_slowpath_fmt+0x26/0x30
 [<fa66ecfc>] ath5k_tasklet_rx+0x52c/0x560 [ath5k]
ath5k phy1: failed to warm reset the MAC Chip
ath5k phy1: can't reset hardware (-5)
ath5k 0000:00:0b.0: restoring config space at offset 0xf (was 0x1c0a0100, writing 0x1c0a0104)
ath5k 0000:00:0b.0: restoring config space at offset 0x4 (was 0x0, writing 0xe2000000)
ath5k 0000:00:0b.0: restoring config space at offset 0x3 (was 0x0, writing 0xa810)
ath5k 0000:00:0b.0: restoring config space at offset 0x1 (was 0x2900000, writing 0x82900016)
ath5k 0000:00:0b.0: restoring config space at offset 0xf (was 0x1c0a0100, writing 0x1c0a0104)
ath5k 0000:00:0b.0: restoring config space at offset 0x4 (was 0x0, writing 0xe2000000)
ath5k 0000:00:0b.0: restoring config space at offset 0x3 (was 0x0, writing 0xa810)
ath5k 0000:00:0b.0: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900016)

WARNING: at drivers/net/wireless/ath/ath5k/base.c:1166 ath5k_tasklet_rx+0x52c/0x560 [ath5k]()
Modules linked in: ath5k led_class mac80211 ath cfg80211 rfkill ipv6 ext4 jbd2 crc16 cpufreq_ondemand cpufreq_stats cpufreq_conservative cpufreq_userspace cpufreq_powersave snd_seq_dummy arc4 ecb powernow_k8 freq_table tun snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device joydev snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm snd_timer thermal ac button snd pcmcia soundcore sis_agp battery snd_page_alloc processor ac97_bus usbhid yenta_socket amd64_agp i2c_sis96x sr_mod rsrc_nonstatic shpchp hid psmouse agpgart cdrom i2c_core sis900 mii sg pcmcia_core pci_hotplug k8temp loop serio_raw pcspkr evdev autofs4 tpm_tis tpm tpm_bios fuse rtc_cmos rtc_core rtc_lib ext3 jbd mbcache ohci_hcd ehci_hcd usbcore sisfb ata_generic sd_mod pata_sis pata_acpi libata scsi_mod [last unloaded: led_class]
 [<c104314d>] warn_slowpath_common+0x6d/0xa0
 [<fa66ecfc>] ? ath5k_tasklet_rx+0x52c/0x560 [ath5k]
 [<fa66ecfc>] ? ath5k_tasklet_rx+0x52c/0x560 [ath5k]
 [<c10431c6>] warn_slowpath_fmt+0x26/0x30
 [<fa66ecfc>] ath5k_tasklet_rx+0x52c/0x560 [ath5k]
 [<fa667b78>] ? ath5k_hw_calibration_poll+0x18/0x70 [ath5k]
ath5k phy1: failed to warm reset the MAC Chip
ath5k phy1: can't reset hardware (-5)
ath5k 0000:00:0b.0: PCI INT A disabled
ath5k 0000:00:0b.0: enabling device (0604 -> 0606)
ath5k 0000:00:0b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ath5k 0000:00:0b.0: setting latency timer to 64
ath5k 0000:00:0b.0: registered as 'phy2'
ath5k phy2: failed to wakeup the MAC Chip
ath5k 0000:00:0b.0: PCI INT A disabled
ath5k: probe of 0000:00:0b.0 failed with error -5



0 ;) root@harvie-ntb ~ # rmmod ath5k
Killed (SIGKILL)
137 ;( root@harvie-ntb ~ # rmmod ath5k
ERROR: Removing 'ath5k': Device or resource busy
Comment 7 Tomas Mudrunka 2010-04-26 10:54:16 UTC
I left the PC turned off for 15 minutes and it didn't help:

ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)



BTW I noticed that my problem happens when I wake the PC from a suspended state (suspend to RAM using pm-utils and the basic kernel sleep driver in my case). Maybe there's some problem with waking up... but in some cases I was using wifi for a few minutes after wakeup before I got the error message (and lost the link).

Waking up seems to cause messages like this:
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to resume the MAC Chip
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: failed to warm reset the MAC Chip
Comment 8 Tomas Mudrunka 2010-04-26 23:49:17 UTC
The kernel crashed again... the card suddenly started reconnecting several times (with the help of networkmanager) and after a few of these:

ath5k phy1: failed to warm reset the MAC Chip
ath5k phy1: can't reset hardware (-5)

the kernel finally crashed.

I think I'll rather try madwifi for a few days (even though the card has very poor sensitivity with madwifi) because I can't do kernel debugging while studying for my exams... I hope madwifi will do the job for a while :(


I've noticed a few errors I hadn't seen before:

ath5k_hw_get_isr: ISR: 0x00000000 IMR: 0x800824b5
WARNING: at drivers/net/wireless/ath/ath5k/base.c:1166 ath5k_tasklet_rx+0x52c/0x560 [ath5k]()
[<f812bb78>] ? ath5k_hw_calibration_poll+0x18/0x70 [ath5k]
ath5k phy43: unsupported jumbo


(I will attach the whole log, filtered with grep ath)
Comment 9 Tomas Mudrunka 2010-04-26 23:51:55 UTC
Created attachment 26156 [details]
grep ath /var/log/messages.log

added some more logs... hope it will help ;o)
Comment 11 Tomas Mudrunka 2010-04-27 14:47:22 UTC
i've tried:
echo 1 > /sys/module/ath5k/drivers/pci\:ath5k/0000\:00\:0b.0/reset

and I was getting an endless loop of this error in dmesg:
ath5k 0000:00:0b.0: restoring config space at offset 0x1 (was 0x2900400, writing 0x2900016)

When I unloaded ath5k, the echo returned. When I tried to load it back, the kernel got stuck.
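
For anyone reading these "restoring config space" lines: the "offset" in the message appears to be a 32-bit dword index into PCI configuration space, not a byte offset (offset 0x4 being written with 0xe2000000 matches BAR0 at byte 0x10). Here is a rough sketch of a decoder under that assumption, using the standard PCI type 0 header layout. The helper names are made up for illustration and only the registers that show up in this report are listed.

```python
import re

# Standard PCI type 0 header: dword index -> fields stored in that dword.
DWORD_FIELDS = {
    0x0: "Vendor ID / Device ID",
    0x1: "Command / Status",
    0x3: "Cache Line Size / Latency Timer / Header Type / BIST",
    0x4: "BAR0",
    0xF: "Interrupt Line / Interrupt Pin / Min_Gnt / Max_Lat",
}

# A few bits of the Command register (low 16 bits of dword 1).
COMMAND_BITS = {
    0: "I/O Space",
    1: "Memory Space",
    2: "Bus Master",
    4: "Memory Write and Invalidate",
    10: "Interrupt Disable",
}

LINE_RE = re.compile(
    r"restoring config space at offset (0x[0-9a-f]+) "
    r"\(was (0x[0-9a-f]+), writing (0x[0-9a-f]+)\)"
)


def decode(line):
    """Parse one log line; returns None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    index, was, writing = (int(g, 16) for g in m.groups())
    return {
        "byte_offset": index * 4,
        "field": DWORD_FIELDS.get(index, "unknown"),
        "was": was,
        "writing": writing,
    }


def command_flags(dword):
    """Names of the Command register bits set in dword 1."""
    command = dword & 0xFFFF
    return [name for bit, name in COMMAND_BITS.items() if command & (1 << bit)]


line = ("ath5k 0000:00:0b.0: restoring config space at offset 0x1 "
        "(was 0x2900400, writing 0x2900016)")
info = decode(line)
print(info["field"])                   # Command / Status
print(command_flags(info["was"]))      # ['Interrupt Disable']
print(command_flags(info["writing"]))  # ['Memory Space', 'Bus Master', 'Memory Write and Invalidate']
```

If that reading is right, every pass of the loop finds memory decoding and bus mastering switched off again and tries to turn them back on, which would suggest the write is not sticking.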
Comment 12 Tomas Mudrunka 2010-04-28 23:25:05 UTC
Anyway... I've been using the Wistron CM9 happily with ath5k (btw madwifi has the same issue) for some time and everything works just fine as long as I am not trying to suspend my PC to RAM (which puts the card into powersaving mode), because for some reason the driver is not able to reset the card properly after wakeup... I believe this issue was not so obvious because many people are using the Wistron CM9 on Linux routers, where there is no need to hibernate the system or enter some kind of powersaving mode...

Maybe the problem is in the ath driver, which is AFAIK used by both ath5k and madwifi.

BTW unloading the module before sleep and reloading it after wakeup does not help. Is there some way to protect the card from being powersaved (just as a workaround until the driver is fixed)?
Comment 13 Tomas Mudrunka 2010-05-03 16:20:42 UTC
I've just found that the NeWeb Wistron CM9 card has the AR5004X chipset (NOT AR5001X+), so there is also some problem with detection in lspci:
http://www.atheros.com/pt/bulletins/AR5004XBulletin.pdf (Wistron CM9)
http://www.atheros.com/pt/bulletins/AR5001X+Bulletin.pdf (seems to be older chipset used in popular Orinoco cards)

But my Wistron CM9 shows this:
00:0b.0 Ethernet controller: Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter (rev 01)

but according to dmesg it seems that the individual chips of this chipset are detected properly...


BTW I've now had small similar problems (unable to reset chip, disconnects and a kernel crash) with wifi after a reboot (I have not been using sleep since I started using this card), but after an immediate reboot everything started to work again...
Comment 14 Bob Copeland 2010-05-03 22:03:24 UTC
madwifi does not use the ath driver; only ath5k, ath9k, etc. do.  ath5k does have a lot of the same code as madwifi.

lspci information comes directly from PCI configuration space -- the driver just reads it and reports it as a string, there's no detection there, so not to worry about seeing AR5001X.  Those numbers are largely meaningless anyway.

The reason I asked for lspci -vnn before was to see if there were bit flips in the pci config memory, and there are:

When the device misreports as CastleNet xxx, pci id is: 1688:0013.  When wireless is working, you get: 168c:0013.  Note that one bit (0x4) in the real value of the vendor ID got lost due to suspend to ram.  That, and a lot more of the configuration space than normal got lost.  So something bad is happening when power drops due to suspend; some values in the pci memory disappear.

I'm not sure what to debug next in this setup, but it's not the first case I've heard of vendor ID changing, there were a couple of other reports in recent kernels... ath5k didn't change a lot in that cycle so it could be platform related.  Does it happen in 2.6.{31,32} by chance?
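
To make the bit-flip comparison concrete, here is a small sketch that works out which bits differ between the two vendor IDs that lspci printed earlier in this report (168c when the card works, 1688 when it shows up as CastleNet). The function name is just for illustration.

```python
def flipped_bits(expected, observed):
    """Return the bit positions where two register values differ."""
    diff = expected ^ observed
    return [bit for bit in range(diff.bit_length()) if diff & (1 << bit)]


GOOD_VENDOR = 0x168C  # Atheros, as lspci shows when the card works
BAD_VENDOR = 0x1688   # what lspci shows after the failure

print(flipped_bits(GOOD_VENDOR, BAD_VENDOR))  # [2]
print(hex(GOOD_VENDOR & ~BAD_VENDOR))         # 0x4, a 1 that read back as 0
```

The same helper can be pointed at any other pair of "good" and "bad" config space values from the logs to see whether bits are only being dropped or also being set.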
Comment 15 Bob Copeland 2010-05-03 22:18:22 UTC
BTW here's a couple of patches to try reverting.

commit 0d0cd72fa1e6bfd419c99478ec70b4877ed0ef86
Author: Bob Copeland <me@bobcopeland.com>
Date:   Sat Jul 4 12:59:54 2009 -0400

    ath5k: do not release irq across suspend/resume
    
    Paraphrasing Rafael J. Wysocki: "drivers should not release PCI IRQs
    in suspend."  Doing so causes a warning during suspend/resume on some
    platforms.
    
    Cc: Rafael J. Wysocki <rjw@sisk.pl>
    Reported-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
    Signed-off-by: Bob Copeland <me@bobcopeland.com>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>

commit baee1f3caa5a771880144358dd07d32e09ba4dcf
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Mon Oct 5 00:52:09 2009 +0200

    Wireless / ath5k: Simplify suspend and resume callbacks
    
    Simplify the suspend and resume callbacks of ath5k by converting the
    driver to struct dev_pm_ops and allowing the PCI PM core to do the
    PCI-specific suspend/resume handling.
    
    Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
Comment 16 Tomas Mudrunka 2010-05-03 23:35:21 UTC
Bob: Are you sure this really matters? I've tried unloading the module before suspend and reloading it after wakeup and it didn't help... (I think the module should properly resume the card when it's loaded.) And BTW both patches were merged after 2.6.30, but I had problems even with 2.6.27-LTS...

BTW now I will try madwifi-hal for some time to see if anything gets better...

So far I have found ath5k usable in 99% of cases when I do not use sleep (which is very uncomfortable for me).
Comment 17 Tomas Mudrunka 2010-05-04 07:42:29 UTC
I've tried the MadWifi-HAL version and the same problem is there... maybe there's something wrong with suspend in the kernel or with something not directly related to ath5k...


PM: resume of devices complete after 2407.394 msecs
PM: Finishing wakeup.
Restarting tasks ... done.
ath_pci 0000:00:0b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
MadWifi: ath_attach: Switching rfkill capability off.
wifi0: Atheros AR5213A chip found (MAC 5.9, PHY 2112 4.3, Radio 3.6)
ath_pci: wifi0: Atheros 5212: mem=0xe2000000, irq=17
ath0: no IPv6 routers present
wifi0: ath_rxorn_tasklet: Receive FIFO overrun; resetting.
wifi0: ath_reset: Unable to reset hardware: 'Hardware didn't respond as expected' (HAL status 3)
wifi0: FAILED verification of AR5K_PHY_AGCSIZE_DESIRED default value [found=0x2 (2) expected=0xde (-34)].
       AR5K_PHY_AGCSIZE:0x9850:0x302d3130:..11.... ..1.11.1 ..11...1 ..11....:unknown
wifi0: FAILED verification of AR5K_PHY_AGCCOARSE_LO default value [found=0xae (-82) expected=0xcc (-52)].
     AR5K_PHY_AGCCOARSE:0x985c:0x72695700:.111..1. .11.1..1 .1.1.111 ........:unknown
wifi0: FAILED verification of AR5K_PHY_AGCCOARSE_HI default value [found=0x52 (-46) expected=0x6e (-18)].
     AR5K_PHY_AGCCOARSE:0x985c:0x72695700:.111..1. .11.1..1 .1.1.111 ........:unknown
wifi0: FAILED verification of AR5K_PHY_SIG_FIRPWR default value [found=0xc (12) expected=0xba (-70)].
           AR5K_PHY_SIG:0x9858:0x30303030:..11.... ..11.... ..11.... ..11....:unknown
wifi0: FAILED verification of AR5K_PHY_WEAK_OFDM_LOW_M1 default value [found=0x49 (73) expected=0x32 (50)].
              (unknown):0x986c:0x65726566:.11..1.1 .111..1. .11..1.1 .11..11.:unknown
wifi0: FAILED verification of AR5K_PHY_WEAK_OFDM_LOW_M2 default value [found=0x2b (43) expected=0x28 (40)].
              (unknown):0x986c:0x65726566:.11..1.1 .111..1. .11..1.1 .11..11.:unknown
wifi0: FAILED verification of AR5K_PHY_WEAK_OFDM_LOW_M2_COUNT default value [found=0x25 (37) expected=0x30 (48)].
              (unknown):0x986c:0x65726566:.11..1.1 .111..1. .11..1.1 .11..11.:unknown
wifi0: FAILED verification of AR5K_PHY_WEAK_OFDM_LOW_SELFCOR default value [found=0x0 (0) expected=0x1 (1)].
              (unknown):0x986c:0x65726566:.11..1.1 .111..1. .11..1.1 .11..11.:unknown
wifi0: FAILED verification of AR5K_PHY_WEAK_OFDM_HIGH_M1 default value [found=0x29 (41) expected=0x4d (77)].
              (unknown):0x9868:0x6552204e:.11..1.1 .1.1..1. ..1..... .1..111.:unknown
wifi0: FAILED verification of AR5K_PHY_WEAK_OFDM_HIGH_M2 default value [found=0x65 (101) expected=0x40 (64)].
              (unknown):0x9868:0x6552204e:.11..1.1 .1.1..1. ..1..... .1..111.:unknown
wifi0: FAILED verification of AR5K_PHY_WEAK_OFDM_HIGH_M2_COUNT default value [found=0xe (14) expected=0x10 (16)].
              (unknown):0x9868:0x6552204e:.11..1.1 .1.1..1. ..1..... .1..111.:unknown
wifi0: FAILED verification of AR5K_PHY_WEAK_CCK_THRESH default value [found=0x2 (2) expected=0x8 (8)].
              (unknown):0xa208:0x04001202:.....1.. ........ ...1..1. ......1.:unknown
wifi0: FAILED verification of AR5K_PHY_SIG_FIRSTEP default value [found=0x3 (3) expected=0x0 (0)].
           AR5K_PHY_SIG:0x9858:0x30303030:..11.... ..11.... ..11.... ..11....:unknown
wifi0: FAILED verification of AR5K_PHY_SPUR_THRESH default value [found=0x3 (3) expected=0x2 (2)].
          AR5K_PHY_SPUR:0x9924:0x00000106:........ ........ .......1 .....11.:unknown
wifi0: ath_fatal_tasklet: Hardware error; resetting.
wifi0: ath_chan_set: Unable to reset channel 1 (2412 MHz) flags 0xc0 'Hardware didn't respond as expected' (HAL status 3)
Comment 18 Tomas Mudrunka 2010-05-09 14:35:37 UTC
Any ideas what this can mean?

------------[ cut here ]------------
WARNING: at drivers/net/wireless/ath/ath5k/base.c:1166 ath5k_tasklet_rx+0x565/0x590 [ath5k]()
Hardware name: Aspire 3000     
invalid hw_rix: 0
Modules linked in: ipv6 arc4 ecb cpufreq_ondemand cpufreq_stats cpufreq_conservative cpufreq_powersave powernow_k8 freq_table vboxnetadp vboxnetflt vboxdrv tun snd_seq_dummy ath5k snd_seq_oss snd_seq_midi_event mac80211 ath snd_seq snd_seq_device cfg80211 rfkill led_class snd_intel8x0 snd_intel8x0m joydev snd_ac97_codec sis_agp pcmcia snd_pcm snd_timer snd soundcore battery snd_page_alloc amd64_agp ac i2c_sis96x ac97_bus sr_mod agpgart i2c_core cdrom yenta_socket rsrc_nonstatic processor thermal button pcmcia_core psmouse usbhid hid pcspkr sg loop shpchp pci_hotplug sis900 mii k8temp serio_raw evdev autofs4 tpm_tis fuse tpm tpm_bios rtc_cmos rtc_core rtc_lib ext4 mbcache jbd2 crc16 ohci_hcd ehci_hcd usbcore sisfb ata_generic sd_mod pata_sis pata_acpi libata scsi_mod
Pid: 3235, comm: sensors-applet Tainted: G        W  2.6.33-ARCH #1
Call Trace:
 [<c1043b4d>] warn_slowpath_common+0x6d/0xa0
 [<fa45a3d5>] ? ath5k_tasklet_rx+0x565/0x590 [ath5k]
 [<fa45a3d5>] ? ath5k_tasklet_rx+0x565/0x590 [ath5k]
 [<c1043bc6>] warn_slowpath_fmt+0x26/0x30
 [<fa45a3d5>] ath5k_tasklet_rx+0x565/0x590 [ath5k]
 [<c1049a38>] tasklet_action+0x58/0xc0
 [<c104a4bd>] __do_softirq+0x8d/0x1d0
 [<c101dec6>] ? irq_complete_move+0x16/0x20
 [<c101e7af>] ? ack_apic_level+0x5f/0x1f0
 [<c104a63d>] do_softirq+0x3d/0x50
 [<c104a9fd>] irq_exit+0x6d/0x70
 [<c1005b00>] do_IRQ+0x50/0xc0
 [<c10f4d9d>] ? sys_write+0x3d/0x70
 [<c1003cb0>] common_interrupt+0x30/0x38
---[ end trace 1b3facddd5b36d8e ]---



ath5k_hw_get_isr: ISR: 0x00000000 IMR: 0x800824b5
ath5k phy0: failed to warm reset the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k_hw_get_isr: ISR: 0x00000000 IMR: 0x800824b5



------------[ cut here ]------------
WARNING: at drivers/net/wireless/ath/ath5k/base.c:1166 ath5k_tasklet_rx+0x565/0x590 [ath5k]()
Hardware name: Aspire 3000     
invalid hw_rix: 0
Modules linked in: ipv6 arc4 ecb cpufreq_ondemand cpufreq_stats cpufreq_conservative cpufreq_powersave powernow_k8 freq_table vboxnetadp vboxnetflt vboxdrv tun snd_seq_dummy ath5k snd_seq_oss snd_seq_midi_event mac80211 ath snd_seq snd_seq_device cfg80211 rfkill led_class snd_intel8x0 snd_intel8x0m joydev snd_ac97_codec sis_agp pcmcia snd_pcm snd_timer snd soundcore battery snd_page_alloc amd64_agp ac i2c_sis96x ac97_bus sr_mod agpgart i2c_core cdrom yenta_socket rsrc_nonstatic processor thermal button pcmcia_core psmouse usbhid hid pcspkr sg loop shpchp pci_hotplug sis900 mii k8temp serio_raw evdev autofs4 tpm_tis fuse tpm tpm_bios rtc_cmos rtc_core rtc_lib ext4 mbcache jbd2 crc16 ohci_hcd ehci_hcd usbcore sisfb ata_generic sd_mod pata_sis pata_acpi libata scsi_mod
Pid: 3098, comm: perl Tainted: G        W  2.6.33-ARCH #1
Call Trace:
 [<c1043b4d>] warn_slowpath_common+0x6d/0xa0
 [<fa45a3d5>] ? ath5k_tasklet_rx+0x565/0x590 [ath5k]
 [<fa45a3d5>] ? ath5k_tasklet_rx+0x565/0x590 [ath5k]
 [<c1043bc6>] warn_slowpath_fmt+0x26/0x30
 [<fa45a3d5>] ath5k_tasklet_rx+0x565/0x590 [ath5k]
 [<c1049a38>] tasklet_action+0x58/0xc0
 [<c104a4bd>] __do_softirq+0x8d/0x1d0
 [<c101dec6>] ? irq_complete_move+0x16/0x20
 [<c101e7af>] ? ack_apic_level+0x5f/0x1f0
 [<c104a510>] ? __do_softirq+0xe0/0x1d0
 [<c104a63d>] do_softirq+0x3d/0x50
 [<c104a9fd>] irq_exit+0x6d/0x70
 [<c1005b00>] do_IRQ+0x50/0xc0
 [<c1003cb0>] common_interrupt+0x30/0x38
---[ end trace 1b3facddd5b36d8f ]---
Comment 19 John W. Linville 2010-05-13 15:17:27 UTC
FWIW, I think those warnings are a separate issue from the resume-related failures...
Comment 20 Bob Copeland 2010-05-14 03:16:30 UTC
We need to just drop that warning, it's a nuisance.  (Yes, I added it, I suck.)

It really seems like there's something wrong with the hardware (either platform or card), or the order in which we are saving/restoring PCI data.  You bought this device new?
Comment 21 Tomas Mudrunka 2010-05-14 10:14:29 UTC
Bob: anyway... i guess that removing the warning will not fix the suspend/resume issue :-)
Comment 22 Bob Copeland 2010-05-14 12:05:40 UTC
Yeah the warning just happens due to deferred rx processing after channel changes, and is harmless.  We can try pci_save/restore_state in the suspend/resume handlers to ensure config space is saved before the platform handlers run - that's my best guess on what to try next.  I'll spin a patch this weekend.
Comment 23 John W. Linville 2010-06-09 17:27:33 UTC
Hey, Bob -- any word on that patch? :-)
Comment 24 Tomas Mudrunka 2010-06-10 17:26:40 UTC
And what if the problem is in the SiS chipset (minipci bus) or its driver?
Comment 25 Bob Copeland 2010-06-10 17:38:17 UTC
Created attachment 26714 [details]
Use pci save/restore

Heh, oops slipped my mind.  Ok try this one.

If the problem is not in the wireless driver, then I guess it needs to be reassigned to whomever handles the platform or bus driver, I don't know who that would be.
Comment 26 Tomas Mudrunka 2010-06-10 20:17:30 UTC
I've googled for pci save restore and i've found a few interesting things:
http://software.itags.org/linux-unix/141535/
especially these lines:

1. Current PCI save/restore routines only cover first 64 bytes
2. No PCI bridge driver currently.
3. Some special devices can't or are difficult to save/restore config
space with current model. Such as PCI link device, it's a sysdev, but
its resume code can't be invoked with irq disabled.
4. ACPI possibly changes special devices' config space, such as host
bridge or LPC bridge. The special devices generally are vender specific,
and possibly will not have a driver forever.

i can say this:
1.) wifi cards probably use more than 64 bytes of config space (lots of options)
2.) i have a pci2pci bridge in my laptop:
[22:15:05] 0 ;) harvie@harvie-ntb Shared $ lspci | grep -i pci
00:01.0 PCI bridge: Silicon Integrated Systems [SiS] SG86C202
...


seems that there can be some workaround (or even fix) for this... can i at least save/restore the whole config space manually somehow?
Comment 27 Tomas Mudrunka 2010-06-10 20:20:02 UTC
Oh i didn't notice the attachment :-)
Comment 28 Tomas Mudrunka 2010-06-10 20:32:09 UTC
I've found this already in LTS kernel 2.6.32:
( src/linux-2.6.32/drivers/net/wireless/ath/ath5k/base.c )
pci_save_state(pdev); pci_restore_state(pdev);

look:

static int
ath5k_pci_suspend(struct pci_dev *pdev, pm_message_t state)
{
	struct ieee80211_hw *hw = pci_get_drvdata(pdev);
	struct ath5k_softc *sc = hw->priv;

	ath5k_led_off(sc);

	pci_save_state(pdev);
	pci_disable_device(pdev);
	pci_set_power_state(pdev, PCI_D3hot);

	return 0;
}

static int
ath5k_pci_resume(struct pci_dev *pdev)
{
	struct ieee80211_hw *hw = pci_get_drvdata(pdev);
	struct ath5k_softc *sc = hw->priv;
	int err;

	pci_restore_state(pdev);

	err = pci_enable_device(pdev);
	if (err)
		return err;

	/*
	 * Suspend/Resume resets the PCI configuration space, so we have to
	 * re-disable the RETRY_TIMEOUT register (0x41) to keep
	 * PCI Tx retries from interfering with C3 CPU state
	 */
	pci_write_config_byte(pdev, 0x41, 0);

	ath5k_led_enable(sc);
	return 0;
}

i'll try it... or is pdev something different from to_pci_dev(dev)?
Comment 29 Bob Copeland 2010-06-11 03:09:50 UTC
Yeah we don't need to save the whole mmio space though -- mac80211 saves the wifi state and we reprogram the card after resume.  We need to preserve enough to recognize the card though.

pdev is not different from to_pci_dev().  The code from LTS is from an earlier kernel (one of the patches I suggested reverting removed it).
Comment 30 Tomas Mudrunka 2010-06-11 10:22:43 UTC
Bob: yeah, anyway the LTS kernel causes trouble too:

...
ath5k phy0: noise floor calibration failed (2417MHz)
__ratelimit: 12 callbacks suppressed
ath5k phy0: noise floor calibration timeout (2417MHz)
ath5k phy0: noise floor calibration failed (2417MHz)
ath5k phy0: noise floor calibration failed (2417MHz)
ath5k phy0: noise floor calibration failed (2417MHz)
ath5k phy0: noise floor calibration failed (2417MHz)
ath5k phy0: noise floor calibration failed (2417MHz)
ath5k phy0: noise floor calibration timeout (2417MHz)
ath5k phy0: noise floor calibration failed (2417MHz)
ath5k phy0: noise floor calibration failed (2417MHz)
ath5k phy0: noise floor calibration timeout (2417MHz)
__ratelimit: 42 callbacks suppressed
ath5k phy0: noise floor calibration failed (5745MHz)
ath5k phy0: noise floor calibration failed (5745MHz)
ath5k phy0: noise floor calibration failed (5745MHz)
ath5k phy0: noise floor calibration failed (5745MHz)
ath5k phy0: noise floor calibration failed (5745MHz)
ath5k phy0: noise floor calibration failed (5745MHz)
ath5k phy0: noise floor calibration failed (5745MHz)
ath5k phy0: noise floor calibration failed (5745MHz)
ath5k phy0: noise floor calibration failed (5745MHz)
ath5k phy0: noise floor calibration failed (5745MHz)
__ratelimit: 53 callbacks suppressed
ath5k phy0: failed to resume the MAC Chip
ath5k phy0: can't reset hardware (-5)
No probe response from AP 00:04:e2:fc:b8:aa after 500ms, disconnecting.
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
wlan0: direct probe to AP 00:04:e2:fc:b8:aa (try 1)
wlan0: direct probe to AP 00:04:e2:fc:b8:aa (try 2)
wlan0: direct probe to AP 00:04:e2:fc:b8:aa (try 3)
wlan0: direct probe to AP 00:04:e2:fc:b8:aa timed out
__ratelimit: 58 callbacks suppressed
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
__ratelimit: 56 callbacks suppressed
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
...
Comment 31 Bob Copeland 2010-06-11 10:56:09 UTC
Well, one problem at a time.  The LTS kernel is old and not upstream, so please try the patch against a recent vanilla kernel and let us know if it fixes the restore problem.
Comment 32 Tomas Mudrunka 2010-06-16 20:33:29 UTC
Created attachment 26816 [details]
mugshot of crashed machine

Bob: problems even with 2.6.34. i have a new screenshot of the crashed box saying HARDWARE ERROR, so it looks to be even more complicated and the only thing we can do is probably some kind of workaround...
Comment 33 John W. Linville 2010-08-13 16:02:23 UTC
*** Bug 14561 has been marked as a duplicate of this bug. ***
Comment 34 John W. Linville 2010-08-13 17:37:17 UTC
This report is very confusing...

Did you investigate the hardware problem?  Did you run 'mcelog --ascii' as the
backtrace suggested?  Did you report (or otherwise resolve) that bug?

Did you try Bob's patch from comment 25 against a recent kernel (i.e. 2.6.35 or
later)?  Can you reproduce the problem with that?
Comment 35 Tomas Mudrunka 2010-08-14 19:50:14 UTC
John: i am looking at Bob's patch and base.c and i've found that there is this piece of code:


  /*
   * Suspend/Resume resets the PCI configuration space, so we have to
   * re-disable the RETRY_TIMEOUT register (0x41) to keep
   * PCI Tx retries from interfering with C3 CPU state
   */
  pci_write_config_byte(pdev, 0x41, 0);

Maybe it's the actual problem. i've applied the patch, but i will also try to remove this line... Maybe writing to some magic 0x41 register is not a good idea with all chipsets...
Comment 36 Tomas Mudrunka 2010-08-14 20:33:32 UTC
John: 
1.)
> Did you report (or otherwise resolve) that bug?
I think that's what i am doing right now. It seems to me that it's VERY much related to ath5k.




2.)
the mcelog output doesn't seem very satisfying to me:


[22:30:02] 0 ;) root@harvie-ntb mcelog # cat /home/harvie/Desktop/mcelog.txt 
HARDWARE ERROR
CPU 0: Machine Check Exception:			4 Bank 4: b200000000070f0f
TSC 789a2e7426
PROCESSOR 2:20fc2 TIME 1276714843 SOCKET 0 APIC 0


[22:30:13] 0 ;) root@harvie-ntb mcelog # cat /home/harvie/Desktop/mcelog.txt | mcelog --ascii
CPU 0: Machine Check Exception:			4 Bank 4: b200000000070f0f
TSC 789a2e7426
CPU 0 BANK 4 TSC 789a2e7426 
TIME 1276714843 Wed Jun 16 21:00:43 2010
STATUS b200000000070f0f MCGSTATUS 4
PROCESSOR 2:20fc2 TIME 1276714843 SOCKET 0 APIC 0
Comment 37 Tomas Mudrunka 2010-08-14 20:43:35 UTC
Bit more accurate output:

# cat /home/harvie/Desktop/mcelog.txt | mcelog --cpu k8 --dmi --ascii

WARNING: with --dmi mcelog --ascii must run on the same machine with the
     same BIOS/memory configuration as where the machine check occurred.
CPU 0: Machine Check Exception:			4 Bank 4: b200000000070f0f
TSC 789a2e7426
CPU 0 4 northbridge TSC 789a2e7426 
TIME 1276714843 Wed Jun 16 21:00:43 2010
  Northbridge Watchdog error
       bit57 = processor context corrupt
       bit61 = error uncorrected
  bus error 'generic participation, request timed out
             generic error mem transaction
             generic access, level generic'
STATUS b200000000070f0f MCGSTATUS 4
PROCESSOR 2:20fc2 TIME 1276714843 SOCKET 0 APIC 0
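The two flags mcelog names above can be read straight out of the STATUS value. A minimal user-space sketch, assuming only the bit positions printed in the output (bit 57 and bit 61); the macro and function names are made up for illustration and are not taken from mcelog or the kernel:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as printed by mcelog above; the names are illustrative. */
#define MCE_STATUS_PCC (1ULL << 57) /* processor context corrupt */
#define MCE_STATUS_UC  (1ULL << 61) /* error uncorrected */

/* True if the "processor context corrupt" flag is set in a STATUS value. */
static bool mce_context_corrupt(uint64_t status)
{
	return (status & MCE_STATUS_PCC) != 0;
}

/* True if the "error uncorrected" flag is set in a STATUS value. */
static bool mce_uncorrected(uint64_t status)
{
	return (status & MCE_STATUS_UC) != 0;
}
```

Feeding it STATUS b200000000070f0f should report both flags as set, which agrees with what mcelog printed.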
Comment 38 Bob Copeland 2010-08-15 16:53:25 UTC
FWIW we tried removing the write to configuration space and it broke some stuff.  At the time Jouni from Atheros confirmed it needs to be there.  

I would still like a confirmation on whether the patch does anything.  It is not in any upstream kernel yet.
Comment 39 Tomas Mudrunka 2010-08-16 01:18:02 UTC
Bob: oh yes, i can definitely say that removing that line breaks something :-)

Right now i am building a kernel with these changes:
- your patch
- commented out lines for disabling/enabling LED during suspend (hope this will not break something)
- added CONFIG_PCIEASPM=y
- added pci_write_config_byte(pdev, 0x41, 0); to a few more PCI events (attach, detach) in the hope that i will be able to fix something by reloading the module

I'll let you know ASAP. You must understand that it takes some time before i am able to reproduce the bug, because i need to leave the computer turned off for a sufficient time, then use it for a while, then leave it suspended to RAM for a sufficient time (a few hours to a whole night), and then turn it on and use it while watching the logs...
Comment 40 Tomas Mudrunka 2010-08-16 01:22:28 UTC
BTW the behaviour of the system without pci_write_config_byte(pdev, 0x41, 0); in the resume function is very similar to the behaviour of the system with that line after a resume that triggers this bug. So we have two important questions:

1.) How can the absence of that line (in the PCI resume handler) affect a system which hasn't been suspended yet?
2.) Adding this line seems to fix the problem, but only partially. Have we forgotten something similar elsewhere? Or is it a completely different issue?
Comment 41 Tomas Mudrunka 2010-08-16 01:42:51 UTC
And why does the RETRY_TIMEOUT register (0x41) not survive the suspend/resume? Why does it not survive save/restore of PCI state? How should other registers survive? Maybe we should store all known atheros registers manually, because they are lost too...
Comment 42 Tomas Mudrunka 2010-08-16 01:44:10 UTC
Are we uploading some firmware to the device? Maybe it should be reuploaded after resume...
Comment 43 Tomas Mudrunka 2010-08-16 05:18:08 UTC
Bob, John: It seems that the patch didn't help much...

i saw this a few moments before errors started to appear in the log:
ath5k_hw_get_isr: ISR: 0x00000000 IMR: 0x800814b5
what does this mean? is it OK?



Here is the whole log:


ath5k_hw_get_isr: ISR: 0x00000000 IMR: 0x800814b5
------------[ cut here ]------------
WARNING: at drivers/net/wireless/ath/ath5k/base.c:1179 ath5k_tasklet_rx+0x5ab/0x670 [ath5k]()
Hardware name: Aspire 3000     
invalid hw_rix: 0
Modules linked in: ath5k arc4 ecb mac80211 ath cfg80211 rfkill led_class ipv6 cpufreq_userspace cpufreq_ondemand cpufreq_stats cpufreq_conservative cpufreq_powersave powernow_k8 freq_table mperf snd_seq_dummy snd_seq_oss tun snd_seq_midi_event snd_seq snd_seq_device joydev snd_intel8x0m i2c_sis96x i2c_core usbhid hid sr_mod pcmcia cdrom snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc ac battery thermal button processor amd64_agp pcspkr agpgart sg loop psmouse k8temp sis900 mii shpchp yenta_socket pci_hotplug pcmcia_rsrc pcmcia_core evdev serio_raw autofs4 fuse rtc_cmos rtc_core rtc_lib ext4 mbcache jbd2 crc16 ohci_hcd ehci_hcd usbcore sisfb ata_generic sd_mod pata_sis pata_acpi libata scsi_mod [last unloaded: ath5k]
Pid: 9031, comm: epiphany Not tainted 2.6.35-ARCH #3
Call Trace:
 [<c104373d>] warn_slowpath_common+0x6d/0xa0
 [<fa36d33b>] ? ath5k_tasklet_rx+0x5ab/0x670 [ath5k]
 [<fa36d33b>] ? ath5k_tasklet_rx+0x5ab/0x670 [ath5k]
 [<c10437ee>] warn_slowpath_fmt+0x2e/0x30
 [<fa36d33b>] ath5k_tasklet_rx+0x5ab/0x670 [ath5k]
 [<c1049778>] tasklet_action+0x58/0xc0
 [<c1049ca0>] __do_softirq+0x90/0x1e0
 [<c101fc36>] ? irq_complete_move+0x16/0x20
 [<c102027f>] ? ack_apic_level+0x5f/0x1f0
 [<c1049cf1>] ? __do_softirq+0xe1/0x1e0
 [<c1067f6f>] ? ktime_get_ts+0xff/0x130
 [<c1049e2d>] do_softirq+0x3d/0x50
 [<c104a1ed>] irq_exit+0x6d/0x70
 [<c1005a50>] do_IRQ+0x50/0xc0
 [<c1048951>] ? sys_gettimeofday+0x31/0x70
 [<c1003d30>] common_interrupt+0x30/0x38
---[ end trace 228776f33fb36589 ]---
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k_hw_get_isr: ISR: 0x00000000 IMR: 0x800814b5
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
net_ratelimit: 683 callbacks suppressed
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
No probe response from AP 00:04:e2:fc:b8:aa after 500ms, disconnecting.
cfg80211: Calling CRDA to update world regulatory domain
cfg80211: World regulatory domain updated:
    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
    (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
    (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
    (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
    (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
    (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
net_ratelimit: 258 callbacks suppressed
ath5k phy1: failed to wakeup the MAC Chip
ath5k phy1: can't reset hardware (-5)
ath5k phy1: failed to wakeup the MAC Chip
ath5k phy1: can't reset hardware (-5)
ath5k phy1: failed to wakeup the MAC Chip
ath5k phy1: can't reset hardware (-5)
ath5k phy1: failed to wakeup the MAC Chip
ath5k phy1: can't reset hardware (-5)
ath5k phy1: failed to wakeup the MAC Chip
ath5k phy1: can't reset hardware (-5)
net_ratelimit: 52 callbacks suppressed
ath5k phy1: failed to wakeup the MAC Chip
ath5k phy1: can't reset hardware (-5)
ath5k phy1: failed to wakeup the MAC Chip
ath5k phy1: can't reset hardware (-5)
ath5k phy1: failed to wakeup the MAC Chip
ath5k phy1: can't reset hardware (-5)
ath5k phy1: failed to wakeup the MAC Chip
ath5k phy1: can't reset hardware (-5)
ath5k phy1: failed to wakeup the MAC Chip
ath5k phy1: can't reset hardware (-5)
ath5k 0000:00:0b.0: PCI INT A disabled
(unloaded module to prevent crash of whole system...)
Comment 44 Tomas Mudrunka 2010-08-16 05:33:38 UTC
And another question (i hope it can lead somewhere):
After each failure i see a CastleNet NIC instead of Atheros:
00:0b.0 Ethernet controller: CastleNet Technology Inc. Device 0013 (rev 01)

Why is it always the same ID? If the memory were overwritten by some random data, it would be different each time it fails. This looks like it's almost intentionally overwritten somewhere in the code... Or maybe there is some other reason for it to act as a CastleNet NIC... what do you think?
Comment 45 John W. Linville 2010-08-16 15:22:03 UTC
So the PCI ID changes 168c:0013 -> 1688:0013...that is a single bit difference.

You've got hardware failing to reset, PCI IDs changing (by a single bit), and you are getting MCE errors...is there some reason not to suspect that you have a dying laptop?
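The single-bit claim is easy to check by XORing the two vendor IDs and counting the set bits. A small user-space sketch (pci_id_bit_diff is a hypothetical helper written for this comment, not a kernel function):

```c
#include <stdint.h>

/* Number of bit positions in which two 16-bit PCI IDs differ. */
static int pci_id_bit_diff(uint16_t a, uint16_t b)
{
	uint16_t x = a ^ b; /* set bits mark the positions that differ */
	int n = 0;

	while (x) {
		x &= x - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}
```

For 0x168c and 0x1688 the XOR is 0x0004, so exactly one bit (bit 2) is flipped.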
Comment 46 Bob Copeland 2010-08-16 17:34:01 UTC
(In reply to comment #41)
> And why does the RETRY_TIMEOUT register (0x41) not survive the
> suspend/resume?
> Why does it not survive save/restore of PCI state? How should other registers
> survive? Maybe we should store all known atheros registers manually, because
> they are lost too...

Only the first 0x40 bytes of config space (offsets 0x00-0x3f) are saved by the core; the registers above that, including 0x41, are device specific.  We do reprogram the card after resume so no need to worry about the rest.
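To make that concrete, here is a toy user-space model of what happens to register 0x41. It is only a sketch under stated assumptions: the toy_* names are invented, the 256-byte size is that of conventional PCI config space, the 0xff "reset value" is arbitrary, and none of this is the real pci_save_state()/pci_restore_state() code.

```c
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE 256  /* conventional PCI config space size */
#define CFG_SAVED_SIZE 0x40 /* the part saved by the core, per the comment above */
#define RETRY_TIMEOUT  0x41 /* device-specific register rewritten in ath5k_pci_resume */

/* Toy model of one device's config space; not a kernel structure. */
struct toy_dev {
	uint8_t cfg[CFG_SPACE_SIZE];
	uint8_t saved[CFG_SAVED_SIZE];
};

/* Models the save step: only the first 0x40 bytes are copied. */
static void toy_save_state(struct toy_dev *d)
{
	memcpy(d->saved, d->cfg, CFG_SAVED_SIZE);
}

/* Models power loss: every byte reverts to a reset value (0xff is arbitrary). */
static void toy_power_cycle(struct toy_dev *d)
{
	memset(d->cfg, 0xff, CFG_SPACE_SIZE);
}

/* Models the restore step: only the saved bytes come back. */
static void toy_restore_state(struct toy_dev *d)
{
	memcpy(d->cfg, d->saved, CFG_SAVED_SIZE);
}

/* Models the resume handler: restore, then rewrite the register by hand. */
static void toy_resume(struct toy_dev *d)
{
	toy_restore_state(d);
	d->cfg[RETRY_TIMEOUT] = 0;
}
```

In this model anything below 0x40 survives the save/power-cycle/restore sequence, while 0x41 comes back with its reset value until the resume handler writes it again, which is why the driver has that explicit pci_write_config_byte() call.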
Comment 47 Tomas Mudrunka 2010-08-18 21:48:21 UTC
> is there some reason not to suspect that you have a dying laptop?
Beside that i can't afford a new one right now? :-D and a hacker should never throw away any piece of hardware which is at least partially working :-) I've run memtest and some CPU tests without errors (well, the problem can be somewhere else). Anyway i still think that my HW is OK.
- there are also other users with similar issues with this driver and hardware.
- there are no other problems and it just WORKS when i am not using suspend
- do we have some positive feedback like "hello i am lumberjack and i feel ok. i have my lappy suspended for all night and my AR5004X wifi is working flawlessly all day with your driver for weeks without reboot." ??


> We do reprogram the card after resume so no need to worry about the rest.
Can we try to disable this? Or will it not work without it? I really think that manually saving and restoring all bytes can help (at least as a workaround). It seems to me that the card is reprogrammed incorrectly after wakeup or some bytes are not touched at all.

i can try to write it myself, but i don't even know how to allocate memory in kernelspace... i can probably use some static array to save the config space, which will work for a single card only :-) but i am not a kernel hacker and i will probably screw it all up. btw how can i get the total number of config bytes? (i don't want to overflow somewhere and kill the card definitely).
Comment 48 John W. Linville 2010-08-19 15:12:17 UTC
Can you point at "other users with similar issues with this driver and hardware"?

As for happy lumberjacks or whatever, it can be difficult to collect "everything is ok" reports -- people generally don't bother with that.
Comment 49 Tomas Mudrunka 2010-08-19 17:49:31 UTC
> Can you point at "other users with similar issues with this driver and
hardware"?
*** Bug 14561 has been marked as a duplicate of this bug. ***
And i've seen even a few more similar wakeup problems here on bugzilla...

Well nobody complains if he's happy. This card seems to be popular, but it's more commonly used in routers/APs (which are actually NEVER suspended) than in laptops...
Comment 50 Bob Copeland 2010-08-19 19:08:12 UTC
Well, I can at least say that suspend works fine here on my hardware (well, not in 2.6.36-rcX yet, but that's not ath5k's fault).

The part that reprograms the card after suspend is in the core - take a look at net/mac80211/pm.c and ieee80211_reconfig in util.c -- it really does the same thing as when you do ifconfig up (which includes a reset that reloads all the card's initvals).  We just remember enough in the driver to power up the card again.

Anyway some of these issues seem unrelated to suspend (seems like there are multiple problems).  Can you try enabling slab/slub debug on your kernel and see if anything triggers?  Also, post output of /proc/interrupts?

The IMR/ISR printk seems to indicate interrupts firing when they are supposed to be disabled.  Could just be sharing or something.
Comment 51 Tomas Mudrunka 2010-08-20 00:14:24 UTC
ok.

1.) i'll check whether something breaks after several ifconfig down/ups.

3.) i'll post /proc/interrupts from both working and failing state of the card

2.) can we write to the space where vendor/product IDs are stored? if it's not physically impossible, maybe we are destroying it accidentally.

4.) is there some way i can avoid the IMR/ISR "just sharing or something" issue?
Comment 52 John W. Linville 2010-10-04 19:35:46 UTC
(Note that the list in comment 51 is numbered out of order...)

Thomas, can you provide the information from 1 & 3?

Bob, can you answer 2 & 4?
Comment 53 Tomas Mudrunka 2010-10-04 20:10:10 UTC
2.6.35.7 (maybe 2.6.35.6 too) seems to be even worse... i think now it fails even without suspending (while older kernels are still working - at least before suspending). I don't have too much time to investigate it now, but i really think that something got worse now...

Johny: sorry, i will look at it ASAP.
Comment 54 John W. Linville 2010-11-23 18:26:24 UTC
Thomas & Bob, ping?
Comment 55 Bob Copeland 2010-11-24 02:40:42 UTC
2) Personally haven't tried it, but it should be difficult to accidentally write into the configuration space.  Take a look at ath_info.c (http://www.sfr-fresh.com/linux/misc/madwifi-0.9.4.tar.gz:a/madwifi-0.9.4/tools/ath_info.c) -- it has to set a specific bit in a command register, as well as loading data and offset registers in order to write the pci id.  And we don't (intentionally, at least) ever use the write bit within ath5k.

4) I'm afraid I don't know the apic that well; sometimes this is achieved by moving things around to different slots or changing bios settings.
Comment 56 Nick Kossifidis 2010-11-24 14:59:56 UTC
I really think there is something wrong with your PCI controller: not only do you get an MCE and changed pci ids, but also interrupts from nowhere...

ath5k_hw_get_isr: ISR: 0x00000000 IMR: 0x800824b5
ath5k phy0: failed to warm reset the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k_hw_get_isr: ISR: 0x00000000 IMR: 0x800824b5

Notice that IMR is active (has bits set). The card generates an interrupt only when ISR & IMR is non-zero, but here ISR is zero, so the card is not generating any interrupts. That's why ath5k_hw_get_isr prints this message: this shouldn't happen ! Your PCI controller messes things up: it sends interrupts that trigger our interrupt handler, and get_isr is called only to find out that the hw never sent an interrupt :P
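In code the check looks roughly like this. It's only a user-space sketch; card_raised_irq is a made-up name and this is not the real ath5k_hw_get_isr:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the card itself has a pending interrupt that is not masked out. */
static bool card_raised_irq(uint32_t isr, uint32_t imr)
{
	return (isr & imr) != 0;
}
```

With the values from the log (ISR 0x00000000, IMR 0x800824b5) the result is false, so whatever called our handler, it wasn't the card.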

BTW when you read status with lspci you don't use ath5k (if you did you wouldn't be able to read them without a driver being loaded). lspci reads PCI registers directly, and pci ids etc are mapped on fixed EEPROM locations; it's the hw that does the EEPROM read in this case, through the EEPROM controller. Also, as you saw, ath5k didn't corrupt your EEPROM, because if it did, pci ids etc wouldn't magically return to their initial values after power-off. The only corrupted EEPROMs I've seen on CM6/9 were due to extreme power failure on embedded wi-fi routers or lightning strikes (and that's because they are very popular for such setups), and those are the only cases I've ever seen ! Actually I have a CM6 with a corrupted EEPROM and it can still work (i was thinking of a "default" set of settings in case of corrupted EEPROMs but Atheros ppl said it's too complex); at least it doesn't give any failed resets, POST tests etc.

A buggy PCI controller can explain everything: if it messes up data transfers (as it did with pci ids) then we read crap when we try to reset/wakeup the card (btw you can use ath_info to check register values when your card works fine and when "it doesn't" to check this out), and it also messes up when we try to do the POST test...

kernel: ath5k phy0: POST Failed !!!

and BTW this can't be ath5k's fault because ath5k has just loaded (we run POST test on attach) !

You mentioned that:

instead of "Memory at e2010000" there was "Memory at
e2000000" or something else...

ath5k can't change its pci base address !

Finally, here is a report of another wireless driver that fails on resume on the same SiS chipset...
https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/67792


So please give me a reason why this is an ath5k bug...
Comment 57 Tomas Mudrunka 2010-11-25 23:07:23 UTC
Well, so you say it's definitely a hardware bug, OK? Or is it a software problem with the PCI controller driver?

I am not saying that it's impossible to have a hardware error in a 6-year-old Acer laptop :-D

I just think that ath5k should not freeze the whole kernel even in case of buggy hardware. Why do the PCI ids not change if i don't load ath5k? Why does the kernel not crash when i unload ath5k?

Let me explain: if the module knows there is a strange hardware error and the system is going to crash unless the module is unloaded, why doesn't ath5k disable/suspend or completely unload itself to prevent a crash which can cause data loss and similar problems?
Comment 58 Nick Kossifidis 2010-11-26 01:50:29 UTC
PCI ids don't change if ath5k is not loaded because card isn't working, you can't have suspend failure if there is nothing to suspend or corruption when card is inactive.

Ath5k is responsible for ar5k cards, it's not responsible for the whole system, it can't know eg. if your host PCI controller is failing because it doesn't know anything about it, it just tries to talk to the card and assumes that the PCI subsystem is doing its job. It can detect failures on card operation but a reset fixes most of them, we haven't seen so far a card failure so hard that we need to unload the module or else system hangs.

From what I've seen so far there is a problem with the PCI subsystem in your case. It might be host's pci controller/driver or card's pci controller. I can't guarantee about hw failure on your card's pci controller, although I've never seen such failure -or report- and I have a CM9 right here in front of me that works with ath5k just fine (actually I've been using that card for development for 4 years and was also working on my outdoor wireless router for a while, I've done everything on this card :P, unfortunately suspend/resume doesn't work on my old laptop with or without the card so I can't test it). However I'm sure we handle pci stuff on our part correctly on sw and you can see that for yourself since for your Dlink DWL-G650 you run exactly the same code as with CM9, only phy changes from RF5112 to RF2112 which is actually the same phy without 5GHz support, there is no difference on mac/pci code (pcmcia cards are also pci cards to us, bridge control is handled by another driver, eg yenta-socket) !

Also what do you mean when you say "ath5k crashes" ? Except the MCE you gave us (that indicates a bus error btw and explicitly says "This is not a software problem!") I didn't see any logs that indicate ath5k crashes. When you load/unload a driver for a PCI card the PCI subsystem also gets triggered so how can we be sure without logs of the crash which driver fails ? The only logs that you gave us indicate a problem on interrupt processing with interrupts that are not triggered by ar5k card (clearly indicates a problem with PCI) and slowpath warning on rx_tasklet (also related to interrupt processing) and they are just warnings not errors. We still don't know what's going on when your system crashes, my guess is you get an interrupt storm and your system hangs, check this out...

ath5k_hw_get_isr: ISR: 0x00000000 IMR: 0x800814b5
ath5k phy1: too many interrupts, giving up for now
ath5k phy1: too many interrupts, giving up for now
net_ratelimit: 683 callbacks suppressed

This is not a bug on ath5k, either you have hw failure on your card's/host PCI controller or your host PCI controller is not properly configured for suspend/resume (and that's a common thing, suspend/resume is not so "standard" procedure, it has to do with BIOS support, ACPI stuff etc + not all vendors follow the standards). In both cases there is nothing we can do.

Just for the record try this...

a) Disable NetworkManager
b) Unload ath5k
c) Suspend
d) Resume
e) Reload ath5k
f) Enable NetworkManager

Also please load ath5k with DEBUG_RESET (0x00000001) and see if you get any fatal interrupts (that would be very interesting)...
Comment 59 Nick Kossifidis 2011-01-16 20:11:37 UTC
Any news on this one ?
Comment 60 Tomas Mudrunka 2011-02-10 18:29:20 UTC
with the new kernel it does not work at all... (no crash, but no wifi either)

$ uname -a
Linux insomnia 2.6.37-ARCH #1 SMP PREEMPT Sat Jan 29 19:40:04 UTC 2011 i686 Mobile AMD Sempron(tm) Processor 3000+ AuthenticAMD GNU/Linux

$ dmesg
cfg80211: Calling CRDA to update world regulatory domain
ath5k 0000:00:0b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ath5k 0000:00:0b.0: registered as 'phy0'
cfg80211: World regulatory domain updated:
    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
    (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
    (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
    (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
    (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
    (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211: Calling CRDA for country: CZ
cfg80211: Regulatory domain changed to country: CZ
    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
    (2400000 KHz - 2483500 KHz @ 40000 KHz), (N/A, 2000 mBm)
    (5150000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2301 mBm)
    (5250000 KHz - 5350000 KHz @ 40000 KHz), (N/A, 2000 mBm)
    (5470000 KHz - 5725000 KHz @ 40000 KHz), (N/A, 2698 mBm)
ath5k phy0: Invalid EEPROM checksum: 0x082a eep_max: 0x0340 (default size)
ath5k phy0: unable to init EEPROM
ath5k 0000:00:0b.0: PCI INT A disabled
ath5k: probe of 0000:00:0b.0 failed with error -5


$ lspci -vvv
00:0b.0 Ethernet controller: Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter (rev 01)
	Subsystem: Wistron NeWeb Corp. CM9 Wireless a/b/g MiniPCI Adapter
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR+ INTx-
	Latency: 168 (2500ns min, 7000ns max), Cache Line Size: 80 bytes
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at e2000000 (32-bit, non-prefetchable) [size=64K]
	Region 1: Memory at <unassigned> (64-bit, non-prefetchable)
	Region 3: Memory at <unassigned> (64-bit, non-prefetchable)
	Region 5: Memory at <invalid-64bit-slot> (64-bit, non-prefetchable)
	Capabilities: [44] Slot ID: 0 slots, First-, chassis 00
	Kernel modules: ath5k

and there's no device in ifconfig of course...
Comment 61 Nick Kossifidis 2011-02-10 18:42:46 UTC
Have you tried what i mentioned on the previous post ?
Comment 62 John W. Linville 2011-03-17 19:24:33 UTC
Closing due to lack of response...
Comment 63 Tomas Mudrunka 2011-05-02 23:09:19 UTC
With each kernel version everything was getting even worse: the card stopped working completely (and i thought i'd bricked my miniPCI controller), and after a few more kernel versions it ended up completely freezing my system until a key was pressed or the mouse was moved. This led me to another idea. NOHZ! (tickless kernel is default in ArchLinux)

i've added nohz=off to the kernel options line in GRUB and the card magically started to work. It also survived the first suspend to RAM, so we'll see if the tickless mode was the only problem.