Bug 14874 - (ath5k-regression) Ath5k regression with commit 8bf3d79bc401ca417ccf9fc076d3295d1a71dbf5
Status: CLOSED CODE_FIX
Product: Networking
Classification: Unclassified
Component: Wireless
Hardware: All Linux
Importance: P1 high
Assigned To: Luis R. Rodriguez
Depends on:
Blocks: 14885
Reported: 2009-12-25 00:49 UTC by Joshua Covington
Modified: 2010-01-24 21:56 UTC

See Also:
Kernel Version: 2.6.32.2
Tree: Mainline
Regression: Yes


Attachments
dmesg-2.6.32.2-14.fc13.x86_64 (44.81 KB, text/plain)
2009-12-25 00:49 UTC, Joshua Covington
Check for custom EEPROM sizes (3.93 KB, patch)
2009-12-29 00:11 UTC, Luis R. Rodriguez
full dmesg from patched compat-wireless-2009-12-11 (39.52 KB, text/plain)
2009-12-29 22:28 UTC, Joshua Covington
dmesg-2.6.30.10-105.fc11 (39.29 KB, text/plain)
2009-12-29 22:36 UTC, Joshua Covington
test with wireless-testing-2009-12-28 (39.36 KB, text/plain)
2010-01-01 13:48 UTC, Joshua Covington
'lspci -vv' of kernel-2.6.32.2 (18.64 KB, application/octet-stream)
2010-01-03 00:35 UTC, Stephen Beahm
dmesg of kernel-2.6.32.2 (33.71 KB, application/octet-stream)
2010-01-03 00:36 UTC, Stephen Beahm
'lspci -vv' of kernel 2.6.32.2 with 'eeprom-2.patch' applied (18.73 KB, application/octet-stream)
2010-01-03 00:37 UTC, Stephen Beahm
dmesg of kernel-2.6.32.2 with 'eeprom-2.patch' applied (35.72 KB, application/octet-stream)
2010-01-03 00:38 UTC, Stephen Beahm

Description Joshua Covington 2009-12-25 00:49:39 UTC
Created attachment 24302 [details]
dmesg-2.6.32.2-14.fc13.x86_64

After upgrading to the latest stable 2.6.32.2 my Atheros 2413 is useless. dmesg shows the following:
ath5k 0000:08:04.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
ath5k 0000:08:04.0: registered as 'phy0'
ath5k phy0: Invalid EEPROM checksum 0xdcd7
ath5k phy0: unable to init EEPROM
HDA Intel 0000:00:14.2: PCI INT A -> GSI 16 (level, low) -> IRQ 16
ath5k 0000:08:04.0: PCI INT A disabled
ath5k: probe of 0000:08:04.0 failed with error -5
I've attached the full dmesg. The problem didn't exist with 2.6.32.1 and I found that commit 8bf3d79bc401ca417ccf9fc076d3295d1a71dbf5 is breaking my wireless.

My device is:
08:04.0 Ethernet controller: Atheros Communications Inc. AR2413 802.11bg NIC (rev 01)
	Subsystem: AMBIT Microsystem Corp. Device 0418
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 21
	Region 0: Memory at c0200000 (32-bit, non-prefetchable) [size=64K]
	Capabilities: [44] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-
	Kernel modules: ath5k

So I'm one of those with a bogus EEPROM, but what should I do? Call Acer or what?
Comment 1 John W. Linville 2009-12-28 17:28:47 UTC
It seems like contacting the device manufacturer is your best (or only) option.  Maybe Luis has another suggestion?
Comment 2 Joshua Covington 2009-12-28 18:29:00 UTC
This same device has been working without any glitches since kernel-2.6.20.x. And it still works when I revert this commit.

This is a notebook and as I said it works perfectly. What should I say to Acer: 

"Can you give me another mini wlan pci-e card because this one has a wrong eeprom-checksum and the person responsible for this code in the linux kernel decided that this, otherwise perfectly working card, should throw an exception which renders it useless!!!"

Maybe the code should continue with the initialization instead of aborting it. A simple message in the logs should indicate that something MIGHT go wrong, but this should not be a reason to render the card useless just because of the checksum.

This card does work with linux!
Comment 3 John W. Linville 2009-12-28 20:05:27 UTC
Then I suppose you will need to keep reverting that commit in your local kernel builds, or find a way to get properly checksummed data into your EEPROM.

The thing is, if the checksum fails we don't really have any good idea as to what part of the data in the EEPROM is wrong.  So if we honor the data in that possibly/likely corrupt EEPROM then we could be enabling users to violate local laws regarding the use of wireless devices.  Now, you may be comfortable deciding to do that for yourself.  But I'm not comfortable deciding to do that for others.

So, I'm sorry that this is causing you an inconvenience.  But I really don't see another responsible option.

In hopes of avoiding another RESOLVED->REOPENED cycle, I'll leave this open for a while in case Luis has a suggestion for how you can repair your card's EEPROM.  But, that does not signify any intent on my part to revert the patch in question.
Comment 4 Joshua Covington 2009-12-28 22:18:11 UTC
Thank you.

About the EEPROM:
I'm not aware of any "user-friendly" way of flashing the eeprom. This is a two year old notebook and I don't think there is any support for such models. I'm sure this situation exists for quite a lot of pci-e cards.

>> ...possibly/likely corrupt EEPROM then we could be enabling users to violate
>> local laws regarding the use of wireless devices.

I know what you mean. My notebook was bought in the EU and is certified for the whole EU. This means it does comply with the standards in place. Therefore there should be a way to detect the country and enable the allowed frequencies.

On the other hand, the same pci-e card is also sold in the US. This means it is capable of operating on frequencies that can be forbidden in some countries.

Just because it does not "comply" with certain country frequencies, doesn't mean my card shouldn't work "out of the box" under linux.

If this is such a big problem, then how is it solved in windows? It works perfectly with win7. Why don't I have such problems with it?

And I saw Luis has an email ending with @atheros.com, which makes me believe he's affiliated with the company and should be aware of such "problems". He should work to find a software solution for it, not just deactivate the card and say "the average John should recompile the kernel by himself and revert this". This is not an option.

Please, find a better solution.
Comment 5 Luis R. Rodriguez 2009-12-28 22:46:18 UTC
The EEPROM checksum is always enforced, even in the Windows and Mac OS X drivers. A failed checksum indicates we cannot rely on the EEPROM for anything, so unfortunately you are likely to run into bugs which are just corner cases for busted EEPROMs, and frankly there are better things to do in ath5k than try to work around busted EEPROMs. A possible solution to this is to rely on a very restrictive EEPROM, but even then you wouldn't know a lot of details about the card. You can have single band 2.4 GHz cards, you can have single band 5 GHz cards, you can have dual band cards, and then there is the actual revision ID of the card; if that cannot be relied on then there is no way of even knowing what type of MAC is present, and because of that everything in the hardware would be trial and error.

The reason for the patch is to avoid bogus bug reports due to busted EEPROMs. If your laptop is new then yes, you can contact your manufacturer and tell them your laptop's wireless card EEPROM checksum does not check out and if you're in warranty I don't see why you wouldn't get a replacement.

If your win7 installation is working fine then it's likely reflective of another bug, namely in the way the checksum is implemented. The algorithm is always the same but I do see the size can change, let me check something and get back to you.
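
To illustrate how a size mismatch alone could make an intact EEPROM look corrupt, here is a minimal Python sketch. It assumes the checksum is an XOR over 16-bit words that must come out to 0xffff. The function name, the toy image, and both sizes are hypothetical and are not taken from the ath5k source or from any real card. Whether a real custom size is smaller or larger than the size the driver assumes is not stated in this thread; the sketch picks "smaller" only for illustration.

```python
from functools import reduce

EXPECTED = 0xFFFF  # assumed "good" result of XOR-ing all covered words


def xor_checksum(words, length):
    """XOR the first `length` 16-bit words together."""
    return reduce(lambda acc, w: acc ^ (w & 0xFFFF), words[:length], 0)


# Toy EEPROM image (made-up values): three data words, one balancing
# word chosen so that the XOR over those four words equals EXPECTED,
# then three trailing words that are NOT part of the checksummed area.
payload = [0x5AA5, 0x0063, 0x2413]
balance = xor_checksum(payload, len(payload)) ^ EXPECTED
image = payload + [balance] + [0x1234, 0x0000, 0x0000]

custom_size = len(payload) + 1  # size the (hypothetical) card really uses
fixed_size = len(image)         # size a (hypothetical) driver assumes

for label, size in (("fixed size", fixed_size), ("custom size", custom_size)):
    cksum = xor_checksum(image, size)
    verdict = "ok" if cksum == EXPECTED else "invalid checksum"
    print(f"{label}: 0x{cksum:04x} -> {verdict}")
```

The same data passes or fails depending only on how many words are included, which is why checking for a custom size is a plausible fix for a card that works under another driver.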
Comment 6 Luis R. Rodriguez 2009-12-29 00:11:04 UTC
Created attachment 24335 [details]
Check for custom EEPROM sizes

Please try this patch, it checks for custom EEPROM sizes. It's the only difference I see between what is in the latest hardware code for our legacy devices and what is in ath5k.
Comment 7 Joshua Covington 2009-12-29 22:28:34 UTC
Created attachment 24366 [details]
full dmesg from patched compat-wireless-2009-12-11


I tested this with the latest available compat-wireless-2009-12-11 patched with your patch on kernel-2.6.30.10-105.fc11.x86_64. Here are the results (see also the attached full dmesg):

cfg80211: Calling CRDA to update world regulatory domain
cfg80211: World regulatory domain updated:
    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
    (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
    (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
    (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
    (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
    (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
  alloc irq_desc for 21 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
ath5k 0000:08:04.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
ath5k 0000:08:04.0: registered as 'phy0'
ath: EEPROM regdomain: 0x63
ath: EEPROM indicates we should expect a direct regpair map
ath: Country alpha2 being used: 00
ath: Regpair used: 0x63
ath5k phy0: can't register ieee80211 hw
ath5k 0000:08:04.0: PCI INT A disabled
ath5k: probe of 0000:08:04.0 failed with error -17
cfg80211: Calling CRDA for country: DE
cfg80211: Regulatory domain changed to country: DE
    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
    (2400000 KHz - 2483500 KHz @ 40000 KHz), (N/A, 2000 mBm)
    (5150000 KHz - 5350000 KHz @ 40000 KHz), (N/A, 2000 mBm)
    (5470000 KHz - 5725000 KHz @ 40000 KHz), (N/A, 2698 mBm)
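
For comparing logs like the one above, the cfg80211 rule lines can be picked apart mechanically. The following is a minimal Python sketch that assumes the exact line layout shown in this dmesg excerpt; the regular expression and the function name are illustrative, not part of any kernel tool. The mBm unit is hundredths of a dBm, so 2000 mBm corresponds to 20 dBm.

```python
import re

# Matches e.g. "(2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)"
# and also the variant where the antenna gain is printed as "N/A".
RULE = re.compile(
    r"\((\d+) KHz - (\d+) KHz @ (\d+) KHz\), \((N/A|\d+ mBi), (\d+) mBm\)"
)


def parse_rule(line):
    """Return the rule as a dict in MHz / dBm, or None if the line is not a rule."""
    m = RULE.search(line)
    if m is None:
        return None
    start_khz, end_khz, bw_khz = (int(m.group(i)) for i in (1, 2, 3))
    return {
        "start_mhz": start_khz / 1000,
        "end_mhz": end_khz / 1000,
        "bandwidth_mhz": bw_khz / 1000,
        "max_eirp_dbm": int(m.group(5)) / 100,  # mBm -> dBm
    }


sample = [
    "    (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)",
    "    (5470000 KHz - 5725000 KHz @ 40000 KHz), (N/A, 2698 mBm)",
    "ath5k phy0: can't register ieee80211 hw",
]
for line in sample:
    print(parse_rule(line))
```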
Comment 8 Joshua Covington 2009-12-29 22:36:28 UTC
Created attachment 24367 [details]
dmesg-2.6.30.10-105.fc11

Here is the dmesg from the "default" kernel-2.6.30.10-105.fc11. As you can see the card works fine here as it does with kernel-2.6.32.1. Something was introduced in 2.6.32.2 that broke it.

Is the code in compat-wireless-2009-12-11 older than the code in the current 2.6.32.2?
Comment 9 Joshua Covington 2010-01-01 13:48:35 UTC
Created attachment 24397 [details]
test with wireless-testing-2009-12-28

I pulled the git branch of Linville's wireless-testing-next-2009-12-28 and applied the test patch (full dmesg is attached). However I still get the same messages as with compat-wireless-2009-12-11.

Does this "ath5k phy0: can't register ieee80211 hw" have something to do with the eeprom or is it a separate problem?
Comment 10 Luis R. Rodriguez 2010-01-01 23:04:27 UTC
The patch I provided was meant to be applied on top of 2.6.32.2; please don't try wireless-testing unless you feel comfortable doing so. So please test the patch against 2.6.32.2.

The "ath5k phy0: can't register ieee80211 hw" message is completely unrelated to the EEPROM failure. I want to know if the patch fixes your EEPROM checksum on 2.6.32.2. Please provide the full output log.
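
The two probe failures in this report also end with different error codes: -5 in the original 2.6.32.2 log and -17 in the patched compat-wireless log. Kernel drivers return negated errno values, and on Linux 5 is EIO and 17 is EEXIST. A minimal Python sketch for decoding such codes, assuming it runs on a system with Linux errno numbering (the `describe` helper is hypothetical):

```python
import errno
import os


def describe(kernel_rc):
    """Map a negative kernel return code such as -5 to (symbolic name, message)."""
    code = -kernel_rc
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


for rc in (-5, -17):
    name, text = describe(rc)
    print(f"probe failed with error {rc}: {name} ({text})")
```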
Comment 11 Stephen Beahm 2010-01-03 00:34:18 UTC
I too have experienced a regression with commit 8bf3d79bc401ca417ccf9fc076d3295d1a71dbf5.

My hardware is an Acer Aspire 5102WLMI laptop with built-in Atheros hardware. The laptop is 3 years old.

ath5k had been previously working for a very long time. I first experienced this regression with Fedora 12 kernel-2.6.31.9-174.fc12.x86_64.

I can confirm that 'eeprom-2.patch' fixes my problem. So, my EEPROM does indeed have the correct checksum, with a custom EEPROM size.

As requested in comment #10, I have attached logs for stable kernel-2.6.32.2 before and after applying 'eeprom-2.patch'.

Luis, thank you for addressing this issue. If any more testing is required, I am willing to do so.
Comment 12 Stephen Beahm 2010-01-03 00:35:35 UTC
Created attachment 24412 [details]
'lspci -vv' of kernel-2.6.32.2
Comment 13 Stephen Beahm 2010-01-03 00:36:15 UTC
Created attachment 24413 [details]
dmesg of kernel-2.6.32.2
Comment 14 Stephen Beahm 2010-01-03 00:37:26 UTC
Created attachment 24414 [details]
'lspci -vv' of kernel 2.6.32.2 with 'eeprom-2.patch' applied
Comment 15 Stephen Beahm 2010-01-03 00:38:05 UTC
Created attachment 24415 [details]
dmesg of kernel-2.6.32.2 with 'eeprom-2.patch' applied
Comment 16 Luis R. Rodriguez 2010-01-04 02:08:03 UTC
Stephen, thanks for reporting how the patch fixes your issue. Joshua, how about you?
Comment 17 Joshua Covington 2010-01-04 10:13:18 UTC
I tried the 2.6.32.2 kernel with a fedora-livecd. I'm still using f11, which is on the 2.6.30.10 kernel. Therefore I used the latest available wireless-testing git in my tests.

In order not to recompile the whole kernel I extracted the wireless stuff from 2.6.32.2 and applied the patch to it. After recompiling the wireless stuff I still need to recreate the livecd and integrate it there.

That's why I still haven't tested this on 2.6.32.2. But I think that the patch should fix this for me, too.
Comment 18 Luis R. Rodriguez 2010-01-04 14:44:04 UTC
This patch should not yet be in wireless-testing, as I wanted a tester first, but thanks for trying. We do now have one tester though, so I will submit it for stable inclusion as well. Thanks.
Comment 19 Luis R. Rodriguez 2010-01-04 14:45:00 UTC
Patch will be submitted.

Tested by: Stephen Beahm <stephenbeahm@comcast.net>
Comment 20 Rafael J. Wysocki 2010-01-04 20:29:33 UTC
Handled-By : Luis R. Rodriguez <mcgrof@gmail.com>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=24335
Comment 21 Rafael J. Wysocki 2010-01-24 21:56:50 UTC
Fixed by commit 359207c687cc8f4f9845c8dadd0d6dabad44e584.