Bug 11802

Summary: prism 2.5 broke in 2.6.27
Product: Drivers Reporter: Yasen Balev (fraxinus.excelsior)
Component: network-wireless Assignee: drivers_network-wireless (drivers_network-wireless)
Status: CLOSED CODE_FIX    
Severity: normal CC: akpm, johannes
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.27 Subsystem:
Regression: Yes Bisected commit-id:

Description Yasen Balev 2008-10-21 13:26:14 UTC
Latest working kernel version: 2.6.26.5
Earliest failing kernel version: 2.6.27
Distribution: Debian Testing, custom compiled generic kernel from kernel.org
Hardware Environment:

lspci :
02:01.0 Network controller: Intersil Corporation Prism 2.5 Wavelan chipset (rev 01)
lspci -n:
02:01.0 0280: 1260:3873 (rev 01)

(minipci board scavenged from Fujitsu p2120 notebook over noname "minipci-to-pci")
Intel 865 motherboard, P4 @2.8 CPU

Software Environment:
Problem Description:

The wireless board stopped working as an AP in 2.6.27. I tried 2.6.27.2 with no luck either. What I get in dmesg:

[   14.220924] hostap_pci 0000:02:01.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
[   14.221311] hostap_pci: Registered netdevice wifi0
[   14.222071] wifi0: Original COR value: 0x0
[   14.421983] prism2_hw_init: initialized in 195 ms
[   14.423725] wifi0: NIC: id=0x8013 v1.0.0
[   14.424117] wifi0: PRI: id=0x15 v1.1.1
[   14.428554] wifi0: STA: id=0x1f v1.8.2
[   14.433237] wifi0: Intersil Prism2.5 PCI: mem=0xec000000, irq=21
[   14.433587] wifi0: registered netdevice wlan0
.........unrelated drivers output..........
...<here we do ifconfig wlan0 192.168.220.254>
[   47.351189] wifi0: invalid skb->cb magic (0x000000ae, expected 0xf08a36a2)
[   47.602147] wifi0: invalid skb->cb magic (0x000000ae, expected 0xf08a36a2)
[   47.854121] wifi0: invalid skb->cb magic (0x000000ae, expected 0xf08a36a2)
[   47.864092] wifi0: invalid skb->cb magic (0x0000001e, expected 0xf08a36a2)
[   48.054753] wifi0: invalid skb->cb magic (0x000000a2, expected 0xf08a36a2)
[   48.209136] wifi0: invalid skb->cb magic (0x000000a2, expected 0xf08a36a2)
[   48.460146] wifi0: invalid skb->cb magic (0x000000a2, expected 0xf08a36a2)
[   48.720131] wifi0: invalid skb->cb magic (0x000000a2, expected 0xf08a36a2)
[   48.911234] wifi0: invalid skb->cb magic (0x000000df, expected 0xf08a36a2)
[   49.158184] wifi0: invalid skb->cb magic (0x00000092, expected 0xf08a36a2)
...and so on

Stations see the network, but cannot associate.

Steps to reproduce:

1. have this in /etc/network/interfaces :
auto wlan0
iface wlan0 inet static
        address 192.168.220.254
        netmask 255.255.255.0
        wireless-mode master
        wireless-essid ea0
        wireless-channel 4

2. boot 2.6.27.* ==> not working AP
2a. boot 2.6.26.* ==> working AP
Comment 1 Johannes Berg 2008-10-24 07:30:15 UTC
Try this patch.

--- everything.orig/drivers/net/wireless/hostap/hostap_wlan.h	2008-10-24 16:27:46.000000000 +0200
+++ everything/drivers/net/wireless/hostap/hostap_wlan.h	2008-10-24 16:29:17.000000000 +0200
@@ -918,9 +918,12 @@ struct hostap_interface {
 
 /*
  * TX meta data - stored in skb->cb buffer, so this must not be increased over
- * the 40-byte limit
+ * the 48-byte limit.
+ * THE PADDING THIS STARTS WITH IS A HORRIBLE HACK THAT SHOULD NOT LIVE
+ * TO SEE THE DAY.
  */
 struct hostap_skb_tx_data {
+	unsigned int __padding_for_default_qdiscs;
 	u32 magic; /* HOSTAP_SKB_TX_DATA_MAGIC */
 	u8 rate; /* transmit rate */
 #define HOSTAP_TX_FLAGS_WDS BIT(0)
Comment 2 Yasen Balev 2008-10-24 17:01:59 UTC
(In reply to comment #1)

The patch works, thank you.
Should we consider the "HORRIBLE HACK" worse than the driver in 2.6.26?

> Try this patch.
> 
Comment 3 Andrew Morton 2008-10-26 22:49:36 UTC
Johannes, could you please ensure that this patch (or whatever patch
we end up using) gets cc'ed to stable@kernel.org for the backport?

Thanks.
Comment 4 Johannes Berg 2008-10-31 03:25:29 UTC
Sorry, forgot to add myself as CC to the patch so I missed your note.

Let me explain the patch a bit: the networking layer mandates that the skb->cb is always owned by whoever has the skb queued. This commonly is first the IP layer, then the qdisc, and finally the driver. This driver, on the other hand, puts information into the skb->cb and then hands the skb back to the qdisc layer. It has always done that, and it has always been wrong of it to assume that the cb would be untouched when the skb got back to the driver's second network interface.

Now, it still has worked most of the time because the default qdisc never touched skb->cb until 2.6.27, where the qdisc layer started using four bytes in skb->cb unconditionally. These four bytes are reserved by the patch above, and then the information the hostap driver needs is left intact across the qdisc layer.
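
The effect described above can be reproduced in a small userspace sketch. This is not kernel code: `fake_qdisc_cb`, `fake_qdisc_enqueue` and the two `magic_after_qdisc_*` helpers are hypothetical names, the 48-byte buffer merely stands in for skb->cb, and the assumption that the four bytes written by the qdisc layer hold a small integer such as a packet length is only a guess that fits the small values in the dmesg output. The two structs mirror the first fields of hostap_skb_tx_data before and after the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CB_SIZE 48                 /* cb size named in the patch comment */
#define TX_DATA_MAGIC 0xf08a36a2u  /* expected magic from the dmesg output */

/* Hypothetical stand-in for the four bytes the qdisc layer writes at
 * the start of cb in 2.6.27 (assumed here to be a packet length). */
struct fake_qdisc_cb {
	unsigned int pkt_len;
};

/* First fields of the driver's TX metadata before the patch. */
struct tx_data_old {
	uint32_t magic;
	uint8_t rate;
};

/* The same metadata after the patch: four bytes of padding come first. */
struct tx_data_new {
	unsigned int padding_for_default_qdiscs;
	uint32_t magic;
	uint8_t rate;
};

/* Simulate the qdisc taking ownership of cb and writing its own data. */
static void fake_qdisc_enqueue(unsigned char *cb, unsigned int pkt_len)
{
	struct fake_qdisc_cb q = { pkt_len };

	memcpy(cb, &q, sizeof q);
}

/* Old layout: the magic sits at offset 0, so the qdisc overwrites it. */
uint32_t magic_after_qdisc_old(unsigned int pkt_len)
{
	unsigned char cb[CB_SIZE] = { 0 };
	struct tx_data_old meta = { TX_DATA_MAGIC, 0 };

	assert(sizeof meta <= CB_SIZE);
	memcpy(cb, &meta, sizeof meta);     /* driver fills cb */
	fake_qdisc_enqueue(cb, pkt_len);    /* skb passes through the qdisc */
	memcpy(&meta, cb, sizeof meta);     /* driver reads cb back */
	return meta.magic;
}

/* New layout: the qdisc only overwrites the padding; the magic survives. */
uint32_t magic_after_qdisc_new(unsigned int pkt_len)
{
	unsigned char cb[CB_SIZE] = { 0 };
	struct tx_data_new meta = { 0, TX_DATA_MAGIC, 0 };

	assert(sizeof meta <= CB_SIZE);
	memcpy(cb, &meta, sizeof meta);
	fake_qdisc_enqueue(cb, pkt_len);
	memcpy(&meta, cb, sizeof meta);
	return meta.magic;
}
```

On a platform where unsigned int is four bytes wide, calling magic_after_qdisc_old(0xae) returns 0xae instead of the magic, which matches the shape of the "invalid skb->cb magic" messages, while magic_after_qdisc_new(0xae) still returns 0xf08a36a2. The sketch also shows why this is only a workaround: it depends on the qdisc touching exactly the first four bytes and nothing else.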

Actually really fixing this would probably mean a complete rewrite of the hostap driver or, preferably, porting the STA-mode bits to the orinoco driver and the AP-mode bits to mac80211 to get rid of hostap completely. hostap already overlaps with orinoco for some hardware, but it supports WPA for more hardware than orinoco does.

Or we can just put in this hack and wait until all the hardware has been replaced. I wouldn't recommend building an AP with this old hardware anyway, because an ath5k or b43 card is cheap and supports many more features, for example encryption offloaded to the hardware.

I don't really want to make a decision about this patch just because I analysed why and how it breaks, but somebody will have to I guess.
Comment 5 Yasen Balev 2008-11-01 01:44:13 UTC
Johannes, thank you for your explanation.

Sorry to say that I am even less qualified to decide what to do with the bug. The fix works for me, here and now.

p.s.
Yes, I know that an Atheros or Broadcom board would be better. But, since no one's life depends on this very AP and 11Mbps is pretty much enough, I decided to use a board already scrapped.
Comment 6 Alexandros C. Couloumbis 2008-11-07 23:38:58 UTC
the above patch produces the following on an openwrt 2.6.27.4 x86 system:

wifi2: 00:02:6f:35:8f:21 assoc_cb - STA associated
------------[ cut here ]------------
WARNING: at kernel/softirq.c:136 ()
Modules linked in: hostap_pci hostap ath_pci ath_rate_minstrel
ath_hal(P) wlan_scan_sta wlan ieee80211 ieee80211_crypt
Call Trace:[<8000e990>] 0x8000e990
[<8000e990>] 0x8000e990
[<80024cb8>] 0x80024cb8
[<c010b3f0>] 0xc010b3f0
[<c0116b64>] 0xc0116b64
[<c0111ee0>] 0xc0111ee0
[<c010f36c>] 0xc010f36c
[<c011a874>] 0xc011a874
[<c00b8548>] 0xc00b8548
[<c0096538>] 0xc0096538
[<c008fdbc>] 0xc008fdbc
[<8016b6e4>] 0x8016b6e4
[<8002ad38>] 0x8002ad38
[<801648fc>] 0x801648fc
[<80183710>] 0x80183710
[<8017ee44>] 0x8017ee44
[<c00bbcb8>] 0xc00bbcb8
[<c00b175c>] 0xc00b175c
[<c00b67e4>] 0xc00b67e4
[<80155870>] 0x80155870
[<80004ec0>] 0x80004ec0
[<c00b7d2c>] 0xc00b7d2c
[<80155870>] 0x80155870
[<c011cc8c>] 0xc011cc8c
[<8015e47c>] 0x8015e47c
[<8002f04c>] 0x8002f04c
[<8002a418>] 0x8002a418
[<8002a4fc>] 0x8002a4fc
[<8002aa84>] 0x8002aa84
[<8002a924>] 0x8002a924
[<80001444>] 0x80001444
[<80001444>] 0x80001444
[<80001660>] 0x80001660
[<8000f528>] 0x8000f528
[<8000b660>] 0x8000b660
[<80001680>] 0x80001680

---[ end trace 276a316f0ed04cd6 ]---
wifi2: STA 00:02:6f:35:8f:21 did not ACK activity poll frame
wifi2: sending disassociation info to STA 00:02:6f:35:8f:21(last=4294928330, jiffies=36286)
wifi2: sending deauthentication info to STA 00:02:6f:35:8f:21(last=4294928330, jiffies=36536)
Comment 7 Johannes Berg 2008-11-08 00:04:02 UTC
You're going to have to provide symbols for that trace, and it's surely not this patch causing it but an unrelated problem.