Bug 42976

Summary: RTL8192SE DMA: Out of SW-IOMMU space for 178 bytes at device 0000:07:00.0
Product: Drivers
Reporter: Da Xue (da)
Component: network-wireless
Assignee: drivers_network-wireless (drivers_network-wireless)
Status: RESOLVED CODE_FIX
Severity: normal
CC: florian, joseph.salisbury, Larry.Finger, linville
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.2 3.3
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Test patch for dma buffer leak

Description Da Xue 2012-03-22 17:18:57 UTC
I have an RTL8192SE in Master mode acting as an access point with WPA2-PSK CCMP.  When the rtl8192se module is loaded, I get SW-IOMMU errors after an hour or two. All of the network interface cards then stop working until I rmmod rtl8192se. The kernel log fills with SW-IOMMU errors.  The RTL8192SE is device 07:00.

Mar 20 09:30:04 server kernel: [ 2892.930146] DMA: Out of SW-IOMMU space for 54 bytes at device 0000:05:00.0
Mar 20 09:30:04 server kernel: [ 2892.930153] r8169 0000:05:00.0: eth1: Failed to map TX DMA!
Mar 20 09:30:05 server kernel: [ 2893.970118] DMA: Out of SW-IOMMU space for 178 bytes at device 0000:07:00.0
Mar 20 09:30:05 server kernel: [ 2894.019589] DMA: Out of SW-IOMMU space for 91 bytes at device 0000:05:00.0
Mar 20 09:30:05 server kernel: [ 2894.019597] r8169 0000:05:00.0: eth1: Failed to map TX DMA!
Mar 20 09:30:05 server kernel: [ 2894.072448] DMA: Out of SW-IOMMU space for 178 bytes at device 0000:07:00.0
Mar 20 09:30:05 server kernel: [ 2894.161438] DMA: Out of SW-IOMMU space for 172 bytes at device 0000:07:00.0
Mar 20 09:30:05 server kernel: [ 2894.174824] DMA: Out of SW-IOMMU space for 178 bytes at device 0000:07:00.0
Mar 20 09:30:05 server kernel: [ 2894.277052] DMA: Out of SW-IOMMU space for 178 bytes at device 0000:07:00.0
Mar 20 09:30:05 server kernel: [ 2894.319623] DMA: Out of SW-IOMMU space for 91 bytes at device 0000:05:00.0



If I use iommu=off mem=4g, the other network interfaces don't stop but the wireless interface stops working after an hour or so.

Mar 21 22:07:57 server kernel: [ 7.633145] rtl8192se 0000:07:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Mar 21 22:07:57 server kernel: [ 7.633154] rtl8192se 0000:07:00.0: setting latency timer to 64
Mar 21 22:07:57 server kernel: [ 7.645813] rtl8192se: rtl8192ce: FW Power Save off (module option)
Mar 21 22:07:57 server kernel: [ 7.646783] rtl8192se: Driver for Realtek RTL8192SE/RTL8191SE
Mar 21 22:07:57 server kernel: [ 7.646785] Loading firmware rtlwifi/rtl8192sefw.bin
Mar 21 22:07:57 server kernel: [ 7.786693] ieee80211 phy0: Selected rate control algorithm 'rtl_rc'
Mar 21 22:08:27 server kernel: [ 48.747692] rtlwifi:rtl_cam_get_free_entry():<0-0> -----hwsec_cam_bitmap: 0x0 entry_idx=4
Mar 21 22:16:36 server kernel: [ 538.200932] rtlwifi:rtl_cam_get_free_entry():<0-0> -----hwsec_cam_bitmap: 0x10 entry_idx=5
Mar 21 22:17:17 server kernel: [ 578.874549] rtlwifi: &&&&&&&&&del entry 5
Mar 21 22:17:17 server kernel: [ 579.017165] rtlwifi:rtl_cam_get_free_entry():<0-0> -----hwsec_cam_bitmap: 0x10 entry_idx=5
Mar 21 22:38:05 server kernel: [ 1826.151983] rtlwifi: &&&&&&&&&del entry 5
Mar 21 22:46:09 server kernel: [ 2310.517744] rtlwifi:rtl_cam_get_free_entry():<0-0> -----hwsec_cam_bitmap: 0x10 entry_idx=5
Mar 21 23:18:05 server kernel: [ 4225.670987] rtlwifi: &&&&&&&&&del entry 5
Mar 21 23:18:05 server kernel: [ 4225.687049] rtlwifi: &&&&&&&&&del entry 4

I've tested the 3.2 and 3.3 kernels.  Additional info can be found at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/961618
Comment 1 Larry Finger 2012-03-23 00:54:35 UTC
Created attachment 72688
Test patch for dma buffer leak

It turns out that the driver uses pci_map_single() for the beacon buffers, but never unmaps them. Please test this patch.
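
The failure mode described here (buffers mapped for DMA but never unmapped) can be illustrated with a small user-space simulation. This is only a sketch and not the driver code or the attached patch: `bounce_pool`, `pool_map()`, `pool_unmap()`, `send_beacons()` and the 8-slot pool size are all hypothetical names and values chosen for illustration. The pool stands in for the shared software bounce-buffer space that the "Out of SW-IOMMU space" message refers to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared software bounce-buffer pool.
 * A real pool is far larger; 8 slots keeps the demonstration short. */
#define POOL_SLOTS 8

struct bounce_pool {
    bool in_use[POOL_SLOTS];
};

/* Simulated "map": claim one free slot, or return -1 when the pool is
 * exhausted (the point where the kernel would report the error). */
static int pool_map(struct bounce_pool *pool)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Simulated "unmap": hand the slot back to the pool. */
static void pool_unmap(struct bounce_pool *pool, int slot)
{
    assert(slot >= 0 && slot < POOL_SLOTS);
    pool->in_use[slot] = false;
}

/* Try to send `count` beacons and return how many mappings succeeded.
 * With unmap_after_send == false the slot is never released, which
 * models the leak described above. */
static int send_beacons(struct bounce_pool *pool, int count,
                        bool unmap_after_send)
{
    int sent = 0;

    for (int i = 0; i < count; i++) {
        int slot = pool_map(pool);

        if (slot < 0)
            break; /* pool exhausted: every later mapping fails too */
        sent++;
        if (unmap_after_send)
            pool_unmap(pool, slot);
    }
    return sent;
}
```

In this model, calling `send_beacons()` with `unmap_after_send` set to false stops after `POOL_SLOTS` beacons and leaves nothing for any other caller, while the unmapping variant can run indefinitely. Because the pool is shared, exhaustion by one device would also be consistent with the r8169 "Failed to map TX DMA!" lines at 0000:05:00.0 in the log above.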
Comment 2 Da Xue 2012-03-23 09:52:21 UTC
Seems to have done the trick.  I have tested for over 6 hours without failure.  Thanks for your help and time.
Comment 3 Larry Finger 2012-03-23 15:47:26 UTC
Thanks for testing. Is it OK for me to use your name and E-mail address in the "Reported-and-Tested-By" line of the patch?
Comment 4 Da Xue 2012-03-23 16:25:36 UTC
Seems like I was premature in my testing or it may be a completely different problem.

Mar 23 04:00:31 server kernel: [    7.751926] rtl8192se 0000:07:00.0: setting latency timer to 64
Mar 23 04:00:31 server kernel: [    7.765404] rtl8192se: rtl8192ce: FW Power Save off (module option)
Mar 23 04:00:31 server kernel: [    7.767471] rtl8192se: Driver for Realtek RTL8192SE/RTL8191SE
Mar 23 04:00:31 server kernel: [    7.767473] Loading firmware rtlwifi/rtl8192sefw.bin
Mar 23 04:00:31 server kernel: [    8.293884] ieee80211 phy0: Selected rate control algorithm 'rtl_rc'
Mar 23 04:01:34 server kernel: [   81.821201] rtlwifi:rtl_cam_get_free_entry():<0-0> -----hwsec_cam_bitmap: 0x0 entry_idx=4
Mar 23 04:01:47 server kernel: [   95.212456] rtlwifi: &&&&&&&&&del entry 4
Mar 23 04:01:47 server kernel: [   95.396828] rtl8192se 0000:07:00.0: PCI INT A disabled
Mar 23 04:02:14 server kernel: [  122.618103] rtl8192se 0000:07:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Mar 23 04:02:14 server kernel: [  122.618117] rtl8192se 0000:07:00.0: setting latency timer to 64
Mar 23 04:02:14 server kernel: [  122.631570] rtl8192se: rtl8192ce: FW Power Save off (module option)
Mar 23 04:02:14 server kernel: [  122.631793] rtl8192se: Driver for Realtek RTL8192SE/RTL8191SE
Mar 23 04:02:14 server kernel: [  122.631794] Loading firmware rtlwifi/rtl8192sefw.bin
Mar 23 04:02:14 server kernel: [  122.634123] ieee80211 phy1: Selected rate control algorithm 'rtl_rc'
Mar 23 04:02:31 server kernel: [  139.493163] rtlwifi:rtl_cam_get_free_entry():<0-0> -----hwsec_cam_bitmap: 0x0 entry_idx=4
Mar 23 07:36:54 server kernel: [12999.650598] rtlwifi:rtl_cam_get_free_entry():<0-0> -----hwsec_cam_bitmap: 0x10 entry_idx=5
Mar 23 10:28:56 server kernel: [23319.482507] rtlwifi: &&&&&&&&&del entry 5
Mar 23 10:32:32 server kernel: [23535.368946] rtlwifi: &&&&&&&&&del entry 4
Mar 23 12:20:42 server kernel: [30023.680194] rtl8192se 0000:07:00.0: PCI INT A disabled
Comment 5 Da Xue 2012-03-23 16:27:27 UTC
The last line was caused by rmmod rtl8192se.
Comment 6 Larry Finger 2012-03-24 05:52:56 UTC
This problem with the cam is definitely new. At the moment, I do not understand this section of the code and will likely have to get help from Realtek.

My first reading of the code led me to believe that this was a WEP problem, but I see that you are using WPA2. Interesting - obviously I missed something.
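
For reference, the `hwsec_cam_bitmap` and `entry_idx` values in the logs look like a lowest-free-bit allocation that starts at index 4: a bitmap of 0x0 yields entry 4, and a bitmap of 0x10 (bit 4 set) yields entry 5. The sketch below reproduces only that observed pattern. It is not the rtlwifi source; `CAM_ENTRIES`, `FIRST_FREE_IDX`, `cam_get_free_entry()` and `cam_del_entry()` are hypothetical, and both the table size of 32 and the reserved range 0..3 are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values chosen to match the logged output,
 * not taken from the driver source. */
#define CAM_ENTRIES    32u /* assumed table size */
#define FIRST_FREE_IDX 4u  /* assumed: entries 0..3 are reserved */

/* Return the lowest free index at or above FIRST_FREE_IDX and mark it
 * as used, or CAM_ENTRIES when no entry is free. */
static unsigned cam_get_free_entry(uint32_t *bitmap)
{
    for (unsigned idx = FIRST_FREE_IDX; idx < CAM_ENTRIES; idx++) {
        if ((*bitmap & (UINT32_C(1) << idx)) == 0) {
            *bitmap |= UINT32_C(1) << idx;
            return idx;
        }
    }
    return CAM_ENTRIES;
}

/* Release an entry, as the "del entry N" log lines appear to do. */
static void cam_del_entry(uint32_t *bitmap, unsigned idx)
{
    assert(idx < CAM_ENTRIES);
    *bitmap &= ~(UINT32_C(1) << idx);
}
```

Starting from an empty bitmap, two allocations return 4 and then 5. Deleting entry 5 and allocating again returns 5 once more, which matches the sequence of "entry_idx=5" and "del entry 5" lines in the logs.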
Comment 7 Da Xue 2012-03-24 16:59:47 UTC
Below is my hostapd configuration if it helps.

auth_algs=1
beacon_int=100
bridge=br0
channel=11
country_code=US
ctrl_interface_group=0
ctrl_interface=/var/run/hostapd
driver=nl80211
dtim_period=2
dump_file=/tmp/hostapd.dump
fragm_threshold=2346
ht_capab=[HT40-][SHORT-GI-20][SHORT-GI-40][MAX-AMSDU-7935][DSSS_CCK-40]
hw_mode=g
ieee80211d=1
ieee80211n=1
ignore_broadcast_ssid=0
interface=wlan0
logger_stdout=-1
logger_stdout_level=2
logger_syslog=-1
logger_syslog_level=2
macaddr_acl=0
max_num_sta=255
rsn_pairwise=CCMP
rts_threshold=2347
ssid=xxx
wmm_ac_be_acm=0
wmm_ac_be_aifs=3
wmm_ac_be_cwmax=10
wmm_ac_be_cwmin=4
wmm_ac_be_txop_limit=0
wmm_ac_bk_acm=0
wmm_ac_bk_aifs=7
wmm_ac_bk_cwmax=10
wmm_ac_bk_cwmin=4
wmm_ac_bk_txop_limit=0
wmm_ac_vi_acm=0
wmm_ac_vi_aifs=2
wmm_ac_vi_cwmax=4
wmm_ac_vi_cwmin=3
wmm_ac_vi_txop_limit=94
wmm_ac_vo_acm=0
wmm_ac_vo_aifs=2
wmm_ac_vo_cwmax=3
wmm_ac_vo_cwmin=2
wmm_ac_vo_txop_limit=47
wmm_enabled=1
wpa=2
wpa_key_mgmt=WPA-PSK
wpa_passphrase=xxx
Comment 8 Larry Finger 2012-03-25 02:18:49 UTC
I have a script that writes the hostapd.conf file on the fly and sets up NAT between the AP and the network connection. Both that approach and your configuration file (modified for my setup) result in beacons, but I cannot connect. As a result, it seems that I will not be able to test here.

I sent a request for help to Realtek, but I do not expect a response until next week. In the meantime, if you have plenty of disk space on the AP machine, could you load rtl8192se with the "debug=3" option? That should provide more info.

Thanks.
Comment 9 Da Xue 2012-03-29 00:10:26 UTC
Larry, it occurred only that one time.  I haven't been able to replicate it in the last few days.  I've transferred over 2TB of data across 50+ hours of testing.  Yes, you can use my name and email in the "Reported-and-Tested-By" line.  I will let you know if long-term testing turns up any additional issues.

Thanks for your help
Best
Comment 10 Joseph Salisbury 2012-04-11 14:19:40 UTC
Hi Larry,

Do you plan on submitting the patch for this bug upstream?

Thanks,

Joe
Comment 11 Larry Finger 2012-04-11 15:03:23 UTC
Yes and already done. It is in the wireless-testing tree as commit a75e2ad772b6c26efd702f04be1f9a6414d24f22 with a March 26 date. I added the annotation that it be sent to stable, but I think John Linville will be sending it upstream for kernel 3.5 as it missed the 3.4 merge.
Comment 12 Joseph Salisbury 2012-04-11 15:06:51 UTC
Great! Thanks for the info, Larry.
Comment 13 John W. Linville 2012-04-11 19:05:02 UTC
Well, actually I missed the "stable" tag on that one and another from you.  I will direct them appropriately.
Comment 14 Florian Mickler 2012-04-16 21:19:11 UTC
A patch referencing this bug report has been merged in Linux v3.4-rc3:

commit 673f7786e205c87b5d978c62827b9a66d097bebb
Author: Larry Finger <Larry.Finger@lwfinger.net>
Date:   Mon Mar 26 10:48:20 2012 -0500

    rtlwifi: Add missing DMA buffer unmapping for PCI drivers