Bug 86861
Summary: | RTL8185L PCI ifup freeze system | ||
---|---|---|---|
Product: | Drivers | Reporter: | Eric Shattow (lucent) |
Component: | network-wireless | Assignee: | drivers_network-wireless (drivers_network-wireless) |
Status: | REOPENED --- | ||
Severity: | normal | CC: | andrea.merello, lucent |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.16 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
fix rtl8225.c to use aux functions for anaparam write
add delays in anaparam write functions add dealyts when locking/unlocking CONFIGX regs |
Description
Eric Shattow
2014-10-25 07:44:17 UTC
Temporary workaround is to #include <linux/delay.h> and then stuff 'udelay(2);' into all of the io functions in drivers/net/wireless/rtl818x/rtl8180/rtl8180.h Driver loads, works correctly with some light load testing. How to determine where the timing is falling over? For what it's worth, this is a bug in GCC 4.9.1 and not the linux kernel. Recompiled with -O0 and works great. Maybe it is not a bug in GCC. Also happens with gcc 4.8.3 as compiler. How to debug? Hello, Thank you for reporting this. It doesn't really semms a GCC bug. As you have find out, this problem is somewhat tining-sensitive. Adding explicit delays or disabling GCC optimizations has probably the same effect: delaying things.. When timings in the order of uS matters, then a lot of things may play a role. Recently the rtl818x driver got a lot of modifications. Have you ever tried with a older kernel (say 3.13)? If it worked then it could be a bug introduced with recent patches (or they just made arose an hidden timing problem). If it didn't then I can exclude those patches. Indeed I have another report about a system crash like yours that happens with older kernel. I couldn't ever end up with any soulution.. Maybe your discover about IO delays have finally enlighed things a bit :) Actually I think the vendor driver implements 20us delays on all MMAP write accessors. I thought this was just conservative for whose write accesses that might require delay (for example RF/BB access), but I didn't tought the HW could crash the whole system.. (I'm thinking about the HW hitting some weird internal condition that causes the HW itself to mess the PCI bus). AFAIK the kernel IO accessors guareantee access ordering wrt themselfes and I took some care to put memory barriers where IO accesses should be ordered wrt to DMA memory. It is still possible I missed something here, or that some PCI posting issue is tricking things, but indeed the problem could be really a true HW requirement not to see IO accesses too fast after writing some register. BTW I'll give a colser look to the code double-checking barriers and related things.. Maybe it could also worth to try adding explicit delays only to write accesses where it is reasonable to think some clocking resynchronization is involved inside the ASIC. This likely means where BB and possibly Analog frontend and RF interface registers are accessed. If this won't work then probably we have not enough information about the HW in order to infer what it is actually needed to delay, and we probably cannot do something rellay better than really place delays in all write accessors as in the reference driver. Maybe the register writes in the ISR handler might be the only ones that I could still try to save.. I will come out with some experimental patches as soon as I have time.. Andrea Compiled mainline linux kernel 3.13 and tested. Confirm loading rtl8180.ko is okay but it then does crash on "ifconfig wlan up". Now back to 3.16-3 (3.16.15 debian). cd $LINUXSRC && make EXTRA_CFLAGS="-g -I$(pwd)/drivers/net/wireless/rtl818x" -C /lib/modules/$(uname -r)/build M=$(pwd) drivers/net/wireless/rtl818x/rtl8180/rtl818x_pci.ko Good: rtl8180.h after every iowriteNN() is ndelay(20) Good: rtl8180.h unmodified, all source calling rtl818x_iowriteNN() insert ndelay(20) after every call Bad: rtl8180.h after every iowriteNN() is ndelay(20), rtl8225.c:rtl8225_rf_stop replace all iowriteNN calls to avoid ndelay. Deadlocks immediately on "ifconfig wlan0 down", consistently. Good: rtl8180.h after every iowriteNN() is ndelay(20), rtl8225.c:rtl8225_rf_stop replace all rtl818x_iowriteNN calls to avoid ndelay, add ndelay(20) after each iowriteNN call. I went through the source, one subroutine at a time and A/B compare "insmod; ifconfig up; ifconfig down; rmmod" with ndelay after every iowrite call and without those calls per function. This amounts to guesswork because I don't understand anything about memory boundries i.e. I suspect problems with at least rtl8225.c:rtl8225_rf_init and rtl8225.c:rtl8225_rf_stop. Thank you a lot for your work! Your test "Bad: rtl8180.h after every iowriteNN() is ndelay(20), rtl8225.c:rtl8225_rf_stop replace all iowriteNN calls ..." is probably the most interesting. If you have let untouched the iowriteNN in rtl8225_write (they still had delays because of modified iowriteNN in rtl8180.h), that is called in rtl8225_rf_stop, then I'd focus on ANAPARAM registers write procedure. As you can see in the code, the procedure for writing ANAPARAM registers involves the actual ANAPARAM register iowrite operation itself, as well as two nested "unlock/re-lock" operations: Unlocking ANAPARAM registers (in order to be able to write them) is achieved by writing CONFIG3 register; unlocking CONFIG3 register is achieved by writing EEPROM_CMD register. Perahps after the unlock operation, before writing the unlocked register, and/or before re-locking, the delay is needed. (Obviously this is just a speculation..) For writing ANAPARAM registers there are utility functions rtl8180_set_anaparam and rtl8180_set_anaparam2. Changing rtl8225_rf_init and rtl8225_rf_stop in order to use those utility functions is indeed shomething that I should had done earlier.. I'm preparing a patch to do this. I will attach here soon. Once applied that patch, it is easy to put delays in rtl8180_set_anaparam and rtl8180_set_anaparam2 and see whether this fix the problem. I will prepare a patch also for this soon. If this is not enough to fix the problem, then maybe the issue happens when unlocking/writing/locking CONFIGX registers (I'm still speculating..). There are probably other places where this is done other than whose related to ANAPARAM registers write. I can eventually try also putting delays every time CONFIGX registers are written. I will eventually prepare a patch for this also. Created attachment 157041 [details]
fix rtl8225.c to use aux functions for anaparam write
This is against 3.16.3.
With this patch applied it is easier to test delays in ANAPARAM writing code.
(I'll probably also rebase and push to mainline as it also reduce duplicated code and make code cleaner..)
Created attachment 157051 [details]
add delays in anaparam write functions
This can me applied on top of my previous patch I just posted here. It is an _experimental_ patch to see if the problem has to do with the card messing up the PCI bus whenever ANAPARAM registers are written without delays.
This adds delays in the following places:
- after unlocking anaparam regs, before writing them
- after writing anaparam regs, before locking them again.
If this turns out to help with the problem, then it's probably worth to try adding also dummy register reads to force PCI posting.
Yes, I confirm "[patch] add delays in anaparam write functions" seems to help. Process for this: 1. apply "[patch] fix rtl8225.c to use aux functions for anaparam write" 2. change rtl8180.h copy and modify rtl818x_iowriteNN routines i.e. +static inline void rtl818x_iowrite32_nd(struct rtl8180_priv *priv, + __le32 __iomem *addr, u32 val) +{ + iowrite32(val, addr); +} + static inline void rtl818x_iowrite8(struct rtl8180_priv *priv, u8 __iomem *addr, u8 val) { iowrite8(val, addr); + ndelay(20); } 3. verify driver will load and operate normally with added delay this way (yes, it does operate normally) 4. modify anaparam* routines that will be touched by "[patch] add delays in anaparam write functions" to use non-delay *_nd routines 5. verify driver now fails at "ifconfig wlan0 up" (yes, hard lock-up of kernel) 6. apply udelay(20) as directed by "[patch] add delays in anaparam write functions" 7. verify driver will now load and operate normally (Yes, with limited testing it does appear to work again). Hello, Thank you for this tests. Your report seems to confirm that delay after writing anaparam is required, but, because you still kept some other delays in IO accessors, we still don't know if waiting only after writing anaparams is enough to avoid the crash. Can you please try to apply the patches keeping IO accessors in rtl8180.h without any delays to see what happens ? This way the only added delays will be those introduced by the patch "add delays in anaparam write functions". If this is not enough (system still crashes) then I will prepare another patch to add delays also after unlocking CONFIGX registers. Thank you Andrea (In reply to Andrea Merello from comment #10) > Can you please try to apply the patches keeping IO accessors in rtl8180.h > without any delays to see what happens ? This way the only added delays will > be those introduced by the patch "add delays in anaparam write functions". I forgot to say, that the delays are needed or else it does crash. To be sure, I tried again now adding a new patch to my steps in comment #9 reverting the changes to rtl8180.h and use of *_nd in dev.c; the module inserts as usual but there is a hardlock crash when "ifconfig wlan0 up". I guess the changes in anaparam routines are helpful but there are more troubles still. Ah, OK. I'm going to attach another patch, to be applied in top of those two, that adds also delays surrounding the CONFIGX registers unlocks. Can you please try this? Created attachment 158421 [details]
add dealyts when locking/unlocking CONFIGX regs
Applied, "ifconfig wlan0 up" success now. Starting network-manager to bring up connection to home wifi network, and system freeze happens. Last output at printk 10 logging level: [ 1914.016566] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready [ 1935.862172] wlan0: authenticate with xx:xx:xx:xx:xx:xx [ 1935.924098] wlan0: send auth to xx:xx:xx:xx:xx:xx (try 1/3) What is next to try? Hello, Sorry for the huge delay, these are quite busy months for me :( I'm starting to suspect that delays are potentially needed everywere :( Anyway before give up I first of all need to check code again and see if I'm missing something. I will do as soon as I have some spare time. Your last log shown that the crash happens also when the card has been fully initialized.. I think that when the association procedure is ongoing, the driver shouldn't do particular things that may specifically need delays. During this procedure the driver should mainly deal with interrupt status register and DMA kick register. Maybe it could have some callback invoked to configure contention window, IFS* registers, and things like those. But these are all MAC-related stuff.. This seems to refute my initial supposition about interal ASIC busses and reclocking.. That's might be more likely a "simplier" ASIC PCI interface problem, loosely related to other ASIC internal things... Maybe it's some weird incompatibilty with the chipset or something like that.. Well, I will check the code again and eventually (hopefully) I will come back with other patches, but I'm starting to think that we might have to introduce delays in all write accessors like the reference driver does.. Andrea |