Bug 16276
Summary: | RTL-8110SC/8169SC driver crash | ||
---|---|---|---|
Product: | Networking | Reporter: | Jaroslav Fojtik (fojtik) |
Component: | IPV4 | Assignee: | Francois Romieu (romieu) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | akpm, alan |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.34 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | Hardware requires more delay in mdio_{read / write} |
Description
Jaroslav Fojtik
2010-06-23 13:56:17 UTC
Looks like a driver issue. Created attachment 26964 [details]
Hardware requires more delay in mdio_{read / write}
This in included in post 2.6.34.
Jaroslav, can you give it a try ?
--
Ueimor
No, I have tested bare 2.6.34 only. I will do it. I have been used this computer for one year and half until this message occurred. So I do not expect that fixing it I could notify sooner. Anyway, this patch seems to me to be quite bad: Original code: static int mdio_read(void __iomem *ioaddr, int reg_addr) { int i, value = -1; RTL_W32(PHYAR, 0x0 | (reg_addr & 0x1f) << 16); for (i = 20; i > 0; i--) { /* * Check if the RTL8169 has completed retrieving data from * the specified MII register. */ if (RTL_R32(PHYAR) & 0x80000000) { value = RTL_R32(PHYAR) & 0xffff; break; } udelay(25); ^^^ why to spinloop here 25us when 20 is sufficient enough } return value; } Do rather this. This would wait from 20us to 40us after event occurs. Original code waits from 25us to 45us after event occurs. static int mdio_read(void __iomem *ioaddr, int reg_addr) { int i, value = -1; RTL_W32(PHYAR, 0x0 | (reg_addr & 0x1f) << 16); for (i = 25; i > 0; i--) { /* * Check if the RTL8169 has completed retrieving data from * the specified MII register. */ if (RTL_R32(PHYAR) & 0x80000000) { value = RTL_R32(PHYAR) & 0xffff; udelay(20); break; } udelay(20); } return value; } Jaroslav Fojtik 2010-06-28 23:11:59 : [...] > I have been used this computer for one year and half until this message > occurred. So I do not expect that fixing it I could notify sooner. Fine. There is no hurry. [...] > why to spinloop here 25us when 20 is sufficient enough It is not. See http://marc.info/?l=linux-netdev&m=127605163010938&w=2 -- Ueimor >It is not. See http://marc.info/?l=linux-netdev&m=127605163010938&w=2
You did not understand me.
When I pool device with period 25us, the operation finish could occur anytime in the middle of 25us delay.
When I add 20us additional delay, the delay from operation end is random value from 20us to 45us.
Is it clear now?
So I propose faster pooling with period 20us - see my code.
I could test even faster pooling, but I am unsure whether saving several us could help.
Optimizing mdio code is not my #1 priority, especially when I broke the code with falsely innocuous changes in the first place (looong ago). Realtek suggests that the completion needs ~100us in itself. An extra 20us guard is needed too. How much will be saved with the suggested change when compared with current -git code ? How many mdio ops ? Really _that_ often ? Not much imho. Of course you can discuss or submit it on netdev if you feel it is worth it. -- Ueimor The proposed change would: 1, add one additional op, (100/25=4; 100/20=5) 2, save 5us machine time 3, does not prolong total timeout wait time. I have no idea how often it is used. I only hope that this in not required for every packet, because it could kill overall performance a lot. Ok, I better understand your concern now. This code is only used for link management related operations. It should mostly be used when the interface status changes or when something goes wrong. The 8168 can use it heavily (several hundred mdio ops) but the old 8169 is an order of magnitude below. Actually it's a bit like saving 5% of the time spent filling the yearly fiscal paper. -- Ueimor *** Bug 16303 has been marked as a duplicate of this bug. *** I will return back to main topic and will not poison this bug report with speedup hints. After adding additional 20us delay I did not observe any defect with three RTL-8110SC/8169SC devices. I also guess that diode defect after cable unplug was the same problem, because it vanishes. |