Bug 42899
Description
Thomas Pilarski
2012-03-10 12:54:30 UTC
I guess it still happens even when the VMware stuff has not been loaded after boot, right? Please send a 'cat /sys/bus/pci/devices/0000:06:00.0/resource' and a complete dmesg. Thanks.

--
Ueimor

Created attachment 72684 [details]
The dmesg output from boot to the error
The IOMMU is enabled in the BIOS and is working on this machine.
Created attachment 72685 [details]
cat /sys/bus/pci/devices/0000:06:00.0/resource
The bug appears only while sending data; I was not able to force it while receiving. I have tested the release version 3.3 without any proprietary drivers. I was unable to force the bug with the Debian backport kernel 3.2, but the throughput was much lower.

This is similar to https://bugzilla.kernel.org/show_bug.cgi?id=42921
- same motherboard
- same Realtek chipset (8168evl)
- same IOMMU error

Actually the resource file is not needed, as the error originates from the PCIe device. If I read things correctly, the page was present and the network device did not have read permission on it. AFAICS it can only happen if the device has read a Tx descriptor ring entry whose memory buffer address was not correctly set. Be it a race or a wrong barrier, it would be consistent with the "high Tx traffic" condition.

Bug #42921 appears as a regression since 3.0.x from a user's viewpoint. Explicit 8168evl support was missing in 3.0.x: it handles the 8168evl as a 8168e. Grepping for the XID line in a 3.0.x dmesg should confirm it. The explicit 8168evl support code could be wrong, but I'll rather assume for now that it only changes timings. Same narrative for the backported Debian kernel 3.2 (it works but the throughput is lower -> different timing).

Thomas, can you try https://bugzilla.kernel.org/attachment.cgi?id=72682 with a 3.3 kernel? It is mostly the post-3.3 r8169 driver. Its threading is easier to understand - read "maintainer did not trust his years-old-modified-by-many code" - and it should nail a pair of races.

If it still does not work, please apply the following patch on top of it (some writes may be seen with an inconsistent ordering device-wise, I am not sure). Thanks.

--
Ueimor

Created attachment 72689 [details]
Add memory / device barrier
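(For readers without the attachment: the patch itself is not inlined here. As a rough sketch of the kind of write ordering being discussed - the buffer address must be visible to the device before the ownership bit, and the descriptor before the doorbell - using the driver's usual names, illustrative only, not the attached patch:)

    /* Illustrative sketch: publish the buffer address/length before
     * handing the descriptor to the chip, and order the descriptor
     * update before the MMIO kick.
     */
    txd->addr = cpu_to_le64(mapping);
    txd->opts2 = cpu_to_le32(opts2);
    wmb();                           /* payload fields before DescOwn */
    txd->opts1 = cpu_to_le32(opts1 | DescOwn);
    wmb();                           /* descriptor before doorbell */
    RTL_W8(TxPoll, NPQ);             /* trigger a Tx queue poll */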
Created attachment 72690 [details]
The dmesg output from boot to the error with the new driver
Created attachment 72691 [details]
The dmesg output from boot to the error with the new driver + barrier patch
(In reply to comment #8)
[...]
> The dmesg output from boot to the error with the new driver + barrier patch

Can you temporarily disable the IOMMU in the BIOS and test whether the same driver performs differently wrt the Tx failure?

--
Ueimor

I have tested it without the IOMMU and cannot see any difference in performance. I have transmitted many files and checked them for corruption, but all files are ok. However, this driver is the slowest of all drivers, with and without the IOMMU.

(In reply to comment #10)
> I have tested it without IOMMU and cannot see any differences in the
> performance. I have transmitted many files and checked it for corruption, but
> all files are ok. But this drivers is the slowest of all drivers, with and
> without iommu.

How are you testing and defining "slower than"? It is inefficient wrt CPU usage when receiving - one extra memory copy - but it should not be an issue when filling the pipe. You may check the settings (ethtool -k eth0) and enable sg as well as tx checksumming (ethtool -K eth0 tx on sg on) to lower the CPU load in the Tx path.

(It does not explain what is going on between the device and the IOMMU, though.)

--
Ueimor

Perhaps a different latency of the packets changes the behaviour between the Samba server and the client. I was copying a big file to the server and watching the throughput; it's not a reliable test. With the original driver I get throughputs of up to 70MiB/s, which partly drop to ~40MiB/s. With the last driver the speed varied a lot, between 25 and 60MiB/s. My server is a small, energy-efficient one, not the best for testing. Is there a test tool with which I can test different tx speeds? Just transmit lots of data to another computer, which just answers ok. Then I can make some tests with all three modules again, hopefully with meaningful results.

I have made a test with netcat at a constant transmit speed of 112MiB/s and there is no problem. The problem only happens with the cifs and cachefs modules. I have no problems while mounting the Samba share with the gvfs module.

I have made some more tests. The tx speed of the r8169 device was at the maximum with netcat. I opened a second netcat rx stream, limited by the sender with pv. I increased the speed from 100K, 200K, 400K, 1M to 2M. After 19.4MB at a stream speed of 2M, the IOMMU error occurs.

I have tried the other direction: receiving the stream at full speed and transmitting the stream at different speeds.

tx: for i in 128K 256K 512K 1024K 1536K 2048K 4096K; do dd if=/dev/zero bs=$i count=30 2>/dev/null | pv -L $i | nc host 7777; done

The IOMMU error sometimes occurs at the lowest speed, even after 70KiB of data transfer. It always occurs at a speed of 256K. I have tested it four times.

This is my last post for now. I have tested the other driver from comment #5 and there is no difference from the module delivered with the 3.3 kernel with this test method. With the driver from comment #5 plus the patch from comment #6, the bug occurs at higher tx speeds of 512 or 1024KiB/s. The rx speed is always 112MiB/s in all three cases.

Sometimes the transmission just stops and I have to CTRL+C it and start it again. This always happens at slow speeds of 4KiB/s, 8KiB/s and 16KiB/s. I can make some other tests.

Created attachment 72802 [details]
Tx descriptor unmap barrier
Thomas, can you apply this patch on top of the previous one,
enable the iommu and see if it makes a difference, at least
for the iommu message ?
Thanks.
--
Ueimor
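(Similarly illustrative: on the completion side, the read that shows the device has returned a descriptor must not be reordered past the unmap/free. A minimal sketch, where txd, len and pdev stand in for the driver's actual variables:)

    /* Illustrative sketch: do not unmap a Tx buffer until the DescOwn
     * bit has been seen clear, and keep the status read ordered
     * before touching the buffer.
     */
    u32 status = le32_to_cpu(txd->opts1);
    if (status & DescOwn)
            break;                   /* chip still owns the descriptor */
    rmb();                           /* status read before unmap */
    dma_unmap_single(&pdev->dev, le64_to_cpu(txd->addr), len,
                     DMA_TO_DEVICE);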
With the driver from comment #5 + patch from comment #6 + patch from comment #17, the bug occurs at tx speeds of 256KiB/s, like before.

Perhaps the bug is not in the transmitting part. The new tests show that the bug only occurs while both rx and tx are active, and it needs only a small amount of transmitted data while receiving a lot of data.

r8169 -> tx -> 256KiB/s
      <- rx <- 111MiB/s

If I switch the stream speeds, I need a much higher rx speed to provoke the bug.

r8169 -> tx -> 111MiB/s
      <- rx <- 2MiB/s

I have found the culprit patch. It's commit e542a2269f232d61270ceddd42b73a4348dee2bb:

    r8169: adjust the RxConfig settings.

    Set the init value before reset in probe function. And then just modify
    the relative bits and keep the init settings. For 8110S, 8110SB, and
    8110SC series, the initial value of RxConfig needs to be set after the
    tx/rx is enabled.

I have reverted the patch and tested it with kernel 3.3.0, and there are no IOMMU errors anymore. I append the patch, but it's just the reversed patch and I am not a kernel developer.

Created attachment 72993 [details]
Reversed patch from 3.0.0-rc5
Created attachment 72996 [details]
Poison Rx and Tx ring registers
(In reply to comment #19)
> I have found the culprit patch.
> It's commit e542a2269f232d61270ceddd42b73a4348dee2bb

Thanks. I need to narrow down where / when the wrong DMA happens.

Can you apply #21 on top of an unpatched 3.3.0 and send the new dmesg? If the IOMMU emits a message, please define RX128_INT_EN as 0 and try the same test.

> I have reverted the patch and tested it with kernel 3.3.0 and there are no
> iommu errors anymore. I append the patch, but it's just the reversed patch
> and I am not a kernel developer.

It's very good work.

--
Ueimor

Created attachment 72998 [details]
Dmesg with linux-3.3.0 + patch from comment #21

Created attachment 72999 [details]
Dmesg with linux-3.3.0 + patch from comment #21 + RX128_INT_EN = 0

Created attachment 73000 [details]
Poison Rx and Tx ring registers
Poison high priority Tx ring as well.
The driver does not use it. It could be a good candidate for the unexpected DMA access.
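(The idea behind this kind of poisoning, sketched here with a made-up marker value: point ring address registers the driver never uses at a recognizable bogus DMA address, so any unexpected device access shows up with that address in the AMD-Vi fault log. Illustrative only, not the attached patch:)

    /* Illustrative sketch - 0xdead0000 is a made-up poison marker.
     * The driver never uses the high priority Tx ring, so any DMA
     * from this address would expose the chip touching it.
     */
    RTL_W32(TxHDescStartAddrHigh, 0);
    RTL_W32(TxHDescStartAddrLow, 0xdead0000);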
(In reply to comment #24)
> Created an attachment (id=72999) [details]
> Dmesg with linux-3.3.0 + patch from comment #21 + RX128_INT_EN = 0

[ 306.775761] AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x001d address=0x0000000000003000 flags=0x0050]

Same address. :o(

I have updated the poisoning patch. Can you give it a try and send the new dmesg?

--
Ueimor

Created attachment 73001 [details]
Dmesg with linux-3.3.0 + patch from comment #25

Created attachment 73003 [details]
Dmesg with linux-3.3.0 + patch from comment #25 - second run

Maybe the iommu errors appear twice because I started the test with the sending nc thread first. Here is the dmesg output in the other order, which was used in the tests before.

I am using two nc instances without any limit, one sending and the other receiving. I need both to provoke the error.

(In reply to comment #28)
> Created an attachment (id=73003) [details]
> Dmesg with linux-3.3.0 + patch from comment #25 - second run
>
> Maybe the iommu errors are twice, because I have started the test with the
> sending nc thread first. Here the dmesg output in the other order, which was
> used in the tests before.
>
> I am using two nc instances without any limit. One sending and the other
> receiving. I need both to provoke the error.

Ok, I hope I finally understand your message.

You can send and receive a lot of data: you happily pass init time, so there is not much point in searching for uninitialized things. Forget poisoning, it should not help here.

You can send a lot when you do not receive much. This is TCP, so the packet count is not necessarily that different between the Tx stream and the Rx stream.

You can receive a lot when you do not send much. ACKs fly in the opposite direction this time. I do not expect much difference in the pps for Rx and Tx either.

I am not sure how #15 and #18 should be read: (128k + 256k) / 1500 bytes (MTU) will need four complete Tx descriptor rings and around one Rx descriptor ring. Assuming a concurrent Rx stream, ACKs do not need much data nor FIFO room nor DMA, but their relative weight in pps or Rx descriptor ring entries is high. It could leave the driver with an apparently smaller Rx ring.

Can you define NUM_RX_DESC as 32 (or even 16) and see if the reverted e542a2269f232d61270ceddd42b73a4348dee2bb kernel is still stable? If it does not break, replace

#define RX_FIFO_THRESH (7 << RXCFG_FIFO_SHIFT)

with:

#define RX_FIFO_THRESH (6 << RXCFG_FIFO_SHIFT)

--
Ueimor

I have tested the 3.3.0 module + revert patch from comment #15 + changed #define RX_FIFO_THRESH to (6..) and (5..). I can provoke the error with the following setting only:

#define RX_FIFO_THRESH (5 << RXCFG_FIFO_SHIFT)

I have tested the concurrent rx/tx netcat testcase with UDP packets instead of TCP packets. The IOMMU error did not occur while I was using the original 3.3.0 kernel module.

I have tested the 3.3.0 module + revert patch from comment #15 + changed NUM_RX_DESC to 16 and could not provoke the IOMMU error.

(In reply to comment #30)
> I have tested the 3.3.0 module + revert patch from comment #15 + changed
> #define RX_FIFO_THRESH to (6..) and (5..).

(comment #15 or #5 ? I am lost)

> I can provoke the error with the following setting only.
> #define RX_FIFO_THRESH (5 << RXCFG_FIFO_SHIFT)

I would then expect the plain 3.3.0 driver to work if you add a case for RTL_GIGA_MAC_VER_34 similar to RTL_GIGA_MAC_VER_24 in rtl_init_rxcfg. Can you check it?
May I assume that the error still exhibits the same pattern: AMD-Vi error at 0x...3000, then a netdev watchdog warning a few seconds later?

> I have tested the concurrent rx/tx netcat testcase with udp packets instead
> of tcp packets. The iommu error did not occur, while I was using the original
> 3.3.0 kernel module.

(bingo)

It starts to look like a usual Rx ring exhaustion / chipset FIFO overflow induced bug.

Once you have checked the rtl_init_rxcfg thing above, can you return to a failing kernel, build with CONFIG_FTRACE and CONFIG_DYNAMIC_FTRACE, then enable tracing for rtl_task, rtl_slow_event_work, rtl_reset_work and rtl_phy_work? It should work as:

# insmod /somewhere/r8169.ko
# mount -t debugfs debugfs /sys/kernel/debug
# cd /sys/kernel/debug/tracing
# echo 0 > tracing_on
# echo function > current_tracer
# echo rtl_task > set_ftrace_filter
# echo rtl_slow_event_work >> set_ftrace_filter   (note: ">>" here)
...
# cat enabled_functions   (check)
# echo 1 > tracing_on
# cat trace
<stress r8169>
# echo 0 > tracing_on

(see Documentation/trace/ftrace.txt as well)

--
Ueimor

Created attachment 73108 [details]
Add RTL_GIGA_MAC_VER_34 in rtl_init_rxcfg
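(Sketch of what the suggested change amounts to, reconstructed from the description above rather than quoted from the attachment - RTL_GIGA_MAC_VER_34, the 8168evl, joins the RTL_GIGA_MAC_VER_24 branch of rtl_init_rxcfg so it gets the RX_MULTI_EN setup; check the attachment for the exact patch:)

    static void rtl_init_rxcfg(struct rtl8169_private *tp)
    {
            void __iomem *ioaddr = tp->mmio_addr;

            switch (tp->mac_version) {
            case RTL_GIGA_MAC_VER_01 ... RTL_GIGA_MAC_VER_06:
            case RTL_GIGA_MAC_VER_10 ... RTL_GIGA_MAC_VER_17:
                    RTL_W32(RxConfig, RX_FIFO_THRESH | RX_DMA_BURST);
                    break;
            case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
            case RTL_GIGA_MAC_VER_34:    /* the added case: 8168evl */
                    RTL_W32(RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
                    break;
            default:
                    RTL_W32(RxConfig, RX128_INT_EN | RX_DMA_BURST);
                    break;
            }
    }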
Sorry, I used the revert patch from comment #20 and not #15 in the last tests.

With the kernel from 3.3.0 + the patch from comment #32, I cannot provoke the IOMMU error any more.

For the tracing stuff, I have used the kernel module from comment #5 without any patches. I will attach the dmesg output and the trace log.

Created attachment 73109 [details]
Dmesg output during tracing
Created attachment 73110 [details]
Tracing output
All four functions were enabled.
(In reply to comment #34)
> Sorry I used the revert patch from comment #20 and not #15 in the last tests.
>
> With the kernel from 3.3.0 + patch from comment #32, I cannot provoke the
> iommu error any more.

It was with a small Rx ring (/me wonders how low you could go) and lots of TCP stuff through nc, right?

> For the tracing stuff, I have used the kernel module from comment #5 without
> any patches. I will attach the dmesg output and the trace log.

(The IOMMU error address changed a bit.)

Ok, no slow work like RxOverflow before the watchdog triggers. Let's check there is no unduly ignored event status bit. Please apply the incoming patch on top of #5 and enable the trace stuff. I have added a big netif_tx lock; I hope it won't break the bug.

--
Ueimor

Created attachment 73111 [details]
More debug info on top of #5
I have tested the last patch from #38, but there is still no trace near the IOMMU error. The dmesg and trace.log are attached in the following comments.

> > With the kernel from 3.3.0 + patch from comment #32, I cannot provoke
> > the iommu error any more.
>
> It was with small RX_RING (/me wonders how low you could go) and lots of
> TCP stuff through nc, right?

I have tested it with NUM_RX_DESC 16 and RX_FIFO_THRESH (5 << ..) without an IOMMU error. And it's the patch from #33 and #32.

It's not really much TCP stuff. While transferring 111MiB/s there is only ~400KiB/s of additional traffic in each direction. Perhaps there is a better test with many TCP connections?

Created attachment 73125 [details]
Dmesg output during second tracing / patch #38
Created attachment 73126 [details]
Second Tracing output / patch #38
(In reply to comment #39)
> I have tested the last patch from #38, but there is still no trace near the
> iommu error. The dmesg and trace.log is attached in the following comments.

Actually there is: see the net_ratelimit messages. Please remove the "if (net_ratelimit())" test at line 5435.

[...]
> I have tested it with NUM_RX_DESC 16 and RX_FIFO_THRESH (5 << ..) without
> iommu error. And it's the patch from #33 and #32.
>
> It's not really much tcp stuff. While transferring 111MiB/s there is only
> ~400KiB/s of additional traffic in each direction. Perhaps there is better
> test with many tcp connections?

Sure, I would not be that surprised if it can be broken as well.

--
Ueimor

I have removed the if (net_ratelimit()) clause, but left the following netif_err(tp, drv, tp->dev, ... call. It starts at 1711.686374.

Created attachment 73136 [details]
Dmesg output during third tracing / ratelimit removed
Created attachment 73137 [details]
Third Tracing output / ratelimit removed
There was no IOMMU error during this test.

(In reply to comment #43)
> I have removed the if (net_ratelimit()) cause, but leaved the following
> netif_err(tp, drv, tp->dev,... call.
>
> It starts at 1711.686374.

Can you replace "netif_err(tp, intr, tp->dev, " with a plain "printk(KERN_INFO" ?

Thanks.

--
Ueimor

(In reply to comment #46)
> There was no iommu error during this test.

I have yet to find a reason why the iommu sometimes chokes on the first Rx descriptor entry. From the driver's viewpoint, it does not change from "open()" to "close()".

Btw, can you build your kernel with CONFIG_SLAB instead of CONFIG_SLUB?

Thanks.

--
Ueimor

Created attachment 73144 [details]
Dmesg output during fourth tracing / kprint + SLAB
The next try. Does it help? I cannot see any differences.
Created attachment 73145 [details]
Fourth tracing output / kprint + SLAB
(In reply to comment #49)
> Created an attachment (id=73144) [details]
> Dmesg output during fourth tracing / kprint + SLAB
>
> The next try. Does it help. I cannot see any differences.

Neither do I see any difference.

Can you put a big 'return 1;' statement right at the start of lib/ratelimit.c::___ratelimit and set your CONFIG_LOG_BUF_SHIFT to 21? An 'ethtool -S p13p1' after the bug will be welcome too.

PS: before using the r8169 module, can you rmmod/modprobe it an extra time? It may change the Rx descriptor address the iommu does not like. It can probably be ignored for now.

--
Ueimor

Created attachment 73156 [details]
Dmesg output during tracing / kprint + SLAB - ratelimit
I have even set the loglevel to 7.
Created attachment 73157 [details]
Dmesg output during tracing / kprint + SLAB - ratelimit
ethtool output after iommu error
NIC statistics:
tx_packets: 652164
rx_packets: 693122
tx_errors: 0
rx_errors: 0
rx_missed: 1
align_errors: 214
tx_single_collisions: 0
tx_multi_collisions: 0
unicast: 693028
broadcast: 91
multicast: 3
tx_aborted: 0
tx_underrun: 0
Same issue here. Same motherboard, with the IOMMU enabled. It seems I have the extra annoyance of not being able to get the card to work again after the problem has occurred - at least until reboot.

I reproduce the problem consistently when using a bittorrent client (btlaunchmanycurses, bittornado version). The last occurrence was right after a sudden increase in Rx, though nothing impressive: ~1.1MB/s (limited by ADSL, but eth is in 1000baseT/Full link mode, dixit ethtool). I don't know about the other occurrences I had.

It would make the bug much less annoying if there were a way to recover without rebooting.

Created attachment 73575 [details]
Build r8169 module

> It would make the bug much less annoying if there is a way to recover without
> rebooting.

You can use the driver from comment #5:
https://bugzilla.kernel.org/attachment.cgi?id=72682
and the patch from comment #32/33:
https://bugzilla.kernel.org/attachment.cgi?id=73108

I have attached a quick and dirty script for building the module. It works fine with Fedora.

I exchanged a bit on the ML with Francois Romieu, and forgot to report back. He gave me the patch from comment #32 you linked for testing, and it worked fine on 3.3 and 3.4. I have been running a patched 3.4(.0) since, and could not reproduce the crash so far. I could reproduce it on an unpatched kernel within a few (tens of) seconds of max-speed full-duplex transfer with another Gb card, and neither this test nor daily usage causes any problem on the patched kernel.

A patch referencing this bug report has been merged in Linux v3.5-rc5:

commit eb2dc35d99028b698cdedba4f5522bc43e576bd2
Author: françois romieu <romieu@fr.zoreil.com>
Date:   Wed Jun 20 12:09:18 2012 +0000

    r8169: RxConfig hack for the 8168evl.
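(Gist of the merged change, reconstructed from its summary rather than quoted verbatim - the final version gives the 8168evl its own branch in rtl_init_rxcfg and additionally disables early Rx via a new RxConfig bit; the kernel tree is the authoritative reference:)

    /* Reconstructed sketch of commit eb2dc35d9902.
     * In the RxConfig bit definitions:
     */
    RX_EARLY_OFF = (1 << 11),

    /* ... and in rtl_init_rxcfg(): */
    case RTL_GIGA_MAC_VER_34:
            RTL_W32(RxConfig,
                    RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST | RX_EARLY_OFF);
            break;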