Kernel Bug Tracker – Bug 42899
r8169 accesses invalid addresses on high throughput >50MiB/s
Last modified: 2012-07-29 17:41:25 UTC
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 41
Region 0: I/O ports at 9e00 [size=256]
Region 2: Memory at fd7ff000 (64-bit, prefetchable) [size=4K]
Region 4: Memory at fd7f8000 (64-bit, prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: r8169
[ 394.401670] AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x001d address=0x00000000009cd000 flags=0x0050]
[ 432.784034] ------------[ cut here ]------------
[ 432.784051] WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0xf3/0x18b()
[ 432.784058] Hardware name: GA-990FXA-D3
[ 432.784064] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 432.784070] Modules linked in: des_generic ecb md4 hmac nls_utf8 cifs fscache r8169 vmnet(O) vsock(O) vmci(O) vmmon(O) ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables parport_pc ppdev lp parport bridge stp rfcomm bnep bluetooth rfkill powernow_k8 mperf cpufreq_powersave cpufreq_stats cpufreq_userspace cpufreq_conservative binfmt_misc uinput ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse loop firewire_sbp2 firewire_core crc_itu_t xts gf128mul snd_hda_codec_hdmi snd_hda_codec_realtek mxm_wmi hid_logitech_dj evdev joydev snd_hda_intel snd_hda_codec radeon snd_hwdep snd_pcm snd_seq tpm_tis snd_timer snd_seq_device edac_core tpm sp5100_tco edac_mce_amd tpm_bios k10temp ttm serio_raw snd pcspkr i2c_piix4 drm_kms_helper drm i2c_algo_bit i2c_core wmi processor soundcore snd_page_alloc thermal_sys button ext4 mbcache jbd2 crc16 sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt sg sr_mod cdrom dm_mod hid_cherry usbhid hid power_supply sd_mod crc_t10dif usb_storage uas ohci_hcd ahci libahci mii libata scsi_mod ehci_hcd xhci_hcd usbcore usb_common [last unloaded: r8169]
[ 432.784270] Pid: 0, comm: swapper/4 Tainted: G O 3.3.0-rc6 #2
[ 432.784277] Call Trace:
[ 432.784282] <IRQ> [<ffffffff8103b94a>] ? warn_slowpath_common+0x78/0x8c
[ 432.784301] [<ffffffff8103b9fc>] ? warn_slowpath_fmt+0x45/0x4a
[ 432.784311] [<ffffffff812bdd34>] ? netif_tx_lock+0x43/0x78
[ 432.784322] [<ffffffff812bde5c>] ? dev_watchdog+0xf3/0x18b
[ 432.784332] [<ffffffff81048105>] ? run_timer_softirq+0x1c4/0x292
[ 432.784341] [<ffffffff812bdd69>] ? netif_tx_lock+0x78/0x78
[ 432.784351] [<ffffffff810412f9>] ? __do_softirq+0xc4/0x1a0
[ 432.784361] [<ffffffff8109a19d>] ? handle_irq_event_percpu+0x166/0x184
[ 432.784370] [<ffffffff81013c8e>] ? read_tsc+0x5/0x16
[ 432.784380] [<ffffffff8136c79c>] ? call_softirq+0x1c/0x30
[ 432.784389] [<ffffffff8100fba3>] ? do_softirq+0x3f/0x79
[ 432.784398] [<ffffffff810410ce>] ? irq_exit+0x44/0xb1
[ 432.784406] [<ffffffff8100f480>] ? do_IRQ+0x94/0xaa
[ 432.784415] [<ffffffff81365e6e>] ? common_interrupt+0x6e/0x6e
[ 432.784420] <EOI> [<ffffffff8102d65c>] ? native_safe_halt+0x2/0x3
[ 432.784436] [<ffffffff810153ad>] ? default_idle+0x4b/0x84
[ 432.784445] [<ffffffff810154bd>] ? amd_e400_idle+0xd7/0xfe
[ 432.784453] [<ffffffff8100de94>] ? cpu_idle+0xb9/0xf7
[ 432.784462] [<ffffffff81365b39>] ? _raw_spin_unlock_irqrestore+0xc/0x12
[ 432.784471] [<ffffffff8135d630>] ? start_secondary+0x1fc/0x202
[ 432.784479] ---[ end trace df11ab4c28e1a01c ]---
[ 432.800911] r8169 0000:06:00.0: eth0: link up
I guess it syill happens even when VMware stuff has not been loaded after boot, right ?
Please send a 'cat /sys/bus/pci/devices/0000:06:00.0/resource' and a complete dmesg.
Created attachment 72684 [details]
The dmegs output from boot to the error
The IOMMU is enabled in the bios and is working on this machine.
Created attachment 72685 [details]
The bug appears only while sending a data. I was not able to force it while receiving. I have tested the release version 3.3 without any proprietary drivers.
I was unable to force the bug in debian backport kernel 3.2, but throughput was much lower.
This is similar to https://bugzilla.kernel.org/show_bug.cgi?id=42921
- same motherboard
- same Realtek chipset (8168evl)
- same IOMMU error
Actually the resource file is not needed as the error originates from the
PCIe device. If I read things correctly, the page was present and the
network device did not have read permission on it.
Afaiks it can only happen if the device has read a Tx descriptor ring entry
where the memory buffer address was not correctly set. Be it a race or a
wrong barrier, it would be consistent with the "high Tx traffic" condition.
Bug #42921 appears as a regression since 3.0.x from a user's viewpoint.
Explicit 8168evl support was missing in 3.0.x : it handles the 8168evl as
a 8168e. Greping for the XID line in a 3.0.x dmesg should confirm it.
The explicit 8168evl support code could be wrong but I'll rather assume
for now that it only changes timings. Same narrative for the backported
debian kernel 3.2 (it works but the throughput is lower -> different timing).
Thomas, can you try https://bugzilla.kernel.org/attachment.cgi?id=72682
with a 3.3 kernel ? It is mostly the post 3.3 r8169 driver. Its threading is
easier to understand - read "maintainer did not trust his years-old-modified-
by-many code" - and it should nail a pair of races. If it still does not work,
please apply the following patch on top of it (some writes may be seen
with an inconsistent ordering device-wise, I am not sure).
Created attachment 72689 [details]
Add memory / device barrier
Created attachment 72690 [details]
The dmegs output from boot to the error with the new driver
Created attachment 72691 [details]
The dmegs output from boot to the error with the new driver + barrier patch
(In reply to comment #8)
> The dmegs output from boot to the error with the new driver + barrier patch
Can you temporarily disable the IOMMU in the bios and test if the same driver
performs differently wrt Tx failure ?
I have tested it without IOMMU and cannot see any differences in the performance. I have transmitted many files and checked it for corruption, but all files are ok. But this drivers is the slowest of all drivers, with and without iommu.
(In reply to comment #10)
> I have tested it without IOMMU and cannot see any differences in the
> performance. I have transmitted many files and checked it for corruption, but
> all files are ok. But this drivers is the slowest of all drivers, with and
> without iommu.
How are you testing and defining "slower than" ?
It is inefficient wrt CPU usage when receiving - one extra memory copy - but
it should not be an issue when filling the pipe. You may check the setttings
(ethtool -k eth0) and set sg as well tx checksumming (ethtool -K eth0 tx on sg on)
to lower the CPU load in the Tx path.
(it does not explain what is going on between the device and the iommu though)
Perhaps a different latency of the packages changes the behaviour between the samba server and the client. I was copying a big file to the server and watching the throughput. It's not a reliable test.
With the original driver I get throughputs of up to 70MiB/s, which drops partly to ~40MiB/s. With the last driver I got very variability at the speeds, but it was between 25 and 60MiB/s. My server is a small energy efficient one, not the best for testing.
Is there a test tool, with which I can test different tx speeds? Just transmit lots of data to another computer, who just answer ok. Then I can make some tests with all three modules again with hopefully meaningful results.
I have made a test with netcat at a constant transmit speed of 112MiB/s and there is no problem. The problem only happens with the cifs and cachefs module. I have no problems while mounting the samba share with gvfs module.
I have made some more tests. The tx speed of the r8169 device was at the maximum with netcat. I opened a second netcat rx stream, limited by the sender with pv. I have increased the speed from 100K, 200K, 400K, 1M to 2M. After 19,4MB at a speed of 2M stream, the IOMMU error occurs.
I have tried the other direction. Receiving the stream at full speed and transmitting the stream at different speeds.
for i in 128K 256K 512K 1024K 1536K 2048K 4096K; do dd if=/dev/zero bs=$i count=30 2>/dev/null | pv -L $i | nc host 7777; done
The IOMMU error occurs sometimes at the lowest speed, even after 70KiB of data transfer. It occurs always at at a speed of 256K. I have tested it four times.
This is my last post.
I have tested the other driver from the comment #5 and there is no difference to the module delivered with the 3.3 kernel with this test method.
With the driver from comment #5 and + patch from comment #6 the bug occurs at higher tx speeds of 512 or 1024KiB/s.
The rx speed is always 112MiB/s in all three cases.
Sometimes the transmission just stops and I have CTRL+C it and start it again. It happens always at slow speeds of 4KiB/s, 8KiB/s and 16KiB/s.
I can make some other tests.
Created attachment 72802 [details]
Tx descriptor unmap barrier
Thomas, can you apply this patch on top of the previous one,
enable the iommu and see if it makes a difference, at least
for the iommu message ?
With the driver from comment #5 and + patch from comment #6 + patch from comment #17 the bug occurs at tx speeds of 256KiB/s, like before
Perhaps the bug exists not in the transmitting part. The new tests show, that the bug only occurs, while rx and tx is active. And and need only a small amount of transmitting data, while receiving a lot of data.
r8169 -> tx -> 256KiB/s
<- rx -< 111Mib/s
If switch the stream speeds, I need a much higher rx speed to provoke the bug.
r8169 -> tx -> 111MiB/s
<- rx -< 2Mib/s
I have found the culpit patch.
It's commit e542a2269f232d61270ceddd42b73a4348dee2bb
r8169: adjust the RxConfig settings.
Set the init value before reset in probe function. And then just
modify the relative bits and keep the init settings.
For 8110S, 8110SB, and 8110SC series, the initial value of RxConfig
needs to be set after the tx/rx is enabled.
I have reverted the patch and tested it with kernel 3.3.0 and there are no iommu errors anymore. I append the patch, but it's just the reversed patch and I am not a kernel developer.
Created attachment 72993 [details]
Reversed patch from 3.0.0-rc5
Created attachment 72996 [details]
Poison Rx and Tx ring registers
(In reply to comment #19)
> I have found the culpit patch.
> It's commit e542a2269f232d61270ceddd42b73a4348dee2bb
I need to narrow where / when the wrong DMA happens.
Can you apply #21 on top of unpatched 3.3.0 and send the new dmesg ?
If the IOMMU emits a message, please define RX128_INT_EN as 0 and try
the same test.
> I have reverted the patch and tested it with kernel 3.3.0 and there are no
> iommu errors anymore. I append the patch, but it's just the reversed patch and
> I am not a kernel developer.
It's a very good work.
Created attachment 72998 [details]
Dmesg with linux-3.3.0 + patch from comment #21
Created attachment 72999 [details]
Dmesg with linux-3.3.0 + patch from comment #21 + RX128_INT_EN = 0
Created attachment 73000 [details]
Poison Rx and Tx ring registers
Poison high priority Tx ring as well.
The driver does not use it. It could be a good candidate for the unexpected DMA access.
(In reply to comment #24)
> Created an attachment (id=72999) [details]
> Dmesg with linux-3.3.0 + patch from comment #21 + RX128_INT_EN = 0
[ 306.775761] AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x001d address=0x0000000000003000 flags=0x0050]
Same address. :o(
I have updated the poisoning patch. Can you give it a try and send the new dmesg ?
Created attachment 73001 [details]
Dmesg with linux-3.3.0 + patch from comment #25
Created attachment 73003 [details]
Dmesg with linux-3.3.0 + patch from comment #25 - second run
Maybe the iommu errors are twice, because I have started the test with the sending nc thread first. Here the dmesg output in the other order, which was used in the tests before.
I am using two nc instances without any limit. One sending and the other receiving. I need both to provoke the error.
(In reply to comment #28)
> Created an attachment (id=73003) [details]
> Dmesg with linux-3.3.0 + patch from comment #25 - second run
> Maybe the iommu errors are twice, because I have started the test with the
> sending nc thread first. Here the dmesg output in the other order, which was
> used in the tests before.
> I am using two nc instances without any limit. One sending and the other
> receiving. I need both to provoke the error.
Ok, I hope I finally understand your message.
You can send and receive a lot of data : you happily pass init time so there is not much point
in searching uninitialized things. Forget poisoning, it should not help here.
You can send a lot when you do not receive much. This is TCP so the packet count is not necessarily
that different between the Tx stream and the Rx stream. You can receive a lot when you do not send
much. ACKs fly in the opposite direction this time. I do not expect much difference in the pps for
Rx and Tx either.
I am not sure how #15 and #18 should be read: (128k + 256k)/ 1500 bytes (MTU) will need four
complete Tx descriptor rings and around one Rx descriptor ring. Assuming a concurrent Rx stream,
ACKs do not need much data nor FIFO room nor DMA but their relative pps or Rx descriptor ring
entries weight is high. It could leave the driver with an apparently smaller Rx ring. Can you define
NUM_RX_DESC as 32 (or even 16) and see if the reverted e542a2269f232d61270ceddd42b73a4348dee2bb
kernel is still stable ?
If it does not break, replace
#define RX_FIFO_THRESH (7 << RXCFG_FIFO_SHIFT)
#define RX_FIFO_THRESH (6 << RXCFG_FIFO_SHIFT)
I have tested the 3.3.0 module + revert patch from comment #15 + changed #define RX_FIFO_THRESH to (6..) and (5..).
I can provoke the error with the following setting only.
#define RX_FIFO_THRESH (5 << RXCFG_FIFO_SHIFT)
I have tested the concurrent rx/tx netcat testcase with udp packets instead of tcp packets. The iommu error did not occur, while I was using the original 3.3.0 kernel module.
I have tested the 3.3.0 module + revert patch from comment #15 + changed
NUM_RX_DESC to 16 and could not provoke the iommu error.
(In reply to comment #30)
> I have tested the 3.3.0 module + revert patch from comment #15 + changed
> #define RX_FIFO_THRESH to (6..) and (5..).
(comment #15 or #5 ? I am lost )
> I can provoke the error with the following setting only.
> #define RX_FIFO_THRESH (5 << RXCFG_FIFO_SHIFT)
I would then expect plain 3.3.0 driver to work if you add a case for
similar to RTL_GIGA_MAC_VER_24 in rtl_init_rxcfg. Can you check it ?
May I assume that the error still exhibits the same pattern : AMD-Vi error at
0x...3000 then a netdev watchdog warning a few seconds later ?
> I have tested the concurrent rx/tx netcat testcase with udp packets instead of
> tcp packets. The iommu error did not occur, while I was using the original
> 3.3.0 kernel module.
It starts to look like an usual Rx ring exhaustion / chipset FIFO overflow
induced bug. Once you have checked the rtl_init_rxcfg thing above, can you return to
a failing kernel, build with CONFIG_FTRACE and CONFIG_DYNAMIC_FTRACE, then
enable trace for rtl_task, rtl_slow_event_work, rtl_reset_work and rtl_phy_work ?
It should work as :
# insmod /somewhere/r8169.ko
# mount -t debugfs debugfs /sys/kernel/debug
# cd /sys/kernel/debug/tracing
# echo 0 > tracing_on
# echo function > current_tracer
# echo rtl_task > set_ftrace_filter
# echo rtl_slow_event_work >> set_ftrace_filter (note: ">>" here)
# cat enabled_functions (check)
# echo 1 > tracing_on
# cat trace
# echo 0 > tracing_on
(see Documentation/trace/ftrace.txt as well)
Created attachment 73108 [details]
Add RTL_GIGA_MAC_VER_34 in rtl_init_rxcfg
Sorry I used the revert patch from comment #20 and not #15 in the last tests.
With the kernel from 3.3.0 + patch from comment #32, I cannot provoke the iommu error any more.
For the tracing stuff, I have used the kernel module from comment #5 without any patches. I will attach the dmesg output and the trace log.
Created attachment 73109 [details]
Dmesg output during tracing
Created attachment 73110 [details]
All four function were enabled.
(In reply to comment #34)
> Sorry I used the revert patch from comment #20 and not #15 in the last tests.
> With the kernel from 3.3.0 + patch from comment #32, I cannot provoke the iommu
> error any more.
It was with small RX_RING (/me wonders how low you could go) and lots of TCP stuff
through nc, right ?
> For the tracing stuff, I have used the kernel module from comment #5 without
> any patches. I will attach the dmesg output and the trace log.
(IOMMU error address changed a bit)
Ok, no slow work like RxOverflow before the watchdog triggers. Let's check there is
no unduly ignored event status bit. Please apply incoming patch on top of #5 and
enable the trace stuff. I have added a big netif_tx lock. I hope it won't break the bug.
Created attachment 73111 [details]
More debug info on top of #5
I have tested the last patch from #38, but there is still no trace near the iommu error. The dmesg and trace.log is attached in the following comments.
> > With the kernel from 3.3.0 + patch from comment #32, I cannot provoke
> > the iommu error any more.
> It was with small RX_RING (/me wonders how low you could go) and lots of
> TCP stuff through nc, right ?
I have tested it with NUM_RX_DESC 16 and RX_FIFO_THRESH (5 << ..) without iommu error. And it's the patch from #33 and #32.
It's not really much tcp stuff. While transferring 111MiB/s there is only ~400KiB/s of additional traffic in each direction. Perhaps there is better test with many tcp connections?
Created attachment 73125 [details]
Dmesg output during second tracing tracing / patch #38
Created attachment 73126 [details]
Second Tracing output / patch #38
(In reply to comment #39)
> I have tested the last patch from #38, but there is still no trace near the
> iommu error. The dmesg and trace.log is attached in the following comments.
Actually there is : see the net_ratelimit messages. Please remove the
"if (net_ratelimit())" test at line 5435.
> I have tested it with NUM_RX_DESC 16 and RX_FIFO_THRESH (5 << ..) without
> iommu error. And it's the patch from #33 and #32.
> It's not really much tcp stuff. While transferring 111MiB/s there is only
> ~400KiB/s of additional traffic in each direction. Perhaps there is better test
> with many tcp connections?
Sure, I would not be that surprized if it can be broken as well.
I have removed the if (net_ratelimit()) cause, but leaved the following netif_err(tp, drv, tp->dev,... call.
It starts at 1711.686374.
Created attachment 73136 [details]
Dmesg output during third tracing / ratelimit removed
Created attachment 73137 [details]
Third Tracing output / ratelimit removed
There was no iommu error during this test.
(In reply to comment #43)
> I have removed the if (net_ratelimit()) cause, but leaved the following
> netif_err(tp, drv, tp->dev,... call.
> It starts at 1711.686374.
Can you replace "netif_err(tp, intr, tp->dev, " with plain "printk(KERN_INFO"
(In reply to comment #46)
> There was no iommu error during this test.
I have yet to find a reason why the iommu sometimes chokes on the first Rx descriptor entry.
From the driver's viewpoint, it does not change from "open()" to "close()".
Btw can you build your kernel with CONFIG_SLAB instead of CONFIG_SLUB ?
Created attachment 73144 [details]
Dmesg output during fourth tracing / kprint + SLAB
The next try. Does it help. I cannot see any differences.
Created attachment 73145 [details]
Fourth tracing output / kprint + SLAB
(In reply to comment #49)
> Created an attachment (id=73144) [details]
> Dmesg output during fourth tracing / kprint + SLAB
> The next try. Does it help. I cannot see any differences.
Neither do I see any difference. Can you put a big 'return 1;' statement
right at the start of lib/ratelimit.c::___ratelimit and set your
CONFIG_LOG_BUF_SHIFT at 21 ?
An 'ethtool -S p13p1' after bug will be welcome too.
PS: before using the r8169 module, can you rmmod/modprobe it an
extra time ? It may change the Rx descriptor address the iommu does
not like. It can probably be ignored for now.
Created attachment 73156 [details]
Dmesg output during tracing / kprint + SLAB - ratelimit
I have even set the loglevel to 7.
Created attachment 73157 [details]
Dmesg output during tracing / kprint + SLAB - ratelimit
ethtool output after iommu error
Same issue here. Same motherboard, with IOMMU enabled.
It seems I have the extra annoyance of not being able to get the card to work again after the problem occurred - at least until reboot.
I reproduce the problem consistently when using a bittorrent client (btlaunchmanycurses, bittornado version). Last occurrence was right after a sudden increase in Rx, though nothing impressive: ~1.1MB/s (limited by ADSL, but eth is in 1000baseT/Full link mode, dixit ethtool). I don't know about other occurrences I had.
It would make the bug much less annoying if there is a way to recover without rebooting.
Created attachment 73575 [details]
Build r8169 module
> It would make the bug much less annoying if there is a way to recover without
You can this driver from comment #5
And the patch from comment #32/33
I have attached a quick and dirty script for building the module. It works fine with fedora.
I exchanged a bit on the ML with Francois Romieu, and forgot to report back.
He gave me the patch from comment #32 you linked for testing, and it worked fine on 3.3 and 3.4. I'm running patched 3.4(.0) since, and could not reproduce the crash so far.
I could reproduce it on unpatched kernel within a few (tens of) second of max-speed full duplex transfer with another Gb card, and neither this test nor daily usage cause any problem on patched kernel.
A patch referencing this bug report has been merged in Linux v3.5-rc5:
Author: franÃ§ois romieu <firstname.lastname@example.org>
Date: Wed Jun 20 12:09:18 2012 +0000
r8169: RxConfig hack for the 8168evl.