Subject : 2.6.28: BUG in r8169
Submitter : "Andrey Vul" <andrey.vul@gmail.com>
Date : 2008-12-31 18:37
References : http://marc.info/?l=linux-kernel&m=123074869611409&w=4
Notify-Also : Francois Romieu <romieu@fr.zoreil.com>

This entry is being used for tracking a regression from 2.6.27. Please don't close it until the problem is fixed in the mainline.
probably dupe of #10109
I just had this pop up again today. Running 2.6.28.1 on Centos 5.2 x86_64. If IRQBALANCE is enabled, it happens immediately. Uptime is 17 days on brand new hardware. The box is under heavy network load.

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x206/0x220()
NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Modules linked in: dm_mirror dm_region_hash dm_log dm_multipath dm_mod serio_raw pcspkr
Pid: 0, comm: swapper Not tainted 2.6.28.1 #6
Call Trace:
<IRQ> [<ffffffff8023633c>] warn_slowpath+0x10c/0x150
[<ffffffff8024ab39>] autoremove_wake_function+0x9/0x30
[<ffffffff8022c65a>] __wake_up_common+0x5a/0x90
[<ffffffff8022c367>] source_load+0x37/0x70
[<ffffffff803cfb5a>] __next_cpu+0x1a/0x30
[<ffffffff8022daec>] find_busiest_group+0x18c/0x820
[<ffffffff803d52fe>] strlcpy+0x4e/0x80
[<ffffffff804b78b6>] dev_watchdog+0x206/0x220
[<ffffffff802122d9>] read_tsc+0x9/0x20
[<ffffffff80250368>] getnstimeofday+0x48/0xe0
[<ffffffff8024d5e8>] run_hrtimer_pending+0x18/0x120
[<ffffffff804b76b0>] dev_watchdog+0x0/0x220
[<ffffffff8023fbff>] run_timer_softirq+0x15f/0x1c0
[<ffffffff8023b52c>] __do_softirq+0x9c/0x170
[<ffffffff8020c87c>] call_softirq+0x1c/0x30
[<ffffffff8020e135>] do_softirq+0x35/0x70
[<ffffffff8021c205>] smp_apic_timer_interrupt+0x85/0xd0
[<ffffffff8020c2cb>] apic_timer_interrupt+0x6b/0x70
<EOI> [<ffffffff802130c1>] mwait_idle+0x41/0x50
[<ffffffff8020a2da>] cpu_idle+0x3a/0x70
---[ end trace 33d76deea67d0fe1 ]---
r8169: eth0: link up
r8169: eth0: link up
Please try something later than 2.6.28.1, preferably 2.6.29-rc5 or the latest 2.6.28.y .
I can confirm a similar bug on 2.6.28.7 and 2.6.28.8

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x247/0x260()
NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Modules linked in: nfsd exportfs r8169 mii thermal processor fuse
Pid: 0, comm: swapper Not tainted 2.6.28.8-090316-2248 #1
Call Trace:
<IRQ> [<ffffffff8023e657>] warn_slowpath+0xb7/0xf0
[<ffffffff8050153b>] ? sock_def_readable+0x3b/0x70
[<ffffffff80502349>] ? sock_queue_rcv_skb+0xe9/0x130
[<ffffffff803cd2a9>] ? __next_cpu+0x19/0x30
[<ffffffff8023589c>] ? find_busiest_group+0x1dc/0x990
[<ffffffff80213319>] ? read_tsc+0x9/0x20
[<ffffffff8025b489>] ? getnstimeofday+0x59/0xe0
[<ffffffff80258389>] ? ktime_get_ts+0x59/0x60
[<ffffffff803d3209>] ? strlcpy+0x49/0x60
[<ffffffff8021e895>] ? lapic_next_event+0x15/0x20
[<ffffffff8051e627>] dev_watchdog+0x247/0x260
[<ffffffff8025a313>] ? sched_clock_cpu+0x143/0x190
[<ffffffff8051e3e0>] ? dev_watchdog+0x0/0x260
[<ffffffff80248d4f>] run_timer_softirq+0x13f/0x210
[<ffffffff80258389>] ? ktime_get_ts+0x59/0x60
[<ffffffff8025e20f>] ? clockevents_program_event+0x4f/0x90
[<ffffffff80243e14>] __do_softirq+0x94/0x160
[<ffffffff8020cddc>] call_softirq+0x1c/0x30
[<ffffffff8020e815>] do_softirq+0x45/0x80
[<ffffffff80243b9d>] irq_exit+0x8d/0xa0
[<ffffffff8021f138>] smp_apic_timer_interrupt+0x88/0xc0
[<ffffffff8020c82b>] apic_timer_interrupt+0x6b/0x70
<EOI> [<ffffffff802141aa>] ? mwait_idle+0x4a/0x50
[<ffffffff8020a502>] ? enter_idle+0x22/0x30
[<ffffffff8020a5ae>] ? cpu_idle+0x5e/0xb0
[<ffffffff805a9faf>] ? start_secondary+0x152/0x1a3
---[ end trace 7b3bb601968af4c5 ]---
r8169: eth0: link up
I'm having the same issue with ubuntu 9.04 alpha

[ 3538.000050] WARNING: at /build/buildd/linux-2.6.28/net/sched/sch_generic.c:226 dev_watchdog+0x219/0x230()
[ 3538.000056] NETDEV WATCHDOG: eth2 (r8169): transmit timed out
[ 3538.000060] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat binfmt_misc i915 drm bridge stp bnep video output input_polldev smsc47m1 smsc47m192 hwmon_vid i2c_i801 sbp2 ieee1394 lp snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy ppdev snd_seq_oss psmouse snd_seq_midi snd_rawmidi serio_raw iTCO_wdt iTCO_vendor_support snd_seq_midi_event snd_seq pcspkr snd_timer snd_seq_device snd soundcore intel_agp snd_page_alloc parport_pc parport agpgart usbhid usb_storage r8169 mii fbcon tileblit font bitblit softcursor
[ 3538.000146] Pid: 0, comm: swapper Not tainted 2.6.28-11-generic #36-Ubuntu
[ 3538.000151] Call Trace:
[ 3538.000162] [<c0139a60>] warn_slowpath+0x60/0x80
[ 3538.000171] [<c02c0030>] ? as_deactivate_request+0x30/0x40
[ 3538.000178] [<c013128d>] ? find_busiest_group+0x15d/0x7f0
[ 3538.000185] [<c012c6dc>] ? enqueue_entity+0x13c/0x360
[ 3538.000193] [<c0156873>] ? getnstimeofday+0x53/0x110
[ 3538.000201] [<c0119a00>] ? lapic_get_maxlvt+0x0/0x30
[ 3538.000208] [<c047ced7>] ? icmp_send+0x167/0x560
[ 3538.000215] [<c0156873>] ? getnstimeofday+0x53/0x110
[ 3538.000221] [<c02cadbd>] ? strlcpy+0x1d/0x60
[ 3538.000229] [<c0431382>] ? netdev_drivername+0x32/0x40
[ 3538.000235] [<c0445ee9>] dev_watchdog+0x219/0x230
[ 3538.000243] [<c043b7ab>] ? neigh_table_init_no_netlink+0x14b/0x1d0
[ 3538.000249] [<c043af80>] ? neigh_periodic_timer+0x0/0x190
[ 3538.000257] [<c01444d7>] ? mod_timer+0x37/0x80
[ 3538.000263] [<c043b0a6>] ? neigh_periodic_timer+0x126/0x190
[ 3538.000269] [<c0143aa0>] run_timer_softirq+0x130/0x200
[ 3538.000275] [<c0445cd0>] ? dev_watchdog+0x0/0x230
[ 3538.000281] [<c0445cd0>] ? dev_watchdog+0x0/0x230
[ 3538.000289] [<c013f147>] __do_softirq+0x97/0x170
[ 3538.000296] [<c0152c36>] ? hrtimer_interrupt+0x186/0x1b0
[ 3538.000302] [<c0152a89>] ? ktime_get+0x19/0x40
[ 3538.000308] [<c013f27d>] do_softirq+0x5d/0x60
[ 3538.000314] [<c013f3f5>] irq_exit+0x55/0x90
[ 3538.000321] [<c011a0ab>] smp_apic_timer_interrupt+0x5b/0x90
[ 3538.000328] [<c0105318>] apic_timer_interrupt+0x28/0x30
[ 3538.000335] [<c010b002>] ? mwait_idle+0x42/0x50
[ 3538.000340] [<c010285d>] cpu_idle+0x6d/0xd0
[ 3538.000348] [<c04f11ee>] rest_init+0x4e/0x60
[ 3538.000353] ---[ end trace 05614c40c8f508dd ]-
Confirmed again, this time on Ubuntu 8.10 Intrepid Ibex, which is odd since it used to work fine there. Maybe I need to update my system.
dmesg output from Ubuntu Intrepid Ibex 8.10 2.6.27-11-generic

[ 2521.964027] ------------[ cut here ]------------
[ 2521.964044] WARNING: at /build/buildd/linux-2.6.27/net/sched/sch_generic.c:219 dev_watchdog+0x21a/0x230()
[ 2521.964051] NETDEV WATCHDOG: eth2 (r8169): transmit timed out
[ 2521.964056] Modules linked in: nls_cp437 cifs i915 drm binfmt_misc af_packet bridge stp bnep sco rfcomm l2cap bluetooth ppdev cp$
[ 2521.964227] Pid: 0, comm: swapper Not tainted 2.6.27-11-generic #1
[ 2521.964235] [<c0131e15>] warn_slowpath+0x65/0x90
[ 2521.964246] [<c0240030>] ? get_request+0xc0/0x360
[ 2521.964256] [<c012990d>] ? find_busiest_group+0x15d/0x7c0
[ 2521.964267] [<c037edde>] ? account_scheduler_latency+0xe/0x220
[ 2521.964277] [<c037edde>] ? account_scheduler_latency+0xe/0x220
[ 2521.964286] [<c0118e38>] ? read_hpet+0x8/0x20
[ 2521.964297] [<c014e6eb>] ? getnstimeofday+0x4b/0x100
[ 2521.964307] [<c0136a26>] ? set_normalized_timespec+0x16/0x90
[ 2521.964316] [<c0154437>] ? timer_stats_update_stats+0x17/0x250
[ 2521.964325] [<c0254a19>] ? strlen+0x9/0x20
[ 2521.964333] [<c0252a9d>] ? strlcpy+0x1d/0x60
[ 2521.964341] [<c02f16a7>] ? netdev_drivername+0x37/0x40
[ 2521.964350] [<c03068aa>] dev_watchdog+0x21a/0x230
[ 2521.964358] [<c01136c0>] ? lapic_next_event+0x20/0x30
[ 2521.964368] [<c0151dbf>] ? clockevents_program_event+0x9f/0x150
[ 2521.964377] [<c013c038>] run_timer_softirq+0x138/0x210
[ 2521.964386] [<c0306690>] ? dev_watchdog+0x0/0x230
[ 2521.964394] [<c0306690>] ? dev_watchdog+0x0/0x230
[ 2521.964429] [<c0137732>] __do_softirq+0x92/0x120
[ 2521.964436] [<c013781d>] do_softirq+0x5d/0x60
[ 2521.964443] [<c0137995>] irq_exit+0x55/0x90
[ 2521.964450] [<c0113f8d>] smp_apic_timer_interrupt+0x5d/0x90
[ 2521.964459] [<c01050f8>] apic_timer_interrupt+0x28/0x30
[ 2521.964468] [<c010acca>] ? mwait_idle+0x4a/0x50
[ 2521.964476] [<c010288d>] cpu_idle+0x7d/0x140
[ 2521.964483] [<c037b471>] start_secondary+0x9d/0xcc
[ 2521.964492] =======================
[ 2521.964497] ---[ end trace 436b5311b7770f56 ]---
[ 2521.983387] r8169: eth2: link up
Same bug on Debian kernels 2.6.29-4, 2.6.30-8 and 2.6.32-2. See:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=528362
http://www.kerneloops.org/submitresult.php?number=1044137

Probably a dup of #12500
http://bugzilla.kernel.org/show_bug.cgi?id=12500
and #14709 (even with a non-gigabit ethernet card)
http://bugzilla.kernel.org/show_bug.cgi?id=14709
and #10134 (because sometimes the system hangs and it is not possible to get any log)
http://bugzilla.kernel.org/show_bug.cgi?id=10134
The problem usually goes away for a while with the boot parameters noacpi pci=nomsi, or even after reloading the mii and r8169 modules. But sooner or later it happens again.
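When testing this workaround it is easy to reboot without the parameters actually being applied. The following is a minimal Python sketch, not part of the original report, that checks a kernel command line for the two tokens named above. The function names are hypothetical, the parameter spellings are copied verbatim from the comment rather than verified against the kernel documentation, and the naive whitespace split assumes no quoted values containing spaces.

```python
def boot_params(cmdline):
    """Split a kernel command line into a set of whitespace-separated tokens."""
    return set(cmdline.split())


def missing_params(cmdline, wanted=("noacpi", "pci=nomsi")):
    """Return the wanted tokens that do not appear on the command line.

    The default tokens are the spellings used in this thread (an assumption,
    not checked against kernel-parameters documentation).
    """
    present = boot_params(cmdline)
    return [param for param in wanted if param not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Hypothetical command line used purely for illustration.
example = "ro quiet pci=nomsi"
print(missing_params(example))
```

On a live system, `missing_params(read_cmdline())` would report which of the two tokens the running kernel was booted without.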
I had this problem with 2.6.33 from FC13. It is partially fixed since 2.6.34-rc4: no kernel error appears, but the interface freezes for 10-20 seconds because an RX overflow occurs. Using the "noacpi, pci=nomsi" boot parameters helps. Tested with 8169B rev2, mac_version 0x14.
The problem seems to be fixed in 2.6.34-rc7. Backporting the module changes to the Fedora 13 kernel does not help: the kernel error disappears, but something strange in the MSI support causes RX overflows to occur too quickly, so the network works for 5 seconds and then freezes for 10 seconds. There are no problems with 2.6.34-rc7, and possibly with earlier release candidates too, but the module fixes came in with 2.6.34-rc4.
Thank you for following up on this! Could you please test the latest 2.6.27.y and 2.6.32.y stable kernels, to see if that issue is also fixed there? Regards, Flo
Alright, if it is not fixed in 2.6.32.y and 2.6.27.y please reopen this bugreport. There are 5 commits from 2.6.34 regarding r8169 in 2.6.32.y. Regards, Flo
Actually this may not be fixed... :/ I picked up a netgear GA311 today and it doesn't work properly. It is just spewing management frames out onto the network uncontrollably. (perhaps the hardware I picked up is broken). Anyway here is some kernel output.

[ 693.610782] r8169 0000:06:00.0: eth1: link down
[ 697.165046] r8169 0000:06:00.0: eth1: link up
[ 841.824026] ------------[ cut here ]------------
[ 841.824038] WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0x151/0x1ff()
[ 841.824041] Hardware name:
[ 841.824044] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
[ 841.824048] Modules linked in: ipv6 joydev hid_logitech ff_memless usbhid hid loop snd_hda_codec_realtek tpm_tis serio_raw tpm tpm_bios psmouse evdev i2c_i801 i2c_core pcspkr snd_hda_intel parport_pc parport processor rng_core snd_hda_codec button snd_pcm iTCO_wdt snd_timer shpchp intel_agp pci_hotplug intel_gtt snd soundcore snd_page_alloc ext3 jbd mbcache sha256_generic aes_x86_64 aes_generic cbc dm_crypt dm_mirror dm_region_hash dm_log dm_snapshot dm_mod raid1 md_mod sg sr_mod sd_mod cdrom crc_t10dif ide_pci_generic piix ide_core r8169 ata_piix ata_generic e100 mii libata ehci_hcd scsi_mod uhci_hcd thermal fan thermal_sys
[ 841.824127] Pid: 0, comm: swapper Tainted: G M 2.6.37 #1
[ 841.824130] Call Trace:
[ 841.824134] <IRQ> [<ffffffff810441eb>] warn_slowpath_common+0x80/0x98
[ 841.824147] [<ffffffff81044297>] warn_slowpath_fmt+0x41/0x43
[ 841.824152] [<ffffffff8127dec4>] dev_watchdog+0x151/0x1ff
[ 841.824158] [<ffffffff8105aeca>] ? __queue_work+0x24a/0x259
[ 841.824164] [<ffffffff81050a86>] run_timer_softirq+0x210/0x2de
[ 841.824169] [<ffffffff8127dd73>] ? dev_watchdog+0x0/0x1ff
[ 841.824176] [<ffffffff81066cf0>] ? ktime_get+0x60/0xb9
[ 841.824181] [<ffffffff8104a0c8>] __do_softirq+0xd3/0x19b
[ 841.824186] [<ffffffff8106ad3d>] ? tick_program_event+0x21/0x23
[ 841.824192] [<ffffffff8100395c>] call_softirq+0x1c/0x28
[ 841.824196] [<ffffffff810055ad>] do_softirq+0x41/0x7e
[ 841.824200] [<ffffffff81049f53>] irq_exit+0x36/0x78
[ 841.824207] [<ffffffff8101b745>] smp_apic_timer_interrupt+0x88/0x96
[ 841.824213] [<ffffffff81003413>] apic_timer_interrupt+0x13/0x20
[ 841.824216] <EOI> [<ffffffff8100a460>] ? mwait_idle+0xbc/0xca
[ 841.824226] [<ffffffff81062f20>] ? atomic_notifier_call_chain+0x13/0x15
[ 841.824231] [<ffffffff81001e46>] cpu_idle+0xb4/0x125
[ 841.824239] [<ffffffff812dc679>] rest_init+0x6d/0x6f
[ 841.824245] [<ffffffff816c1d60>] start_kernel+0x3c8/0x3d3
[ 841.824250] [<ffffffff816c12b1>] x86_64_start_reservations+0xb8/0xbc
[ 841.824255] [<ffffffff816c13bb>] x86_64_start_kernel+0x106/0x115
[ 841.824259] ---[ end trace 370ec34b2c707a8e ]---
[ 841.840080] r8169 0000:06:00.0: eth1: link up

lspci
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
Can you bisect it? As far as I understood this bug got introduced between 2.6.27 and 2.6.28?
I would welcome several things :
- above all the XID info and ethtool -d output. lspci is not specific enough. You can find the XID line in dmesg. Attaching a complete dmesg is fine.
- a status report with a current kernel if the bug does not take ages to happen
- Doug's problem may be a bit different. I do not have the resources to go out and figure the different sauces in each vendor's kernel.

"current kernel" means either Linus's or David Miller's net-next git branches. For instance see: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6.git

David B.: is it an identified regression for you or did you only experience NETDEV WATCHDOG messages _and_ loss of connectivity without any previously known working kernel version ?

Last time I tried, my GA311 sent mac control frames too but it did not crash under pktgen (less than a week ago, current davem-next).

--
Ueimor
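For anyone gathering the requested information from a saved dmesg, here is a minimal Python sketch, offered only as an illustration. It pulls out the watchdog timeouts in both message variants quoted in this thread, plus any line containing "XID". The function names are hypothetical, and because no XID line is quoted in this thread, the XID filter is a plain substring match rather than a parse of a known format.

```python
import re

# Matches both message variants quoted in this thread:
#   "NETDEV WATCHDOG: eth0 (r8169): transmit timed out"
#   "NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit (?:queue (?P<queue>\d+) )?timed out"
)


def find_watchdog_timeouts(dmesg_text):
    """Return (iface, driver, queue) per timeout; queue is None when not reported."""
    hits = []
    for match in WATCHDOG_RE.finditer(dmesg_text):
        queue = match.group("queue")
        hits.append((
            match.group("iface"),
            match.group("driver"),
            int(queue) if queue is not None else None,
        ))
    return hits


def find_xid_lines(dmesg_text):
    """Return every line containing 'XID' (substring match; line format is assumed unknown)."""
    return [line for line in dmesg_text.splitlines() if "XID" in line]


# Sample lines copied from the reports in this thread.
SAMPLE = """\
[ 3538.000056] NETDEV WATCHDOG: eth2 (r8169): transmit timed out
[ 841.824044] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
[ 841.840080] r8169 0000:06:00.0: eth1: link up
"""

print(find_watchdog_timeouts(SAMPLE))
print(find_xid_lines(SAMPLE))
```

An empty result from `find_xid_lines` on a real log most likely means the log was truncated before the driver's probe messages, in which case attaching the complete dmesg remains the safer option.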