Bug 205073
| Summary: | igb driver hang | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | sander44 (ionut_n2001) |
| Component: | Network | Assignee: | Stephen Hemminger (stephen) |
| Status: | NEW --- | | |
| Severity: | high | CC: | acelan, Alexandra.Kossovsky, carsten_hammer, jeffrey.t.kirsher, perlover, vladi |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 5.4.0-rc1 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | journalctl file | | |
Description
sander44
2019-10-02 17:05:11 UTC
Looks like I have a similar problem: doing a btrfs send/receive triggers this in my dmesg:

[14021.085239] ------------[ cut here ]------------
[14021.085256] NETDEV WATCHDOG: van0 (igb): transmit queue 0 timed out
[14021.085278] WARNING: CPU: 4 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x1fb/0x200
[14021.085281] Modules linked in: wireguard(O) ip6_udp_tunnel udp_tunnel tcp_diag udp_diag raw_diag inet_diag netlink_diag nfnetlink_queue xt_nat macvlan veth ip6table_filter ip6_tables nf_conntrack_netlink nfnetlink bridge stp llc 8021q iptable_raw nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_NFQUEUE xt_mark ipv6 xt_conntrack iptable_filter xt_MASQUERADE xt_addrtype iptable_nat nf_nat iptable_mangle ip_tables x_tables sch_fq_codel nf_conntrack_sip nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 amdgpu kvm_amd tun ccp snd_hda_codec_hdmi mfd_core gpu_sched kvm ttm snd_hda_intel efi_pstore snd_intel_dspcfg irqbypass drm_kms_helper sha1_generic snd_hda_codec tcp_cubic syscopyarea snd_hda_core sysfillrect sysimgblt efivars fb_sys_fops k10temp snd_pcm snd_timer drm snd backlight soundcore cp210x evdev usbserial bfq acpi_cpufreq button algif_rng algif_aead algif_hash algif_skcipher af_alg overlay virtiofs fuse ext4 mbcache jbd2 dm_thin_pool dm_persistent_data
[14021.085336] dm_snapshot dm_bufio dm_bio_prison usb_storage sr_mod cdrom virtio_crypto crypto_engine virtio_mmio virtio_pci virtio_input virtio_balloon virtio_rng virtio_console virtio_blk virtio_ring virtio
[14021.085350] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G O 5.5.7 #65
[14021.085355] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 5220 09/12/2019
[14021.085358] RIP: 0010:dev_watchdog+0x1fb/0x200
[14021.085362] Code: 49 63 56 e0 eb 93 4c 89 ef c6 05 1f f6 7a 00 01 e8 8a cc fc ff 44 89 e1 48 89 c2 4c 89 ee 48 c7 c7 40 6b 18 8c e8 15 41 9f ff <0f> 0b eb bf 90 48 83 ec 40 48 89 5c 24 10 48 89 6c 24 18 48 89 cb
[14021.085363] RSP: 0018:ffff9f5700204e98 EFLAGS: 00010282
[14021.085365] RAX: 0000000000000000 RBX: ffff9bcdca4c88c0 RCX: 0000000000000933
[14021.085366] RDX: 0000000000000001 RSI: 0000000000000092 RDI: ffffffff8c5c456c
[14021.085367] RBP: ffff9bcdcb66041c R08: 0000000000000933 R09: 0000000000000001
[14021.085368] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[14021.085369] R13: ffff9bcdcb660000 R14: ffff9bcdcb660440 R15: 0000000000000008
[14021.085370] FS: 0000000000000000(0000) GS:ffff9bcdd0f00000(0000) knlGS:0000000000000000
[14021.085372] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14021.085373] CR2: 00007f68c8000010 CR3: 00000003bf346000 CR4: 00000000003406e0
[14021.085374] Call Trace:
[14021.085376] <IRQ>
[14021.085379] ? pfifo_fast_enqueue+0x180/0x180
[14021.085383] call_timer_fn.isra.0+0x11/0x70
[14021.085386] run_timer_softirq+0x342/0x390
[14021.085389] ? tick_sched_timer+0x40/0x90
[14021.085391] ? tick_sched_do_timer+0x70/0x70
[14021.085394] ? sched_clock+0x5/0x10
[14021.085397] __do_softirq+0xd7/0x22c
[14021.085401] irq_exit+0xe5/0xf0
[14021.085403] smp_apic_timer_interrupt+0x66/0xa0
[14021.085406] apic_timer_interrupt+0xf/0x20
[14021.085407] </IRQ>
[14021.085410] RIP: 0010:cpuidle_enter_state+0x138/0x260
[14021.085412] Code: e8 ed 0f aa ff 31 ff 48 89 c5 e8 73 28 aa ff 45 84 f6 74 12 9c 58 f6 c4 02 0f 85 0b 01 00 00 31 ff e8 3c 9a ae ff fb 45 85 ed <0f> 88 cf 00 00 00 49 63 d5 bf 68 00 00 00 48 89 e8 48 89 d1 48 2b
[14021.085413] RSP: 0018:ffff9f5700137e78 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[14021.085415] RAX: ffff9bcdd0f217c0 RBX: ffffffff8c241da0 RCX: 000000000000001f
[14021.085416] RDX: 0000000000000000 RSI: 0000000023975c86 RDI: 0000000000000000
[14021.085417] RBP: 00000cc089cde165 R08: 00000cc089cde165 R09: 0000000000000068
[14021.085418] R10: ffff9bcdd0f20984 R11: ffff9bcdd0f20964 R12: ffff9bcdca4a7c00
[14021.085419] R13: 0000000000000002 R14: 0000000000000000 R15: ffffffff8c241e88
[14021.085422] ? cpuidle_enter_state+0x11d/0x260
[14021.085424] cpuidle_enter+0x32/0x50
[14021.085426] do_idle+0x1c3/0x230
[14021.085429] cpu_startup_entry+0x14/0x20
[14021.085432] start_secondary+0x143/0x170
[14021.085435] secondary_startup_64+0xa4/0xb0
[14021.085437] ---[ end trace 1d715de8f034f47c ]---
[14021.085677] igb 0000:06:00.0 van0: Reset adapter
[14023.003520] igb 0000:06:00.0 van0: igb: van0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[14028.203200] igb 0000:06:00.0 van0: igb: van0 NIC Link is Down
[14030.881395] igb 0000:06:00.0 van0: igb: van0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[14240.881370] igb 0000:06:00.0 van0: Reset adapter
[14242.799760] igb 0000:06:00.0 van0: igb: van0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[14269.537953] e1000e 0000:04:00.1 wan0: Reset adapter unexpectedly

http://dpaste.com/3S2D5V9 is my emerge --info

Forgot to mention that this happens on kernel 5.5.x, but everything works fine in 5.4.x.

Hi Kernel Team,
I added the parameter pcie_port_pm=off to the kernel command line, and the problem no longer seems to appear. Maybe it's a power problem, or an old BIOS version.
Thanks for the support.

Could you check whether this patch works?
https://patchwork.kernel.org/project/netdevbpf/patch/20210420075406.64105-1-acelan.kao@canonical.com/

Hi, I've had the same issue and can confirm that your patch works. Is there any chance of the patch landing? I'd really like to go back to the vanilla distro kernel. Thanks.

Well, it appears that things have gotten worse with kernel version 5.15. On 5.14 I had about a 50/50 chance of igb not hanging during boot; now it hangs on every boot without exception. The patch above no longer works around the issue for me either. I currently have to resort to blacklisting igb and loading it after boot, which doesn't seem to trigger the hang.

Hello! Same problem here.
Ubuntu 20.04 LTS
Kernel: 5.4.0-135-generic
Stack trace:

Dec 12 18:58:05 AH15 kernel: [4699248.577166] igb 0000:01:00.1 internal: PCIe link lost
Dec 12 18:58:05 AH15 kernel: [4699248.577230] igb 0000:01:00.1 internal: malformed Tx packet detected and dropped, LVMMC:0xffffffff
Dec 12 18:58:05 AH15 kernel: [4699249.473382] igb 0000:01:00.0 external: PCIe link lost
Dec 12 18:58:05 AH15 kernel: [4699249.473446] ------------[ cut here ]------------
Dec 12 18:58:05 AH15 kernel: [4699249.473448] igb: Failed to read reg 0xc030!
Dec 12 18:58:05 AH15 kernel: [4699249.473538] WARNING: CPU: 21 PID: 501169 at drivers/net/ethernet/intel/igb/igb_main.c:756 igb_rd32.cold+0x3a/0x46 [igb]
Dec 12 18:58:05 AH15 kernel: [4699249.473540] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs cpuid binfmt_misc nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt nf_log_ipv4 nf_log_common ipt_REJECT nf_reject_ipv4 xt_LOG xt_limit xt_addrtype xt_tcpudp ip6table_filter ip6_tables xt_recent xt_connt>
Dec 12 18:58:05 AH15 kernel: [4699249.473596] crypto_simd usbhid cryptd glue_helper igb ahci libsas drm hid i2c_i801 libahci megaraid_sas lpc_ich scsi_transport_sas dca i2c_algo_bit
Dec 12 18:58:05 AH15 kernel: [4699249.473613] CPU: 21 PID: 501169 Comm: kworker/21:1 Not tainted 5.4.0-128-generic #144-Ubuntu
Dec 12 18:58:05 AH15 kernel: [4699249.473615] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 1.0c 06/29/2012
Dec 12 18:58:05 AH15 kernel: [4699249.473626] Workqueue: events igb_watchdog_task [igb]
Dec 12 18:58:05 AH15 kernel: [4699249.473638] RIP: 0010:igb_rd32.cold+0x3a/0x46 [igb]
Dec 12 18:58:05 AH15 kernel: [4699249.473643] Code: c7 c6 c2 01 4b c0 e8 da da 23 e2 48 8b bb 30 ff ff ff e8 94 cf cc e1 84 c0 74 16 44 89 ee 48 c7 c7 d0 0e 4b c0 e8 a2 de 1e e2 <0f> 0b e9 12 3c fe ff e9 29 3c fe ff 8b b3 14 18 00 00 49 8d bc 24
Dec 12 18:58:05 AH15 kernel: [4699249.473645] RSP: 0018:ffffb1c9cf1abdb0 EFLAGS: 00010282
Dec 12 18:58:05 AH15 kernel: [4699249.473648] RAX: 0000000000000000 RBX: ffffa05c6ced0e08 RCX: 0000000000000006
Dec 12 18:58:05 AH15 kernel: [4699249.473650] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffffa05c7fa5c8c0
Dec 12 18:58:05 AH15 kernel: [4699249.473652] RBP: ffffb1c9cf1abdc8 R08: 0000000000039e75 R09: 0000000000000004
Dec 12 18:58:05 AH15 kernel: [4699249.473654] R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffff
Dec 12 18:58:05 AH15 kernel: [4699249.473655] R13: 000000000000c030 R14: 0000000000000000 R15: ffffa05c78909f00
Dec 12 18:58:05 AH15 kernel: [4699249.473659] FS: 0000000000000000(0000) GS:ffffa05c7fa40000(0000) knlGS:0000000000000000
Dec 12 18:58:05 AH15 kernel: [4699249.473661] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 12 18:58:05 AH15 kernel: [4699249.473662] CR2: 00007f7d7617f000 CR3: 0000000e61c0a006 CR4: 00000000000606e0
Dec 12 18:58:05 AH15 kernel: [4699249.473665] Call Trace:
Dec 12 18:58:05 AH15 kernel: [4699249.473680] igb_update_stats+0x78/0x820 [igb]
Dec 12 18:58:05 AH15 kernel: [4699249.473689] igb_watchdog_task+0xa8/0x410 [igb]
Dec 12 18:58:05 AH15 kernel: [4699249.473697] ? __schedule+0x2eb/0x740
Dec 12 18:58:05 AH15 kernel: [4699249.473705] process_one_work+0x1eb/0x3b0
Dec 12 18:58:05 AH15 kernel: [4699249.473710] worker_thread+0x4d/0x400
Dec 12 18:58:05 AH15 kernel: [4699249.473715] kthread+0x104/0x140
Dec 12 18:58:05 AH15 kernel: [4699249.473719] ? process_one_work+0x3b0/0x3b0
Dec 12 18:58:05 AH15 kernel: [4699249.473722] ? kthread_park+0x90/0x90
Dec 12 18:58:05 AH15 kernel: [4699249.473726] ret_from_fork+0x35/0x40
Dec 12 18:58:05 AH15 kernel: [4699249.473729] ---[ end trace f6253424685efc67 ]---
Dec 12 18:58:05 AH15 kernel: [4699249.473740] igb 0000:01:00.0 external: malformed Tx packet detected and dropped, LVMMC:0xffffffff
Dec 12 18:58:07 AH15 kernel: [4699250.561446] igb 0000:01:00.1 internal: malformed Tx packet detected and dropped, LVMMC:0xffffffff
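
For anyone comparing kernels or workarounds from this thread, it may help to count how often each failure signature shows up in a captured kernel log. The following is only a rough Python sketch, not part of any patch discussed here: the function name `summarize` and the event labels (`tx_timeout`, `reset`, `link_lost`) are made up for illustration, and the regular expressions are modelled solely on the message shapes quoted in the comments above, so they may miss variants printed by other kernel versions. It assumes one log message per line.

```python
import re
from collections import Counter

# Message shapes copied from the log excerpts quoted in this report.
TX_TIMEOUT_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \(\w+\): transmit queue \d+ timed out"
)
# "<driver> <PCI address> <iface>: Reset adapter" or "...: PCIe link lost"
DEVICE_EVENT_RE = re.compile(
    r"\w+ [0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7] "
    r"(?P<iface>[^\s:]+): (?P<msg>Reset adapter|PCIe link lost)"
)
# Hypothetical short labels for the two device-level messages.
EVENT_NAMES = {"Reset adapter": "reset", "PCIe link lost": "link_lost"}


def summarize(log_text):
    """Count failure events per (interface, event label) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TX_TIMEOUT_RE.search(line)
        if match:
            counts[(match["iface"], "tx_timeout")] += 1
            continue
        match = DEVICE_EVENT_RE.search(line)
        if match:
            counts[(match["iface"], EVENT_NAMES[match["msg"]])] += 1
    return counts
```

Feeding it the text of a saved dmesg or journal capture returns a `Counter`, so runs with and without a workaround such as `pcie_port_pm=off` can be compared by looking at the per-interface counts.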