Bug 205073 - igb driver hang
Summary: igb driver hang
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: x86-64 Linux
Importance: P1 high
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-10-02 17:05 UTC by sander44
Modified: 2022-12-12 18:58 UTC
CC List: 6 users

See Also:
Kernel Version: 5.4.0-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
journalctl file (285.95 KB, text/plain)
2019-10-02 17:05 UTC, sander44

Description sander44 2019-10-02 17:05:11 UTC
Created attachment 285307 [details]
journalctl file

Hi,
I installed Ubuntu 18.04.3 and, with a newly compiled kernel (5.4.0-rc1), I have a network problem.

I ran the lspci command on the machine and the network card crashed.

In dmesg I see many warnings and error messages:

ian 28 18:00:09 machine-test1 kernel: igb 0000:02:00.0 enp2s0: PCIe link lost
ian 28 18:00:09 machine-test1 kernel: ------------[ cut here ]------------
ian 28 18:00:09 machine-test1 kernel: igb: Failed to read reg 0xc030!
ian 28 18:00:09 machine-test1 kernel: WARNING: CPU: 0 PID: 12 at drivers/net/ethernet/intel/igb/igb_main.c:756 igb_rd32+0x5b/0x70 [igb]
ian 28 18:00:09 machine-test1 kernel: Modules linked in: nls_iso8859_1 snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common intel_telemetry_debugfs intel_telemetry_pltdrv intel_punit_ipc intel_telemetry_core btrfs spi_pxa2xx_platform 8250_dw dw_dmac x86_pkg_temp_thermal xor dw_dmac_core zstd_compress raid6_pq libcrc32c coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper intel_cstate intel_rapl_perf snd_soc_skl snd_soc_sst_ipc pwm_lpss_pci pwm_lpss snd_soc_sst_dsp snd_hda_ext_core snd_soc_rt298 snd_soc_acpi_intel_match lpc_ich snd_soc_acpi snd_soc_rt286 input_leds snd_hda_intel snd_soc_rl6347a snd_intel_nhlt snd_soc_core i915 snd_hda_codec snd_compress snd_hda_core ac97_bus snd_hwdep snd_pcm_dmaengine snd_pcm intel_lpss_pci intel_lpss drm_kms_helper snd_seq_midi idma64 snd_seq_midi_event virt_dma drm fb_sys_fops hid_sensor_accel_3d syscopyarea snd_rawmidi hid_sensor_gyro_3d hid_sensor_als hid_sensor_magn_3d hid_sensor_incl_3d
ian 28 18:00:09 machine-test1 kernel:  hid_sensor_rotation hid_sensor_press sysfillrect hid_sensor_trigger snd_seq sysimgblt industrialio_triggered_buffer kfifo_buf snd_seq_device hid_sensor_iio_common industrialio mei_me snd_timer snd mei soundcore soc_button_array intel_hid intel_vbtn sparse_keymap mac_hid intel_pmc_ipc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_sensor_custom hid_sensor_hub intel_ishtp_hid mmc_block sdhci_pci cqhci sdhci igb i2c_algo_bit dca ptp pps_core intel_ish_ipc intel_ishtp ahci libahci i2c_hid video pinctrl_broxton hid_generic usbhid hid
ian 28 18:00:09 machine-test1 kernel: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.4.0-rc1-vanilla #1
ian 28 18:00:09 machine-test1 kernel: Hardware name: UEFI BIOS
ian 28 18:00:09 machine-test1 kernel: Workqueue: events igb_watchdog_task [igb]
ian 28 18:00:09 machine-test1 kernel: RIP: 0010:igb_rd32+0x5b/0x70 [igb]
ian 28 18:00:09 machine-test1 kernel: Code: 8b bf b8 fc ff ff 89 f3 48 c7 40 08 00 00 00 00 48 c7 c6 49 f1 50 c0 e8 91 5c 00 c7 89 de 48 c7 c7 08 fe 50 c0 e8 a5 4b 7a c6 <0f> 0b b8 ff ff ff ff 5b 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f
ian 28 18:00:09 machine-test1 kernel: RSP: 0018:ffffc003400b3db0 EFLAGS: 00010286
ian 28 18:00:09 machine-test1 kernel: RAX: 0000000000000000 RBX: 000000000000c030 RCX: 0000000000000006
ian 28 18:00:09 machine-test1 kernel: RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9989b9a17380
ian 28 18:00:09 machine-test1 kernel: RBP: ffffc003400b3db8 R08: 0000000000000001 R09: 00000000000004f3
ian 28 18:00:09 machine-test1 kernel: R10: ffffffff88203d20 R11: 00000000000004f3 R12: ffff9989aec18e08
ian 28 18:00:09 machine-test1 kernel: R13: ffff9989afcf2300 R14: 0000000000000000 R15: 000000000000c030
ian 28 18:00:09 machine-test1 kernel: FS:  0000000000000000(0000) GS:ffff9989b9a00000(0000) knlGS:0000000000000000
ian 28 18:00:09 machine-test1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
ian 28 18:00:09 machine-test1 kernel: CR2: 00007ffc41007d00 CR3: 0000000030a0a000 CR4: 00000000003406f0
ian 28 18:00:09 machine-test1 kernel: Call Trace:
ian 28 18:00:09 machine-test1 kernel:  igb_update_stats+0x77/0x810 [igb]
ian 28 18:00:09 machine-test1 kernel:  igb_watchdog_task+0x10b/0x7b0 [igb]
ian 28 18:00:09 machine-test1 kernel:  ? __schedule+0x2f7/0x6f0
ian 28 18:00:09 machine-test1 kernel:  process_one_work+0x167/0x400
ian 28 18:00:09 machine-test1 kernel:  worker_thread+0x4d/0x460
ian 28 18:00:09 machine-test1 kernel:  kthread+0x105/0x140
ian 28 18:00:09 machine-test1 kernel:  ? rescuer_thread+0x360/0x360
ian 28 18:00:09 machine-test1 kernel:  ? kthread_destroy_worker+0x50/0x50
ian 28 18:00:09 machine-test1 kernel:  ret_from_fork+0x35/0x40
ian 28 18:00:09 machine-test1 kernel: ---[ end trace d365fd22826faf19 ]---
ian 28 18:00:13 machine-test1 systemd-timesyncd[709]: Timed out waiting for reply from 91.189.89.199:123 (ntp.ubuntu.com).
ian 28 18:00:23 machine-test1 systemd-timesyncd[709]: Timed out waiting for reply from 91.189.91.157:123 (ntp.ubuntu.com).
ian 28 18:00:26 machine-test1 gnome-software[2199]: failed to call gs_plugin_refresh on fwupd: [*/cabinet/*/source/fwupd/*] failed to download https://cdn.fwupd.org/downloads/firmware.xml.gz.asc: Connection terminated unexpectedly
ian 28 18:00:29 machine-test1 PackageKit[1351]: get-updates transaction /109_bacbaaec from uid 1000 finished with success after 3194ms
ian 28 18:00:30 machine-test1 kernel: ------------[ cut here ]------------
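
For anyone triaging a long journal, a minimal Python sketch like the one below can pull out the two igb signatures quoted above ("PCIe link lost" and "Failed to read reg"). The function name `scan_igb_events` and the regular expressions are assumptions made for illustration, derived only from the log lines in this report; they are not part of the igb driver or of any existing tool.

```python
import re

# Patterns derived from the log lines quoted in this report (illustrative only).
LINK_LOST = re.compile(
    r"igb (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<iface>\S+): PCIe link lost"
)
READ_FAIL = re.compile(r"igb: Failed to read reg (?P<reg>0x[0-9a-f]+)!")


def scan_igb_events(lines):
    """Return (kind, detail) tuples for igb failure signatures found in `lines`."""
    events = []
    for line in lines:
        match = LINK_LOST.search(line)
        if match:
            # Record the PCI address and interface name that lost the link.
            events.append(("link_lost", f"{match['bdf']} {match['iface']}"))
            continue
        match = READ_FAIL.search(line)
        if match:
            # Record the register offset the driver could not read.
            events.append(("read_fail", match["reg"]))
    return events


if __name__ == "__main__":
    sample = [
        "ian 28 18:00:09 machine-test1 kernel: igb 0000:02:00.0 enp2s0: PCIe link lost",
        "ian 28 18:00:09 machine-test1 kernel: igb: Failed to read reg 0xc030!",
    ]
    for kind, detail in scan_igb_events(sample):
        print(kind, detail)
```

The sample lines are copied from the excerpt above; in practice the input would be the lines of a saved journalctl or dmesg file.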
Comment 1 vladi 2020-03-01 01:43:34 UTC
Looks like I have a similar problem.

Doing a btrfs send/receive triggers this in my dmesg:

[14021.085239] ------------[ cut here ]------------
[14021.085256] NETDEV WATCHDOG: van0 (igb): transmit queue 0 timed out
[14021.085278] WARNING: CPU: 4 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x1fb/0x200
[14021.085281] Modules linked in: wireguard(O) ip6_udp_tunnel udp_tunnel tcp_diag udp_diag raw_diag inet_diag netlink_diag nfnetlink_queue xt_nat macvlan veth ip6table_filter ip6_tables nf_conntrack_netlink nfnetlink bridge stp llc 8021q iptable_raw nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_NFQUEUE xt_mark ipv6 xt_conntrack iptable_filter xt_MASQUERADE xt_addrtype iptable_nat nf_nat iptable_mangle ip_tables x_tables sch_fq_codel nf_conntrack_sip nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 amdgpu kvm_amd tun ccp snd_hda_codec_hdmi mfd_core gpu_sched kvm ttm snd_hda_intel efi_pstore snd_intel_dspcfg irqbypass drm_kms_helper sha1_generic snd_hda_codec tcp_cubic syscopyarea snd_hda_core sysfillrect sysimgblt efivars fb_sys_fops k10temp snd_pcm snd_timer drm snd backlight soundcore cp210x evdev usbserial bfq acpi_cpufreq button algif_rng algif_aead algif_hash algif_skcipher af_alg overlay virtiofs fuse ext4 mbcache jbd2 dm_thin_pool dm_persistent_data
[14021.085336]  dm_snapshot dm_bufio dm_bio_prison usb_storage sr_mod cdrom virtio_crypto crypto_engine virtio_mmio virtio_pci virtio_input virtio_balloon virtio_rng virtio_console virtio_blk virtio_ring virtio          
[14021.085350] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G           O      5.5.7 #65                                                                                                                                                            
[14021.085355] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 5220 09/12/2019                                                                                                      
[14021.085358] RIP: 0010:dev_watchdog+0x1fb/0x200                                                                                                                                                                                             
[14021.085362] Code: 49 63 56 e0 eb 93 4c 89 ef c6 05 1f f6 7a 00 01 e8 8a cc fc ff 44 89 e1 48 89 c2 4c 89 ee 48 c7 c7 40 6b 18 8c e8 15 41 9f ff <0f> 0b eb bf 90 48 83 ec 40 48 89 5c 24 10 48 89 6c 24 18 48 89 cb
[14021.085363] RSP: 0018:ffff9f5700204e98 EFLAGS: 00010282                                                                                                                                                                                    
[14021.085365] RAX: 0000000000000000 RBX: ffff9bcdca4c88c0 RCX: 0000000000000933                                                                                                                                
[14021.085366] RDX: 0000000000000001 RSI: 0000000000000092 RDI: ffffffff8c5c456c                                                                                                                                                              
[14021.085367] RBP: ffff9bcdcb66041c R08: 0000000000000933 R09: 0000000000000001                                                                                                                                
[14021.085368] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000                                                                                                                                                              
[14021.085369] R13: ffff9bcdcb660000 R14: ffff9bcdcb660440 R15: 0000000000000008                                                                                                                                
[14021.085370] FS:  0000000000000000(0000) GS:ffff9bcdd0f00000(0000) knlGS:0000000000000000                                                                                                                                                   
[14021.085372] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                                                                                                
[14021.085373] CR2: 00007f68c8000010 CR3: 00000003bf346000 CR4: 00000000003406e0                                                                                                                                                              
[14021.085374] Call Trace:                                                                                                                                                                                      
[14021.085376]  <IRQ>                                                                                                                                                                                                                         
[14021.085379]  ? pfifo_fast_enqueue+0x180/0x180                                                                                                                                                                
[14021.085383]  call_timer_fn.isra.0+0x11/0x70                                                                                                                                                                                                
[14021.085386]  run_timer_softirq+0x342/0x390                                                                                                                                                                   
[14021.085389]  ? tick_sched_timer+0x40/0x90                                                                                                                                                                                                  
[14021.085391]  ? tick_sched_do_timer+0x70/0x70                                                                                                                                                                                               
[14021.085394]  ? sched_clock+0x5/0x10                                                                                                                                                                                                        
[14021.085397]  __do_softirq+0xd7/0x22c                                                                                                                                                                         
[14021.085401]  irq_exit+0xe5/0xf0                                                                                                                                                                                                            
[14021.085403]  smp_apic_timer_interrupt+0x66/0xa0                                                                                                                                                                                            
[14021.085406]  apic_timer_interrupt+0xf/0x20                                                                                                                                                                                                 
[14021.085407]  </IRQ>                                                                                                                                                                                                                        
[14021.085410] RIP: 0010:cpuidle_enter_state+0x138/0x260                                                                                                                                                                                      
[14021.085412] Code: e8 ed 0f aa ff 31 ff 48 89 c5 e8 73 28 aa ff 45 84 f6 74 12 9c 58 f6 c4 02 0f 85 0b 01 00 00 31 ff e8 3c 9a ae ff fb 45 85 ed <0f> 88 cf 00 00 00 49 63 d5 bf 68 00 00 00 48 89 e8 48 89 d1 48 2b        
[14021.085413] RSP: 0018:ffff9f5700137e78 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13                                                                                                                                                         
[14021.085415] RAX: ffff9bcdd0f217c0 RBX: ffffffff8c241da0 RCX: 000000000000001f                                                                                                                                                              
[14021.085416] RDX: 0000000000000000 RSI: 0000000023975c86 RDI: 0000000000000000                                                                                                                                                              
[14021.085417] RBP: 00000cc089cde165 R08: 00000cc089cde165 R09: 0000000000000068                                                                                                                                                              
[14021.085418] R10: ffff9bcdd0f20984 R11: ffff9bcdd0f20964 R12: ffff9bcdca4a7c00                                                                                                                                                              
[14021.085419] R13: 0000000000000002 R14: 0000000000000000 R15: ffffffff8c241e88                                                                                                                                                              
[14021.085422]  ? cpuidle_enter_state+0x11d/0x260                                                                                                                                                                                             
[14021.085424]  cpuidle_enter+0x32/0x50                                                                                                                                                                         
[14021.085426]  do_idle+0x1c3/0x230                                                                                                                                                                             
[14021.085429]  cpu_startup_entry+0x14/0x20                                                                            
[14021.085432]  start_secondary+0x143/0x170                                                                                                                                                                                                   
[14021.085435]  secondary_startup_64+0xa4/0xb0                                                                                                                                                                                                
[14021.085437] ---[ end trace 1d715de8f034f47c ]---                                                                                                                                                                                           
[14021.085677] igb 0000:06:00.0 van0: Reset adapter                                                                                                                                                                                           
[14023.003520] igb 0000:06:00.0 van0: igb: van0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX                                                                                                                                     
[14028.203200] igb 0000:06:00.0 van0: igb: van0 NIC Link is Down                                                                                                                                                                              
[14030.881395] igb 0000:06:00.0 van0: igb: van0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[14240.881370] igb 0000:06:00.0 van0: Reset adapter                                                                                                                                                                                           
[14242.799760] igb 0000:06:00.0 van0: igb: van0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX                                                                                                                                                                                                                            
[14269.537953] e1000e 0000:04:00.1 wan0: Reset adapter unexpectedly
Comment 2 vladi 2020-03-01 01:45:07 UTC
My emerge --info is at http://dpaste.com/3S2D5V9.
I forgot to mention that this happens on kernel 5.5.x, but everything works fine on 5.4.x.
Comment 3 sander44 2020-05-31 11:27:40 UTC
Hi Kernel Team,

I added the parameter pcie_port_pm=off to the kernel command line, and the problem no longer seems to appear.
It may be a power management problem or an outdated BIOS.

Thanks for the support.
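
To confirm that the workaround is actually active on a running system, one option is to inspect /proc/cmdline. The Python sketch below is only an illustration: the helper name `kernel_param` is hypothetical, and the parsing is deliberately simple (it splits on whitespace and does not handle quoted values).

```python
from pathlib import Path


def kernel_param(cmdline, name):
    """Return the value of `name` in a kernel command line string.

    Returns None if the parameter is absent and "" if it is present
    without a value. If the parameter is repeated, the last occurrence
    is reported (an arbitrary choice for this sketch).
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        # Prints "off" when the workaround from this comment is in effect.
        print("pcie_port_pm =", kernel_param(path.read_text(), "pcie_port_pm"))
```

If the parameter was added through the bootloader configuration, it only shows up here after a reboot.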
Comment 5 c-h 2021-10-13 17:23:46 UTC
Hi,

I've had the same issue and can confirm that your patch works.

Is there any chance of the patch landing?
I'd really like to go back to the vanilla distro kernel.

Thanks.
Comment 6 c-h 2021-11-22 19:03:20 UTC
Well, it appears that things have gotten worse with kernel version 5.15.
On 5.14 I had about a 50/50 chance of igb not hanging during boot, now it hangs on every boot without exception.

The patch above no longer works around the issue for me either.
I currently have to resort to blacklisting igb and loading it after boot, which doesn't seem to trigger the hang.
Comment 7 Perlover 2022-12-12 18:58:10 UTC
Hello!
Same problem here.
Ubuntu 20.04 LTS
Kernel: 5.4.0-135-generic

Stack trace:

Dec 12 18:58:05 AH15 kernel: [4699248.577166] igb 0000:01:00.1 internal: PCIe link lost
Dec 12 18:58:05 AH15 kernel: [4699248.577230] igb 0000:01:00.1 internal: malformed Tx packet detected and dropped, LVMMC:0xffffffff
Dec 12 18:58:05 AH15 kernel: [4699249.473382] igb 0000:01:00.0 external: PCIe link lost
Dec 12 18:58:05 AH15 kernel: [4699249.473446] ------------[ cut here ]------------
Dec 12 18:58:05 AH15 kernel: [4699249.473448] igb: Failed to read reg 0xc030!
Dec 12 18:58:05 AH15 kernel: [4699249.473538] WARNING: CPU: 21 PID: 501169 at drivers/net/ethernet/intel/igb/igb_main.c:756 igb_rd32.cold+0x3a/0x46 [igb]
Dec 12 18:58:05 AH15 kernel: [4699249.473540] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs cpuid binfmt_misc nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt nf_log_ipv4 nf_log_common ipt_REJECT nf_reject_ipv4 xt_LOG xt_limit xt_addrtype xt_tcpudp ip6table_filter ip6_tables xt_recent xt_connt>
Dec 12 18:58:05 AH15 kernel: [4699249.473596]  crypto_simd usbhid cryptd glue_helper igb ahci libsas drm hid i2c_i801 libahci megaraid_sas lpc_ich scsi_transport_sas dca i2c_algo_bit
Dec 12 18:58:05 AH15 kernel: [4699249.473613] CPU: 21 PID: 501169 Comm: kworker/21:1 Not tainted 5.4.0-128-generic #144-Ubuntu
Dec 12 18:58:05 AH15 kernel: [4699249.473615] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 1.0c 06/29/2012
Dec 12 18:58:05 AH15 kernel: [4699249.473626] Workqueue: events igb_watchdog_task [igb]
Dec 12 18:58:05 AH15 kernel: [4699249.473638] RIP: 0010:igb_rd32.cold+0x3a/0x46 [igb]
Dec 12 18:58:05 AH15 kernel: [4699249.473643] Code: c7 c6 c2 01 4b c0 e8 da da 23 e2 48 8b bb 30 ff ff ff e8 94 cf cc e1 84 c0 74 16 44 89 ee 48 c7 c7 d0 0e 4b c0 e8 a2 de 1e e2 <0f> 0b e9 12 3c fe ff e9 29 3c fe ff 8b b3 14 18 00 00 49 8d bc 24
Dec 12 18:58:05 AH15 kernel: [4699249.473645] RSP: 0018:ffffb1c9cf1abdb0 EFLAGS: 00010282
Dec 12 18:58:05 AH15 kernel: [4699249.473648] RAX: 0000000000000000 RBX: ffffa05c6ced0e08 RCX: 0000000000000006
Dec 12 18:58:05 AH15 kernel: [4699249.473650] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffffa05c7fa5c8c0
Dec 12 18:58:05 AH15 kernel: [4699249.473652] RBP: ffffb1c9cf1abdc8 R08: 0000000000039e75 R09: 0000000000000004
Dec 12 18:58:05 AH15 kernel: [4699249.473654] R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffff
Dec 12 18:58:05 AH15 kernel: [4699249.473655] R13: 000000000000c030 R14: 0000000000000000 R15: ffffa05c78909f00
Dec 12 18:58:05 AH15 kernel: [4699249.473659] FS:  0000000000000000(0000) GS:ffffa05c7fa40000(0000) knlGS:0000000000000000
Dec 12 18:58:05 AH15 kernel: [4699249.473661] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 12 18:58:05 AH15 kernel: [4699249.473662] CR2: 00007f7d7617f000 CR3: 0000000e61c0a006 CR4: 00000000000606e0
Dec 12 18:58:05 AH15 kernel: [4699249.473665] Call Trace:
Dec 12 18:58:05 AH15 kernel: [4699249.473680]  igb_update_stats+0x78/0x820 [igb]
Dec 12 18:58:05 AH15 kernel: [4699249.473689]  igb_watchdog_task+0xa8/0x410 [igb]
Dec 12 18:58:05 AH15 kernel: [4699249.473697]  ? __schedule+0x2eb/0x740
Dec 12 18:58:05 AH15 kernel: [4699249.473705]  process_one_work+0x1eb/0x3b0
Dec 12 18:58:05 AH15 kernel: [4699249.473710]  worker_thread+0x4d/0x400
Dec 12 18:58:05 AH15 kernel: [4699249.473715]  kthread+0x104/0x140
Dec 12 18:58:05 AH15 kernel: [4699249.473719]  ? process_one_work+0x3b0/0x3b0
Dec 12 18:58:05 AH15 kernel: [4699249.473722]  ? kthread_park+0x90/0x90
Dec 12 18:58:05 AH15 kernel: [4699249.473726]  ret_from_fork+0x35/0x40
Dec 12 18:58:05 AH15 kernel: [4699249.473729] ---[ end trace f6253424685efc67 ]---
Dec 12 18:58:05 AH15 kernel: [4699249.473740] igb 0000:01:00.0 external: malformed Tx packet detected and dropped, LVMMC:0xffffffff
Dec 12 18:58:07 AH15 kernel: [4699250.561446] igb 0000:01:00.1 internal: malformed Tx packet detected and dropped, LVMMC:0xffffffff
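
The LVMMC:0xffffffff values above, together with the failed read of register 0xc030, look consistent with the usual PCIe behaviour in which reads from a device that is no longer reachable return all ones, so the "malformed Tx packet" messages may be a side effect of the lost link rather than a separate fault. That interpretation is an assumption, not something confirmed in this report. A minimal Python sketch for counting such all-ones readings in a log follows; the function name `all_ones_lvmmc` is made up for illustration.

```python
import re

# Matches the "LVMMC:0x..." field as it appears in the igb messages above.
LVMMC = re.compile(r"LVMMC:(0x[0-9a-fA-F]+)")
ALL_ONES_32 = 0xFFFFFFFF


def all_ones_lvmmc(lines):
    """Count log lines whose reported LVMMC value is 0xffffffff."""
    count = 0
    for line in lines:
        match = LVMMC.search(line)
        if match and int(match.group(1), 16) == ALL_ONES_32:
            count += 1
    return count
```

A log in which every LVMMC value is all ones would fit the lost-link reading; other values would suggest the register was still readable when the message was logged.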
