Bug 198027

Summary: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x260/0x270()
Product: Networking
Reporter: SviMik (svimik)
Component: Other
Assignee: Stephen Hemminger (stephen)
Status: REOPENED
Severity: normal
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.14.2-1.el6.elrepo.x86_64
Subsystem:
Regression: No
Bisected commit-id:

Description SviMik 2017-11-29 08:25:18 UTC
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x260/0x270()
Modules linked in: xt_TRACE iptable_raw seqiv xfrm6_mode_tunnel xfrm4_mode_tunnel xt_set xt_multiport ip_set_hash_ip ip_set_hash_net ip_set nfnetlink xt_hashlimit ts_kmp ipt_REJECT xt_LOG xt_nat ts_bm xt_string xt_mark xt_connmark xt_TCPMSS iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat arptable_filter arp_tables ebtable_filter ebtables bridge tun arc4 ecb ppp_mppe l2tp_ppp l2tp_core pptp pppox ppp_generic slhc gre xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key netconsole configfs 8021q garp stp llc bonding iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 gpio_ich iTCO_wdt iTCO_vendor_support coretemp freq_table mperf intel_powerclamp kvm_intel kvm microcode joydev serio_raw pcspkr i2c_i801 sg lpc_ich mei_me mei igb hwmon dca ptp pps_core ext4 jbd2 mbcache raid1 sd_mod crc_t10dif crc32_pclmul ghash_clmulni_intel crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 ahci libahci ttm drm_kms_helper sysimgblt sysfillrect syscopyarea dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.10.108-1.el6.elrepo.x86_64 #1
Hardware name: Supermicro X9SCD/X9SCD, BIOS 1.0a 08/19/2011
ffffffff81a7ef44 ffff880139f69c28 ffffffff815f803e ffff880139f69c68
ffffffff8105c490 ffff880139f69c48 ffff88013567a000 ffff880134a548c0
0000000000000008 0000000000000000 ffff88013567a000 ffff880139f69cc8
Call Trace:
[<ffffffff815f803e>] dump_stack+0x19/0x1b
[<ffffffff8105c490>] warn_slowpath_common+0x70/0xa0
[<ffffffff8105c576>] warn_slowpath_fmt+0x46/0x50
[<ffffffff8107b100>] ? __queue_work+0x160/0x390
[<ffffffff81548f30>] dev_watchdog+0x260/0x270
[<ffffffff8107b330>] ? __queue_work+0x390/0x390
[<ffffffff81548cd0>] ? __netdev_watchdog_up+0x80/0x80
[<ffffffff8106caa9>] call_timer_fn+0x49/0x160
[<ffffffff8106d16b>] run_timer_softirq+0x23b/0x2c0
[<ffffffff81548cd0>] ? __netdev_watchdog_up+0x80/0x80
[<ffffffff81064f47>] __do_softirq+0xf7/0x2a0
[<ffffffff81065128>] run_ksoftirqd+0x38/0x50
[<ffffffff8108b8cd>] smpboot_thread_fn+0xfd/0x180
[<ffffffff8108b7d0>] ? smpboot_create_threads+0x80/0x80
[<ffffffff810835ae>] kthread+0xce/0xe0
[<ffffffff810834e0>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff816044c8>] ret_from_fork+0x58/0x90
[<ffffffff810834e0>] ? kthread_freezable_should_stop+0x70/0x70
---[ end trace c68bbf1b3044263a ]---
Comment 1 Stephen Hemminger 2017-11-29 17:46:13 UTC
The 3.10 kernel is End Of Life; can you reproduce with a later kernel?
This looks like a network driver specific suspend/resume bug.
You need to identify the network driver (lspci or ethtool).

*** This bug has been marked as a duplicate of bug 196399 ***
Comment 2 Stephen Hemminger 2017-11-29 17:48:05 UTC
The only network device listed is the Intel Wireless card.
Please contact the Intel wifi team.
Comment 3 SviMik 2017-11-29 18:18:32 UTC
>can you reproduce with a later kernel
This is the kind of bug I cannot reproduce intentionally (it just happened without my attention, and I don't know how to trigger it on purpose). It also happens very infrequently (this was the first time this month), so the "sit and wait" method can't be used here either.

>The 3.10 kernel is End Of Life
Sure, but (1) it ended quite recently, (2) I have no proof that this bug was fixed in newer kernels.

>The only network device listed is the Intel Wireless card.
I doubt I have any wireless cards in my server.

Here is my lspci output:
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 Processor Family DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:16.1 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #2 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation C204 Chipset Family LPC Controller (rev 05)
00:1f.2 RAID bus controller: Intel Corporation SATA Controller [RAID mode] (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
01:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
02:03.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a)

And ethtool output:
driver: igb
version: 5.0.3-k
firmware-version: 3.32, 0x8000023b
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
Comment 4 SviMik 2017-12-09 17:27:59 UTC
I have reproduced it with kernel 4.14.2-1.el6.elrepo.x86_64

Here is the oops message:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x219/0x220
Modules linked in: tcp_westwood tcp_yeah tcp_vegas sch_fq_codel des3_ede_x86_64 des_generic seqiv xfrm6_mode_tunnel xfrm4_mode_tunnel ghash_generic gf128mul ghash_clmulni_intel cryptd gcm ip_set_hash_net ip_set_hash_ip ip_set nfnetlink ctr drbg ansi_cprng authenc echainiv xfrm4_mode_transport cbc ts_kmp ipt_REJECT nf_reject_ipv4 xt_connlimit xt_conntrack xt_hashlimit nf_log_ipv4 nf_log_common xt_LOG xt_mark xt_nat ts_bm xt_string xt_TCPMSS iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_filter ip_tables arptable_filter arp_tables ebtable_filter ebtables tun arc4 ecb ppp_mppe l2tp_ppp l2tp_core udp_tunnel ip6_udp_tunnel pptp pppox ppp_generic slhc gre netconsole configfs xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key bnx2fc fcoe libfcoe libfc
scsi_transport_fc 8021q garp sunrpc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack libcrc32c ip6table_filter ip6_tables ppdev pvpanic parport_pc parport joydev pcspkr input_leds virtio_balloon i2c_piix4 sg ext4 jbd2 mbcache floppy sd_mod sr_mod cdrom pata_acpi ata_generic ata_piix e1000 virtio_pci virtio_ring virtio cirrus ttm dm_mirror dm_region_hash dm_log dm_mod dax be2iscsi bnx2i cnic uio cxgb4i cxgb4 ptp pps_core cxgb3i libcxgbi cxgb3 mdio libcxgb libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.2-1.el6.elrepo.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
task: ffff88011a440300 task.stack: ffffc900006a0000
RIP: 0010:dev_watchdog+0x219/0x220
RSP: 0018:ffff88011fd03c98 EFLAGS: 00010292
RAX: 0000000000000039 RBX: ffff880113372000 RCX: 000000000000083f
RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
RBP: ffff88011fd03cd8 R08: 0000000000000005 R09: 0000000000000000
R10: ffffffff8231d645 R11: 3165282030687465 R12: ffff880113649000
R13: 0000000000000001 R14: 0000000000000001 R15: ffff880113372000
FS:  0000000000000000(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f53246f9000 CR3: 0000000001e09001 CR4: 00000000001606e0
Call Trace:
<IRQ>
? dequeue_skb+0x260/0x260
call_timer_fn+0x3f/0x150
? find_next_bit+0xb/0x10
? __next_timer_interrupt+0xb4/0xe0
run_timer_softirq+0x3bd/0x4f0
? update_process_times+0x59/0x70
? tick_sched_timer+0x52/0xa0
? __run_hrtimer+0x8f/0x1b0
? tick_nohz_handler+0xc0/0xc0
? kvm_clock_read+0x1e/0x20
? kvm_clock_get_cycles+0x9/0x10
? ktime_get+0x5a/0xd0
? kvm_clock_read+0x1e/0x20
__do_softirq+0xe2/0x2ec
? hrtimer_interrupt+0x10f/0x1d0
irq_exit+0xbc/0xd0
smp_apic_timer_interrupt+0x7d/0x140
apic_timer_interrupt+0x93/0xa0
</IRQ>
RIP: 0010:native_safe_halt+0x6/0x10
RSP: 0018:ffffc900006a3e18 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88011fd12020
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc900006a3e18 R08: 00000000d5d3ce1a R09: ffff880115ae0c88
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88011a440300 R14: 0000000000000000 R15: 0000000000000000
? sched_clock+0x9/0x10
default_idle+0x29/0x110
? sched_clock_cpu+0xae/0xc0
arch_cpu_idle+0xf/0x20
default_idle_call+0x23/0x40
cpuidle_idle_call+0xcb/0x160
do_idle+0x85/0xd0
cpu_startup_entry+0x68/0x70
start_secondary+0x15c/0x170
secondary_startup_64+0xa5/0xa5
Code: eb a4 48 89 df 89 4d c8 c6 05 c9 98 8f 00 01 e8 be 72 fc ff 8b 4d c8 48 89 c2 48 89 de 48 c7 c7 e0 14 d2 81 31 c0 e8 ab 2f 9f ff <0f> ff eb bd 0f 1f 00 55 48 89 e5 48 83 ec 50 48 89 5d d8 4c 89
---[ end trace f1d59f0bbe552a50 ]---
e1000 0000:00:03.0 eth0: Reset adapter
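
For comparing the two traces in this report, the header fields can be pulled out mechanically. The following is a minimal Python sketch, not part of the original report. The function name summarize_trace and both regular expressions are assumptions written against the two header styles shown here (3.10 and 4.14); they may not cover other kernel versions, tainted kernels, or log lines with unusual prefixes.

```python
import re

# Matches both WARNING header styles that appear in this report:
#   "WARNING: at FILE:LINE FUNC+0xOFF/0xSIZE()"             (3.10 style)
#   "WARNING: CPU: N PID: M at FILE:LINE FUNC+0xOFF/0xSIZE" (4.14 style)
WARNING_RE = re.compile(
    r"WARNING:(?: CPU: \d+ PID: \d+)? at (?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Matches the "CPU: N PID: M Comm: NAME <taint state> RELEASE #BUILD" line.
# The "Tainted:" branch is an assumption; this report only shows "Not tainted".
CPU_RE = re.compile(
    r"CPU: \d+ PID: \d+ Comm: (?P<comm>\S+) "
    r"(?:Not tainted|Tainted: .+?) (?P<release>\S+) #\d+"
)


def summarize_trace(text):
    """Return the warning location, function, kernel release and task name.

    Any field that cannot be found in the text is reported as None.
    """
    w = WARNING_RE.search(text)
    c = CPU_RE.search(text)
    return {
        "source": f"{w['file']}:{w['line']}" if w else None,
        "function": w["func"] if w else None,
        "release": c["release"] if c else None,
        "comm": c["comm"] if c else None,
    }


if __name__ == "__main__":
    # Header lines copied from Comment 4 of this report.
    sample = (
        "WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:320 "
        "dev_watchdog+0x219/0x220\n"
        "CPU: 1 PID: 0 Comm: swapper/1 Not tainted "
        "4.14.2-1.el6.elrepo.x86_64 #1\n"
    )
    print(summarize_trace(sample))
```

Applied to the two traces in this report, the sketch returns the same function (dev_watchdog) at different source lines (255 on 3.10.108, 320 on 4.14.2), which is consistent with the same warning firing on both kernels.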
Comment 5 SviMik 2017-12-09 17:33:24 UTC
More info regarding the server with the latest oops:

[root@localhost ~]# uname -a
Linux localhost 4.14.2-1.el6.elrepo.x86_64 #1 SMP Fri Nov 24 15:18:27 EST 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@localhost ~]# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
00:04.0 RAM memory: Red Hat, Inc Virtio memory balloon

[root@localhost ~]# ethtool -i eth0
driver: e1000
version: 7.3.21-k8-NAPI
firmware-version:
bus-info: 0000:00:03.0
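
The driver name that ethtool -i prints can also be read from sysfs, which helps when ethtool is not installed. Below is a minimal Python sketch, assuming the usual Linux layout in which /sys/class/net/<iface>/device/driver is a symlink to the bound driver. The helper names driver_of and list_drivers are hypothetical and not from the original report.

```python
import os


def driver_of(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver name bound to iface, or None if unknown."""
    link = os.path.join(sysfs_root, iface, "device", "driver")
    if not os.path.islink(link):
        # Virtual interfaces (for example lo) have no backing device,
        # and a missing interface has no sysfs entry at all.
        return None
    # The symlink target's last path component is the driver name.
    return os.path.basename(os.path.realpath(link))


def list_drivers(sysfs_root="/sys/class/net"):
    """Map every entry under sysfs_root to its driver name (or None)."""
    if not os.path.isdir(sysfs_root):
        return {}
    return {
        name: driver_of(name, sysfs_root)
        for name in sorted(os.listdir(sysfs_root))
    }


if __name__ == "__main__":
    for name, driver in list_drivers().items():
        print(f"{name}: {driver or '(no device driver)'}")
```

On the KVM guest described in this comment, eth0 would be expected to map to e1000, matching the ethtool output; that expectation is inferred from the report and was not verified on the reporter's machine.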