Bug 153461 - sky2: NETDEV WATCHDOG: enp5s0 (sky2): transmit queue 0 timed out
Summary: sky2: NETDEV WATCHDOG: enp5s0 (sky2): transmit queue 0 timed out
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-20 12:14 UTC by Steven Haigh
Modified: 2016-09-03 07:41 UTC (History)
2 users (show)

See Also:
Kernel Version: 4.6.6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Trace Timer Function (762 bytes, patch)
2016-08-21 00:46 UTC, [account disabled by the administrator]
Details | Diff
dmesg with debug patch as requested (87.24 KB, text/plain)
2016-08-21 08:38 UTC, Steven Haigh
Details
Fedora 4.4.6-300 config (171.02 KB, text/plain)
2016-08-21 13:16 UTC, Steven Haigh
Details
Test Patch (851 bytes, patch)
2016-08-21 16:36 UTC, [account disabled by the administrator]
Details | Diff
dmesg after 'test patch' (72.22 KB, text/plain)
2016-08-22 10:48 UTC, Steven Haigh
Details

Description Steven Haigh 2016-08-20 12:14:57 UTC
After booting, I noticed that within the first few minutes, I get this crash message. The adapter works normally afterwards, but all network activity stops until after this crash message.

If it matters, this is using jumbo frames (MTU 9000).

[  179.289531] ------------[ cut here ]------------
[  179.289547] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:306 dev_watchdog+0x235/0x240
[  179.289551] NETDEV WATCHDOG: enp5s0 (sky2): transmit queue 0 timed out
[  179.289554] Modules linked in: rfcomm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_security ip6table_mangle ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 iptable_security iptable_mangle iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables bnep nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) snd_hda_codec_hdmi ir_lirc_codec lirc_dev r820t rtl2832 i2c_mux iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek snd_hda_codec_generic coretemp kvm_intel snd_hda_intel dvb_usb_rtl28xxu drm_kms_helper btusb dvb_usb_v2 btrtl btbcm snd_hda_codec drm btintel dvb_core kvm snd_hda_core snd_hwdep bluetooth snd_seq snd_seq_device
[  179.289601]  irqbypass rc_core joydev snd_pcm rfkill snd_timer snd i7core_edac i2c_i801 lpc_ich soundcore shpchp edac_core i5500_temp asus_atk0110 acpi_cpufreq tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc mxm_wmi crc32c_intel serio_raw sky2 hid_microsoft fjes wmi
[  179.289623] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P          IOE   4.6.6-300.fc24.x86_64 #1
[  179.289625] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0803    08/06/2012
[  179.289627]  0000000000000286 60d93611ea8ce70a ffff8801b9203d98 ffffffff813dbf2f
[  179.289630]  ffff8801b9203de8 0000000000000000 ffff8801b9203dd8 ffffffff810a741b
[  179.289632]  0000013281c6c8c0 0000000000000000 ffff8801b580b000 0000000000000000
[  179.289635] Call Trace:
[  179.289636]  <IRQ>  [<ffffffff813dbf2f>] dump_stack+0x63/0x84
[  179.289645]  [<ffffffff810a741b>] __warn+0xcb/0xf0
[  179.289648]  [<ffffffff810a749f>] warn_slowpath_fmt+0x5f/0x80
[  179.289651]  [<ffffffff816ec5e5>] dev_watchdog+0x235/0x240
[  179.289654]  [<ffffffff816ec3b0>] ? qdisc_rcu_free+0x40/0x40
[  179.289657]  [<ffffffff81112e85>] call_timer_fn+0x35/0x120
[  179.289660]  [<ffffffff816ec3b0>] ? qdisc_rcu_free+0x40/0x40
[  179.289662]  [<ffffffff811134ce>] run_timer_softirq+0x21e/0x2d0
[  179.289666]  [<ffffffff817e3ce2>] __do_softirq+0x112/0x2e7
[  179.289668]  [<ffffffff810ad145>] irq_exit+0x105/0x110
[  179.289671]  [<ffffffff817e3ae2>] smp_apic_timer_interrupt+0x42/0x50
[  179.289674]  [<ffffffff817e1c1c>] apic_timer_interrupt+0x8c/0xa0
[  179.289675]  <EOI>  [<ffffffff81675120>] ? cpuidle_enter_state+0x110/0x2b0
[  179.289681]  [<ffffffff816752f7>] cpuidle_enter+0x17/0x20
[  179.289684]  [<ffffffff810eb1ea>] call_cpuidle+0x2a/0x50
[  179.289687]  [<ffffffff810eb5d3>] cpu_startup_entry+0x2a3/0x350
[  179.289690]  [<ffffffff817d3b07>] rest_init+0x77/0x80
[  179.289694]  [<ffffffff81d69006>] start_kernel+0x453/0x474
[  179.289697]  [<ffffffff81d68120>] ? early_idt_handler_array+0x120/0x120
[  179.289700]  [<ffffffff81d68346>] x86_64_start_reservations+0x2a/0x2c
[  179.289703]  [<ffffffff81d68494>] x86_64_start_kernel+0x14c/0x16f
[  179.289705] ---[ end trace 387d5665812891f6 ]---
[  179.289710] sky2 0000:05:00.0 enp5s0: tx timeout
[  179.289717] sky2 0000:05:00.0 enp5s0: transmit ring 125 .. 0 report=125 done=125
[  179.295228] sky2 0000:05:00.0 enp5s0: checksum offload not possible with jumbo frames
[  182.388925] sky2 0000:05:00.0 enp5s0: Link is up at 1000 Mbps, full duplex, flow control rx
Comment 1 Steven Haigh 2016-08-20 12:29:52 UTC
$ lspci -v

05:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
        Subsystem: ASUSTeK Computer Inc. Motherboard
        Flags: bus master, fast devsel, latency 0, IRQ 36
        Memory at fbefc000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at e800 [size=256]
        Expansion ROM at fbec0000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 3
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [e0] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: sky2
        Kernel modules: sky2
Comment 2 [account disabled by the administrator] 2016-08-21 00:44:25 UTC
Can you run with the below patch I want to see what function this driver is calling for it's timer interrupt workloads.
Comment 3 [account disabled by the administrator] 2016-08-21 00:46:01 UTC
Created attachment 229521 [details]
Trace Timer Function
Comment 4 Steven Haigh 2016-08-21 01:28:17 UTC
Trying to create a new RPM for Fedora 24 to test this... Not having much luck at the moment - see how we go...
Comment 5 Steven Haigh 2016-08-21 08:38:10 UTC
Ok, after a couple of goes building the packages of Fedora kernel at 160 minutes per go - finally got it going.... Attaching dmesg as a text file. Seems that functions gets hit heaps....
Comment 6 Steven Haigh 2016-08-21 08:38:38 UTC
Created attachment 229561 [details]
dmesg with debug patch as requested
Comment 7 Steven Haigh 2016-08-21 09:02:30 UTC
As another data point, I just realised that this *ONLY* seems to trigger when I do the first packet that is a jumbo frame.

If I do nothing much on my desktop, let it sit there idle, it doesn't happen. When I log into a different system via SSH, the connection sits there for ages - then the timeout happens. Afterwards, the network adaptor works perfectly for hours.

For example, this one after mucking around on reddit for a while (Internet is 1492 byte packets for me):

[  999.025298] ------------[ cut here ]------------
[  999.025314] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:306 dev_watchdog+0x235/0x240
[  999.025317] NETDEV WATCHDOG: enp5s0 (sky2): transmit queue 0 timed out
[  999.025320] Modules linked in: rfcomm ip6t_REJECT nf_reject_ipv6 xt_conntrack ip6t_rpfilter ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_mangle ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw iptable_mangle iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep nvidia_drm(POE) snd_hda_codec_hdmi nvidia_modeset(POE) nvidia(POE) coretemp kvm_intel kvm ir_lirc_codec lirc_dev irqbypass snd_hda_codec_realtek r820t snd_hda_codec_generic iTCO_wdt rtl2832 iTCO_vendor_support snd_hda_intel i2c_mux snd_usb_audio snd_hda_codec btusb snd_usbmidi_lib snd_hda_core snd_rawmidi i2c_i801 snd_hwdep snd_seq btrtl btbcm snd_seq_device btintel snd_pcm
[  999.025367]  drm_kms_helper bluetooth drm lpc_ich snd_timer dvb_usb_rtl28xxu dvb_usb_v2 dvb_core snd joydev rc_core rfkill i7core_edac soundcore asus_atk0110 edac_core i5500_temp shpchp acpi_cpufreq tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc mxm_wmi crc32c_intel serio_raw sky2 hid_microsoft wmi fjes
[  999.025394] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P          IOE   4.6.6-300.fc24.x86_64 #1
[  999.025396] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0803    08/06/2012
[  999.025398]  0000000000200286 9ef4baeae982a4fb ffff8801b9203d78 ffffffff813dbf2f
[  999.025401]  ffff8801b9203dc8 0000000000000000 ffff8801b9203db8 ffffffff810a741b
[  999.025404]  00000132b920f560 0000000000000000 ffff880036877000 0000000000000000
[  999.025407] Call Trace:
[  999.025409]  <IRQ>  [<ffffffff813dbf2f>] dump_stack+0x63/0x84
[  999.025419]  [<ffffffff810a741b>] __warn+0xcb/0xf0
[  999.025422]  [<ffffffff810a749f>] warn_slowpath_fmt+0x5f/0x80
[  999.025428]  [<ffffffff81124ae4>] ? tick_program_event+0x44/0x70
[  999.025432]  [<ffffffff816ec5e5>] dev_watchdog+0x235/0x240
[  999.025435]  [<ffffffff816ec3b0>] ? qdisc_rcu_free+0x40/0x40
[  999.025438]  [<ffffffff81112e85>] call_timer_fn+0x35/0x120
[  999.025441]  [<ffffffff816ec3b0>] ? qdisc_rcu_free+0x40/0x40
[  999.025444]  [<ffffffff811134ce>] run_timer_softirq+0x21e/0x2d0
[  999.025448]  [<ffffffff817e3ce2>] __do_softirq+0x112/0x2e7
[  999.025451]  [<ffffffff810ad145>] irq_exit+0x105/0x110
[  999.025454]  [<ffffffff817e3a24>] do_IRQ+0x54/0xd0
[  999.025457]  [<ffffffff817e18cc>] common_interrupt+0x8c/0x8c
[  999.025458]  <EOI>  [<ffffffff81675120>] ? cpuidle_enter_state+0x110/0x2b0
[  999.025466]  [<ffffffff816752f7>] cpuidle_enter+0x17/0x20
[  999.025469]  [<ffffffff810eb1ea>] call_cpuidle+0x2a/0x50
[  999.025472]  [<ffffffff810eb5d3>] cpu_startup_entry+0x2a3/0x350
[  999.025475]  [<ffffffff817d3b07>] rest_init+0x77/0x80
[  999.025480]  [<ffffffff81d69006>] start_kernel+0x453/0x474
[  999.025483]  [<ffffffff81d68120>] ? early_idt_handler_array+0x120/0x120
[  999.025487]  [<ffffffff81d68346>] x86_64_start_reservations+0x2a/0x2c
[  999.025490]  [<ffffffff81d68494>] x86_64_start_kernel+0x14c/0x16f
[  999.025493] ---[ end trace 15d54a244e09137d ]---
[  999.025499] sky2 0000:05:00.0 enp5s0: tx timeout
[  999.025506] sky2 0000:05:00.0 enp5s0: transmit ring 55 .. 81 report=55 done=55
[  999.031341] sky2 0000:05:00.0 enp5s0: checksum offload not possible with jumbo frames
[ 1002.076022] sky2 0000:05:00.0 enp5s0: Link is up at 1000 Mbps, full duplex, flow control rx
Comment 8 [account disabled by the administrator] 2016-08-21 13:13:23 UTC
Ok send me your config, I want to see if you have CONFIG_NET_POLL_CONTROLLER enabled.
Comment 9 Steven Haigh 2016-08-21 13:16:00 UTC
Created attachment 229571 [details]
Fedora 4.4.6-300 config

Config from the installed RPM attached. Also a grep to make things easier.

# grep -i NET_POLL /boot/config-4.6.6-300.fc24.x86_64
CONFIG_NET_POLL_CONTROLLER=y
Comment 10 [account disabled by the administrator] 2016-08-21 16:35:44 UTC
See if the below patch helps out. I am not sure if this will fix your issue 100% but seems to be a good place to start.
Comment 11 [account disabled by the administrator] 2016-08-21 16:36:00 UTC
Created attachment 229581 [details]
Test Patch
Comment 12 Steven Haigh 2016-08-22 01:35:32 UTC
Am building this now... will take approx 3 hours to build, so I'll test it at home tonight ~1100 UTC.
Comment 13 [account disabled by the administrator] 2016-08-22 03:57:41 UTC
If your building directly on the board with that marvell adaptor I would recommend looking into doing a kernel compile using a cross compiler. I don't known what architecture versions of gcc your distro comes with what if it's Ubuntu this is probably a pretty straight forward affair of installing the arm toolchain and building it, then moving it other.
Comment 14 Steven Haigh 2016-08-22 10:48:07 UTC
Created attachment 229661 [details]
dmesg after 'test patch'

Booted with rebuilt kernel with test patch. Same problem. dmesg attached.
Comment 15 [account disabled by the administrator] 2016-08-22 14:09:48 UTC
I have another idea but seems that the kernel your testing on is rather old, would you minding updating to a newer kernel like 4.8 in order to see if this is just an older issue. If it still is then I have another idea of what may be causing it.
Comment 16 Steven Haigh 2016-08-22 14:49:56 UTC
Not sure I can do this easily unless Fedora takes the plunge and goes to 4.8.

Interestingly enough, are you thinking about this patch?

diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index ec0a221..467138b 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -2418,7 +2418,7 @@ static int sky2_change_mtu(struct net_device *dev, int new_mtu)
 	sky2_write32(hw, B0_IMSK, 0);
 	sky2_read32(hw, B0_IMSK);
 
-	dev->trans_start = jiffies;	/* prevent tx timeout */
+	netif_trans_update(dev);	/* prevent tx timeout */

It seems that's the only change to this driver since 2015 - although that's a treewide change - so it may not be applicable to the 4.6 tree.
Comment 17 [account disabled by the administrator] 2016-08-22 15:37:10 UTC
Basically there is one of two things probably going wrong,
1. This is the one your pointing one.
2. Issues with making the timer with mod_timer being called every tick and not per second in sky2_link_up.
You can try that but after looking at that function, the internal code is the same expect for checking if we are already set to jiffies. This may we the issue so test and see but I am pretty doubtful after doubt checking that function's internal code base.
Comment 18 [account disabled by the administrator] 2016-08-30 21:39:23 UTC
Can  you try that and see if it feels the issue.
Comment 19 Steven Haigh 2016-09-03 07:41:42 UTC
Just for the record, Fedora has changed to kernel 4.7.2.

Crash messages are still the same though:
[  119.332567] ------------[ cut here ]------------
[  119.332583] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:272 dev_watchdog+0x222/0x230
[  119.332587] NETDEV WATCHDOG: enp5s0 (sky2): transmit queue 0 timed out
[  119.332589] Modules linked in: rfcomm rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep usblp nvidia_drm(POE) nvidia_modeset(POE) snd_hda_codec_hdmi nvidia(POE) intel_powerclamp coretemp kvm_intel ir_lirc_codec lirc_dev kvm r820t rtl2832 iTCO_wdt iTCO_vendor_support i2c_mux snd_hda_codec_realtek irqbypass snd_hda_codec_generic intel_cstate intel_uncore snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq btusb
[  119.332640]  btrtl snd_seq_device btbcm dvb_usb_rtl28xxu btintel snd_pcm dvb_usb_v2 bluetooth dvb_core drm_kms_helper joydev i2c_i801 rc_core snd_timer drm rfkill lpc_ich snd i7core_edac soundcore edac_core i5500_temp asus_atk0110 shpchp acpi_cpufreq tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc mxm_wmi crc32c_intel serio_raw sky2 hid_microsoft fjes wmi
[  119.332672] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P          IOE   4.7.2-201.fc24.x86_64 #1
[  119.332674] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0803    08/06/2012
[  119.332676]  0000000000000286 9a1c3fef271a01e3 ffff8801b9203d78 ffffffff813d941f
[  119.332678]  ffff8801b9203dc8 0000000000000000 ffff8801b9203db8 ffffffff8109facb
[  119.332680]  00000110b9628000 0000000000000000 ffff8800368ad000 0000000000000000
[  119.332683] Call Trace:
[  119.332684]  <IRQ>  [<ffffffff813d941f>] dump_stack+0x63/0x84
[  119.332693]  [<ffffffff8109facb>] __warn+0xcb/0xf0
[  119.332696]  [<ffffffff8109fb4f>] warn_slowpath_fmt+0x5f/0x80
[  119.332699]  [<ffffffff816f6de2>] dev_watchdog+0x222/0x230
[  119.332702]  [<ffffffff816f6bc0>] ? qdisc_rcu_free+0x40/0x40
[  119.332705]  [<ffffffff8110d435>] call_timer_fn+0x35/0x120
[  119.332708]  [<ffffffff816f6bc0>] ? qdisc_rcu_free+0x40/0x40
[  119.332710]  [<ffffffff8110da7e>] run_timer_softirq+0x21e/0x2d0
[  119.332712]  [<ffffffff817ee5d2>] __do_softirq+0x112/0x2e7
[  119.332715]  [<ffffffff810a5745>] irq_exit+0x105/0x110
[  119.332717]  [<ffffffff817ee314>] do_IRQ+0x54/0xd0
[  119.332721]  [<ffffffff817ec1cc>] common_interrupt+0x8c/0x8c
[  119.332722]  <EOI>  [<ffffffff8167b04f>] ? cpuidle_enter_state+0x11f/0x2c0
[  119.332729]  [<ffffffff8167b040>] ? cpuidle_enter_state+0x110/0x2c0
[  119.332731]  [<ffffffff8167b227>] cpuidle_enter+0x17/0x20
[  119.332734]  [<ffffffff810e3c2a>] call_cpuidle+0x2a/0x50
[  119.332736]  [<ffffffff810e4013>] cpu_startup_entry+0x2a3/0x350
[  119.332739]  [<ffffffff817de4c7>] rest_init+0x77/0x80
[  119.332741]  [<ffffffff81d61001>] start_kernel+0x458/0x479
[  119.332744]  [<ffffffff81d60120>] ? early_idt_handler_array+0x120/0x120
[  119.332746]  [<ffffffff81d602db>] x86_64_start_reservations+0x2f/0x31
[  119.332748]  [<ffffffff81d60429>] x86_64_start_kernel+0x14c/0x16f
[  119.332751] ---[ end trace 5982559bc3952368 ]---
[  119.332756] sky2 0000:05:00.0 enp5s0: tx timeout
[  119.332762] sky2 0000:05:00.0 enp5s0: transmit ring 39 .. 64 report=39 done=39
[  119.336920] sky2 0000:05:00.0 enp5s0: checksum offload not possible with jumbo frames
[  125.527142] sky2 0000:05:00.0 enp5s0: Link is up at 1000 Mbps, full duplex, flow control rx

Note You need to log in before you can comment on or make changes to this bug.