After booting, I noticed that within the first few minutes, I get this crash message. The adapter works normally afterwards, but all network activity stops until after this crash message. If it matters, this is using jumbo frames (MTU 9000). [ 179.289531] ------------[ cut here ]------------ [ 179.289547] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:306 dev_watchdog+0x235/0x240 [ 179.289551] NETDEV WATCHDOG: enp5s0 (sky2): transmit queue 0 timed out [ 179.289554] Modules linked in: rfcomm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_security ip6table_mangle ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 iptable_security iptable_mangle iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables bnep nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) snd_hda_codec_hdmi ir_lirc_codec lirc_dev r820t rtl2832 i2c_mux iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek snd_hda_codec_generic coretemp kvm_intel snd_hda_intel dvb_usb_rtl28xxu drm_kms_helper btusb dvb_usb_v2 btrtl btbcm snd_hda_codec drm btintel dvb_core kvm snd_hda_core snd_hwdep bluetooth snd_seq snd_seq_device [ 179.289601] irqbypass rc_core joydev snd_pcm rfkill snd_timer snd i7core_edac i2c_i801 lpc_ich soundcore shpchp edac_core i5500_temp asus_atk0110 acpi_cpufreq tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc mxm_wmi crc32c_intel serio_raw sky2 hid_microsoft fjes wmi [ 179.289623] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P IOE 4.6.6-300.fc24.x86_64 #1 [ 179.289625] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0803 08/06/2012 [ 179.289627] 0000000000000286 60d93611ea8ce70a ffff8801b9203d98 ffffffff813dbf2f [ 179.289630] ffff8801b9203de8 0000000000000000 ffff8801b9203dd8 ffffffff810a741b [ 179.289632] 0000013281c6c8c0 0000000000000000 ffff8801b580b000 0000000000000000 [ 179.289635] Call Trace: [ 179.289636] <IRQ> [<ffffffff813dbf2f>] dump_stack+0x63/0x84 [ 179.289645] [<ffffffff810a741b>] __warn+0xcb/0xf0 [ 179.289648] [<ffffffff810a749f>] warn_slowpath_fmt+0x5f/0x80 [ 179.289651] [<ffffffff816ec5e5>] dev_watchdog+0x235/0x240 [ 179.289654] [<ffffffff816ec3b0>] ? qdisc_rcu_free+0x40/0x40 [ 179.289657] [<ffffffff81112e85>] call_timer_fn+0x35/0x120 [ 179.289660] [<ffffffff816ec3b0>] ? qdisc_rcu_free+0x40/0x40 [ 179.289662] [<ffffffff811134ce>] run_timer_softirq+0x21e/0x2d0 [ 179.289666] [<ffffffff817e3ce2>] __do_softirq+0x112/0x2e7 [ 179.289668] [<ffffffff810ad145>] irq_exit+0x105/0x110 [ 179.289671] [<ffffffff817e3ae2>] smp_apic_timer_interrupt+0x42/0x50 [ 179.289674] [<ffffffff817e1c1c>] apic_timer_interrupt+0x8c/0xa0 [ 179.289675] <EOI> [<ffffffff81675120>] ? cpuidle_enter_state+0x110/0x2b0 [ 179.289681] [<ffffffff816752f7>] cpuidle_enter+0x17/0x20 [ 179.289684] [<ffffffff810eb1ea>] call_cpuidle+0x2a/0x50 [ 179.289687] [<ffffffff810eb5d3>] cpu_startup_entry+0x2a3/0x350 [ 179.289690] [<ffffffff817d3b07>] rest_init+0x77/0x80 [ 179.289694] [<ffffffff81d69006>] start_kernel+0x453/0x474 [ 179.289697] [<ffffffff81d68120>] ? early_idt_handler_array+0x120/0x120 [ 179.289700] [<ffffffff81d68346>] x86_64_start_reservations+0x2a/0x2c [ 179.289703] [<ffffffff81d68494>] x86_64_start_kernel+0x14c/0x16f [ 179.289705] ---[ end trace 387d5665812891f6 ]--- [ 179.289710] sky2 0000:05:00.0 enp5s0: tx timeout [ 179.289717] sky2 0000:05:00.0 enp5s0: transmit ring 125 .. 0 report=125 done=125 [ 179.295228] sky2 0000:05:00.0 enp5s0: checksum offload not possible with jumbo frames [ 182.388925] sky2 0000:05:00.0 enp5s0: Link is up at 1000 Mbps, full duplex, flow control rx
$ lspci -v 05:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12) Subsystem: ASUSTeK Computer Inc. Motherboard Flags: bus master, fast devsel, latency 0, IRQ 36 Memory at fbefc000 (64-bit, non-prefetchable) [size=16K] I/O ports at e800 [size=256] Expansion ROM at fbec0000 [disabled] [size=128K] Capabilities: [48] Power Management version 3 Capabilities: [50] Vital Product Data Capabilities: [5c] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [e0] Express Legacy Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Kernel driver in use: sky2 Kernel modules: sky2
Can you run with the below patch I want to see what function this driver is calling for it's timer interrupt workloads.
Created attachment 229521 [details] Trace Timer Function
Trying to create a new RPM for Fedora 24 to test this... Not having much luck at the moment - see how we go...
Ok, after a couple of goes building the packages of Fedora kernel at 160 minutes per go - finally got it going.... Attaching dmesg as a text file. Seems that functions gets hit heaps....
Created attachment 229561 [details] dmesg with debug patch as requested
As another data point, I just realised that this *ONLY* seems to trigger when I do the first packet that is a jumbo frame. If I do nothing much on my desktop, let it sit there idle, it doesn't happen. When I log into a different system via SSH, the connection sits there for ages - then the timeout happens. Afterwards, the network adaptor works perfectly for hours. For example, this one after mucking around on reddit for a while (Internet is 1492 byte packets for me): [ 999.025298] ------------[ cut here ]------------ [ 999.025314] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:306 dev_watchdog+0x235/0x240 [ 999.025317] NETDEV WATCHDOG: enp5s0 (sky2): transmit queue 0 timed out [ 999.025320] Modules linked in: rfcomm ip6t_REJECT nf_reject_ipv6 xt_conntrack ip6t_rpfilter ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_mangle ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw iptable_mangle iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep nvidia_drm(POE) snd_hda_codec_hdmi nvidia_modeset(POE) nvidia(POE) coretemp kvm_intel kvm ir_lirc_codec lirc_dev irqbypass snd_hda_codec_realtek r820t snd_hda_codec_generic iTCO_wdt rtl2832 iTCO_vendor_support snd_hda_intel i2c_mux snd_usb_audio snd_hda_codec btusb snd_usbmidi_lib snd_hda_core snd_rawmidi i2c_i801 snd_hwdep snd_seq btrtl btbcm snd_seq_device btintel snd_pcm [ 999.025367] drm_kms_helper bluetooth drm lpc_ich snd_timer dvb_usb_rtl28xxu dvb_usb_v2 dvb_core snd joydev rc_core rfkill i7core_edac soundcore asus_atk0110 edac_core i5500_temp shpchp acpi_cpufreq tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc mxm_wmi crc32c_intel serio_raw sky2 hid_microsoft wmi fjes [ 999.025394] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P IOE 4.6.6-300.fc24.x86_64 #1 [ 999.025396] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0803 08/06/2012 [ 999.025398] 0000000000200286 9ef4baeae982a4fb ffff8801b9203d78 ffffffff813dbf2f [ 999.025401] ffff8801b9203dc8 0000000000000000 ffff8801b9203db8 ffffffff810a741b [ 999.025404] 00000132b920f560 0000000000000000 ffff880036877000 0000000000000000 [ 999.025407] Call Trace: [ 999.025409] <IRQ> [<ffffffff813dbf2f>] dump_stack+0x63/0x84 [ 999.025419] [<ffffffff810a741b>] __warn+0xcb/0xf0 [ 999.025422] [<ffffffff810a749f>] warn_slowpath_fmt+0x5f/0x80 [ 999.025428] [<ffffffff81124ae4>] ? tick_program_event+0x44/0x70 [ 999.025432] [<ffffffff816ec5e5>] dev_watchdog+0x235/0x240 [ 999.025435] [<ffffffff816ec3b0>] ? qdisc_rcu_free+0x40/0x40 [ 999.025438] [<ffffffff81112e85>] call_timer_fn+0x35/0x120 [ 999.025441] [<ffffffff816ec3b0>] ? qdisc_rcu_free+0x40/0x40 [ 999.025444] [<ffffffff811134ce>] run_timer_softirq+0x21e/0x2d0 [ 999.025448] [<ffffffff817e3ce2>] __do_softirq+0x112/0x2e7 [ 999.025451] [<ffffffff810ad145>] irq_exit+0x105/0x110 [ 999.025454] [<ffffffff817e3a24>] do_IRQ+0x54/0xd0 [ 999.025457] [<ffffffff817e18cc>] common_interrupt+0x8c/0x8c [ 999.025458] <EOI> [<ffffffff81675120>] ? cpuidle_enter_state+0x110/0x2b0 [ 999.025466] [<ffffffff816752f7>] cpuidle_enter+0x17/0x20 [ 999.025469] [<ffffffff810eb1ea>] call_cpuidle+0x2a/0x50 [ 999.025472] [<ffffffff810eb5d3>] cpu_startup_entry+0x2a3/0x350 [ 999.025475] [<ffffffff817d3b07>] rest_init+0x77/0x80 [ 999.025480] [<ffffffff81d69006>] start_kernel+0x453/0x474 [ 999.025483] [<ffffffff81d68120>] ? early_idt_handler_array+0x120/0x120 [ 999.025487] [<ffffffff81d68346>] x86_64_start_reservations+0x2a/0x2c [ 999.025490] [<ffffffff81d68494>] x86_64_start_kernel+0x14c/0x16f [ 999.025493] ---[ end trace 15d54a244e09137d ]--- [ 999.025499] sky2 0000:05:00.0 enp5s0: tx timeout [ 999.025506] sky2 0000:05:00.0 enp5s0: transmit ring 55 .. 81 report=55 done=55 [ 999.031341] sky2 0000:05:00.0 enp5s0: checksum offload not possible with jumbo frames [ 1002.076022] sky2 0000:05:00.0 enp5s0: Link is up at 1000 Mbps, full duplex, flow control rx
Ok send me your config, I want to see if you have CONFIG_NET_POLL_CONTROLLER enabled.
Created attachment 229571 [details] Fedora 4.4.6-300 config Config from the installed RPM attached. Also a grep to make things easier. # grep -i NET_POLL /boot/config-4.6.6-300.fc24.x86_64 CONFIG_NET_POLL_CONTROLLER=y
See if the below patch helps out. I am not sure if this will fix your issue 100% but seems to be a good place to start.
Created attachment 229581 [details] Test Patch
Am building this now... will take approx 3 hours to build, so I'll test it at home tonight ~1100 UTC.
If your building directly on the board with that marvell adaptor I would recommend looking into doing a kernel compile using a cross compiler. I don't known what architecture versions of gcc your distro comes with what if it's Ubuntu this is probably a pretty straight forward affair of installing the arm toolchain and building it, then moving it other.
Created attachment 229661 [details] dmesg after 'test patch' Booted with rebuilt kernel with test patch. Same problem. dmesg attached.
I have another idea but seems that the kernel your testing on is rather old, would you minding updating to a newer kernel like 4.8 in order to see if this is just an older issue. If it still is then I have another idea of what may be causing it.
Not sure I can do this easily unless Fedora takes the plunge and goes to 4.8. Interestingly enough, are you thinking about this patch? diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c index ec0a221..467138b 100644 --- a/drivers/net/ethernet/marvell/sky2.c +++ b/drivers/net/ethernet/marvell/sky2.c @@ -2418,7 +2418,7 @@ static int sky2_change_mtu(struct net_device *dev, int new_mtu) sky2_write32(hw, B0_IMSK, 0); sky2_read32(hw, B0_IMSK); - dev->trans_start = jiffies; /* prevent tx timeout */ + netif_trans_update(dev); /* prevent tx timeout */ It seems that's the only change to this driver since 2015 - although that's a treewide change - so it may not be applicable to the 4.6 tree.
Basically there is one of two things probably going wrong, 1. This is the one your pointing one. 2. Issues with making the timer with mod_timer being called every tick and not per second in sky2_link_up. You can try that but after looking at that function, the internal code is the same expect for checking if we are already set to jiffies. This may we the issue so test and see but I am pretty doubtful after doubt checking that function's internal code base.
Can you try that and see if it feels the issue.
Just for the record, Fedora has changed to kernel 4.7.2. Crash messages are still the same though: [ 119.332567] ------------[ cut here ]------------ [ 119.332583] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:272 dev_watchdog+0x222/0x230 [ 119.332587] NETDEV WATCHDOG: enp5s0 (sky2): transmit queue 0 timed out [ 119.332589] Modules linked in: rfcomm rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep usblp nvidia_drm(POE) nvidia_modeset(POE) snd_hda_codec_hdmi nvidia(POE) intel_powerclamp coretemp kvm_intel ir_lirc_codec lirc_dev kvm r820t rtl2832 iTCO_wdt iTCO_vendor_support i2c_mux snd_hda_codec_realtek irqbypass snd_hda_codec_generic intel_cstate intel_uncore snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq btusb [ 119.332640] btrtl snd_seq_device btbcm dvb_usb_rtl28xxu btintel snd_pcm dvb_usb_v2 bluetooth dvb_core drm_kms_helper joydev i2c_i801 rc_core snd_timer drm rfkill lpc_ich snd i7core_edac soundcore edac_core i5500_temp asus_atk0110 shpchp acpi_cpufreq tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc mxm_wmi crc32c_intel serio_raw sky2 hid_microsoft fjes wmi [ 119.332672] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P IOE 4.7.2-201.fc24.x86_64 #1 [ 119.332674] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0803 08/06/2012 [ 119.332676] 0000000000000286 9a1c3fef271a01e3 ffff8801b9203d78 ffffffff813d941f [ 119.332678] ffff8801b9203dc8 0000000000000000 ffff8801b9203db8 ffffffff8109facb [ 119.332680] 00000110b9628000 0000000000000000 ffff8800368ad000 0000000000000000 [ 119.332683] Call Trace: [ 119.332684] <IRQ> [<ffffffff813d941f>] dump_stack+0x63/0x84 [ 119.332693] [<ffffffff8109facb>] __warn+0xcb/0xf0 [ 119.332696] [<ffffffff8109fb4f>] warn_slowpath_fmt+0x5f/0x80 [ 119.332699] [<ffffffff816f6de2>] dev_watchdog+0x222/0x230 [ 119.332702] [<ffffffff816f6bc0>] ? qdisc_rcu_free+0x40/0x40 [ 119.332705] [<ffffffff8110d435>] call_timer_fn+0x35/0x120 [ 119.332708] [<ffffffff816f6bc0>] ? qdisc_rcu_free+0x40/0x40 [ 119.332710] [<ffffffff8110da7e>] run_timer_softirq+0x21e/0x2d0 [ 119.332712] [<ffffffff817ee5d2>] __do_softirq+0x112/0x2e7 [ 119.332715] [<ffffffff810a5745>] irq_exit+0x105/0x110 [ 119.332717] [<ffffffff817ee314>] do_IRQ+0x54/0xd0 [ 119.332721] [<ffffffff817ec1cc>] common_interrupt+0x8c/0x8c [ 119.332722] <EOI> [<ffffffff8167b04f>] ? cpuidle_enter_state+0x11f/0x2c0 [ 119.332729] [<ffffffff8167b040>] ? cpuidle_enter_state+0x110/0x2c0 [ 119.332731] [<ffffffff8167b227>] cpuidle_enter+0x17/0x20 [ 119.332734] [<ffffffff810e3c2a>] call_cpuidle+0x2a/0x50 [ 119.332736] [<ffffffff810e4013>] cpu_startup_entry+0x2a3/0x350 [ 119.332739] [<ffffffff817de4c7>] rest_init+0x77/0x80 [ 119.332741] [<ffffffff81d61001>] start_kernel+0x458/0x479 [ 119.332744] [<ffffffff81d60120>] ? early_idt_handler_array+0x120/0x120 [ 119.332746] [<ffffffff81d602db>] x86_64_start_reservations+0x2f/0x31 [ 119.332748] [<ffffffff81d60429>] x86_64_start_kernel+0x14c/0x16f [ 119.332751] ---[ end trace 5982559bc3952368 ]--- [ 119.332756] sky2 0000:05:00.0 enp5s0: tx timeout [ 119.332762] sky2 0000:05:00.0 enp5s0: transmit ring 39 .. 64 report=39 done=39 [ 119.336920] sky2 0000:05:00.0 enp5s0: checksum offload not possible with jumbo frames [ 125.527142] sky2 0000:05:00.0 enp5s0: Link is up at 1000 Mbps, full duplex, flow control rx