Bug 191201
| Summary: | Randomly freezes due to VMXNET3 | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | peter.hun |
| Component: | Network | Assignee: | Stephen Hemminger (stephen) |
| Status: | RESOLVED INVALID | | |
| Severity: | blocking | CC: | andy.moore.95, cmosqt, dwall, jsavanyo, lywjk, magic, miniflowtrader, nathan, nefelim4ag, pedretti.fabio, remigiusz.szczepanik, ryan, skhare, toracat, unixi |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 4.9.0 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |

Attachments:
- Kernel Panic log
- vmss file from VM at panic state
- CPU and RAM usage after bug occurred
- vmss from panic state VM
- vmss from panic state VM - just after reboot
- crash dump from panic state
- config-4.10.3-1.el7.elrepo.x86_64
- debug-patch
I've probably also hit that bug on Ubuntu 16.10, linux-image-4.8.0-32-generic 4.8.0-32.34:

[ 8442.722056] ------------[ cut here ]------------
[ 8442.722119] kernel BUG at /build/linux-_qw1uB/linux-4.8.0/drivers/net/vmxnet3/vmxnet3_drv.c:1413!
[ 8442.722198] invalid opcode: 0000 [#1] SMP
[ 8442.722235] Modules linked in: xt_multiport vmw_vsock_vmci_transport vsock ppdev vmw_balloon coretemp input_leds joydev serio_raw i2c_piix4 shpchp nfit vmw_vmci parport_pc parport mac_hid ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables ib_iser nf_conntrack_netbios_ns rdma_cm nf_conntrack_broadcast iw_cm ib_cm nf_nat_ftp ib_core nf_nat nf_conntrack_ftp nf_conntrack iptable_filter configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic
[ 8442.723051] usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 vmwgfx lrw glue_helper ablk_helper cryptd ttm drm_kms_helper psmouse syscopyarea sysfillrect vmxnet3 sysimgblt ahci fb_sys_fops mptspi libahci mptscsih drm mptbase scsi_transport_spi pata_acpi fjes
[ 8442.723351] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-32-generic #34-Ubuntu
[ 8442.723416] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[ 8442.723508] task: ffff9333762a2700 task.stack: ffff9333762ac000
[ 8442.723560] RIP: 0010:[<ffffffffc030e701>] [<ffffffffc030e701>] vmxnet3_rq_rx_complete+0x8d1/0xeb0 [vmxnet3]
[ 8442.723656] RSP: 0018:ffff93337fc83dc8 EFLAGS: 00010297
[ 8442.723704] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff933372735800
[ 8442.723765] RDX: 0000000000000040 RSI: 0000000000000001 RDI: 0000000000000040
[ 8442.723826] RBP: ffff93337fc83e40 R08: 0000000000000002 R09: 0000000000000030
[ 8442.723888] R10: 0000000000000000 R11: ffff93336d15c880 R12: ffff933372bae850
[ 8442.723949] R13: ffff93336d15d400 R14: ffff933372ab2010 R15: ffff933372b28018
[ 8442.724012] FS: 0000000000000000(0000) GS:ffff93337fc80000(0000) knlGS:0000000000000000
[ 8442.724081] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8442.724131] CR2: 00007f4ca84d62a0 CR3: 00000002327d5000 CR4: 00000000000006e0
[ 8442.724299] Stack:
[ 8442.724324] ffff93336d15c880 ffff93336d15c880 0000000000000000 ffff93336d15c880
[ 8442.724401] ffff93337fc8002d ffff93336d15d420 0000000000000002 0000000100000040
[ 8442.724475] ffff93336d15d4e8 0000000000000000 ffff93336d15c880 ffff93336d15d420
[ 8442.724549] Call Trace:
[ 8442.726430] <IRQ>
[ 8442.726458] [<ffffffffc030ee3a>] vmxnet3_poll_rx_only+0x3a/0xb0 [vmxnet3]
[ 8442.730045] [<ffffffff87167dfd>] ? add_interrupt_randomness+0x19d/0x210
[ 8442.730812] [<ffffffff8737fa28>] net_rx_action+0x238/0x380
[ 8442.731566] [<ffffffff8749dddd>] __do_softirq+0x10d/0x298
[ 8442.732306] [<ffffffff86c88d93>] irq_exit+0xa3/0xb0
[ 8442.733035] [<ffffffff8749db24>] do_IRQ+0x54/0xd0
[ 8442.733744] [<ffffffff8749bc02>] common_interrupt+0x82/0x82
[ 8442.734437] <EOI>
[ 8442.734449] [<ffffffff86c64236>] ? native_safe_halt+0x6/0x10
[ 8442.735776] [<ffffffff86c37e60>] default_idle+0x20/0xd0
[ 8442.736434] [<ffffffff86c385cf>] arch_cpu_idle+0xf/0x20
[ 8442.737067] [<ffffffff86cc77fa>] default_idle_call+0x2a/0x40
[ 8442.737682] [<ffffffff86cc7afc>] cpu_startup_entry+0x2ec/0x350
[ 8442.738283] [<ffffffff86c518a1>] start_secondary+0x151/0x190
[ 8442.738870] Code: 90 88 45 98 4c 89 55 a0 e8 4d f0 06 c7 0f b6 45 98 4c 8b 5d 90 4c 8b 55 a0 49 c7 85 60 01 00 00 00 00 00 00 89 c6 e9 91 f8 ff ff <0f> 0b 0f 0b 49 83 85 b8 01 00 00 01 49 c7 85 60 01 00 00 00 00
[ 8442.740689] RIP [<ffffffffc030e701>] vmxnet3_rq_rx_complete+0x8d1/0xeb0 [vmxnet3]
[ 8442.741273] RSP <ffff93337fc83dc8>

Caught the same thing on Gentoo with the latest "stable" kernel 4.9.6-r1, VMware 6.5 fully patched. A virtual machine with HW version 13 and a VMXNET3 network card crashed with a kernel panic immediately after I tried to SSH into it.

Thank you for reporting it. I have been running a kernel 4.9 VM, hwversion 13, on an ESXi 6.5 host for several hours, exchanging traffic, and have not yet managed to reproduce this. Any pointers on how I might be able to reproduce this? A repro will allow me to collect a vmss file, which will help in debugging this.

If you hit the issue again, could you please generate a vmss file while the VM is in the panic state? It can be obtained by suspending the VM in the panic state and then locating the vmss file in the directory of the virtual machine on the ESX host. More details here: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2005831

Created attachment 255445 [details]
vmss file from VM at panic state
VMSS file in panic state from the Gentoo VM with the 4.9.6 kernel.
The panic appeared immediately after an SSH login attempt.
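
For context: the BUG in the traces above fires inside the vmxnet3 receive-completion path. The sketch below is purely illustrative and assumes the driver's internal types from drivers/net/vmxnet3/vmxnet3_int.h; it is not the literal code at vmxnet3_drv.c:1413, which differs between kernel versions, but it shows the kind of consistency check vmxnet3_rq_rx_complete() makes between the completion descriptor written back by the device and the driver's own ring bookkeeping. When any such check fails, BUG_ON() panics in softirq context, producing "Fatal exception in interrupt" as seen in these reports.

/*
 * Illustrative sketch only (not the exact source line 1413).
 * rcd = completion descriptor written by the device,
 * rxd = rx descriptor the driver posted for that slot,
 * rbi = the driver's buffer bookkeeping for the same slot.
 */
#include <linux/bug.h>
#include "vmxnet3_int.h"

static void rx_completion_sanity_sketch(const struct Vmxnet3_RxCompDesc *rcd,
                                        const struct Vmxnet3_RxDesc *rxd,
                                        const struct vmxnet3_rx_buf_info *rbi)
{
        if (rcd->sop) {
                /* a start-of-packet completion must land on a HEAD buffer */
                BUG_ON(rxd->btype != VMXNET3_RXD_BTYPE_HEAD);
        } else {
                /* a continuation completion must land on a BODY buffer */
                BUG_ON(rxd->btype != VMXNET3_RXD_BTYPE_BODY);
        }

        /* the buffer the device completed must be the one the driver posted */
        BUG_ON(rxd->addr != rbi->dma_addr || rxd->len != rbi->len);
}

In other words, the panic is the driver asserting that the device's completion and its own posted buffers disagree, which is why the later comments focus on capturing the values of rcd, rxd, and rbi at the moment the BUG_ON trips.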
So it was repeatable on a VM with Cores per socket = 4, vCPU = 4. After I changed Cores per socket to 1 or 2, the kernel panic disappeared. The strange thing is that when I set Cores per socket back to 4, the kernel panic did not appear... So I can't reproduce it 100% reliably for now.

Sorry, a small mistake: it was reproducible with Cores per socket = 1, vCPU = 4, and not reproducible with Cores per socket = 2 or 4 (vCPU = 4).

Thank you. Any pointer on how I could get the vmlinux file (debug symbols) for this Gentoo VM? Also, the vmss size is only 1.17 MB; without the vmlinux I could not access it, but it seems too small. Btw, I tried reproducing this again on an Ubuntu VM and on a Photon VM running kernel 4.9 with 4 vCPUs (cores per socket = 1), by passing traffic as well as leaving an ssh session running for several hours. Never hit it yet. Please let me know if there is anything else I may try that might increase the chances of me hitting it locally.

Created attachment 255521 [details]
CPU and RAM usage after bug occurred

Hello everyone. I'm suffering from the same issue. It happens "randomly" (some time after SSH usage) on, in my case, two different VMs, but they are quite similar. I will tell you everything I did, so maybe it will give you an idea of how you can reproduce this issue.

ESXi version: You are running HPE Customized Image ESXi 6.5.0 version 650.9.6.0.28, released on November 2016 and based on ESXi 6.5.0 Vmkernel Release Build 4564106.
ESXi hardware: HP ProLiant MicroServer Gen8, 2 CPUs x Intel(R) Celeron(R) CPU G1610T @ 2.30GHz, 11.84 GB of RAM
Guest OS: CentOS 4/5 or later (64-bit) [actually it is CentOS 7]
Compatibility: ESXi 6.5 and later (VM version 13) [made from within the ESXi 6.5 web client]
VMware Tools: Yes [open-vm-tools]
CPUs: 2 [2 cores per socket, 1 socket]
Memory: 1 GB
Network Adapter: VMXNET 3

CentOS 7 is installed with "minimal" software. I have changed the kernel using this Ansible playbook (I'm learning Ansible, still quite green) - https://github.com/komarEX/ansible/blob/master/kernel.yml

The freeze occurs on the kernel I have installed right now: Linux 4.10.3-1.el7.elrepo.x86_64 #1 SMP Wed Mar 15 14:45:27 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

When the freeze occurs I can see high usage of one CPU on the ESXi monitor (attachment) and I cannot do anything from the ESXi console side (frozen screen). On the attachment you can actually see 2 occurrences of the bug - occurred, restart, occurred again, second restart.

I can try to do whatever is needed to better pinpoint the source of the issue, but I may need a little guidance in this matter.

(In reply to Remigiusz Szczepanik from comment #8)
> Created attachment 255521 [details]
>
> I can try to do whatever is needed to better pinpoint source of issue but I
> may need a little of guidance in this matter.

Thank you! I will try to repro using the steps you mentioned. But what will really help us is below:

Could you please generate a vmss file when the VM is in the panic state? It can be obtained by suspending the VM in the panic state and then locating the vmss file in the directory of the virtual machine on the ESX host. More details here: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2005831

If we have the VMSS file from when the VM is in the panic state, and we have the debug symbols for the kernel you are using, we can analyze the crash dump. That will significantly improve our odds of root-causing this!
Thanks again,
Shri

Thank you for responding to this issue. Since we waited so long after publishing this issue here, we had switched to SR-IOV and E1000E solutions and everything seemed to work like a charm, so the issue should be related only to VMXNET3.

I just deployed an experimental VM with the following configuration, and if it freezes again, I will try to capture a vmss file of the affected VM.

8 vCores
16 GB RAM
H/W v13
VMXNET3
CentOS with kernel-ml 4.10
Workload: cPanel v63 edge

(In reply to peter.hun from comment #10)
> Thank you for responding this issue, since we waited so long after this
> issue published to here, we had switched to SR-IOV and E1000E solutions and
> everything seemed work like a charms, so the issues should be only related
> to VMXNET3.
>
> I just deployed a experimental VM with following configuration, and if it
> freezes again, I will try to capture vmss file of the affected VM.
>
> 8 vCores
> 16 GB RAM
> H/W v13
> VMXNET3
> CentOS with kernel-ml 4.10
> Workload : cPanel v63 edge

Thanks a lot Peter. That will really help in root-causing this. Btw, I am not sure whether you originally hit the issue with 8 vcpus? Earlier updates in this thread suggest the issue was hit with 4 vcpus.

Actually, all three of my major production servers are affected by this issue: one has dual Xeon E5-2620 v3 (that's 12C24T), one has dual Xeon E5-2683 v3 (that's 28C56T), and another has a Xeon E3-1231 v3 (that's 4C8T). Tested with VMs with 8/12/24/28/56 vCPUs last year, but no luck.

Created attachment 255533 [details]
vmss from panic state VM
I was working on these VMs for half a day and the panic state finally happened.
I have downloaded the .vmss, .nvram and .vmem files from the suspended machine. I will provide the .vmem file only if it's really needed.
Additionally, I can tell that the panic happened exactly on the hour, so I believe some kind of cron job may have helped trigger it.
Created attachment 255535 [details]
vmss from panic state VM - just after reboot
After the reboot it happened again instantly, so I attached a second .vmss (I also have the .nvram and .vmem files).
(In reply to Remigiusz Szczepanik from comment #14)
> Created attachment 255535 [details]
> vmss from panic state VM - just after reboot
>
> After reboot it happened instantly again so I attached second .vmss (also
> have .nvram and .vmem files).

Thank you Remigiusz. Could you please also share the vmlinux file (debug symbols) for the kernel (is it CentOS 4.10.3?) you are running? Using the vmss and vmlinux, I should be able to run gdb.

(In reply to Shrikrishna Khare from comment #15)
> Could you please also share vmlinux file (debug symbols) for the kernel (is
> it CentOS 4.10.3)you are running?

I don't have debug symbols installed (only vmlinuz files, I guess?).

But I believe you can get these from elrepo. I'm using "kernel-ml.x86_64 4.10.3-1.el7.elrepo".

(In reply to Remigiusz Szczepanik from comment #16)
> (In reply to Shrikrishna Khare from comment #15)
> > Could you please also share vmlinux file (debug symbols) for the kernel (is
> > it CentOS 4.10.3)you are running?
>
> I don't have debug symbols installed (only vmlinuz files I guess?).
>
> But I believe you can get these from elrepo. I'm using "kernel-ml.x86_64
> 4.10.3-1.el7.elrepo".

I think I will need the vmem file, as crash complained without it. Please see the steps I tried below. Btw, I could not find 4.10.3 in elrepo, only 4.10.4.

# search vmlinux, vmlinuz
rpm2cpio kernel-ml-4.10.4-1.el7.elrepo.x86_64.rpm | cpio -idt | grep -e "vmlinux" -e "vmlinuz"
./boot/vmlinuz-4.10.4-1.el7.elrepo.x86_64
# i.e. no vmlinux file, but only vmlinuz

# extract vmlinux from vmlinuz
wget -O extract-vmlinux https://raw.githubusercontent.com/torvalds/linux/master/scripts/extract-vmlinux
./extract-vmlinux boot/vmlinuz-4.10.4-1.el7.elrepo.x86_64 > vmlinux

crash vmlinux Hana-backup-92165ff7.vmss

crash 7.1.5
Copyright (C) 2002-2016 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

vmw: Memory dump is not part of this vmss file.
vmw: Try to locate the companion vmem file ...
crash: vmw: Hana-backup-92165ff7.vmem: No such file or directory
crash: Hana-backup-92165ff7.vmss: initialization failed

I have extracted vmlinux from vmlinuz and I have used the .vmem file.

# crash vmlinux Hana-backup-92165ff7.vmss

crash 7.1.4
Copyright (C) 2002-2015 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

vmw: Memory dump is not part of this vmss file.
vmw: Try to locate the companion vmem file ...
vmw: vmem file: Hana-backup-92165ff7.vmem

crash: vmlinux: no .gnu_debuglink section
crash: vmlinux: no debugging data available

If I understand correctly, elrepo does not provide source or debuginfo packages - https://elrepo.org/bugs/view.php?id=684

I do have the config file for that kernel. Can I use it to recompile the kernel against the mainline source and somehow build debuginfo? I'm not very well versed in advanced kernel compilation.

Created attachment 255557 [details]
crash dump from panic state

So it took me a while to make crash work the way I would like it to. I had to recompile kernel 4.10.3 using the elrepo config file, and I had to recompile crash to the newest version, as only crash 7.1.8 has fixes for bugs introduced in kernels 4.10+.

I have used crash on the second .vmss (panic after reboot).

As I said before, I would like to not disclose the .vmem file (for obvious reasons). If you need anything more, please tell me what I need to type into crash.

In the attachment you have:

# crash System.map-4.10.3-1.el7.elrepo.x86_64 ./kernel_rebuild/linux-4.10.3/vmlinux Hana-backup-92165ff7.vmss

crash 7.1.8
Copyright (C) 2002-2016 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

vmw: Memory dump is not part of this vmss file.
vmw: Try to locate the companion vmem file ...
vmw: vmem file: Hana-backup-92165ff7.vmem

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernels compiled by different gcc versions:
  ./kernel_rebuild/linux-4.10.3/vmlinux: 5.4.0
  Hana-backup-92165ff7.vmss kernel: 4.8.5

SYSTEM MAP: System.map-4.10.3-1.el7.elrepo.x86_64
DEBUG KERNEL: ./kernel_rebuild/linux-4.10.3/vmlinux (4.10.3)
DUMPFILE: Hana-backup-92165ff7.vmss
CPUS: 2 [OFFLINE: 1]
DATE: Sat Mar 25 16:18:48 2017
UPTIME: 00:00:14
LOAD AVERAGE: 0.29, 0.06, 0.02
TASKS: 167
NODENAME: hana
RELEASE: 4.10.3-1.el7.elrepo.x86_64
VERSION: #1 SMP Wed Mar 15 14:45:27 EDT 2017
MACHINE: x86_64 (2294 Mhz)
MEMORY: 1 GB
PANIC: "kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c:1413!"
PID: 0
COMMAND: "swapper/0"
TASK: ffffffff81c10500 (1 of 2) [THREAD_INFO: ffffffff81c10500]
CPU: 0
STATE: TASK_RUNNING
WARNING: panic task not found

crash> bt
(...)
crash> log
(...)
crash> ps

@Shrikrishna Khare
Can you confirm that it is indeed the same bug (just on an older kernel)?
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1654319

(In reply to Remigiusz Szczepanik from comment #21)
> @Shrikrishna Khare
>
> Can you confirm that it is indeed the same bug (just on older kernel)?
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1654319

Yes, it is the same issue; let me check that vmss.
(In reply to Remigiusz Szczepanik from comment #20)
> Created attachment 255557 [details]
> crash dump from panic state
>
> So it took me a while to make crash work like I would like it to.
> I had to recompile kernel 4.10.3 using elrepo config file and I had to
> recompile crash to newest version as only 7.1.8 crash has fixes for bugs
> introduced in kernels 4.10+.
>
> I have used crash on second .vmss (panic after reboot).

Thanks a lot! Really appreciate all your efforts and help with this!

> As I said before I would like to not disclose .vmem file (obvious reasons).
> If you need anything more please tell me what I need to type info crash.

I understand, but that makes our debugging tricky; still, let us give it a try. On the crash prompt:

1. bt  # now with the right debug symbols, I take it that it shows the panic stack with vmxnet3_rq_rx_complete?
2. set print pretty on
3. info locals  # should list the local variables in vmxnet3_rq_rx_complete; does it list rcd? We would like to get the values of all local variables as well as the arguments to this function.

In particular, try the below:

4. p *rq
5. p *rcd
6. p *rbi
7. p idx
8. p ring_idx
9. p *ring
10. p *rxd

Basically, let us get the state of the vmxnet3 device when it hit the BUG_ON in the receive path; then I can check if anything is amiss. The relevant code: http://lxr.free-electrons.com/source/drivers/net/vmxnet3/vmxnet3_drv.c#L1257

Thanks!
Shri

> In attachment you have:
>
> # crash System.map-4.10.3-1.el7.elrepo.x86_64
> ./kernel_rebuild/linux-4.10.3/vmlinux Hana-backup-92165ff7.vmss
>
> crash 7.1.8
> Copyright (C) 2002-2016 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005, 2011 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
>
> vmw: Memory dump is not part of this vmss file.
> vmw: Try to locate the companion vmem file ...
> vmw: vmem file: Hana-backup-92165ff7.vmem
>
> GNU gdb (GDB) 7.6
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-unknown-linux-gnu"...
>
> WARNING: kernels compiled by different gcc versions:
> ./kernel_rebuild/linux-4.10.3/vmlinux: 5.4.0
> Hana-backup-92165ff7.vmss kernel: 4.8.5
>
> SYSTEM MAP: System.map-4.10.3-1.el7.elrepo.x86_64
> DEBUG KERNEL: ./kernel_rebuild/linux-4.10.3/vmlinux (4.10.3)
> DUMPFILE: Hana-backup-92165ff7.vmss
> CPUS: 2 [OFFLINE: 1]
> DATE: Sat Mar 25 16:18:48 2017
> UPTIME: 00:00:14
> LOAD AVERAGE: 0.29, 0.06, 0.02
> TASKS: 167
> NODENAME: hana
> RELEASE: 4.10.3-1.el7.elrepo.x86_64
> VERSION: #1 SMP Wed Mar 15 14:45:27 EDT 2017
> MACHINE: x86_64 (2294 Mhz)
> MEMORY: 1 GB
> PANIC: "kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c:1413!"
> PID: 0
> COMMAND: "swapper/0"
> TASK: ffffffff81c10500 (1 of 2) [THREAD_INFO: ffffffff81c10500]
> CPU: 0
> STATE: TASK_RUNNING
> WARNING: panic task not found
>
> crash> bt
> (...)
> crash> log
> (...)
> crash> ps

> 1. bt # now with the right debug symbols, I take it that it shows the panic
> stack with vmxnet3_rq_rx_complete?
Apparently it doesn't. Moreover crash tells me:
PANIC: "kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c:1413!"
PID: 0
COMMAND: "swapper/0"
TASK: ffffffff81c10500 (1 of 2) [THREAD_INFO: ffffffff81c10500]
CPU: 0
STATE: TASK_RUNNING (ACTIVE)
WARNING: panic task not found
Note the last line: "panic task not found".
So if I run:
crash> bt
PID: 0 TASK: ffffffff81c10500 CPU: 0 COMMAND: "swapper/0"
#0 [ffffffff81c03d80] __schedule at ffffffff8177ed1c
#1 [ffffffff81c03db8] native_safe_halt at ffffffff81784006
#2 [ffffffff81c03de8] default_idle at ffffffff81783d4e
#3 [ffffffff81c03e08] arch_cpu_idle at ffffffff81037baf
#4 [ffffffff81c03e18] default_idle_call at ffffffff8178416c
#5 [ffffffff81c03e28] do_idle at ffffffff810cbef8
#6 [ffffffff81c03e60] cpu_startup_entry at ffffffff810cc231
#7 [ffffffff81c03e88] rest_init at ffffffff81776617
#8 [ffffffff81c03e98] start_kernel at ffffffff81da71f7
#9 [ffffffff81c03ee8] x86_64_start_reservations at ffffffff81da65d6
#10 [ffffffff81c03ef8] x86_64_start_kernel at ffffffff81da6724
crash> set -p
set: no panic task found!
crash> set -c 0
PID: 0
COMMAND: "swapper/0"
TASK: ffffffff81c10500 (1 of 2) [THREAD_INFO: ffffffff81c10500]
CPU: 0
STATE: TASK_RUNNING (ACTIVE)
crash> set -c 1
PID: 0
COMMAND: "swapper/1"
TASK: ffff88003fb94080 (1 of 2) [THREAD_INFO: ffff88003fb94080]
CPU: 1
STATE: TASK_RUNNING (ACTIVE)
crash> bt
PID: 0 TASK: ffff88003fb94080 CPU: 1 COMMAND: "swapper/1"
#0 [ffffc9000020fe20] __schedule at ffffffff8177ed1c
#1 [ffffc9000020fe58] native_safe_halt at ffffffff81784006
#2 [ffffc9000020fe88] default_idle at ffffffff81783d4e
#3 [ffffc9000020fea8] arch_cpu_idle at ffffffff81037baf
#4 [ffffc9000020feb8] default_idle_call at ffffffff8178416c
#5 [ffffc9000020fec8] do_idle at ffffffff810cbef8
#6 [ffffc9000020ff00] cpu_startup_entry at ffffffff810cc231
#7 [ffffc9000020ff28] start_secondary at ffffffff81052054
I have a lot of tasks but not the panic one:
crash> last
[66871679980576] [IN] PID: 572 TASK: ffff8800399cab00 CPU: 0 COMMAND: "vmtoolsd"
[66871612637175] [IN] PID: 6183 TASK: ffff88003b345600 CPU: 0 COMMAND: "smtp"
[66871581528376] [IN] PID: 6170 TASK: ffff88003b139580 CPU: 0 COMMAND: "kworker/0:2"
[66871224877535] [IN] PID: 938 TASK: ffff8800399bab00 CPU: 1 COMMAND: "tuned"
[66871213468448] [IN] PID: 6097 TASK: ffff88003a98d600 CPU: 1 COMMAND: "kworker/1:1"
[66870695509563] [IN] PID: 9 TASK: ffff88003fb8c080 CPU: 1 COMMAND: "rcuos/0"
[66870695472934] [IN] PID: 7 TASK: ffff88003fb89580 CPU: 1 COMMAND: "rcu_sched"
[66870388582379] [IN] PID: 21 TASK: ffff88003fbf1580 CPU: 1 COMMAND: "rcuos/1"
[66870377129420] [IN] PID: 18 TASK: ffff88003fbbc080 CPU: 1 COMMAND: "ksoftirqd/1"
[66870371282980] [IN] PID: 983 TASK: ffff88003b342b00 CPU: 1 COMMAND: "master"
[66870370621290] [IN] PID: 985 TASK: ffff88003a2dc080 CPU: 0 COMMAND: "qmgr"
[66870368604083] [IN] PID: 992 TASK: ffff88003b13d600 CPU: 1 COMMAND: "tlsmgr"
[66870348187337] [IN] PID: 882 TASK: ffff88003c7bab00 CPU: 0 COMMAND: "rs:main Q:Reg"
[66870348137285] [IN] PID: 881 TASK: ffff88003c7bd600 CPU: 0 COMMAND: "in:imjournal"
[66870348125355] [IN] PID: 6182 TASK: ffff88003b13ab00 CPU: 1 COMMAND: "local"
[66870346763493] [IN] PID: 6181 TASK: ffff8800361b4080 CPU: 0 COMMAND: "trivial-rewrite"
[66870346737411] [IN] PID: 6178 TASK: ffff8800398f5600 CPU: 0 COMMAND: "cleanup"
[66870346653737] [RU] PID: 415 TASK: ffff8800363ad600 CPU: 1 COMMAND: "systemd-journal"
[66870345260017] [IN] PID: 6111 TASK: ffff8800363a9580 CPU: 1 COMMAND: "kworker/1:3"
[66870326578232] [IN] PID: 4198 TASK: ffff88003c7b9580 CPU: 1 COMMAND: "kworker/1:0"
[66870323456819] [IN] PID: 602 TASK: ffff8800399bd600 CPU: 1 COMMAND: "NetworkManager"
[66870323286289] [IN] PID: 612 TASK: ffff88003b138000 CPU: 1 COMMAND: "gdbus"
[66870323128721] [IN] PID: 563 TASK: ffff88003cc54080 CPU: 1 COMMAND: "dbus-daemon"
[66870323011581] [IN] PID: 586 TASK: ffff88003be19580 CPU: 1 COMMAND: "gdbus"
[66870322884504] [IN] PID: 566 TASK: ffff88003cc51580 CPU: 1 COMMAND: "polkitd"
[66870322580600] [IN] PID: 567 TASK: ffff88003a90d600 CPU: 1 COMMAND: "systemd-logind"
[66870318690684] [IN] PID: 5713 TASK: ffff88003fb45600 CPU: 1 COMMAND: "pickup"
[66870314012253] [IN] PID: 1 TASK: ffff88003fb40000 CPU: 0 COMMAND: "systemd"
[66870312550634] [IN] PID: 6099 TASK: ffff880036930000 CPU: 0 COMMAND: "kworker/0:0"
[66870300261359] [IN] PID: 5664 TASK: ffff8800399c8000 CPU: 0 COMMAND: "kworker/u4:4"
[66870298754740] [IN] PID: 598 TASK: ffff8800399b9580 CPU: 0 COMMAND: "firewalld"
[66870298066092] [IN] PID: 579 TASK: ffff88003c7b8000 CPU: 0 COMMAND: "crond"
[66870297743047] [IN] PID: 544 TASK: ffff8800399bc080 CPU: 0 COMMAND: "auditd"
[66870297622980] [IN] PID: 40 TASK: ffff88003df7ab00 CPU: 0 COMMAND: "kauditd"
[66869466618701] [IN] PID: 677 TASK: ffff88003b340000 CPU: 0 COMMAND: "gmain"
[66868653505756] [IN] PID: 29 TASK: ffff88003dcd2b00 CPU: 1 COMMAND: "khugepaged"
[66868519433080] [IN] PID: 13 TASK: ffff88003fb92b00 CPU: 0 COMMAND: "watchdog/0"
[66868518464398] [IN] PID: 16 TASK: ffff88003fbb9580 CPU: 1 COMMAND: "watchdog/1"
[66868365687554] [IN] PID: 565 TASK: ffff88003cc50000 CPU: 1 COMMAND: "irqbalance"
[66850221360652] [IN] PID: 6 TASK: ffff88003fb88000 CPU: 0 COMMAND: "ksoftirqd/0"
[66849197364092] [IN] PID: 399 TASK: ffff88003a3c1580 CPU: 1 COMMAND: "kworker/1:1H"
[66848173339718] [IN] PID: 400 TASK: ffff88003a3c4080 CPU: 0 COMMAND: "kworker/0:1H"
[66844088824438] [IN] PID: 342 TASK: ffff8800398f2b00 CPU: 1 COMMAND: "btrfs-cleaner"
[66844088496858] [IN] PID: 343 TASK: ffff8800398f4080 CPU: 1 COMMAND: "btrfs-transacti"
[66843101155645] [IN] PID: 503 TASK: ffff880039c04080 CPU: 0 COMMAND: "btrfs-cleaner"
[66843101066941] [IN] PID: 504 TASK: ffff880039c05600 CPU: 1 COMMAND: "btrfs-transacti"
[66827185902700] [IN] PID: 5711 TASK: ffff8800399b8000 CPU: 0 COMMAND: "kworker/u4:5"
[66827181568996] [IN] PID: 6076 TASK: ffff880023660000 CPU: 1 COMMAND: "kworker/u4:1"
[66823251253622] [IN] PID: 4638 TASK: ffff88003a8e2b00 CPU: 0 COMMAND: "kworker/0:1"
[66823251246612] [IN] PID: 5547 TASK: ffff8800361b0000 CPU: 0 COMMAND: "kworker/0:4"
[66823251209329] [IN] PID: 2 TASK: ffff88003fb41580 CPU: 0 COMMAND: "kthreadd"
[66823236413494] [IN] PID: 879 TASK: ffff88003a8e5600 CPU: 1 COMMAND: "sshd"
[66818135390983] [IN] PID: 6075 TASK: ffff880023665600 CPU: 0 COMMAND: "kworker/u4:0"
[66817576151940] [IN] PID: 6167 TASK: ffff88003a8e1580 CPU: 0 COMMAND: "kworker/u4:3"
[66817573221273] [IN] PID: 6166 TASK: ffff88003fbbd600 CPU: 0 COMMAND: "kworker/u4:2"
[66799754004545] [IN] PID: 2339 TASK: ffff88003b13c080 CPU: 0 COMMAND: "sshd"
[66799753307264] [IN] PID: 2408 TASK: ffff8800399c9580 CPU: 1 COMMAND: "bash"
[66785052428665] [IN] PID: 11 TASK: ffff88003fb90000 CPU: 0 COMMAND: "migration/0"
[66784187730559] [IN] PID: 3255 TASK: ffff88003b344080 CPU: 1 COMMAND: "kworker/u5:1"
[66784184445159] [IN] PID: 553 TASK: ffff88003cc52b00 CPU: 1 COMMAND: "auditd"
[66781148806540] [IN] PID: 17 TASK: ffff88003fbbab00 CPU: 1 COMMAND: "migration/1"
[66780003032440] [IN] PID: 5601 TASK: ffff880023664080 CPU: 1 COMMAND: "kworker/1:2"
[66246988457847] [IN] PID: 576 TASK: ffff88003a908000 CPU: 1 COMMAND: "chronyd"
[64814608465097] [IN] PID: 877 TASK: ffff88003dcc4080 CPU: 1 COMMAND: "tuned"
[64058239054346] [IN] PID: 41 TASK: ffff88003df7c080 CPU: 0 COMMAND: "kswapd0"
[63973989336167] [IN] PID: 2439 TASK: ffff8800363a8000 CPU: 1 COMMAND: "sshd"
[63973988922623] [IN] PID: 2463 TASK: ffff88003a3c2b00 CPU: 1 COMMAND: "bash"
[48761462305916] [IN] PID: 323 TASK: ffff88003a2d9580 CPU: 1 COMMAND: "kworker/u5:0"
[47144992352413] [IN] PID: 2462 TASK: ffff88003a3c5600 CPU: 0 COMMAND: "su"
[47144973301719] [IN] PID: 2459 TASK: ffff88003c7bc080 CPU: 0 COMMAND: "sudo"
[47143593082031] [IN] PID: 2440 TASK: ffff88003fb42b00 CPU: 0 COMMAND: "bash"
[47141229692450] [IN] PID: 2436 TASK: ffff8800363aab00 CPU: 1 COMMAND: "sshd"
[46801273145488] [IN] PID: 2407 TASK: ffff8800399cd600 CPU: 1 COMMAND: "su"
[46801249654976] [IN] PID: 2404 TASK: ffff88003a3c0000 CPU: 1 COMMAND: "sudo"
[46799180716297] [IN] PID: 2340 TASK: ffff88003dcdab00 CPU: 0 COMMAND: "bash"
[45891490302616] [IN] PID: 2336 TASK: ffff88003dcc2b00 CPU: 1 COMMAND: "sshd"
[ 13202418030] [IN] PID: 939 TASK: ffff88003dcc5600 CPU: 1 COMMAND: "tuned"
[ 13182382728] [IN] PID: 937 TASK: ffff88003df79580 CPU: 1 COMMAND: "tuned"
[ 13166754457] [IN] PID: 936 TASK: ffff88003a8e4080 CPU: 0 COMMAND: "gmain"
[ 12728443201] [IN] PID: 874 TASK: ffff88003a90ab00 CPU: 0 COMMAND: "rsyslogd"
[ 9875184187] [IN] PID: 437 TASK: ffff8800363ac080 CPU: 1 COMMAND: "systemd-udevd"
[ 9543608280] [IN] PID: 593 TASK: ffff88003be1c080 CPU: 0 COMMAND: "agetty"
[ 5877264907] [IN] PID: 610 TASK: ffff88003a909580 CPU: 1 COMMAND: "gmain"
[ 4544673780] [IN] PID: 23 TASK: ffff88003fbf4080 CPU: 1 COMMAND: "kdevtmpfs"
[ 4535151979] [IN] PID: 596 TASK: ffff88003a989580 CPU: 1 COMMAND: "runaway-killer-"
[ 4534482665] [IN] PID: 594 TASK: ffff88003be1ab00 CPU: 0 COMMAND: "JS Sour~ Thread"
[ 4519813916] [IN] PID: 592 TASK: ffff88003be1d600 CPU: 1 COMMAND: "JS GC Helper"
[ 4491563091] [IN] PID: 584 TASK: ffff88003be18000 CPU: 0 COMMAND: "gmain"
[ 4329930894] [IN] PID: 564 TASK: ffff88003cc55600 CPU: 1 COMMAND: "dbus-daemon"
[ 4111166111] [IN] PID: 14 TASK: ffff88003fb95600 CPU: 0 COMMAND: "cpuhp/0"
[ 4102748854] [IN] PID: 15 TASK: ffff88003fbb8000 CPU: 1 COMMAND: "cpuhp/1"
[ 4023245737] [IN] PID: 502 TASK: ffff880039c02b00 CPU: 0 COMMAND: "btrfs-extent-re"
[ 4023205983] [IN] PID: 501 TASK: ffff880039c01580 CPU: 0 COMMAND: "btrfs-qgroup-re"
[ 4023166619] [IN] PID: 500 TASK: ffff880039c00000 CPU: 0 COMMAND: "btrfs-readahead"
[ 4022763948] [IN] PID: 499 TASK: ffff880039e7d600 CPU: 0 COMMAND: "btrfs-delayed-m"
[ 4022725694] [IN] PID: 498 TASK: ffff880039e7c080 CPU: 0 COMMAND: "btrfs-freespace"
[ 4022687960] [IN] PID: 497 TASK: ffff880039e7ab00 CPU: 0 COMMAND: "btrfs-endio-wri"
[ 4022649715] [IN] PID: 496 TASK: ffff880039e79580 CPU: 0 COMMAND: "btrfs-rmw"
[ 4022596237] [IN] PID: 495 TASK: ffff880039e78000 CPU: 0 COMMAND: "btrfs-endio-rep"
[ 4022553019] [IN] PID: 494 TASK: ffff880039d45600 CPU: 0 COMMAND: "btrfs-endio-rai"
[ 4022270333] [IN] PID: 493 TASK: ffff880039d44080 CPU: 0 COMMAND: "btrfs-endio-met"
[ 4022187789] [IN] PID: 492 TASK: ffff880039d42b00 CPU: 0 COMMAND: "btrfs-endio-met"
[ 4022152262] [IN] PID: 490 TASK: ffff880039d41580 CPU: 0 COMMAND: "btrfs-endio"
[ 4015271795] [IN] PID: 489 TASK: ffff880039d40000 CPU: 1 COMMAND: "btrfs-fixup"
[ 4009744449] [IN] PID: 487 TASK: ffff88003a98c080 CPU: 0 COMMAND: "btrfs-submit"
[ 4007793306] [IN] PID: 486 TASK: ffff88003a988000 CPU: 1 COMMAND: "btrfs-cache"
[ 4004354466] [IN] PID: 484 TASK: ffff88003bb8ab00 CPU: 0 COMMAND: "btrfs-flush_del"
[ 4000721584] [IN] PID: 483 TASK: ffff88003bb8d600 CPU: 1 COMMAND: "btrfs-delalloc"
[ 3998239904] [IN] PID: 482 TASK: ffff88003bb8c080 CPU: 0 COMMAND: "btrfs-worker-hi"
[ 3994395002] [IN] PID: 481 TASK: ffff88003bb89580 CPU: 0 COMMAND: "btrfs-worker"
[ 3876841558] [IN] PID: 463 TASK: ffff88003bb88000 CPU: 0 COMMAND: "nfit"
[ 3056246826] [IN] PID: 4 TASK: ffff88003fb44080 CPU: 0 COMMAND: "kworker/0:0H"
[ 3052889052] [IN] PID: 20 TASK: ffff88003fbf0000 CPU: 1 COMMAND: "kworker/1:0H"
[ 2612557318] [IN] PID: 341 TASK: ffff8800398f1580 CPU: 1 COMMAND: "btrfs-extent-re"
[ 2612263455] [IN] PID: 340 TASK: ffff8800398f0000 CPU: 1 COMMAND: "btrfs-qgroup-re"
[ 2611026509] [IN] PID: 339 TASK: ffff880039ecd600 CPU: 1 COMMAND: "btrfs-readahead"
[ 2610979062] [IN] PID: 338 TASK: ffff880039ecc080 CPU: 1 COMMAND: "btrfs-delayed-m"
[ 2610943773] [IN] PID: 337 TASK: ffff880039ecab00 CPU: 1 COMMAND: "btrfs-freespace"
[ 2610796167] [IN] PID: 336 TASK: ffff880039ec9580 CPU: 1 COMMAND: "btrfs-endio-wri"
[ 2610736280] [IN] PID: 335 TASK: ffff880039ec8000 CPU: 1 COMMAND: "btrfs-rmw"
[ 2610672683] [IN] PID: 334 TASK: ffff880039ee5600 CPU: 1 COMMAND: "btrfs-endio-rep"
[ 2610615219] [IN] PID: 333 TASK: ffff880039ee4080 CPU: 1 COMMAND: "btrfs-endio-rai"
[ 2610555397] [IN] PID: 332 TASK: ffff880039ee2b00 CPU: 1 COMMAND: "btrfs-endio-met"
[ 2610494628] [IN] PID: 331 TASK: ffff880039ee1580 CPU: 1 COMMAND: "btrfs-endio-met"
[ 2610434824] [IN] PID: 330 TASK: ffff880039ee0000 CPU: 1 COMMAND: "btrfs-endio"
[ 2610369422] [IN] PID: 329 TASK: ffff88003a911580 CPU: 1 COMMAND: "btrfs-fixup"
[ 2610309520] [IN] PID: 328 TASK: ffff88003a915600 CPU: 1 COMMAND: "btrfs-submit"
[ 2610244336] [IN] PID: 327 TASK: ffff8800361b5600 CPU: 1 COMMAND: "btrfs-cache"
[ 2610100243] [IN] PID: 326 TASK: ffff88003a8fab00 CPU: 1 COMMAND: "btrfs-flush_del"
[ 2610027400] [IN] PID: 325 TASK: ffff88003a98ab00 CPU: 1 COMMAND: "btrfs-delalloc"
[ 2609608030] [IN] PID: 324 TASK: ffff88003a2dd600 CPU: 1 COMMAND: "btrfs-worker-hi"
[ 2609443664] [IN] PID: 322 TASK: ffff88003a2d8000 CPU: 1 COMMAND: "btrfs-worker"
[ 2555519433] [IN] PID: 314 TASK: ffff88003a2dab00 CPU: 0 COMMAND: "bioset"
[ 2269801721] [IN] PID: 268 TASK: ffff88003a8f8000 CPU: 0 COMMAND: "scsi_eh_0"
[ 2257640722] [IN] PID: 271 TASK: ffff880036b5ab00 CPU: 0 COMMAND: "scsi_eh_2"
[ 2114105126] [IN] PID: 280 TASK: ffff880036931580 CPU: 0 COMMAND: "ttm_swap"
[ 2105308766] [IN] PID: 277 TASK: ffff8800361b2b00 CPU: 0 COMMAND: "bioset"
[ 2101715141] [IN] PID: 276 TASK: ffff8800361b1580 CPU: 1 COMMAND: "bioset"
[ 2095462148] [IN] PID: 274 TASK: ffff880036934080 CPU: 1 COMMAND: "vmw_pvscsi_wq_1"
[ 2090856254] [IN] PID: 273 TASK: ffff880036b58000 CPU: 0 COMMAND: "scsi_tmf_2"
[ 2090315361] [IN] PID: 272 TASK: ffff88003a8fc080 CPU: 1 COMMAND: "scsi_tmf_1"
[ 2086686421] [IN] PID: 270 TASK: ffff880036b5d600 CPU: 1 COMMAND: "scsi_eh_1"
[ 2085309360] [IN] PID: 269 TASK: ffff880036b5c080 CPU: 0 COMMAND: "scsi_tmf_0"
[ 2079346645] [IN] PID: 267 TASK: ffff88003a8fd600 CPU: 1 COMMAND: "ata_sff"
[ 1548966015] [IN] PID: 91 TASK: ffff88003a912b00 CPU: 1 COMMAND: "charger_manager"
[ 1509549094] [IN] PID: 78 TASK: ffff88003a910000 CPU: 1 COMMAND: "ipv6_addrconf"
[ 1473328109] [IN] PID: 76 TASK: ffff88003a9b0000 CPU: 0 COMMAND: "kaluad_sync"
[ 1473216616] [IN] PID: 75 TASK: ffff88003a9b1580 CPU: 0 COMMAND: "kaluad"
[ 1473136969] [IN] PID: 74 TASK: ffff88003a9b5600 CPU: 0 COMMAND: "kmpath_rdacd"
[ 1465862404] [IN] PID: 73 TASK: ffff88003a9b4080 CPU: 0 COMMAND: "acpi_thermal_pm"
[ 1424377931] [IN] PID: 72 TASK: ffff88003a9b2b00 CPU: 1 COMMAND: "kthrotld"
[ 1038363757] [IN] PID: 43 TASK: ffff88003a8e0000 CPU: 1 COMMAND: "bioset"
[ 1038299274] [IN] PID: 42 TASK: ffff88003df7d600 CPU: 1 COMMAND: "vmstat"
[ 518060696] [IN] PID: 38 TASK: ffff88003df78000 CPU: 1 COMMAND: "watchdogd"
[ 472319573] [IN] PID: 37 TASK: ffff88003dcdd600 CPU: 1 COMMAND: "devfreq_wq"
[ 472212721] [IN] PID: 36 TASK: ffff88003dcdc080 CPU: 1 COMMAND: "md"
[ 154943148] [IN] PID: 33 TASK: ffff88003dcd9580 CPU: 1 COMMAND: "kblockd"
[ 154781957] [IN] PID: 32 TASK: ffff88003dcd8000 CPU: 1 COMMAND: "bioset"
[ 154590679] [IN] PID: 31 TASK: ffff88003dcd5600 CPU: 1 COMMAND: "kintegrityd"
[ 154521031] [IN] PID: 30 TASK: ffff88003dcd4080 CPU: 1 COMMAND: "crypto"
[ 153833820] [IN] PID: 28 TASK: ffff88003dcc1580 CPU: 0 COMMAND: "ksmd"
[ 153653459] [IN] PID: 27 TASK: ffff88003dcd1580 CPU: 1 COMMAND: "kcompactd0"
[ 153524673] [IN] PID: 26 TASK: ffff88003dcd0000 CPU: 1 COMMAND: "writeback"
[ 152959346] [IN] PID: 25 TASK: ffff88003dcc0000 CPU: 0 COMMAND: "oom_reaper"
[ 144428302] [IN] PID: 24 TASK: ffff88003fbf5600 CPU: 1 COMMAND: "netns"
[ 140877813] [IN] PID: 22 TASK: ffff88003fbf2b00 CPU: 1 COMMAND: "rcuob/1"
[ 135819063] [IN] PID: 10 TASK: ffff88003fb8d600 CPU: 0 COMMAND: "rcuob/0"
[ 134951627] [IN] PID: 12 TASK: ffff88003fb91580 CPU: 0 COMMAND: "lru-add-drain"
[ 134141337] [IN] PID: 8 TASK: ffff88003fb8ab00 CPU: 0 COMMAND: "rcu_bh"
[ 0] [RU] PID: 0 TASK: ffffffff81c10500 CPU: 0 COMMAND: "swapper/0"
[ 0] [RU] PID: 0 TASK: ffff88003fb94080 CPU: 1 COMMAND: "swapper/1"
According to the log, the panic happened in interrupt context:
[66871.638531] ------------[ cut here ]------------
[66871.651583] kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c:1413!
[66871.651706] invalid opcode: 0000 [#1] SMP
[66871.651819] Modules linked in: ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vmw_vsock_vmci_transport vsock ppdev coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd vmw_balloon intel_rapl_perf pcspkr input_leds sg i2c_piix4 nfit vmw_vmci shpchp parport_pc parport acpi_cpufreq ip_tables btrfs xor raid6_pq ata_generic pata_acpi sd_mod crc32c_intel serio_raw vmwgfx drm_kms_helper syscopyarea vmxnet3 sysfillrect
[66871.652887] sysimgblt fb_sys_fops ttm drm vmw_pvscsi ata_piix libata fjes
[66871.653513] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.3-1.el7.elrepo.x86_64 #1
[66871.653663] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[66871.653916] task: ffff88003fb94080 task.stack: ffffc9000020c000
[66871.654642] RIP: 0010:vmxnet3_rq_rx_complete+0xd35/0xdf0 [vmxnet3]
[66871.654791] RSP: 0018:ffff88003fd03e10 EFLAGS: 00010297
[66871.654894] RAX: 0000000000000040 RBX: ffff88003629d1e8 RCX: ffff88003d658700
[66871.655019] RDX: 0000000000000004 RSI: 0000000000000001 RDI: 0000000000000040
[66871.655126] RBP: ffff88003fd03e88 R08: 0000000000000030 R09: 0000000000000000
[66871.655246] R10: ffff88003b3cc3c0 R11: ffff88003629c900 R12: ffff88003c09f280
[66871.655358] R13: ffff88003b3560f0 R14: ffff88003629d100 R15: 0000000000000028
[66871.655482] FS: 0000000000000000(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[66871.655603] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[66871.655717] CR2: 000055bd3ee9db28 CR3: 000000003c8ed000 CR4: 00000000001406e0
[66871.656197] Call Trace:
[66871.657426] <IRQ>
[66871.660567] ? ktime_get+0x3c/0xb0
[66871.660709] vmxnet3_poll_rx_only+0x36/0xa0 [vmxnet3]
[66871.662274] net_rx_action+0x260/0x3c0
[66871.664457] __do_softirq+0xc9/0x28c
[66871.674621] irq_exit+0xd9/0xf0
[66871.675640] do_IRQ+0x51/0xd0
[66871.676849] common_interrupt+0x93/0x93
[66871.677796] RIP: 0010:native_safe_halt+0x6/0x10
[66871.678663] RSP: 0018:ffffc9000020fe80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff1b
[66871.681640] RAX: 0000000000000000 RBX: ffff88003fb94080 RCX: 0000000000000000
[66871.682518] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[66871.683502] RBP: ffffc9000020fe80 R08: 00003cd1cbb71906 R09: 0000000000000000
[66871.684346] R10: 0000000000000204 R11: 0000000000000000 R12: 0000000000000001
[66871.685258] R13: ffff88003fb94080 R14: 0000000000000000 R15: 0000000000000000
[66871.686044] </IRQ>
[66871.686797] default_idle+0x1e/0xd0
[66871.688424] arch_cpu_idle+0xf/0x20
[66871.689185] default_idle_call+0x2c/0x40
[66871.690262] do_idle+0x158/0x200
[66871.690948] cpu_startup_entry+0x71/0x80
[66871.692150] start_secondary+0x154/0x190
[66871.693378] start_cpu+0x14/0x14
[66871.693999] Code: 78 2c 12 a0 48 c7 c7 90 43 12 a0 31 c0 44 89 4d b0 4c 89 5d b8 e8 ec d4 28 e1 4c 8b 5d b8 44 8b 4d b0 e9 8e f5 ff ff 0f 0b 0f 0b <0f> 0b 0f 0b 41 3b 96 78 01 00 00 0f 84 0a f4 ff ff 0f 0b 41 f6
[66871.696089] RIP: vmxnet3_rq_rx_complete+0xd35/0xdf0 [vmxnet3] RSP: ffff88003fd03e10
[66871.702943] ---[ end trace 07ef0fdac6ebe666 ]---
[66871.703653] Kernel panic - not syncing: Fatal exception in interrupt
[66871.708828] Kernel Offset: disabled
[66871.712014] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
Should I run "bt" on every task and look for the vmxnet3_rq_rx_complete panic stack? I don't know if I'll find one when crash cannot find it itself.
So according to log panic state happened in task "ffff88003fb94080" which is PID 0 on CPU 1 meaning "swapper/1". I have checked "irq" and nothing I find interesting is there. But I can give you "task" of that task where panic occured: PID: 0 TASK: ffff88003fb94080 CPU: 1 COMMAND: "swapper/1" struct task_struct { thread_info = { flags = 8 }, state = 0, stack = 0xffffc9000020c000, usage = { counter = 2 }, flags = 2097218, ptrace = 0, wake_entry = { next = 0x0 }, on_cpu = 1, cpu = 1, wakee_flips = 13, wakee_flip_decay_ts = 4361538059, last_wakee = 0xffff8800363ad600, wake_cpu = 1, on_rq = 1, prio = 120, static_prio = 120, normal_prio = 120, rt_priority = 0, sched_class = 0xffffffff8181cf80 <idle_sched_class>, se = { load = { weight = 1048576, inv_weight = 4194304 }, run_node = { __rb_parent_color = 1, rb_right = 0x0, rb_left = 0x0 }, group_node = { next = 0xffff88003fb94128, prev = 0xffff88003fb94128 }, on_rq = 0, exec_start = 135180636, sum_exec_runtime = 0, vruntime = 0, prev_sum_exec_runtime = 0, nr_migrations = 0, statistics = { wait_start = 0, wait_max = 0, wait_count = 0, wait_sum = 0, iowait_count = 0, iowait_sum = 0, sleep_start = 0, sleep_max = 0, sum_sleep_runtime = 0, block_start = 0, block_max = 0, exec_max = 0, slice_max = 0, nr_migrations_cold = 0, nr_failed_migrations_affine = 0, nr_failed_migrations_running = 0, nr_failed_migrations_hot = 0, nr_forced_migrations = 0, nr_wakeups = 0, nr_wakeups_sync = 0, nr_wakeups_migrate = 0, nr_wakeups_local = 0, nr_wakeups_remote = 0, nr_wakeups_affine = 0, nr_wakeups_affine_attempts = 0, nr_wakeups_passive = 0, nr_wakeups_idle = 0 }, depth = 0, parent = 0x0, cfs_rq = 0xffff88003fd19530, my_q = 0x0, avg = { last_update_time = 0, load_sum = 48887808, util_sum = 0, period_contrib = 1023, load_avg = 1024, util_avg = 0 } }, rt = { run_list = { next = 0xffff88003fb942c0, prev = 0xffff88003fb942c0 }, timeout = 0, watchdog_stamp = 0, time_slice = 100, on_rq = 0, on_list = 0, back = 0x0, parent = 0x0, rt_rq = 0xffff88003fd19670, my_q = 0x0 }, sched_task_group = 0xffffffff82015b80 <root_task_group>, dl = { rb_node = { __rb_parent_color = 18446612133383324432, rb_right = 0x0, rb_left = 0x0 }, dl_runtime = 0, dl_deadline = 0, dl_period = 0, dl_bw = 0, runtime = 0, deadline = 0, flags = 0, dl_throttled = 0, dl_boosted = 0, dl_yielded = 0, dl_timer = { node = { node = { __rb_parent_color = 18446612133383324520, rb_right = 0x0, rb_left = 0x0 }, expires = 0 }, _softexpires = 0, function = 0xffffffff810ca3d0 <dl_task_timer>, base = 0xffff88003fc12600, state = 0 '\000', is_rel = 0 '\000', start_pid = -1, start_site = 0x0, start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" } }, preempt_notifiers = { first = 0x0 }, btrace_seq = 0, policy = 0, nr_cpus_allowed = 1, cpus_allowed = { bits = {2, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 
18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615} }, sched_info = { pcount = 0, run_delay = 0, last_arrival = 0, last_queued = 0 }, tasks = { next = 0xffff88003fb41cf8, prev = 0xffffffff81c10c78 <init_task+1912> }, pushable_tasks = { prio = 140, prio_list = { next = 0xffff88003fb94810, prev = 0xffff88003fb94810 }, node_list = { next = 0xffff88003fb94820, prev = 0xffff88003fb94820 } }, pushable_dl_tasks = { __rb_parent_color = 18446612133383325744, rb_right = 0x0, rb_left = 0x0 }, mm = 0x0, active_mm = 0xffff880036057440, vmacache_seqnum = 0, vmacache = {0x0, 0x0, 0x0, 0x0}, rss_stat = { events = 0, count = {0, 0, 0, 0} }, exit_state = 0, exit_code = 0, exit_signal = 0, pdeath_signal = 0, jobctl = 0, personality = 0, sched_reset_on_fork = 0, sched_contributes_to_load = 1, sched_migrated = 0, sched_remote_wakeup = 0, in_execve = 0, in_iowait = 0, restore_sigmask = 0, memcg_may_oom = 0, memcg_kmem_skip_account = 0, atomic_flags = 0, restart_block = { fn = 0xffffffff810942a0 <do_no_restart_syscall>, { futex = { uaddr = 0x0, val = 0, flags = 0, bitset = 0, time = 0, uaddr2 = 0x0 }, nanosleep = { clockid = 0, rmtp = 0x0, compat_rmtp = 0x0, expires = 0 }, poll = { ufds = 0x0, nfds = 0, has_timeout = 0, tv_sec = 0, tv_nsec = 0 } } }, pid = 0, tgid = 0, stack_canary = 5645735404757978967, real_parent = 0xffff88003fb40000, parent = 0xffffffff81c10500 <init_task>, children = { next = 0xffff88003fb94918, prev = 0xffff88003fb94918 }, sibling = { next = 0xffff88003fb94928, prev = 0xffff88003fb94928 }, group_leader = 0xffff88003fb94080, ptraced = 
{ next = 0xffff88003fb408c0, prev = 0xffff88003fb408c0 }, ptrace_entry = { next = 0xffff88003fb408d0, prev = 0xffff88003fb408d0 }, pids = {{ node = { next = 0x0, pprev = 0x0 }, pid = 0xffffffff81c56c00 <init_struct_pid> }, { node = { next = 0x0, pprev = 0x0 }, pid = 0xffffffff81c56c00 <init_struct_pid> }, { node = { next = 0x0, pprev = 0x0 }, pid = 0xffffffff81c56c00 <init_struct_pid> }}, thread_group = { next = 0xffff88003fb949a8, prev = 0xffff88003fb949a8 }, thread_node = { next = 0xffff88003fb53750, prev = 0xffff88003fb53750 }, vfork_done = 0x0, set_child_tid = 0x0, clear_child_tid = 0x0, utime = 0, stime = 2434000000, gtime = 0, prev_cputime = { utime = 0, stime = 0, lock = { raw_lock = { val = { counter = 0 } } } }, vtime_seqcount = { sequence = 6 }, vtime_snap = 4294667340, vtime_snap_whence = VTIME_SYS, tick_dep_mask = { counter = 0 }, nvcsw = 0, nivcsw = 1140396, start_time = 44000000, real_start_time = 44000000, min_flt = 0, maj_flt = 0, cputime_expires = { utime = 0, stime = 0, sum_exec_runtime = 0 }, cpu_timers = {{ next = 0xffff88003fb94a70, prev = 0xffff88003fb94a70 }, { next = 0xffff88003fb94a80, prev = 0xffff88003fb94a80 }, { next = 0xffff88003fb94a90, prev = 0xffff88003fb94a90 }}, ptracer_cred = 0x0, real_cred = 0xffff88003fa9e9c0, cred = 0xffff88003fa9e9c0, comm = "swapper/1\000\000\000\000\000\000", nameidata = 0x0, sysvsem = { undo_list = 0x0 }, sysvshm = { shm_clist = { next = 0xffff88003fb94ad8, prev = 0xffff88003fb94ad8 } }, fs = 0xffff88003fa973c0, files = 0xffff88003facc2c0, nsproxy = 0xffffffff81c56ea0 <init_nsproxy>, signal = 0xffff88003fb53740, sighand = 0xffff88003fb4eb40, blocked = { sig = {0} }, real_blocked = { sig = {0} }, saved_sigmask = { sig = {0} }, pending = { list = { next = 0xffff88003fb94b28, prev = 0xffff88003fb94b28 }, signal = { sig = {0} } }, sas_ss_sp = 0, sas_ss_size = 0, sas_ss_flags = 2, task_works = 0x0, audit_context = 0x0, loginuid = { val = 4294967295 }, sessionid = 4294967295, seccomp = { mode = 0, filter = 0x0 }, parent_exec_id = 0, self_exec_id = 0, alloc_lock = { { rlock = { raw_lock = { val = { counter = 0 } } } } }, pi_lock = { raw_lock = { val = { counter = 0 } } }, wake_q = { next = 0x0 }, pi_waiters = { rb_node = 0x0 }, pi_waiters_leftmost = 0x0, pi_blocked_on = 0x0, journal_info = 0x0, bio_list = 0x0, plug = 0x0, reclaim_state = 0x0, backing_dev_info = 0x0, io_context = 0x0, ptrace_message = 0, last_siginfo = 0x0, ioac = { rchar = 0, wchar = 0, syscr = 0, syscw = 0, read_bytes = 0, write_bytes = 0, cancelled_write_bytes = 0 }, acct_rss_mem1 = 0, acct_vm_mem1 = 0, acct_timexpd = 0, mems_allowed = { bits = {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} }, mems_allowed_seq = { sequence = 0 }, cpuset_mem_spread_rotor = -1, cpuset_slab_spread_rotor = -1, cgroups = 0xffffffff81c8cca0 <init_css_set>, cg_list = { next = 0xffff88003fb94cd8, prev = 0xffff88003fb94cd8 }, robust_list = 0x0, compat_robust_list = 0x0, pi_state_list = { next = 0xffff88003fb94cf8, prev = 0xffff88003fb94cf8 }, pi_state_cache = 0x0, perf_event_ctxp = {0x0, 0x0}, perf_event_mutex = { owner = { counter = 0 }, wait_lock = { { rlock = { raw_lock = { val = { counter = 0 } } } } }, osq = { tail = { counter = 0 } }, wait_list = { next = 0xffff88003fb94d30, prev = 0xffff88003fb94d30 } }, perf_event_list = { next = 0xffff88003fb94d40, prev = 0xffff88003fb94d40 }, mempolicy = 0xffff88003f88d000, il_next = 0, pref_node_fork = 0, numa_scan_seq = 0, numa_scan_period = 1000, numa_scan_period_max = 0, numa_preferred_nid = -1, numa_migrate_retry = 0, node_stamp = 0, 
last_task_numa_placement = 0, last_sum_exec_runtime = 0, numa_work = { next = 0xffff88003fb94d90, func = 0x0 }, numa_entry = { next = 0x0, prev = 0x0 }, numa_group = 0x0, numa_faults = 0x0, total_numa_faults = 0, numa_faults_locality = {0, 0, 0}, numa_pages_migrated = 0, tlb_ubc = { cpumask = { bits = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} }, flush_required = false, writable = false }, rcu = { next = 0x0, func = 0x0 }, splice_pipe = 0x0, task_frag = { page = 0x0, offset = 0, size = 0 }, delays = 0xffff88003fa97380, nr_dirtied = 0, nr_dirtied_pause = 32, dirty_paused_when = 0, timer_slack_ns = 50000, default_timer_slack_ns = 50000, curr_ret_stack = -1, ret_stack = 0x0, ftrace_timestamp = 0, trace_overrun = { counter = 0 }, tracing_graph_pause = { counter = 0 }, trace = 0, trace_recursion = 0, memcg_in_oom = 0x0, memcg_oom_gfp_mask = 0, memcg_oom_order = 0, memcg_nr_pages_over_high = 0, utask = 0x0, sequential_io = 0, sequential_io_avg = 0, pagefault_disabled = 0, oom_reaper_list = 0x0, stack_vm_area = 0xffff88003fb99500, stack_refcount = { counter = 1 }, thread = { tls_array = {{ { { a = 0, b = 0 }, { limit0 = 0, base0 = 0, base1 = 0, type = 0, s = 0, dpl = 0, p = 0, limit = 0, avl = 0, l = 0, d = 0, g = 0, base2 = 0 } } }, { { { a = 0, b = 0 }, { limit0 = 0, base0 = 0, base1 = 0, type = 0, s = 0, dpl = 0, p = 0, limit = 0, avl = 0, l = 0, d = 0, g = 0, base2 = 0 } } }, { { { a = 0, b = 0 }, { limit0 = 0, base0 = 0, base1 = 0, type = 0, s = 0, dpl = 0, p = 0, limit = 0, avl = 0, l = 0, d = 0, g = 0, base2 = 0 } } }}, sp0 = 18446683600572186624, sp = 18446683600572186144, es = 0, ds = 0, fsindex = 0, gsindex = 0, status = 0, fsbase = 0, gsbase = 0, ptrace_bps = {0x0, 0x0, 0x0, 0x0}, debugreg6 = 0, ptrace_dr7 = 0, cr2 = 0, trap_nr = 6, error_code = 0, io_bitmap_ptr = 0x0, iopl = 0, io_bitmap_max = 0, addr_limit = { seg = 18446744073709551615 }, sig_on_uaccess_err = 0, uaccess_err = 0, fpu = { last_cpu = 4294967295, fpstate_active = 0 '\000', fpregs_active = 0 '\000', state = { fsave = { cwd = 0, swd = 0, twd = 0, fip = 0, fcs = 0, foo = 0, fos = 0, st_space = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, status = 0 }, fxsave = { cwd = 0, swd = 0, twd = 0, fop = 0, { { rip = 0, rdp = 0 }, { fip = 0, fcs = 0, foo = 0, fos = 0 } }, mxcsr = 0, mxcsr_mask = 0, st_space = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, xmm_space = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, padding = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { padding1 = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, sw_reserved = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} } }, soft = { cwd = 0, swd = 0, twd = 0, fip = 0, fcs = 0, foo = 0, fos = 0, st_space = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, ftop = 0 '\000', changed = 0 '\000', lookahead = 0 '\000', no_update = 0 '\000', rm = 0 '\000', alimit = 0 '\000', info = 0x0, entry_eip = 0 }, xsave = { i387 = { cwd = 0, swd = 0, twd = 0, fop = 0, { { rip = 0, rdp = 0 }, { fip = 0, fcs = 0, foo = 0, fos = 0 } }, mxcsr = 0, 
mxcsr_mask = 0, st_space = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, xmm_space = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, padding = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { padding1 = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, sw_reserved = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} } }, header = { xfeatures = 0, xcomp_bv = 0, reserved = {0, 0, 0, 0, 0, 0} }, extended_state_area = 0xffff88003fb95600 "" }, __padding = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"... } } } } struct thread_info { flags = 8 } (In reply to Remigiusz Szczepanik from comment #24) > Should I do "bt" on every task and look for vmxnet3_rq_rx_complete panic > stack? I don't know if I find one when crash cannot find it itself. try set logging on thread apply all bt i.e. redirect backtraces all threads to a file, and then grep for vmxnet3_rq_rx_complete. Once the thread is found, go back to the crash prompt: thread <thread number> bt # should show stack with vmxnet3_rq_rx_complete now run the commands to get rcd, rbi values etc. 
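
The state being requested above (rcd, rbi, rxd, ring and idx) is the driver's receive-completion bookkeeping at the moment the BUG_ON trips. A debug patch of the kind offered later in this thread might dump that state just before the BUG_ON fires. The following is a rough hypothetical sketch, not the attached debug-patch, assuming the driver's internal types from drivers/net/vmxnet3/vmxnet3_int.h; field names such as rxdIdx and buf_type come from the driver's headers.

/*
 * Hypothetical instrumentation sketch: print the rx-completion state
 * so it reaches the serial/ESXi log even when no crash dump is usable.
 */
#include <linux/kernel.h>
#include "vmxnet3_int.h"

static void vmxnet3_dump_rx_state_sketch(struct vmxnet3_rx_queue *rq,
                                         const struct Vmxnet3_RxCompDesc *rcd,
                                         const struct Vmxnet3_RxDesc *rxd,
                                         const struct vmxnet3_rx_buf_info *rbi,
                                         u32 idx, u32 ring_idx)
{
        /* completion descriptor as written back by the device */
        pr_err("vmxnet3: rcd: rqID=%u rxdIdx=%u len=%u sop=%u eop=%u err=%u\n",
               rcd->rqID, rcd->rxdIdx, rcd->len, rcd->sop, rcd->eop, rcd->err);

        /* rx descriptor and buffer info the driver posted for that slot */
        pr_err("vmxnet3: rxd: addr=%llx len=%u btype=%u gen=%u\n",
               (unsigned long long)rxd->addr, rxd->len, rxd->btype, rxd->gen);
        pr_err("vmxnet3: rbi: buf_type=%d len=%u dma_addr=%llx skb=%p\n",
               rbi->buf_type, rbi->len,
               (unsigned long long)rbi->dma_addr, rbi->skb);

        /* which queue/ring slot the completion was matched against */
        pr_err("vmxnet3: rq: qid=%u qid2=%u idx=%u ring_idx=%u\n",
               rq->qid, rq->qid2, idx, ring_idx);
}

Such a helper would be called from vmxnet3_rq_rx_complete() right before the failing BUG_ON, trading a usable crash dump for a one-shot log of the inconsistent state.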
The commands you gave didn't work, but I did the same thing another way.

Anyway, there is no "vmxnet3" keyword anywhere in any task's "bt".

I think that's because it didn't exactly crash the kernel but instead introduced some kind of loop/freeze.

(In reply to Remigiusz Szczepanik from comment #27)
> The commands you gave didn't work, but I did the same thing another way.
>
> Anyway, there is no "vmxnet3" keyword anywhere in any task's "bt".
>
> I think that's because it didn't exactly crash the kernel but instead
> introduced some kind of loop/freeze.

I thought the vmss was created after hitting the BUG_ON. Can we collect a vmss while the VM is in the BUG_ON state?

(In reply to Shrikrishna Khare from comment #28)
> I thought the vmss was created after hitting the BUG_ON. Can we collect a
> vmss while the VM is in the BUG_ON state?

Well, according to the logs it did hit the BUG_ON, but for some reason the kernel kept working for some time after that (which I believe is why I don't have a trace for this crash). I may be mistaken, though.

If you tell me what I must do to get a proper vmss, I will, but right now I'm clueless.

(In reply to Remigiusz Szczepanik from comment #29)
> Well, according to the logs it did hit the BUG_ON, but for some reason the
> kernel kept working for some time after that (which I believe is why I don't
> have a trace for this crash). I may be mistaken, though.
>
> If you tell me what I must do to get a proper vmss, I will, but right now
> I'm clueless.

That is strange. Let me think about this. In the meanwhile, maybe we can try some other tricks to get to the bottom of this:

1. Let's narrow down the patch that may have introduced this behavior. Given that you hit this issue relatively frequently (I haven't hit it even once :-( ), maybe you can try disabling one of the recently added vmxnet3 features, let the VM run, and see if the issue still happens?

ethtool -G eth0 rx-mini 0   # disable the rx data ring
ethtool -g eth0             # check that rx-mini is indeed 0

Replace eth0 with your interface name. Whether or not the bug happens again gives us a clue.

2. Would it be possible for you to apply a vmxnet3 patch and run the patched driver? In that case I can provide a debug patch; just let me know which kernel version I should generate the patch against and where I can find the sources for that kernel, and I will share a patch. That debug patch will print out several pieces of state (rcd, rxd, rbi, etc.) before hitting the BUG_ON. Not as good as a crash dump, but it should help us make progress.

Again, thank you for patiently helping with this!

Thanks,
Shri

Created attachment 255579 [details]
config-4.10.3-1.el7.elrepo.x86_64

The first option is a "what if"; the second one is actually a lot better.

I have attached the config for kernel 4.10.3. You can use it with:
https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.10.3.tar.xz

After you recompile it, please provide me with RPM packages for CentOS 7 and I will replace my currently running kernel with yours and try to "crash" it again.

I don't know if this is helpful information, but as this is a network problem I guess I should tell you a bit about my network. I believe it is essentially crashing on some kind of unexpected data in packets or frames.
On the same ESXi host I'm running pfSense (2.3.3-RELEASE-p1 (amd64)), and my two VPSes are on the LAN side of pfSense. Could it be that pfSense forwards a somehow malformed packet to the CentOS machines? I don't know; it's just a random idea.

However, if I find a way to "crash" the VM on demand, I will try to capture all packets going from pfSense to that VM so we can see whether any of them are malformed or unexpected.

Created attachment 255585 [details]
debug-patch
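For anyone who prefers to build the patched driver themselves rather than wait for an RPM, one possible sequence is sketched below. It is not from the thread: the file names (the linux-4.10.3 tree, the attached config, and debug-patch) come from the comments above, but the exact build steps depend on the distribution's kernel packaging.

# unpack the 4.10.3 sources, apply the attached config and debug patch
tar xf linux-4.10.3.tar.xz && cd linux-4.10.3
cp ../config-4.10.3-1.el7.elrepo.x86_64 .config
patch -p1 < ../debug-patch
make olddefconfig

# either build binary RPM packages for CentOS 7 ...
make -j"$(nproc)" binrpm-pkg

# ... or, if a matching 4.10.3 kernel is already running, rebuild just the vmxnet3 module
make modules_prepare
make M=drivers/net/vmxnet3 modules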
(In reply to Remigiusz Szczepanik from comment #31)
> I have attached the config for kernel 4.10.3.
> You can use it with:
> https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.10.3.tar.xz
>
> After you recompile it, please provide me with RPM packages for CentOS 7

Please find debug-patch in the attachments. It applies cleanly on 4.10.3 and will print several vmxnet3 data structures just before hitting the BUG_ON case. Would you be able to patch your vmxnet3 driver and try it out? Or else, I can try to create an RPM package for CentOS; I did not get around to trying that today.

> and I will replace my currently running kernel with yours and try to "crash"
> it again.
>
> However, if I find a way to "crash" the VM on demand, I will try to capture
> all packets going from pfSense to that VM so we can see whether any of them
> are malformed or unexpected.

Yes, that will surely help. Thanks again!

(In reply to Shrikrishna Khare from comment #33)
> Or else, I can try to create an RPM package for CentOS; I did not get around
> to trying that today.

Apparently I have very little time this week, so an .RPM would be very, very nice.

(In reply to Remigiusz Szczepanik from comment #34)
> Apparently I have very little time this week, so an .RPM would be very, very
> nice.

Ok. Will do.

Just a tip: the freeze happens frequently (but not every time) when I mistype something and run a "yum list whateverpackage" command as a non-root user. When yum is at "Determining fastest mirrors", the server freezes. It happened to me twice in a row today, but when I was prepared the third time (tcpdump running) it didn't happen.

I built a CentOS RPM with the debug patch, and just as I was about to share it, my CentOS VM got (un)lucky and I hit this issue, so I could root-cause it.

The bug is in the vmxnet3 device emulation (ESX 6.5) and will be fixed in the next update.

In the meantime, the suggested workaround is to disable the rx data ring:

ethtool -G eth? rx-mini 0

The issue should not be hit if using HW version 12 or older (with any kernel) or with a kernel older than 4.8 (any HW version).

Remigiusz, thank you for all your help in debugging and root-causing this, and for being very patient as we went through multiple iterations.

Sure, no problem. Just don't forget about me when writing the changelog :)

Will it be fixed in the next version of ESXi, or will you upstream a patch to the kernel driver? Is there an actual release date for the fix?

(In reply to Remigiusz Szczepanik from comment #38)
> Sure, no problem. Just don't forget about me when writing the changelog :)

Certainly :-). The bug (and thus the fix) is in the device emulation, though, which is part of the ESX sources and not the Linux kernel.

> Will it be fixed in the next version of ESXi, or will you upstream a patch
> to the kernel driver? Is there an actual release date for the fix?
As I said above, it is not a driver bug but an ESX bug. I don't know the timeline; I will try to find out and update.

(In reply to Shrikrishna Khare from comment #37)
> In the meantime, the suggested workaround is to disable the rx data ring:
> ethtool -G eth? rx-mini 0

To be absolutely safe from hitting this bug, the above setting must be applied before the interface has had any opportunity to receive traffic. That may not always be feasible. A safer alternative is to shut down the VM, add the line below to the .vmx file, and power it back on:

vmxnet3.rev.30 = FALSE

> The issue should not be hit if using HW version 12 or older (with any
> kernel) or with a kernel older than 4.8 (any HW version).

Is this issue fixed in ESXi 6.5d / ESXi-6.5.0-20170404001-standard (Build 5310538)?

(In reply to peter.hun from comment #41)
> Is this issue fixed in ESXi 6.5d / ESXi-6.5.0-20170404001-standard (Build
> 5310538)?

No. That was cut before the fix. The next 6.5 update should have the fix.

I just ran into this with Red Hat 6.9. A case was opened with Red Hat, and this is how I found this thread.

I have a case open with VMware and am just awaiting a response.

Has a fix been created yet?

(In reply to Dan Wall from comment #43)
> I have a case open with VMware and am just awaiting a response.
>
> Has a fix been created yet?

What is the vmxnet3 driver version in RHEL 6.9?

As mentioned above, the fix is part of ESXi and will be part of the next 6.5 update.

I can consistently reproduce this bug on Debian stretch. The bug seems to occur ONLY when LRO is enabled.

I just read through all the comments and applied both vmxnet3.rev.30 = FALSE and ethtool -G eth? rx-mini 0. Both corrected the issue as well. Waiting patiently for the next ESXi update before going to a 4.9 kernel.

(In reply to Shrikrishna Khare from comment #44)
> What is the vmxnet3 driver version in RHEL 6.9?
>
> As mentioned above, the fix is part of ESXi and will be part of the next
> 6.5 update.

Hi, I hit the same issue.

I use ESXi 6.5 ((Updated) Dell-ESXi-6.5.0-4564106-A00 (Dell)) on a Dell PowerEdge R630. I created a machine running CentOS 6.6 and then used 'yum update' to upgrade it to 6.9 ('uname -r' prints '2.6.32-696.3.2.el6.x86_64').

Sometimes it freezes several minutes after power-on, and sometimes several hours after boot.

I checked kdump and found 'PANIC: kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c:1412!'

(In reply to Wang Jingkai from comment #47)
> I use ESXi 6.5 ((Updated) Dell-ESXi-6.5.0-4564106-A00 (Dell)) on a Dell
> PowerEdge R630.
> I checked kdump and found
> 'PANIC: kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c:1412!'

Build 4564106 does not have the fix. The fix will be available in a 6.5 update release (not yet released). In the meantime, please use the workaround mentioned above.

Confirmed the update release has fixed the issue. Nice job!

(In reply to miniflowtrader from comment #49)
> Confirmed the update release has fixed the issue. Nice job!

Thank you for verifying this! I think it is OK to close this bug.

Thanks,
Shri

I'm running build 4887370 and am still experiencing this bug; is there a newer build I'm missing?

Thanks so much for the work on this in the meantime, guys.

(In reply to Ryan Breaker from comment #51)
> I'm running build 4887370 and am still experiencing this bug; is there a
> newer build I'm missing?

You want to be on 5969303, released 2017-07-27.

(In reply to miniflowtrader from comment #52)
> You want to be on 5969303, released 2017-07-27.

Oh, got it. I'm still new to VMware and am looking into updating now, thank you!

I'm running the HPE custom version of 5969303, HPE-ESXi-6.5.0-Update1-iso-650.U1.10.1.0.14, and I am still experiencing this issue with Debian Stretch. Is it possible that HPE removed this fix?

On the latest VMware ESXi 6.5.0 build-7526125 this kernel message is still printed as of kernel 4.9.75. I haven't seen an actual crash yet.

I believe this issue is resolved in vSphere/ESXi 6.5 U1, see:
https://kb.vmware.com/s/article/2151480

This bug (the freeze issue) should be closed since it is not a kernel bug. About the warning, you may want to refer to https://bugzilla.kernel.org/show_bug.cgi?id=194569

Since this is not an upstream kernel bug, closing.
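To summarize the thread for anyone landing here, the guest-side checks and the interim workaround look roughly like this (a sketch; ens192 is an example interface name, and per comment #40 the ethtool setting is only fully safe if applied before the interface has received any traffic):

uname -r                       # kernels 4.8 and newer are affected (with HW version 13)
ethtool -i ens192              # confirm the interface uses the vmxnet3 driver
ethtool -g ens192              # a non-zero "RX Mini" value means the rx data ring is in use
ethtool -G ens192 rx-mini 0    # workaround: disable the rx data ring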
Created attachment 248601 [details]
Kernel Panic log

This issue affects all virtual machines running on an ESXi 6.5 host with virtual hardware version 13: the guest freezes randomly (sometimes several minutes after power-on, and sometimes several hours after boot). I got this kernel panic log several times; the issue was possibly caused by VMXNET3.

All kernels newer than 4.8.x are affected by this issue; if I downgrade the kernel back to 4.4.x, the VMs work like a charm. (The guest OS is CentOS 7.3 with kernel-ml.)

This issue doesn't happen with virtual hardware version 11 on ESXi 6.5; it only happens with virtual hardware version 13 + ESXi 6.5.
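For completeness, the host-side checks corresponding to the description above (a sketch; the datastore path and VM name are placeholders, and the vmxnet3.rev.30 line is the workaround from comment #40, to be added only while the VM is powered off):

# on the ESXi host: virtual hardware version 13 is affected, 11 is not
grep -i 'virtualHW.version' /vmfs/volumes/<datastore>/<vm>/<vm>.vmx

# workaround: force the older vmxnet3 device revision (VM powered off)
echo 'vmxnet3.rev.30 = FALSE' >> /vmfs/volumes/<datastore>/<vm>/<vm>.vmx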