Created attachment 258289 [details]
kernel panic screenshot

Hardware:
Intel E5-2670
Network controllers:
2x 10G Intel 82599
1x 100G Mellanox ConnectX-5

Test scenario:
Packet generator -> (RX traffic on Mellanox 100G / 12 Mpps UDP, random destination IP / random port) tested HOST (TX traffic from Intel 82599 on 20 VLANs attached to ethernet) -> Sink

Kernel panic occurred 5 minutes after the packet generator started sending traffic.

Additional errors from dmesg:

[ 104.028291] ixgbe 0000:06:00.0 enp6s0f0: Detected Tx Unit Hang Tx Queue <1> TDH, TDT <d7>, <bf> next_to_use <bf> next_to_clean <d7> tx_buffer_info[next_to_clean] time_stamp <ffff3bf3> jiffies <ffff40a0>
[ 104.028293] ixgbe 0000:06:00.0 enp6s0f0: tx hang 1 detected on queue 1, resetting adapter
[ 115.676061] ------------[ cut here ]------------
[ 115.676067] WARNING: CPU: 21 PID: 116 at net/sched/sch_generic.c:320 dev_watchdog+0xc5/0x122
[ 115.676067] Modules linked in: ipmi_si x86_pkg_temp_thermal
[ 115.676072] CPU: 21 PID: 116 Comm: ksoftirqd/21 Not tainted 4.13.0+ #3
[ 115.676073] task: ffff88046d40e800 task.stack: ffffc90003640000
[ 115.676074] RIP: 0010:dev_watchdog+0xc5/0x122
[ 115.676075] RSP: 0018:ffffc90003643da0 EFLAGS: 00010286
[ 115.676076] RAX: 000000000000003e RBX: ffff88086d010000 RCX: 0000000000000000
[ 115.676077] RDX: ffff88046fd53310 RSI: ffff88046fd4ca98 RDI: ffff88046fd4ca98
[ 115.676078] RBP: ffffc90003643db0 R08: 0000000000000000 R09: 0000000000000000
[ 115.676079] R10: ffffc90003643e38 R11: ffffffff81e27ddc R12: 0000000000000015
[ 115.676079] R13: ffffffff8164fdfa R14: ffff88086d010438 R15: ffff88086d010000
[ 115.676081] FS: 0000000000000000(0000) GS:ffff88046fd40000(0000) knlGS:0000000000000000
[ 115.676081] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 115.676082] CR2: 00007f225204e8c8 CR3: 000000086a6ff004 CR4: 00000000000606e0
[ 115.676083] Call Trace:
[ 115.676087] call_timer_fn+0x5b/0x123
[ 115.676089] run_timer_softirq+0x13b/0x15e
[ 115.676091] ? pick_next_task_fair+0x1ad/0x2f6
[ 115.676095] __do_softirq+0xe4/0x23a
[ 115.676097] ? sort_range+0x1d/0x1d
[ 115.676101] run_ksoftirqd+0x15/0x2a
[ 115.676102] smpboot_thread_fn+0x128/0x13f
[ 115.676104] kthread+0xf7/0xfc
[ 115.676106] ? __init_completion+0x24/0x24
[ 115.676109] ret_from_fork+0x22/0x30
[ 115.676109] Code: 0f a8 69 00 00 75 38 48 89 df c6 05 03 a8 69 00 01 e8 cb f5 fd ff 44 89 e1 48 89 de 48 c7 c7 0a 9e ae 81 48 89 c2 e8 7a 16 a2 ff <0f> ff eb 10 41 ff c4 48 05 40 01 00 00 41 39 f4 75 9a eb 0d 48
[ 115.676128] ---[ end trace 4d5f38095306e2dd ]---
[ 115.676131] ixgbe 0000:06:00.0 enp6s0f0: initiating reset due to tx timeout
[ 115.676212] ixgbe 0000:06:00.0 enp6s0f0: Reset adapter
[ 116.068428] ixgbe 0000:06:00.0 enp6s0f0: detected SFP+: 3
[ 116.217023] ixgbe 0000:06:00.0 enp6s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[ 120.444122] ixgbe 0000:06:00.0 enp6s0f0: Detected Tx Unit Hang Tx Queue <20> TDH, TDT <d7>, <39> next_to_use <39> next_to_clean <d7> tx_buffer_info[next_to_clean] time_stamp <ffff4ca7> jiffies <ffff50a8>
[ 120.444127] ixgbe 0000:06:00.0 enp6s0f0: tx hang 5 detected on queue 20, resetting adapter
[ 120.444128] ixgbe 0000:06:00.0 enp6s0f0: initiating reset due to tx timeout
[ 120.444135] ixgbe 0000:06:00.0 enp6s0f0: Reset adapter
[ 120.848348] ixgbe 0000:06:00.0 enp6s0f0: detected SFP+: 3
[ 121.001485] ixgbe 0000:06:00.0 enp6s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[ 2929.465026] ------------[ cut here ]------------
[ 2929.465035] WARNING: CPU: 5 PID: 35 at kernel/rcu/tree.c:2715 rcu_process_callbacks+0x327/0x36f
[ 2929.465035] Modules linked in: ipmi_si x86_pkg_temp_thermal
[ 2929.465040] CPU: 5 PID: 35 Comm: ksoftirqd/5 Tainted: G W 4.13.0+ #3
[ 2929.465042] task: ffff88046dadb400 task.stack: ffffc900033b8000
[ 2929.465044] RIP: 0010:rcu_process_callbacks+0x327/0x36f
[ 2929.465045] RSP: 0018:ffffc900033bbe00 EFLAGS: 00010002
[ 2929.465047] RAX: 0000000000000246 RBX: ffff88046fb59840 RCX: ffffffffffffd801
[ 2929.465048] RDX: 0000000000000000 RSI: ffffc900033bbe18 RDI: ffff88046fb59878
[ 2929.465049] RBP: ffffc900033bbe60 R08: 0000000000000001 R09: ffffffff81633d20
[ 2929.465050] R10: ffffc900033bbda8 R11: ffffea0011852c80 R12: ffffffff81c41b00
[ 2929.465051] R13: ffffc900033bbe18 R14: ffff88046fb59878 R15: ffff88046dadb400
[ 2929.465053] FS: 0000000000000000(0000) GS:ffff88046fb40000(0000) knlGS:0000000000000000
[ 2929.465054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2929.465055] CR2: 00007fca94a07349 CR3: 000000086a6ff000 CR4: 00000000000606e0
[ 2929.465056] Call Trace:
[ 2929.465060] ? net_rx_action+0x21e/0x22d
[ 2929.465064] __do_softirq+0xe4/0x23a
[ 2929.465066] ? sort_range+0x1d/0x1d
[ 2929.465070] run_ksoftirqd+0x15/0x2a
[ 2929.465071] smpboot_thread_fn+0x128/0x13f
[ 2929.465074] kthread+0xf7/0xfc
[ 2929.465075] ? __init_completion+0x24/0x24
[ 2929.465078] ret_from_fork+0x22/0x30
[ 2929.465079] Code: 8b 8b 90 00 00 00 48 2b 0d b7 54 bc 00 48 39 ca 7d 07 48 89 93 90 00 00 00 48 83 7b 38 00 0f 94 c1 48 85 d2 0f 94 c2 38 d1 74 02 <0f> ff 50 9d 4c 89 f7 e8 7e 22 00 00 84 c0 74 09 e8 06 f1 ff ff
[ 2929.465107] ---[ end trace 4d5f38095306e2de ]---
[ 2929.509019] ixgbe 0000:06:00.0 enp6s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[ 2929.512261] BUG: Bad page state in process ksoftirqd/21 pfn:45c020
[ 2929.512264] page:ffffea0011700800 count:-1 mapcount:0 mapping: (null) index:0x0
[ 2929.512947] flags: 0x2000000000000000()
[ 2929.513202] raw: 2000000000000000 0000000000000000 0000000000000000 ffffffffffffffff
[ 2929.513204] raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
[ 2929.513205] page dumped because: nonzero _count
[ 2929.513205] Modules linked in: ipmi_si x86_pkg_temp_thermal
[ 2929.513210] CPU: 21 PID: 116 Comm: ksoftirqd/21 Tainted: G W 4.13.0+ #3
[ 2929.513210] Call Trace:
[ 2929.513215] dump_stack+0x4d/0x63
[ 2929.513218] bad_page+0xf3/0x10f
[ 2929.513220] check_new_page_bad+0x73/0x75
[ 2929.513222] get_page_from_freelist+0x419/0x63f
[ 2929.513225] __alloc_pages_nodemask+0xef/0x182
[ 2929.513227] page_frag_alloc+0x38/0x10c
[ 2929.513230] __napi_alloc_skb+0x6a/0xbd
[ 2929.513234] mlx5e_handle_rx_cqe_mpwrq+0x91/0x388
[ 2929.513236] mlx5e_poll_rx_cq+0x113/0x171
[ 2929.513238] mlx5e_napi_poll+0x81/0x149
[ 2929.513240] net_rx_action+0xd3/0x22d
[ 2929.513242] __do_softirq+0xe4/0x23a
[ 2929.513244] ? sort_range+0x1d/0x1d
[ 2929.513247] run_ksoftirqd+0x15/0x2a
[ 2929.513248] smpboot_thread_fn+0x128/0x13f
[ 2929.513250] kthread+0xf7/0xfc
[ 2929.513252] ? __init_completion+0x24/0x24
[ 2929.513254] ret_from_fork+0x22/0x30
[ 2929.513255] Disabling lock debugging due to kernel taint
[ 2929.514165] BUG: Bad page state in process ksoftirqd/16 pfn:45d020
[ 2929.514168] page:ffffea0011740800 count:-1 mapcount:0 mapping: (null) index:0x0
[ 2929.514640] flags: 0x2000000000000000()
[ 2929.514898] raw: 2000000000000000 0000000000000000 0000000000000000 ffffffffffffffff
[ 2929.514900] raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
[ 2929.514901] page dumped because: nonzero _count
[ 2929.514901] Modules linked in: ipmi_si x86_pkg_temp_thermal
[ 2929.514905] CPU: 16 PID: 91 Comm: ksoftirqd/16 Tainted: G B W 4.13.0+ #3
[ 2929.514906] Call Trace:
[ 2929.514910] dump_stack+0x4d/0x63
[ 2929.514913] bad_page+0xf3/0x10f
[ 2929.514915] check_new_page_bad+0x73/0x75
[ 2929.514917] get_page_from_freelist+0x419/0x63f
[ 2929.514919] __alloc_pages_nodemask+0xef/0x182
[ 2929.514922] page_frag_alloc+0x38/0x10c
[ 2929.514924] __napi_alloc_skb+0x6a/0xbd
[ 2929.514927] mlx5e_handle_rx_cqe_mpwrq+0x91/0x388
[ 2929.514930] mlx5e_poll_rx_cq+0x113/0x171
[ 2929.514932] mlx5e_napi_poll+0x81/0x149
[ 2929.514934] net_rx_action+0xd3/0x22d
[ 2929.514936] __do_softirq+0xe4/0x23a
[ 2929.514938] ? sort_range+0x1d/0x1d
[ 2929.514941] run_ksoftirqd+0x15/0x2a
[ 2929.514942] smpboot_thread_fn+0x128/0x13f
[ 2929.514944] kthread+0xf7/0xfc
[ 2929.514946] ? __init_completion+0x24/0x24
[ 2929.514948] ret_from_fork+0x22/0x30
[ 2929.517505] BUG: Bad page state in process ksoftirqd/17 pfn:46a320
[ 2929.517508] page:ffffea0011a8c800 count:-1 mapcount:0 mapping: (null) index:0x0
[ 2929.517985] flags: 0x2000000000000000()
[ 2929.518247] raw: 2000000000000000 0000000000000000 0000000000000000 ffffffffffffffff
[ 2929.518248] raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
[ 2929.518249] page dumped because: nonzero _count
[ 2929.518250] Modules linked in: ipmi_si x86_pkg_temp_thermal
[ 2929.518254] CPU: 17 PID: 96 Comm: ksoftirqd/17 Tainted: G B W 4.13.0+ #3
[ 2929.518255] Call Trace:
[ 2929.518258] dump_stack+0x4d/0x63
[ 2929.518261] bad_page+0xf3/0x10f
[ 2929.518263] check_new_page_bad+0x73/0x75
[ 2929.518265] get_page_from_freelist+0x419/0x63f
[ 2929.518267] __alloc_pages_nodemask+0xef/0x182
[ 2929.518269] page_frag_alloc+0x38/0x10c
[ 2929.518272] __napi_alloc_skb+0x6a/0xbd
[ 2929.518274] mlx5e_handle_rx_cqe_mpwrq+0x91/0x388
[ 2929.518276] ? skb_release_all+0x1f/0x22
[ 2929.518278] mlx5e_poll_rx_cq+0x113/0x171
[ 2929.518280] mlx5e_napi_poll+0x81/0x149
[ 2929.518282] net_rx_action+0xd3/0x22d
[ 2929.518284] __do_softirq+0xe4/0x23a
[ 2929.518286] ? sort_range+0x1d/0x1d
[ 2929.518289] run_ksoftirqd+0x15/0x2a
[ 2929.518290] smpboot_thread_fn+0x128/0x13f
[ 2929.518292] kthread+0xf7/0xfc
[ 2929.518293] ? __init_completion+0x24/0x24
[ 2929.518295] ret_from_fork+0x22/0x30
[ 2931.740134] ixgbe 0000:06:00.0 enp6s0f0: Detected Tx Unit Hang Tx Queue <27> TDH, TDT <167>, <153> next_to_use <153> next_to_clean <167> tx_buffer_info[next_to_clean] time_stamp <1000a07f2> jiffies <1000a0a10>
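
For anyone triaging the log above, the ixgbe "Detected Tx Unit Hang" reports can be pulled out of dmesg text with a small script. This is only a rough sketch and not part of the original report: the names `TxHang` and `parse_tx_hangs` are made up here, and it assumes the queue number is printed in decimal (it matches the "tx hang ... detected on queue N" lines) while TDH, TDT, time_stamp and jiffies are hexadecimal. The regex tolerates arbitrary whitespace, so it should work whether the hang message is on one line or spread over several.

```python
import re
from typing import List, NamedTuple


class TxHang(NamedTuple):
    """One ixgbe Tx hang report (hypothetical helper type)."""
    seconds: float      # dmesg timestamp of the report
    device: str         # interface name, e.g. enp6s0f0
    queue: int          # Tx queue index (assumed decimal)
    tdh: int            # hardware head pointer (hex in the log)
    tdt: int            # hardware tail pointer (hex in the log)
    age_jiffies: int    # jiffies - time_stamp of the stuck buffer


# Matches the hang message as it appears in the log above.
_HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+ixgbe\s+\S+\s+(?P<dev>[^\s:]+):\s+"
    r"Detected Tx Unit Hang\s+"
    r"Tx Queue\s+<(?P<queue>\d+)>\s+"
    r"TDH, TDT\s+<(?P<tdh>[0-9a-f]+)>,\s+<(?P<tdt>[0-9a-f]+)>\s+"
    r"next_to_use\s+<[0-9a-f]+>\s+"
    r"next_to_clean\s+<[0-9a-f]+>\s+"
    r"tx_buffer_info\[next_to_clean\]\s+"
    r"time_stamp\s+<(?P<stamp>[0-9a-f]+)>\s+"
    r"jiffies\s+<(?P<jiffies>[0-9a-f]+)>"
)


def parse_tx_hangs(text: str) -> List[TxHang]:
    """Return every Tx hang report found in a dmesg dump."""
    hangs = []
    for m in _HANG_RE.finditer(text):
        stamp = int(m.group("stamp"), 16)
        now = int(m.group("jiffies"), 16)
        hangs.append(
            TxHang(
                seconds=float(m.group("ts")),
                device=m.group("dev"),
                queue=int(m.group("queue")),
                tdh=int(m.group("tdh"), 16),
                tdt=int(m.group("tdt"), 16),
                age_jiffies=now - stamp,
            )
        )
    return hangs


# First hang report from the log above, copied verbatim.
SAMPLE = (
    "[ 104.028291] ixgbe 0000:06:00.0 enp6s0f0: Detected Tx Unit Hang "
    "Tx Queue <1> TDH, TDT <d7>, <bf> next_to_use <bf> next_to_clean <d7> "
    "tx_buffer_info[next_to_clean] time_stamp <ffff3bf3> jiffies <ffff40a0>"
)

if __name__ == "__main__":
    for hang in parse_tx_hangs(SAMPLE):
        print(hang)
```

The age is reported in jiffies only; converting it to seconds needs the kernel's HZ value, which is not stated in this report.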

Created attachment 258299 [details]
Another panic

Same kernel config.
Same test.
Just upgraded the kernel (11.09.2017 9:30) to the latest net-next from git:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git