Bug 196889 - kernel panic net_rx_action+0x21e/0x22d
Summary: kernel panic net_rx_action+0x21e/0x22d
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: Intel Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-10 12:21 UTC by Pawel Staszewski
Modified: 2017-09-11 11:42 UTC
CC List: 0 users

See Also:
Kernel Version: 4.13.0+
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel panic screenshot (374.84 KB, image/jpeg)
2017-09-10 12:21 UTC, Pawel Staszewski
Another panic (1.59 MB, image/png)
2017-09-11 11:42 UTC, Pawel Staszewski

Description Pawel Staszewski 2017-09-10 12:21:13 UTC
Created attachment 258289
kernel panic screenshot

Hardware:
Intel E5-2670
Network controllers
2x 10G Intel 82599
1x 100G Mellanox Connectx-5

Test scenario:
Packet generator -> (RX traffic on Mellanox 100G: 12 Mpps UDP, random destination IP, random port) Tested host (TX traffic from Intel 82599 on 20 VLANs attached to ethernet) -> Sink



The kernel panic occurred about 5 minutes after the packet generator started sending traffic.
Comment 1 Pawel Staszewski 2017-09-10 13:05:16 UTC
Additional errors from dmesg:

[  104.028291] ixgbe 0000:06:00.0 enp6s0f0: Detected Tx Unit Hang
                 Tx Queue             <1>
                 TDH, TDT             <d7>, <bf>
                 next_to_use          <bf>
                 next_to_clean        <d7>
               tx_buffer_info[next_to_clean]
                 time_stamp           <ffff3bf3>
                 jiffies              <ffff40a0>
[  104.028293] ixgbe 0000:06:00.0 enp6s0f0: tx hang 1 detected on queue 1, resetting adapter

[  115.676061] ------------[ cut here ]------------
[  115.676067] WARNING: CPU: 21 PID: 116 at net/sched/sch_generic.c:320 dev_watchdog+0xc5/0x122
[  115.676067] Modules linked in: ipmi_si x86_pkg_temp_thermal
[  115.676072] CPU: 21 PID: 116 Comm: ksoftirqd/21 Not tainted 4.13.0+ #3
[  115.676073] task: ffff88046d40e800 task.stack: ffffc90003640000
[  115.676074] RIP: 0010:dev_watchdog+0xc5/0x122
[  115.676075] RSP: 0018:ffffc90003643da0 EFLAGS: 00010286
[  115.676076] RAX: 000000000000003e RBX: ffff88086d010000 RCX: 0000000000000000
[  115.676077] RDX: ffff88046fd53310 RSI: ffff88046fd4ca98 RDI: ffff88046fd4ca98
[  115.676078] RBP: ffffc90003643db0 R08: 0000000000000000 R09: 0000000000000000
[  115.676079] R10: ffffc90003643e38 R11: ffffffff81e27ddc R12: 0000000000000015
[  115.676079] R13: ffffffff8164fdfa R14: ffff88086d010438 R15: ffff88086d010000
[  115.676081] FS:  0000000000000000(0000) GS:ffff88046fd40000(0000) knlGS:0000000000000000
[  115.676081] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  115.676082] CR2: 00007f225204e8c8 CR3: 000000086a6ff004 CR4: 00000000000606e0
[  115.676083] Call Trace:
[  115.676087]  call_timer_fn+0x5b/0x123
[  115.676089]  run_timer_softirq+0x13b/0x15e
[  115.676091]  ? pick_next_task_fair+0x1ad/0x2f6
[  115.676095]  __do_softirq+0xe4/0x23a
[  115.676097]  ? sort_range+0x1d/0x1d
[  115.676101]  run_ksoftirqd+0x15/0x2a
[  115.676102]  smpboot_thread_fn+0x128/0x13f
[  115.676104]  kthread+0xf7/0xfc
[  115.676106]  ? __init_completion+0x24/0x24
[  115.676109]  ret_from_fork+0x22/0x30
[  115.676109] Code: 0f a8 69 00 00 75 38 48 89 df c6 05 03 a8 69 00 01 e8 cb f5 fd ff 44 89 e1 48 89 de 48 c7 c7 0a 9e ae 81 48 89 c2 e8 7a 16 a2 ff <0f> ff eb 10 41 ff c4 48 05 40 01 00 00 41 39 f4 75 9a eb 0d 48
[  115.676128] ---[ end trace 4d5f38095306e2dd ]---
[  115.676131] ixgbe 0000:06:00.0 enp6s0f0: initiating reset due to tx timeout
[  115.676212] ixgbe 0000:06:00.0 enp6s0f0: Reset adapter
[  116.068428] ixgbe 0000:06:00.0 enp6s0f0: detected SFP+: 3
[  116.217023] ixgbe 0000:06:00.0 enp6s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[  120.444122] ixgbe 0000:06:00.0 enp6s0f0: Detected Tx Unit Hang
                 Tx Queue             <20>
                 TDH, TDT             <d7>, <39>
                 next_to_use          <39>
                 next_to_clean        <d7>
               tx_buffer_info[next_to_clean]
                 time_stamp           <ffff4ca7>
                 jiffies              <ffff50a8>
[  120.444127] ixgbe 0000:06:00.0 enp6s0f0: tx hang 5 detected on queue 20, resetting adapter
[  120.444128] ixgbe 0000:06:00.0 enp6s0f0: initiating reset due to tx timeout
[  120.444135] ixgbe 0000:06:00.0 enp6s0f0: Reset adapter
[  120.848348] ixgbe 0000:06:00.0 enp6s0f0: detected SFP+: 3
[  121.001485] ixgbe 0000:06:00.0 enp6s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
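
The "Detected Tx Unit Hang" blocks above can be read more easily once the hex fields are decoded. The following is a minimal sketch, not part of the original report: it parses one block, then computes how many descriptors sit between the hardware head (TDH) and the software tail (TDT) and how many jiffies have passed since the stuck buffer was time-stamped. The function name `decode_tx_hang` is hypothetical, and `ring_size=512` is an assumption, because the configured Tx ring size is not stated in the report. Converting ticks to seconds would also need the kernel's HZ value, which the report does not give.

```python
import re

# First "Detected Tx Unit Hang" block from the dmesg output above, copied verbatim.
SAMPLE = """\
[  104.028291] ixgbe 0000:06:00.0 enp6s0f0: Detected Tx Unit Hang
                 Tx Queue             <1>
                 TDH, TDT             <d7>, <bf>
                 next_to_use          <bf>
                 next_to_clean        <d7>
               tx_buffer_info[next_to_clean]
                 time_stamp           <ffff3bf3>
                 jiffies              <ffff40a0>
"""


def decode_tx_hang(block, ring_size=512):
    """Decode one hang block.

    ring_size is an ASSUMPTION (not taken from the log); pass the real
    Tx ring size if it is known.
    """
    def field(name, base):
        # Match "<name>   <value>" and convert the value from the given base.
        m = re.search(re.escape(name) + r"\s+<([0-9a-f]+)>", block)
        return int(m.group(1), base)

    m = re.search(r"TDH, TDT\s+<([0-9a-f]+)>, <([0-9a-f]+)>", block)
    tdh, tdt = int(m.group(1), 16), int(m.group(2), 16)

    time_stamp = field("time_stamp", 16)
    jiffies = field("jiffies", 16)

    return {
        "queue": field("Tx Queue", 10),          # queue index is printed in decimal
        "tdh": tdh,
        "tdt": tdt,
        # Descriptors handed to the NIC but not yet completed (ring wrap-around).
        "pending": (tdt - tdh) % ring_size,
        # Age of the oldest uncleaned buffer, in jiffies (ticks, not seconds).
        "stalled_ticks": jiffies - time_stamp,
    }


if __name__ == "__main__":
    print(decode_tx_hang(SAMPLE))
```

Under the assumed ring size, the first block decodes to a nearly full ring on queue 1, with the oldest buffer more than a thousand ticks old. The same function can be applied to the other hang blocks in this comment.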


[ 2929.465026] ------------[ cut here ]------------
[ 2929.465035] WARNING: CPU: 5 PID: 35 at kernel/rcu/tree.c:2715 rcu_process_callbacks+0x327/0x36f
[ 2929.465035] Modules linked in: ipmi_si x86_pkg_temp_thermal
[ 2929.465040] CPU: 5 PID: 35 Comm: ksoftirqd/5 Tainted: G        W       4.13.0+ #3
[ 2929.465042] task: ffff88046dadb400 task.stack: ffffc900033b8000
[ 2929.465044] RIP: 0010:rcu_process_callbacks+0x327/0x36f
[ 2929.465045] RSP: 0018:ffffc900033bbe00 EFLAGS: 00010002
[ 2929.465047] RAX: 0000000000000246 RBX: ffff88046fb59840 RCX: ffffffffffffd801
[ 2929.465048] RDX: 0000000000000000 RSI: ffffc900033bbe18 RDI: ffff88046fb59878
[ 2929.465049] RBP: ffffc900033bbe60 R08: 0000000000000001 R09: ffffffff81633d20
[ 2929.465050] R10: ffffc900033bbda8 R11: ffffea0011852c80 R12: ffffffff81c41b00
[ 2929.465051] R13: ffffc900033bbe18 R14: ffff88046fb59878 R15: ffff88046dadb400
[ 2929.465053] FS:  0000000000000000(0000) GS:ffff88046fb40000(0000) knlGS:0000000000000000
[ 2929.465054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2929.465055] CR2: 00007fca94a07349 CR3: 000000086a6ff000 CR4: 00000000000606e0
[ 2929.465056] Call Trace:
[ 2929.465060]  ? net_rx_action+0x21e/0x22d
[ 2929.465064]  __do_softirq+0xe4/0x23a
[ 2929.465066]  ? sort_range+0x1d/0x1d
[ 2929.465070]  run_ksoftirqd+0x15/0x2a
[ 2929.465071]  smpboot_thread_fn+0x128/0x13f
[ 2929.465074]  kthread+0xf7/0xfc
[ 2929.465075]  ? __init_completion+0x24/0x24
[ 2929.465078]  ret_from_fork+0x22/0x30
[ 2929.465079] Code: 8b 8b 90 00 00 00 48 2b 0d b7 54 bc 00 48 39 ca 7d 07 48 89 93 90 00 00 00 48 83 7b 38 00 0f 94 c1 48 85 d2 0f 94 c2 38 d1 74 02 <0f> ff 50 9d 4c 89 f7 e8 7e 22 00 00 84 c0 74 09 e8 06 f1 ff ff
[ 2929.465107] ---[ end trace 4d5f38095306e2de ]---
[ 2929.509019] ixgbe 0000:06:00.0 enp6s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[ 2929.512261] BUG: Bad page state in process ksoftirqd/21  pfn:45c020
[ 2929.512264] page:ffffea0011700800 count:-1 mapcount:0 mapping:          (null) index:0x0
[ 2929.512947] flags: 0x2000000000000000()
[ 2929.513202] raw: 2000000000000000 0000000000000000 0000000000000000 ffffffffffffffff
[ 2929.513204] raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
[ 2929.513205] page dumped because: nonzero _count
[ 2929.513205] Modules linked in: ipmi_si x86_pkg_temp_thermal
[ 2929.513210] CPU: 21 PID: 116 Comm: ksoftirqd/21 Tainted: G        W       4.13.0+ #3
[ 2929.513210] Call Trace:
[ 2929.513215]  dump_stack+0x4d/0x63
[ 2929.513218]  bad_page+0xf3/0x10f
[ 2929.513220]  check_new_page_bad+0x73/0x75
[ 2929.513222]  get_page_from_freelist+0x419/0x63f
[ 2929.513225]  __alloc_pages_nodemask+0xef/0x182
[ 2929.513227]  page_frag_alloc+0x38/0x10c
[ 2929.513230]  __napi_alloc_skb+0x6a/0xbd
[ 2929.513234]  mlx5e_handle_rx_cqe_mpwrq+0x91/0x388
[ 2929.513236]  mlx5e_poll_rx_cq+0x113/0x171
[ 2929.513238]  mlx5e_napi_poll+0x81/0x149
[ 2929.513240]  net_rx_action+0xd3/0x22d
[ 2929.513242]  __do_softirq+0xe4/0x23a
[ 2929.513244]  ? sort_range+0x1d/0x1d
[ 2929.513247]  run_ksoftirqd+0x15/0x2a
[ 2929.513248]  smpboot_thread_fn+0x128/0x13f
[ 2929.513250]  kthread+0xf7/0xfc
[ 2929.513252]  ? __init_completion+0x24/0x24
[ 2929.513254]  ret_from_fork+0x22/0x30
[ 2929.513255] Disabling lock debugging due to kernel taint
[ 2929.514165] BUG: Bad page state in process ksoftirqd/16  pfn:45d020
[ 2929.514168] page:ffffea0011740800 count:-1 mapcount:0 mapping:          (null) index:0x0
[ 2929.514640] flags: 0x2000000000000000()
[ 2929.514898] raw: 2000000000000000 0000000000000000 0000000000000000 ffffffffffffffff
[ 2929.514900] raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
[ 2929.514901] page dumped because: nonzero _count
[ 2929.514901] Modules linked in: ipmi_si x86_pkg_temp_thermal
[ 2929.514905] CPU: 16 PID: 91 Comm: ksoftirqd/16 Tainted: G    B   W       4.13.0+ #3
[ 2929.514906] Call Trace:
[ 2929.514910]  dump_stack+0x4d/0x63
[ 2929.514913]  bad_page+0xf3/0x10f
[ 2929.514915]  check_new_page_bad+0x73/0x75
[ 2929.514917]  get_page_from_freelist+0x419/0x63f
[ 2929.514919]  __alloc_pages_nodemask+0xef/0x182
[ 2929.514922]  page_frag_alloc+0x38/0x10c
[ 2929.514924]  __napi_alloc_skb+0x6a/0xbd
[ 2929.514927]  mlx5e_handle_rx_cqe_mpwrq+0x91/0x388
[ 2929.514930]  mlx5e_poll_rx_cq+0x113/0x171
[ 2929.514932]  mlx5e_napi_poll+0x81/0x149
[ 2929.514934]  net_rx_action+0xd3/0x22d
[ 2929.514936]  __do_softirq+0xe4/0x23a
[ 2929.514938]  ? sort_range+0x1d/0x1d
[ 2929.514941]  run_ksoftirqd+0x15/0x2a
[ 2929.514942]  smpboot_thread_fn+0x128/0x13f
[ 2929.514944]  kthread+0xf7/0xfc
[ 2929.514946]  ? __init_completion+0x24/0x24
[ 2929.514948]  ret_from_fork+0x22/0x30
[ 2929.517505] BUG: Bad page state in process ksoftirqd/17  pfn:46a320
[ 2929.517508] page:ffffea0011a8c800 count:-1 mapcount:0 mapping:          (null) index:0x0
[ 2929.517985] flags: 0x2000000000000000()
[ 2929.518247] raw: 2000000000000000 0000000000000000 0000000000000000 ffffffffffffffff
[ 2929.518248] raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
[ 2929.518249] page dumped because: nonzero _count
[ 2929.518250] Modules linked in: ipmi_si x86_pkg_temp_thermal
[ 2929.518254] CPU: 17 PID: 96 Comm: ksoftirqd/17 Tainted: G    B   W       4.13.0+ #3
[ 2929.518255] Call Trace:
[ 2929.518258]  dump_stack+0x4d/0x63
[ 2929.518261]  bad_page+0xf3/0x10f
[ 2929.518263]  check_new_page_bad+0x73/0x75
[ 2929.518265]  get_page_from_freelist+0x419/0x63f
[ 2929.518267]  __alloc_pages_nodemask+0xef/0x182
[ 2929.518269]  page_frag_alloc+0x38/0x10c
[ 2929.518272]  __napi_alloc_skb+0x6a/0xbd
[ 2929.518274]  mlx5e_handle_rx_cqe_mpwrq+0x91/0x388
[ 2929.518276]  ? skb_release_all+0x1f/0x22
[ 2929.518278]  mlx5e_poll_rx_cq+0x113/0x171
[ 2929.518280]  mlx5e_napi_poll+0x81/0x149
[ 2929.518282]  net_rx_action+0xd3/0x22d
[ 2929.518284]  __do_softirq+0xe4/0x23a
[ 2929.518286]  ? sort_range+0x1d/0x1d
[ 2929.518289]  run_ksoftirqd+0x15/0x2a
[ 2929.518290]  smpboot_thread_fn+0x128/0x13f
[ 2929.518292]  kthread+0xf7/0xfc
[ 2929.518293]  ? __init_completion+0x24/0x24
[ 2929.518295]  ret_from_fork+0x22/0x30
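
To compare the "Bad page state" reports above at a glance, a small script can pull out the process, page frame number, and reference count from each one. This is a minimal sketch under the assumption that the "page:" line directly follows the "BUG:" line, as it does in this log. The helper name `bad_pages` is hypothetical, and the sample input contains only the two relevant lines of each report.

```python
import re

# "BUG:" and "page:" lines of the three reports above, copied verbatim.
SAMPLE = """\
[ 2929.512261] BUG: Bad page state in process ksoftirqd/21  pfn:45c020
[ 2929.512264] page:ffffea0011700800 count:-1 mapcount:0 mapping:          (null) index:0x0
[ 2929.514165] BUG: Bad page state in process ksoftirqd/16  pfn:45d020
[ 2929.514168] page:ffffea0011740800 count:-1 mapcount:0 mapping:          (null) index:0x0
[ 2929.517505] BUG: Bad page state in process ksoftirqd/17  pfn:46a320
[ 2929.517508] page:ffffea0011a8c800 count:-1 mapcount:0 mapping:          (null) index:0x0
"""

BUG_RE = re.compile(
    r"\[\s*([\d.]+)\] BUG: Bad page state in process (\S+)\s+pfn:([0-9a-f]+)"
)
PAGE_RE = re.compile(
    r"\[\s*[\d.]+\] page:([0-9a-f]+) count:(-?\d+) mapcount:(-?\d+)"
)


def bad_pages(log):
    """Return one dict per 'Bad page state' report found in the log text."""
    records = []
    lines = log.splitlines()
    for i, line in enumerate(lines):
        bug = BUG_RE.search(line)
        if not bug:
            continue
        rec = {
            "time": float(bug.group(1)),
            "process": bug.group(2),
            "pfn": int(bug.group(3), 16),
        }
        # ASSUMPTION: the "page:" line is the very next line of the log.
        if i + 1 < len(lines):
            page = PAGE_RE.search(lines[i + 1])
            if page:
                rec["count"] = int(page.group(2))
                rec["mapcount"] = int(page.group(3))
        records.append(rec)
    return records


if __name__ == "__main__":
    for rec in bad_pages(SAMPLE):
        print(rec)
```

All three reports come from different ksoftirqd threads within about 5 ms of each other, and each shows count:-1 on a page taken from the free list in the mlx5e receive path. That pattern would be consistent with a page reference being dropped once too often, although the log alone does not show where.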

[ 2931.740134] ixgbe 0000:06:00.0 enp6s0f0: Detected Tx Unit Hang
                 Tx Queue             <27>
                 TDH, TDT             <167>, <153>
                 next_to_use          <153>
                 next_to_clean        <167>
               tx_buffer_info[next_to_clean]
                 time_stamp           <1000a07f2>
                 jiffies              <1000a0a10>
Comment 2 Pawel Staszewski 2017-09-11 11:42:40 UTC
Created attachment 258299
Another panic

Same kernel config
Same test
Just upgraded the kernel (11.09.2017 9:30) to the latest net-next from git:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
