After upgrading from 3.2 to 4.1, some of my equipment started panicking in the IPMI handler (often within a few minutes of booting, but sometimes only after a few hours):

[ 337.167974] general protection fault: 0000 [#1] PREEMPT SMP
[ 337.235887] Modules linked in:
[ 337.272453] CPU: 6 PID: 40 Comm: ksoftirqd/6 Not tainted 4.5.4 #3
[ 337.345555] Hardware name: RadiSys Corp. ATCA-4600/ATCA-4600 , BIOS A4600 0x1.0x0.00.00-0x3 03/27/2012
[ 337.467720] task: ffff8806719555c0 ti: ffff880671b14000 task.ti: ffff880671b14000
[ 337.557516] RIP: 0010:[<ffffffffbe396f56>] [<ffffffffbe396f56>] handle_new_recv_msgs+0x98/0x14a
[ 337.662989] RSP: 0018:ffff880671b17cb8 EFLAGS: 00010046
[ 337.727735] RAX: dead000000000100 RBX: ffff880670803000 RCX: 0000000000000007
[ 337.813364] RDX: dead000000000200 RSI: 0000000000000246 RDI: dead000000000200
[ 337.898984] RBP: ffff88066e7e3000 R08: dead000000000100 R09: 0000000000000430
[ 337.984617] R10: 0000000000000030 R11: 0000000000000000 R12: 0000000000000246
[ 338.070253] R13: 0000000000000000 R14: ffff880670803cb4 R15: ffff880670803cb8
[ 338.155889] FS: 0000000000000000(0000) GS:ffff88067fa00000(0000) knlGS:0000000000000000
[ 338.253000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 338.321914] CR2: 00007fcf12c87945 CR3: 000000003f00b000 CR4: 00000000000406e0
[ 338.407551] Stack:
[ 338.431577] ffff880600000000 000000007fa16300 ffff88066e7e3000 ffff8806719555c0
[ 338.520356] ffff880670803d00 ffff8806719555c0 0000000000000000 ffff88037016c280
[ 338.609143] ffff88007915fe78 ffffffffbe940726 0000000000000000 ffff880670720c40
[ 338.697922] Call Trace:
[ 338.727177] [<ffffffffbe940726>] ? __schedule+0x8a4/0x91b
[ 338.792962] [<ffffffffbe3970ef>] ? smi_recv_tasklet+0xe7/0xf0
[ 338.862936] [<ffffffffbe32d698>] ? blk_done_softirq+0x88/0x9b
[ 338.932909] [<ffffffffbe3f7ff0>] ? kbd_bh+0x79/0x85
[ 338.992441] [<ffffffffbe05f1e3>] ? tasklet_action+0x72/0xc5
[ 339.060317] [<ffffffffbe05f851>] ? __do_softirq+0x122/0x28d
[ 339.128207] [<ffffffffbe076e89>] ? smpboot_create_threads+0x5c/0x5c
[ 339.204432] [<ffffffffbe05f9d7>] ? run_ksoftirqd+0x1b/0x40
[ 339.271269] [<ffffffffbe07701e>] ? smpboot_thread_fn+0x195/0x19a
[ 339.344374] [<ffffffffbe07457e>] ? kthread+0xc3/0xcb
[ 339.404944] [<ffffffffbe0744bb>] ? kthread_freezable_should_stop+0x5c/0x5c
[ 339.488503] [<ffffffffbe943ccf>] ? ret_from_fork+0x3f/0x70
[ 339.555336] [<ffffffffbe0744bb>] ? kthread_freezable_should_stop+0x5c/0x5c
[ 339.638874] Code: e8 9a c6 5a 00 49 89 c4 83 7c 24 0c 00 7f 4c 48 8b 45 00 48 8b 55 08 49 b8 00 01 00 00 00 00 ad de 48 bf 00 02 00 00 00 00 ad de <48> 89 50 08 48 89 44 24 20 48 89 02 4c 89 45 00 48 89 7d 08 75
[ 339.865890] RIP [<ffffffffbe396f56>] handle_new_recv_msgs+0x98/0x14a
[ 339.943166] RSP <ffff880671b17cb8>
[ 339.984952] ---[ end trace ef78791815fa859c ]---
[ 340.040306] Kernel panic - not syncing: Fatal exception in interrupt
[ 341.143442] Shutting down cpus with NMI
[ 341.195113] IPMI message handler: BMC returned incorrect response, expected netfn 7 cmd 34, got netfn 7 cmd 33
[ 341.315195] IPMI message received with no owner. This
[ 341.315195] could be because of a malformed message, or
[ 341.315195] because of a hardware error. Contact your
[ 341.315195] hardware vender for assistance
[ 341.549096] general protection fault: 0000 [#2] PREEMPT SMP
[ 341.616984] Modules linked in:
[ 341.653542] CPU: 6 PID: 40 Comm: ksoftirqd/6 Tainted: G D 4.5.4 #3
[ 341.741257] Hardware name: RadiSys Corp. ATCA-4600/ATCA-4600 , BIOS A4600 0x1.0x0.00.00-0x3 03/27/2012
[ 341.863421] task: ffff8806719555c0 ti: ffff880671b14000 task.ti: ffff880671b14000
[ 341.953220] RIP: 0010:[<ffffffffbe396f56>] [<ffffffffbe396f56>] handle_new_recv_msgs+0x98/0x14a
[ 342.058691] RSP: 0018:ffff880671b17858 EFLAGS: 00010046
[ 342.122393] RAX: dead000000000100 RBX: ffff880670803000 RCX: 0000000000000007
[ 342.208021] RDX: dead000000000200 RSI: 0000000000000046 RDI: dead000000000200
[ 342.293649] RBP: ffff88037009f400 R08: dead000000000100 R09: 0000000000000459
[ 342.379277] R10: 0000000000000030 R11: 0000000000000000 R12: 0000000000000000
[ 342.464898] R13: 0000000000000001 R14: ffff880670803cb4 R15: ffff880670803cb8
[ 342.550526] FS: 0000000000000000(0000) GS:ffff88067fa00000(0000) knlGS:0000000000000000
[ 342.647631] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 342.716547] CR2: 00007fcf12c87945 CR3: 000000003f00b000 CR4: 00000000000406e0
[ 342.802176] Stack:
[ 342.826196] ffff313433343432 00000000be34f1c4 ffff88037009f400 ffffffffbf295137
[ 342.914975] ffff880670803d00 ffff880671b17918 ffffffffbf295137 ffffffffbecff089
[ 343.003754] ffffffffbecff087 ffff880671b17918 ffffffffbf29513d ffffffffbe350e54
[ 343.092540] Call Trace:
[ 343.121794] [<ffffffffbe350e54>] ? vsnprintf+0x83/0x3d1
[ 343.185492] [<ffffffffbe3970ef>] ? smi_recv_tasklet+0xe7/0xf0
[ 343.255460] [<ffffffffbe0a3c97>] ? mod_timer+0x184/0x196
[ 343.320214] [<ffffffffbe407a6f>] ? wait_for_xmitr+0x1a/0x7d
[ 343.388088] [<ffffffffbe397365>] ? ipmi_smi_msg_received+0x26d/0x28a
[ 343.465366] [<ffffffffbe39c02b>] ? smi_event_handler+0x3f9/0x54e
[ 343.538472] [<ffffffffbe093ecf>] ? console_unlock+0x3d5/0x40e
[ 343.608436] [<ffffffffbe39c19f>] ? flush_messages+0x1f/0x26
[ 343.676315] [<ffffffffbe395b83>] ? panic_event+0xe5/0x10c
[ 343.742105] [<ffffffffbe09481c>] ? vprintk_emit+0x3b2/0x3b4
[ 343.809992] [<ffffffffbe074fa2>] ? notifier_call_chain+0x3e/0x6d
[ 343.883090] [<ffffffffbe075447>] ? __atomic_notifier_call_chain+0x3a/0x4d
[ 343.965603] [<ffffffffbe101088>] ? panic+0xe9/0x1fe
[ 344.025131] [<ffffffffbe016944>] ? oops_end+0x8a/0x99
[ 344.086752] [<ffffffffbe9459d8>] ? general_protection+0x28/0x30
[ 344.158805] [<ffffffffbe396f56>] ? handle_new_recv_msgs+0x98/0x14a
[ 344.233991] [<ffffffffbe396f30>] ? handle_new_recv_msgs+0x72/0x14a
[ 344.309172] [<ffffffffbe940726>] ? __schedule+0x8a4/0x91b
[ 344.374955] [<ffffffffbe3970ef>] ? smi_recv_tasklet+0xe7/0xf0
[ 344.444915] [<ffffffffbe32d698>] ? blk_done_softirq+0x88/0x9b
[ 344.514875] [<ffffffffbe3f7ff0>] ? kbd_bh+0x79/0x85
[ 344.574394] [<ffffffffbe05f1e3>] ? tasklet_action+0x72/0xc5
[ 344.642265] [<ffffffffbe05f851>] ? __do_softirq+0x122/0x28d
[ 344.710138] [<ffffffffbe076e89>] ? smpboot_create_threads+0x5c/0x5c
[ 344.786363] [<ffffffffbe05f9d7>] ? run_ksoftirqd+0x1b/0x40
[ 344.853191] [<ffffffffbe07701e>] ? smpboot_thread_fn+0x195/0x19a
[ 344.926282] [<ffffffffbe07457e>] ? kthread+0xc3/0xcb
[ 344.986846] [<ffffffffbe0744bb>] ? kthread_freezable_should_stop+0x5c/0x5c
[ 345.070379] [<ffffffffbe943ccf>] ? ret_from_fork+0x3f/0x70
[ 345.137206] [<ffffffffbe0744bb>] ? kthread_freezable_should_stop+0x5c/0x5c
[ 345.220738] Code: e8 9a c6 5a 00 49 89 c4 83 7c 24 0c 00 7f 4c 48 8b 45 00 48 8b 55 08 49 b8 00 01 00 00 00 00 ad de 48 bf 00 02 00 00 00 00 ad de <48> 89 50 08 48 89 44 24 20 48 89 02 4c 89 45 00 48 89 7d 08 75
[ 345.447735] RIP [<ffffffffbe396f56>] handle_new_recv_msgs+0x98/0x14a
[ 345.525009] RSP <ffff880671b17858>
[ 345.566787] ---[ end trace ef78791815fa859d ]---
[ 345.622137] Kernel panic - not syncing: Fatal exception in interrupt
[ 345.698377] Kernel Offset: 0x3d000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

I managed to bisect it to this commit: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7ea0ed2b5be81781ba976bc03414ef5da76270b9
Before that commit the machines are perfectly stable; after it they crash.

Here is the crash with DEBUG_LIST enabled. After 2 minutes the IPMI watchdog kicks in and reboots the system because it is no longer processing IPMI messages correctly.

[ 3450.354706] IPMI message handler: BMC returned incorrect response, expected netfn 7 cmd 34, got netfn 7 cmd 33
[ 3450.354712] IPMI message received with no owner. This
[ 3450.354712] could be because of a malformed message, or
[ 3450.354712] because of a hardware error. Contact your
[ 3450.354712] hardware vender for assistance
[ 3450.354716] ------------[ cut here ]------------
[ 3450.354726] WARNING: CPU: 11 PID: 66 at lib/list_debug.c:53 __list_del_entry+0x8d/0x9b()
[ 3450.354728] list_del corruption, ffff88066e2ba800->next is LIST_POISON1 (dead000000000100)
[ 3450.354730] Modules linked in:
[ 3450.354735] CPU: 11 PID: 66 Comm: ksoftirqd/11 Not tainted 4.5.4 #2
[ 3450.354737] Hardware name: RadiSys Corp. ATCA-4600/ATCA-4600 , BIOS A4600 0x1.0x0.00.00-0x3 03/27/2012
[ 3450.354739] 0000000000000000 ffffffff84d3009f ffffffff84341f8b ffffffff84d3009f
[ 3450.354743] 0000000000000082 ffff88067104fc38 ffffffff84d3009f ffff88067104fc38
[ 3450.354746] ffffffff8405b4f0 0000000000000000 ffffffff84357b98 ffff88065156e6d0
[ 3450.354749] Call Trace:
[ 3450.354757] [<ffffffff84341f8b>] ? dump_stack+0x63/0x8c
[ 3450.354763] [<ffffffff8405b4f0>] ? warn_slowpath_common+0x99/0xb2
[ 3450.354766] [<ffffffff84357b98>] ? __list_del_entry+0x8d/0x9b
[ 3450.354769] [<ffffffff8405b5aa>] ? warn_slowpath_fmt+0x45/0x4d
[ 3450.354775] [<ffffffff84084b56>] ? dequeue_task_fair+0x6d2/0x6e1
[ 3450.354781] [<ffffffff840132de>] ? __switch_to+0x406/0x47f
[ 3450.354783] [<ffffffff84357b98>] ? __list_del_entry+0x8d/0x9b
[ 3450.354786] [<ffffffff84357baf>] ? list_del+0x9/0x26
[ 3450.354792] [<ffffffff84391322>] ? handle_new_recv_msgs+0x83/0x123
[ 3450.354800] [<ffffffff849324e6>] ? __schedule+0x8a4/0x91b
[ 3450.354803] [<ffffffff843914a4>] ? smi_recv_tasklet+0xc1/0xcc
[ 3450.354807] [<ffffffff843283ec>] ? blk_done_softirq+0x81/0x99
[ 3450.354811] [<ffffffff8405ed5b>] ? tasklet_action+0x72/0xc5
[ 3450.354813] [<ffffffff8405f3c9>] ? __do_softirq+0x122/0x28d
[ 3450.354819] [<ffffffff84076567>] ? smpboot_create_threads+0x5c/0x5c
[ 3450.354822] [<ffffffff8405f54f>] ? run_ksoftirqd+0x1b/0x40
[ 3450.354825] [<ffffffff840766fc>] ? smpboot_thread_fn+0x195/0x19a
[ 3450.354828] [<ffffffff84073dad>] ? kthread+0xc3/0xcb
[ 3450.354831] [<ffffffff84073cea>] ? kthread_freezable_should_stop+0x5c/0x5c
[ 3450.354835] [<ffffffff849356cf>] ? ret_from_fork+0x3f/0x70
[ 3450.354838] [<ffffffff84073cea>] ? kthread_freezable_should_stop+0x5c/0x5c
[ 3450.354840] ---[ end trace 823e65229bb291df ]---
[ 3450.354842] IPMI message received with no owner. This
[ 3450.354842] could be because of a malformed message, or
[ 3450.354842] because of a hardware error. Contact your
[ 3450.354842] hardware vender for assistance
[ 3450.354846] ------------[ cut here ]------------
[ 3450.354849] WARNING: CPU: 11 PID: 66 at lib/list_debug.c:56 __list_del_entry+0x8d/0x9b()
[ 3450.354851] list_del corruption, ffff88066e2ba800->prev is LIST_POISON2 (dead000000000200)
[ 3450.354852] Modules linked in:
[ 3450.354855] CPU: 11 PID: 66 Comm: ksoftirqd/11 Tainted: G W 4.5.4 #2
[ 3450.354856] Hardware name: RadiSys Corp. ATCA-4600/ATCA-4600 , BIOS A4600 0x1.0x0.00.00-0x3 03/27/2012
[ 3450.354858] 0000000000000000 ffffffff84d3009f ffffffff84341f8b ffffffff84d3009f
[ 3450.354861] 0000000000000082 ffff88067104fc38 ffffffff84d3009f ffff88067104fc38
[ 3450.354864] ffffffff8405b4f0 0000000000000000 ffffffff84357b98 ffff88065156e6d0
[ 3450.354867] Call Trace:
[ 3450.354869] [<ffffffff84341f8b>] ? dump_stack+0x63/0x8c
[ 3450.354873] [<ffffffff8405b4f0>] ? warn_slowpath_common+0x99/0xb2
[ 3450.354876] [<ffffffff84357b98>] ? __list_del_entry+0x8d/0x9b
[ 3450.354879] [<ffffffff8405b5aa>] ? warn_slowpath_fmt+0x45/0x4d
[ 3450.354881] [<ffffffff84084b56>] ? dequeue_task_fair+0x6d2/0x6e1
[ 3450.354884] [<ffffffff840132de>] ? __switch_to+0x406/0x47f
[ 3450.354887] [<ffffffff84357b98>] ? __list_del_entry+0x8d/0x9b
[ 3450.354890] [<ffffffff84357baf>] ? list_del+0x9/0x26
[ 3450.354892] [<ffffffff84391322>] ? handle_new_recv_msgs+0x83/0x123
[ 3450.354896] [<ffffffff849324e6>] ? __schedule+0x8a4/0x91b
[ 3450.354899] [<ffffffff843914a4>] ? smi_recv_tasklet+0xc1/0xcc
[ 3450.354901] [<ffffffff843283ec>] ? blk_done_softirq+0x81/0x99
[ 3450.354903] [<ffffffff8405ed5b>] ? tasklet_action+0x72/0xc5
[ 3450.354906] [<ffffffff8405f3c9>] ? __do_softirq+0x122/0x28d
[ 3450.354909] [<ffffffff84076567>] ? smpboot_create_threads+0x5c/0x5c
[ 3450.354912] [<ffffffff8405f54f>] ? run_ksoftirqd+0x1b/0x40
[ 3450.354915] [<ffffffff840766fc>] ? smpboot_thread_fn+0x195/0x19a
[ 3450.354917] [<ffffffff84073dad>] ? kthread+0xc3/0xcb
[ 3450.354920] [<ffffffff84073cea>] ? kthread_freezable_should_stop+0x5c/0x5c
[ 3450.354923] [<ffffffff849356cf>] ? ret_from_fork+0x3f/0x70
[ 3450.354926] [<ffffffff84073cea>] ? kthread_freezable_should_stop+0x5c/0x5c
[ 3450.354928] ---[ end trace 823e65229bb291e0 ]---