Bug 75561
Summary: | kernel BUG at drivers/misc/mei/hbm.c:787 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Julian Sikorski (belegdol) |
Component: | Other | Assignee: | drivers_other |
Status: | NEW --- | ||
Severity: | high | CC: | alan, dparsons, szg00000, tomas.winkler, tomasw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.14.2 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | journalctl -b -1 output |
+ Thomas Winkler Can you check you have this patch applied ? commit b1b94b5d387e3a1f034c308e22f9295828d7174a Author: Tomas Winkler <tomas.winkler@intel.com> Date: Mon Mar 3 00:21:28 2014 +0200 mei: me: do not reset when less than expected data is received There is a race in ME hardware between data copy for host and interrupt delivery. An interrupt can be delivered prior to whole data copied for the host to read but rather then going trough the reset we just merely need to wait for the next interrupt. The bug is visible in read/write stress with multiple connections per client This is a regression caused as a side effect of the commit: commit 544f94601409653f07ae6e22d4a39e3a90dceead mei: do not run reset flow from the interrupt thread Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Cc: stable <stable@vger.kernel.org> # 3.14 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> The bug just occurred again with 3.14.4-200.fc20.x86_64. It does not look like the patch you are referring to is included in that kernel, neither as a part of stable queue nor as a Fedora patch. Can I ask you to cherry pick this patch and test it? Thanks I will, but keep in mind it took me 2 weeks to reproduce the bug and I'm not even sure what caused it. The laptop suspended when streaming a movie from an external usb drive to a raspberry pi over nfs, can this be it? I believe the only traffic between driver and mei device would be during suspend/resume in your case the streaming is not relevant. I experience this bug in the 3.14 kernels. It's reliably reproducible after a couple of suspend/resume cycles. As Tomas reported, the first resume seems to trigger the kernel oops, and the system freeze happens at the second resume. Successful suspend/resume is restored with the b1b94b5d patch (applied manually to 3.14.3 source). A "less data available than length=" message is written to the kernel log in place of the kernel oops. There is still some odd behaviour: the system immediately jumps back to suspend after the first resume. But after triggering resume again, my system does then resume and operate properly (no freeze). I haven't yet tested a 3.15 kernel. The bug seems to be fixed later in 3.14, without the b1b94b5d patch. In 3.14.13 (Debian 3.14.13-2) I'm no longer experiencing the freeze even after repeated suspend/resume cycles. |
Created attachment 135191 [details] journalctl -b -1 output Deal all, since upgrading to 3.14.1, I started seeing the following bug: maj 05 17:53:37 snowball2 kernel: mei_me 0000:00:16.0: less data available than length=00000002. maj 05 17:53:37 snowball2 kernel: mei_me 0000:00:16.0: unexpected reset: dev_state = INIT_CLIENTS maj 05 17:53:37 snowball2 kernel: mei_me 0000:00:16.0: hbm: properties response: state mismatch, [1, 1] maj 05 17:53:37 snowball2 kernel: mei_me 0000:00:16.0: unexpected reset: dev_state = INIT_CLIENTS maj 05 17:53:37 snowball2 kernel: mei_me 0000:00:16.0: hbm: start: state mismatch, [1, 3] maj 05 17:53:37 snowball2 kernel: ------------[ cut here ]------------ maj 05 17:53:37 snowball2 kernel: kernel BUG at drivers/misc/mei/hbm.c:787! maj 05 17:53:38 snowball2 kernel: invalid opcode: 0000 [#1] SMP maj 05 17:53:37 snowball2 NetworkManager[912]: <info> NetworkManager state is now DISCONNECTED maj 05 17:53:37 snowball2 NetworkManager[912]: <info> (wlan0): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2] maj 05 17:53:37 snowball2 NetworkManager[912]: <info> (wlan0): bringing up device. maj 05 17:53:37 snowball2 NetworkManager[912]: <info> (wlan0) supports 5 scan SSIDs maj 05 17:53:37 snowball2 NetworkManager[912]: <info> (wlan0): supplicant interface state: starting -> ready maj 05 17:53:37 snowball2 NetworkManager[912]: <info> (wlan0): device state change: unavailable -> disconnected (reason 'supplicant-available') [20 30 42] maj 05 17:53:37 snowball2 NetworkManager[912]: <info> (wlan0): supplicant interface state: ready -> disconnected maj 05 17:53:37 snowball2 NetworkManager[912]: <info> (wlan0) supports 5 scan SSIDs maj 05 17:53:38 snowball2 kernel: Modules linked in: dm_crypt nf_conntrack_netbios_ns nf_conntrack_broadcast ccm bnep bluetooth 6lowpan_iphc ip6t_rpfilter ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw nvidia(POF) fuse arc4 iwldvm mac80211 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel usb_storage snd_hda_intel microcode snd_hda_codec snd_hwdep snd_seq joydev snd_seq_device iwlwifi maj 05 17:53:38 snowball2 kernel: snd_pcm serio_raw cfg80211 jmb38x_ms drm jme sdhci_pci i2c_i801 mii sdhci rfkill mmc_core snd_timer memstick i2c_core mei_me snd lpc_ich mei mfd_core soundcore shpchp nfsd auth_rpcgss nfs_acl lockd sunrpc binfmt_misc firewire_ohci firewire_core crc_itu_t wmi video maj 05 17:53:38 snowball2 kernel: CPU: 3 PID: 8875 Comm: irq/51-mei_me Tainted: PF O 3.14.2-200.fc20.x86_64 #1 maj 05 17:53:38 snowball2 kernel: Hardware name: CLEVO P150HMx/P150HMx, BIOS 4.6.4 08/09/2011 maj 05 17:53:38 snowball2 kernel: task: ffff88031bc51300 ti: ffff88031c0e2000 task.ti: ffff88031c0e2000 maj 05 17:53:38 snowball2 kernel: RIP: 0010:[<ffffffffa012b5d0>] [<ffffffffa012b5d0>] mei_hbm_dispatch+0x650/0xea0 [mei] maj 05 17:53:38 snowball2 kernel: RSP: 0000:ffff88031c0e3cf8 EFLAGS: 00010293 maj 05 17:53:38 snowball2 kernel: RAX: 0000000000000000 RBX: ffff88031d872000 RCX: 0000000000000003 maj 05 17:53:38 snowball2 kernel: RDX: ffffc90005b9c004 RSI: ffffc90005b9c004 RDI: 00000000801c1c1d maj 05 17:53:38 snowball2 kernel: RBP: ffff88031c0e3d30 R08: 0000000000000000 R09: 0000000000000001 maj 05 17:53:38 snowball2 kernel: R10: 00000000000299a4 R11: 0000000000040000 R12: ffff88031d87230a maj 05 17:53:38 snowball2 kernel: R13: ffff88031d87250c R14: ffff88031d872200 R15: ffffffff810e6340 maj 05 17:53:38 snowball2 kernel: FS: 0000000000000000(0000) GS:ffff88032f4c0000(0000) knlGS:0000000000000000 maj 05 17:53:38 snowball2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 maj 05 17:53:38 snowball2 kernel: CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 00000000000407e0 maj 05 17:53:38 snowball2 kernel: Stack: maj 05 17:53:38 snowball2 kernel: ffffffff810a4522 ffff880200000000 ffff88031d872000 ffff88031c0e3d9c maj 05 17:53:38 snowball2 kernel: ffff88031d87250c ffff88031d872200 ffffffff810e6340 ffff88031c0e3d88 maj 05 17:53:38 snowball2 kernel: ffffffffa012c180 000000031c0e3d70 0000000000000293 00000000ffffffb9 maj 05 17:53:38 snowball2 kernel: Call Trace: maj 05 17:53:38 snowball2 kernel: [<ffffffff810a4522>] ? insert_work+0x62/0xa0 maj 05 17:53:38 snowball2 kernel: [<ffffffff810e6340>] ? irq_finalize_oneshot.part.30+0xe0/0xe0 maj 05 17:53:38 snowball2 kernel: [<ffffffffa012c180>] mei_irq_read_handler+0x2a0/0x6e0 [mei] maj 05 17:53:38 snowball2 kernel: [<ffffffff810e6340>] ? irq_finalize_oneshot.part.30+0xe0/0xe0 maj 05 17:53:38 snowball2 kernel: [<ffffffffa0123f88>] mei_me_irq_thread_handler+0x148/0x300 [mei_me] maj 05 17:53:38 snowball2 kernel: [<ffffffff816f2482>] ? __schedule+0x2e2/0x740 maj 05 17:53:38 snowball2 kernel: [<ffffffff810e6360>] irq_thread_fn+0x20/0x50 maj 05 17:53:38 snowball2 kernel: [<ffffffff810e67ef>] irq_thread+0x13f/0x170 maj 05 17:53:38 snowball2 kernel: [<ffffffff810e63f0>] ? irq_forced_thread_fn+0x60/0x60 maj 05 17:53:38 snowball2 kernel: [<ffffffff810e66b0>] ? irq_thread_check_affinity+0xf0/0xf0 maj 05 17:53:38 snowball2 kernel: [<ffffffff810ae211>] kthread+0xe1/0x100 maj 05 17:53:38 snowball2 kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40 maj 05 17:53:38 snowball2 kernel: [<ffffffff816fef7c>] ret_from_fork+0x7c/0xb0 maj 05 17:53:38 snowball2 kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40 maj 05 17:53:38 snowball2 kernel: Code: 37 f5 ff ff 84 c0 0f 84 af 01 00 00 f6 05 42 cf 00 00 04 0f 85 12 06 00 00 48 89 df e8 aa 68 00 00 e9 25 fc ff ff 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 f6 05 33 d0 00 00 04 0f 85 88 06 00 00 maj 05 17:53:38 snowball2 kernel: RIP [<ffffffffa012b5d0>] mei_hbm_dispatch+0x650/0xea0 [mei] maj 05 17:53:38 snowball2 kernel: RSP <ffff88031c0e3cf8> maj 05 17:53:38 snowball2 kernel: ---[ end trace 422e659790487932 ]--- maj 05 17:53:38 snowball2 kernel: BUG: unable to handle kernel paging request at ffffffffffffffd8 maj 05 17:53:38 snowball2 kernel: IP: [<ffffffff810ae800>] kthread_data+0x10/0x20 maj 05 17:53:38 snowball2 kernel: PGD 1c0f067 PUD 1c11067 PMD 0 maj 05 17:53:38 snowball2 acpid[831]: client connected from 908[0:0] maj 05 17:53:38 snowball2 acpid[831]: 1 client rule loaded maj 05 17:53:38 snowball2 kernel: Oops: 0000 [#2] SMP maj 05 17:53:38 snowball2 gnome-session[1764]: (realcrypt:2873): Gtk-CRITICAL **: IA__gtk_widget_get_direction: assertion 'GTK_IS_WIDGET (widget)' failed maj 05 17:53:38 snowball2 kernel: Modules linked in: dm_crypt nf_conntrack_netbios_ns nf_conntrack_broadcast ccm bnep bluetooth 6lowpan_iphc ip6t_rpfilter ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw nvidia(POF) fuse arc4 iwldvm mac80211 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel usb_storage snd_hda_intel microcode snd_hda_codec snd_hwdep snd_seq joydev snd_seq_device iwlwifi maj 05 17:53:38 snowball2 kernel: snd_pcm serio_raw cfg80211 jmb38x_ms drm jme sdhci_pci i2c_i801 mii sdhci rfkill mmc_core snd_timer memstick i2c_core mei_me snd lpc_ich mei mfd_core soundcore shpchp nfsd auth_rpcgss nfs_acl lockd sunrpc binfmt_misc firewire_ohci firewire_core crc_itu_t wmi video maj 05 17:53:38 snowball2 kernel: CPU: 3 PID: 8875 Comm: irq/51-mei_me Tainted: PF D O 3.14.2-200.fc20.x86_64 #1 maj 05 17:53:38 snowball2 kernel: Hardware name: CLEVO P150HMx/P150HMx, BIOS 4.6.4 08/09/2011 maj 05 17:53:38 snowball2 kernel: task: ffff88031bc51300 ti: ffff88031c0e2000 task.ti: ffff88031c0e2000 maj 05 17:53:38 snowball2 kernel: RIP: 0010:[<ffffffff810ae800>] [<ffffffff810ae800>] kthread_data+0x10/0x20 maj 05 17:53:38 snowball2 kernel: RSP: 0000:ffff88031c0e3a08 EFLAGS: 00010202 maj 05 17:53:38 snowball2 kernel: RAX: 0000000000000000 RBX: ffff88031bc51300 RCX: 0000000000000015 maj 05 17:53:38 snowball2 kernel: RDX: ffff88031c0e3e80 RSI: 0000000000000000 RDI: ffff88031bc51300 maj 05 17:53:38 snowball2 kernel: RBP: ffff88031c0e3a08 R08: 0000000000000000 R09: 0000000000000001 maj 05 17:53:38 snowball2 kernel: R10: 0000000000000001 R11: ffffffff8112192b R12: ffffffff81ed8520 maj 05 17:53:38 snowball2 kernel: R13: ffff88031bc51968 R14: ffff88031bc51300 R15: ffff88031bc51300 maj 05 17:53:38 snowball2 kernel: FS: 0000000000000000(0000) GS:ffff88032f4c0000(0000) knlGS:0000000000000000 maj 05 17:53:38 snowball2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 maj 05 17:53:38 snowball2 kernel: CR2: ffffffffffffffd8 CR3: 0000000001c0c000 CR4: 00000000000407e0 maj 05 17:53:38 snowball2 kernel: Stack: maj 05 17:53:38 snowball2 kernel: ffff88031c0e3a28 ffffffff810e6413 0000000000000000 ffffffff81ed8520 maj 05 17:53:38 snowball2 kernel: ffff88031c0e3a58 ffffffff810aafd7 000000000000000b 0000000000000000 maj 05 17:53:38 snowball2 kernel: 0000000000000246 0000000000000000 ffff88031c0e3ae8 ffffffff8108c754 maj 05 17:53:38 snowball2 kernel: Call Trace: maj 05 17:53:38 snowball2 kernel: [<ffffffff810e6413>] irq_thread_dtor+0x23/0xb0 maj 05 17:53:38 snowball2 kernel: [<ffffffff810aafd7>] task_work_run+0xa7/0xe0 maj 05 17:53:38 snowball2 kernel: [<ffffffff8108c754>] do_exit+0x2c4/0xa30 maj 05 17:53:38 snowball2 kernel: [<ffffffff816eb1ac>] ? printk+0x77/0x8e maj 05 17:53:38 snowball2 kernel: [<ffffffff816f704c>] oops_end+0x9c/0xe0 maj 05 17:53:38 snowball2 kernel: [<ffffffff81017fcb>] die+0x4b/0x70 maj 05 17:53:38 snowball2 kernel: [<ffffffff816f68e0>] do_trap+0x60/0x170 maj 05 17:53:38 snowball2 kernel: [<ffffffff810e6340>] ? irq_finalize_oneshot.part.30+0xe0/0xe0 maj 05 17:53:38 snowball2 kernel: [<ffffffff810e6340>] ? irq_finalize_oneshot.part.30+0xe0/0xe0 maj 05 17:53:38 snowball2 kernel: [<ffffffff810150aa>] do_invalid_op+0xaa/0xe0 maj 05 17:53:38 snowball2 kernel: [<ffffffffa012b5d0>] ? mei_hbm_dispatch+0x650/0xea0 [mei] maj 05 17:53:38 snowball2 kernel: [<ffffffff810caa32>] ? enqueue_task_fair+0x412/0x660 maj 05 17:53:38 snowball2 kernel: [<ffffffff810c2885>] ? sched_clock_cpu+0x85/0xc0 maj 05 17:53:38 snowball2 kernel: [<ffffffff8170075e>] invalid_op+0x1e/0x30 maj 05 17:53:38 snowball2 kernel: [<ffffffff810e6340>] ? irq_finalize_oneshot.part.30+0xe0/0xe0 maj 05 17:53:38 snowball2 kernel: [<ffffffffa012b5d0>] ? mei_hbm_dispatch+0x650/0xea0 [mei] maj 05 17:53:38 snowball2 kernel: [<ffffffffa012afbf>] ? mei_hbm_dispatch+0x3f/0xea0 [mei] maj 05 17:53:38 snowball2 kernel: [<ffffffff810a4522>] ? insert_work+0x62/0xa0 maj 05 17:53:38 snowball2 kernel: [<ffffffff810e6340>] ? irq_finalize_oneshot.part.30+0xe0/0xe0 maj 05 17:53:38 snowball2 kernel: [<ffffffffa012c180>] mei_irq_read_handler+0x2a0/0x6e0 [mei] maj 05 17:53:38 snowball2 kernel: [<ffffffff810e6340>] ? irq_finalize_oneshot.part.30+0xe0/0xe0 maj 05 17:53:38 snowball2 kernel: [<ffffffffa0123f88>] mei_me_irq_thread_handler+0x148/0x300 [mei_me] maj 05 17:53:38 snowball2 kernel: [<ffffffff816f2482>] ? __schedule+0x2e2/0x740 maj 05 17:53:38 snowball2 kernel: [<ffffffff810e6360>] irq_thread_fn+0x20/0x50 maj 05 17:53:38 snowball2 kernel: [<ffffffff810e67ef>] irq_thread+0x13f/0x170 maj 05 17:53:38 snowball2 kernel: [<ffffffff810e63f0>] ? irq_forced_thread_fn+0x60/0x60 maj 05 17:53:38 snowball2 kernel: [<ffffffff810e66b0>] ? irq_thread_check_affinity+0xf0/0xf0 maj 05 17:53:38 snowball2 kernel: [<ffffffff810ae211>] kthread+0xe1/0x100 maj 05 17:53:38 snowball2 kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40 maj 05 17:53:38 snowball2 kernel: [<ffffffff816fef7c>] ret_from_fork+0x7c/0xb0 maj 05 17:53:38 snowball2 kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40 maj 05 17:53:38 snowball2 kernel: Code: 00 48 89 e5 5d 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 87 28 04 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 maj 05 17:53:38 snowball2 kernel: RIP [<ffffffff810ae800>] kthread_data+0x10/0x20 maj 05 17:53:38 snowball2 kernel: RSP <ffff88031c0e3a08> maj 05 17:53:38 snowball2 kernel: CR2: ffffffffffffffd8 maj 05 17:53:38 snowball2 kernel: ---[ end trace 422e659790487933 ]--- maj 05 17:53:38 snowball2 kernel: Fixing recursive fault but reboot is needed! The bug is initially invisible, but will cause the system to freeze on next suspend attempt. Full journalctl log is attached. Please let me know if more info is needed.