Bug 60583 - Kernel oops at boot in acpi_ev_notify_dispatch
Summary: Kernel oops at boot in acpi_ev_notify_dispatch
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: ACPICA-Core (show other bugs)
Hardware: i386 Linux
: P1 normal
Assignee: Lv Zheng
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-19 12:45 UTC by ronald.vink
Modified: 2014-06-11 00:15 UTC (History)
3 users (show)

See Also:
Kernel Version: 3.5.4 and 3.10.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Output of acpidump, hwinfo lspci syslog (268.94 KB, application/x-compressed-tar)
2013-07-19 12:45 UTC, ronald.vink
Details
syslog of the crash (160.68 KB, text/plain)
2013-07-23 05:32 UTC, ronald.vink
Details
syslog of the crash (7.76 KB, text/plain)
2013-07-25 05:32 UTC, ronald.vink
Details
[DBG PATCH] ACPICA: Events: Add delayed handler flushing to ensure all executions are exit. (3.05 KB, patch)
2013-12-12 02:39 UTC, Lv Zheng
Details | Diff
Patches for v3.5-rc4 (9.56 KB, patch)
2013-12-12 04:19 UTC, Lv Zheng
Details | Diff
attachment-4822-0.html (1.61 KB, text/html)
2014-06-11 00:15 UTC, ronald.vink
Details

Description ronald.vink 2013-07-19 12:45:52 UTC
Created attachment 106942 [details]
Output of acpidump, hwinfo lspci syslog

Sometimes the kernel oops at boot, something to do with acpi.
It started when I installed 3.5.4 on a Panasonic CF-19 Toughbook.
I tried 3.10.1 but the same result.
Crash occurs in same function but different adress. (drivers/acpi/acpica/evmisc.c line 267 for 3.5.4 and line 216 for 3.1.0.1)

syslog reports :
Jul  5 10:44:36 ship1 kernel: BUG: unable to handle kernel NULL pointer dereference at   (null)
Jul  5 10:44:36 ship1 kernel: IP: [<  (null)>]   (null)
Jul  5 10:44:36 ship1 kernel: *pde = 00000000
Jul  5 10:44:36 ship1 kernel: Oops: 0010 [#1] SMP
Jul  5 10:44:36 ship1 kernel: Modules linked in: af_packet uinput nfsd exportfs nfs lockd nfs_acl sunrpc ipv6 iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_raw ip_tables x_tables binfmt_misc loop cpufreq_ondemand cpufreq_conservative cpufreq_powersave speedstep_lib acpi_cpufreq arc4 freq_table iwlwifi mac80211 led_class mperf cfg80211 video intel_agp thermal processor rfkill i2c_i801 xhci_hcd intel_gtt thermal_sys agpgart kvm_intel kvm e1000e microcode sg coretemp rtc_cmos i2c_core serio_raw evdev hwmon battery ac button ext3 jbd pata_jmicron ide_pci_generic ahci libahci ata_piix libata sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd usbhid usbcore usb_common [last unloaded: scsi_wait_scan]
Jul  5 10:44:36 ship1 kernel:
Jul  5 10:44:36 ship1 kernel: Pid: 1009, comm: kworker/0:2 Not tainted 3.5.4-vessel #6 Panasonic Corporation CF-191HAAXF4/CF19-6
Jul  5 10:44:36 ship1 kernel: EIP: 0060:[<00000000>] EFLAGS: 00010282 CPU: 0
Jul  5 10:44:36 ship1 kernel: EIP is at 0x0
Jul  5 10:44:36 ship1 kernel: EAX: f584d4a0 EBX: f5aad938 ECX: 00000000 EDX: 00000080
Jul  5 10:44:36 ship1 kernel: ESI: f5b575a8 EDI: f5e61740 EBP: dfa99f2c ESP: dfa99f20
Jul  5 10:44:36 ship1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Jul  5 10:44:36 ship1 kernel: CR0: 8005003b CR2: 00000000 CR3: 1fb83000 CR4: 001407d0
Jul  5 10:44:36 ship1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jul  5 10:44:36 ship1 kernel: DR6: ffff0ff0 DR7: 00000400
Jul  5 10:44:36 ship1 kernel: Process kworker/0:2 (pid: 1009, ti=dfa98000 task=f5aed790 task.ti=dfa98000)
Jul  5 10:44:36 ship1 kernel: Stack:
Jul  5 10:44:36 ship1 kernel: c02a1a99 f4ebd3a0 f4ebd3a8 dfa99f3c c0294cc7 f4ebd3a8 f59f6a80 dfa99f78
Jul  5 10:44:36 ship1 kernel: c013e4fb 00000000 c051b000 c0520920 f5aed790 f5ad3500 00000046 c0294caa
Jul  5 10:44:36 ship1 kernel: 00000000 f5e65a05 f5e65a00 f59f6a80 f5e61740 35944000 dfa99fb8 c013ea70
Jul  5 10:44:36 ship1 kernel: Call Trace:
Jul  5 10:44:36 ship1 kernel: [<c02a1a99>] ? acpi_ev_notify_dispatch+0x4e/0x61
Jul  5 10:44:36 ship1 kernel: [<c0294cc7>] acpi_os_execute_deferred+0x1d/0x28
Jul  5 10:44:36 ship1 kernel: [<c013e4fb>] process_one_work+0xfb/0x340
Jul  5 10:44:36 ship1 kernel: [<c0294caa>] ? acpi_os_wait_events_complete+0x19/0x19
Jul  5 10:44:36 ship1 kernel: [<c013ea70>] worker_thread+0x140/0x3a0
Jul  5 10:44:36 ship1 kernel: [<c013e930>] ? manage_workers+0x1f0/0x1f0
Jul  5 10:44:36 ship1 kernel: [<c0144354>] kthread+0x74/0x80
Jul  5 10:44:36 ship1 kernel: [<c01442e0>] ? kthread_freezable_should_stop+0x60/0x60
Jul  5 10:44:36 ship1 kernel: [<c03d11b6>] kernel_thread_helper+0x6/0xd
Jul  5 10:44:36 ship1 kernel: Code:  Bad EIP value.
Jul  5 10:44:36 ship1 kernel: EIP: [<00000000>] 0x0 SS:ESP 0068:dfa99f20
Jul  5 10:44:36 ship1 kernel: CR2: 0000000000000000
Jul  5 10:44:36 ship1 kernel: ---[ end trace 11fc5ca1174c2a77 ]---
Jul  5 10:44:38 ship1 kernel: BUG: unable to handle kernel paging request at fffffffc
Jul  5 10:44:38 ship1 kernel: IP: [<c0143f3a>] kthread_data+0xa/0x10
Jul  5 10:44:38 ship1 kernel: *pde = 00528067 *pte = 00000000
Jul  5 10:44:38 ship1 kernel: Oops: 0000 [#2] SMP
Jul  5 10:44:38 ship1 kernel: Modules linked in: af_packet uinput nfsd exportfs nfs lockd nfs_acl sunrpc ipv6 iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_raw ip_tables x_tables binfmt_misc loop cpufreq_ondemand cpufreq_conservative cpufreq_powersave speedstep_lib acpi_cpufreq arc4 freq_table iwlwifi mac80211 led_class mperf cfg80211 video intel_agp thermal processor rfkill i2c_i801 xhci_hcd intel_gtt thermal_sys agpgart kvm_intel kvm e1000e microcode sg coretemp rtc_cmos i2c_core serio_raw evdev hwmon battery ac button ext3 jbd pata_jmicron ide_pci_generic ahci libahci ata_piix libata sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd usbhid usbcore usb_common [last unloaded: scsi_wait_scan]
Jul  5 10:44:38 ship1 kernel:
Jul  5 10:44:38 ship1 kernel: Pid: 1009, comm: kworker/0:2 Tainted: G      D      3.5.4-vessel #6 Panasonic Corporation CF-191HAAXF4/CF19-6
Jul  5 10:44:38 ship1 kernel: EIP: 0060:[<c0143f3a>] EFLAGS: 00010002 CPU: 0
Jul  5 10:44:38 ship1 kernel: EIP is at kthread_data+0xa/0x10
Jul  5 10:44:38 ship1 kernel: EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
Jul  5 10:44:38 ship1 kernel: ESI: f5e64920 EDI: 00000000 EBP: dfa99cf0 ESP: dfa99ce4
Jul  5 10:44:38 ship1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Jul  5 10:44:38 ship1 kernel: CR0: 8005003b CR2: fffffffc CR3: 1fb83000 CR4: 001407d0
Jul  5 10:44:38 ship1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jul  5 10:44:38 ship1 kernel: DR6: ffff0ff0 DR7: 00000400
Jul  5 10:44:38 ship1 kernel: Process kworker/0:2 (pid: 1009, ti=dfa98000 task=f5aed790 task.ti=dfa98000)
Jul  5 10:44:38 ship1 kernel: Stack:
Jul  5 10:44:38 ship1 kernel: c0140d4c f5aed790 f5e64920 dfa99d8c c03cf26c f5402698 00000000 dfa99d14
Jul  5 10:44:38 ship1 kernel: c01d1c70 dfa99d43 f5827990 00000000 dfa99d34 c020ca60 dfa99d43 00000004
Jul  5 10:44:38 ship1 kernel: c051b000 c0520920 f5aed790 f581a000 000003f1 dfa99d44 c017e397 00000000
Jul  5 10:44:38 ship1 kernel: Call Trace:
Jul  5 10:44:38 ship1 kernel: [<c0140d4c>] ? wq_worker_sleeping+0xc/0x80
Jul  5 10:44:38 ship1 kernel: [<c03cf26c>] __schedule+0x34c/0x750
Jul  5 10:44:38 ship1 kernel: [<c01d1c70>] ? d_hash_and_lookup+0x30/0x60
Jul  5 10:44:38 ship1 kernel: [<c020ca60>] ? proc_flush_task+0xd0/0x1a0
Jul  5 10:44:38 ship1 kernel: [<c017e397>] ? call_rcu_sched+0x17/0x20
Jul  5 10:44:38 ship1 kernel: [<c017e397>] ? call_rcu_sched+0x17/0x20
Jul  5 10:44:38 ship1 kernel: [<c012d6a9>] ? release_task+0x1d9/0x310
Jul  5 10:44:38 ship1 kernel: [<c012ce01>] ? exit_mm+0x1/0x100
Jul  5 10:44:38 ship1 kernel: [<c014978a>] ? switch_task_namespaces+0x1a/0x50
Jul  5 10:44:38 ship1 kernel: [<c03cf8de>] schedule+0x1e/0x50
Jul  5 10:44:38 ship1 kernel: [<c012eacf>] do_exit+0x4cf/0x7d0
Jul  5 10:44:38 ship1 kernel: [<c03cdd7b>] ? printk+0x38/0x3a
Jul  5 10:44:38 ship1 kernel: [<c0105726>] oops_end+0x86/0xd0
Jul  5 10:44:38 ship1 kernel: [<c01228a8>] no_context+0xd8/0x1a0
Jul  5 10:44:38 ship1 kernel: [<c0122a15>] __bad_area_nosemaphore+0xa5/0x140
Jul  5 10:44:38 ship1 kernel: [<c0122ac2>] bad_area_nosemaphore+0x12/0x20
Jul  5 10:44:38 ship1 kernel: [<c0122f39>] do_page_fault+0x279/0x400
Jul  5 10:44:38 ship1 kernel: [<c035ed00>] ? netlink_broadcast_filtered+0x1d0/0x340
Jul  5 10:44:38 ship1 kernel: [<c035ee95>] ? netlink_broadcast+0x25/0x30
Jul  5 10:44:38 ship1 kernel: [<c029c0cc>] ? acpi_bus_generate_netlink_event+0xfd/0x109
Jul  5 10:44:38 ship1 kernel: [<c0122cc0>] ? mm_fault_error+0x170/0x170
Jul  5 10:44:38 ship1 kernel: [<c03d0a0a>] error_code+0x5a/0x60
Jul  5 10:44:38 ship1 kernel: [<c0122cc0>] ? mm_fault_error+0x170/0x170
Jul  5 10:44:38 ship1 kernel: [<c02a1a99>] ? acpi_ev_notify_dispatch+0x4e/0x61
Jul  5 10:44:38 ship1 kernel: [<c0294cc7>] acpi_os_execute_deferred+0x1d/0x28
Jul  5 10:44:38 ship1 kernel: [<c013e4fb>] process_one_work+0xfb/0x340
Jul  5 10:44:38 ship1 kernel: [<c0294caa>] ? acpi_os_wait_events_complete+0x19/0x19
Jul  5 10:44:38 ship1 kernel: [<c013ea70>] worker_thread+0x140/0x3a0
Jul  5 10:44:38 ship1 kernel: [<c013e930>] ? manage_workers+0x1f0/0x1f0
Jul  5 10:44:38 ship1 kernel: [<c0144354>] kthread+0x74/0x80
Jul  5 10:44:38 ship1 kernel: [<c01442e0>] ? kthread_freezable_should_stop+0x60/0x60
Jul  5 10:44:38 ship1 kernel: [<c03d11b6>] kernel_thread_helper+0x6/0xd
Jul  5 10:44:38 ship1 kernel: Code: 90 55 64 a1 ac c5 51 c0 8b 80 74 01 00 00 89 e5 5d 8b 40 f8 c3 8d b6 00 00 00 00 8d bf 00 00 00 00 55 8b 80 74 01 00 00 89 e5 5d <8b> 40 fc c3 66 90 55 31 c0 89 e5 5d c3 89 f6 8d bc 27 00 00 00
Jul  5 10:44:38 ship1 kernel: EIP: [<c0143f3a>] kthread_data+0xa/0x10 SS:ESP 0068:dfa99ce4
Jul  5 10:44:38 ship1 kernel: CR2: 00000000fffffffc
Jul  5 10:44:38 ship1 kernel: ---[ end trace 11fc5ca1174c2a78 ]---
Comment 1 Aaron Lu 2013-07-22 05:30:19 UTC
Did any previous kernel have this problem, say v3.4?
Comment 2 ronald.vink 2013-07-23 05:31:55 UTC
After about 800 reboots 3.4.54 crashes at the same point.
I will try 3.4.0 now.

In the syslog I can see some ACPI warmings :

Jul 23 05:36:12 minos1 kernel: ACPI: Deprecated procfs I/F for battery is loaded, please retry with CONFIG_ACPI_PROCFS_POWER cleared
Jul 23 05:36:12 minos1 kernel: ACPI: Battery Slot [BAT1] (battery present)
Jul 23 05:36:12 minos1 kernel: ACPI: Deprecated procfs I/F for AC is loaded, please retry with CONFIG_ACPI_PROCFS_POWER cleared
Jul 23 05:36:12 minos1 kernel: ACPI: AC Adapter [AC] (on-line)
Jul 23 05:36:12 minos1 kernel: ACPI Warning: 0x0000f040-0x0000f05f SystemIO conflicts with Region \_SB_.PCI0.SBUS.SMBI 1 (20120320/utaddress-251)
Jul 23 05:36:12 minos1 kernel: ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver

Do these warnings have anything to do with the crash ?
Comment 3 ronald.vink 2013-07-23 05:32:56 UTC
Created attachment 106994 [details]
syslog of the crash
Comment 4 ronald.vink 2013-07-23 06:44:19 UTC
Kernel version 3.4.0 does the same at the same place.
Could not get a crashdump, the syslog and messages file ends with zeros.
Comment 5 ronald.vink 2013-07-25 05:32:15 UTC
Created attachment 107013 [details]
syslog of the crash

Kernel v3.3.0 has the same problem, however the crash occurred after 850 reboots.
It looks like the newer the kernel, the sooner the oops will come.
Once I had the same oops when I booted v3.5.4 in single mode, and was waiting at the prompt for +- 10 sec.
If someone wants me to try some patches, I am willing to do so. 
So, now I will do the same with v3.2.0.
Comment 6 ronald.vink 2013-07-26 05:19:07 UTC
OK, after a day of reboots, this night v3.2.0 also crashed, the same way as v 3.3.30 did. I am not sure if it was the acpi, because the syslog file did not report any oops. I will try 3.4.0 again.
Comment 7 Lv Zheng 2013-07-26 06:06:37 UTC
(In reply to ronald.vink from comment #2)
> After about 800 reboots 3.4.54 crashes at the same point.
I will try 3.4.0
> now.

In the syslog I can see some ACPI warmings :

Jul 23 05:36:12 minos1
> kernel: ACPI: Deprecated procfs I/F for battery is loaded, please retry with
> CONFIG_ACPI_PROCFS_POWER cleared
Jul 23 05:36:12 minos1 kernel: ACPI:
> Battery Slot [BAT1] (battery present)
Jul 23 05:36:12 minos1 kernel: ACPI:
> Deprecated procfs I/F for AC is loaded, please retry with
> CONFIG_ACPI_PROCFS_POWER cleared
Jul 23 05:36:12 minos1 kernel: ACPI: AC
> Adapter [AC] (on-line)
Jul 23 05:36:12 minos1 kernel: ACPI Warning:
> 0x0000f040-0x0000f05f SystemIO conflicts with Region \_SB_.PCI0.SBUS.SMBI 1
> (20120320/utaddress-251)
Jul 23 05:36:12 minos1 kernel: ACPI: If an ACPI
> driver is available for this device, you should use it instead of the native
> driver

Do these warnings have anything to do with the crash ?

There are two warnings in your comments.

I'm not sure the procfs can trigger this crash, but the latter of resource conflicts will not lead to the crash as indicated by your oops stack.
Comment 8 Lv Zheng 2013-07-29 07:47:58 UTC
According to the log, the panic happens in the acpi_os_execute_deferred() where the dpc->function becomes NULL in a racing environment which finally causes the message of "EIP: 0060:[<00000000>]".

This can be caused by an invocation of acpi_remove_notify_handler().
According to your log, it seems to happen during the shutdown.

By inveistigation on the notify related codes, the implementation seems to be racy.  Please wait for fix codes from us.  I need to first reproduce the same bug.

Thanks for the reporting.
Comment 9 ronald.vink 2013-07-29 08:24:56 UTC
The oops always occurs at boot. The log shows 2 boot-ups. The first comes to 05:36:20 where a timed reboot will occur after about 14 sec.
The second time the boot will come to 05:37:23, a reboot will be done at 05:37:37 but the acpi oop comes first.

We have different types of hardware, Nice 3100, 3600 and 91, Arbor MB-!77Q0, G7B330-B, Lenove T60 T61 T500 laptops. But only the Panasonic Toughbook type CF-191HAAXF4 has the problem.

If you want I can do some more tests with your patches.
We are using kernel version 3.5.4 on many systems now, so it is preferable to have this kernel version working. With newer versions I have problems with not compiling thirdparty drivers.
Comment 10 Lan Tianyu 2013-11-25 07:30:10 UTC
Lv, do you have some updates?
Comment 11 Lv Zheng 2013-12-12 02:39:31 UTC
Created attachment 118131 [details]
[DBG PATCH] ACPICA: Events: Add delayed handler flushing to ensure all executions are exit.

Could you please try this fix?  It is fixed inside ACPICA.

I think this issue can also be fixed by implementing acpi_os_wait_events_complete() in Linux OSL using wait queues.
Comment 12 Lv Zheng 2013-12-12 04:19:12 UTC
Created attachment 118141 [details]
Patches for v3.5-rc4

This is a tarball containing v3.5-rc4 based fixes.
I noticed you might need to cherry-pick 2 additional commits before using what I've posted here.
Comment 13 ronald.vink 2013-12-13 07:33:43 UTC
Lv, thanks for the patches. I have no time to have a look at it now, but will let you know if it works somewhere begin January.
Comment 14 Lv Zheng 2014-01-08 01:49:13 UTC
Pinging... Any updates?

I noticed that the patches are detected by bugzilla as text/plain.  You may have trouble to download them.
You can right click the attachment link of "Patches for v3.5-rc4" (https://bugzilla.kernel.org/attachment.cgi?id=118141) in the header of this thread, then "Save target as..." and renmae it from acpica-event.txt to acpica-event.tgz.
Comment 15 Lv Zheng 2014-01-22 10:50:03 UTC
Pinging...
Ronald, could you give the fix a try?
Comment 16 ronald.vink 2014-01-27 09:32:42 UTC
Lv,
I have done the patches, and did some tests. Looks good now.
Thanks for the help.
Comment 17 Lv Zheng 2014-03-04 08:20:34 UTC
Hi,

Glad to hear that.

The patch has been proposed to ACPICA using an Intel internal process.
It could appear in linux/master tree after several release cycles.
Closing this bug.

Thanks for the reporting.
Comment 18 Lv Zheng 2014-06-11 00:14:59 UTC
Shipped in v3.16-rc1.
Closing.
Comment 19 ronald.vink 2014-06-11 00:15:29 UTC
Created attachment 139011 [details]
attachment-4822-0.html

Dit is het email antwoord apparaat van Ronald Vink. Ronald is op dit moment op vakantie. Hij hoopt op 23 Juni weer aanwezig te zijn om je email te lezen.

Note You need to log in before you can comment on or make changes to this bug.