Bug 2705
| Summary: | kernel panic after unloading acpi-modules in use | | |
|---|---|---|---|
| Product: | ACPI | Reporter: | Stephan Fudeus (kernel) |
| Component: | Config-Other | Assignee: | Wang, Zhenyu Z (zhenyu.z.wang) |
| Status: | REJECTED DUPLICATE | | |
| Severity: | normal | CC: | acpi-bugzilla |
| Priority: | P2 | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.6 | Subsystem: | |
| Regression: | --- | Bisected commit-id: | |
Attachments:
Kernel-Messages
kernel-config (shortened)
proc entry removing patch
proc entry removing patch - 2.4.26
syslog after applying patch
dmesg after unloading thermal
dmesg while booting
acpidmp
step-by-step unloading modules until error
correct process remove fs in 2.4.26 patch
ksymoops when unloading processor-module
full kernel config
Description
Stephan Fudeus
2004-05-14 01:20:21 UTC
Created attachment 2859 [details]
Kernel-Messages
Created attachment 2860 [details]
kernel-config (shortened)
I just found out that even without any daemon accessing /proc/acpi... I get errors on unloading e.g. the ac-module, but not yet a kernel panic:

Badness in remove_proc_entry at fs/proc/generic.c:685
Call Trace:
[<c0181e07>] remove_proc_entry+0x177/0x180
[<d107a13d>] acpi_ac_remove_fs+0x1d/0x2d [ac]
[<d107a2b5>] acpi_ac_remove+0x32/0x41 [ac]
[<c0209b80>] acpi_driver_detach+0x39/0x7c
[<c0209c32>] acpi_bus_unregister_driver+0x12/0x51
[<d107a2ce>] cleanup_module+0xa/0x1e [ac]
[<c012d598>] sys_delete_module+0x158/0x190
[<c0145398>] do_munmap+0x158/0x1c0
[<c0104139>] sysenter_past_esp+0x52/0x71

Created attachment 2879 [details]
proc entry removing patch
This patch is against vanilla 2.6.6 + acpi-20040326-2.6.6.diff ( http://www.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/release/2.6.6/ )

Created attachment 2880 [details]
proc entry removing patch - 2.4.26
This patch is against vanilla 2.4.26 + acpi-20040326-2.4.26.diff
After applying the proposed patch: kernel panic on unloading the thermal module, which was in use by:
cpuspeed -d -t /proc/acpi/thermal_zone/THRM/temperature 75

After unloading the other acpi-modules (i.e. button, processor, fan, ac, battery), reloading the ac module yields a stack trace (see following attachment), and after that, reloading the module "fan" blocks every module operation (modprobe, lsmod, ...).

Created attachment 2885 [details]
syslog after applying patch
Could you attach your dmesg and the kernel panic message from unloading the thermal module? In your syslog, is the first part of the call trace from boot time or not? Also, attach your acpidmp pls.

Created attachment 2886 [details]
dmesg after unloading thermal
syslog (last attachment) was not at boot time, just manual unloading shortly
after booting the patched kernel
Created attachment 2887 [details]
dmesg while booting
Stephan, I've tested this patch with 2.6.6, and I hit a module_alloc error when loading the button module. After deleting these four lines in acpi_button_exit() in button.c:

if(fixed_pwr_button)
acpi_button_remove(fixed_pwr_button, ACPI_BUS_TYPE_POWER_BUTTON);
if(fixed_sleep_button)
acpi_button_remove(fixed_sleep_button, ACPI_BUS_TYPE_SLEEP_BUTTON);

the error seems to be gone. (The button patch is wrong on this point.)

Can you stop your user-space applications, like cpuspeed, and then try to unload the thermal module? I think if cpufreq is in use and CONFIG_X86_POWERNOW_K7 is set (according to your .config), only the processor module would be affected. I've tested with my Centrino and it's ok. I suspect it is not the fault of thermal.

Does the error below happen when loading the battery module? Can you attach your acpidmp output? (acpidmp is included in http://www.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/pmtools-xxx.tar.gz)

ACPI-1133: *** Error: Method execution failed [\_SB_.GBFE] (Node c1238e7c), AE_AML_NO_OPERAND
May 17 09:57:46 mindstorm kernel: ACPI-1133: *** Error: Method execution failed [\_SB_.ITOS] (Node c1238e3c), AE_AML_NO_OPERAND
May 17 09:57:46 mindstorm kernel: ACPI-1133: *** Error: Method execution failed [\_SB_.BAT1.UPBI] (Node c1238c7c), AE_AML_NO_OPERAND
May 17 09:57:46 mindstorm kernel: ACPI-1133: *** Error: Method execution failed [\_SB_.BAT1._BIF] (Node c1238cbc), AE_AML_NO_OPERAND

Created attachment 2894 [details]
acpidmp
Created attachment 2896 [details]
step-by-step unloading modules until error
ok, when using rmmod instead of modprobe -r you can see that unloading processor is the culprit, not unloading thermal

Unloading the processor module will fail if thermal is using it. Is it unloading the processor module that causes the kernel panic, as your message said? -zhen

Correct, it comes down to this: unloading the "processor" module causes the panic. I was able to unload "thermal" without errors.

Any oops info available? Unload the cpufreq-related modules first, then unload processor. Any difference? -zhen

Created attachment 2897 [details]
correct process remove fs in 2.4.26 patch
The only cpufreq-module I have is cpufreq_userspace, the others are built into the kernel (for now). Unloading cpufreq_userspace before unloading processor does not change the behavior. Shall I recompile the kernel with more debug-info and/or cpufreq more modularized?

Created attachment 2899 [details]
ksymoops when unloading processor-module
Ok. Add CONFIG_ACPI_DEBUG and make the cpufreq components into modules. Then pls attach your whole kernel .config. thanks, -zhen

The problem stays the same with cpufreq built as modules and never loaded before processor is unloaded.

Created attachment 2900 [details]
full kernel config
Completely disabling cpufreq in the kernel does not fix it. Still errors on unloading processor. There are no special acpi-debug messages near the oops.

I don't think the processor unload issue is related to /proc. As the fault is in cpu_idle, probably pm_idle used to point to acpi_processor_idle and no longer does. If this is true, then you should see this issue go away when you boot with "idle=poll". I thought we fixed this already -- maybe just in 2.6. Can you test recent 2.6? I'm marking this report RESOLVED as the /proc patch is good to go.

Right, for the cpu idle oops, we have a patch. Please refer to bug 1716

Ok, thanks anyway. With a new 2.6.6 and idle=poll I experience no more oopses on unloading processor (just badness in remove_proc_entry as in the initial bug description). When using the patches from bug 1716 I experienced no more oopses on unloading acpi-modules, but a kernel panic when re-inserting them (e.g. battery). Sorry, I don't have the messages since the logfiles were not synced. I'll try to reproduce it in the next few days.
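The suspected failure mode described above (an idle hook that still refers to a routine inside an unloaded module) can be illustrated with a small user-space sketch. This is only a model under stated assumptions, not the kernel code and not the patch from bug 1716: `pm_idle_hook` stands in for the kernel's `pm_idle`, `module_idle` stands in for `acpi_processor_idle`, and all other names are hypothetical. It shows the unload path restoring the previous hook so the idle loop never calls into code that has gone away.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
typedef void (*idle_fn)(void);

int default_idle_calls = 0;   /* how often the fallback idle routine ran */
int module_idle_calls = 0;    /* how often the module's idle routine ran */

idle_fn pm_idle_hook = NULL;  /* stand-in for the kernel's pm_idle pointer */
idle_fn saved_idle = NULL;    /* hook value from before the module loaded */
int module_loaded = 0;

void default_idle(void) { default_idle_calls++; }

/* Stand-in for acpi_processor_idle: code that lives inside the module. */
void module_idle(void) { module_idle_calls++; }

/* Module load: remember the old hook, then install the module's routine. */
void module_init_sketch(void)
{
    saved_idle = pm_idle_hook;
    pm_idle_hook = module_idle;
    module_loaded = 1;
}

/* Module unload: restore the old hook before the module's code goes away.
 * Skipping this step leaves pm_idle_hook dangling, which is the kind of
 * "Bad EIP value" fault in cpu_idle that the comment above suspects. */
void module_exit_sketch(void)
{
    assert(module_loaded);
    pm_idle_hook = saved_idle;
    module_loaded = 0;
}

/* One pass of the idle loop: call the hook if set, else the default. */
void idle_once(void)
{
    idle_fn fn = pm_idle_hook;
    if (fn != NULL)
        fn();
    else
        default_idle();
}
```

Note that the sketch does not cover a CPU that is already executing the module's idle routine at the moment of unload; a real fix would also have to wait for that to finish, which is beyond this model.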
please test 2.6.7-rc1

2.6.7-rc1 does not panic anymore when unloading and re-loading modules, but there is still one oops when unloading processor:

Unable to handle kernel paging request at virtual address d103e237
printing eip:
d103e237
*pde = 0fa15067
*pte = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in: cpufreq_userspace powernow_k7 freq_table snd_seq_midi snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_via82xx snd_pcm snd_timer snd_ac97_codec snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore ip_conntrack_ftp ds yenta_socket pcmcia_core button fan ac battery ipt_LOG ipt_REJECT ipt_state iptable_filter iptable_nat ip_conntrack ip_tables 8139too mii crc32 nls_iso8859_1 nls_cp437 vfat fat genrtc
CPU: 0
EIP: 0060:[<d103e237>] Not tainted
EFLAGS: 00010216 (2.6.7-rc1)
EIP is at 0xd103e237
eax: b3ee1032 ebx: 00008008 ecx: b3ee07b4 edx: 00008008
esi: cffd54b0 edi: c03c6c80 ebp: cffd5400 esp: c039ffbc
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c039e000 task=c0353a40)
Stack: 0009ef00 0009ef00 c039e000 0009ef00 c03c6c80 0041b007 c01020cd c039e000 c03a05f2 c0353a40 00000000 c03bee98 00000017 c03a0320 c03c6a80 00000000 c010019f
Call Trace:
[<c01020cd>] cpu_idle+0x2d/0x40
[<c03a05f2>] start_kernel+0x172/0x1a0
[<c03a0320>] unknown_bootoption+0x0/0x130
Code: Bad EIP value.
<0>Kernel panic: Attempted to kill the idle task!
In idle task - not syncing

Regarding your comment #27, have you tried it that way? Or do you find that some patches (like those in 1716) are missing from the updated acpi? Could you verify it? thanks, -zhen

Yes, for the processor module panic ("kill idle"), we have another patch to fix it. Please refer to Bug 1716. It's not related to the proc file.

Ok, I thought patches from bug 1716 might be included in 2.6.7-rc1. I'll apply them and check again.

patches from bug 1716 combined with 2.6.7-rc1 seem to fix all issues. thanks!

*** This bug has been marked as a duplicate of 1716 ***