Bug 2705 - kernel panic after unloading acpi-modules in use
Summary: kernel panic after unloading acpi-modules in use
Status: REJECTED DUPLICATE of bug 1716
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Other
Hardware: i386 Linux
Importance: P2 normal
Assignee: Wang, Zhenyu Z
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-05-14 01:20 UTC by Stephan Fudeus
Modified: 2004-05-28 08:31 UTC

See Also:
Kernel Version: 2.6.6
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Kernel-Messages (13.05 KB, text/plain), 2004-05-14 01:22 UTC, Stephan Fudeus
kernel-config (shortened) (4.33 KB, text/plain), 2004-05-14 01:23 UTC, Stephan Fudeus
proc entry removing patch (6.01 KB, patch), 2004-05-16 18:47 UTC, Wang, Zhenyu Z
proc entry removing patch - 2.4.26 (6.42 KB, patch), 2004-05-16 18:48 UTC, Wang, Zhenyu Z
syslog after applying patch (5.73 KB, text/plain), 2004-05-17 01:19 UTC, Stephan Fudeus
dmesg after unloading thermal (864 bytes, text/plain), 2004-05-17 02:10 UTC, Stephan Fudeus
dmesg while booting (9.70 KB, text/plain), 2004-05-17 02:13 UTC, Stephan Fudeus
acpidmp (96.20 KB, text/plain), 2004-05-18 00:54 UTC, Stephan Fudeus
step-by-step unloading modules until error (1.23 KB, text/plain), 2004-05-18 01:29 UTC, Stephan Fudeus
correct process remove fs in 2.4.26 patch (642 bytes, patch), 2004-05-18 03:28 UTC, Wang, Zhenyu Z
ksymoops when unloading processor-module (1.43 KB, text/plain), 2004-05-18 04:40 UTC, Stephan Fudeus
full kernel config (2.04 KB, application/x-gzip), 2004-05-18 07:44 UTC, Stephan Fudeus

Description Stephan Fudeus 2004-05-14 01:20:21 UTC
Distribution: Mandrake 10.0 with vanilla 2.6.6 
Hardware Environment: Sony Vaio PCG-FX805 
Software Environment:  
Problem Description: 
Badness in remove_proc_entry at fs/proc/generic.c:685 
occurred while unloading acpi-modules whose proc-entries were still in use by the 
cpuspeed daemon
Comment 1 Stephan Fudeus 2004-05-14 01:22:17 UTC
Created attachment 2859 [details]
Kernel-Messages
Comment 2 Stephan Fudeus 2004-05-14 01:23:31 UTC
Created attachment 2860 [details]
kernel-config (shortened)
Comment 3 Stephan Fudeus 2004-05-14 07:44:57 UTC
I just found out that even without any daemon accessing /proc/acpi... I get 
errors on unloading e.g. the ac-module, but not yet a kernel panic: 
 
Badness in remove_proc_entry at fs/proc/generic.c:685 
Call Trace: 
 [<c0181e07>] remove_proc_entry+0x177/0x180 
 [<d107a13d>] acpi_ac_remove_fs+0x1d/0x2d [ac] 
 [<d107a2b5>] acpi_ac_remove+0x32/0x41 [ac] 
 [<c0209b80>] acpi_driver_detach+0x39/0x7c 
 [<c0209c32>] acpi_bus_unregister_driver+0x12/0x51 
 [<d107a2ce>] cleanup_module+0xa/0x1e [ac] 
 [<c012d598>] sys_delete_module+0x158/0x190 
 [<c0145398>] do_munmap+0x158/0x1c0 
 [<c0104139>] sysenter_past_esp+0x52/0x71 
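
The trace above shows the warning firing inside remove_proc_entry while the ac module tears down its /proc entries. A minimal user-space sketch of one way such a warning can arise, under the assumption (not confirmed in this report) that it is reported when a directory entry is removed while it still has child entries. All names here (entry_model, add_entry, remove_entry, warnings) are hypothetical and only model the idea; this is not the kernel's code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a /proc-style entry tree (not the kernel struct). */
struct entry_model {
    const char *name;
    struct entry_model *subdir;   /* first child */
    struct entry_model *next;     /* next sibling */
};

static int warnings;              /* counts modelled "Badness" reports */

/* Allocate an entry and link it at the head of parent's child list. */
static struct entry_model *add_entry(const char *name,
                                     struct entry_model *parent)
{
    struct entry_model *e = calloc(1, sizeof(*e));
    assert(e != NULL);
    e->name = name;
    e->next = parent->subdir;
    parent->subdir = e;
    return e;
}

/*
 * Unlink the named child from parent and free it.
 * ASSUMPTION: a warning is counted if the entry still has children;
 * those children are simply leaked in this sketch.
 * Returns 0 on success, -1 if no such child exists.
 */
static int remove_entry(const char *name, struct entry_model *parent)
{
    struct entry_model **p;

    for (p = &parent->subdir; *p; p = &(*p)->next) {
        struct entry_model *e = *p;

        if (strcmp(e->name, name) != 0)
            continue;
        *p = e->next;
        if (e->subdir)
            warnings++;
        free(e);
        return 0;
    }
    return -1;
}
```

In this model, removing the children first and the directory last produces no warning, while removing the directory first does. Whether that matches the ordering problem addressed by the attached patch is an assumption.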
 
 
 
Comment 4 Wang, Zhenyu Z 2004-05-16 18:47:27 UTC
Created attachment 2879 [details]
proc entry removing patch  

This patch is against vanilla 2.6.6 + acpi-20040326-2.6.6.diff (
http://www.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/release/2.6.6/
)
Comment 5 Wang, Zhenyu Z 2004-05-16 18:48:52 UTC
Created attachment 2880 [details]
proc entry removing patch - 2.4.26  

This patch is against vanilla 2.4.26 + acpi-20040326-2.4.26.diff
Comment 6 Stephan Fudeus 2004-05-17 01:18:22 UTC
After applying the proposed patch: 
kernel panic on unloading the thermal module, which was in use by: 
cpuspeed -d -t /proc/acpi/thermal_zone/THRM/temperature 75 
 
After unloading the other acpi-modules (i.e. button, processor, fan, ac, battery), 
reloading the ac module yields a stack trace (see following attachment), and 
after that, reloading the module "fan" blocks every module operation 
(modprobe, lsmod, ...) 
 
Comment 7 Stephan Fudeus 2004-05-17 01:19:42 UTC
Created attachment 2885 [details]
syslog after applying patch
Comment 8 Wang, Zhenyu Z 2004-05-17 01:59:31 UTC
Could you attach your dmesg and the kernel panic message from unloading the thermal module? 
In your syslog, is the first part of the call trace from boot time or not?
Also, please attach your acpidmp.
Comment 9 Stephan Fudeus 2004-05-17 02:10:59 UTC
Created attachment 2886 [details]
dmesg after unloading thermal

The syslog (last attachment) was not from boot time, just manual unloading shortly
after booting the patched kernel
Comment 10 Stephan Fudeus 2004-05-17 02:13:30 UTC
Created attachment 2887 [details]
dmesg while booting
Comment 11 Wang, Zhenyu Z 2004-05-17 21:38:51 UTC
Stephan,
I've tested this patch with 2.6.6, and I hit a module_alloc error when loading the button
module.
Delete these four lines in acpi_button_exit() in button.c:

if(fixed_pwr_button) 
   acpi_button_remove(fixed_pwr_button, ACPI_BUS_TYPE_POWER_BUTTON);
if(fixed_sleep_button)
   acpi_button_remove(fixed_sleep_button, ACPI_BUS_TYPE_SLEEP_BUTTON);

With them removed, the error seems to be gone. (The button patch is wrong on this point.)

Can you stop your user-space applications, like cpuspeed, and then try unloading the thermal
module? I think that if cpufreq is in use and CONFIG_X86_POWERNOW_K7 is set
(according to your .config), only the processor module would be affected. I've tested
on my Centrino machine and it's OK. I suspect this is not the fault of thermal.

Did the errors below happen while loading the battery module?
Can you attach your acpidmp output? 
(acpidmp is included in
http://www.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/pmtools-xxx.tar.gz)

ACPI-1133: *** Error: Method execution failed [\_SB_.GBFE] (Node c1238e7c), AE_AML_NO_OPERAND
May 17 09:57:46 mindstorm kernel:     ACPI-1133: *** Error: Method execution failed [\_SB_.ITOS] (Node c1238e3c), AE_AML_NO_OPERAND
May 17 09:57:46 mindstorm kernel:     ACPI-1133: *** Error: Method execution failed [\_SB_.BAT1.UPBI] (Node c1238c7c), AE_AML_NO_OPERAND
May 17 09:57:46 mindstorm kernel:     ACPI-1133: *** Error: Method execution failed [\_SB_.BAT1._BIF] (Node c1238cbc), AE_AML_NO_OPERAND

Comment 12 Stephan Fudeus 2004-05-18 00:54:49 UTC
Created attachment 2894 [details]
acpidmp
Comment 13 Stephan Fudeus 2004-05-18 01:29:31 UTC
Created attachment 2896 [details]
step-by-step unloading modules until error
Comment 14 Stephan Fudeus 2004-05-18 01:37:22 UTC
OK, when using rmmod instead of modprobe -r you can see that unloading 
processor is the culprit, not unloading thermal. 
Comment 15 Wang, Zhenyu Z 2004-05-18 02:53:21 UTC
Unloading the processor module will fail if thermal is using it.
Is it unloading the processor module that causes the kernel panic, as your message said? 

-zhen
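
The point that unloading processor fails while thermal is using it can be illustrated with a small user-space model of module use counts. This is only a sketch: the struct, the function names, and the returned error values are hypothetical choices for the illustration and are not taken from the kernel's module loader:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a loadable module with a use count. */
struct module_model {
    const char *name;
    int use_count;   /* how many other modules currently depend on it */
    int loaded;      /* 1 while the module is loaded */
};

/* Record that another module started using dep. */
static void module_use(struct module_model *dep)
{
    assert(dep->loaded);
    dep->use_count++;
}

/* Record that a user of dep went away. */
static void module_unuse(struct module_model *dep)
{
    assert(dep->use_count > 0);
    dep->use_count--;
}

/*
 * Refuse to unload a module that still has users.
 * The error values are the sketch's own choice.
 */
static int try_unload(struct module_model *m)
{
    if (!m->loaded)
        return -ENOENT;
    if (m->use_count > 0)
        return -EBUSY;
    m->loaded = 0;
    return 0;
}
```

In this model, thermal has to be unloaded (and its use of processor dropped) before try_unload() on processor succeeds, which mirrors the unload order discussed in the surrounding comments.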
Comment 16 Stephan Fudeus 2004-05-18 03:00:11 UTC
Correct, it comes down to this: unloading the "processor" module causes the panic. 
I was able to unload "thermal" without errors. 
Comment 17 Wang, Zhenyu Z 2004-05-18 03:27:04 UTC
Any oops info available?
Unload cpufreq related modules first, then unload processor.
Any difference?

-zhen
Comment 18 Wang, Zhenyu Z 2004-05-18 03:28:47 UTC
Created attachment 2897 [details]
correct process remove fs in 2.4.26 patch
Comment 19 Stephan Fudeus 2004-05-18 04:39:25 UTC
The only cpufreq-module I have is cpufreq_userspace, the others are built into 
the kernel (for now). Unloading cpufreq_userspace before unloading processor 
does not change the behavior. 
 
Shall I recompile the kernel with more debug-info and/or cpufreq more 
modularized? 
 
Comment 20 Stephan Fudeus 2004-05-18 04:40:17 UTC
Created attachment 2899 [details]
ksymoops when unloading processor-module
Comment 21 Wang, Zhenyu Z 2004-05-18 07:19:22 UTC
OK. Enable CONFIG_ACPI_DEBUG and build the cpufreq components as modules.
Then please attach your whole kernel .config. 

thanks,
-zhen
Comment 22 Stephan Fudeus 2004-05-18 07:43:48 UTC
The problem remains the same with cpufreq built as modules and never loaded before 
unloading processor. 
Comment 23 Stephan Fudeus 2004-05-18 07:44:26 UTC
Created attachment 2900 [details]
full kernel config
Comment 24 Stephan Fudeus 2004-05-18 07:54:02 UTC
Completely disabling cpufreq in the kernel does not fix it. Still errors on 
unloading processor. There are no special acpi-debug messages near the oops. 
Comment 25 Len Brown 2004-05-18 11:57:50 UTC
I don't think the processor unload issue is related to /proc. 
 
As the fault is in cpu_idle, probably pm_idle used to point to acpi_processor_idle 
and no longer does.   If this is true, then you should see this issue go away 
when you boot with "idle=poll". 
 
I thought we fixed this already -- maybe just in 2.6.  Can you test recent 2.6? 
 
I'm marking this report RESOLVED as the /proc patch is good to go. 
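
The pm_idle explanation above describes a function pointer that is left pointing into code that has been unloaded. A minimal user-space sketch of that hook pattern, in which the "module" saves the previous hook on load and restores it on unload. All names (pm_idle_hook, module_idle_model, and so on) are hypothetical stand-ins, and this is not the actual fix from bug 1716:

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of a global idle hook; not kernel code. */
typedef void (*idle_fn)(void);

static int idle_calls;        /* calls into the "module" idle routine */
static idle_fn pm_idle_hook;  /* stands in for the kernel's pm_idle */
static idle_fn saved_idle;    /* previous hook, saved at "module load" */

/* Built-in fallback that is always valid to call. */
static void default_idle_model(void)
{
}

/* Idle routine that lives in the "module". */
static void module_idle_model(void)
{
    idle_calls++;
}

/* "Load": remember the old hook, then install ours. */
static void module_init_model(void)
{
    saved_idle = pm_idle_hook;
    pm_idle_hook = module_idle_model;
}

/* "Unload": put the old hook back before the module's code goes away. */
static void module_exit_model(void)
{
    assert(pm_idle_hook == module_idle_model);
    pm_idle_hook = saved_idle;
    saved_idle = NULL;
}

/* One pass of the idle loop: use the hook if set, else the fallback. */
static void cpu_idle_once(void)
{
    idle_fn fn = pm_idle_hook ? pm_idle_hook : default_idle_model;

    fn();
}
```

If module_exit_model() skipped the restore, cpu_idle_once() would keep calling into the module after unload, which is the kind of stale pointer the comment suspects. A real fix would also have to make sure no CPU is still executing the old routine when the module is freed; the sketch does not model that.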
Comment 26 Shaohua 2004-05-18 18:06:03 UTC
Right, for the cpu idle oops, we have a patch. Please refer to bug 1716.
Comment 27 Stephan Fudeus 2004-05-19 02:04:46 UTC
Ok, thanks anyway. 
With a fresh 2.6.6 and idle=poll I see no more oopses on unloading 
processor (just the badness in remove_proc_entry from the initial bug description). 
When using the patches from bug 1716 I saw no more oopses on unloading 
acpi-modules, but a kernel panic when re-inserting them (e.g. battery). Sorry, 
I don't have the messages since the logfiles were not synced. I'll try to 
reproduce it in the next few days. 
Comment 28 Len Brown 2004-05-27 23:58:26 UTC
please test 2.6.7-rc1 
Comment 29 Stephan Fudeus 2004-05-28 00:55:23 UTC
2.6.7-rc1 does not panic anymore when unloading and re-loading modules, but 
there is still one oops when unloading processor: 
Unable to handle kernel paging request at virtual address d103e237 
 printing eip: 
d103e237 
*pde = 0fa15067 
*pte = 00000000 
Oops: 0000 [#1] 
PREEMPT 
Modules linked in: cpufreq_userspace powernow_k7 freq_table snd_seq_midi 
snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_via82xx 
snd_pcm snd_timer snd_ac97_codec snd_page_alloc snd_mpu401_uart snd_rawmidi 
snd_seq_device snd soundcore ip_conntrack_ftp ds yenta_socket pcmcia_core 
button fan ac battery ipt_LOG ipt_REJECT ipt_state iptable_filter iptable_nat 
ip_conntrack ip_tables 8139too mii crc32 nls_iso8859_1 nls_cp437 vfat fat 
genrtc 
CPU:    0 
EIP:    0060:[<d103e237>]    Not tainted 
EFLAGS: 00010216   (2.6.7-rc1) 
EIP is at 0xd103e237 
eax: b3ee1032   ebx: 00008008   ecx: b3ee07b4   edx: 00008008 
esi: cffd54b0   edi: c03c6c80   ebp: cffd5400   esp: c039ffbc 
ds: 007b   es: 007b   ss: 0068 
Process swapper (pid: 0, threadinfo=c039e000 task=c0353a40) 
Stack: 0009ef00 0009ef00 c039e000 0009ef00 c03c6c80 0041b007 c01020cd c039e000 
       c03a05f2 c0353a40 00000000 c03bee98 00000017 c03a0320 c03c6a80 00000000 
       c010019f 
Call Trace: 
 [<c01020cd>] cpu_idle+0x2d/0x40 
 [<c03a05f2>] start_kernel+0x172/0x1a0 
 [<c03a0320>] unknown_bootoption+0x0/0x130 
 
Code:  Bad EIP value. 
 <0>Kernel panic: Attempted to kill the idle task! 
In idle task - not syncing 
 
Comment 30 Wang, Zhenyu Z 2004-05-28 01:07:52 UTC
Regarding your comment #27, have you tried that approach?
Or do you find that some patches (like those in 1716) are missing from the updated acpi?
Could you verify it?

thanks,
-zhen
Comment 31 Shaohua 2004-05-28 01:14:42 UTC
Yes, for the processor module panic ("kill idle"), we have another patch to fix 
it. Please refer to bug 1716. It's not related to the proc file.
Comment 32 Stephan Fudeus 2004-05-28 01:16:52 UTC
Ok, I thought patches from bug 1716 might be included in 2.6.7-rc1. 
I'll apply them and check again. 
Comment 33 Stephan Fudeus 2004-05-28 03:36:34 UTC
patches from bug 1716 combined with 2.6.7-rc1 seem to fix all issues. 
thanks! 
Comment 34 Len Brown 2004-05-28 08:31:56 UTC

*** This bug has been marked as a duplicate of 1716 ***
