Bug 8973 (ACPI_freezes_System)
Summary: boot freeze on modprobe, 2.6.21 and later
Product: ACPI
Reporter: Christian Wiegele (christian.wiegele)
Severity: blocking
CC: acpi-bugzilla, bunk, kernel, shaohua.li
Description Christian Wiegele 2007-09-03 05:49:30 UTC
Most recent kernel where this bug did not occur:
Distribution: Gentoo Linux
Hardware Environment: Intel Centrino Pro (Santa Rosa)
Software Environment:
Problem Description: System freezes with ACPI enabled.
Steps to reproduce: Compile kernel >= 2.6.21 with ACPI support and at least one subfeature enabled (e.g. battery).
I was using kernel 2.6.20-r and my system worked fine. Then I upgraded to 2.6.22-r6 (latest). Compiling worked fine, but when I tried to boot the new kernel, my system froze. To find out what exactly causes it, I disabled a lot of kernel features and found out that disabling the ACPI subfeatures solves the problem. So I compiled a kernel with ACPI built in and just battery as a module. That kernel booted fine, but after "modprobe battery" the system froze again. It's the same problem with any other ACPI module... I had no problem with that before kernel 2.6.20.
Comment 1 Christian Wiegele 2007-09-03 05:51:20 UTC
Sorry, that should be: I had no problem with that before kernel 2.6.21.
Comment 2 Christian Wiegele 2007-09-04 23:36:07 UTC
It's still the same problem with 2.6.23_rc5.
Comment 3 Len Brown 2007-09-05 17:15:09 UTC
Please attach the failing .config. Does it still fail if the single ACPI driver is compiled =y instead of =m? What if you boot with acpi=noirq? Please attach the output of acpidump; also please attach the output from dmesg -s64000 on the latest working ACPI kernel (2.6.20 will do if you still have it).
Comment 4 Christian Wiegele 2007-09-06 03:17:24 UTC
Created attachment 12726 [details] Kernel Config Kernel config from 2.6.22-r1, which fails with ACPI
Comment 5 Christian Wiegele 2007-09-06 03:18:29 UTC
Created attachment 12727 [details] ACPIDUMP Acpidump from 2.6.20-r9 which works.
Comment 6 Christian Wiegele 2007-09-06 03:19:49 UTC
Created attachment 12728 [details] dmesg output dmesg output from 2.6.20-r9 which works.
Comment 7 Christian Wiegele 2007-09-06 03:24:52 UTC
It makes no difference whether I compile an ACPI feature as a module or built-in:
- Compiled built-in, it freezes the system while booting.
- Compiled as a module, it freezes the system while loading the module.
I'm going to attach a dmesg and an acpidump from the failing kernel. For this I'm booting 2.6.22-r1 with no ACPI subfeatures/modules.
Comment 8 Christian Wiegele 2007-09-06 03:37:05 UTC
Created attachment 12729 [details] ACPIDUMP 2.6.22-r1 ACPIDUMP 2.6.22-r1 (failing kernel)
Comment 9 Christian Wiegele 2007-09-06 03:37:52 UTC
Created attachment 12730 [details] dmesg 2.6.22-r1 dmesg output of failing kernel (2.6.22-r1)
Comment 10 Len Brown 2007-09-06 04:01:00 UTC
Thanks. Note that acpidump is a BIOS dump, so unless you upgrade the BIOS or change BIOS SETUP, it will not change.
Comment 11 Len Brown 2007-09-06 04:10:52 UTC
Can you provide the .config from the successful 2.6.20 kernel? Please try booting the failing kernel with "pnpacpi=off".
Comment 12 Christian Wiegele 2007-09-06 04:13:59 UTC
Created attachment 12733 [details] Kernel Config (working) Kernel config from 2.6.20-r9 (working, Gentoo version)
Comment 13 Len Brown 2007-09-06 04:22:42 UTC
I really expect this may be a PNP issue, but this BIOS has a couple more "interesting" features. Please try booting 2.6.22 with acpi_osi=!Linux (or capture the failure dmesg for 2.6.23.latest, which does this by default). Please try booting with acpi_apic_instance=2.
Comment 14 Christian Wiegele 2007-09-06 04:26:19 UTC
pnpacpi=off does not work; the system is still freezing. But I found out something interesting: the system does not freeze when loading the module "fan", so it seems to me that some ACPI modules are loadable. If you want, I can test which modules cause the system to freeze and which do not.
Comment 15 Christian Wiegele 2007-09-06 04:27:43 UTC
(In reply to comment #13)
> I really expect this may be a PNP issue,
> but this BIOS has a couple more "interesting" features
>
> Please try booting 2.6.22 with acpi_osi=!Linux
> (or capture the failure dmesg for 2.6.23.latest which does this by default)
>
> Please try booting with acpi_apic_instance=2
acpi_apic_instance=2 -> same problem
acpi_osi=!Linux -> I'm going to test it now
Comment 16 Christian Wiegele 2007-09-06 04:32:55 UTC
acpi_osi=!Linux -> same problem. My system is one of the new "Santa Rosa" notebooks with a Centrino Core 2 Duo.
Comment 17 Christian Wiegele 2007-09-06 04:34:42 UTC
(In reply to comment #13)
> I really expect this may be a PNP issue,
> but this BIOS has a couple more "interesting" features
>
> Please try booting 2.6.22 with acpi_osi=!Linux
> (or capture the failure dmesg for 2.6.23.latest which does this by default)
>
> Please try booting with acpi_apic_instance=2
How can I capture the failure dmesg for 2.6.23.latest?
Comment 18 Christian Wiegele 2007-09-06 05:27:24 UTC
Is there any other information I can provide?
Comment 19 Len Brown 2007-09-06 05:42:20 UTC
Re: messages for 2.6.23... Can you download the kernel from kernel.org, currently 2.6.23-rc5, build it, and capture them the same way you did for the 2.6.22 failure in comment #9? BTW, you don't need to assign the bug to yourself to get e-mail when it changes; the submitter always gets e-mail. I assign it to me so that if you change it, I get e-mail...
Comment 20 Len Brown 2007-09-06 05:50:53 UTC
thanks for the working 2.6.20 .config. My guess was that CONFIG_PNPACPI was the key difference, but since you still fail with "pnpacpi=off", that can't be it. There are a number of differences. Can you start with the working 2.6.20 .config, drop it into your 2.6.22 (or 23) tree, make oldconfig, attach the config here, and test the resulting kernel? The resulting config should help narrow this down no matter if the resulting kernel boots or hangs.
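Narrowing down the differences between the working 2.6.20 .config and the 2.6.22/23 one starts with reading both files the same way. A .config holds two kinds of symbol line, `CONFIG_FOO=value` and `# CONFIG_FOO is not set`. Below is a minimal C sketch of a parser for one such line; `struct config_sym` and `parse_config_line` are hypothetical helpers, not part of the kernel tree.

```c
#include <stdio.h>
#include <string.h>

/* One symbol from a kernel .config file (hypothetical helper type). */
struct config_sym {
    char name[128];
    char value[128];   /* "y", "m", a number/string, or "n" when unset */
};

/* Parse one .config line into *s. Returns 1 for a CONFIG_ symbol line,
 * 0 for blank lines and ordinary comments. */
static int parse_config_line(const char *line, struct config_sym *s)
{
    int end = 0;

    /* "# CONFIG_FOO is not set": %n is only stored if the whole phrase
     * matched, so other comment lines leave end at 0. */
    if (sscanf(line, "# %127[A-Za-z0-9_] is not set%n", s->name, &end) == 1
        && end > 0) {
        strcpy(s->value, "n");
    } else if (sscanf(line, "%127[A-Za-z0-9_]=%127[^\n]",
                      s->name, s->value) != 2) {
        return 0;   /* blank line or ordinary comment */
    }
    return strncmp(s->name, "CONFIG_", 7) == 0;
}
```

Running every line of both configs through `parse_config_line` and comparing the name/value pairs lists the symbols, such as CONFIG_PNPACPI, whose settings differ between the working and failing kernels.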
Comment 21 Christian Wiegele 2007-09-06 06:16:52 UTC
(In reply to comment #19)
> re: messages for 2.6.23...
> Can you download the kernel from kernel.org,
> currently 2.6.23-rc5, build it,
> and capture them the same way you did
> for the 2.6.22 failure in comment #9?
>
> BTW. you don't need to assign the bug to yourself to
> get e-mail when it changes -- the submitter always gets e-mail.
> I assign it to me so that if you change it, I get e-mail...
Is the full 2.6.23-rc5 available for download, or only a patch against the latest release? If not, I'm going to emerge the latest vanilla-sources with Gentoo. These are the original sources from kernel.org, and they're easier for me to get.
Comment 22 Christian Wiegele 2007-09-06 06:27:14 UTC
Created attachment 12734 [details] kernel config after "make oldconfig" 2.6.20-r9 -> 2.6.23-rc5 Kernel config after "make oldconfig". I just pressed Enter for all questions; I hope that was okay?
Comment 23 Len Brown 2007-09-06 06:39:06 UTC
Yes. On the 2.6.23-rc5 line, the 'B' will get you a baseline, in this case 2.6.22, and the 'V' will get you to a page with a link at the top to a patch to apply to the baseline, like so:
$ tar xjf linux-2.6.22.tar.bz2
$ cd linux-2.6.22/
$ patch -Np1 < ../patch-2.6.23-rc5
Comment 24 Christian Wiegele 2007-09-06 06:46:15 UTC
Okay, thanks for your mini howto :-) For the 2.6.23-rc5 dmesg, do you want me to build the kernel with the upgraded config, or should I generate a new (default) one?
Comment 25 Len Brown 2007-09-06 07:15:13 UTC
Re: which config for 2.6.23: yes, try the "make oldconfig" one first, since in theory it is the closest thing to what you had before. If that doesn't work, try defconfig, as maybe it will point out something I've not noticed in your config that may be related to the issue. Certainly if you can find anything besides removing all the ACPI options that makes this failure change, that will be a big hint. Thanks.
Comment 26 Christian Wiegele 2007-09-06 07:35:41 UTC
I patched the kernel and it's currently compiling. I've been searching for more than 2 weeks for a way to get the kernel working without removing ACPI options :-) But I am pretty sure that it has something to do with the graphics card, because the screen goes black for a second before it freezes...
Comment 27 Christian Wiegele 2007-09-06 07:54:52 UTC
I installed 2.6.23-rc5 and I'm not able to boot the kernel without it freezing. I disabled all ACPI subfeatures and it still does not boot. Booting with acpi=off still freezes the system... I will now try to disable other options...
Comment 28 Christian Wiegele 2007-09-06 08:02:21 UTC
Created attachment 12735 [details] error message I got this message one time, but I'm not able to reproduce it... It happened 2 minutes ago...
Comment 29 Christian Wiegele 2007-09-06 08:07:37 UTC
I need to disable all ACPI subfeatures + "Suspend to RAM and standby" to get 2.6.23-rc5 booting. "Suspend to RAM and standby" seems to be a new feature in 2.6.23-rc5...
Comment 30 Christian Wiegele 2007-09-06 08:19:25 UTC
Created attachment 12737 [details] dmesg 2.6.23-rc5 Here is the dmesg output with ACPI subfeatures and suspend turned off.
Comment 31 Christian Wiegele 2007-09-06 08:22:11 UTC
When I press the Fn key + the brightness up/down key, the system freezes too...
Comment 32 Len Brown 2007-09-06 08:51:38 UTC
Hmmm, this is a photo of a hang? What do the other hangs look like: do they get further or stop earlier?
> ACPI Error (evevent-0305): No installed handler for fixed event [0, 2, 3]...
Fixed event 0 is the power button and 2 is the pmtimer. Apparently we received these events before handlers were registered...
> ACPI Error (evxfevnt-0383): Could not disable RealTimeClock events
Apparently this was from rtc_handler() trying to disable events; no idea why it failed...
> ACPI Error (evgpe-0705): No handler or method for GPE[0,7,8,A...] disabling event...
Hmm, again, we seem to be getting GPEs firing before we've installed the handlers...
A couple of things to simplify; maybe one will have an effect (fishing at this point):
CONFIG_OPROFILE=n
CONFIG_RTC=n
CONFIG_HPET_EMULATE_RTC=n
CONFIG_HPET=n
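To decode the `fixed event [N]` indices in these messages, ACPICA's fixed-event numbering is worth having at hand. The values below are reproduced from memory of ACPICA's include/acpi/actypes.h and should be checked against the tree in use; under that numbering index 0 is the PM timer and index 2 the power button, the reverse of the mapping stated above. Only the ACPI_EVENT_* values are meant to mirror ACPICA; the name table and `fixed_event_name` are an illustrative sketch.

```c
/* Fixed-event indices, values as in ACPICA's include/acpi/actypes.h
 * (reproduced from memory; check them against the tree in use). */
#define ACPI_EVENT_PMTIMER       0
#define ACPI_EVENT_GLOBAL        1
#define ACPI_EVENT_POWER_BUTTON  2
#define ACPI_EVENT_SLEEP_BUTTON  3
#define ACPI_EVENT_RTC           4
#define ACPI_NUM_FIXED_EVENTS    5

/* Human-readable names: an illustrative helper, not ACPICA code. */
static const char *const fixed_event_names[ACPI_NUM_FIXED_EVENTS] = {
    [ACPI_EVENT_PMTIMER]      = "PM timer",
    [ACPI_EVENT_GLOBAL]       = "global lock release",
    [ACPI_EVENT_POWER_BUTTON] = "power button",
    [ACPI_EVENT_SLEEP_BUTTON] = "sleep button",
    [ACPI_EVENT_RTC]          = "RTC alarm",
};

/* Decode the N in "No installed handler for fixed event [N]". */
static const char *fixed_event_name(unsigned int index)
{
    return index < ACPI_NUM_FIXED_EVENTS ? fixed_event_names[index]
                                         : "unknown";
}
```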
Comment 33 Christian Wiegele 2007-09-06 23:18:35 UTC
Yes and no: this is a photo of a hang, but normally it hangs without any error messages... Maybe I pressed some button? I am trying your config now...
Comment 34 Christian Wiegele 2007-09-06 23:28:01 UTC
Your config didn't make any difference. I tested pressing the Fn key to adjust the brightness while the kernel is booting: it freezes immediately, even with the kernel that otherwise boots because all ACPI subfeatures are disabled... Isn't it possible to trace a module load, to see what's really going on?
Comment 35 Christian Wiegele 2007-09-07 00:41:03 UTC
I disabled nearly all device drivers and kernel features; the system is still freezing. I found out that booting with acpi=off causes a system freeze too. Is this not an ACPI issue?
Comment 36 Len Brown 2007-09-07 01:19:21 UTC
> acpi=off is causing a system-freeze too
If it is the same freeze, then yes, this means the cause isn't ACPI. Can you tell if the hang is in the same place with and without ACPI? Try booting with init=/bin/bash: does it ever get that far? If not, then interactive module loading will not help. I'm afraid we haven't learned much about the failure in ACPI mode, other than that we seem to be receiving interrupts earlier than expected. Try make defconfig and use that config; if it works, that is a clue. Try also to simplify by booting with "nolapic". If you changed any BIOS SETUP options, try resetting them to defaults.
Comment 37 Christian Wiegele 2007-09-07 01:28:51 UTC
I have removed PCI support from the kernel. Now I am able to boot the kernel and use the Fn keys to control brightness. I think ACPI is automatically disabled when PCI is disabled...
Comment 38 Christian Wiegele 2007-09-07 01:31:13 UTC
(In reply to comment #36)
> > acpi=off is causing a system-freeze too
>
> If it is the same freeze, then yes, this means the cause isn't ACPI.
> Can you tell if the hang is in the same place with and without ACPI?
>
> try booting with init=/bin/bash -- does it ever get that far?
> If no, then interactive module loading will not help.
>
> I'm afraid we haven't learned much about the failure in ACPI mode,
> other than we seem to be receiving interrupts earlier than expected
>
> try make defconfig
> and using that config -- if it works, that is a clue.
>
> try also to simplify by booting with "nolapic"
>
> If you changed any BIOS SETUP options, try resetting to defaults.
Yes, acpi=off seems to be the same freeze as ACPI + subfeatures. I am trying your other options right now.
Comment 39 Christian Wiegele 2007-09-07 02:02:49 UTC
- Booting with init=/bin/bash -> no change
- Booting with "nolapic" -> still freezing
- Resetting BIOS SETUP options to defaults -> same problem
- make defconfig and using that config -> still not working
Comment 40 Christian Wiegele 2007-09-07 02:31:52 UTC
Why does booting the kernel with acpi=off pci=off work? (The rest of the system does not come up, because my IDE controller is connected via PCI.) Is there any other information I can provide?
Comment 41 Len Brown 2007-09-12 18:47:51 UTC
> i had not problem with that before kernel 2.6.21. I've not got a clue on the cause of this problem. I think the best route at this point would be to get a kernel git tree and bisect to identify which change caused this to break, and then to report the issue to lkml. http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
Comment 42 Christian Wiegele 2007-09-12 23:32:26 UTC
Okay, I have never done anything like that before. You want me to:
- install and compile git-sources-2.6.20-r9
- boot it and run "git bisect good"
- install and compile git-sources-2.6.21-r1
- boot it and run "git bisect bad"
Right?
Comment 43 Christian Wiegele 2007-09-13 00:40:20 UTC
I found this howto: http://kerneltrap.org/node/11753 and I'm testing it right now.
Comment 44 Christian Wiegele 2007-09-13 02:20:34 UTC
Len, I have finished the bisection and have this information now:
5eca338fb510af78eee5372ff6a3525768ab913f is first bad commit
commit 5eca338fb510af78eee5372ff6a3525768ab913f
Author: Bjorn Helgaas <firstname.lastname@example.org>
Date: Thu Jan 18 16:44:48 2007 -0700
    ACPI: remove motherboard driver (redundant with PNP system driver)
    The PNP system board driver (drivers/pnp/system.c) contains all the same functionality, so we don't need the ACPI version.
    Previously, a motherboard device would be claimed by *both* the ACPI and PNP drivers, resulting in stuff like this in /proc/ioports:
        1200-121f : motherboard <-- from drivers/acpi/motherboard.c
        1200-121f : pnp 00:0d <-- from drivers/pnp/system.c
    Make sure to enable CONFIG_PNP (and CONFIG_PNPACPI) to include the PNP system board driver.
    Signed-off-by: Bjorn Helgaas <email@example.com>
    Signed-off-by: Len Brown <firstname.lastname@example.org>
:040000 040000 0b7b74615ce39414de0ca0f976e415f3da5bd3a4 04f83a52c6a7bfecfb8acba195efbfa923035cd0 M drivers
Comment 45 Christian Wiegele 2007-09-13 02:31:53 UTC
I did a "make menuconfig" but I am not able to change anything in "Plug and Play support"; it seems to have a lot of dependencies... Any ideas?
Comment 46 Christian Wiegele 2007-09-16 23:49:09 UTC
Can you please report it to the LKML?
Comment 47 Christian Wiegele 2007-09-19 00:36:21 UTC
Hello, is there any news?
Comment 48 Daniel Drake 2007-09-19 03:37:18 UTC
for my reference, original bug report: https://bugs.gentoo.org/show_bug.cgi?id=190989
Comment 49 Daniel Drake 2007-09-19 04:01:54 UTC
Created attachment 12870 [details] reverted patch rediffed against 2.6.23 Christian, would you mind just reconfirming the bisection result? Please apply this patch to a recent 2.6.23-rc release (I've already inverted and rediffed it for you, it should apply cleanly) and confirm that it makes this boot regression go away? Thanks!
Comment 50 Christian Wiegele 2007-09-19 04:34:39 UTC
Daniel, thank you for the patch. It applied cleanly, but the system is still freezing. :-(
Comment 51 Christian Wiegele 2007-09-19 04:38:51 UTC
Should I do the bisection again, to see if it gives the same result?
Comment 52 Christian Wiegele 2007-09-19 05:45:11 UTC
Okay, I am doing a git bisect again right now to test whether it gives the same result. This is how I am doing it:
- git bisect start
- make clean
- make defconfig
- make
- cp arch/i386/boot/bzImage /boot/mykernel
- reboot the system and see if it works
- if it does, I say git bisect good; if not, I boot my working kernel and say git bisect bad.
Then I repeat:
- make clean
- make defconfig
- make
- cp arch/i386/boot/bzImage /boot/mykernel
I keep doing this until git bisect says there is nothing more to test and gives me the change. Is that right?
Comment 53 Christian Wiegele 2007-09-19 05:46:48 UTC
After git bisect start I said:
git bisect good v2.6.20-rc7
git bisect bad v2.6.21-rc1
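For a linear history, what `git bisect` does between the good v2.6.20-rc7 and bad v2.6.21-rc1 endpoints is a binary search: each build-and-reboot cycle described in comment #52 roughly halves the range still to be tested. Below is a minimal C sketch under that simplification (real kernel history contains merges, which git bisect handles itself); the `hangs` array and `first_bad_commit` are hypothetical stand-ins for the manual test.

```c
#include <stddef.h>

/* Binary search for the first bad commit in a linear history.
 * hangs[i] stands in for "build commit i, reboot, and see whether it
 * freezes" (nonzero = bad). Preconditions: n >= 2, commit 0 is good,
 * commit n - 1 is bad, and every commit after the first bad one is bad.
 * *tests_run counts the build-and-boot cycles that were needed. */
static size_t first_bad_commit(const int *hangs, size_t n, size_t *tests_run)
{
    size_t good = 0;      /* invariant: commit `good` is known good */
    size_t bad = n - 1;   /* invariant: commit `bad` is known bad   */

    *tests_run = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        ++*tests_run;
        if (hangs[mid])
            bad = mid;    /* first bad commit is at or before mid */
        else
            good = mid;   /* first bad commit is after mid */
    }
    return bad;
}
```

With 8 commits the first bad one is found after 3 tests, and in general after about log2 of the number of commits. The search also shows why repeating a bisection is a reasonable check: a single mis-marked good/bad answer sends it into the wrong half.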
Comment 54 Christian Wiegele 2007-09-19 07:08:49 UTC
I have finished the git bisect again. It gives the same result as before:
5eca338fb510af78eee5372ff6a3525768ab913f is first bad commit
commit 5eca338fb510af78eee5372ff6a3525768ab913f
Author: Bjorn Helgaas <email@example.com>
Date: Thu Jan 18 16:44:48 2007 -0700
    ACPI: remove motherboard driver (redundant with PNP system driver)
Comment 55 Daniel Drake 2007-09-19 08:49:16 UTC
During bisection did you find both good and bad kernels?
Comment 56 Christian Wiegele 2007-09-19 10:28:03 UTC
Yes, I did. The last kernel was bad, I think; the one before was the last working one...
Comment 57 Christian Wiegele 2007-09-19 10:29:58 UTC
I have a note of which were bad and which were good... I think it's saved in a file in the git directory, too...
Comment 58 Christian Wiegele 2007-09-25 04:21:46 UTC
Comment 59 Christian Wiegele 2007-10-01 03:05:40 UTC
Hello, what will happen now? Is someone going to take a look at this problem? The change that causes this is the removal of the motherboard driver. If no one is going to help me, I will have to buy a new notebook...
Comment 60 Christian Wiegele 2007-10-01 04:01:38 UTC
I found this in the changelog:
commit 243b66e76ab722cdec1921d7f80c0cb808131c37
Author: Len Brown <firstname.lastname@example.org>
Date: Thu Feb 15 22:34:36 2007 -0500
    ACPI: always enable CONFIG_PNPACPI on CONFIG_ACPI kernels
    We removed the ACPI motherboard driver which handled the ACPI=y, PNP=n case, so now we need to enforce that PNP & PNPACPI are always enabled for ACPI kernels. Most major distros ship this way this already.
    Cc: Bjorn Helgaas <email@example.com>
    Signed-off-by: Len Brown <firstname.lastname@example.org>
I think that is the problem. Since the ACPI motherboard driver was removed, ACPI uses the PNP driver. So Daniel's patch won't work, because ACPI will still use the PNP driver when the patch is applied. I need to remove the ACPI<->PNP dependency. How can I do that?
Comment 61 Christian Wiegele 2007-10-02 02:29:41 UTC
I have some more news. When I have a working kernel with ACPI + PNP support and I disable "motherboard.o" in drivers/acpi/Makefile, the system freezes. When I then disable PNP support, it works again. In a working kernel, motherboard.o is loaded first and allocates something which PNP cannot allocate when it is loaded later. In kernels >2.6.20, motherboard.o is removed, so PNP allocates something first, which causes the freeze. I'm not able to disable PNP in a kernel >2.6.20, because it's a dependency of ACPI. I think the reason Daniel's patch does not work is that after patching a >2.6.20 kernel, PNP will be loaded before motherboard.o. Daniel, can you make a patch which forces motherboard.o to be loaded before PNP? Or a patch with which I will be able to disable PNP but enable ACPI?
Comment 62 Christian Wiegele 2007-10-02 02:52:48 UTC
I disabled everything in drivers/pnp/Makefile and compiled a kernel, but it is still freezing. Is it not a PNP problem, then?
Comment 63 Christian Wiegele 2007-10-02 04:57:59 UTC
Daniel, I tested your patch with the latest 2.6.22-r8 Gentoo kernel. It's working :-) Why is the patch not working with the latest sources from kernel.org? I tested your patch with vanilla-sources-2.6.22-r7 and it's working, too. I'm going to test your patch again with the latest vanilla-sources. Maybe I did something wrong on my first try...
Comment 64 Christian Wiegele 2007-10-02 05:11:58 UTC
Okay, it looks like we have two causes of the freeze here. Your patch is not working with 2.6.23-r8, but works well with 2.6.22-r8. So I'm doing a new bisection to see where the problem between .22-r8 and .23-r8 is...
Comment 65 Christian Wiegele 2007-10-04 04:03:18 UTC
I am getting an error while compiling the latest bisection step:
drivers/acpi/scan.c: In function 'acpi_bus_match':
drivers/acpi/scan.c:222: error: implicit declaration of function 'acpi_match_ids'
make: *** [drivers/acpi/scan.o] Error 1
make: *** [drivers/acpi] Error 2
make: *** [drivers] Error 2
Comment 66 Shaohua 2007-11-13 18:33:52 UTC
Can you attach the output of /proc/ioports and /proc/iomem with a working kernel? The ACPI motherboard.c does a similar thing to pnp/system.c, so I suppose it should work, but anyway, let's check.
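Comparing the /proc/ioports listings from a working and a failing kernel is easier once the entries are parsed. Below is a minimal C sketch, assuming the kernel's usual layout of two spaces of indentation per nesting level followed by `start-end : name`; `struct ioport_entry` and `parse_ioports_line` are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* One entry from /proc/ioports (hypothetical helper type). */
struct ioport_entry {
    unsigned long start, end;   /* inclusive port range */
    int depth;                  /* nesting level in the resource tree */
    char name[64];
};

/* Parse a line such as "  1200-121f : pnp 00:0d".
 * Returns 1 on success, 0 if the line does not have that shape. */
static int parse_ioports_line(const char *line, struct ioport_entry *e)
{
    size_t indent = strspn(line, " ");
    const char *p = line + indent;
    int consumed = 0;

    /* %n is only stored if " : " matched after the range. */
    if (sscanf(p, "%lx-%lx : %n", &e->start, &e->end, &consumed) != 2
        || consumed == 0)
        return 0;

    /* The owner name may contain spaces, so copy the rest of the line. */
    const char *name = p + consumed;
    size_t len = strcspn(name, "\n");
    if (len >= sizeof e->name)
        len = sizeof e->name - 1;
    memcpy(e->name, name, len);
    e->name[len] = '\0';
    e->depth = (int)(indent / 2);
    return 1;
}
```

Parsing both files and printing the entries present in only one of them (for example, ranges owned by `motherboard` from drivers/acpi/motherboard.c) makes the differences in reserved regions explicit.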
Comment 67 Shaohua 2007-11-13 19:26:16 UTC
PCI: Bus 35, cardbus bridge: 0000:22:09.0
  IO window: 00001000-000010ff
  IO window: 00001400-000014ff
  PREFETCH window: 80000000-83ffffff
  MEM window: 88000000-8bffffff
The resource doesn't look correct to me: 0x1000 - 0x107f is for LPC. Can you add a '#define DEBUG' at the beginning of arch/x86/pci/i386.c in the failed kernel and try to capture the boot log (for example via a serial console)? I'd like to check the boot log.
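Shaohua's objection amounts to an interval-overlap test: the CardBus bridge's first I/O window, 0x1000-0x10ff, contains the whole 0x1000-0x107f block he attributes to LPC, while the second window, 0x1400-0x14ff, does not touch it. Below is a minimal C sketch of that check; `struct io_range`, `ranges_overlap` and `overlap_len` are illustrative names, not kernel APIs.

```c
#include <stdbool.h>

/* An inclusive I/O-port range as printed in the boot log. */
struct io_range {
    unsigned long start;
    unsigned long end;
};

/* Two inclusive ranges overlap if each starts no later than the other ends. */
static bool ranges_overlap(struct io_range a, struct io_range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* Number of ports the two ranges share, 0 if they are disjoint. */
static unsigned long overlap_len(struct io_range a, struct io_range b)
{
    if (!ranges_overlap(a, b))
        return 0;
    unsigned long lo = a.start > b.start ? a.start : b.start;
    unsigned long hi = a.end < b.end ? a.end : b.end;
    return hi - lo + 1;
}
```

For the first window against the LPC block the shared region is 0x80 ports, i.e. the entire LPC range.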
Comment 68 Christian Wiegele 2007-11-14 00:16:13 UTC
Hi, thanks for your help after all this time. I'm sorry to tell you that I have bought a new notebook, because I was not able to work with this one because of that bug. Maybe we can close this one?
Comment 69 Fu Michael 2007-11-14 00:33:16 UTC
Do you mind telling us the model name of the laptop you dumped?
Comment 70 Christian Wiegele 2007-11-14 00:41:18 UTC
It was a Samsung P55. I have a Dell Latitude D830 now. I think the P55 is much better; it's one of the few that come with a 15-inch and not a 15.4-inch screen. The problem is I need the notebook for work and can't spend that much time getting it working...
Comment 71 Shaohua 2007-11-15 18:04:41 UTC
I already got the root cause, and will let Yakui provide a fix. IIRC, this is an urgent bug, and will break a lot of systems.
Comment 73 ykzhao 2007-11-18 18:34:55 UTC
Created attachment 13608 [details] PCI patch: add quirk function for some chipsets
Comment 74 ykzhao 2007-11-18 18:36:09 UTC
Do you mind testing the above two patches on your P55 system? Thanks.
Comment 75 Fu Michael 2007-11-28 23:49:40 UTC
Patch queued. Marking as fixed.
Comment 76 Len Brown 2008-01-10 20:41:56 UTC
Patch in comment #72 shipped in 2.6.24-rc4 as a7839e960675b549f06209d18283d5cee2ce9261 (PNP: increase the maximum number of resources). Patch in comment #73 may need an update.
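The title of the shipped fix, "PNP: increase the maximum number of resources", suggests a mechanism. If PNP kept each device's I/O ranges in a fixed-size table, a system-board device that reports more ranges than fit would have the excess never reserved. That would leave PCI free to place a window, such as the CardBus 0x1000-0x10ff one from comment #67, over them. This reading is an inference from the patch title and comment #67, not a description of the patch itself. Below is a minimal C sketch of the effect; `MAX_PORTS`, `struct pnp_dev_sketch`, `parse_ports` and `covers` are hypothetical, and the capacity of 8 is illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device capacity; the value is illustrative only. */
#define MAX_PORTS 8

struct port_range {
    unsigned long start, end;   /* inclusive */
};

/* Simplified stand-in for a PNP device's resource table. */
struct pnp_dev_sketch {
    struct port_range port[MAX_PORTS];
    size_t nports;    /* ranges actually stored */
    size_t dropped;   /* ranges reported by firmware but not stored */
};

/* Copy firmware-reported ranges into the fixed table. In this sketch,
 * anything beyond MAX_PORTS is counted and discarded. */
static void parse_ports(struct pnp_dev_sketch *dev,
                        const struct port_range *fw, size_t n)
{
    dev->nports = 0;
    dev->dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (dev->nports < MAX_PORTS)
            dev->port[dev->nports++] = fw[i];
        else
            dev->dropped++;
    }
}

/* True if a stored range covers the port, i.e. it could be reserved. */
static bool covers(const struct pnp_dev_sketch *dev, unsigned long port)
{
    for (size_t i = 0; i < dev->nports; i++)
        if (port >= dev->port[i].start && port <= dev->port[i].end)
            return true;
    return false;
}
```

With ten reported ranges starting at 0x1000, 0x1100, ..., 0x1900 and room for eight, the last two (0x1800-0x187f and 0x1900-0x197f) are never stored, so nothing in this model would reserve them.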