Bug 8973 (ACPI_freezes_System)
Summary: | boot freeze on modprobe, 2.6.21 and later | ||
---|---|---|---|
Product: | ACPI | Reporter: | Christian Wiegele (christian.wiegele) |
Component: | Config-Other | Assignee: | ykzhao (yakui.zhao) |
Status: | CLOSED CODE_FIX | ||
Severity: | blocking | CC: | acpi-bugzilla, bunk, kernel, shaohua.li |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | >=2.6.21 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
Kernel Config
ACPIDUMP
dmesg output
ACPIDUMP 2.6.22-r1
dmesg 2.6.22-r1
Kernel Config (working)
kernel config after "make oldconfig" 2.6.20-r9 -> 2.6.23-rc5
error message
dmesg 2.6.23-rc5
reverted patch rediffed against 2.6.23
PnP patch
PCI patch: add quirk function for some chipsets |
Description
Christian Wiegele
2007-09-03 05:49:30 UTC
Sorry, it should be: I had no problem with that before kernel 2.6.21. It's still the same problem with 2.6.23_rc5.

Please attach the failing .config.
Does it still fail if the single ACPI driver is compiled =y instead of =m?
What if you boot with acpi=noirq?

Please attach the output of acpidump. Also please attach the output from dmesg -s64000 on the latest working ACPI kernel -- 2.6.20 will do if you still have it.

Created attachment 12726 [details]
Kernel Config
Kernel config from 2.6.22-r1, which fails with ACPI.
Created attachment 12727 [details]
ACPIDUMP
Acpidump from 2.6.20-r9 which works.
Created attachment 12728 [details]
dmesg output
dmesg output from 2.6.20-r9 which works.
It makes no difference whether I compile an ACPI feature as a module or built-in.
- Compiling as built-in will freeze the system while booting.
- Compiling as a module will freeze the system while loading the module.
I'm going to attach a dmesg and an acpidump from the failing kernel. For this I'm booting 2.6.22-r1 with no ACPI subfeatures/modules.

Created attachment 12729 [details]
ACPIDUMP 2.6.22-r1
ACPIDUMP 2.6.22-r1 (failing kernel)
Created attachment 12730 [details]
dmesg 2.6.22-r1
dmesg output of failing kernel (2.6.22-r1)
Thanks. Note that acpidump is a BIOS dump, so unless you upgrade the BIOS or change BIOS SETUP it will not change.

Can you provide the .config from the successful 2.6.20 kernel?

Please try booting the failing kernel with "pnpacpi=off".

Created attachment 12733 [details]
Kernel Config (working)
Kernel config from 2.6.20-r9 (working) (Gentoo version)
I really expect this may be a PNP issue, but this BIOS has a couple more "interesting" features.

Please try booting 2.6.22 with acpi_osi=!Linux (or capture the failure dmesg for 2.6.23.latest, which does this by default).

Please try booting with acpi_apic_instance=2.

pnpacpi=off does not work; the system still freezes. But I found out something interesting: the system does not freeze when loading the module "fan", so it seems to me that some ACPI modules are loadable. If you want, I can test which modules cause the system to freeze and which do not.

(In reply to comment #13)
> I really expect this may be a PNP issue,
> but this BIOS has a couple more "interesting" features
>
> Please try booting 2.6.22 with acpi_osi=!Linux
> (or capture the failure dmesg for 2.6.23.latest which does this by default)
>
> Please try booting with acpi_apic_instance=2
>

acpi_apic_instance=2 -> same problem
acpi_osi=!Linux -> I'm going to test it now

acpi_osi=!Linux -> same problem

My system is one of the new "Santa Rosa" notebooks with Centrino Core 2 Duo.

(In reply to comment #13)
> I really expect this may be a PNP issue,
> but this BIOS has a couple more "interesting" features
>
> Please try booting 2.6.22 with acpi_osi=!Linux
> (or capture the failure dmesg for 2.6.23.latest which does this by default)
>
> Please try booting with acpi_apic_instance=2
>

How can I capture the failure dmesg for 2.6.23.latest?

Is there any other information I can provide?

re: messages for 2.6.23...
Can you download the kernel from kernel.org, currently 2.6.23-rc5, build it, and capture them the same way you did for the 2.6.22 failure in comment #9?

BTW, you don't need to assign the bug to yourself to get e-mail when it changes -- the submitter always gets e-mail. I assign it to me so that if you change it, I get e-mail...

Thanks for the working 2.6.20 .config. My guess was that CONFIG_PNPACPI was the key difference, but since you still fail with "pnpacpi=off", that can't be it.
There are a number of differences. Can you start with the working 2.6.20 .config, drop it into your 2.6.22 (or 23) tree, make oldconfig, attach the config here, and test the resulting kernel? The resulting config should help narrow this down whether the resulting kernel boots or hangs.

(In reply to comment #19)
> re: messages for 2.6.23...
> Can you download the kernel from kernel.org,
> currently 2.6.23-rc5, build it,
> and capture them the same way you did
> for the 2.6.22 failure in comment #9?
>
> BTW. you don't need to assign the bug to yourself to
> get e-mail when it changes -- the submitter always gets e-mail.
> I assign it to me so that if you change it, I get e-mail...
>

Is the full 2.6.23-rc5 available for download, or only a patch to the latest? If not, I'm going to emerge the latest vanilla-sources with Gentoo. These are the original sources from kernel.org and they are easier for me to get.

Created attachment 12734 [details]
kernel config after "make oldconfig" 2.6.20-r9 -> 2.6.23-rc5
Kernel config after "make oldconfig".
I just pressed Enter for all questions. Hope that was okay?
Yes, on the 2.6.23-rc5 line, the 'B' will get you a baseline, in this case 2.6.22, and the 'V' will get you to a page with a link at the top to a patch to apply to the baseline, like so:

    $ tar xjf linux-2.6.22.tar.bz2
    $ cd linux-2.6.22/
    $ patch -Np1 < ../patch-2.6.23-rc5

Okay, thanks for your mini howto :-)

For the 2.6.23-rc5 dmesg: do you want me to build the kernel with the upgraded 2.6.23-rc5 config, or should I generate a new (default) one?

re: which config for 2.6.23
Yes, try the "make oldconfig" one first -- since in theory it is the closest thing to what you had before. If that doesn't work, try defconfig, as maybe it will point out something I've not noticed in your config that may be related to the issue.

Certainly if you can find anything besides removing all the ACPI options to make this failure change, that will be a big hint.

Thanks. I patched the kernel and it's currently compiling. I have been searching for more than 2 weeks for a way to get the kernel working without removing ACPI options :-) But I am pretty sure that it has something to do with the graphics card, because the screen goes black for a second before it freezes...

I installed 2.6.23-r5 and I'm not able to boot the kernel without freezing. I disabled all ACPI subfeatures and it still does not boot. Using acpi=off still freezes the system... I will now try to disable other options...

Created attachment 12735 [details]
error message
I got this message one time, but I'm not able to reproduce it... It happened 2 minutes ago...

I need to disable all ACPI subfeatures + "Suspend to RAM and standby" to get 2.6.23-rc5 booting. "Suspend to RAM and standby" seems to be a new feature in 2.6.23-rc5...

Created attachment 12737 [details]
dmesg 2.6.23-rc5
Here is the dmesg output with ACPI subfeatures and suspend switched off.
When I press the Fn key + the brightness up/down key, the system freezes too...

Hmmm, this is a photo of a hang? What do the other hangs look like -- do they get further or stop earlier?

> ACPI Error (evevent-0305): No installed handler for fixed event [0, 2, 3]...

Fixed event 0 is the power button and 2 is the pmtimer. Apparently we received these events before handlers were registered...

> ACPI Error (evxfevnt-0383): Could not disable RealTimeClock events

Apparently this was from rtc_handler() trying to disable events; no idea why it failed...

> ACPI Error (evgpe-0705): No handler or method for GPE[0,7,8,A...] disabling
> event...

Hmm, again, we seem to be getting GPEs firing before we've installed the handlers...

A couple of things to simplify, maybe one will have an effect -- fishing at this point...

CONFIG_OPROFILE=n
CONFIG_RTC=n
CONFIG_HPET_EMULATE_RTC=n
CONFIG_HPET=n

Yes/no, this is a photo of a hang, but normally it hangs without any error messages... Maybe I pressed a button...? I am trying your config now...

Your config didn't make any change. I tested pressing the Fn key to adjust the brightness while the kernel is booting: it freezes immediately, even in a kernel that would otherwise boot because all ACPI subfeatures are disabled...

Isn't it possible to trace a module loading, to see what's really going on?

I disabled nearly all device drivers and kernel features. The system is still freezing.

I found out that booting with acpi=off causes a system freeze too. Is this not an ACPI issue?

> acpi=off is causing a system-freeze too
If it is the same freeze, then yes, this means the cause isn't ACPI.
Can you tell if the hang is in the same place with and without ACPI?
try booting with init=/bin/bash -- does it ever get that far?
If no, then interactive module loading will not help.
I'm afraid we haven't learned much about the failure in ACPI mode,
other than we seem to be receiving interrupts earlier than expected
try make defconfig
and using that config -- if it works, that is a clue.
try also to simplify by booting with "nolapic"
If you changed any BIOS SETUP options, try resetting to defaults.
I have removed PCI support from the kernel. Now I am able to boot the kernel and use the Fn keys to control brightness. I think that ACPI is automatically disabled when PCI is disabled...

(In reply to comment #36)
> > acpi=off is causing a system-freeze too
>
> If it is the same freeze, then yes, this means the cause isn't ACPI.
> Can you tell if the hang is in the same place with and without ACPI?
>
> try booting with init=/bin/bash -- does it ever get that far?
> If no, then interactive module loading will not help.
>
> I'm afraid we haven't learned much about the failure in ACPI mode,
> other than we seem to be receiving interrupts earlier than expected
>
> try make defconfig
> and using that config -- if it works, that is a clue.
>
> try also to simplify by booting with "nolapic"
>
> If you changed any BIOS SETUP options, try resetting to defaults.
>

Yes, acpi=off seems to be the same freeze as with ACPI + subfeatures. I am trying your other options right now.

try booting with init=/bin/bash -> no change
booting with "nolapic" -> still freezing
BIOS SETUP options, try resetting to defaults -> same problem
try make defconfig and using that config -> still not working

Why does loading the kernel with acpi=off pci=off work? Loading the system does not work because my IDE controller is connected via PCI...

Is there any other information I can provide?

> i had not problem with that before kernel 2.6.21.

I've not got a clue on the cause of this problem. I think the best route at this point would be to get a kernel git tree and bisect to identify which change caused this to break, and then to report the issue to lkml.

http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html

Okay, I have never done something like that before. You want me to:
- install and compile git-sources-2.6.20-r9
- boot it and run "git bisect good"
- install and compile git-sources-2.6.21-r1
- boot it and run "git bisect bad"
Right?
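
Note on the bisection suggested above: git bisect is essentially a binary search over the commits between a known-good and a known-bad kernel, so only the two endpoints need to be marked up front and git then picks each intermediate commit to build and boot. The following is a minimal Python sketch of that search, not git's actual implementation. The commit names `c0`..`c9`, the function names, and the "c6 introduced the freeze" rule are all hypothetical, and it assumes a linear history in which every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence

def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):   # the tester would say "git bisect bad"
            bad = mid
        else:                      # the tester would say "git bisect good"
            good = mid
    return commits[bad]

# Hypothetical history: c0 is the known-good end, c9 the known-bad end.
history = [f"c{i}" for i in range(10)]
tested = []

def kernel_freezes(commit: str) -> bool:
    """Stand-in for 'build, boot, and see whether it hangs'."""
    tested.append(commit)
    return int(commit[1:]) >= 6    # pretend c6 introduced the freeze

culprit = first_bad(history, kernel_freezes)
print(culprit, tested)
```

Each call to `kernel_freezes` stands for one build-and-reboot cycle, which is why the number of kernels to test grows only with the logarithm of the number of commits in the range.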
I found this howto: http://kerneltrap.org/node/11753
I'm testing it right now.

Len, I have finished the git bisect and have this information now:

5eca338fb510af78eee5372ff6a3525768ab913f is first bad commit
commit 5eca338fb510af78eee5372ff6a3525768ab913f
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date:   Thu Jan 18 16:44:48 2007 -0700

    ACPI: remove motherboard driver (redundant with PNP system driver)

    The PNP system board driver (drivers/pnp/system.c) contains all the
    same functionality, so we don't need the ACPI version.

    Previously, a motherboard device would be claimed by *both* the ACPI
    and PNP drivers, resulting in stuff like this in /proc/ioports:

        1200-121f : motherboard    <-- from drivers/acpi/motherboard.c
        1200-121f : pnp 00:0d      <-- from drivers/pnp/system.c

    Make sure to enable CONFIG_PNP (and CONFIG_PNPACPI) to include the
    PNP system board driver.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

:040000 040000 0b7b74615ce39414de0ca0f976e415f3da5bd3a4 04f83a52c6a7bfecfb8acba195efbfa923035cd0 M drivers

I did a "make menuconfig" but I am not able to change anything in "Plug and Play support". It seems to have a lot of deps... Any ideas?

Can you please report it to the lkml?

Hello, is there any news?

For my reference, original bug report: https://bugs.gentoo.org/show_bug.cgi?id=190989

Created attachment 12870 [details]
reverted patch rediffed against 2.6.23
Christian, would you mind just reconfirming the bisection result? Please apply this patch to a recent 2.6.23-rc release (I've already inverted and rediffed it for you, it should apply cleanly) and confirm that it makes this boot regression go away? Thanks!
Daniel, thank you for the patch. It did apply cleanly, but the system is still freezing. :-( Should I do a bisection again, to see if it is the same result?

Okay, I am doing a git bisect again right now to test if it is the same result. The way I am doing it is like this:
- git bisect start
- make clean
- make defconfig
- make
- cp arch/i386/boot/bzImage /boot/mykernel
- reboot the system and see if it works
- if it does, I say git bisect good; if not, I boot my working kernel and say git bisect bad.
Then I say:
- make clean
- make defconfig
- make
- cp arch/i386/boot/bzImage /boot/mykernel
I am doing this until git bisect says there is no more to test and it gives me the changes. Is that right?

After git bisect start I said:
git bisect good v2.6.20-rc7
git bisect bad v2.6.21-rc1

I have finished git bisect again. It's the same message as before:

5eca338fb510af78eee5372ff6a3525768ab913f is first bad commit
commit 5eca338fb510af78eee5372ff6a3525768ab913f
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date:   Thu Jan 18 16:44:48 2007 -0700

    ACPI: remove motherboard driver (redundant with PNP system driver)

    The PNP system board driver (drivers/pnp/system.c) contains all the
    same functionality, so we don't need the ACPI version.

    Previously, a motherboard device would be claimed by *both* the ACPI
    and PNP drivers, resulting in stuff like this in /proc/ioports:

        1200-121f : motherboard    <-- from drivers/acpi/motherboard.c
        1200-121f : pnp 00:0d      <-- from drivers/pnp/system.c

    Make sure to enable CONFIG_PNP (and CONFIG_PNPACPI) to include the
    PNP system board driver.

    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

:040000 040000 0b7b74615ce39414de0ca0f976e415f3da5bd3a4 04f83a52c6a7bfecfb8acba195efbfa923035cd0 M drivers

During bisection did you find both good and bad kernels?

Yes, I did. The last kernel was bad, I think. The one before was the last working one... I have a note of which were bad and which were good....
I think it's saved in a file in the git directory, too... Any ideas?

Hello, what will happen now? Is someone going to take a look at this problem? The change which causes this is the removal of the motherboard driver. If no one is going to help me I will have to buy a new notebook...

I found this in the changelog:

commit 243b66e76ab722cdec1921d7f80c0cb808131c37
Author: Len Brown <len.brown@intel.com>
Date:   Thu Feb 15 22:34:36 2007 -0500

    ACPI: always enable CONFIG_PNPACPI on CONFIG_ACPI kernels

    We removed the ACPI motherboard driver which handled the ACPI=y, PNP=n
    case, so now we need to enforce that PNP & PNPACPI are always enabled
    for ACPI kernels. Most major distros ship this way this already.

    Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

I think that is the problem. Since the ACPI motherboard driver was removed, ACPI uses the PNP driver. So Daniel's patch won't work, because ACPI will still use the PNP driver when the patch is applied. I need to remove the dependency ACPI<->PNP. How can I do that?

I have some more news. When I have a working kernel with ACPI + PNP support and I disable "motherboard.o" in drivers/acpi/Makefile, the system freezes. When I disable PNP support, it works again. In a working kernel, motherboard.o is loaded first and allocates something which PNP cannot allocate when it is loaded later. In kernels >2.6.20, motherboard.o is removed, so PNP allocates something first, which causes the freeze. I'm not able to disable PNP in a kernel >2.6.20 because it's a dependency of ACPI. I think the reason Daniel's patch does not work is that after patching a >2.6.20 kernel, PNP will be loaded before motherboard.o. Daniel, can you make a patch which forces motherboard.o to be loaded before PNP? Or a patch where I will be able to disable PNP but enable ACPI?

I disabled everything in drivers/pnp/Makefile and compiled a kernel, but it is still freezing. Is it not a PNP problem?
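
Note: the bisected commit message describes the same I/O port range being claimed by both the ACPI motherboard driver and the PNP system driver. One way to look for such double claims on a working kernel is to group the lines of /proc/ioports by range. The sketch below is only an illustration: the function names are made up, it ignores the nesting that real /proc/ioports output expresses through indentation, and the sample text is just the two lines quoted in the commit message, not output from the affected machine.

```python
import re
from collections import defaultdict

# Matches lines such as "1200-121f : pnp 00:0d" (leading indentation allowed).
LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")

def parse_ioports(text):
    """Yield (start, end, owner) for each line in /proc/ioports format."""
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            yield int(m.group(1), 16), int(m.group(2), 16), m.group(3)

def shared_ranges(text):
    """Return {(start, end): [owners]} for ranges listed more than once."""
    owners = defaultdict(list)
    for start, end, owner in parse_ioports(text):
        owners[(start, end)].append(owner)
    return {rng: names for rng, names in owners.items() if len(names) > 1}

# The two lines quoted in the commit message above.
sample = """\
1200-121f : motherboard
1200-121f : pnp 00:0d
"""

dupes = shared_ranges(sample)
for (start, end), names in dupes.items():
    print(f"{start:04x}-{end:04x} claimed by {names}")
```

On a real system the input would come from reading /proc/ioports instead of the `sample` string.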
Daniel, I tested your patch with the latest 2.6.22-r8 Gentoo kernel. It's working :-) Why is the patch not working with the latest sources from kernel.org?

I tested your patch with vanilla-sources-2.6.22-r7 and it's working, too. I'm going to test your patch again with the latest vanilla-sources. Maybe I did something wrong on my first try...

Okay, it looks like we have 2 causes of the freeze here. Your patch is not working with 2.6.23-r8, but works well with 2.6.22-r8. So I'm doing a new bisection to see where the problem between .22-r8 and .23-r8 is...

I am getting an error while compiling the last bisection:

drivers/acpi/scan.c In function 'acpi_bus_match'
drivers/acpi/scan.c:222 error: implicit declaration of function 'acpi_match_ids'
make[2]: *** [drivers/acpi/scan.o] Error 1
make[1]: *** [drivers/acpi] Error 2
make: *** [drivers] Error 2

Can you attach the output of /proc/ioports and /proc/iomem with a working kernel? The ACPI motherboard.c does a similar thing to pnp/system.c, so I suppose it should work, but anyway, let's check.

PCI: Bus 35, cardbus bridge: 0000:22:09.0
  IO window: 00001000-000010ff
  IO window: 00001400-000014ff
  PREFETCH window: 80000000-83ffffff
  MEM window: 88000000-8bffffff

The resource doesn't look correct to me: 0x1000 - 0x107f is for LPC. Can you add a '#define DEBUG' at the beginning of arch/x86/pci/i386.c in the failing kernel, and try to capture the boot log (for example by serial console)? I'd like to check the boot log.

Hi, thanks for your help after all that time. I'm sorry to have to tell you that I have bought a new notebook, because I was not able to work with the old one because of that bug. Maybe we can close this one?

Do you mind telling us the model name of the laptop you dumped?

It was a Samsung P55. I have a Dell Latitude D830 now. I think the P55 is much better. It's one of the few which come with a 15 and not a 15.4 screen. The problem is I need the notebook for work and can't spend that much time to get it working...
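
Note: the objection to the cardbus bridge resources above can be checked with simple interval arithmetic. The sketch below takes the two IO windows from the quoted boot log and the 0x1000 - 0x107f range that the comment attributes to LPC, and reports which window intersects it. The variable and function names are illustrative, and the LPC range is taken from the comment as stated rather than verified against the chipset documentation.

```python
def overlaps(a, b):
    """True if the inclusive ranges a=(start, end) and b=(start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

# Range the comment above says belongs to LPC (assumption taken from the thread).
LPC_RANGE = (0x1000, 0x107F)

# IO windows of cardbus bridge 0000:22:09.0 from the quoted boot log.
cardbus_io_windows = [(0x1000, 0x10FF), (0x1400, 0x14FF)]

conflicts = [w for w in cardbus_io_windows if overlaps(w, LPC_RANGE)]
for start, end in conflicts:
    print(f"IO window {start:#06x}-{end:#06x} overlaps the LPC range")
```

Only the first window intersects the quoted range, which matches the remark that the assigned resource looks wrong.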
I already got the root cause, and will let Yakui provide the fix. IIRC, this is an urgent bug, and will break a lot of systems.

Created attachment 13607 [details]
PnP patch
Created attachment 13608 [details]
PCI patch: add quirk function for some chipsets
Do you mind testing the above two patches on your P55 system? Thanks.

Patch queued. Mark as fixed.

Patch in comment #72 shipped in 2.6.24-rc4 as a7839e960675b549f06209d18283d5cee2ce9261 (PNP: increase the maximum number of resources).
Patch in comment #73 may need an update.

Patch in comment #73 shipped in linux-2.6.24-rc7-git5.

Closed. |