Distribution: openSuSE, Ubuntu, Fedora Core (maybe others)
Hardware Environment: HP Compaq nx7400
Software Environment: X.org 7.1.99 - KDE 3.5.5

Problem Description: This laptop, like many others from HP, has this problem: when the computer is booted into a Linux environment and then shut down or rebooted, the system falls into a "bad state" which causes a boot procedure that is 10 to 20 seconds longer and a locked battery status (it doesn't update). The "bad state" is persistent! This had already been reported in bug #6455 but it's not solved yet...

Steps to reproduce: Turn on and then shut down or reboot.

Workarounds:
1) When the laptop is off, detach the AC adapter and battery for a while (some models require at least 2 minutes); the next boot will be normal.
2) If PS/2 mouse support is compiled as a module (psmouse), remove that module before shutdown/reboot.

More info at: http://emisca.altervista.org/nx7400/ (it's not made by me but I have the same computer).
I tried compiling kernel on my own with Gentoo 2006.1 (latest stable kernel is 2.6.18) and still no luck. I'm going to try other kernels (even with some patchset) with Gentoo to make some tests.
Please try the latest mm version - 2.6.20-rc1-mm1 is available now.
Created attachment 9865 [details] My kernel configuration Here's the configuration which I used to test the new kernel.
Created attachment 9866 [details] My kernel log Here is the kernel log.
I tried the new kernel as you suggested but still no luck! After first shutdown/reboot the system falls again in "bad state".
Please attach acpidump, full dmesg of "bad state" and the content of the battery files (/proc/acpi/battery/...) in "bad state". Also, please describe the "bad state" on your laptop more carefully. What does "battery status locked" mean in terms of the battery file fields?
Please attach dmesg for both cases - "bad state" and normal.
please try patch from #5534.
I've already read that bug report but it doesn't seem to apply to me because my laptop's thermal zones and fans always work as they should even if the system is in "bad state". I'm going to send the information Vladimir requested as soon as possible (I must recompile the kernel with psmouse as a module to send it).
Created attachment 9878 [details] Good state dmesg This is the dmesg when the system starts in good state.
Created attachment 9879 [details] Bad state dmesg This is the dmesg when the system starts in bad state.
Created attachment 9880 [details] Good state acpidump This is the result of acpidump when the system is in good state.
Created attachment 9881 [details] Bad state acpidump This is the result of acpidump when the system is in bad state.
The symptoms of "bad state":
- Boot slowdown (the BIOS takes up to 20 seconds to complete POST);
- Kernel slowdown during boot (you can compare the dmesg outputs I attached);
- Battery and AC adapter status (read from /proc/acpi/battery) are not updated after boot.

The biggest problem is that when the system falls into "bad state" the only way to recover is to unplug the cord and disconnect the battery for a few seconds.
> What is the "battery status locked" in terms of battery files fields?

The fields don't change at all. The kernel reads the status of the battery and of the AC adapter during boot and then never again. If I unplug the cord while the system is in bad state, the battery files in /proc/acpi/battery aren't updated and the status remains: AC Adapter: on - Battery status: fully charged (or charging). If I run the system on battery and then plug in the cord, the status remains: AC Adapter: off - Battery status: discharging. The percentage of charge doesn't change. If I reboot the system, the kernel updates the status during boot and then doesn't update it again.
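A quick way to confirm this symptom from a shell is to read the same status file twice and compare the two readings. The sketch below is only an illustration: the helper name status_is_frozen is hypothetical, and the battery directory name under /proc/acpi/battery differs between machines, so the path in the usage note is an assumption.

```shell
#!/bin/sh
# Read FILE, wait SECONDS, read it again.
# Returns 0 if both readings are identical (status looks frozen),
# 1 if the content changed, 2 if the file could not be read.
# Usage: status_is_frozen FILE SECONDS
status_is_frozen() {
    file=$1
    interval=$2
    before=$(cat "$file") || return 2
    sleep "$interval"
    after=$(cat "$file") || return 2
    [ "$before" = "$after" ]
}
```

For example, after unplugging the AC adapter one could run `status_is_frozen /proc/acpi/battery/BAT0/state 60 && echo "status did not change"` (BAT0 is a placeholder name). An unchanged reading over a short interval is only a hint, so it is worth combining with plugging or unplugging the cord between the two readings.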
Alexey Starikovskiy wrote:
Lebedev, Vladimir P wrote:
> Alexey, ...
> Please look in "My kernel log" attachment: there are ERROR messages
> here:
> Is it criminal?
>
> Dec 18 18:52:10 coreblack [ 35.963000] ACPI Exception (exoparg2-0442):
> AE_AML_BUFFER_LIMIT, Index (000000100) is beyond end of object
> [20060707]
> Dec 18 18:52:10 coreblack [ 35.964000] ACPI Error (psparse-0537):
> Method parse/execution failed [\_SB_.C002.C341.C0F3._SDD] (Node
> df790540), AE_AML_BUFFER_LIMIT

From: Fiodor Suietov <fiodor.f.suietov@intel.com>
Subject: libata: wrong sizeof for BUFFER

I have reproduced the AE_AML_BUFFER_LIMIT exception basing on the SSDT ASL code and libata ata_acpi_push_id() code. There is the oversight in ata_acpi_push_id() causing the exception. The following update fixes it:

Signed-off-by: Fiodor Suietov <fiodor.f.suietov@intel.com>
---

--- linux-2.6.20-rc1-mm1/drivers/ata/libata-acpi.c.orig 2006-12-19 11:51:19.809222900 +0300
+++ linux-2.6.20-rc1-mm1/drivers/ata/libata-acpi.c 2006-12-19 17:36:05.128443900 +0300
@@ -672,7 +672,7 @@ int ata_acpi_push_id(struct ata_port *ap
 	input.count = 1;
 	input.pointer = in_params;
 	in_params[0].type = ACPI_TYPE_BUFFER;
-	in_params[0].buffer.length = sizeof(atadev->id[0] * ATA_ID_WORDS);
+	in_params[0].buffer.length = sizeof(atadev->id[0]) * ATA_ID_WORDS;
 	in_params[0].buffer.pointer = (u8 *)atadev->id;
 
 	/* Output buffer: _SDD has no output */
Created attachment 9885 [details] created a dedicated workqueue for notify() execution Please try this patch and patch from comment #16, post dmesg and system state. In any case these patches should be useful.
Sorry, it's the first time I help debugging the kernel... How can I apply those patches? I mean: what command-line should I use?
Created attachment 9889 [details] D:\bug_7689\libata-acpi.patch
> Sorry, it's the first time I help debugging the kernel... How can I apply
> those patches? I mean: what command-line should I use?

For example:
1) copy the patches from comment #17 and #19 to your hard disk; for example, the pathnames are file_1 and file_2
2) change the current directory to linux-2.6.20-rc1-mm1/
3) cat file_1 | patch -p1    (should succeed)
   cat file_2 | patch -p1    (should succeed)
4) build the kernel, etc.
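The steps above can also be wrapped in a small helper that checks each patch with a dry run before touching the tree. This is a sketch under assumptions: the function name apply_patches is made up for illustration, and it relies on GNU patch (for the -d and --dry-run options).

```shell
#!/bin/sh
# Apply each patch to a source tree, checking with a dry run first.
# Usage: apply_patches SRC_DIR PATCH_FILE...
apply_patches() {
    src_dir=$1
    shift
    for p in "$@"; do
        # --dry-run reports problems without modifying any file
        if ! patch -d "$src_dir" -p1 --dry-run < "$p" > /dev/null; then
            echo "patch does not apply cleanly: $p" >&2
            return 1
        fi
        # The dry run succeeded, so apply the patch for real
        patch -d "$src_dir" -p1 < "$p" || return 1
    done
}
```

With the names used above, the call would be `apply_patches linux-2.6.20-rc1-mm1 file_1 file_2`. Stopping at the first patch that fails the dry run keeps the tree from ending up half patched.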
Created attachment 9896 [details] Good state dmesg Dmesg when system is in good state.
Created attachment 9897 [details] Bad state dmesg Dmesg when system is in bad state.
Well, it seems that there are no more "error" messages in dmesg but the problem of the "bad state" still remains. I also have the acpidump of the good state and the bad state, if you want to have a look at them, but there are no differences between them with those patches applied.
> Workarounds: > ... > 2) If PS/2 mouse support is compiled as a module (psmouse), remove that > module before shutdown/reboot. What will be the system state if psmouse is module but is not removed? What will be the system state if psmouse is not part of kernel at all? I am interested in cases - shutdown/reboot and 'suspend to disk'. Thanks.
> What will be the system state if psmouse is module but is not removed?

As if it's built into the kernel. Same thing...

> What will be the system state if psmouse is not part of kernel at all?

If you mean not loading psmouse at startup, the system will not fall into bad state if I reboot/shutdown the system. But that way the touchpad won't work, obviously.

> I am interested in cases - shutdown/reboot and 'suspend to disk'.

I've never tried to "suspend" my laptop since I've never felt the need; I'm going to try this one too. If I don't remove that module before either rebooting or shutting down the system.

> Thanks.

Actually, I should thank you for your patience... ;)
I've done some tests:
- If I don't compile PS/2 mouse support at all, the kernel works (even old versions; now I'm using the gentoo-kernel 2.6.18 as my stable kernel);
- If I compile PS/2 mouse support into the kernel, the kernel breaks as soon as I reboot or shut down my laptop;
- If I compile PS/2 mouse support as a module there are three sub-cases:
  * If I load it and unload it before shutdown/reboot, everything works
  * If I load it but don't unload it before shutdown/reboot, the system falls into bad state
  * If I don't load it at all, everything works even after reboot/shutdown, except the touchpad.

I suppose it's a problem with the touchpad: it seems like the BIOS requires something to be done before shutdown/reboot, which is done by unloading the psmouse module. Maybe the BIOS only needs the kernel to free the resources used by the touchpad module before shutdown/reboot. Suspend is still to be tested, but I've read there are some problems there too, so I would like to focus on this problem for now.
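Until a kernel fix exists, the "unload before shutdown/reboot" case can be scripted. The following is a minimal sketch, not a tested fix: the function names are hypothetical, it only applies when psmouse is built as a module, and where to hook it into the shutdown sequence depends on the distribution.

```shell
#!/bin/sh
# Return success if NAME is listed in a /proc/modules-style file.
# Usage: module_loaded NAME [MODULES_FILE]
module_loaded() {
    name=$1
    modules_file=${2:-/proc/modules}
    # The first field of each line is the module name
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Unload psmouse if it is loaded; intended to be called as root
# from a shutdown/reboot hook (hook location is distribution-specific).
unload_psmouse() {
    if module_loaded psmouse; then
        rmmod psmouse
    fi
}
```

Calling `unload_psmouse` late in the shutdown scripts mirrors the manual workaround; the touchpad stops working from that point on, which should not matter during shutdown.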
> If I compile PS/2 mouse support in kernel, the kernel breaks as soon as I reboot or shutdown my laptop;

Sorry, I meant to say that the system falls into bad state, not that the kernel breaks...
> I suppose it's a problem with the touchpad: it seems like the BIOS
> require ...

I guess that this BIOS requires something to be done at the beginning of any type of boot (boot, reboot, resume). In any case we should find the cause of the problem - hw/sw conflicts, etc. Also,

> Workarounds:
> 1) When the laptop is off, detach AC adapter and battery for a while (some
> models require at least 2 minutes), the next boot will be normal;

Is it true that this happens with your laptop too?

> * If I don't load it at all everything works even after reboot/shutdown,
> except touchpad.

So, the problem is absent if we do NOT load psmouse in all cases - built-in, module, removed from .config - isn't it?
> I guess that this BIOS requires to do something at the beginning
> of any type of boot (boot, reboot, resume).

I don't think so. If you read the symptoms you can see that bad state doesn't affect only the operating system but the BIOS itself. The POST lasts 10 to 20 seconds longer than normal.

> In any case we should find the cause of problem - hw/sw conflicts, etc....

I guess it is a bug in the BIOS and the psmouse module seems to activate/deactivate it.

>> 1) When the laptop is off, detach AC adapter and battery for a while (some
>> models require at least 2 minutes), the next boot will be normal;
> Is it true that it is happening with your laptop too?

Yes, of course. It's the only way to get the laptop back to normal state.

> So, the problem is absent if we do NOT load psmouse in all the case -
> built-in, module, removed from .config, Isn't it?

Yes, it is.
> I guess is a bug in the BIOS and the psmouse module seems to activate/deactivate it.

Yes, but I guess that the Windows providers know how to do it. Do you have any experience of dealing with our problem on Windows on your computer? So, in any case, we need some time for investigation.
> Do you have any experience of dealing with our problem on
> Windows on your computer?

There is no problem with Windows. Even with no drivers it works and the laptop doesn't fall into bad state. I installed Windows XP Home Edition SP2 from an original CD I own, using the product key bundled with the laptop. There were no drivers (I downloaded them from the HP website) and rebooting/shutting down the laptop doesn't cause the bad state. Is there no way of debugging the kernel?
Thanks for the information; we need some time for the investigation.
Update: I tried Damn Small Linux which ships with kernel 2.4: NO BAD STATE! The touchpad works and, after a reboot or a shutdown, the system keeps working! This should prove that this problem is related to kernel 2.6.
Surely, we work on it now.
Created attachment 10054 [details] Unregister serio drivers on shutdown This one should fix it?
Please try 'irqpoll' boot flag.
No response from bug submitter, please reopen if problem persists.
Trenn did the magic! The kernel available here (ftp://ftp.suse.com/pub/people/trenn/hp_fixes_final/) doesn't put my laptop in "bad state" and after reboot/shutdown the system fully works! Thanks dude! PS: Sorry for not responding all this time but I had some troubles with my internet connection.
Let's bring this together here...: Dmitry (not sure whether he maintains the stuff, but he submitted a lot of patches to the serio subsystem in the recent past and he probably should ack/push the patch in the end) asked me to try these two patches. As I am very busy trying to fix some more things on these HP beasts for our SLE10 kernel, I'd be really happy if Alessandro could help out a bit with testing... The patches apply fine (with a small offset in one of them) to a recent 2.6.20 kernel.
Created attachment 10368 [details] The first one I got from dmitry (psmouse-fiddle-with-reset)...
Created attachment 10369 [details] ... and the second one (serio-cleanup-to-bus)
Note that you have to boot into the patched kernel twice, or suspend it twice. This is because the breakage happens on shutdown when the psmouse/serio subsystem is not cleaned up, and it survives the reboot. (I expect the Embedded Controller, which is also accessing serio/i8042 very late on shutdown, gets confused and is not reset or correctly initialised after reboot, only when power and battery are unplugged for some time.) The bad state gets fixed by booting into a working kernel (still in bad state) and then shutting down or rebooting.
I can confirm the "bad-state" problem in the HP Compaq nx7300 too. I'll try kernel 2.6.20 with two patches from Thomas Renninger.
Ok Thomas... I will help as much as I can. What version of the kernel should I use to apply those patches? 2.6.20?
I think you got it.

Fedora Core 6, updated to Feb 9, 2007. Vanilla kernel 2.6.20 with Thomas Renninger's patches from comment #40 and #41 compiled and installed:
1- Boot in good state with original kernel
2- Reboot -> going into bad state with original kernel
3- Reboot with 2.6.20 patched kernel -> still in bad state
4- Reboot with 2.6.20 patched kernel -> going into good state: battery reporting ok, cpufreq working and going to full speed
5- Reboot with original Fedora kernel -> still in bad state
6- Reboot -> going into bad state

Let me know if I can do more specific testing. Thank you!
OPS! Errata I mean: 5-reboot with original Fedora kernel->still in *good* state Sorry for typo.
Same here. It works!
Hm, any of you guys using suspend-to-ram? Would be nice to test the patches with suspend to ram as well...
In openSuSE 10.2 (updated) and with the custom kernel made by Thomas, suspend to RAM only works if I add the "-f" flag to S2RAM_OPTS in /etc/pm/config. When I turn on my laptop again everything works except the keyboard! I haven't tested with kernel 2.6.20 yet.
Alessandro, is there a kernel version that has working s2ram on your box?
No luck... keyboard doesn't work on resume either with kernel 2.6.8 or kernel 2.6.20... any suggestion?
I'm going to install Gentoo again and I'm gonna try other kernel versions. Thank you for solving the "bad state" trouble but suspend to RAM is important too. If we manage to solve this problem, this box (and also others from HP) will be fully linux compatible! Should I close this bug and open a new one?
Maybe it's worth splitting up patches into the shutdown cleanup, which should be rather unrisky and it shouldn't be a problem pushing this into stable kernels (back to, don't know, 2.6.16.X). And the suspend/resume problem which could be added to 2.6.21-rcX as soon as it works half way stable (but still might break other mice/machines on suspend/ resume?).
I added the patches to the latest SUSE CVS kernel head. This should make things easier to test. You should see a kernel popping up with Alexey's "execute notify handlers in own thread" (to not get confused by another problem) here: ftp.suse.com/pub/projects/kernel/kotd/i386/HEAD/kernel-default.i586.rpm Do a rpm -qp --changelog |less to check whether the patches are included. Hmm, this works for i386, but something seems to be broken with x86_64 and possibly other branches/archs. I'll try to get this fixed. It may take some hours or a day until submitted patches arrive compiled and packaged. Hmm, pre-testing by someone who compiles the stuff on his own should still be faster... However, I can add things as soon as it looks stable, for broader and easier testing.
Created attachment 10392 [details] Properly reset psmouse at suspend Guys, I would appreciate it if you tested the updated versions of the patches. The first one (psmouse-fiddle-with-reset.patch) is the one that really fixes the problem; the second one is a generic improvement of the i8042 suspend process and should speed it up a bit, as we don't try to reset the mouse several times during suspend/resume. If these patches work for you I will commit them to my tree and, once they survive a -mm release, push them to Linus. Thanks! P.S. Let's move the s2ram discussion to bug 7977.
Created attachment 10393 [details] Let serio bus handle suspending/resuming of i8042 ports
Should I apply all four patches or only the last two?
Just the last two. I could not mark the first 2 obsolete (i guess because i wasn't the one who put them in).
The last two patches applied to vanilla kernel 2.6.20 make my laptop fall into bad state again. I'm trying again with the previous two patches.
I confirm that the first two patches work and the last two don't. What did you modify, Dmitry?
I added disabling of pass-through port at shutdown (which should be a noop) and removed enabling of the mouse after resetting it. I guess that your laptops really like mouse to be fully enabled before rebooting. Could you please try applying patches 1 and 4 (patch id #10368 and patch id 10393, IOW old psmouse patch and the new serio bus patch) to make sure that my theory is correct?
Yep, you're right! Vanilla kernel 2.6.20 patched with the first psmouse patch and the last serio patch still works and doesn't leave my laptop in bad state. Now I'm going to try suspending.
I tried to fix the broken fan issue on nx6325. After suspend, fans that should be on are in the off state, and I added a little line to override the fan and switch it on even if the thermal module thinks it's off. This did not work and I was quite confused why... I tried patches 1+4, rebooted twice and tested suspend twice. The fan state now got corrected as expected after the next thermal polling. So I expect that this model (every HP has something else that gets fixed with mouse removal/cleanup) shows strange fan behaviour. I can also confirm that 1+4 works with suspend (the first time the machine froze, but I had huge acpi debug output switched on...) and, as said, it even seems to fix things up. Dmitry, would you mind adding me to Signed-off, CC me or send me the mainline commit no., so that I can track this?
Created attachment 10446 [details] Properly reset psmouse at suspend Hm, it turns out my patches completely broke suspend-to-ram, which is not good, so here are the updated versions. I have them committed to my tree and I believe the commit ids stay the same when Linus pulls from other trees, so here they are: a1cec06177386ecc320af643de11cfa77e8945bd 82dd9eff4bf3b17f5f511ae931a1f350c36ca9eb
Created attachment 10447 [details] Let serio bus handle suspending/resuming of i8042 ports
Hm, as far as I know the above patches fix the issue with shutdown and suspend to disk and they are in mainline, so I am closing this.
In mainline? From what version?
2.6.21-rc2 should have it.
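For anyone who wants to check this against their own clone of the kernel tree, `git describe --contains` names a commit relative to a tag that contains it. The helper below is just a sketch; the function name tag_containing and the repository path in the usage note are assumptions.

```shell
#!/bin/sh
# Print a tag-relative name for COMMIT in the git repository at REPO.
# Fails if no tag in the repository contains the commit.
# Usage: tag_containing REPO COMMIT
tag_containing() {
    repo=$1
    commit=$2
    git -C "$repo" describe --contains "$commit"
}
```

For example, `tag_containing ~/src/linux a1cec06177386ecc320af643de11cfa77e8945bd` (with ~/src/linux standing in for wherever the tree is cloned) prints a name that begins with a release tag containing that commit.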