This is a HP nx5000 with two battery bays (one in use). Loading ACPI modules does not harm. Trying to read out battery state results in a hard reset of the CPU (power off, reboot). DSDT can be found: http://stio1.fh-wuerzburg.de/student/i31990/dsdt.dsl Full acpi debug output shortly before hard reset (echo 0x007FFFFF >/proc/acpi/debug_level; cat /proc/acpi/battery/C13B/state) can be found here: http://stio1.fh-wuerzburg.de/student/i31990/acpi_hard_reset_full_debug
I forgot to mention that I commented out the acpi_hw_register_write() function when logging. I thought the reset might come from there, but it seems that not I just replaced it with and it made no difference: acpi_status acpi_hw_register_write ( u8 use_lock, u32 register_id, u32 value) { printk (KERN_ERR "hw_register_write: register_id: %d, value: %d", register_id, value); return (AE_OK); } P.S.: Forget the probably wrong register access, I have no clue what is going on here...
The battery state seems must read from EC's address. Is this reboot occurs in all kernels or just latest kernel? How does if you comment the ec.c::acpi_ec_space_handler?
Only latest kernels, about two weeks or early, sorry for not mentioning, I also wanted to provide the same output for a working kernel, but had no time till now. I'll give it a try tommorrow morning, cheers.
Hmmm, seems as if this is older, at least it's more than two weeks. A collegue told me kernel 2.6.11-rc3-bk4 would work, but it does not. Removing the ec.c::acpi_ec_space_handler function helps, but it's not a wonder: the battery is shown as not present and reading battery/C13B/state which resulted in hard reset after about 4 seconds is now showing present=no. Something else: the values of battery/.../state seem to make sense and are printed out to screen before reset. It's not a broken hardware issue, there is another HP nx5000 crystal with exactly the same problem. I will go back some kernels and hopefully find the evil commit. Maybe someone else has an idea which patch kills the machine?
It came in between 2.6.11-rc2-bk4 and 2.6.11-rc2-bk9 I will search and try to identify the offending patch.
Got it. ChangeSet@1.1984, 2005-01-28 22:38:30+01:00, wim@iguana.be [WATCHDOG] i8xx_tco.c-ICH4/6/7-patch Added support for the ICH4-M, ICH6, ICH6R, ICH6-M, ICH6W and ICH6RW chipsets. Also added support for the "undocumented" ICH7. http://linux-acpi.bkbits.net:8080/to-linus/gnupatch%4041fab0d6zncfE3ino1wBcmETEcX-tg rmmodding the i8xx_tco module before reading out battery state works. I have a lot ICH4-X stuff output on lspci (USB and sound). I wonder why I need a watchdog on a laptop at all... Cannot add author of the patch to CC, will point him to this one, privately. Could the watchdog support for the chipsets just be reverted for 2.6.11-final, a hard CPU reset on machines (maybe there will appear more) isn't funny.
Great, from the log you attached, the the ACPI BIOS will read/write 0x1061 - 0x1069. According to ICH spec and the base address of PMBase, the region actually is TCO region. I got it yestday, but I don't know what is TCO for. Maybe i8xx-tco.c is for TCO. Maybe ACPI and the TCO driver have conflict.
what can we do now? Is it already too late for 2.6.11-final? I wonder if this also happens with other ICH machines. Shouldn't this patch be reverted and resubmitted for 2.6.12-rcX?
Not every ACPI BIOS touch TCO, I think. But anyway, please report it to the author of the patch first.
As mentioned I pointed him to this bug with a short description. If he only replies to me I'll forward.
Just got another report: HP Pavilion ZT3110 If Intel does not want to have break a lot of machines with the 2.6.11 kernel with their ICH chipsets, something must happen soon.
Apparently it's the io access conflict. There is no way to prevent ACPI BIOS from accessing the TCO region. The only approach is to disable the watchdog driver. Some options are: 1.if ACPI enabled, disable TCO driver 2.Blacklist the HP machines for TCO driver 3.ACPI reserve the TCO region, so the TCO driver init will fail. Option 3 actually is option 1, since nearly all BIOS will list the TCO region in ACPI region.
Re: comment #6 Good hunting. So if you have CONFIG_I8XX_TCO=n, then you get no watchdog module and this problems is gone, yes? When you have this module loaded and /dev/watchdog exists, do you have any program opening and accessing /dev/watchdog?
please attach the output from "lspci; lspci -n"
comment #13: I unloaded the module, cat ...battery/.../state survives. I load the module, cat ...battery/.../state -> CPU reset. lscpi -n: 0000:00:00.0 Class 0600: 8086:3580 (rev 02) 0000:00:00.1 Class 0880: 8086:3584 (rev 02) 0000:00:00.3 Class 0880: 8086:3585 (rev 02) 0000:00:02.0 Class 0300: 8086:3582 (rev 02) 0000:00:02.1 Class 0380: 8086:3582 (rev 02) 0000:00:1d.0 Class 0c03: 8086:24c2 (rev 01) 0000:00:1d.1 Class 0c03: 8086:24c4 (rev 01) 0000:00:1d.2 Class 0c03: 8086:24c7 (rev 01) 0000:00:1d.7 Class 0c03: 8086:24cd (rev 01) 0000:00:1e.0 Class 0604: 8086:2448 (rev 81) 0000:00:1f.0 Class 0601: 8086:24cc (rev 01) 0000:00:1f.1 Class 0101: 8086:24ca (rev 01) 0000:00:1f.5 Class 0401: 8086:24c5 (rev 01) 0000:00:1f.6 Class 0703: 8086:24c6 (rev 01) 0000:01:04.0 Class 0280: 8086:1043 (rev 04) 0000:01:06.0 Class 0607: 104c:ac8e 0000:01:06.1 Class 0607: 104c:ac8e 0000:01:06.3 Class 0180: 104c:ac8f 0000:01:0d.0 Class 0c00: 104c:8023 0000:01:0e.0 Class 0200: 14e4:4401 (rev 01) lspci comes attached.
Created attachment 4664 [details] lspci of nx5000 HP crystal
can you also provide a dmesg?
Created attachment 4671 [details] dmesg - i8xx_tco loaded manually at the end
Ok, we normally do not load this i8x_tco module or anything located in /lib/modules/.../drivers/char/watchdog. In fact I run into this because of hotplug/coldplug scripts bug. And maybe nobody will ever want to run such a watchdog on a laptop? This maybe is not a blocker..., ask me if you need more info...
I'll sent you a debug version of the driver this weekend. Then we can see wether it has to do with the device driver itself or that something else is going wrong.
Wim: I have no idea what this watchdog is for (ok it should reset the CPU if something goes terribly wrong...). Could you please comment in a few words what the behaviour of the watchdog should be and what a tco register is for? Thanks.
Thomas, a watchdog device is a device that watches your hardware. It will reboot the hardware if your system goes faulty. It normally works as follows: you load the driver (and initialize the watchdog device) and when you then open /dev/watchdog your watchdog device starts. You then need to keep it alive (with the watchdog-daemon for instance) by writing to /dev/watchdog. if you close /dev/watchdog again then the watchdog stops. So a watchdog is a device that guards/watches your hardware and that reboots your system if your system fails to trigger the watchdog. (It's mainly to avoid system hangs). the TCO in the intel 82801* IO-Controller HUBs (ICH0 till ICH7 at this moment) stands for Total Cost of Ownership. These are actually system management registers in the LPC bridge of the I/O-Controller HUB. I suggest to look at http://www.intel.com/design/chipsets/applnots/292273.htm and download the PDF file to read more about it. On http://developer.intel.com you can also do more searches for 82801 or ICH or ...
Created attachment 4713 [details] i8xx_tco.c i8xx_tco.c debug version. could you test this and sent me the dmesg afterwards?
I took a close look at this issue. When read battery status, ACPI BIOS will enable TCO watchtimer and set TCO_TMR's initial value 4, since NO_Reboot bit is set (no reboot) by default, this impact nothing. When the watchdog driver is enabled, the driver will enable NO_Reboot (enable reboot), but ACPI BIOS doesn't keep the timer alive, so the system reboot. A solution would be the driver enable NO_Reboot bit only when /dev/watchdog is opened.
Wim: is #24 enough or should I still try your debug version? Shall I wait for another test version including suggestions of David? BTW: It's not my machine, I could run some tests, but it could take some days until I can provide results.
No, you don't need to test extra. I'll make sure that the i8xx_tco driver get's changed for the NO_REBOOT bit. I wonder however why the ACPI code writes the TCO_TMR bit. I know that the TCO area is ACPI-base + 60h but still...
>I wonder however why the ACPI code writes the TCO_TMR bit. It depends on the BIOS, ACPI itself doesn't write the bit. The BIOS lets ACPI write it. Below code is from the BIOS: Store (0x01, C081) Store (0x01, C080) Store (0x01, C07F) Store (0x04, C07E) Store (0x00, C07D) Store (0x00, C081) C07D is 0x1060, C07E is 0x1061, C07F is 0x1064:3, C080 is 1066:1, C081 is 0x1069:11. TCO base is 0x1060. Every time reading the battery, the BIOS lets OS execute the code. I guess only HP knows why the code should be executed.
Tell me if I should run a test if necessary.
Wim, is the solution in comment 24 ok for you? If yes, could you please help fix it? I'm not familar with the watchdog driver. Thanks!
Yes comment 24 is clear. But I didn't have time yet to have a look at it and I'm leaving for holidays tonight. I'll have a look at it when I return on 5 April.
The translation of the code towards the TCO timer: > Below code is from the BIOS: (TCO base is 0x1060.) > Store (0x01, C081) -> C081 is 0x1069:11 TCO Timer Halt -> write 1 = halt the timer > Store (0x01, C080) -> C080 is 0x1066:1 SECOND_TO_STS -> write 1 = clear second timeout status > Store (0x01, C07F) -> C07F is 0x1064:3 TIMEOUT -> clear "TCO timer reached 0" state > Store (0x04, C07E) -> C07E is 0x1061 TCO Timer Initial Value = 2.4 sec > Store (0x00, C07D) -> C07D is 0x1060 TCO timer reload > Store (0x00, C081) -> C081 is 0x1069:11 TCO Timer Halt -> write 0 = enable the timer So the bios stops the TCO timer, programs it to 2.4 seconds and then enables the timer. I can only presume that they do this to have an "interrupt" 2.4 seconds later. I will try to make the driver more secure but the bottom line will always be that if the watchdog-driver is running and in the mean time the bios executes this code, then your system will reboot...
Created attachment 4870 [details] new driver code to test - contains NO_REBOOT logic in start and stop Can this new code be tested? I tested it on my hardware and it works as a watchdog driver, but I can't test wether or not your reboot problem get's fixed.
give me some time, I hope I can access/use the machine again.
Wim, I tried your patch in a HP nx5000 system. It really works! Please push it to upstream. Thanks!
*** Bug 2658 has been marked as a duplicate of this bug. ***
while this is an interaction with the ACPI sub-system, it appears that to provoke this issue the old watchdog driver needs to be loaded, and the issue is fixed by the new watchdog driver. So I'm moving this bug out of the ACPI category and into the driver category. The author of the driver should send the fix upstream.
I now got another machine that hits this bug. The machine gets rebooted when battery state is read out. I also can confirm that the new driver *does fix the problem*. However, it seems as if it is still not included in mainstream kernel? -> Will reopen bug to get attention.
Created attachment 5449 [details] Diff that solves rebooting when reading battery on HP nx5000 based on previously attached driver I extracted the parts from the "new driver" that fix the reboot and created a diff. Tell me if it does not patch (I used a current SUSE kernel, there should be no SUSE specific modifications in this part of the kernel ...)
Created attachment 5452 [details] updated patch, one change is already mainline (pci_match_id(..))
Getting it into the manline and the -mm kernel is planned for the comming weeks. Sorry for the delay.
The patch from this bug is now in Linus' tree and will therefore be in 2.6.13-rc7.