Bug 4251 - system reset when battery is read and i8xx_tco driver loaded
Summary: system reset when battery is read and i8xx_tco driver loaded
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other (show other bugs)
Hardware: i386 Linux
: P2 normal
Assignee: Shaohua
URL:
Keywords:
: 2658 (view as bug list)
Depends on:
Blocks:
 
Reported: 2005-02-25 09:01 UTC by Thomas Renninger
Modified: 2005-08-22 17:16 UTC (History)
4 users (show)

See Also:
Kernel Version: 2.6.11-rc2-bk4-9
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
lspci of nx5000 HP crystal (2.04 KB, text/plain)
2005-03-04 09:02 UTC, Thomas Renninger
Details
dmesg - i8xx_tco loaded manually at the end (20.09 KB, text/plain)
2005-03-05 08:56 UTC, Thomas Renninger
Details
i8xx_tco.c (15.66 KB, text/plain)
2005-03-13 11:00 UTC, Wim Van Sebroeck
Details
new driver code to test - contains NO_REBOOT logic in start and stop (14.76 KB, text/plain)
2005-04-06 13:02 UTC, Wim Van Sebroeck
Details
Diff that solves rebooting when reading battery on HP nx5000 based on previously attached driver (2.67 KB, patch)
2005-08-01 06:56 UTC, Thomas Renninger
Details | Diff
updated patch, one change is already mainline (pci_match_id(..)) (2.72 KB, patch)
2005-08-01 08:35 UTC, Thomas Renninger
Details | Diff

Description Thomas Renninger 2005-02-25 09:01:51 UTC
This is a HP nx5000 with two battery bays (one in use). 
 
Loading ACPI modules does not harm. 
Trying to read out battery state results in a hard reset of the CPU (power 
off, reboot). 
 
DSDT can be found: http://stio1.fh-wuerzburg.de/student/i31990/dsdt.dsl 
 
Full acpi debug output shortly before hard reset (echo 0x007FFFFF 
>/proc/acpi/debug_level; cat /proc/acpi/battery/C13B/state) can be found here: 
http://stio1.fh-wuerzburg.de/student/i31990/acpi_hard_reset_full_debug
Comment 1 Thomas Renninger 2005-02-25 09:05:22 UTC
I forgot to mention that I commented out the acpi_hw_register_write() function 
when logging. I thought the reset might come from there, but it seems that not 
I just replaced it with and it made no difference: 
acpi_status 
acpi_hw_register_write ( 
	u8                              use_lock, 
	u32                             register_id, 
	u32                             value) 
{ 
    printk (KERN_ERR "hw_register_write: register_id: %d, value: %d",  
		       register_id, value); 
    return (AE_OK); 
} 
 
P.S.: Forget the probably wrong register access, I have no clue what is going 
on here... 
Comment 2 Shaohua 2005-02-27 18:31:55 UTC
The battery state seems must read from EC's address. Is this reboot occurs in 
all kernels or just latest kernel? How does if you comment the 
ec.c::acpi_ec_space_handler?
Comment 3 Thomas Renninger 2005-02-27 18:47:24 UTC
Only latest kernels, about two weeks or early, sorry for not mentioning, I also
wanted to provide the same output for a working kernel, but had no time till now.

I'll give it a try tommorrow morning, cheers.
Comment 4 Thomas Renninger 2005-02-28 06:01:51 UTC
Hmmm, seems as if this is older, at least it's more than two weeks. A collegue 
told me kernel 2.6.11-rc3-bk4 would work, but it does not. 
 
Removing the ec.c::acpi_ec_space_handler function helps, but it's not a 
wonder: the battery is shown as not present and reading battery/C13B/state 
which resulted in hard reset after about 4 seconds is now showing present=no. 
 
Something else: 
the values of battery/.../state seem to make sense and are printed out to 
screen before reset. 
It's not a broken hardware issue, there is another HP nx5000 crystal with 
exactly the same problem. 
 
I will go back some kernels and hopefully find the evil commit. 
Maybe someone else has an idea which patch kills the machine? 
Comment 5 Thomas Renninger 2005-02-28 06:28:29 UTC
It came in between 2.6.11-rc2-bk4 and 2.6.11-rc2-bk9 
I will search and try to identify the offending patch. 
Comment 6 Thomas Renninger 2005-02-28 07:17:26 UTC
Got it.  
ChangeSet@1.1984, 2005-01-28 22:38:30+01:00, wim@iguana.be  
   [WATCHDOG] i8xx_tco.c-ICH4/6/7-patch  
  
   Added support for the ICH4-M, ICH6, ICH6R, ICH6-M, ICH6W and ICH6RW  
   chipsets. Also added support for the "undocumented" ICH7.  
  
http://linux-acpi.bkbits.net:8080/to-linus/gnupatch%4041fab0d6zncfE3ino1wBcmETEcX-tg  
  
rmmodding the i8xx_tco module before reading out battery state works.  
I have a lot ICH4-X stuff output on lspci (USB and sound).  
I wonder why I need a watchdog on a laptop at all...  
Cannot add author of the patch to CC, will point him to this one, privately. 
Could the watchdog support for the chipsets just be reverted for 2.6.11-final, 
a hard CPU reset on machines (maybe there will appear more) isn't funny. 
Comment 7 Shaohua 2005-02-28 17:12:37 UTC
Great, from the log you attached, the the ACPI BIOS will read/write 0x1061 - 
0x1069. According to ICH spec and the base address of PMBase, the region 
actually is TCO region. I got it yestday, but I don't know what is TCO for. 
Maybe i8xx-tco.c is for TCO. Maybe ACPI and the TCO driver have conflict.
Comment 8 Thomas Renninger 2005-02-28 17:26:01 UTC
what can we do now?
Is it already too late for 2.6.11-final?
I wonder if this also happens with other ICH machines.
Shouldn't this patch be reverted and resubmitted for 2.6.12-rcX?
Comment 9 Shaohua 2005-02-28 17:30:30 UTC
Not every ACPI BIOS touch TCO, I think. But anyway, please report it to the 
author of the patch first.
Comment 10 Thomas Renninger 2005-02-28 17:37:42 UTC
As mentioned I pointed him to this bug with a short description.
If he only replies to me I'll forward.
Comment 11 Thomas Renninger 2005-03-01 03:33:29 UTC
Just got another report: HP Pavilion ZT3110 
If Intel does not want to have break a lot of machines with the 2.6.11 kernel 
with their ICH chipsets, something must happen soon. 
Comment 12 Shaohua 2005-03-01 17:22:02 UTC
Apparently it's the io access conflict. There is no way to prevent ACPI BIOS 
from accessing the TCO region. The only approach is to disable the watchdog 
driver. Some options are:
1.if ACPI enabled, disable TCO driver
2.Blacklist the HP machines for TCO driver
3.ACPI reserve the TCO region, so the TCO driver init will fail.
Option 3 actually is option 1, since nearly all BIOS will list the TCO region 
in ACPI region.
Comment 13 Len Brown 2005-03-03 21:33:03 UTC
Re: comment #6
Good hunting.
So if you have CONFIG_I8XX_TCO=n, then you get no watchdog module and this 
problems is gone, yes?

When you have this module loaded and /dev/watchdog exists, do you have any 
program opening and accessing /dev/watchdog?
Comment 14 Len Brown 2005-03-03 21:47:03 UTC
please attach the output from "lspci; lspci -n"
Comment 15 Thomas Renninger 2005-03-04 08:59:40 UTC
comment #13:
I unloaded the module, cat ...battery/.../state survives.
I load the module, cat ...battery/.../state -> CPU reset.

lscpi -n:
0000:00:00.0 Class 0600: 8086:3580 (rev 02)
0000:00:00.1 Class 0880: 8086:3584 (rev 02)
0000:00:00.3 Class 0880: 8086:3585 (rev 02)
0000:00:02.0 Class 0300: 8086:3582 (rev 02)
0000:00:02.1 Class 0380: 8086:3582 (rev 02)
0000:00:1d.0 Class 0c03: 8086:24c2 (rev 01)
0000:00:1d.1 Class 0c03: 8086:24c4 (rev 01)
0000:00:1d.2 Class 0c03: 8086:24c7 (rev 01)
0000:00:1d.7 Class 0c03: 8086:24cd (rev 01)
0000:00:1e.0 Class 0604: 8086:2448 (rev 81)
0000:00:1f.0 Class 0601: 8086:24cc (rev 01)
0000:00:1f.1 Class 0101: 8086:24ca (rev 01)
0000:00:1f.5 Class 0401: 8086:24c5 (rev 01)
0000:00:1f.6 Class 0703: 8086:24c6 (rev 01)
0000:01:04.0 Class 0280: 8086:1043 (rev 04)
0000:01:06.0 Class 0607: 104c:ac8e
0000:01:06.1 Class 0607: 104c:ac8e
0000:01:06.3 Class 0180: 104c:ac8f
0000:01:0d.0 Class 0c00: 104c:8023
0000:01:0e.0 Class 0200: 14e4:4401 (rev 01)

lspci comes attached.
Comment 16 Thomas Renninger 2005-03-04 09:02:19 UTC
Created attachment 4664 [details]
lspci of nx5000 HP crystal
Comment 17 Wim Van Sebroeck 2005-03-05 07:17:40 UTC
can you also provide a dmesg?
Comment 18 Thomas Renninger 2005-03-05 08:56:51 UTC
Created attachment 4671 [details]
dmesg - i8xx_tco loaded manually at the end
Comment 19 Thomas Renninger 2005-03-09 04:58:39 UTC
Ok, we normally do not load this i8x_tco module or anything located 
in /lib/modules/.../drivers/char/watchdog. 
In fact I run into this because of hotplug/coldplug scripts bug. 
And maybe nobody will ever want to run such a watchdog on a laptop? 
 
This maybe is not a blocker..., ask me if you need more info... 
Comment 20 Wim Van Sebroeck 2005-03-10 10:51:31 UTC
I'll sent you a debug version of the driver this weekend. Then we can see wether
it has to do with the device driver itself or that something else is going wrong.
Comment 21 Thomas Renninger 2005-03-11 02:34:04 UTC
Wim: I have no idea what this watchdog is for (ok it should reset the CPU if
something goes terribly wrong...).
Could you please comment in a few words what the behaviour of the watchdog
should be and what a tco register is for?
Thanks.
Comment 22 Wim Van Sebroeck 2005-03-13 10:54:49 UTC
Thomas, a watchdog device is a device that watches your hardware. It will reboot
the hardware if your system goes faulty. It normally works as follows: you load
the driver (and initialize the watchdog device) and when you then open
/dev/watchdog your watchdog device starts. You then need to keep it alive (with
the watchdog-daemon for instance) by writing to /dev/watchdog. if you close
/dev/watchdog again then the watchdog stops.
So a watchdog is a device that guards/watches your hardware and that reboots
your system if your system fails to trigger the watchdog. (It's mainly to avoid
system hangs).

the TCO in the intel 82801* IO-Controller HUBs (ICH0 till ICH7 at this moment)
stands for Total Cost of Ownership. These are actually system management
registers in the LPC bridge of the I/O-Controller HUB. I suggest to look at
http://www.intel.com/design/chipsets/applnots/292273.htm and download the PDF
file to read more about it. On http://developer.intel.com you can also do more
searches for 82801 or ICH or ...
Comment 23 Wim Van Sebroeck 2005-03-13 11:00:25 UTC
Created attachment 4713 [details]
i8xx_tco.c

i8xx_tco.c debug version. could you test this and sent me the dmesg afterwards?
Comment 24 Shaohua 2005-03-13 23:08:50 UTC
I took a close look at this issue. When read battery status, ACPI BIOS will 
enable TCO watchtimer and set TCO_TMR's initial value 4, since NO_Reboot bit 
is set (no reboot) by default, this impact nothing.
When the watchdog driver is enabled, the driver will enable NO_Reboot (enable 
reboot), but ACPI BIOS doesn't keep the timer alive, so the system reboot.
A solution would be the driver enable NO_Reboot bit only when /dev/watchdog is 
opened.
Comment 25 Thomas Renninger 2005-03-14 03:41:46 UTC
Wim: is #24 enough or should I still try your debug version?
Shall I wait for another test version including suggestions of David?

BTW: It's not my machine, I could run some tests, but it could take some days
until I can provide results.
Comment 26 Wim Van Sebroeck 2005-03-14 11:15:11 UTC
No, you don't need to test extra. I'll make sure that the i8xx_tco driver get's
changed for the NO_REBOOT bit.
I wonder however why the ACPI code writes the TCO_TMR bit. I know that the TCO
area is ACPI-base + 60h but still...
Comment 27 Shaohua 2005-03-14 16:53:58 UTC
>I wonder however why the ACPI code writes the TCO_TMR bit.
It depends on the BIOS, ACPI itself doesn't write the bit. The BIOS lets ACPI 
write it. Below code is from the BIOS:
           Store (0x01, C081)
           Store (0x01, C080)
           Store (0x01, C07F)
           Store (0x04, C07E)
           Store (0x00, C07D)
           Store (0x00, C081)
C07D is 0x1060, C07E is 0x1061, C07F is 0x1064:3, C080 is 1066:1, C081 is 
0x1069:11. TCO base is 0x1060. Every time reading the battery, the BIOS lets 
OS execute the code. I guess only HP knows why the code should be executed. 
Comment 28 Thomas Renninger 2005-03-15 01:41:03 UTC
Tell me if I should run a test if necessary.
Comment 29 Shaohua 2005-03-21 04:27:36 UTC
Wim, is the solution in comment 24 ok for you? If yes, could you please help 
fix it? I'm not familar with the watchdog driver. Thanks!
Comment 30 Wim Van Sebroeck 2005-03-24 00:22:58 UTC
Yes comment 24 is clear. But I didn't have time yet to have a look at it and I'm
leaving for holidays tonight. I'll have a look at it when I return on 5 April.
Comment 31 Wim Van Sebroeck 2005-04-06 12:12:27 UTC
The translation of the code towards the TCO timer:

> Below code is from the BIOS: (TCO base is 0x1060.)
>            Store (0x01, C081) -> C081 is 0x1069:11
TCO Timer Halt -> write 1 = halt the timer
>            Store (0x01, C080) -> C080 is 0x1066:1
SECOND_TO_STS -> write 1 = clear second timeout status
>            Store (0x01, C07F) -> C07F is 0x1064:3
TIMEOUT -> clear "TCO timer reached 0" state
>            Store (0x04, C07E) -> C07E is 0x1061
TCO Timer Initial Value = 2.4 sec
>            Store (0x00, C07D) -> C07D is 0x1060
TCO timer reload
>            Store (0x00, C081) -> C081 is 0x1069:11
TCO Timer Halt -> write 0 = enable the timer

So the bios stops the TCO timer, programs it to 2.4 seconds and then enables the
timer. I can only presume that they do this to have an "interrupt" 2.4 seconds
later.

I will try to make the driver more secure but the bottom line will always be
that if the watchdog-driver is running and in the mean time the bios executes
this code, then your system will reboot...
Comment 32 Wim Van Sebroeck 2005-04-06 13:02:34 UTC
Created attachment 4870 [details]
new driver code to test - contains NO_REBOOT logic in start and stop

Can this new code be tested? I tested it on my hardware and it works as a
watchdog driver, but I can't test wether or not your reboot problem get's
fixed.
Comment 33 Thomas Renninger 2005-04-07 01:48:01 UTC
give me some time, I hope I can access/use the machine again.
Comment 34 Shaohua 2005-06-14 23:57:25 UTC
Wim, I tried your patch in a HP nx5000 system. It really works! Please push it 
to upstream. Thanks!
Comment 35 Shaohua 2005-07-05 18:10:44 UTC
*** Bug 2658 has been marked as a duplicate of this bug. ***
Comment 36 Len Brown 2005-07-27 18:52:43 UTC
while this is an interaction with the ACPI sub-system,
it appears that to provoke this issue the old watchdog
driver needs to be loaded, and the issue is fixed by
the new watchdog driver.  So I'm moving this bug out
of the ACPI category and into the driver category.

The author of the driver should send the fix upstream.
Comment 37 Thomas Renninger 2005-08-01 03:47:40 UTC
I now got another machine that hits this bug.
The machine gets rebooted when battery state is read out.

I also can confirm that the new driver *does fix the problem*.
However, it seems as if it is still not included in mainstream kernel?

-> Will reopen bug to get attention.
Comment 38 Thomas Renninger 2005-08-01 06:56:15 UTC
Created attachment 5449 [details]
Diff that solves rebooting when reading battery on HP nx5000 based on previously attached driver

I extracted the parts from the "new driver" that fix the reboot and created a
diff.
Tell me if it does not patch (I used a current SUSE kernel, there should be no
SUSE specific modifications in this part of the kernel ...)
Comment 39 Thomas Renninger 2005-08-01 08:35:43 UTC
Created attachment 5452 [details]
updated patch, one change is already mainline (pci_match_id(..))
Comment 40 Wim Van Sebroeck 2005-08-05 05:51:34 UTC
Getting it into the manline and the -mm kernel is planned for the comming 
weeks. Sorry for the delay.
Comment 41 Adrian Bunk 2005-08-22 17:16:21 UTC
The patch from this bug is now in Linus' tree and will therefore be in 2.6.13-rc7.

Note You need to log in before you can comment on or make changes to this bug.