Bug 1185 - Broken FAN detection prevents booting
Summary: Broken FAN detection prevents booting
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Fan
Hardware: i386 Linux
Importance: P2 blocking
Assignee: Shaohua
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-09-04 16:11 UTC by Martin Mokrejs
Modified: 2004-03-03 22:07 UTC

See Also:
Kernel Version: 2.4.21-pre7
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
dmidecode (11.01 KB, text/plain)
2003-09-04 16:13 UTC, Martin Mokrejs
acpidmp (97.42 KB, text/plain)
2003-09-04 16:14 UTC, Martin Mokrejs
dmesg (13.37 KB, text/plain)
2003-09-04 16:14 UTC, Martin Mokrejs
Dmesg with the oops (10.37 KB, text/plain)
2003-09-27 15:48 UTC, Karol Kozimor

Description Martin Mokrejs 2003-09-04 16:11:28 UTC
Distribution: Gentoo
Hardware Environment: ASUS L3800C, BIOS 1.21
Software Environment:
Problem Description:

The machine starts to boot. At the point where ACPI prints the status of the
CPU, BATTERY, FAN etc., it prints a long message on one line, but it
immediately scrolls away. If I remember correctly, the machine locks up somewhere after init.


> From: Martin Mokrejs [mailto:mmokrejs@natur.cuni.cz] 
> Sent: Thursday, August 21, 2003 12:15 PM
> To: linux-kernel@vger.kernel.org
> Subject: ACPI kernel crash with 2.4.22-pre7 on ASUS L3800C
> 
> 
> Hi,
>   I occasionally see my laptop hang on cold boot. I remember seeing
> such hangs at least since 2.4.21-pre3. Here's my latest running kernel:
> 
> # ksymoops --system-map=/boot/System.map-2.4.22-pre7 
> --vmlinux=/usr/src/linux-2.4.22-pre7/vmlinux ./cr
> ksymoops 2.4.9 on i686 2.4.22-pre7.  Options used
>      -v /usr/src/linux-2.4.22-pre7/vmlinux (specified)
>      -k /proc/ksyms (default)
>      -l /proc/modules (default)
>      -o /lib/modules/2.4.22-pre7/ (default)
>      -m /boot/System.map-2.4.22-pre7 (specified)
> 
> EFLAGS: 00010246
> eax: 00000000 ebx: 638a05f0 ecx: 00000000 edx: 00000006
> esi: 638a05f0 edi: f7ebddd0 ebp: f7ebdd78 esp: f7ebdd74
> ds: 0018 es: 0018 ss: 0018
> Process keventd: (pid 2, stackpage=f7ebd000)
> Stack: f7ebdda4 f7ebdd90 c01ede68 638a05f0 f7ebdda4 f7ebddd0 
> 638a05f0 f7ebddc0
>        c01fc0f2 638a05f0 c01fc072 f7ebddd0 00010000 c0337755 
> c033770a 00000050
>        f7ebddd4 f7ebddd4 638a05f0 f7ebddf0 c02037c2 638a05f0 
> f7ebddd0 00000000
> Call Trace:     [<c01ede68>] [<c01fc0f2>] [<c01fc072>] 
> [<c02037c2>] [<c01f8a2a>]
>   [<c0203b8d>] [<c0203ee7>] [<c01fc4c7>] [<c01f8a00>] 
> [<c0207aed>] [<c0207e5e>]
>   [<c0208b60>] [<c01dce5a>] [<c01d4f7d>] [<c011ff0a>] 
> [<c01282e5>] [<c01281b0>]
>   [<c0105000>] [<c01057ee>] [<c01281b0>]
> Code: 80 3b aa 0f 44 c3 5b 5d c3 a1 d4 55 40 c0 eb f6 55 89 e5 8b
> Using defaults from ksymoops -t elf32-i386 -a i386
> 
> 
> >>edi; f7ebddd0 <_end+37a9304c/3a4f02dc>
> >>ebp; f7ebdd78 <_end+37a92ff4/3a4f02dc>
> >>esp; f7ebdd74 <_end+37a92ff0/3a4f02dc>
> 
> Trace; c01ede68 <acpi_get_data+34/60>
> Trace; c01fc0f2 <acpi_bus_get_device+45/a9>
> Trace; c01fc072 <acpi_bus_data_handler+0/3b>
> Trace; c02037c2 <acpi_power_get_context+46/a6>
> Trace; c01f8a2a <acpi_ut_trace+29/2b>
> Trace; c0203b8d <acpi_power_off_device+46/19d>
> Trace; c0203ee7 <acpi_power_transition+111/138>
> Trace; c01fc4c7 <acpi_bus_set_power+15f/273>
> Trace; c01f8a00 <acpi_ut_debug_print_raw+29/2a>
> Trace; c0207aed <acpi_thermal_active+bf/187>
> Trace; c0207e5e <acpi_thermal_check+295/2e2>
> Trace; c0208b60 <acpi_thermal_notify+a6/105>
> Trace; c01dce5a <acpi_ev_notify_dispatch+54/7a>
> Trace; c01d4f7d <acpi_os_execute_deferred+3a/6c>
> Trace; c011ff0a <__run_task_queue+5a/70>
> Trace; c01282e5 <context_thread+135/1d0>
> Trace; c01281b0 <context_thread+0/1d0>
> Trace; c0105000 <_stext+0/0>
> Trace; c01057ee <arch_kernel_thread+2e/40>
> Trace; c01281b0 <context_thread+0/1d0>




Steps to reproduce:
Comment 1 Martin Mokrejs 2003-09-04 16:13:47 UTC
Created attachment 815
dmidecode

I have downgraded to an older BIOS, but Karol Kozimor told me that this bug
appears there too. I'll add his email message to this bug report.
Comment 2 Martin Mokrejs 2003-09-04 16:14:18 UTC
Created attachment 816
acpidmp
Comment 3 Martin Mokrejs 2003-09-04 16:14:42 UTC
Created attachment 817
dmesg
Comment 4 Martin Mokrejs 2003-09-04 16:20:00 UTC
From: Karol Kozimor <sziwan@hell.org.pl>
To: "Brown, Len" <len.brown@intel.com>
Cc: Martin Mokrejs <mmokrejs@natur.cuni.cz>, linux-kernel@vger.kernel.org,
acpi-devel@lists.sourceforge.net
Date: Thu, 4 Sep 2003 10:53:15 +0200
Subject: Re: [ACPI] RE: ACPI kernel crash with 2.4.22-pre7 on ASUS L3800C

Thus wrote Brown, Len:
> Martin,
> Does this still happen with 2.4.22?
> If yes, can I trouble you to drop the info into bugzilla so we can put
> it in the queue?

FYI, I just had it *after* boot, i.e. some 30 seconds after the swsusp 
resume (trace below), and _again_ _after_ I warm-rebooted the machine using
SysRq+B. The subsequent warm-reboot went OK.

Linux 2.4.21 + ACPI 20030619

ksymoops 2.4.9 on i686 2.4.21-xacs.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.21-xacs/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

c01d7600
Oops: 0000
8139too mii snd-intel8x0 snd-pcm snd-timer snd-ac97-codec snd-page-alloc
snd-mpu401-uart snd-rawmidi snd-seq-device snd soundcore ppp_deflate
zlib_inflate zlib_deflate ppp_async ppp_generic slhc ptserial pctel sr_mod
scsi_mod cdrom radeon agpgart asus_acpi mousedev hid input uhci usbcore ds
yenta_socket pcmcia_core
CPU:    0
EIP:    0010:[<c01d7600>]    Tainted: P Z
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00210293
eax: 00000627   ebx: 872d3184   ecx: cff0fe08   edx: 00000000
esi: 872d3184   edi: cff0fe70   ebp: c01e6bcc   esp: cff0fe10
ds: 0018   es: 0018   ss: 0018
Process keventd (pid: 2, stackpage=cff0f000)
Stack: 00000000 c01d837d 872d3184 cff0fe48 cff0fe70 872d3184 872d3184
c01e6c6f
       872d3184 c01e6bcc cff0fe70 872d3184 cff0fe74 cff0fea0 00010000
c02912cb
       c0291280 c01ee511 872d3184 cff0fe70 872d3184 cff0fea4 cff12e00
00000000
Call Trace: [<c01d837d>]  [<c01e6c6f>]  [<c01e6bcc>]  [<c01ee511>]
[<c01ee964>]  [<c01eed14>]  [<c01e70bc>]  [<c01f2c73>]  [<c01f2f10>]
[<c01bdc09>]  [<c0118d5c>]  [<c011fecd>]  [<c0105668>]
Code: 80 3b aa 75 0b 89 d8 eb 09 8d b4 26 00 00 00 00 31 c0 5b c3


>>EIP; c01d7600 <acpi_ns_map_handle_to_node+1c/30>   <=====

>>ecx; cff0fe08 <_end+fbda4b0/124de708>
>>edi; cff0fe70 <_end+fbda518/124de708>
>>ebp; c01e6bcc <acpi_bus_data_handler+0/44>
>>esp; cff0fe10 <_end+fbda4b8/124de708>

Trace; c01d837d <acpi_get_data+39/6a>
Trace; c01e6c6f <acpi_bus_get_device+5f/b4>
Trace; c01e6bcc <acpi_bus_data_handler+0/44>
Trace; c01ee511 <acpi_power_get_context+61/cc>
Trace; c01ee964 <acpi_power_off_device+4c/1e0>
Trace; c01eed14 <acpi_power_transition+100/15c>
Trace; c01e70bc <acpi_bus_set_power+1b0/29c>
Trace; c01f2c73 <acpi_thermal_active+d3/1cc>
Trace; c01f2f10 <acpi_thermal_check+18c/2ac>
Trace; c01bdc09 <acpi_os_execute_deferred+5d/7c>
Trace; c0118d5c <__run_task_queue+50/5c>
Trace; c011fecd <context_thread+121/1a0>
Trace; c0105668 <arch_kernel_thread+28/38>

Code;  c01d7600 <acpi_ns_map_handle_to_node+1c/30>
00000000 <_EIP>:
Code;  c01d7600 <acpi_ns_map_handle_to_node+1c/30>   <=====
   0:   80 3b aa                  cmpb   $0xaa,(%ebx)   <=====
Code;  c01d7603 <acpi_ns_map_handle_to_node+1f/30>
   3:   75 0b                     jne    10 <_EIP+0x10>
Code;  c01d7605 <acpi_ns_map_handle_to_node+21/30>
   5:   89 d8                     mov    %ebx,%eax
Code;  c01d7607 <acpi_ns_map_handle_to_node+23/30>
   7:   eb 09                     jmp    12 <_EIP+0x12>
Code;  c01d7609 <acpi_ns_map_handle_to_node+25/30>
   9:   8d b4 26 00 00 00 00      lea    0x0(%esi,1),%esi
Code;  c01d7610 <acpi_ns_map_handle_to_node+2c/30>
  10:   31 c0                     xor    %eax,%eax
Code;  c01d7612 <acpi_ns_map_handle_to_node+2e/30>
  12:   5b                        pop    %ebx
Code;  c01d7613 <acpi_ns_map_handle_to_node+2f/30>
  13:   c3                        ret


1 warning issued.  Results may not be reliable.

[the kernel is tainted by swsusp and pctel module, but it shouldn't really
 matter since this oops happens mainly at boot]

I'll have yet to see if it still happens with 2.4.22.
Best regards,

-- 
Karol 'sziwan' Kozimor
sziwan@hell.org.pl
Comment 5 Martin Mokrejs 2003-09-04 16:24:22 UTC
From: Karol Kozimor <sziwan@hell.org.pl>
To: Martin MOKREJ
Comment 6 Luming Yu 2003-09-16 00:17:37 UTC
Would you please turn on the ACPI debug option and give kernel 2.6.0-test5 a try?
Please attach the dmesg from boot and the dmesg from when the problem happens. Thanks a lot.
Comment 7 Karol Kozimor 2003-09-23 16:34:50 UTC
Hi,
I can't reproduce this using 2.6.0-test5-mm2, but that's not authoritative (I'm
not really using 2.6 as my main kernel). Or, more specifically, there are errors
(see below) reported, but no oops.

Under 2.4 the oops mostly happens only when booting on battery. I can't remember
having seen it when using AC (which doesn't necessarily mean it didn't occur
at all; I could have disregarded it).
Oddly enough, there are times when even on battery the oops cannot be triggered.
This may depend on the temperature of the machine: it has an active trip point
at 40 degrees Celsius, and if that is passed during boot, the oops is likely.
I'll attach the dmesg later on. The only output when the problem occurs (apart
from the oops itself) is these lines:
acpi_power-0363 [842] acpi_power_transition : Error transitioning device [CFAN]
to D3
acpi_bus-0496 [841] acpi_bus_set_power    : Error transitioning device [CFAN] to D3
acpi_thermal-0567 [840] acpi_thermal_active   : Unable to turn cooling device
[c12d24a8] 'off'
(or D0, respectively)
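
As a rough illustration of why the temperature at boot matters: each time the
reading crosses the active trip point, the thermal code asks for a CFAN power
state change, and that request is the path that oopses. A minimal sketch of
that decision, assuming a simple threshold policy (the enum and function names
are hypothetical, and this is not the kernel's thermal.c code):

```c
/* Hypothetical model of an active-cooling decision; not kernel code. */
enum fan_target {
    FAN_TARGET_D0 = 0,  /* fan on  */
    FAN_TARGET_D3 = 3   /* fan off */
};

/* Return the power state to request for the fan at the given temperature.
 * Temperatures are whole degrees Celsius to keep the sketch simple. */
static enum fan_target fan_target_for(int temp_c, int active_trip_c)
{
    return (temp_c >= active_trip_c) ? FAN_TARGET_D0 : FAN_TARGET_D3;
}
```

With the 40 C trip point, fan_target_for(34, 40) requests D3 and
fan_target_for(41, 40) requests D0, so a boot that warms past 40 C goes
through a D3 to D0 transition on CFAN.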

To spice the problem up, the above lines were a result of a resume of a
suspended kernel (swsusp), when the cooling would decrease the temperature below
40 (where the fan should turn off) -- what is odd, is that the machine was both
suspended and resumed on AC... (no oops this time)

After the oops, the machine (at least mine) is quite unstable, but usable. The
only problem is that the keyboard stops responding (sometimes SysRq works), but
the Synaptics Touchpad works fine: /proc/interrupts shows nothing abnormal.

As for the debugging options: I have debug statements compiled in by default,
but I have no clue what to set the debug_* options to. My only attempts resulted
in megabytes of logs, but that's probably not what you want, right?
Maybe, if those debug flags were somehow documented (hint hint)...
Comment 8 Shaohua 2003-09-26 22:26:04 UTC
Can you attach the full dmesg from when the oops occurs?
Can you add the code below to 'acpi_fan_add' in fan.c:
>printk("%d devices in D0,%d devices in D3\n", device->power.states[ACPI_STATE_D0].resources.count, device->power.states[ACPI_STATE_D3].resources.count);
And what happens at boot?
Comment 9 Karol Kozimor 2003-09-27 15:48:39 UTC
Created attachment 943
Dmesg with the oops

This is the dmesg with the oops at the bottom. Note that the oops is triggered
manually, by switching cooling_mode to passive and back to active.
Below is what ksymoops says (though its output is no different from the others):
Unable to handle kernel paging request at virtual address 872c31c4
c01fd497
*pde = 00000000
Oops: 0000
CPU:	0
EIP:	0010:[<c01fd497>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000	ebx: 872c31c4	ecx: cff0fdd8	edx: 00000006
esi: 872c31c4	edi: cff0fe34	ebp: c020c608	esp: cff0fde0
ds: 0018   es: 0018   ss: 0018
Process keventd (pid: 2, stackpage=cff0f000)
Stack: 00001001 c01fe17f 872c31c4 cff0fe0c cff0fe34 872c31c4 cff0fe70 c020c686
       872c31c4 c020c608 cff0fe34 00010000 c02cd715 c02cd6ca 00000050 cff0fe38
       cff0fe38 872c31c4 c0213efe 872c31c4 cff0fe34 00000000 00800000 c02ce900
Call Trace: [<c01fe17f>]  [<c020c686>]	[<c020c608>]  [<c0213efe>] 
[<c02142f5>]  [<c0214651>]  [<c020ca83>]  [<c0208e00>]	[<c0218342>] 
[<c02186bf>]  [<c01e484d>]  [<c011d02a>]  [<c01256b3>]	[<c0125580>] 
[<c0105000>]  [<c01058ee>]  [<c0125580>]
Code: 80 3b aa 0f 44 c3 5b c3 a1 d4 6b 3d c0 eb f7 8b 44 24 04 c3


>>EIP; c01fd497 <acpi_ns_map_handle_to_node+17/26>   <=====

>>ecx; cff0fdd8 <_end+fb1a000/1241e288>
>>edi; cff0fe34 <_end+fb1a05c/1241e288>
>>ebp; c020c608 <acpi_bus_data_handler+0/39>
>>esp; cff0fde0 <_end+fb1a008/1241e288>

Trace; c01fe17f <acpi_get_data+38/5d>
Trace; c020c686 <acpi_bus_get_device+45/ae>
Trace; c020c608 <acpi_bus_data_handler+0/39>
Trace; c0213efe <acpi_power_get_context+4a/ae>
Trace; c02142f5 <acpi_power_off_device+4a/1a7>
Trace; c0214651 <acpi_power_transition+113/13c>
Trace; c020ca83 <acpi_bus_set_power+170/298>
Trace; c0208e00 <acpi_ut_track_stack_ptr+1f/26>
Trace; c0218342 <acpi_thermal_active+c4/190>
Trace; c02186bf <acpi_thermal_check+29d/2ec>
Trace; c01e484d <acpi_os_execute_deferred+39/75>
Trace; c011d02a <__run_task_queue+5a/70>
Trace; c01256b3 <context_thread+133/1d0>
Trace; c0125580 <context_thread+0/1d0>
Trace; c0105000 <_stext+0/0>
Trace; c01058ee <arch_kernel_thread+2e/40>
Trace; c0125580 <context_thread+0/1d0>

Code;  c01fd497 <acpi_ns_map_handle_to_node+17/26>
00000000 <_EIP>:
Code;  c01fd497 <acpi_ns_map_handle_to_node+17/26>   <=====
   0:	80 3b aa		  cmpb	 $0xaa,(%ebx)	<=====
Code;  c01fd49a <acpi_ns_map_handle_to_node+1a/26>
   3:	0f 44 c3		  cmove  %ebx,%eax
Code;  c01fd49d <acpi_ns_map_handle_to_node+1d/26>
   6:	5b			  pop	 %ebx
Code;  c01fd49e <acpi_ns_map_handle_to_node+1e/26>
   7:	c3			  ret
Code;  c01fd49f <acpi_ns_map_handle_to_node+1f/26>
   8:	a1 d4 6b 3d c0		  mov	 0xc03d6bd4,%eax
Code;  c01fd4a4 <acpi_ns_map_handle_to_node+24/26>
   d:	eb f7			  jmp	 6 <_EIP+0x6>
Code;  c01fd4a6 <acpi_ns_convert_entry_to_handle+0/5>
   f:	8b 44 24 04		  mov	 0x4(%esp,1),%eax
Code;  c01fd4aa <acpi_ns_convert_entry_to_handle+4/5>
  13:	c3			  ret
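
Reading the disassembly above, the faulting instruction compares the byte at
the handle address with 0xAA and returns the handle only on a match, so the
oops is the load through a wild handle (ebx). A minimal sketch of that check,
under the assumption that 0xAA is a descriptor-type marker stored in the first
byte of a namespace node (the struct, macro and function names are
illustrative, and the NULL check is added here, not taken from the trace):

```c
#include <stddef.h>

/* Marker byte taken from the disassembly: cmpb $0xaa,(%ebx). */
#define DESC_TYPE_NAMED 0xAA

/* Illustrative stand-in for a namespace node; only the first byte matters. */
struct node {
    unsigned char descriptor;
};

/* Model of the handle check: return the node if the marker matches. */
static struct node *map_handle_to_node(void *handle)
{
    if (handle == NULL)
        return NULL;

    /* This load is what faults when handle is a garbage pointer. */
    if (*(unsigned char *)handle != DESC_TYPE_NAMED)
        return NULL;

    return (struct node *)handle;
}
```

The check can only reject a bad handle if the address is readable. A value
such as 872c31c4 is not mapped, so the comparison itself triggers the paging
request failure.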
Comment 10 Shaohua 2003-09-27 19:19:24 UTC
Can you turn the fan on or off directly, like this:
echo "0" >/proc/acpi/fan/CFAN/state
echo "3" >/proc/acpi/fan/CFAN/state
Comment 11 Shaohua 2003-09-27 19:58:04 UTC
In the DSDT, I found the code below:
>Name (CFST, Zero)
>Method (_ON, 0, NotSerialized)
>{
>    Store (One, CFST)
>}
>Method (_OFF, 0, NotSerialized)
>{
>    Store (Zero, CFST)
>}
CFST isn't associated with any I/O ports, so _ON and _OFF can't control CFAN at
all. I guess it's a BIOS error. But that doesn't mean it's the cause of the oops.
Comment 12 Karol Kozimor 2003-09-28 08:49:44 UTC
No, echoing neither 0 nor 3 gives any effect on the fan or on the system
whatsoever, except for the already mentioned lines appearing in the logs.

BTW: A seemingly reliable way to reproduce the oops:
1. Boot
2. echo 1 > /proc/acpi/thermal_zone/THRM/cooling_mode
[the fan is in D3 now]
3. echo 0 > /proc/acpi/fan/CFAN/state
[error in the logs, no effect on the fan]
4. echo 3 > /proc/acpi/fan/CFAN/state
[Oops]
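
Steps 2 to 4 above could be scripted roughly as follows. This is only a
sketch: write_value and reproduce are made-up helper names, the paths are
passed in by the caller, and running it against the real /proc files on an
affected kernel is expected to oops the machine.

```c
#include <stdio.h>

/* Write a short string to a control file, like "echo value > path".
 * Returns 0 on success, -1 on any error. */
static int write_value(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fputs(value, f) >= 0);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Run steps 2-4 in order, stopping at the first write that fails.
 * cooling_mode: e.g. /proc/acpi/thermal_zone/THRM/cooling_mode
 * fan_state:    e.g. /proc/acpi/fan/CFAN/state */
static int reproduce(const char *cooling_mode, const char *fan_state)
{
    if (write_value(cooling_mode, "1\n") != 0)  /* step 2: passive cooling */
        return -1;
    if (write_value(fan_state, "0\n") != 0)     /* step 3: request D0 */
        return -1;
    return write_value(fan_state, "3\n");       /* step 4: request D3 */
}
```

Passing the paths as arguments makes it possible to try the helper on scratch
files first.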
Comment 13 Shaohua 2003-09-28 20:14:00 UTC
OK, do you mean that turning the fan on or off directly still causes the error? If so,
we can narrow the problem down: the CFAN device itself has an error. Is that right? Please test
without the thermal zone; that way we can find the cause more easily. As I
mentioned, your DSDT can't control the physical fan (the hardware), that is,
CFAN is a pseudo device. So I created a pseudo fan on my machine, but no error
occurred.
Comment 14 Karol Kozimor 2003-09-29 13:35:30 UTC
I added the code you asked about. Funny thing is that even if the fan is off,
this code will show that it's in D0, specifically:

1 devices in D0,0 devices in D3
ACPI: Fan [CFAN] (off)
1 devices in D0,0 devices in D3
ACPI: Processor [CPU0] (supports C1 C2, 8 throttling states)
ACPI: Thermal Zone [THRM] (34 C)

(note the temp is below the active trip point)

I'll try to boot without the thermal zone module, but it'll take a while, since
my kernel tree got somehow corrupted. Anyway, I doubt the problem is in the
thermal code: the oopses only happen when the system is about to change the
state of the fan (i.e. by accessing CFAN device), which happens at boot (if
temperature passes the trip point), and at other times when polling_frequency is
set. Otherwise, the thermal code works well and the SLMT method does its job of
handling the fan quite well indeed.

It seems to me that the problem is indeed in the fan handling, and not in the
thermal code, especially since the oops may be triggered without touching
thermal-specific code at all (i.e. echo [03] > CFAN/state).

A tangential question: how to determine if a 0x80 thermal event is issued? It
would seem to me that the specific GPE (_L00) is executed at times (the system
notices temperature changes and runs SLMT), but no such events are passed to the
userspace (i.e. /proc/acpi/event)?
Comment 15 Karol Kozimor 2003-10-03 10:02:25 UTC
I compiled the kernel with CONFIG_ACPI_FAN=m and CONFIG_ACPI_THERMAL=m.
(2.4.22 with 20030918 ACPI patch)

# modprobe fan
# echo 3 > /proc/acpi/fan/CFAN/state
# echo 0 > /proc/acpi/fan/CFAN/state
[Oops]

OTOH, if I modprobe thermal.o without fan.o loaded, the following appears:

Unable to handle kernel paging request at virtual address 876c33c4
*pde = 00000000
c01ff71c
Oops: 0000
CPU:    0
EIP:    0010:[<c01ff71c>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: 876c33c4   ecx: cebddc94   edx: 00000006
esi: 876c33c4   edi: cebddcf0   ebp: c020e88c   esp: cebddc9c
ds: 0018   es: 0018   ss: 0018
Process modprobe.old (pid: 1008, stackpage=cebdd000)
Stack: 00001001 c0200403 876c33c4 cebddcc8 cebddcf0 876c33c4 cebddd2c c020e90a
       876c33c4 c020e88c cebddcf0 00010000 c02cb9c0 c02cb975 00000050 cebddcf4
       cebddcf4 876c33c4 c0215cd2 876c33c4 cebddcf0 00000000 00800000 c02ccaf6
Call Trace: [<c0200403>]  [<c020e90a>]  [<c020e88c>]  [<c0215cd2>]  [<c02160c9>]
 [<c0216425>]  [<c020ed07>]  [<c020b100>]  [<d28ebbb6>]  [<d28ed41c>] 
[<d28ed1d6>]  [<d28ebf33>]  [<d28eca93>]  [<d28ed44e>]  [<d28ed1d6>] 
[<d28ed5d2>]  [<d28eceaf>]  [<d28ed67e>]  [<d28ed1d6>]  [<d28edd40>] 
[<c020f782>]  [<d28edd40>]  [<c020f900>]  [<d28edd40>]  [<d28edd48>] 
[<c020f847>]  [<c020f32b>]  [<d28edd40>]  [<d28edd40>]  [<c020fc12>] 
[<c020f847>]  [<d28edd40>]  [<d28ed09e>]  [<d28ed0d0>]  [<d28edd40>] 
[<d28ed6da>]  [<d28ed1d6>]  [<c0119368>]  [<d28eb060>]  [<d28eb060>]  [<c01075df>]
Code: 80 3b aa 0f 44 c3 5b c3 a1 f4 77 35 c0 eb f7 8b 44 24 04 c3


>>EIP; c01ff71c <acpi_ns_map_handle_to_node+17/26>   <=====

>>ecx; cebddc94 <_end+e8672bc/1249d688>
>>edi; cebddcf0 <_end+e867318/1249d688>
>>ebp; c020e88c <acpi_bus_data_handler+0/39>
>>esp; cebddc9c <_end+e8672c4/1249d688>

Trace; c0200403 <acpi_get_data+38/5d>
Trace; c020e90a <acpi_bus_get_device+45/ae>
Trace; c020e88c <acpi_bus_data_handler+0/39>
Trace; c0215cd2 <acpi_power_get_context+4a/ae>
Trace; c02160c9 <acpi_power_off_device+4a/1a7>
Trace; c0216425 <acpi_power_transition+113/13c>
Trace; c020ed07 <acpi_bus_set_power+170/298>
Trace; c020b100 <acpi_ut_debug_print+75/9f>
Trace; d28ebbb6 <[thermal]acpi_thermal_active+c4/190>
Trace; d28ed41c <[thermal].text.end+247/a2b>
Trace; d28ed1d6 <[thermal].text.end+1/a2b>
Trace; d28ebf33 <[thermal]acpi_thermal_check+29d/2ec>
Trace; d28eca93 <[thermal]acpi_thermal_add_fs+176/243>
Trace; d28ed44e <[thermal].text.end+279/a2b>
Trace; d28ed1d6 <[thermal].text.end+1/a2b>
Trace; d28ed5d2 <[thermal].text.end+3fd/a2b>
Trace; d28eceaf <[thermal]acpi_thermal_add+f9/1b9>
Trace; d28ed67e <[thermal].text.end+4a9/a2b>
Trace; d28ed1d6 <[thermal].text.end+1/a2b>
Trace; d28edd40 <[thermal]acpi_thermal_driver+0/d4>
Trace; c020f782 <acpi_bus_driver_init+6f/134>
Trace; d28edd40 <[thermal]acpi_thermal_driver+0/d4>
Trace; c020f900 <acpi_bus_attach+b9/138>
Trace; d28edd40 <[thermal]acpi_thermal_driver+0/d4>
Trace; d28edd48 <[thermal]acpi_thermal_driver+8/d4>
Trace; c020f847 <acpi_bus_attach+0/138>
Trace; c020f32b <acpi_bus_walk+b1/cc>
Trace; d28edd40 <[thermal]acpi_thermal_driver+0/d4>
Trace; d28edd40 <[thermal]acpi_thermal_driver+0/d4>
Trace; c020fc12 <acpi_bus_register_driver+a4/dc>
Trace; c020f847 <acpi_bus_attach+0/138>
Trace; d28edd40 <[thermal]acpi_thermal_driver+0/d4>
Trace; d28ed09e <[thermal]acpi_thermal_init+39/ac>
Trace; d28ed0d0 <[thermal]acpi_thermal_init+6b/ac>
Trace; d28edd40 <[thermal]acpi_thermal_driver+0/d4>
Trace; d28ed6da <[thermal].text.end+505/a2b>
Trace; d28ed1d6 <[thermal].text.end+1/a2b>
Trace; c0119368 <sys_init_module+538/690>
Trace; d28eb060 <[thermal]acpi_thermal_get_temperature+0/a4>
Trace; d28eb060 <[thermal]acpi_thermal_get_temperature+0/a4>
Trace; c01075df <system_call+33/38>

Code;  c01ff71c <acpi_ns_map_handle_to_node+17/26>
00000000 <_EIP>:
Code;  c01ff71c <acpi_ns_map_handle_to_node+17/26>   <=====
   0:   80 3b aa                  cmpb   $0xaa,(%ebx)   <=====
Code;  c01ff71f <acpi_ns_map_handle_to_node+1a/26>
   3:   0f 44 c3                  cmove  %ebx,%eax
Code;  c01ff722 <acpi_ns_map_handle_to_node+1d/26>
   6:   5b                        pop    %ebx
Code;  c01ff723 <acpi_ns_map_handle_to_node+1e/26>
   7:   c3                        ret
Code;  c01ff724 <acpi_ns_map_handle_to_node+1f/26>
   8:   a1 f4 77 35 c0            mov    0xc03577f4,%eax
Code;  c01ff729 <acpi_ns_map_handle_to_node+24/26>
   d:   eb f7                     jmp    6 <_EIP+0x6>
Code;  c01ff72b <acpi_ns_convert_entry_to_handle+0/5>
   f:   8b 44 24 04               mov    0x4(%esp,1),%eax
Code;  c01ff72f <acpi_ns_convert_entry_to_handle+4/5>
  13:   c3                        ret

Comment 16 Shaohua 2003-10-09 02:11:12 UTC
Please use the code below to gather some info. What happens?
--- power.c     2003-08-29 08:49:20.000000000 +0800
+++ power.c.new 2003-10-09 17:07:18.000000000 +0800
@@ -326,6 +326,7 @@
        cl = &device->power.states[device->power.state].resources;
        tl = &device->power.states[state].resources;

+       printk("Change device [%s] from D%c to D%c,power resources number:current %d,target %d\n", device->pnp.bus_id, '0'+device->power.state, '0'+state,cl->count,tl->count);
        device->power.state = ACPI_STATE_UNKNOWN;

        if (!cl->count && !tl->count) {

I guess the number is wrong.
Comment 17 Karol Kozimor 2003-10-10 10:09:37 UTC
Using the above code:
# cd /proc/acpi/fan/CFAN
# cat state
status:                  on
# echo 3 > state
Change device [CFAN] from D0 to D3,power resources
^I^I^Inumber:current 1,target 0
acpi_power-0366 [32] acpi_power_transition : Error transitioning device [CFAN] to D3
acpi_bus-0496 [31] acpi_bus_set_power    : Error transitioning device [CFAN] to D3
[OK so far]
[switched to passive cooling]
# cat state
status:                  off
# echo 0 > state
[as above]
# echo 3 > state
Change device [CFAN] from D/ to D3,power resources
^I^I^Inumber:current 659317645,target 0
[Oops]

Weird, huh? Especially the "D/" thing (that's how syslog states, at least).
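
A possible reading of the "D/" output, offered as a sketch rather than a
confirmed diagnosis: the debug printk formats the state as
'0' + device->power.state, and the code sets the state to ACPI_STATE_UNKNOWN
right after it. If the state stays at that value after a failed transition,
and if the constant is 0xFF as in the ACPI CA headers (an assumption here,
as is the 4-entry states[] array for D0..D3), then the printed character is
'/' and states[0xFF] lies far outside the array, which would explain the
garbage resource count. The helper names below are made up for illustration.

```c
#define ACPI_STATE_D0      0
#define ACPI_STATE_D3      3
#define ACPI_STATE_UNKNOWN 0xFF  /* assumed value, see the note above */

/* Character that a "%c" conversion shows for '0' + state: the int result
 * is converted to unsigned char, so anything above 0xFF wraps around. */
static char state_char(unsigned char state)
{
    return (char)(unsigned char)('0' + state);
}

/* Nonzero if state can safely index a 4-entry states[] array (D0..D3). */
static int state_is_valid(unsigned char state)
{
    return state <= ACPI_STATE_D3;
}
```

Here '0' is 0x30, so 0x30 + 0xFF = 0x12F, which wraps to 0x2F, the ASCII code
for '/'. A check like state_is_valid before indexing states[] is one way such
a read could be avoided; whether the eventual fix looks like this is not known
from this report.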
Comment 18 Martin Mokrejs 2003-10-10 10:51:33 UTC
Oct 10 19:16:54 vrapenec kernel: spurious 8259A interrupt: IRQ7.
Oct 10 19:17:13 vrapenec kernel: Change device [CFAN] from D0 to D3,power resources number:current 1,target 0
Oct 10 19:17:13 vrapenec kernel: acpi_power-0365 [36] acpi_power_transition : Error transitioning device [CFAN] to D3
Oct 10 19:17:13 vrapenec kernel: acpi_bus-0496 [35] acpi_bus_set_power    : Error transitioning device [CFAN] to D3
Oct 10 19:17:24 vrapenec kernel: Change device [CFAN] from D3 to D0,power resources number:current 0,target 1
Oct 10 19:17:27 vrapenec kernel: Change device [CFAN] from D0 to D3,power resources number:current 1,target 0
Oct 10 19:17:27 vrapenec kernel: acpi_power-0365 [38] acpi_power_transition : Error transitioning device [CFAN] to D3
Oct 10 19:17:27 vrapenec kernel: acpi_bus-0496 [37] acpi_bus_set_power    : Error transitioning device [CFAN] to D3
Oct 10 19:17:57 vrapenec kernel: Change device [CFAN] from D3 to D0,power resources number:current 0,target 1
Oct 10 19:18:17 vrapenec kernel: Change device [CFAN] from D0 to D3,power resources number:current 1,target 0
Oct 10 19:18:58 vrapenec kernel: Change device [CFAN] from D3 to D0,power resources number:current 0,target 1
Oct 10 19:18:58 vrapenec kernel: acpi_power-0365 [48] acpi_power_transition : Error transitioning device [CFAN] to D0
Oct 10 19:18:58 vrapenec kernel: acpi_bus-0496 [47] acpi_bus_set_power    : Error transitioning device [CFAN] to D0
Oct 10 19:19:00 vrapenec kernel: Change device [CFAN] from D/ to D3,power resources number:current 1111804189,target 0

Tested with 2.4.23-pre5-acpi-20030918 with the printk patch as requested above.

While doing those "echo $number > " steps I just got oopses, and my shell got killed. Unfortunately, I then decided
to connect to the net, which raised my fan to max RPM and locked up the computer. Ctrl+Alt+Del worked, but kill/initd during init 6
complained that some processes were locked. Do you need the stack trace resolved?

Sometimes the machine does not reboot; instead I get:

Power down.
host/usb-uhci.c: interrupt, status 20, frame #0
host/usb-uhci.c: Host controller halted, trying to restart.

.... and nothing happens. Holding the power button for 5 s helps on the L3800C.
Comment 20 Martin Mokrejs 2003-10-12 16:01:56 UTC
I did a few more tests. Here's some older output from 2.4.23-pre5-acpi-20030918,
which I thought had not been stored on the disk because the computer locked up ... but we
were lucky:

spurious 8259A interrupt: IRQ7.
Change device [CFAN] from D0 to D3,power resources number:current 1,target 0
acpi_power-0365 [36] acpi_power_transition : Error transitioning device [CFAN] to D3
acpi_bus-0496 [35] acpi_bus_set_power    : Error transitioning device [CFAN] to D3
Change device [CFAN] from D3 to D0,power resources number:current 0,target 1
Change device [CFAN] from D0 to D3,power resources number:current 1,target 0
acpi_power-0365 [38] acpi_power_transition : Error transitioning device [CFAN] to D3
acpi_bus-0496 [37] acpi_bus_set_power    : Error transitioning device [CFAN] to D3
Change device [CFAN] from D3 to D0,power resources number:current 0,target 1
Change device [CFAN] from D0 to D3,power resources number:current 1,target 0
vrapenec root # echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 3 > /proc/acpi/fan/CFAN/state
Unable to handle kernel paging request at virtual address 43860570
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c01eef28>]    Not tainted
EFLAGS: 00010246
eax: 00000000   ebx: 43860570   ecx: f1b7fe24   edx: 00000006
esi: 43860570   edi: f1b7fe80   ebp: c01fe098   esp: f1b7fe2c
ds: 0018   es: 0018   ss: 0018
Process bash (pid: 2233, stackpage=f1b7f000)
Stack: 00001001 c01efc0f 43860570 f1b7fe58 f1b7fe80 43860570 f1b7febc c01fe116
       43860570 c01fe098 f1b7fe80 00010000 c036e679 c036e62e 00000050 f1b7fe84
       f1b7fe84 43860570 c0205a3a 43860570 f1b7fe80 00000000 00800000 c036f877
Call Trace:    [<c01efc0f>] [<c01fe116>] [<c01fe098>] [<c0205a3a>] [<c0205e31>]
  [<c02061c5>] [<c01fe513>] [<c01fa900>] [<c020339c>] [<c015d120>] [<c013ad73>]
  [<c010749f>]

Code: 80 3b aa 0f 44 c3 5b c3 a1 d4 d0 43 c0 eb f7 8b 44 24 04 c3


This is vrapenec.gsf.de.gsf.de (Linux i686 2.4.23-pre5) 19:19:01

vrapenec.gsf.de login:


spurious 8259A interrupt: IRQ7.
Change device [CFAN] from D0 to D3,power resources number:current 1,target 0
acpi_power-0365 [36] acpi_power_transition : Error transitioning device [CFAN] to D3
acpi_bus-0496 [35] acpi_bus_set_power    : Error transitioning device [CFAN] to D3
Change device [CFAN] from D3 to D0,power resources number:current 0,target 1
Change device [CFAN] from D0 to D3,power resources number:current 1,target 0
acpi_power-0365 [38] acpi_power_transition : Error transitioning device [CFAN] to D3
acpi_bus-0496 [37] acpi_bus_set_power    : Error transitioning device [CFAN] to D3
Change device [CFAN] from D3 to D0,power resources number:current 0,target 1
Change device [CFAN] from D0 to D3,power resources number:current 1,target 0
Change device [CFAN] from D3 to D0,power resources number:current 0,target 1
acpi_power-0365 [48] acpi_power_transition : Error transitioning device [CFAN] to D0
acpi_bus-0496 [47] acpi_bus_set_power    : Error transitioning device [CFAN] to D0
Change device [CFAN] from D/ to D3,power resources number:current 1111804189,target 0
Unable to handle kernel paging request at virtual address 43860570
 printing eip:
c01eef28
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c01eef28>]    Not tainted
EFLAGS: 00010246
eax: 00000000   ebx: 43860570   ecx: f1b7fe24   edx: 00000006
esi: 43860570   edi: f1b7fe80   ebp: c01fe098   esp: f1b7fe2c
ds: 0018   es: 0018   ss: 0018
Process bash (pid: 2233, stackpage=f1b7f000)
Stack: 00001001 c01efc0f 43860570 f1b7fe58 f1b7fe80 43860570 f1b7febc c01fe116
       43860570 c01fe098 f1b7fe80 00010000 c036e679 c036e62e 00000050 f1b7fe84
       f1b7fe84 43860570 c0205a3a 43860570 f1b7fe80 00000000 00800000 c036f877
Call Trace:    [<c01efc0f>] [<c01fe116>] [<c01fe098>] [<c0205a3a>] [<c0205e31>]
  [<c02061c5>] [<c01fe513>] [<c01fa900>] [<c020339c>] [<c015d120>] [<c013ad73>]
  [<c010749f>]

Code: 80 3b aa 0f 44 c3 5b c3 a1 d4 d0 43 c0 eb f7 8b 44 24 04 c3
Comment 21 Martin Mokrejs 2003-10-12 16:05:55 UTC
And here is 2.4.23-pre7:

vrapenec root # tail -f /var/log/kern.log &
[1] 2016
Oct 13 00:26:39 vrapenec kernel: [drm] AGP 0.99 Aperture @ 0xe0000000 256MB
Oct 13 00:26:39 vrapenec kernel: [drm] Initialized radeon 1.7.0 20020828 on
minor 0
Oct 13 00:26:39 vrapenec kernel: ISO 9660 Extensions: Microsoft Joliet Level 1
Oct 13 00:26:39 vrapenec kernel: ISOFS: changing to secondary root
Oct 13 00:26:39 vrapenec kernel: kjournald starting.  Commit interval 5
seconds
Oct 13 00:26:39 vrapenec kernel: EXT3-fs warning: maximal mount count reached,
running e2fsck is recommended
Oct 13 00:26:39 vrapenec kernel: EXT3 FS 2.4-0.9.19, 19 August 2002 on
ide0(3,3), internal journal
Oct 13 00:26:39 vrapenec kernel: EXT3-fs: recovery complete.
Oct 13 00:26:39 vrapenec kernel: EXT3-fs: mounted filesystem with ordered data
mode.
Oct 13 00:26:39 vrapenec kernel: eth0: link up, 100Mbps, full-duplex, lpa
0x45E1
vrapenec root # echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:30:19 vrapenec kernel: Change device [CFAN] from D0
to D3,power resources number:current 1,target 0
Oct 13 00:30:19 vrapenec kernel: acpi_power-0365 [30] acpi_power_transition :
Error transitioning device [CFAN] to D3
Oct 13 00:30:19 vrapenec kernel: acpi_bus-0496 [29] acpi_bus_set_power    :
Error transitioning device [CFAN] to D3
echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:30:35 vrapenec kernel: Change device [CFAN] from D3
to D0,power resources number:current 0,target 1
echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:30:42 vrapenec kernel: Change device [CFAN] from D0
to D3,power resources number:current 1,target 0
Oct 13 00:30:42 vrapenec kernel: acpi_power-0365 [32] acpi_power_transition :
Error transitioning device [CFAN] to D3
Oct 13 00:30:42 vrapenec kernel: acpi_bus-0496 [31] acpi_bus_set_power    :
Error transitioning device [CFAN] to D3
echo 1 > /proc/acpi/thermal_zone/THRM/cooling_mode
vrapenec root # echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:30:54 vrapenec kernel: Change device [CFAN] from D3
to D0,power resources number:current 0,target 1
echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:30:58 vrapenec kernel: Change device [CFAN] from D0
to D3,power resources number:current 1,target 0
echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:31:14 vrapenec kernel: Change device [CFAN] from D3
to D0,power resources number:current 0,target 1
Oct 13 00:31:14 vrapenec kernel: acpi_power-0365 [40] acpi_power_transition :
Error transitioning device [CFAN] to D0
Oct 13 00:31:14 vrapenec kernel: acpi_bus-0496 [39] acpi_bus_set_power    :
Error transitioning device [CFAN] to D0
echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:31:20 vrapenec kernel: Change device [CFAN] from D/
to D3,power resources number:current 0,target 0
Oct 13 00:31:20 vrapenec kernel: acpi_power-0365 [42] acpi_power_transition :
Error transitioning device [CFAN] to D3
Oct 13 00:31:20 vrapenec kernel: acpi_bus-0496 [41] acpi_bus_set_power    :
Error transitioning device [CFAN] to D3
echo 0 > /proc/acpi/thermal_zone/THRM/cooling_mode
vrapenec root # Oct 13 00:31:39 vrapenec kernel: Change device [CFAN] from D/
to D0,power resources number:current 0,target 1
echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:32:16 vrapenec kernel: Change device [CFAN] from D0
to D3,power resources number:current 1,target 0
echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:32:31 vrapenec kernel: Change device [CFAN] from D3
to D0,power resources number:current 0,target 1
echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:32:41 vrapenec kernel: Change device [CFAN] from D0
to D3,power resources number:current 1,target 0
echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:32:46 vrapenec kernel: Change device [CFAN] from D3
to D0,power resources number:current 0,target 1
echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 1 > /proc/acpi/thermal_zone/THRM/cooling_mode
vrapenec root # Oct 13 00:32:52 vrapenec kernel: Change device [CFAN] from D0
to D3,power resources number:current 1,target 0
echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:33:03 vrapenec kernel: Change device [CFAN] from D3
to D0,power resources number:current 0,target 1
echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 1 > /proc/acpi/thermal_zone/THRM/cooling_mode
vrapenec root # echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:33:32 vrapenec kernel: Change device [CFAN] from D0
to D3,power resources number:current 1,target 0
echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 0 > /proc/acpi/thermal_zone/THRM/cooling_mode
vrapenec root # Oct 13 00:33:44 vrapenec kernel: Change device [CFAN] from D3
to D0,power resources number:current 0,target 1
echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:34:05 vrapenec kernel: Change device [CFAN] from D0
to D3,power resources number:current 1,target 0
echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 1 > /proc/acpi/thermal_zone/THRM/cooling_mode
vrapenec root # echo 1 > /proc/acpi/thermal_zone/THRM/cooling_mode
vrapenec root # echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:34:36 vrapenec kernel: Change device [CFAN] from D3
to D0,power resources number:current 0,target 1
echo 0 > /proc/acpi/fan/CFAN/state
vrapenec root # echo 3 > /proc/acpi/fan/CFAN/state
vrapenec root # Oct 13 00:34:40 vrapenec kernel: Change device [CFAN] from D0
to D3,power resources number:current 1,target 0



My conclusion is that the state gets set to "D/" only sometimes, when
acpi_power-0365 and acpi_bus-0496 report an error. Often the echo commands
change the state but acpi_power and acpi_bus are somehow not called, and
therefore no error occurs. I have not yet managed to get an Oops on this
2.4.23-pre7.
Comment 22 Shaohua 2003-10-12 20:47:33 UTC
right, ACPI_STATE_UNKNOWN + '0' = '/'
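
The arithmetic behind the "D/" in the logs can be checked in a few lines of C. This is only a minimal sketch: it assumes the debug message prints the state as a single character computed as state + '0', and that ACPI_STATE_UNKNOWN is 0xFF as in the ACPI CA headers. state_to_char is a hypothetical helper, not a kernel function.

```c
/* Assumed values: D0..D3 are 0..3 and "unknown" is 0xFF. */
#define ACPI_STATE_D0      0
#define ACPI_STATE_D3      3
#define ACPI_STATE_UNKNOWN 0xFF

/* Hypothetical helper: turn a D-state into the character that follows "D"
 * in the debug output by adding '0' and keeping only the low 8 bits. */
static char state_to_char(unsigned char state)
{
    return (char)((state + '0') & 0xFF);
}
```

With these assumptions 0xFF + 0x30 is 0x12F, and the low byte 0x2F is the ASCII code of '/', while D0 and D3 map to '0' and '3' as expected.
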
>Unable to handle kernel paging request at virtual address 43860570
43860570 is below c0000000, so it cannot be a valid kernel pointer. It sounds
like the fan's device->power.states[ACPI_STATE_D0].resources.handles are
corrupted. But they must have been correct at first, because we use these
handles when the fan is initialized. I guess they are changed incorrectly after
the fan has been initialized. Try not loading the fan and thermal zone modules
and controlling the power resource PRCF directly: what happens then?
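
The "below c0000000" test can be written down as a one-line check. This is a minimal sketch, assuming the default i386 3G/1G split in which kernel virtual addresses start at 0xC0000000; KERNEL_SPACE_START and looks_like_kernel_pointer are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: default i386 3G/1G split, kernel space starts here. */
#define KERNEL_SPACE_START 0xC0000000u

/* Hypothetical helper: true if addr lies in the kernel's address range. */
static bool looks_like_kernel_pointer(uint32_t addr)
{
    return addr >= KERNEL_SPACE_START;
}
```

Under that assumption looks_like_kernel_pointer(0x43860570) is false, which is why the faulting address looks like a corrupted handle rather than a real namespace node.
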
Comment 23 Karol Kozimor 2003-10-13 06:18:53 UTC
Controlling PRCF on a kernel without thermal and fan modules loaded produces no
visible output (i.e. neither any trace in logs, nor a change in PRCF/state).

I'll be compiling 2.4.22 with 20031002 ACPI to see if it makes any difference
(so far, 20030918 oopses).
Comment 24 Karol Kozimor 2003-10-15 14:43:15 UTC
I applied the changes Martin proposed. I'm still using the 20030918 code, for
what it matters. As usual, hand-triggering the oops doesn't work if cooling_mode
is active, so I switch to passive later on.

Oh, BTW: why does the fan state change internally (in the code), even though
the state file still says "on"? I.e. if the fan is in D0, "echo 3 > state" won't
turn it off (it produces "error transitioning from D0 to D3"), but a subsequent
"echo 0 > state" produces "error transitioning from D3 to D0" (those are the
first two changes in the log below), as if the fan were actually off, which is
obviously wrong.

Here's the guts:
Change device [CFAN] from D0 to D3,power resources
^I^I^Inumber:current 1,target 0
Current device's handle:c12d21a8
acpi_power-0368 [32] acpi_power_transition : Error transitioning device [CFAN] to D3
acpi_bus-0496 [31] acpi_bus_set_power    : Error transitioning device [CFAN] to D3
Change device [CFAN] from D3 to D0,power resources
^I^I^Inumber:current 0,target 1
Target device's handle:c12d21a8
Change device [CFAN] from D0 to D3,power resources
^I^I^Inumber:current 1,target 0
Current device's handle:c12d21a8
acpi_power-0368 [34] acpi_power_transition : Error transitioning device [CFAN] to D3
acpi_bus-0496 [33] acpi_bus_set_power    : Error transitioning device [CFAN] to D3
Change device [CFAN] from D0 to D3,power resources
^I^I^Inumber:current 1,target 0
Current device's handle:c12d21a8
acpi_power-0368 [38] acpi_power_transition : Error transitioning device [CFAN] to D3
acpi_bus-0496 [37] acpi_bus_set_power    : Error transitioning device [CFAN] to D3
Change device [CFAN] from D3 to D0,power resources
^I^I^Inumber:current 0,target 1
Target device's handle:c12d21a8
Change device [CFAN] from D0 to D3,power resources
^I^I^Inumber:current 1,target 0
Current device's handle:c12d21a8
Change device [CFAN] from D3 to D0,power resources
^I^I^Inumber:current 0,target 1
Target device's handle:c12d21a8
acpi_power-0368 [49] acpi_power_transition : Error transitioning device [CFAN] to D0
acpi_bus-0496 [48] acpi_bus_set_power    : Error transitioning device [CFAN] to D0
Change device [CFAN] from D/ to D0,power resources
^I^I^Inumber:current 659448717,target 1
Target device's handle:c12d21a8
Current device's handle:872c33cc
Unable to handle kernel paging request at virtual address 872c33cc
*pde = 00000000
c02009bc
Oops: 0000
CPU:    0
EIP:    0010:[<c02009bc>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: 872c33cc   ecx: c133de10   edx: 00000006
esi: 872c33cc   edi: c133de6c   ebp: c020fb2c   esp: c133de18
ds: 0018   es: 0018   ss: 0018
Process bash (pid: 1216, stackpage=c133d000)
Stack: 00001001 c02016a3 872c33cc c133de44 c133de6c 872c33cc c133dea8 c020fbaa
       872c33cc c020fb2c c133de6c 00010000 c02cf380 c02cf335 00000050 c133de70
       c133de70 872c33cc c02174ce 872c33cc c133de6c 00000000 00800000 c02d0588
Call Trace: [<c02016a3>]  [<c020fbaa>]  [<c020fb2c>]  [<c02174ce>]  [<c02178c5>]
 [<c0217bcb>]  [<c0217c70>]  [<c020ffa7>]  [<c020c400>]  [<c0214e30>] 
[<c0157b72>]  [<c016a0f8>]  [<c0147f5d>]  [<c01078af>]
Code: 80 3b aa 0f 44 c3 5b c3 a1 f4 b7 35 c0 eb f7 8b 44 24 04 c3


>>EIP; c02009bc <acpi_ns_map_handle_to_node+17/26>   <=====

>>ecx; c133de10 <_end+fc3438/12499688>
>>edi; c133de6c <_end+fc3494/12499688>
>>ebp; c020fb2c <acpi_bus_data_handler+0/39>
>>esp; c133de18 <_end+fc3440/12499688>

Trace; c02016a3 <acpi_get_data+38/5d>
Trace; c020fbaa <acpi_bus_get_device+45/ae>
Trace; c020fb2c <acpi_bus_data_handler+0/39>
Trace; c02174ce <acpi_power_get_context+4a/ae>
Trace; c02178c5 <acpi_power_off_device+4a/1a7>
Trace; c0217bcb <acpi_power_transition+bd/1a5>
Trace; c0217c70 <acpi_power_transition+162/1a5>
Trace; c020ffa7 <acpi_bus_set_power+170/298>
Trace; c020c400 <acpi_ut_trace+a/2c>
Trace; c0214e30 <acpi_fan_write_state+b1/de>
Trace; c0157b72 <dupfd+52/70>
Trace; c016a0f8 <proc_file_write+98/e0>
Trace; c0147f5d <sys_write+ad/1e0>
Trace; c01078af <system_call+33/38>

Code;  c02009bc <acpi_ns_map_handle_to_node+17/26>
00000000 <_EIP>:
Code;  c02009bc <acpi_ns_map_handle_to_node+17/26>   <=====
   0:   80 3b aa                  cmpb   $0xaa,(%ebx)   <=====
Code;  c02009bf <acpi_ns_map_handle_to_node+1a/26>
   3:   0f 44 c3                  cmove  %ebx,%eax
Code;  c02009c2 <acpi_ns_map_handle_to_node+1d/26>
   6:   5b                        pop    %ebx
Code;  c02009c3 <acpi_ns_map_handle_to_node+1e/26>
   7:   c3                        ret
Code;  c02009c4 <acpi_ns_map_handle_to_node+1f/26>
   8:   a1 f4 b7 35 c0            mov    0xc035b7f4,%eax
Code;  c02009c9 <acpi_ns_map_handle_to_node+24/26>
   d:   eb f7                     jmp    6 <_EIP+0x6>
Code;  c02009cb <acpi_ns_convert_entry_to_handle+0/5>
   f:   8b 44 24 04               mov    0x4(%esp,1),%eax
Code;  c02009cf <acpi_ns_convert_entry_to_handle+4/5>
  13:   c3                        ret
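
The decoded instruction at the faulting EIP, cmpb $0xaa,(%ebx), reads one byte through the handle before anything else is done with it. A rough user-space sketch of that kind of check follows. The struct layout, the DESC_TYPE_NAMED name and map_handle_to_node are assumptions made for illustration and are not copied from the ACPI CA source.

```c
#include <stddef.h>

/* Assumed marker byte, taken from the 0xaa immediate in the disassembly. */
#define DESC_TYPE_NAMED 0xAA

/* Hypothetical, simplified namespace node: the type byte comes first. */
struct ns_node {
    unsigned char descriptor_type;
    char name[4];
};

/* Hypothetical model of the check: return the node if the handle points at a
 * named descriptor, NULL otherwise. The read of node->descriptor_type is the
 * step that would fault when the handle is garbage such as 0x872c33cc. */
static struct ns_node *map_handle_to_node(void *handle)
{
    struct ns_node *node = handle;

    if (node == NULL)
        return NULL;
    if (node->descriptor_type != DESC_TYPE_NAMED)
        return NULL;
    return node;
}
```

In this model a NULL handle and a handle with the wrong type byte are both rejected cleanly, but a non-NULL handle that points at unmapped memory cannot be detected by the check itself, which matches the paging request failure on 872c33cc.
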
Comment 25 Karol Kozimor 2003-10-15 15:35:29 UTC
I've just tested without thermal.o:

1) If thermal.o is not present, it's not possible to trigger the oops (one needs
to switch the fan off, and this is done either when the temp drops below 40C,
which is next to impossible during normal operation, or when cooling_mode is
switched).
2) If thermal.o was loaded, cooling_mode was switched to passive, and the module
immediately unloaded: the oops occurs in the same situation (see log below).

FYI: the output in the middle of the log comes from switching the cooling_mode,
and not from manually trying to switch the fan. Therefore, no errors seem to occur.

[echo n > fan/CFAN/state]
Change device [CFAN] from D0 to D3,power resources
^I^I^Inumber:current 1,target 0
Current device's handle:c12d21a8
acpi_power-0368 [34] acpi_power_transition : Error transitioning device [CFAN] to D3
acpi_bus-0496 [33] acpi_bus_set_power    : Error transitioning device [CFAN] to D3
Change device [CFAN] from D3 to D0,power resources
^I^I^Inumber:current 0,target 1
Target device's handle:c12d21a8
Change device [CFAN] from D0 to D3,power resources
^I^I^Inumber:current 1,target 0
Current device's handle:c12d21a8
acpi_power-0368 [36] acpi_power_transition : Error transitioning device [CFAN] to D3
acpi_bus-0496 [35] acpi_bus_set_power    : Error transitioning device [CFAN] to D3
Change device [CFAN] from D3 to D0,power resources
^I^I^Inumber:current 0,target 1
Target device's handle:c12d21a8
ACPI: Thermal Zone [THRM] (49 C)
[echo n > thermal_zone/THRM/cooling_mode]
Change device [CFAN] from D0 to D3,power resources
^I^I^Inumber:current 1,target 0
Current device's handle:c12d21a8
Change device [CFAN] from D3 to D0,power resources
^I^I^Inumber:current 0,target 1
Target device's handle:c12d21a8
Change device [CFAN] from D0 to D3,power resources
^I^I^Inumber:current 1,target 0
Current device's handle:c12d21a8
[echo n > fan/CFAN/state]
Change device [CFAN] from D3 to D0,power resources
^I^I^Inumber:current 0,target 1
Target device's handle:c12d21a8
acpi_power-0368 [61] acpi_power_transition : Error transitioning device [CFAN] to D0
acpi_bus-0496 [60] acpi_bus_set_power    : Error transitioning device [CFAN] to D0
Change device [CFAN] from D/ to D3,power resources
^I^I^Inumber:current 659448717,target 0
Current device's handle:876c33cc
[Oops]
Comment 26 Shaohua 2003-10-15 20:41:44 UTC
The patch below should get rid of the oops, but I guess the errors will still
occur.
--- power.c     2003-09-11 10:48:13.000000000 +0800
+++ power.c.new 2003-10-16 11:37:49.000000000 +0800
@@ -326,8 +326,6 @@
        cl = &device->power.states[device->power.state].resources;
        tl = &device->power.states[state].resources;

-       device->power.state = ACPI_STATE_UNKNOWN;
-
        if (!cl->count && !tl->count) {
                result = -ENODEV;
                goto end;
@@ -345,8 +343,6 @@
                        goto end;
        }

-       device->power.state = state;
-
        /*
         * Then we dereference all power resources used in the current list.
         */
@@ -356,7 +352,9 @@
                        goto end;
        }

+       device->power.state = state;
 end:
+       /*TBD:if error occurs, should we let devices have original state?*/
        if (result)
                ACPI_DEBUG_PRINT((ACPI_DB_WARN,
                        "Error transitioning device [%s] to D%d\n",
Comment 27 Karol Kozimor 2003-10-16 07:58:36 UTC
The oops does not occur when your patch is applied. Errors are understandable,
since the appropriate ASL methods do not work as expected by the spec. Also, the
error returned is -8, as Martin has already stated. Thanks for your work.

FYI: A funny thing I noticed: after some time, the errors stop being reported.
I'll try to take a look into that.
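
The effect of the patch in comment 26 can be modelled outside the kernel. As far as the diff shows, the old code set device->power.state to ACPI_STATE_UNKNOWN before attempting the switch, so a failed transition left 0xFF behind and the next call indexed power.states[] with it. The sketch below is a simplified model under stated assumptions: the structures, power_transition, the single do_switch hook and the explicit bounds check are all hypothetical and are not part of the patch.

```c
#define STATE_D0      0
#define STATE_D3      3
#define STATE_COUNT   4      /* D0..D3 */
#define STATE_UNKNOWN 0xFF   /* not a valid index into states[] */

/* Hypothetical stand-ins for the kernel structures. */
struct resource_list { int count; };
struct device_power {
    unsigned char state;
    struct resource_list states[STATE_COUNT];
};

/* Hypothetical hook that simulates the power resource methods. */
typedef int (*switch_fn)(const struct resource_list *list);

/* Model of the patched flow: look up both lists, try the switch, and commit
 * the new state only on success. On failure the old state is kept, so the
 * next call still indexes states[] with a valid D0..D3 value. */
static int power_transition(struct device_power *p, unsigned char target,
                            switch_fn do_switch)
{
    const struct resource_list *cl, *tl;
    int result;

    if (p->state >= STATE_COUNT || target >= STATE_COUNT)
        return -1;              /* extra guard, not in the patch */

    cl = &p->states[p->state];
    tl = &p->states[target];
    if (!cl->count && !tl->count)
        return -1;              /* the patch returns -ENODEV here */

    result = do_switch(tl);
    if (result)
        return result;          /* state left untouched */

    p->state = target;
    return 0;
}

/* Simulated outcomes; -8 is the error value reported in this thread. */
static int switch_ok(const struct resource_list *list)   { (void)list; return 0; }
static int switch_fail(const struct resource_list *list) { (void)list; return -8; }
```

With a fan that lists one power resource in D0 and none in D3, a failing switch returns -8 and leaves the state at D0, so the "D/" state and the out-of-range lookup never arise in this model.
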
Comment 28 Shaohua 2003-10-26 18:33:51 UTC
The patch has been merged. I'd like to close this bug. If needed, we can reopen it.
