Bug 8385 - 2.6.21 regression: BUG: scheduling while atomic: kacpid - Acer Travelmate 4001 lmi, Acer TM660
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Thermal
Hardware: i386 Linux
Importance: P2 high
Assignee: acpi_power-thermal
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-04-28 01:34 UTC by François Valenduc
Modified: 2008-01-12 21:25 UTC

See Also:
Kernel Version: 2.6.21
Tree: Mainline
Regression: ---


Attachments
kernel log extract (186.46 KB, text/plain)
2007-04-28 01:35 UTC, François Valenduc
Output of acpidump (109.35 KB, text/plain)
2007-04-28 23:11 UTC, François Valenduc
Kernel configuration file (42.12 KB, text/plain)
2007-04-28 23:35 UTC, François Valenduc
dmesg of kernel 2.6.21 (official vanilla sources) (16.58 KB, text/plain)
2007-04-29 02:56 UTC, François Valenduc
kernel configuration 2.6.21, official vanilla sources (41.56 KB, text/plain)
2007-04-29 02:57 UTC, François Valenduc
ACPI errors occurring before the kernel panic (18.84 KB, text/plain)
2007-04-29 02:57 UTC, François Valenduc
Undo sync execution of Notify (856 bytes, patch)
2007-04-29 04:06 UTC, Alexey Starikovskiy
Execute Notify on other thread (5.46 KB, patch)
2007-04-29 04:07 UTC, Alexey Starikovskiy
Remove recursion from thermal notify (749 bytes, patch)
2007-05-11 06:20 UTC, Alexey Starikovskiy
Do not do acpi_thermal_check recursively/in parallel (1.38 KB, patch)
2007-06-05 04:59 UTC, Alexey Starikovskiy

Description François Valenduc 2007-04-28 01:34:48 UTC
Most recent kernel where this bug did *NOT* occur:
Distribution: Debian sid
Hardware Environment: Acer Travelmate 4001 lmi, Intel Centrino 1.5 GHz
Software Environment:
Problem Description:
When the thermal module is loaded, I get plenty of problems: kernel BUGs, kernel
oopses and sometimes a kernel panic. Not loading the thermal module seems to
solve the problem, but I am not sure that this is enough to keep my
computer from freezing. I have attached an extract of the kernel log. It contains around
2000 lines of ACPI-related errors!

Steps to reproduce: simply use the computer while the thermal module
is loaded.
Comment 1 François Valenduc 2007-04-28 01:35:51 UTC
Created attachment 11307
kernel log extract
Comment 2 Zhang Rui 2007-04-28 01:53:43 UTC
Does this happen in any earlier release, e.g. 2.6.21-rc3?
Comment 3 François Valenduc 2007-04-28 01:56:38 UTC
It has happened at least since 2.6.21-rc4. I have not tested earlier
pre-releases of 2.6.21, but I had never managed to get an extract of a kernel
log until now.
Comment 4 Bruno Prémont 2007-04-28 11:16:13 UTC
I may be seeing the same thing on my Acer TM660 laptop. I only get a
partial trace, also in kacpid.
The bug triggers when compiling, i.e. with the CPU running at maximum
speed/load. It seems to happen when the fan starts running to cool the CPU down.
So far I have not tried the release, only -rc7. The same does
not happen with 2.6.20.1.

I have ACPI built-in (thermal, with sensor available and working with 2.6.20 
and older releases):
#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SLEEP_PROC_FS=y
# CONFIG_ACPI_SLEEP_PROC_SLEEP is not set
# CONFIG_ACPI_PROCFS is not set
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=y
# CONFIG_ACPI_DOCK is not set
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_THERMAL=y
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
# CONFIG_ACPI_CONTAINER is not set
# CONFIG_ACPI_SBS is not set


Oops: 0000 [#1]
Modules linked in: i915 drm cpufreq_conservative squashfs zlib_inflate loop 
nfs lockd nfs_acl sunrpc snd_pcm_oss snd_mixer_oss xfs usb_storage acerhk b44 
nsc_ircc irda crc_ccitt sr_mod cdrom ehci_hcd snd_intel8x0m uhci_hcd usbcore 
snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd snd_page_alloc 
i2c_i801 sg evdev pcspkr
CPU:    0
EIP:    0060:[<c0104aa1>]   Not tainted VLI
EFLAGS: 00010246   (2.6.21-rc7 #1)
EIP is at dump_trace+0x64/0xb0
eax: 00000000   ebx: dde01fe0   ecx: c038a8a8   edx: c03482c1
esi: 0001e000   edi: c034938d   ebp: dde0010c   esp: dde000fc
ds: 007b  es: 007b  fs: 00d8  gs: 0000  ss: 0068
Process kacpid (pid: 28, ti=dddfe000 task=c1457a50 task.ti=dde00000)
Stack: dc31c3c0 c034938d ffff755c dde001ec dde00120 c0104baa c038a8a8 c034938d
       dde00194 dde0012c c0104bd2 c034938d dde00138 c0104cf6 dde00144 dde00184
       c02fef98 c034a130 c1457bdc ffff0005 0000001c 00000000 dde45f80 00000000
Call Trace:
Comment 5 François Valenduc 2007-04-28 13:29:40 UTC
It's indeed always happening when the CPU load is high. It also happens
sometimes when hald is started during the boot process.
Comment 6 Len Brown 2007-04-28 18:16:51 UTC
any difference if you boot with "ec_intr=0"?
Comment 7 Len Brown 2007-04-28 18:33:00 UTC
The stack trace shows recursive notify handlers.
Have seen these on HP nx6325 -- interesting to
find them on Acer as well.

Please attach the output from acpidump.

Unclear, however, why this code is running with interrupts off.

BUG: scheduling while atomic: kacpid/0xce0b2a80/24
 [<c0104e3a>] show_trace_log_lvl+0x1a/0x30
 [<c01054e2>] show_trace+0x12/0x20
 [<c0105596>] dump_stack+0x16/0x20
 [<c02c33ac>] __sched_text_start+0x43c/0x5c0
 [<c02c3b98>] schedule_timeout+0x48/0xc0
 [<c021d4ad>] acpi_ec_wait+0xf1/0x151
 [<c021d628>] acpi_ec_transaction+0x11b/0x1c5
 [<c021d7c1>] acpi_ec_write+0x30/0x32
 [<c021d8c5>] acpi_ec_space_handler+0x9c/0x163
 [<c0208bfa>] acpi_ev_address_space_dispatch+0x16c/0x1b9
 [<c020cc00>] acpi_ex_access_region+0x203/0x217
 [<c020cd28>] acpi_ex_field_datum_io+0x114/0x1a5
 [<c020d0d9>] acpi_ex_write_with_update_rule+0x110/0x118
 [<c020d252>] acpi_ex_insert_into_field+0x171/0x29f
 [<c020b940>] acpi_ex_write_data_to_field+0x20e/0x226
 [<c020fd5c>] acpi_ex_store_object_to_node+0x70/0xa6
 [<c020feff>] acpi_ex_store+0xe8/0x23d
 [<c020dc1d>] acpi_ex_opcode_1A_1T_1R+0x3c0/0x53e
 [<c020627f>] acpi_ds_exec_end_op+0xca/0x3db
 [<c0215286>] acpi_ps_parse_loop+0x56f/0x715
 [<c02146fd>] acpi_ps_parse_aml+0x68/0x246
 [<c021597e>] acpi_ps_execute_method+0x11f/0x1c1
 [<c0212bb8>] acpi_ns_evaluate+0xa0/0x100
 [<c0212810>] acpi_evaluate_object+0x120/0x1c0
 [<c021f8d1>] acpi_power_on+0xc2/0x110
 [<c021fc3a>] acpi_power_transition+0x78/0xf7
 [<c021b949>] acpi_bus_set_power+0xe8/0x185
 [<e15c31dc>] acpi_thermal_active+0x6a/0xe5 [thermal]
 [<e15c34e6>] acpi_thermal_check+0x28f/0x39b [thermal]
 [<e15c399f>] acpi_thermal_notify+0x39/0x61 [thermal]
 [<c0209758>] acpi_ev_queue_notify_request+0xd9/0xf4
 [<c020f9c8>] acpi_ex_opcode_2A_0T_0R+0x68/0x98
 [<c020627f>] acpi_ds_exec_end_op+0xca/0x3db
 [<c0215286>] acpi_ps_parse_loop+0x56f/0x715
 [<c02146fd>] acpi_ps_parse_aml+0x68/0x246
 [<c021597e>] acpi_ps_execute_method+0x11f/0x1c1
 [<c0212bb8>] acpi_ns_evaluate+0xa0/0x100
 [<c0212810>] acpi_evaluate_object+0x120/0x1c0
 [<c021f8d1>] acpi_power_on+0xc2/0x110
 [<c021fc3a>] acpi_power_transition+0x78/0xf7
 [<c021b949>] acpi_bus_set_power+0xe8/0x185
 [<e15c31dc>] acpi_thermal_active+0x6a/0xe5 [thermal]
 [<e15c34e6>] acpi_thermal_check+0x28f/0x39b [thermal]
 [<e15c399f>] acpi_thermal_notify+0x39/0x61 [thermal]
 [<c0209758>] acpi_ev_queue_notify_request+0xd9/0xf4
 [<c020f9c8>] acpi_ex_opcode_2A_0T_0R+0x68/0x98
 [<c020627f>] acpi_ds_exec_end_op+0xca/0x3db
 [<c0215286>] acpi_ps_parse_loop+0x56f/0x715
 [<c02146fd>] acpi_ps_parse_aml+0x68/0x246
 [<c021597e>] acpi_ps_execute_method+0x11f/0x1c1
 [<c0212bb8>] acpi_ns_evaluate+0xa0/0x100
 [<c0212810>] acpi_evaluate_object+0x120/0x1c0
 [<c021f8d1>] acpi_power_on+0xc2/0x110
 [<c021fc3a>] acpi_power_transition+0x78/0xf7
 [<c021b949>] acpi_bus_set_power+0xe8/0x185
 [<e15c31dc>] acpi_thermal_active+0x6a/0xe5 [thermal]
 [<e15c34e6>] acpi_thermal_check+0x28f/0x39b [thermal]
 [<e15c399f>] acpi_thermal_notify+0x39/0x61 [thermal]
 [<c0209758>] acpi_ev_queue_notify_request+0xd9/0xf4
 [<c020f9c8>] acpi_ex_opcode_2A_0T_0R+0x68/0x98
 [<c020627f>] acpi_ds_exec_end_op+0xca/0x3db
 [<c0215286>] acpi_ps_parse_loop+0x56f/0x715
 [<c02146fd>] acpi_ps_parse_aml+0x68/0x246
 [<c021597e>] acpi_ps_execute_method+0x11f/0x1c1
 [<c0212bb8>] acpi_ns_evaluate+0xa0/0x100
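
The repetition in this trace can be checked mechanically. What follows is a minimal Python sketch, assuming the log uses the `[<address>] symbol+0xoffset/0xsize` frame format shown above. The names `FRAME_RE`, `count_frames` and `repeated_symbols` are made up for this illustration, and `TRACE` holds only a handful of frames copied from the trace, not the full log.

```python
import re
from collections import Counter

# A few frames copied from the trace above; a real run would read the whole log.
TRACE = """\
 [<e15c31dc>] acpi_thermal_active+0x6a/0xe5 [thermal]
 [<e15c34e6>] acpi_thermal_check+0x28f/0x39b [thermal]
 [<e15c399f>] acpi_thermal_notify+0x39/0x61 [thermal]
 [<c0209758>] acpi_ev_queue_notify_request+0xd9/0xf4
 [<c021b949>] acpi_bus_set_power+0xe8/0x185
 [<e15c31dc>] acpi_thermal_active+0x6a/0xe5 [thermal]
 [<e15c34e6>] acpi_thermal_check+0x28f/0x39b [thermal]
 [<e15c399f>] acpi_thermal_notify+0x39/0x61 [thermal]
"""

# Matches "[<addr>] symbol+0xoff/0xsize" and captures the symbol name.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def count_frames(trace_text):
    """Return a Counter mapping each symbol to its number of frames."""
    return Counter(m.group(1) for m in FRAME_RE.finditer(trace_text))


def repeated_symbols(trace_text):
    """Return the symbols that occur more than once (recursion candidates)."""
    return {sym: n for sym, n in count_frames(trace_text).items() if n > 1}


if __name__ == "__main__":
    for sym, n in sorted(repeated_symbols(TRACE).items()):
        print(f"{sym}: {n}")
```

In the portion of the trace quoted above, acpi_thermal_notify appears three times, each time below an acpi_bus_set_power call made by the previous acpi_thermal_check.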

Comment 8 François Valenduc 2007-04-28 23:11:32 UTC
Created attachment 11326
Output of acpidump
Comment 9 François Valenduc 2007-04-28 23:12:17 UTC
Using ec_intr=0 doesn't solve the problem at all. I now almost always get a
kernel panic during startup when hald starts. It shows plenty of errors mentioning
"acpi_ns_evaluate", which seem similar to the ones listed in the log I have
posted. The last message I see is "Bad EIP value".
Comment 10 Alexey Starikovskiy 2007-04-28 23:31:34 UTC
Could you please post whole .config?
Comment 11 François Valenduc 2007-04-28 23:35:50 UTC
Created attachment 11327
Kernel configuration file

Here it is
Comment 12 François Valenduc 2007-04-28 23:40:29 UTC
Here is a part of the message I see when there is a kernel panic:

EIP 0060:[<c012072d>] Tainted P VLI
EIP is at run_timer_softirq+0x14d/0x160
...
EIP [<c012072d>] run_timer_softirq+0x14d/0x160 SS:ESP0068c14b81bc
...
general protection fault 0000
EIP is at complete+0xa/0x40
Process kacpid ...

Please let me know if you want more details. The kernel is tainted because I use
the hsf drivers. However, I have already tested without them on release candidates
of 2.6.21 and it made no difference. I can test again without the hsf drivers loaded
if you want.

Thanks for your help.
Comment 13 Alexey Starikovskiy 2007-04-28 23:56:37 UTC
Your config mentions suspend2, could you please try without it?
Comment 14 François Valenduc 2007-04-28 23:59:25 UTC
Yet another problem. This is not a kernel panic but the computer is frozen
anyway when it occurs:

EIP 0060 [<c0104dca>] not tainted VLI
EFLAGS 00010246 [2.6.21.1 #8]
EIP is at dump_trace+0x6a/0xc0
process kacpid (pid:24, ti=c14b8000 task dfe1e070 task.ti=c14b8000
...
general protection fault: 0000 #2
EIP 0060 [<c01164fa>] not tainted
EIP is at complete +0xa/0x40
process kacpid (pid:24, ti=c14b6000 task dfe1e070 task.ti=c14b8010
Comment 15 François Valenduc 2007-04-29 00:38:45 UTC
So, I have tried without suspend2. I almost thought the problem was gone, but that
is not the case. It takes more time for the problem to occur, but it occurs
anyway.
Comment 16 Alexey Starikovskiy 2007-04-29 01:46:05 UTC
ok, do you have any other off-tree patches applied?
Comment 17 François Valenduc 2007-04-29 01:52:33 UTC
I use the vesafb-tng and fbsplash patches made by Spock. In fact, I have already
tested release candidates of 2.6.21 without any extra patches, and the problem also
occurs.
Comment 18 Alexey Starikovskiy 2007-04-29 02:10:53 UTC
then let's see your config/dmesg from those vanilla kernels.
Comment 19 François Valenduc 2007-04-29 02:56:27 UTC
Created attachment 11328
dmesg of kernel 2.6.21 (official vanilla sources)
Comment 20 François Valenduc 2007-04-29 02:57:10 UTC
Created attachment 11329
kernel configuration 2.6.21, official vanilla sources
Comment 21 François Valenduc 2007-04-29 02:57:49 UTC
Created attachment 11330
ACPI errors occurring before the kernel panic
Comment 22 François Valenduc 2007-04-29 02:58:10 UTC
I have thus recompiled a pure vanilla kernel and this time, the problem occurs
at most 2 minutes after the end of the boot process. It always ends in a
kernel panic. Shortly before that, I get the same errors as in the first kernel
log extract (see new attachment). Then the kernel panic occurs like this:

die+0xe/Ox1d0
do_page_fault +0x227/+0x610
error_code +0x74/+0x7c
_wake_up +0x22/+0x30
_queue_work +0x26/+0x40
kblockd_schedule_work 0xf/0x20
blk_unplug_timeout +0xb/+0x10
run_timer_softirq +0xb/+0x10
_do_softirq +0x42/+0x90
do_softirq +0x2a/+0x30
irq_exit +0x2/+0x40
do_IRQ +0x45/+0x80
common_interrupt +0x23/+0x28
cpu_idle +0x39/+0x50
start_kernel

EIP [<c14b7fe3>] 0xc147bfe3 SS:ESP 0068:c037d80
Kernel panic: not syncing: fatal exception in interrupt.
Comment 23 Bruno Prémont 2007-04-29 03:13:05 UTC
Interesting point: with the following sequence, the crash no longer happens (immediately)
under heavy CPU load alone, when the temperature reaches the threshold at which
the fan starts:

boot without thermal (built as module)

modprobe thermal
rmmod thermal
modprobe thermal

But later while running the system crashed:
(gkrellm was in D state and system crashed during/at the end of sysreq+t I 
issued to determine why it was stuck in that state - I assume some procfs 
reading)
Same partial trace as I usually get, but preceded by the recursive
notifications that Fran
Comment 24 Alexey Starikovskiy 2007-04-29 04:04:35 UTC
Could you please try enabling the stack debugging options:
# CONFIG_DEBUG_STACKOVERFLOW is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_4KSTACKS is not set
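
The three options are quoted here in their "is not set" form. A small Python sketch for checking which of them a given .config still lacks, assuming the usual `CONFIG_X=value` and `# CONFIG_X is not set` line formats seen elsewhere in this report. `parse_kconfig`, `missing` and `SAMPLE` are illustrative names, not an existing tool.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map each option to its value, or to None for '# CONFIG_X is not set'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = None
    return options


def missing(options, wanted):
    """Return the wanted options that are absent or explicitly not set."""
    return [name for name in wanted if options.get(name) is None]


# Sample input: the three lines quoted above plus one enabled option.
SAMPLE = """\
# CONFIG_DEBUG_STACKOVERFLOW is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_4KSTACKS is not set
CONFIG_ACPI_THERMAL=y
"""

WANTED = ["CONFIG_DEBUG_STACKOVERFLOW", "CONFIG_DEBUG_STACK_USAGE", "CONFIG_4KSTACKS"]

if __name__ == "__main__":
    print(missing(parse_kconfig(SAMPLE), WANTED))
```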
Comment 25 Alexey Starikovskiy 2007-04-29 04:06:11 UTC
Created attachment 11331
Undo sync execution of Notify

Please try to apply this patch and see if the problem goes away...
Comment 26 Alexey Starikovskiy 2007-04-29 04:07:37 UTC
Created attachment 11332
Execute Notify on other thread

If the above patch helps, please check whether things still work with this patch applied.
Comment 27 François Valenduc 2007-04-29 07:02:33 UTC
It seems to work much better with both patches you posted. I was able to compile
a kernel or build a kernel-headers package, which I had never been able to do
before with kernel 2.6.21. Previously, I always got a kernel panic during one of
those two operations. Also, I no longer get a kernel panic as early as
2 minutes after bootup. So I hope everything works fine now. But could you
explain what your two patches do?

Thanks for your help,
François
Comment 28 Alexey Starikovskiy 2007-04-29 07:18:55 UTC
Well, there is a long story in bug 5534; take a look if you are interested.
Basically, the Notify operator of the AML interpreter needs to execute some arbitrary
C code, which may call the AML interpreter again. Recent HP notebooks were known to
deadlock if we schedule the notify execution after the code that issues it.
Thus the patch you just applied was invented (it executes the notify on a separate
thread). At some point it was decided that having another thread is dangerous
and that executing the notify in place (thus having several AML interpreters on a single
stack) is less dangerous. That is the patch you just reverted. In your case it seems
to cause a stack overflow, which was predicted as one of the dangers of this patch.
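
The difference between the two approaches can be illustrated with a toy model. The Python sketch below is not kernel code and does not reproduce either patch: `ToyInterpreter`, `max_nesting` and the single-threaded `pending` queue (standing in for the separate thread) are assumptions made purely for illustration. It only shows why running the handler in place lets the nesting depth grow with every Notify, while deferring it keeps the depth constant.

```python
from collections import deque


class ToyInterpreter:
    """Toy model: a method issues a Notify whose handler runs another method."""

    def __init__(self, synchronous, chain_length):
        self.synchronous = synchronous  # True: run the handler in place
        self.remaining = chain_length   # number of Notifies still to be issued
        self.pending = deque()          # deferred handlers (stand-in for a thread)
        self.depth = 0                  # current nesting of run_method()
        self.max_depth = 0              # deepest nesting seen so far

    def run_method(self):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if self.remaining > 0:
            self.remaining -= 1
            self.notify()
        self.depth -= 1

    def notify(self):
        if self.synchronous:
            self.handler()                     # nested call on the same stack
        else:
            self.pending.append(self.handler)  # runs after the method returns

    def handler(self):
        self.run_method()                      # handler re-enters the interpreter

    def drain(self):
        while self.pending:
            self.pending.popleft()()


def max_nesting(synchronous, chain_length):
    """Return the deepest interpreter nesting reached for a chain of Notifies."""
    interp = ToyInterpreter(synchronous, chain_length)
    interp.run_method()
    interp.drain()
    return interp.max_depth


if __name__ == "__main__":
    print("in place:", max_nesting(True, 5))
    print("deferred:", max_nesting(False, 5))
```

With a chain of five Notifies, the in-place variant nests six interpreter invocations, whereas the deferred variant never goes beyond one. The model ignores locking, so it says nothing about the deadlock seen on the HP machines.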
Comment 29 François Valenduc 2007-04-29 07:57:46 UTC
So, if I have understood correctly, the way Notify is executed by the
AML interpreter is suitable for some notebooks but not for others. So what do you plan
to do now? Can it become a kernel configuration option so that we can choose
the way we want to execute it?


Comment 30 Alexey Starikovskiy 2007-04-29 08:41:09 UTC
If it works for you with both patches applied, then there is no problem,
as these two patches just change the behavior to the one that is already known to
work for HP. Feel free to mark the bug resolved (not yet closed) if you think that
these two patches together solve your issue. Then it will be up to Len Brown
(the Linux ACPI maintainer) to move these patches to the mainline kernel and mark the bug
closed.
Comment 31 François Valenduc 2007-04-29 09:04:26 UTC
I have read the thread about bug #5534 and I am afraid I have the same problem
concerning temperature management. When I run an intensive application (partimage or a
kernel compile, for example), the fan makes a lot of noise but the computer
doesn't seem to cool down. If I run
cat /proc/acpi/thermal_zone/THRM/polling_frequency, I obtain "<polling disabled>"
as output.

Is this really normal?
Thanks for your help,

François
Comment 32 Alexey Starikovskiy 2007-04-29 09:21:48 UTC
If your fan spins, then you don't have a thermal management problem.
"polling disabled" is the default value, meaning that the embedded controller notifies us
when the temperature changes, instead of us polling it at some interval. If you
want to see something different, write a value in seconds to this file.
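
As a rough illustration of "write a value in seconds to this file", here is a small Python sketch. The path is the one quoted in comment 31 and the zone name (THRM) is specific to that machine. `read_polling` and `set_polling` are hypothetical helper names, writing to the file is assumed to require root, and the exact text the kernel reports back after a write is not shown in this thread, so the sketch returns it unparsed.

```python
from pathlib import Path

# Path quoted in comment 31; the zone name (THRM) differs between machines.
POLLING_FILE = Path("/proc/acpi/thermal_zone/THRM/polling_frequency")


def read_polling(path=POLLING_FILE):
    """Return the current contents of the polling file, stripped of whitespace."""
    return Path(path).read_text().strip()


def set_polling(seconds, path=POLLING_FILE):
    """Write a polling interval in seconds to the polling file."""
    if seconds < 0:
        raise ValueError("polling interval must not be negative")
    Path(path).write_text(f"{seconds}\n")


if __name__ == "__main__":
    # On the reporter's machine this printed "<polling disabled>".
    print(read_polling())
```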
Comment 33 François Valenduc 2007-04-29 09:25:55 UTC
So, I mark the bug as resolved.
Comment 34 Alexey Starikovskiy 2007-04-29 09:45:36 UTC
Thanks for report and testing.
Comment 35 François Valenduc 2007-04-29 10:22:03 UTC
One question I still have: do I need to keep the kernel options you
suggested (CONFIG_DEBUG_STACKOVERFLOW, CONFIG_DEBUG_STACK_USAGE or
CONFIG_4KSTACKS)? I guess the DEBUG options are no longer needed, but is it preferable
to use 4 KB kernel stacks?
Comment 36 Bruno Prémont 2007-04-29 10:39:50 UTC
OK, using both patches fixes the crashes for me as well.

Fran
Comment 37 François Valenduc 2007-05-01 12:42:23 UTC
So I think I can mark this bug as verified.

Thanks for your help.
Comment 38 Robert Moore 2007-05-02 14:46:22 UTC
It is worse than just a simple stack overflow; it is close to an infinite loop.

From examining the stack trace and the DSDT, it looks like this machine is 
falling into an "infinite" loop via the following sequence of events:

A temperature EC event starts the whole thing going.

Linux acpi_ec_gpe_query runs
  Invoke _Q81 (temp is falling)
    Notify (THRM, 0x80)
      Perform thermal_check
        Invoke active thermal state handler
        Attempt to turn off a fan
          Invoke _OFF method for fan
            Invoke THRM._SCP
              Notify (THRM, 0x81)

The Notify (THRM, 0x81) causes a call to thermal_check (in the Linux thermal
notify handler), and we end up in an infinite loop (or at least we
spin through this sequence enough times that a stack fault occurs before some
event terminates it).

I think it's a bit early to close this bug.
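
The sequence described in this comment can be written down as a call graph and checked for a cycle. The Python sketch below is a model only: the edge list is transcribed from the steps listed above, while the node name `active_handler` and the helper `find_cycle` are labels invented for this illustration.

```python
# Edges follow the sequence listed above: caller -> callees.
CALLS = {
    "acpi_ec_gpe_query": ["_Q81"],
    "_Q81": ["Notify(THRM, 0x80)"],
    "Notify(THRM, 0x80)": ["thermal_check"],
    "thermal_check": ["active_handler"],
    "active_handler": ["fan._OFF"],
    "fan._OFF": ["THRM._SCP"],
    "THRM._SCP": ["Notify(THRM, 0x81)"],
    "Notify(THRM, 0x81)": ["thermal_check"],
}


def find_cycle(calls, start):
    """Depth-first walk from start; return the first call cycle found, or None."""
    path = []
    on_path = set()

    def visit(node):
        if node in on_path:
            # The node is already active on the current call path: a cycle.
            return path[path.index(node):] + [node]
        path.append(node)
        on_path.add(node)
        for callee in calls.get(node, []):
            cycle = visit(callee)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(node)
        return None

    return visit(start)


if __name__ == "__main__":
    print(" -> ".join(find_cycle(CALLS, "acpi_ec_gpe_query")))
```

The cycle starts and ends at thermal_check, which matches the repeating block of frames in the trace from comment 7.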
Comment 39 Daniel Drake 2007-05-07 16:28:35 UTC
downstream report https://bugs.gentoo.org/176615
Comment 40 Len Brown 2007-05-10 17:28:44 UTC
The patches in comment #25 and comment #26 went upstream today,
and should thus appear in the next upstream snapshot after 2.6.21-git13.

closed.
Comment 41 Alexey Starikovskiy 2007-05-11 06:20:37 UTC
Created attachment 11482
Remove recursion from thermal notify

Please test this patch against clean 2.6.21.
Comment 42 François Valenduc 2007-05-12 00:06:26 UTC
It also works well if I only use the last patch you sent. I don't know if it's
related to this bug, but I still think there is a problem with fan and
temperature management in kernel 2.6.21. If I use the computer normally,
without running applications that require a lot of CPU, the fan starts
working very hard and becomes extremely noisy after a while, even if the
temperature is not extremely high. In fact, the temperature reaches 65
Comment 43 Alexey Starikovskiy 2007-05-12 00:11:51 UTC
Thanks once again for testing.
Comment 44 Daniel Drake 2007-05-16 19:07:01 UTC
What's the way forward here then? I note that the patch in comment #25 is noted
to break a HP laptop (see header of commit
40d07080e585396dc58bc64befa1de0695318b3b). Now that an independent fix has been
produced (comment #41), is that patch going to be re-applied to fix the HP laptop?

I see that the patch in comment #41 isn't in the ACPI git tree yet, but it's
only been a few days, I'm probably just being impatient :)

From the perspective of fixing Gentoo's 2.6.21 kernel, which patches should we
backport, out of comment #25, comment #26 and comment #41?
We'd like to match upstream as closely as possible.
Comment 45 Jason Cassell 2007-05-24 23:01:29 UTC
It's not just Acers, I have what appears to be the same problem on a Gateway
600YG2 laptop.  The patches from comment #25 and comment #26 fixed it.
Comment 46 Daniel Drake 2007-05-29 15:41:47 UTC
Alexey, any news on the patch? Has it been submitted to Len? Which ones should
we consider backporting for distro kernels? Thanks in advance.
Comment 47 Alexey Starikovskiy 2007-05-30 00:23:24 UTC
Daniel,
yes, Len is aware of all the patches in this bug report.
I don't know if/when he is going to push them upstream.
I'd recommend the patch from #41 for stable kernels, as it is much less intrusive.
Regards, Alex.
Comment 48 Len Brown 2007-05-31 19:25:47 UTC
re: patch in comment #41 to remove _TMP call from trip-point change notify.

On the HP nx6325 before the patch, the fan turns completely off
when the temperature drops below the lowest trip-point 40,
and then turns on again when it rises above the modified lowest
trip point 45.

However, after this patch is applied on 2.6.22-rc3,
the fan never turns completely off.
Instead it continues running after the temperature drops below 45,
and drives the temperature of TZ1 all the way down to 32.
Comment 49 Alexey Starikovskiy 2007-06-01 00:47:51 UTC
Len, could you try to apply it to 2.6.21, i.e. to sync version of Notify?
Also, is it possible to add some printk in thermal notify to see the order of
notifies in nx6325?
Comment 50 Alexey Starikovskiy 2007-06-05 04:59:43 UTC
Created attachment 11684
Do not do acpi_thermal_check recursively/in parallel

Len,
Please check if this version works better
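
The attachment itself is not reproduced in this thread, so the following is not the patch. It is a hedged Python sketch of one common way to keep a check routine from running recursively or in parallel: a non-blocking lock that makes a nested or concurrent caller return immediately. `ThermalZone`, `check`, `on_fan_change` and `demo` are hypothetical names.

```python
import threading


class ThermalZone:
    """Toy zone whose check() refuses to run while another check() is active."""

    def __init__(self):
        self._busy = threading.Lock()  # non-reentrant on purpose
        self.checks_run = 0
        self.checks_skipped = 0

    def check(self, on_fan_change=None):
        # Non-blocking acquire: a nested or concurrent caller returns at once.
        if not self._busy.acquire(blocking=False):
            self.checks_skipped += 1
            return False
        try:
            self.checks_run += 1
            if on_fan_change is not None:
                # Stands in for the fan _OFF -> _SCP -> Notify path.
                on_fan_change()
            return True
        finally:
            self._busy.release()


def demo():
    """Re-enter check() from inside check(); return (run, skipped) counts."""
    zone = ThermalZone()
    zone.check(on_fan_change=lambda: zone.check())
    return zone.checks_run, zone.checks_skipped


if __name__ == "__main__":
    print(demo())
```

In this sketch the nested request is simply dropped. Whether a dropped check needs to be replayed afterwards is a design question the sketch does not answer.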
Comment 51 Daniel Drake 2007-06-28 19:51:54 UTC
Len, I know you're a busy person, just a reminder: the above patch is awaiting your testing on your nx6325. I'm still interested in backporting these fixes to 2.6.21 but it seems it is not fully settled in 2.6.22-rc yet. Thanks.
Comment 52 Sean Bridges 2007-07-02 19:48:11 UTC
Just a note, I have a Gateway 450ROG that is also affected by this bug in 2.6.21.5. I applied the patches in comment #25 and comment #26 and so far it appears very stable.
Comment 53 Alexey Starikovskiy 2007-07-08 23:57:13 UTC
any chance you could try patch from #50?
Comment 54 Sean Bridges 2007-07-09 16:13:29 UTC
Hello Alexey, I reversed the #25 and #26 patches on my 2.6.21.5 source and applied the patch from comment #50. I've been running with this for a couple of hours and the system appears just as stable as with the other patches. Will post back if anything bad happens.
Comment 55 Andreas Kirschner 2007-07-11 07:50:31 UTC
A note from my side as well. I was facing this problem on my old BenQ Joybook 5000U running any of the default 2.6.21 Fedora 7 kernels.

After upgrading to the 2.6.22 development kernels, the bug went away. Stable now for a couple of days.

Thanks, good job! 
Comment 56 Len Brown 2007-09-05 15:30:49 UTC
patch from comment #50 applied to acpi-test.
I'll try it on my nx6325 when i get home.
Comment 57 Len Brown 2008-01-12 21:25:06 UTC
in the name of bug #3686, the patch in comment #50
shipped in linux-2.6.24-rc1.
The HP nx6325 fan works properly, including turning
off completely when temperature drops to 40C.

closed.
