Bug 19612 - Computer fails to hibernate - problem idling SMP CPU's
Summary: Computer fails to hibernate - problem idling SMP CPU's
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq
Hardware: All Linux
Importance: P1 normal
Assignee: cpufreq
URL:
Keywords:
Depends on:
Blocks: 7216 15310
Reported: 2010-10-02 22:26 UTC by tempo444z
Modified: 2012-01-18 02:19 UTC

See Also:
Kernel Version: 2.6.35-1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments

Description tempo444z 2010-10-02 22:26:25 UTC
I tested kernel 2.6.36 and found that hibernation failed when the
process was attempting to suspend operation of CPUs 1, 2 and 3.

I verified that hibernation proceeds if CPUs 1, 2 and 3 are manually
suspended prior to requesting that part of the hibernation (freezing)
test.

Tested 2.6.35 - same result.

The test works for 2.6.34, thus the regression appears to be in
linux-source-2.6.35.

'test' refers to the procedures set out in the kernel source, in
Documentation/power/basic-pm-debugging.txt
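
For readers unfamiliar with that procedure, one test cycle can be sketched roughly as below. This is a minimal sketch, assuming the interface described in the kernel's power-management debugging document: a test level is written to /sys/power/pm_test and then "disk" is written to /sys/power/state. The helper name run_pm_test and its sysfs_power parameter are hypothetical, introduced only for illustration, and writing to the real files requires root.

```python
from pathlib import Path

# Test levels accepted by /sys/power/pm_test, shallowest to deepest.
PM_TEST_LEVELS = ("freezer", "devices", "platform", "processors", "core")


def run_pm_test(level, sysfs_power="/sys/power"):
    """Run one hibernation test cycle at the given pm_test level.

    Hypothetical helper: sysfs_power is a parameter so the function can be
    pointed at a scratch directory instead of the real /sys/power.
    Returns True if both writes succeeded and False on OSError, which is
    what the shell reports as "write error: Input/output error".
    """
    if level not in PM_TEST_LEVELS:
        raise ValueError(f"unknown pm_test level: {level!r}")
    power = Path(sysfs_power)
    try:
        (power / "pm_test").write_text(level)
        (power / "state").write_text("disk")
    except OSError:
        return False
    finally:
        # Best effort: leave test mode so that a later real hibernation
        # request is not silently turned into a test.
        try:
            (power / "pm_test").write_text("none")
        except OSError:
            pass
    return True
```

A return value of True only means the writes were accepted; a hang or lockup during the cycle, as reported in this bug, would never reach the return statement.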
Comment 1 Rafael J. Wysocki 2010-10-03 20:02:27 UTC
Are you able to switch the CPUs 1, 2, 3 on and off using the
/sys/devices/system/cpu/cpu[1-3]/online interfaces?
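
The check being asked for could be scripted along these lines. It is only a sketch: it assumes the standard CPU hotplug files named in the question, where writing 0 takes a CPU offline and writing 1 brings it back. The function names and the cpu_root parameter are hypothetical, and the real files need root.

```python
from pathlib import Path


def set_cpu_online(cpu, online, cpu_root="/sys/devices/system/cpu"):
    """Write 1 or 0 to cpuN/online; return True if the CPU then reads as online."""
    node = Path(cpu_root) / f"cpu{cpu}" / "online"
    node.write_text("1" if online else "0")
    return node.read_text().strip() == "1"


def cycle_secondary_cpus(cpus=(1, 2, 3), cpu_root="/sys/devices/system/cpu"):
    """Take each CPU offline and bring it back; return the CPUs that misbehaved."""
    failed = []
    for cpu in cpus:
        try:
            went_off = not set_cpu_online(cpu, False, cpu_root)
            came_back = set_cpu_online(cpu, True, cpu_root)
        except OSError:
            # Missing node or a rejected write both count as a failure.
            failed.append(cpu)
            continue
        if not (went_off and came_back):
            failed.append(cpu)
    return failed
```

An empty list from cycle_secondary_cpus() would suggest that plain CPU hotplug works and that the problem lies elsewhere in the hibernation path.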
Comment 2 Len Brown 2010-10-05 04:31:07 UTC
This is filed against "2.6.35-1" -- what is that?
How is it related to 2.6.35.7 -- the current stable release of Linux?

The description mentions 2.6.36, which is not yet released,
so I assume you tested some 2.6.36-rc?

You mentioned that 2.6.34 works.
Does the latest version of 2.6.34.stable work? -- currently 2.6.34.7
Comment 3 tempo444z 2010-10-08 13:32:48 UTC
Hi Len,

I'm running Debian testing/experimental on a 2x2 CPU Opteron machine.

The source code packages tested are:
linux-source-2.6.34_2.6.34-1~experimental.2_all.deb
linux-source-2.6.35_2.6.35-1~experimental.3_all.deb
linux-source-2.6.36_2.6.36~rc5-1~experimental.1_all.deb

I will check the Debian package for 2.6.34 and test 2.6.34.7 over the
weekend.
Comment 4 tempo444z 2010-10-09 16:48:57 UTC
> Test results 9 October 2010 file tmp.pm_test.txt
>
> linux-source-2.6.34_2.6.34-1~experimental.2_all.deb
>
> Devices = pm_test    fail/pass 0/10
> Platform = pm_test   fail/pass 0/10
> Processors = pm_test fail/pass 2*/10 [1][2]
> Core = pm_test       fail/pass 3/2 [3][4][5]
>
> [1] Long delay of around 112 sec; system came back apparently fully
> operational. On the next test, instant error message, I *think* related to
> access to disks, then returned to the GNOME screen and terminal with the
> terminal message "bash: echo: write error: Input/output error". Reset
> (reboot) needed.
> [2] Long delay of around 112 sec; system came back apparently fully
> operational. Checked and found one mdadm raid5 disc degraded. Rebooted. Told
> mdadm to add the partition back into the array ... rebuilding etc.
> * This may underestimate the number of failures. If in doubt about success, I
> rebooted the system. Even so, at some point in this process one of my mdadm
> partitions got kicked from its raid5 array.
> [3] Long delay of 120 sec; system restored the screen, no keyboard or mouse
> activity, flashing caps lock. Reset (reboot).
> [4] Long delay of around 112 sec; system came back apparently fully
> operational. Checked and found one mdadm raid5 partition degraded. Rebooted.
> Told mdadm to add the partition back into the array ... rebuilding etc.
> [5] Long delay of around 120 sec; black screen, no keyboard, no mouse. Reset
> (reboot). Checked and found two mdadm raid5 partitions degraded. Told mdadm
> to add the partitions back into the arrays ... rebuilding etc.
>
> 2.6.34.7
>
> Processors = pm_test fail/pass 4/10 [1][2][3][4][5]
> Devices = pm_test    fail/pass 0/10
> Platform = pm_test   fail/pass 0/10
>
> [1] Long delay of around 60 sec, partial restore, keyboard active, then
> system locked up; reset (reboot) needed.
> [2] Long delay of around 60 sec, screen restored, then locked up; reset
> (reboot) needed.
> [3] Instant failure to hibernate, system operational, terminal message
> "bash: echo: write error: Input/output error".
> [4] Long delay of around 60 sec, black screen, locked up, flashing caps
> lock; reset (reboot) needed.
> [5] 15 sec delay (normal) with message "ata1 SRST failed (errno=-16)",
> then lockup; reset (reboot) needed.
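
Since several of these runs left a raid5 array degraded without any visible error, it may help to check the arrays automatically after each test. The sketch below assumes the usual /proc/mdstat layout, in which each array's status line ends with a pair such as "[3/2] [UU_]" (expected/active members, with "_" marking a missing one). The function name degraded_arrays is hypothetical.

```python
import re

# "md0 : active raid5 ..." starts the description of one array.
ARRAY_RE = re.compile(r"^(md\S*)\s*:")
# "[3/2] [UU_]" on the following line: expected/active members and a
# per-member map where "_" marks a missing or faulty member.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return the names of md arrays that are running with missing members."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        array = ARRAY_RE.match(line)
        if array:
            current = array.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded
```

Calling degraded_arrays(open("/proc/mdstat").read()) after each pm_test cycle would flag a kicked partition before the next cycle starts, which addresses the possible undercounting noted in the results.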
Comment 5 tempo444z 2010-10-10 12:51:16 UTC
"Are you able to switch the CPUs 1, 2, 3 on and off using the
/sys/devices/system/cpu/cpu[1-3]/online interfaces?"

Yes. With CPUs 1, 2 and 3 taken offline (online = 0) and running kernel 2.6.24.7:
Processors = pm_test fail/pass 0/10

Comment 6 Rafael J. Wysocki 2010-10-11 20:27:42 UTC
On Monday, October 11, 2010, zz zzzzzzzzzzzzz wrote:
> Test results 9-11 October 2010 file tmp.pm_test.txt
> 
> 2.6.35.1
> 
> Processors = pm_test fail/pass 3/0 [1][2][3]
> platform = pm_test   fail/pass 0/3
> 
> [1][2][3] Hard lockup after about 12 secs. Caps lock flashing, black
> screen, no keyboard response. Reset (reboot)
Comment 7 tempo444z 2010-10-12 16:52:34 UTC
Tested kernel 2.6.33.7 for SMP CPU hibernation, seems good.
Tested kernel 2.6.34.1 for SMP CPU hibernation, and got some failures,
some passes.
Comment 8 Rafael J. Wysocki 2010-10-12 18:30:38 UTC
Thus it appears to be a regression from 2.6.33 rather than from 2.6.34.
Comment 9 tempo444z 2010-10-19 17:24:10 UTC
Compiling POWERNOW_K8 into the kernel, rather than building it as a
module, triggers the bug.

Testing SMP CPU hibernation with POWERNOW_K8 built in gives the
following results:
kernels 2.6.35 and 2.6.36 - black screen, flashing caps lock, no
response from the keyboard.
kernel 2.6.34 - failure rates were between 20% and 50%. In perhaps half
of the failed tests, one or more software raid5 partitions had been
marked as faulty and removed from its array. The system
appeared to be able to operate normally, though degraded. The other
failures gave a black screen and no response from the keyboard.

kernel 2.6.33 was not tested with POWERNOW_K8 in the kernel.
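
To confirm which way the driver is built on a given kernel, the kernel configuration can be inspected. The sketch below assumes the driver's Kconfig symbol is CONFIG_X86_POWERNOW_K8 and that the configuration text is available, for instance from a /boot/config-<version> file as shipped by Debian kernel packages. The function name driver_build_mode is hypothetical.

```python
import re


def driver_build_mode(config_text, symbol="CONFIG_X86_POWERNOW_K8"):
    """Classify a Kconfig symbol as 'built-in', 'module' or 'disabled'.

    config_text is the content of a kernel .config file. Lines such as
    "# CONFIG_X86_POWERNOW_K8 is not set" do not match and therefore
    count as disabled.
    """
    pattern = rf"^{re.escape(symbol)}=([ym])\s*$"
    match = re.search(pattern, config_text, re.MULTILINE)
    if match is None:
        return "disabled"
    return "built-in" if match.group(1) == "y" else "module"
```

By the observation above, a result of "built-in" would correspond to the failing configuration and "module" to the one that hibernates correctly.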
Comment 10 Rafael J. Wysocki 2011-01-16 22:57:10 UTC
Is the problem still present in 2.6.37?
Comment 11 Zhang Rui 2012-01-18 02:19:59 UTC
Bug closed as there is no response from the bug reporter.
Please feel free to reopen it if the problem still exists in the latest upstream kernel.
