Bug 26432

Summary: 2.6.36 regression: reboot after poweroff - HP6930p
Product: ACPI
Reporter: drago01
Component: Power-Off
Assignee: Zhang Rui (rui.zhang)
Status: CLOSED CODE_FIX
Severity: normal
CC: aaron.lu, alan, lenb, rjw, rui.zhang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.1.9
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:    
Bug Blocks: 16055    
Attachments: dmidecode
lspci -vvvv
dmesg on 2.6.37
/proc/acpi/wakeup running 2.6.35
/proc/acpi/wakeup running 2.6.37
acpidump output
/proc/acpi/wakeup running 3.8.5
debug patch to check who installs gpe handler

Description drago01 2011-01-09 20:45:26 UTC
Poweroff works fine on my laptop (HP6930p) with 2.6.34/2.6.35 but broke with 2.6.36/2.6.37. With either of those kernels it powers off and then immediately boots again (i.e. the same as a reboot). While googling I found an older report, http://marc.info/?l=linux-kernel&m=127012918316305&w=4, which describes a similar issue but with an older kernel.

I have talked with Matthew Garrett on IRC and he said that it probably is a GPE related issue.
Comment 1 drago01 2011-01-09 20:45:57 UTC
Created attachment 43012 [details]
dmidecode
Comment 2 drago01 2011-01-09 20:46:19 UTC
Created attachment 43022 [details]
lspci -vvvv
Comment 3 Len Brown 2011-01-10 20:39:27 UTC
*** Bug 26422 has been marked as a duplicate of this bug. ***
Comment 4 drago01 2011-01-10 20:42:37 UTC
(In reply to comment #3)
> *** Bug 26422 has been marked as a duplicate of this bug. ***

Ugh ... sorry for the double filing. After submitting the first one I got a white page (and no mail), so I was not sure whether the bug had actually been filed.
Comment 5 Len Brown 2011-01-18 06:32:40 UTC
still an issue with 2.6.37?

can you bisect which commit broke poweroff between 2.6.35 and 2.6.36?
Comment 6 Ozan Caglayan 2011-01-18 06:33:01 UTC
Created attachment 43942 [details]
dmesg on 2.6.37

This also happens on a Sony VAIO SR29VN. The last working release was 2.6.36_rc7 according to the complaining user. The first broken was 2.6.36. 2.6.37 still has the issue. dmesg is attached.
Comment 7 drago01 2011-01-18 08:28:00 UTC
(In reply to comment #6)
> Created an attachment (id=43942) [details]
> dmesg on 2.6.37
> 
> This also happens on a Sony VAIO SR29VN. The last working release was
> 2.6.36_rc7 according to the complaining user. The first broken was 2.6.36.
> 2.6.37 still has the issue. dmesg is attached.

Yeah I didn't test any rc releases yet, as I stated in comment #1 it is broken in .37 too.
Comment 8 Rafael J. Wysocki 2011-01-18 19:06:15 UTC
Please post the contents of /proc/acpi/wakeup .
Comment 9 drago01 2011-01-18 19:13:35 UTC
Created attachment 44082 [details]
/proc/acpi/wakeup running 2.6.35

Here is the output for 2.6.25, i.e. the working case.
Comment 10 drago01 2011-01-18 19:14:33 UTC
Created attachment 44092 [details]
/proc/acpi/wakeup running 2.6.37

Here is the broken case (2.6.37).
Comment 11 drago01 2011-01-18 19:15:26 UTC
(In reply to comment #9)
> Created an attachment (id=44082) [details]
> /proc/acpi/wakeup running 2.6.25
> 
> Here he output for 2.6.25 i.e the working case.

This should read 2.6.*35*
Comment 12 drago01 2011-01-22 21:18:30 UTC
For the record 2.6.38-rc2 is also broken.
Comment 13 Zhang Rui 2012-01-18 02:34:43 UTC
It's great that kernel bugzilla is back.

can you please verify if the problem still exists in the latest upstream
kernel?
Comment 14 drago01 2012-01-18 08:57:00 UTC
(In reply to comment #13)
> It's great that kernel bugzilla is back.
> 
> can you please verify if the problem still exists in the latest upstream
> kernel?

It is still present in 3.1.9 .. have not tested 3.2.x yet.
Comment 15 Aaron Lu 2013-03-12 05:47:21 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > It's great that kernel bugzilla is back.
> > 
> > can you please verify if the problem still exists in the latest upstream
> > kernel?
> 
> It is still present in 3.1.9 .. have not tested 3.2.x yet.
Hi,

If it's still a problem on latest kernel, can you please do the git bisect as suggested in comment #5 between 2.6.35 and 2.6.36?

Here is a link on how to do git bisect:
http://git-scm.com/book/en/Git-Tools-Debugging-with-Git#Binary-Search
You can start with:
$ git bisect start
$ git bisect bad v2.6.36
$ git bisect good v2.6.35

Thanks.
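
For planning purposes, a rough sketch of how many build-and-test rounds such a bisection might take. The helper name bisect_steps is hypothetical, and the estimate is simply the smallest k with 2^k >= N for N commits in the range; git prints its own estimate at each step, which may differ slightly.

```sh
# Estimate how many build-and-test rounds a bisection over N commits needs:
# the smallest k such that 2^k >= N. (Hypothetical helper, not part of git.)
bisect_steps() {
    n=$1
    k=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        k=$((k + 1))
    done
    echo "$k"
}

# Usage inside a kernel git tree (not run here):
#   bisect_steps "$(git rev-list --count v2.6.35..v2.6.36)"
# After each test boot, report the result so git checks out the next candidate:
#   git bisect good    # machine stayed off after poweroff
#   git bisect bad     # machine came back up after poweroff
#   git bisect reset   # when finished, return to the original checkout
```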
Comment 16 drago01 2013-03-12 07:29:12 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > It's great that kernel bugzilla is back.
> > > 
> > > can you please verify if the problem still exists in the latest upstream
> > > kernel?
> > 
> > It is still present in 3.1.9 .. have not tested 3.2.x yet.
> Hi,
> 
> If it's still a problem on latest kernel,

Yes still happens in 3.8.2

> can you please do the git bisect as
> suggested in comment #5 between 2.6.35 and 2.6.36?

I am afraid I can't. The kernel is way too old for my current userspace; it won't even boot.
Comment 17 Zhang Rui 2013-04-13 17:05:50 UTC
please attach the acpidump output.
please attach the /proc/acpi/wakeup output in the latest kernel that you're running.
Comment 18 drago01 2013-04-13 17:22:20 UTC
Created attachment 98521 [details]
acpidump output

Here is a dump generated while running 3.8.5
Comment 19 Zhang Rui 2013-04-14 15:34:08 UTC
First, I think it is commit f517709d65beed95f52f021b43e3035b52ef791a that allows wake GPEs other than the power/sleep/lid buttons to be run_wake GPEs.
That commit was introduced in 2.6.34.
This explains why there are so many "*" entries in /proc/acpi/wakeup.

Second, the only difference of /proc/acpi/wakeup between 2.6.35 and 2.6.37 is that
-USB6	  S0	 disabled  pci:0000:00:1a.2
+USB6	  S0	*disabled  pci:0000:00:1a.2
I checked the BIOS and _PRW for USB6 should use GPE 0x20 as wake GPE. But there is no AML handler for this GPE.
So I suspect that some drivers, e.g. driver for pci:0000:00:1a.2, installs the GPE handlers for the wakeup GPE sometime between 2.6.35 and 2.6.36, and this explains why the wake GPE for USB6 becomes run_wake.
But I can not find any USB code that invokes acpi_install_gpe_handler().

drago01,
can you please attach the output of "grep . /sys/firmware/acpi/interrupts/*" in both 2.6.35 and 2.6.37?
I think GPE20 should be invalid in 2.6.35 and enabled in 2.6.37.
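
The /proc/acpi/wakeup comparison above can be repeated mechanically on two saved copies of the file. This is only a sketch: the function names wakeup_changes and run_wake_devices are made up for illustration, and it assumes the four-column layout shown in this report (device, S-state, status, sysfs node), with a leading "*" in the status column marking a run_wake device.

```sh
# Print every device whose status column differs between two saved copies
# of /proc/acpi/wakeup. $1: copy from the old kernel, $2: copy from the new one.
wakeup_changes() {
    awk 'NR == FNR { old[$1] = $3; next }
         ($1 in old) && old[$1] != $3 { print $1 ": " old[$1] " -> " $3 }' "$1" "$2"
}

# Print the devices flagged with "*" (run_wake) in one saved copy.
run_wake_devices() {
    awk '$3 ~ /^\*/ { print $1 }' "$1"
}

# Usage (file names are placeholders):
#   wakeup_changes wakeup-2.6.35.txt wakeup-2.6.37.txt
#   run_wake_devices wakeup-2.6.37.txt
```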
Comment 20 Zhang Rui 2013-04-14 15:34:40 UTC
BTW, please attach the output of /proc/acpi/wakeup in 3.8.5.
Comment 21 drago01 2013-04-14 15:43:05 UTC
Created attachment 98591 [details]
/proc/acpi/wakeup running 3.8.5
Comment 22 Zhang Rui 2013-04-14 16:04:54 UTC
Created attachment 98601 [details]
debug patch to check who installs gpe handler

can you please try to apply this debug patch on top of 2.6.37 and attach the dmesg output after boot?
Comment 23 Zhang Rui 2013-04-14 16:08:10 UTC
USB1	  S0	*enabled   pci:0000:00:1d.0
USB2	  S0	*enabled   pci:0000:00:1d.1
USB3	  S0	*enabled   pci:0000:00:1d.2
USB4	  S0	*enabled   pci:0000:00:1a.0
USB5	  S0	*enabled   pci:0000:00:1a.1
USB6	  S0	*enabled   pci:0000:00:1a.2
EHC1	  S0	*enabled   pci:0000:00:1d.7
EHC2	  S0	*enabled   pci:0000:00:1a.7

can you reproduce the problem in 3.8.5 after disabling all of this?
(for example, you can disable USB1 by echo USB1 > /proc/acpi/wakeup)
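
A minimal sketch for doing this to every entry at once. The helper name enabled_wakeup_devices is hypothetical. Because writing a device name to /proc/acpi/wakeup toggles its state rather than setting it, the sketch only selects names that are currently enabled.

```sh
# List devices that are currently wakeup-enabled ("enabled" or "*enabled").
# $1: path to a wakeup table; defaults to the live /proc/acpi/wakeup.
enabled_wakeup_devices() {
    awk '$3 ~ /^\*?enabled$/ { print $1 }' "${1:-/proc/acpi/wakeup}"
}

# Usage (as root; not run here):
#   for dev in $(enabled_wakeup_devices); do
#       echo "$dev" > /proc/acpi/wakeup
#   done
#   cat /proc/acpi/wakeup    # confirm which entries actually changed
```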
Comment 24 drago01 2013-04-14 18:19:57 UTC
(In reply to comment #23)
> USB1      S0    *enabled   pci:0000:00:1d.0
> USB2      S0    *enabled   pci:0000:00:1d.1
> USB3      S0    *enabled   pci:0000:00:1d.2
> USB4      S0    *enabled   pci:0000:00:1a.0
> USB5      S0    *enabled   pci:0000:00:1a.1
> USB6      S0    *enabled   pci:0000:00:1a.2
> EHC1      S0    *enabled   pci:0000:00:1d.7
> EHC2      S0    *enabled   pci:0000:00:1a.7
> 
> can you reproduce the problem in 3.8.5 after disabling all of this?
> (for example, you can disable USB1 by echo USB1 > /proc/acpi/wakeup)

I can only disable USB5 (all others remain enabled in /proc/acpi/wakeup even if I try to disable them).

As for whether it helps it worked in 4 out of 4 attempts. Which does not mean much as it does not always happen. If it helps I can run with it for a few days and report back.
Comment 25 Zhang Rui 2013-04-15 01:15:24 UTC
(In reply to comment #24)

> As for whether it helps it worked in 4 out of 4 attempts. Which does not mean
> much as it does not always happen.

Oh, so the machine powers off properly most of the time and only reboots occasionally?
I wasn't aware of this before.
can you tell me in what percentage the bug occurs?
Comment 26 drago01 2013-04-15 07:51:58 UTC
(In reply to comment #25)
> (In reply to comment #24)
> 
> > As for whether it helps it worked in 4 out of 4 attempts. Which does not
> mean
> > much as it does not always happen.
> 
> Oh, the machine powers off properly most of time and reboot occasionally?

Yeah.

> I'm not aware of this before.

Sorry seems like I indeed didn't mention that anywhere.

> can you tell me in what percentage the bug occurs?

Not sure something like 20% or 30%.
Comment 27 Zhang Rui 2013-04-17 06:55:06 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > USB1      S0    *enabled   pci:0000:00:1d.0
> > USB2      S0    *enabled   pci:0000:00:1d.1
> > USB3      S0    *enabled   pci:0000:00:1d.2
> > USB4      S0    *enabled   pci:0000:00:1a.0
> > USB5      S0    *enabled   pci:0000:00:1a.1
> > USB6      S0    *enabled   pci:0000:00:1a.2
> > EHC1      S0    *enabled   pci:0000:00:1d.7
> > EHC2      S0    *enabled   pci:0000:00:1a.7
> > 
> > can you reproduce the problem in 3.8.5 after disabling all of this?
> > (for example, you can disable USB1 by echo USB1 > /proc/acpi/wakeup)
> 
> I can only disable USB5 (all others remain enabled in /proc/acpi/wake even if
> I
> try to disable them).
> 
for the other ones, please try the following command,
take USB1 for example,
"echo disabled > /sys/bus/pci/devices/0000\:00\:1d.0/power/wakeup"

and see if it helps.
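
The same command can be applied to every remaining entry by deriving the sysfs path from the last column of /proc/acpi/wakeup. This is a sketch under the assumption that the column holds a "pci:<address>" node as in the listing quoted above; the function names are made up for illustration.

```sh
# Map a "pci:0000:00:1d.0" sysfs-node entry to its power/wakeup attribute.
wakeup_sysfs_path() {
    echo "/sys/bus/pci/devices/${1#pci:}/power/wakeup"
}

# Print the power/wakeup path of every wakeup-enabled PCI device in a table.
# $1: path to a wakeup table; defaults to the live /proc/acpi/wakeup.
enabled_pci_wakeup_paths() {
    awk '$3 ~ /^\*?enabled$/ && $4 ~ /^pci:/ { print $4 }' "${1:-/proc/acpi/wakeup}" |
    while read -r node; do
        wakeup_sysfs_path "$node"
    done
}

# Usage (as root; not run here):
#   for f in $(enabled_pci_wakeup_paths); do echo disabled > "$f"; done
```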
Comment 28 Zhang Rui 2013-04-23 02:16:29 UTC
can you please do the test in comment #27?
Comment 29 drago01 2013-04-25 05:40:44 UTC
(In reply to comment #28)
> can you please do the test in comment #27?

Sorry for the delay ... yes this works I can disable the others this way.
Comment 30 drago01 2013-04-25 05:50:53 UTC
(In reply to comment #29)
> (In reply to comment #28)
> > can you please do the test in comment #27?
> 
> Sorry for the delay ... yes this works I can disable the others this way.

I am now running 3.8.8 and have disabled all of the entries from comment #23 and did multiple poweroff cycles ... seems to work fine so far i.e no reboot instead of power off.
Comment 31 Zhang Rui 2013-04-25 06:39:44 UTC
okay, then we need to check them one by one to see which device wakes the system up.
And IMO, USB6 is the first suspect worth trying.
Comment 32 drago01 2013-04-28 15:12:17 UTC
(In reply to comment #30)
> (In reply to comment #29)
> > (In reply to comment #28)
> > > can you please do the test in comment #27?
> > 
> > Sorry for the delay ... yes this works I can disable the others this way.
> 
> I am now running 3.8.8 and have disabled all of the entries from comment #23
> and did multiple poweroff cycles ... seems to work fine so far i.e no reboot
> instead of power off.

OK I take that back it happened again today with all of them disabled.

Is there anything useful that I can extract after such a reboot?
Comment 33 Zhang Rui 2013-05-13 01:20:20 UTC
> OK I take that back it happened again today with all of them disabled.

bad news.
well, the next thing worth trying is to disable runtime PM completely.
could you please rebuild your kernel with CONFIG_PM_RUNTIME=n and see if it helps?
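
Before testing, it may be worth confirming that the rebuilt kernel really has the option off. A small sketch, assuming a plain-text kernel config is available; the helper name pm_runtime_setting is hypothetical, and the /boot/config-<version> location mentioned in the usage comment is distribution-dependent.

```sh
# Report whether a kernel config file has CONFIG_PM_RUNTIME built in.
# $1: path to a plain-text kernel config (e.g. the .config in the build tree).
pm_runtime_setting() {
    if grep -q '^CONFIG_PM_RUNTIME=y$' "$1"; then
        echo enabled
    else
        echo disabled
    fi
}

# Usage (paths are assumptions about the local setup):
#   pm_runtime_setting .config
#   pm_runtime_setting "/boot/config-$(uname -r)"
```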
Comment 34 Zhang Rui 2013-05-20 02:53:07 UTC
ping...
Comment 35 Zhang Rui 2013-05-27 03:58:55 UTC
ping...
Comment 36 drago01 2013-05-27 08:14:23 UTC
(In reply to comment #35)
> ping...

Hi, sorry been busy lately will build a kernel and test with CONFIG_PM_RUNTIME=n at the end of this week.
Comment 37 Rafael J. Wysocki 2013-06-09 22:45:28 UTC
Any news?
Comment 38 Zhang Rui 2013-06-24 01:10:29 UTC
ping...
Comment 39 drago01 2013-06-24 07:29:50 UTC
(In reply to comment #37)
> Any news?

(In reply to comment #38)
> ping...

I am running 3.9.7 with CONFIG_PM_RUNTIME=n since yesterday. I have not been able to reproduce the bug yet.

I have tried multiple boot->poweroff and boot->suspend->resume->poweroff tests and all of them worked as expected.
Comment 40 Rafael J. Wysocki 2013-06-24 09:26:43 UTC
OK, thanks!

Let's hope it's been fixed and close it.  Please reopen if you can reproduce it.
Comment 41 drago01 2013-06-24 09:34:23 UTC
(In reply to comment #40)
> OK, thanks!
> 
> Let's hope it's been fixed and close it.  Please reopen if you can reproduce
> it.

So you are saying that there is no way to fix it other than disabling runtime PM?
Comment 42 Zhang Rui 2013-07-01 06:08:36 UTC
I tend to think this is a driver/firmware issue rather than an ACPI problem.

If you'd like to continue debugging, the best way is to disable the runtime PM support bus by bus, and check which bus causes this problem, and then file a new bug again that component.
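
One possible way to approximate the bus-by-bus test on a kernel built with runtime PM, without rebuilding, is the per-device power/control attribute in sysfs: writing "on" keeps a device fully powered, and writing "auto" hands it back to runtime PM. This is an assumption about how the suggestion above could be carried out, not something confirmed in this report, and the helper name runtime_pm_controls is made up.

```sh
# List the runtime-PM control files for every device on one bus.
# $1: bus name (e.g. usb, pci); $2: sysfs root, defaults to /sys.
runtime_pm_controls() {
    for f in "${2:-/sys}/bus/$1/devices"/*/power/control; do
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}

# Usage (as root; not run here): keep every USB device fully powered,
# then repeat the poweroff test; move on to the next bus if it still reboots.
#   for f in $(runtime_pm_controls usb); do echo on > "$f"; done
```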
Comment 43 Zhang Rui 2013-07-01 06:08:54 UTC
s/again/against