Bug 42979

Summary: Regression In 3.2.12 stable prevents various system types from booting
Product: Drivers
Reporter: Joseph Salisbury (joseph.salisbury)
Component: PCI
Assignee: Jonathan Nieder (jrnieder)
Status: CLOSED CODE_FIX
Severity: blocking
CC: andrew, gunnar.kriik, hjl.tools, jrnieder, michael, vovan, william.bowling
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.2.12
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 42644
Attachments: A patch
             ASPM: Fix pcie devices with non-pcie children

Description Joseph Salisbury 2012-03-22 19:23:18 UTC
Possible regression in v3.2.12 stable.  The issue prevents systems from booting.

There have been a few bugs[0] reported against the Ubuntu kernel, which
was recently rebased to 3.2.12.  We're in the process of bisecting now
and will post an update.

There are also reports that this bug exists in v3.3.

[0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/961482
Comment 1 Joseph Salisbury 2012-03-23 00:56:12 UTC
The following commit appears to have caused the regression:

f043ddb60c84ea64a23b755004572afe922e653c PCI: ignore pre-1.1 ASPM
quirking when ASPM is disabled
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 1cfbf22..24f049e 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -500,6 +500,9 @@ static int pcie_aspm_sanity_check(struct pci_dev *pdev)
        int pos;
        u32 reg32;

+       if (aspm_disabled)
+               return 0;
+
        /*
         * Some functions in a slot might not all be PCIe functions,
         * very strange. Disable ASPM for the whole slot
Comment 2 Joseph Salisbury 2012-03-23 00:57:21 UTC
Discussion on LKML:
https://lkml.org/lkml/2012/3/19/408
Comment 3 Gunnar Kriik 2012-03-23 13:09:54 UTC
A similar problem has been reported by some Arch Linux users. At first we thought it was a problem with a certain Gigabyte motherboard, but others have reported this problem as well.

Gigabyte Z68XP-UD3
Gigabyte Z68X-UD3-B3
Asus p8h67-v
Gigabyte GA-P67X-UD3-B3

For more info see:
https://bbs.archlinux.org/viewtopic.php?pid=1076580#p1076580

Unable to boot either 3.2.12 or 3.3.

However, both 3.2.12 and 3.3 boot with the kernel parameter "pcie_aspm=force".
Comment 4 michael 2012-03-26 17:28:09 UTC
Same problem here with Arch Linux kernel 3.2.12 on an Asus P8P67 board. Boots fine with "pcie_aspm=force".
Comment 5 H.J. Lu 2012-03-27 01:12:39 UTC
Same on Intel DP67BG.
Comment 6 H.J. Lu 2012-03-27 16:19:22 UTC
Created attachment 72732
A patch

This patch works for me.
Comment 7 Jonathan Nieder 2012-03-29 16:37:47 UTC
Created attachment 72742
ASPM: Fix pcie devices with non-pcie children

Does this patch help?

(taken from <http://thread.gmane.org/gmane.linux.kernel.pci/14503>)
Comment 8 Jonathan Nieder 2012-04-01 16:22:31 UTC
Applied as c9651e70ad0aa499814817cbf3cc1d0b806ed3a1.  Closing.
Comment 9 Vlad 2012-05-27 06:11:17 UTC
I have exactly the same issue again with vanilla 3.4.0. At least the symptoms are the same. Any ideas?
Comment 10 Jonathan Nieder 2012-05-27 06:21:09 UTC
(In reply to comment #9)
> I have exactly the same issue again with vanilla 3.4.0. At least the symptoms
> are the same. Any ideas?

Do you mean that reverting f043ddb60c84 gets your machine to boot?
Or some other symptom?
Comment 11 Vlad 2012-05-27 06:34:14 UTC
I didn't try reverting the commit you mentioned, but the symptoms I have with 3.4.0 are exactly the same as those I had with 3.2.12, 3.2.13, and 3.3.0: the system hangs immediately after boot with a fancy flashing screen. Passing any pcie_aspm parameters doesn't help. However, I am able to boot 3.4.0 by booting a working kernel first and then kexec'ing into 3.4.0. As far as I can remember, commit f043ddb60c84 solved my problem with 3.2.12.
Comment 12 Jonathan Nieder 2012-05-27 06:39:00 UTC
(In reply to comment #11)
> I didn't try reverting the commit you mentioned, but the symptoms I have with
> 3.4.0 are exactly the same as those I had with 3.2.12, 3.2.13, and 3.3.0: the
> system hangs immediately after boot with a fancy flashing screen. Passing any
> pcie_aspm parameters doesn't help. However, I am able to boot 3.4.0 by booting
> a working kernel first and then kexec'ing into 3.4.0. As far as I can remember,
> commit f043ddb60c84 solved my problem with 3.2.12.

Thanks.  That sounds like a totally different bug.  I'd suggest contacting
linux-kernel@vger.kernel.org, and bisecting or getting a log with netconsole
if possible in order to track it down.

Hope that helps,
Jonathan