Bug 200675
Summary: Bisected: "modprobe iwlmvm" causes a hang under some condition.
Product: Power Management
Component: Other
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: normal
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.15.0-rc3+
Regression: Yes
Reporter: teika kazura (teika)
Assignee: Rafael J. Wysocki (rjw)
CC: mika.westerberg
Attachments:
lspci -vv (partial)
.config for the mentioned commit
PCI / PM: Only set driver flags for PCIe ports if enabling runtime PM by default
PCI / PM: Only set driver flags for PCIe ports if enabling runtime PM by default
entire "lspci -vv"
PCI / ACPI / PM: Resume all bridges on system suspend to S3
Description
teika kazura
2018-07-28 12:31:56 UTC
Created attachment 277583 [details]
.config for the mentioned commit
Is this issue present in 4.18-rc6?
Yep. Thanks kernel developers.
Ah, sorry for the ugly pathname with backslashes. It's /sys/devices/pci0000:00/0000:00:1c.5/0000:02:00.0/power/control
Created attachment 277599 [details]
PCI / PM: Only set driver flags for PCIe ports if enabling runtime PM by default
It looks like the optimization made by the problematic commit is overly aggressive.
Please test the attached patch and let me know if it makes a difference.
Unfortunately, it doesn't work. Thanks anyway.
So does it work if you revert the problematic commit completely from 4.18-rc6 (or -rc7)?
Yes. (Tried 4.18-rc6)
What's there initially in /sys/devices/pci0000\:00/0000\:00\:1c.5/0000\:02\:00.0/power/control ?
"on" on boot. FYI: Insertion and deletion of the module iwlmvm do not reset it.
Well, if this is "on" on boot, then there should be no difference between reverting the problematic commit and applying the patch from comment #5.
Created attachment 277645 [details]
PCI / PM: Only set driver flags for PCIe ports if enabling runtime PM by default
Please boot with this patch applied and provide the output of
$ dmesg | grep pcie_portdrv_probe
on boot (unless it is empty, in which case please let me know).
Created attachment 277647 [details]
entire "lspci -vv"
Dmesg says:
pcieport 0000:00:1c.0: pcie_portdrv_probe: Setting PM flags
pcieport 0000:00:1c.5: pcie_portdrv_probe: Setting PM flags
"lspci -vv" for 0000:00:1c.0 is:
------------------------------------------------------------------------
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 00003000-00003fff
Memory behind bridge: b1200000-b12fffff
Prefetchable memory behind bridge: 00000000b1000000-00000000b10fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #5, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <16us
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #4, PowerLimit 10.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet- LinkState+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
Address: 00000000 Data: 0000
Capabilities: [90] Subsystem: Hewlett-Packard Company Sunrise Point-LP PCI Express Root Port
Capabilities: [a0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Capabilities: [140 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [220 v1] #19
Kernel driver in use: pcieport
------------------------------------------------------------------------
In case this helps: Other initial values:
(1) /sys/devices/pci0000:00/0000:00:1c.0/power/control : auto
(2) /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/power/control : on
(3) /sys/devices/pci0000:00/0000:00:1c.5/power/control : auto
(4) /sys/devices/pci0000:00/0000:00:1c.5/0000:02:00.0/power/control : on (As already reported)
It seems that the above device (4) is the wifi controller. The value of /sys/devices/pci0000:00/0000:00:1c.5/0000:02:00.0/device is 0x3165, and hwinfo says,
------------------------------------------------------------------------
26: PCI 200.0: 0280 Network controller
...
SysFS ID: /devices/pci0000:00/0000:00:1c.5/0000:02:00.0
...
Model: "Intel WLAN controller"
...
Device: pci 0x3165
------------------------------------------------------------------------
Similarly, the device (2) must be an ethernet controller. No other device seems to be there under these two PCIe controllers.
Your patch clarified that the flag of the pcieports is set. ("modprobe iwlmvm" does not emit any counterpart message.)
I've done a similar experiment with the other PCIe port & the ethernet driver, under a "bad" kernel, and it turns out that it doesn't cause a hang!
------------------------------------------------------------------------
$ modprobe r8169; sleep .1; modprobe -r r8169 ; sleep .1 # It's the ethernet driver.
$ echo "auto" > /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/power/control
$ echo "mem" > /sys/power/state
$ modprobe r8169 ; sleep .1
------------------------------------------------------------------------
It looks like we need to debug this a bit. I'll prepare a couple of debug patches to see what's up, stay tuned!
Created attachment 277859 [details]
PCI / ACPI / PM: Resume all bridges on system suspend to S3
Sorry for the delay.
Please let me know if the issue is reproducible with this patch applied.
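When retesting, the runtime PM settings quoted earlier in this thread (the power/control values of the two root ports and the devices behind them) can be collected in one pass instead of reading each file by hand. The following is only a minimal sketch, not part of the original report: the function name show_pm_control and its optional root argument are made up for illustration, and it assumes the usual sysfs layout in which each device directory carries a power/control file.

```shell
#!/bin/sh
# show_pm_control is a hypothetical helper, not something from the thread.
# It prints "<device path> : <value>" for every power/control file found
# below the PCI root bridges of a sysfs tree.
# $1 is an optional sysfs root (defaults to /sys); the override exists only
# so the function can be pointed at a fake tree.
show_pm_control() {
    root="${1:-/sys}"
    # Walk only the PCI hierarchies (devices/pci*); errors such as a
    # missing tree are silenced.
    find "$root"/devices/pci* -path '*/power/control' -type f 2>/dev/null |
    LC_ALL=C sort |
    while IFS= read -r f; do
        dev="${f%/power/control}"          # strip the attribute name
        printf '%s : %s\n' "${dev#"$root"}" "$(cat "$f")"
    done
}
```

Calling show_pm_control with no argument reads the live /sys tree. On the reporter's machine the lines for 0000:00:1c.0, 0000:00:1c.5 and their child devices would be the ones to compare before and after applying a patch.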
It works!
To be sure:
1. I used the last patch alone. Not with the obsolete patch on Jul 30.
2. I tried linux-4.17.y and 4.18.0-rc6.
In fact, Rafael, you saved my previous buggy PC years ago, (Sorry I can't find its log.) so this is the second time. I deeply thank your years of work in the linux kernel.
> stay tuned!
Radios are wireless, after all. (Forgive me for a joke for elderly folks. ;-)
Kind regards.
(In reply to teika kazura from comment #18)
> It works!
OK, thanks!
> To be sure:
> 1. I used the last patch alone. Not with the obsolete patch on Jul 30.
> 2. I tried linux-4.17.y and 4.18.0-rc6.
OK
> In fact, Rafael, you saved my previous buggy PC years ago, (Sorry I can't
> find its log.) so this is the second time. I deeply thank your years of work
> in the linux kernel.
Appreciated, thank you!
Let me submit the patch with a changelog to the mailing lists.
Patch submitted: https://patchwork.kernel.org/patch/10567315/
Pulled into linux-next[1] and linux-4.18.5[2]:
[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/pci/pci-acpi.c?h=next-20180823&id=9d64b539b738fc181442caab95f1f76d9bd58539
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/pci/pci-acpi.c?h=v4.18.5&id=4d4306a283a1be22e4342c5a04d99f65f256d157
Thanks all.
Patch merged as commit 9d64b539b738 PCI / ACPI / PM: Resume all bridges on suspend-to-RAM
Closing.