Created attachment 300692 [details]
Picture of the stack trace screen

==========================
Summary
==========================

If the network link of the atlantic module is up during pm hibernation entry, it will crash with the attached trace. Setting the link down or unloading the module is a valid workaround, but logind and/or NetworkManager will reload the module (regardless of blacklisting) and restore the link state, so this is broken with common userspace. I’ll provide a working hibernation script using solely the kernel interface.

This did not happen with any of the core >/sys/power/pm_test, which I tried.

==========================
Steps to reproduce
==========================

1. modprobe atlantic
2. ip link set <iface> up # possibly a connection has to be established first
3. echo platform >/sys/power/disk # provided a swap device is available
4. echo disk >/sys/power/state

==========================
Actual behaviour
==========================

The atlantic module will crash with a trace, leaving the system in a semi-hibernated state. Sysrq is still possible.

==========================
Expected behaviour
==========================

The module should happily go to sleep, cuddling with his best friends.

==========================
Additional information
==========================

Stack trace is attached. Sorry, OS can’t do screenshots in this state.

The device is an AQC107 integrated in an ASUS ROG Zenith Ⅱ Extreme Alpha mainboard.

44:00.0 Ethernet controller: Aquantia Corp. AQC107 NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] (rev 02)
	Subsystem: ASUSTeK Computer Inc. AQC107 NBase-T/IEEE 802.3bz Ethernet Controller [AQtion]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 81
	IOMMU group: 50
	Region 0: Memory at e1040000 (64-bit, non-prefetchable) [size=64K]
	Region 2: Memory at e1050000 (64-bit, non-prefetchable) [size=4K]
	Region 4: Memory at e0c00000 (64-bit, non-prefetchable) [size=4M]
	Expansion ROM at e1000000 [disabled] [size=256K]
	Capabilities: [40] Express (v2) Endpoint, MSI 00
		DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
		LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 8GT/s (ok), Width x2 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR-
			10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
			EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			FRS- TPHComp- ExtTPHComp-
			AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
			AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink+ Retimer- 2Retimers- DRS-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] MSI-X: Enable+ Count=32 Masked-
		Vector table: BAR=2 offset=00000000
		PBA: BAR=2 offset=00000200
	Capabilities: [a0] MSI: Enable- Count=1/32 Maskable- 64bit+
		Address: 0000000000000000 Data: 0000
	Capabilities: [c0] Vital Product Data
		Product Name: Atlantic
		Read-only fields:
			[PN] Part number: 3290495095
			[EC] Engineering changes: 0
			[FG] Unknown: 61 62 63
			[LC] Unknown: 64 65 66
			[MN] Manufacture ID: AFDSWEWEBSFD
			[PG] Unknown: 49 49 49
			[SN] Serial number: CPL5938TLKMY
			[V0] Vendor specific: wfewfe
			[V1] Vendor specific: fwewfe
			[V2] Vendor specific: SDFWI
			[RV] Reserved: checksum good, 0 byte(s) reserved
		Read/write fields:
			[YA] Asset tag: 9495829
			[V0] Vendor specific: f34ge4rsg
			[V1] Vendor specific: ger35g5rthghgsa3
			[Y0] System specific: bsdfvbxcz
			[Y1] System specific: fwefewwfe
			[RW] Read-write area: 11 byte(s) free
		End
	Capabilities: [100 v2] Advanced Error Reporting
		UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [150 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [180 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Kernel driver in use: atlantic
	Kernel modules: atlantic
Created attachment 300693 [details] Script working around the crash
Created attachment 300694 [details]
Minimal script working around the crash

The previous script already referenced my logind quirks, so it was not helpful for the report. This one should be, though.
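For readers who cannot open the attachment, the workaround described in the summary (link down, then hibernate through the kernel interface only) can be sketched roughly as below. This is not the attached script, just an illustration under assumptions: iproute2 is installed, a swap device is configured, and it runs as root. The names `ifaces_for_driver`, `hibernate_with_links_down`, `SYSFS` and `IP_CMD` are made up for this sketch; the last two exist only so the logic can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch only, NOT the attached script.
# SYSFS and IP_CMD are hypothetical knobs for dry runs against a fake tree.
SYSFS=${SYSFS:-/sys}
IP_CMD=${IP_CMD:-ip}

# Print every network interface whose device is bound to driver $1.
ifaces_for_driver() {
    for dev in "$SYSFS"/class/net/*; do
        # Virtual interfaces such as lo have no device/driver link.
        [ -e "$dev/device/driver" ] || continue
        if [ "$(basename "$(readlink -f "$dev/device/driver")")" = "$1" ]; then
            basename "$dev"
        fi
    done
}

# Take all atlantic links down, hibernate, and restore the links on resume.
hibernate_with_links_down() {
    ifaces=$(ifaces_for_driver atlantic)
    for i in $ifaces; do $IP_CMD link set "$i" down; done
    echo platform > "$SYSFS/power/disk"
    echo disk > "$SYSFS/power/state"   # the write returns after resume
    for i in $ifaces; do $IP_CMD link set "$i" up; done
}
```

Calling `hibernate_with_links_down` as root would then replace the usual hibernation path. As noted in the summary, logind and/or NetworkManager may bring the link back up on their own, so this only helps when they are not in the way.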
I have ignored the bug reporting documentation. I’m currently collecting all the relevant information and will report back properly once that is done. Please excuse the noise.
Hi, the change associated with this appears to have caused a regression. See https://bugzilla.kernel.org/show_bug.cgi?id=215949
> --- Comment #4 from Andrew M (andrew@m6l.net) ---
> Hi, the change associated with this appears to have caused a regression. See
> https://bugzilla.kernel.org/show_bug.cgi?id=215949

Thanks, this is handled in
https://patchwork.kernel.org/project/netdevbpf/patch/8735hniqcm.fsf@posteo.de/

This is a partial revert and has been successfully tested by two other reporters. If need be, you can apply it to a custom kernel until it reaches stable. Shouldn’t take too long.

Manuel
Hi, thank you for the patch. I can confirm that applying that patch (instead of a revert) onto 5.15.36 remedies the regression I saw. I look forward to seeing it merged into stable.
For what it's worth, the patch also fixes the regression for me. Thanks.
It’s included in v5.17.9, v5.15.41, v5.10.117 and mainline.
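To check whether a given stable kernel already carries the fix, a rough helper based only on the three stable versions listed above might look like this. `has_fix` is a hypothetical name, and the sketch relies on `sort -V` from GNU coreutils. Series other than 5.10, 5.15 and 5.17 return 2, since the comment above does not say which mainline release picked the fix up. Distribution kernels with their own numbering may not map onto these versions.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if version $1 is at or past the fixed stable
# release of its series, 1 if it is older, 2 if the series is not listed.
has_fix() {
    ver=${1%%-*}                 # drop a local suffix such as "-arch1-1"
    case $ver in
        5.10.*) fixed=5.10.117 ;;
        5.15.*) fixed=5.15.41 ;;
        5.17.*) fixed=5.17.9 ;;
        *) return 2 ;;           # series not covered by the list above
    esac
    # With version sort, the older of the two strings comes out first.
    [ "$(printf '%s\n%s\n' "$fixed" "$ver" | sort -V | head -n 1)" = "$fixed" ]
}
```

For example, `has_fix "$(uname -r)" && echo "fix included"` tests the running kernel.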