Created attachment 184311 [details] git_bisect_4.0_4.1.log
I'm running Arch Linux on an ASRock Z87 Extreme 6. After switching from the 4.0 kernel to 4.1, all drives attached to the ASMedia chipset stopped working. To investigate further, I switched to a vanilla kernel and ran a git bisect, which narrowed it down to the following commit:
[387d37577fdd05e9472c20885464c2a53b3c945f] PCI: Don't clear ASPM bits when the FADT declares it's unsupported
Created attachment 184321 [details] 4.0 Kernel with no drive attached to asm1062 ports
Created attachment 184331 [details] 4.0 Kernel with no drive attached to asm1062 ports
Created attachment 184341 [details] 4.0 lspci no drives attached
Created attachment 184351 [details] 4.0 dmesg no drives attached
Created attachment 184361 [details] 4.0 dmesg drive attached
Created attachment 184371 [details] 4.0 lspci drive attached
Created attachment 184381 [details] 4.1 dmesg no drives attached
Created attachment 184391 [details] 4.1 lspci no drives attached
Created attachment 184401 [details] 4.1 dmesg drive attached
Created attachment 184411 [details] 4.1 lspci drive attached
Created attachment 184421 [details] 4.1 dmesg drive hotplugged
Created attachment 184431 [details] 4.1 lspci drive hotplugged
Does it work again if you apply this patch? https://lkml.org/lkml/2015/7/20/566
Nope. I applied it to v4.1 with no success. Same error messages on boot with no detected drives.
I assume this is still broken. Please correct me if I'm wrong.
True. Still not working. (4.8.6-1-ARCH #1) Any hint what I can do to work around this? For example, is there a way to disable ASPM for this device only?
Strangely, even disabling ASPM completely (pcie_aspm=off) doesn't work (see attached dmesg/lspci). The command [1] seems to disable ASPM for the ASMedia controller, but I'm still not able to read anything from any of the attached drives. A PCI reset [2] did not help either.
[1] setpci -s 04:00.0 CAP_EXP+10.b=40
[2] echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/reset
Created attachment 243901 [details] 4.8.6_lspci
Created attachment 243911 [details] 4.8.6_dmesg
Your system advertises ACPI_FADT_NO_ASPM. Prior to 387d37577fdd ("PCI: Don't clear ASPM bits when the FADT declares it's unsupported"), we actively disabled ASPM on all PCIe links. After 387d37577fdd, we leave ASPM alone, so if the BIOS enabled it, it will remain enabled.

ASPM is mostly managed from the upstream end of the link, and your attachment #184411 [details] ("4.1 lspci drive attached") shows it as enabled:

  00:1c.4 PCI bridge
    Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
      LnkCtl: ASPM L0s Enabled

Can you try this on a v4.1 or later kernel:

  setpci -s 00:1c.4 CAP_EXP+10.b=40
  setpci -s 04:00.0 CAP_EXP+10.b=40
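For readers wondering what the value 40 in those setpci commands does: CAP_EXP+10 is the Link Control register of the PCI Express capability, and the low two bits of its first byte are the ASPM Control field. The following is a minimal sketch, not part of the original comment, that decodes that byte. The function and constant names are made up for illustration; the bit positions are taken from the PCIe Link Control register layout.

```python
# Bits of the low byte of the PCIe Link Control register (CAP_EXP + 0x10).
# Names below are illustrative; only the bit positions come from the register layout.
ASPM_CONTROL_MASK = 0x03  # bits 1:0 -> 00 disabled, 01 L0s, 10 L1, 11 L0s and L1
RCB_128_BYTES = 0x08      # bit 3: Read Completion Boundary (set = 128 bytes)
LINK_DISABLE = 0x10       # bit 4
COMMON_CLOCK = 0x40       # bit 6: Common Clock Configuration
EXTENDED_SYNCH = 0x80     # bit 7

# Same wording lspci uses on its "LnkCtl:" line.
ASPM_NAMES = {
    0: "ASPM Disabled",
    1: "ASPM L0s Enabled",
    2: "ASPM L1 Enabled",
    3: "ASPM L0s L1 Enabled",
}


def decode_lnkctl_byte(value: int) -> dict:
    """Decode the low byte of Link Control into named fields."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a single byte")
    return {
        "aspm": ASPM_NAMES[value & ASPM_CONTROL_MASK],
        "rcb_bytes": 128 if value & RCB_128_BYTES else 64,
        "link_disable": bool(value & LINK_DISABLE),
        "common_clock": bool(value & COMMON_CLOCK),
        "extended_synch": bool(value & EXTENDED_SYNCH),
    }


if __name__ == "__main__":
    # The byte written by "setpci ... CAP_EXP+10.b=40"
    print(decode_lnkctl_byte(0x40))
```

Note that writing the whole byte as 40 also sets Common Clock Configuration and clears the other bits in that byte, in addition to disabling ASPM.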
Hi! I still have the same problem with 4.19.0-5 (Devuan beowulf/ceres) after upgrading from an older kernel (3.16.0). I managed to turn ASPM off using the above setpci command in an initramfs script. (It only worked if I did it for both the SATA controller AND the appropriate PCIe bridge!) I documented the workaround in Czech here: http://www.abclinuxu.cz/poradna/hardware/show/447956#16 .

The problem is that my controller is cheap Chinese crap that turns ASPM on in its BIOS even though it won't work with it (at least not in combination with my motherboard). However, I feel it is really bad practice to break hardware (even broken hardware) that once worked flawlessly in Linux, and I would like to suggest adding something like "pcie_aspm=force_off", which would restore the behaviour prior to 4.1. Or maybe a hook for this controller that switches ASPM off by default would be even better?
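Since the workaround has to be applied to both ends of the link, it helps to know which bridge sits directly upstream of the SATA controller. A minimal sketch, assuming the usual sysfs layout in which a device directory is nested inside its parent bridge's directory (for example /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0). The helper names are hypothetical, not part of any existing tool.

```python
import os
import re

# PCI device directories in sysfs are named domain:bus:device.function,
# e.g. "0000:00:1c.4"; host bridge directories look like "pci0000:00".
BDF_RE = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def upstream_bridge(resolved_path: str):
    """Return the address of the parent PCI device, or None if the device
    hangs directly off the host bridge."""
    parent = os.path.basename(os.path.dirname(resolved_path.rstrip("/")))
    return parent if BDF_RE.fullmatch(parent) else None


def upstream_bridge_of(bdf: str):
    """Resolve the /sys/bus/pci/devices symlink for a device and look up its parent."""
    return upstream_bridge(os.path.realpath(f"/sys/bus/pci/devices/{bdf}"))


if __name__ == "__main__":
    # Address taken from the original report; adjust for your own system.
    print(upstream_bridge_of("0000:04:00.0"))
```

On the original reporter's system this would be expected to name 00:1c.4, the Root Port mentioned earlier in the thread, but that depends on the actual topology.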
I agree; as far as possible, we should never break something that previously worked. What is your motherboard? Would you mind attaching the complete dmesg log and "sudo lspci -vvxxx" output to this bugzilla?

ASPM is mostly a property of an individual link, i.e., the connection between your Root Port and the SATA controller. The BIOS might enable/disable ASPM if it wants to optimize for power or performance, but proper functioning of ASPM shouldn't really depend on the BIOS. Therefore, I suspect either a Linux ASPM defect or a hardware issue somewhere between the Root Port and the SATA controller.

In André-Sebastian's case (see comment #20), I think Linux mostly leaves the ASPM settings alone, and the Root Port is a widely-used Intel device ("00:1c.4 Intel 8 Series/C220 Series Chipset Family PCI Express Root Port") so my guess is an ASMedia SATA controller problem or some sort of electrical issue with the link, e.g., the wires on this motherboard.

André-Sebastian, your v4.1 lspci shows an Intel I211 Gigabit NIC at 03:00.0 (slot #3) with ASPM L0s enabled and the ASMedia SATA at 04:00.0 (slot #4). The SATA doesn't work, but I assume the NIC does. It would be really interesting to know what happens if you can swap the NIC and the SATA controller.

Butrus, if you have another device that supports ASPM, could you try it in the slot where your SATA controller currently is, and attach the "sudo lspci -vvxxx" output here?

If we can figure out that either the ASMedia SATA controller or the slot is broken, we should be able to add a quirk to automatically disable ASPM in that case.
@brutus Funny, I just left Devuan for Manjaro and ran into exactly this.
@Bjorn Here you go, as requested...

07:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02) (prog-if 01 [AHCI 1.0])
    Subsystem: ASMedia Technology Inc. Device 1060
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 128
    Region 0: I/O ports at b050 [size=8]
    Region 1: I/O ports at b040 [size=4]
    Region 2: I/O ports at b030 [size=8]
    Region 3: I/O ports at b020 [size=4]
    Region 4: I/O ports at b000 [size=32]
    Region 5: Memory at df310000 (32-bit, non-prefetchable) [size=512]
    Expansion ROM at df300000 [disabled] [size=64K]
    Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Address: fee08004 Data: 4022
    Capabilities: [78] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [80] Express (v2) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM not supported
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s (ok), Width x1 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
            10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
            AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
            EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
            Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Kernel driver in use: ahci
00: 21 1b 12 06 07 04 10 00 02 01 06 01 10 00 00 00
10: 51 b0 00 00 41 b0 00 00 31 b0 00 00 21 b0 00 00
20: 01 b0 00 00 00 00 31 df 00 00 00 00 21 1b 60 10
30: 00 00 30 df 50 00 00 00 00 00 00 00 ff 01 00 00
40: 00 00 00 00 60 61 11 02 00 00 00 00 00 00 00 00
50: 05 78 01 00 04 80 e0 fe 22 40 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 11 78 01 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 01 80 03 00 00 00 00 00
80: 10 00 12 00 02 87 90 05 30 28 00 00 12 f0 00 01
90: 40 00 12 10 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 17 00 00 00 00 00 00 00 00 00 00 00
b0: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 04 00 04 00 63 00 00 00
f0: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
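The raw config space dump at the end of that lspci output can be cross-checked against the decoded LnkCap and LnkCtl lines. Below is a minimal sketch that does so for the two hex rows covering the Express capability at offset 0x80. The helper names are made up for illustration; the register offsets (Link Capabilities at +0x0C, Link Control at +0x10) and bit positions follow the PCI Express capability layout.

```python
# Rows 80 and 90 of the config space dump posted above.
HEXDUMP = """\
80: 10 00 12 00 02 87 90 05 30 28 00 00 12 f0 00 01
90: 40 00 12 10 00 00 00 00 00 00 00 00 00 00 00 00
"""

CAP_EXP = 0x80  # from "Capabilities: [80] Express (v2) Legacy Endpoint"


def parse_hexdump(text: str) -> dict:
    """Map config space offset -> byte value from 'lspci -x' style rows."""
    config = {}
    for line in text.splitlines():
        offset, _, data = line.partition(":")
        base = int(offset, 16)
        for i, byte in enumerate(data.split()):
            config[base + i] = int(byte, 16)
    return config


def read_le(config: dict, offset: int, size: int) -> int:
    """Read a little-endian register of `size` bytes starting at `offset`."""
    return int.from_bytes(bytes(config[offset + i] for i in range(size)), "little")


cfg = parse_hexdump(HEXDUMP)
lnkcap = read_le(cfg, CAP_EXP + 0x0C, 4)  # Link Capabilities
lnkctl = read_le(cfg, CAP_EXP + 0x10, 2)  # Link Control

link_speed = lnkcap & 0xF            # 2 means 5GT/s
link_width = (lnkcap >> 4) & 0x3F    # number of lanes
aspm_support = (lnkcap >> 10) & 0x3  # 0 means no ASPM states advertised
aspm_control = lnkctl & 0x3          # 0 means ASPM disabled
common_clock = bool(lnkctl & 0x40)

if __name__ == "__main__":
    print(f"LnkCap={lnkcap:#010x} LnkCtl={lnkctl:#06x}")
    print(f"speed code={link_speed} width=x{link_width} "
          f"ASPM support={aspm_support} ASPM control={aspm_control} "
          f"CommClk={common_clock}")
```

The decoded fields agree with the lspci text: the link is 5GT/s x1, the endpoint's Link Capabilities advertise no ASPM support, and Link Control has ASPM disabled with CommClk set.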
P.S. The card (the hardware) was working until last week, when I was on Devuan Ceres/Beowulf (Linux kernel 4.19); now I'm on Linux 5.8.
Hello, are there any status updates on this issue? Is there any possibility that ASPM will be supported on this SATA controller in the near future? I have a home server using MSI B660M Mortar DDR4 motherboard and this is the only component that is preventing the system from reaching higher C-States
(In reply to Tobia Bocchi from comment #25)
I don't think anybody is actively working on this issue. From your comment, I assume your SATA controller is functional, but we don't enable ASPM for it, so we use more power than we should.

The original report here bisected the problem to 387d37577fdd ("PCI: Don't clear ASPM bits when the FADT declares it's unsupported"), which appeared in v4.1. Do you have any idea whether that commit is responsible for the problem you see? That commit no longer reverts cleanly, but if you knew, for example, that 37a9c502c0af (the parent of 387d37577fdd) worked, but 387d37577fdd fails, we would know that this is the same problem.

I guess if your dmesg log includes "FADT indicates ASPM is unsupported, using BIOS configuration", that would also be a good indication. Could you attach the complete dmesg log and complete "sudo lspci -vvxxxx" output for your system?
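A quick way to check a saved dmesg log for the message quoted above, as a minimal sketch. The function name and the default file name dmesg.txt are hypothetical; the message text is copied from the comment.

```python
import sys

# Message text as quoted in the comment above.
FADT_MESSAGE = "FADT indicates ASPM is unsupported, using BIOS configuration"


def fadt_leaves_aspm_to_bios(dmesg_text: str) -> bool:
    """True if any line of the log contains the FADT/ASPM message."""
    return any(FADT_MESSAGE in line for line in dmesg_text.splitlines())


if __name__ == "__main__":
    # Usage: python check_fadt.py dmesg.txt   (file saved with "dmesg > dmesg.txt")
    path = sys.argv[1] if len(sys.argv) > 1 else "dmesg.txt"
    with open(path, encoding="utf-8", errors="replace") as log:
        found = fadt_leaves_aspm_to_bios(log.read())
    print("message found" if found else "message not found")
```

A match only suggests the same code path is involved; the complete dmesg and lspci output requested above are still needed to confirm it.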