In the test machine, an IBM System x3250 M5, the ServeRAID M5110 (20:00.0) is connected to the PCI bridge (00:01.0).

#lspci -t -v
-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v3 Processor DRAM Controller
           +-01.0-[20]----00.0  LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt]
           +-01.1-[10]--
           +-14.0  Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI
           +-1a.0  Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2
           .............................................

Following pcie_aspm_check_latency() and pcie_config_aspm_link(), the kernel ASPM driver enables L0s on the PCI bridge but leaves L0s disabled on the M5110.

DevCap Acceptable L0s Exit Latency: 64ns (PCI bridge), 64ns (M5110)
LnkCap L0s Exit Latency:            256ns (PCI bridge), 64ns (M5110)

This causes the ServeRAID M5110 to reset randomly, and dmesg shows:

megaraid_sas 0000:20:00.0: 2614 (517500295s/0x0020/CRIT) - Controller encountered a fatal error and was reset.

If L0s is disabled on both the PCI bridge and the M5110, the issue does not occur.

#lspci -s 20:00.0 -vv
20:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05)
        Subsystem: IBM ServeRAID M5110 SAS/SATA Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 16
        Region 0: I/O ports at 3000 [size=256]
        Region 1: Memory at 82b40000 (64-bit, non-prefetchable) [size=16K]
        Region 3: Memory at 82b00000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at 80100000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        ....................................

#lspci -s 00:01.0 -vv
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Bus: primary=00, secondary=20, subordinate=20, sec-latency=0
        I/O behind bridge: 00003000-00003fff
        Memory behind bridge: 82b00000-82bfffff
        Prefetchable memory behind bridge: 0000000080100000-00000000801fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [88] Subsystem: Intel Corporation Device 1999
        Capabilities: [80] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee00418  Data: 0000
        Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #2, Speed 8GT/s, Width x8, ASPM L0s L1, Latency L0 <256ns, L1 <8us
                        ClockPM- Surprise- LLActRep- BwNot+
                LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
        ........................................................

Ocean.
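For anyone who wants to check whether their own machine is in the same state, here is a minimal sketch that pulls the ASPM-related fields out of `lspci -vv` text and flags a link where only one end has L0s enabled. The helper names (`aspm_summary`, `one_sided_l0s`) are hypothetical and not part of any tool, and the regular expressions assume the lspci field layout shown in this report (they also accept the "Exit Latency L0s" wording). A one-sided L0s setting is not necessarily invalid; the check only identifies the configuration described above.

```python
import re

# Excerpts copied from the lspci -vv output in this report.
BRIDGE = """\
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
LnkCap: Port #2, Speed 8GT/s, Width x8, ASPM L0s L1, Latency L0 <256ns, L1 <8us
LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
"""
M5110 = """\
DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s, Latency L0 <64ns, L1 <1us
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
"""

_UNIT_NS = {"ns": 1, "us": 1000}


def _latency_ns(text, pattern):
    """Return the latency matched by `pattern` in nanoseconds, or None."""
    m = re.search(pattern, text)
    if m is None:
        return None  # field missing, or a non-numeric value such as "unlimited"
    return int(m.group(1)) * _UNIT_NS[m.group(2)]


def aspm_summary(text):
    """Extract L0s latencies and the LnkCtl ASPM setting for one device."""
    ctl = re.search(r"LnkCtl:\s*ASPM ([^;]+);", text)
    return {
        # DevCap: the L0s exit latency this device is willing to tolerate
        "l0s_acceptable_ns": _latency_ns(
            text, r"DevCap:.*?Latency L0s <?(\d+)(ns|us)"),
        # LnkCap: the L0s exit latency this port advertises
        "l0s_exit_ns": _latency_ns(
            text, r"LnkCap:.*?Latency L0s? <?(\d+)(ns|us)"),
        # LnkCtl: e.g. "Disabled", "L0s Enabled", "L0s L1 Enabled"
        "aspm_ctl": ctl.group(1).strip() if ctl else None,
    }


def one_sided_l0s(a, b):
    """True when exactly one of the two link partners has L0s enabled."""
    return ("L0s" in (a["aspm_ctl"] or "")) != ("L0s" in (b["aspm_ctl"] or ""))


if __name__ == "__main__":
    bridge, endpoint = aspm_summary(BRIDGE), aspm_summary(M5110)
    print("bridge:  ", bridge)
    print("endpoint:", endpoint)
    print("one-sided L0s:", one_sided_l0s(bridge, endpoint))
```

To use it on a live system, the output of `lspci -s 00:01.0 -vv` and `lspci -s 20:00.0 -vv` (run as root so the capability fields are visible) can be passed to `aspm_summary` in place of the sample strings.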
This has proved to be a firmware bug, so I am closing it. Ocean.
I assume you mean the ServeRAID M5110 random reset problem was caused by a firmware bug. Can you add details about which firmware version contains the fix? If anybody else trips over the problem, that will help them debug it.