Bug 209331
Summary: | AER: Hardware error from APEI Generic Hardware Error with EPYC and DD Max S8 DVB | ||
---|---|---|---|
Product: | Drivers | Reporter: | Hans-Peter Jansen (hpj) |
Component: | PCI | Assignee: | drivers_pci (drivers_pci) |
Status: | NEW --- | ||
Severity: | normal | CC: | alexis.gryta, hpj, lizw |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 5.3.18, 5.7.11, 5.8.7, 5.8.9, 5.8.10, 5.8.11, 5.9.1, 5.10.13, 5.12.2, 5.17.5 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | errors, device details and initialization sequence; Here's a little broader boot log |
Description
Hans-Peter Jansen
2020-09-19 09:49:01 UTC
Created attachment 292551
Here's a little broader boot log
Meanwhile, I tried to get rid of the errors the hard way, but failed. I tried a couple of combinations of the following kernel parameters:

  pci=nomsi
  pci=noaer
  pcie_aspm=off
  pci=nomsi,noaer,ioapicreroute
  pci=nommconf

Interestingly, *none* of them turned off these errors.

My server is experiencing this issue too. It has an AMD 7742, and the NVIDIA A30 GPU and the Mellanox network card are reporting errors:

  aer_layer=Physical Layer, aer_agent=Receiver ID
  0000:81:00.0: AER: 259.4775061 nvidia

How can this problem be solved?

My Ubuntu kernel (5.15.0-25-generic) has been crashing lately. I'm not sure what is causing it, but my computer keeps restarting randomly.

Nothing anybody can do. As stated, it's a hardware issue, not a kernel problem. It means your device is sending wrong data and may be dying slowly. AER can usually be disabled in the BIOS settings, but then you are basically ignoring all the errors, and the result can be dramatic. (The error can come from your SSD or any other important PCIe device.)
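When testing kernel parameters like the ones tried in the description, it can help to first confirm that they actually reached the running kernel. The following is a minimal Python sketch under stated assumptions: it reads `/proc/cmdline` (present on Linux), splits it on whitespace (so quoted values containing spaces are not handled), and reports which of the tried options are present. The names `TRIED`, `parse_cmdline` and `active_options` are made up for this illustration and are not part of any kernel tooling.

```python
from __future__ import annotations

from pathlib import Path

# Parameters tried in the description, grouped by key (illustrative table).
TRIED = {
    "pci": {"nomsi", "noaer", "ioapicreroute", "nommconf"},
    "pcie_aspm": {"off"},
}


def parse_cmdline(cmdline: str) -> dict[str, set[str]]:
    """Collect the comma-separated values of every key=value token."""
    found: dict[str, set[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:  # skip bare flags such as "quiet"
            found.setdefault(key, set()).update(value.split(","))
    return found


def active_options(cmdline: str) -> dict[str, set[str]]:
    """Return the subset of TRIED that appears on the given command line."""
    found = parse_cmdline(cmdline)
    return {key: values & found.get(key, set()) for key, values in TRIED.items()}


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():  # only available on Linux
        print(active_options(proc.read_text()))
```

An empty set for a key means that none of the tried values for that key were passed at boot, for example because the bootloader configuration was not regenerated. A populated set only shows that the option was passed, not that it had any effect on the errors.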