Bug 60671
Summary: | Hot-added device fails because MPS is incorrect | ||
---|---|---|---|
Product: | Drivers | Reporter: | Bjorn Helgaas (bjorn) |
Component: | PCI | Assignee: | drivers_pci (drivers_pci) |
Status: | NEW --- | ||
Severity: | normal | CC: | kbusch, wangyijing |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | https://lkml.kernel.org/r/505456FD.6040801@huawei.com | ||
Kernel Version: | 3.6 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | dmesg log |
Description
Bjorn Helgaas
2013-07-31 20:39:56 UTC
Reference discussion: http://marc.info/?l=linux-scsi&m=134788365823217&w=2

Joe Jin <joe.jin@oracle.com> also reported the same problem with their E1000 device. Reference discussion: http://marc.info/?l=e1000-devel&m=134182518500774&w=2

We can work around this issue by adding the "pci=pcie_bus_safe" boot parameter, but that is not a smart solution. I will soon provide a patch to fix the case where this problem is triggered by hotplug.

Joe is seeing a different problem. In Joe's case, there is no hotplug event, and in fact, the NIC involved is not hot-pluggable.

On Joe's machine, with the default BIOS settings as shown in http://marc.info/?l=e1000-devel&m=134207732924628&w=2, we have the following MPS settings in the path to the NIC:

  00:07.0 Root Port bridge to [bus 02-05]        MPS=256
  02:00.0 Upstream Port bridge to [bus 03-05]    MPS=256
  03:02.0 Downstream Port bridge to [bus 05]     MPS=128
  05:00.0 82571EB NIC                            MPS=128
  05:00.1 82571EB NIC                            MPS=128

The mismatch between the MPS settings of 02:00.0 and 03:02.0 causes the problem. If 05:00.0 issues a DMA read, the response can be a TLP with a 256-byte payload, which should be rejected as a malformed TLP by 03:02.0.

Joe's workaround was to change a BIOS setting so the max MPS value is 128, as shown in http://marc.info/?l=e1000-devel&m=134215430719726&w=2. Then we have this:

  00:07.0 Root Port bridge to [bus 02-05]        MPS=128
  02:00.0 Upstream Port bridge to [bus 03-05]    MPS=128
  03:02.0 Downstream Port bridge to [bus 05]     MPS=128
  05:00.0 82571EB NIC                            MPS=128
  05:00.1 82571EB NIC                            MPS=128

I think the fact that BIOS programmed the MPS settings of 02:00.0 and 03:02.0 differently is pretty clearly a BIOS bug. It's possible that Linux could issue a warning or even fix it up, but I don't think the patch proposed for this bug (bug 60671) will help Joe's issue.

I opened bug 60799 for the issue Joe reported. Basically it just contains the sources from which I extracted the information in comment #2.

Yijing, did this get resolved? It sounds similar to the problem Keith is seeing: http://lkml.kernel.org/r/alpine.LRH.2.03.1406031308200.11244@AMR
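
For illustration only, the MPS mismatch described for Joe's machine can be sketched as a check over the path from the Root Port to the NIC. This is a simplified model, not the kernel's logic: the function name mps_mismatches and the list-of-tuples representation are assumptions made for this sketch, and it simply flags adjacent devices whose MPS values differ, using the values quoted in the comments above.

```python
from typing import List, Tuple

# Hypothetical helper for illustration; not part of the Linux kernel or any tool.
def mps_mismatches(path: List[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Return (upstream, downstream) pairs of adjacent devices whose MPS differs.

    `path` lists (device address, MPS in bytes) from the Root Port down to
    the endpoint.
    """
    return [
        (up, down)
        for (up, up_mps), (down, down_mps) in zip(path, path[1:])
        if up_mps != down_mps
    ]

# MPS values with the default BIOS settings, as quoted in the comment above.
default_bios = [
    ("00:07.0", 256),  # Root Port
    ("02:00.0", 256),  # Upstream Port
    ("03:02.0", 128),  # Downstream Port
    ("05:00.0", 128),  # 82571EB NIC
]

# MPS values after the BIOS workaround (max MPS limited to 128).
bios_workaround = [(addr, 128) for addr, _ in default_bios]

print(mps_mismatches(default_bios))     # [('02:00.0', '03:02.0')]
print(mps_mismatches(bios_workaround))  # []
```

With the default BIOS settings the sketch flags the 02:00.0 / 03:02.0 pair, which is the mismatch identified above; with the workaround applied nothing is flagged.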