Bug 218491
| Summary: | ixgbe probe failure in Proxmox8 | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Bjorn Helgaas (bjorn) |
| Component: | PCI | Assignee: | drivers_pci (drivers_pci) |
| Status: | NEW --- | | |
| Severity: | normal | CC: | anthony.l.nguyen, jbrandeb, kernel.org-fo5k2w, pmenzel+bugzilla.kernel.org, spamme |
| Priority: | P3 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| URL: | https://forum.proxmox.com/threads/proxmox-8-kernel-6-2-16-4-pve-ixgbe-driver-fails-to-load-due-to-pci-device-probing-failure.131203/post-633851 | | |
| Kernel Version: | v6.1 (Proxmox) | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Attachments:
v6.1.10-1-pve dmesg log
List of PCIe devices on Qotom before and after load ixgbe module
Return of dmesg after a modprobe ixgbe
before lspci
after lspci
dmesg after
dmesg with pci=noaer
Description
Bjorn Helgaas
2024-02-13 23:37:25 UTC
Hello,

(Please note that I don't speak English; sorry if the translation is not faithful to your language.)

I'm adding my experience here in the hope of contributing to the resolution of a problem that also affects me under GNU/Linux Debian 12 with kernel 6.1.76 (and Sid with kernel 6.6.13). So it's not specific to Proxmox. I should point out that under GNU/Linux Debian 11 with kernel 5.10, the network card (X553 via ixgbe) works without problems, so this is a relatively "recent" bug.

Other users have encountered this problem (see comments):
https://www.servethehome.com/the-everything-fanless-home-server-firewall-router-and-nas-appliance-qotom-qnap-teamgroup/
https://www.servethehome.com/intel-x553-networking-and-proxmox-ve-8-1-3/?unapproved=518173&moderation-hash=e57a05288058d3ff253ceb42e9ada905#comment-518173

For my part, here's my test environment:
- 1 Qotom Q20332G9-S10 (I used a 16GB Intel Optane M10 M.2 SSD with a fresh GNU/Linux Debian 12)
- 1 Cisco DAC cable (tested with a 1 m and a 3 m cable)
- 1 PC with a Mellanox ConnectX-3 2x SFP+ network card (running GNU/Linux Debian Sid installed several years ago)
- 1 Cisco 3560CX-12PD-S switch (2 SFP+ ports) with IOS 15.2(7)E2

Connecting the Qotom Q20332G9-S10 (X553) to the Mellanox ConnectX-3 works without a hitch and without any special handling (the linux-image-6.1.0-17-amd64 ixgbe driver works in this configuration). I get full 10 Gbps speeds between the two, measured with iperf.

At this stage, I've ruled out a hardware incompatibility (OSI layer 1), since the DAC works with the X553. So there's no need for compatibility tricks such as the "allow_unsupported_sfp=1" parameter suggested in the comments at the links above. It makes no difference in the following tests (I've checked).

Where it gets tricky is when you connect the Qotom to the Cisco switch. Before an "ip link set eno1 up", the Cisco raises the link on its side, but Debian doesn't (link DOWN). After the "ip link set eno1 up", the link drops and never comes back.

There does seem to be a driver problem in recent kernels (GNU/Linux Debian Stable and Sid). After compiling the driver manually (https://downloadmirror.intel.com/812532/ixgbe-5.19.9.tar.gz, unpacked with "tar xf ixgbe-5.19.9.tar.gz") following the documentation already shared by others (https://www.xmodulo.com/download-install-ixgbe-driver-ubuntu-debian.html), it works with the Cisco (after a "shut/no shut" of the switch's 10GbE port). So we end up with a working machine (I even configured and used SR-IOV successfully right afterwards).

For the moment, the Qotom machine is dedicated to testing, so I'm available to carry out any tests you may wish to run to advance the subject. Don't hesitate!

Best regards.

(In reply to Yohan Charbi from comment #1)

Thanks for your report! Unfortunately I don't know anything about the specifics of the NIC and there's not any information that would show a possible PCI issue, so I don't think I can help with this.

Created attachment 305876 [details]
List of PCIe devices on Qotom before and after load ixgbe module

I have taken the liberty of posting the output of the command you requested from your correspondent in your message https://forum.proxmox.com/threads/proxmox-8-kernel-6-2-16-4-pve-ixgbe-driver-fails-to-load-due-to-pci-device-probing-failure.131203/post-634424. There should probably be some similarities with his problem.

(In reply to Yohan Charbi from comment #3)
> I have taken the liberty of posting the output of the command you requested
> from your correspondent in your message...

Thanks. Would you mind also attaching the complete dmesg log after loading the ixgbe driver? If it's a similar problem, you should see the "Adapter removed" message.

Created attachment 305877 [details]
Return of dmesg after a modprobe ixgbe
Here's the dmesg output after a modprobe ixgbe (an rmmod ixgbe was done just before). I have truncated the output to keep only what the modprobe itself produced.
I see no trace of an "Adapter removed" message, and dmesg | grep -i "Adapter removed" returns nothing.
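For anyone who wants to repeat this check on a saved log file rather than the live kernel log, here is a minimal Python sketch of the same case-insensitive search. The function name and the sample log lines are hypothetical and invented for illustration; they are not taken from any attachment in this bug.

```python
import re

# Case-insensitive match, mirroring `grep -i "Adapter removed"`.
_ADAPTER_REMOVED = re.compile(r"adapter removed", re.IGNORECASE)


def find_adapter_removed(log_text: str) -> list[str]:
    """Return every log line that mentions "Adapter removed"."""
    return [line for line in log_text.splitlines() if _ADAPTER_REMOVED.search(line)]


# Hypothetical log lines, invented for illustration only.
SAMPLE_LOG = """\
ixgbe 0000:05:00.0: example line one
ixgbe 0000:05:00.1: Adapter removed
ixgbe 0000:05:00.0: example line two
"""

if __name__ == "__main__":
    for line in find_adapter_removed(SAMPLE_LOG):
        print(line)
```

An empty result corresponds to the grep returning nothing, as reported above.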
(In reply to Yohan Charbi from comment #5)
> Here's a dmesg after a modprobe ixgbe (a rmmod ixgbe was done just before).

OK. I think this is a different problem, so if you want to pursue this, I suggest opening a separate bugzilla or just emailing the ixgbe maintainers:

Jesse Brandeburg <jesse.brandeburg@intel.com> (supporter:INTEL ETHERNET DRIVERS)
Tony Nguyen <anthony.l.nguyen@intel.com> (supporter:INTEL ETHERNET DRIVERS)
intel-wired-lan@lists.osuosl.org (moderated list:INTEL ETHERNET DRIVERS)
netdev@vger.kernel.org (open list:NETWORKING DRIVERS)

Thank you very much for your time. I'm going to write to these people to see how best to follow up on this. Good luck with the rest.

Hi Bjorn,

I looked over the originally reported log, and I noticed that the BIOS still (or always) seems to be operating in 32-bit BAR mode, with lots of reported issues from the kernel where it's unable to reserve resources.

The reason the ixgbe driver fails to load is that the device BAR mapping either didn't work or is being ignored after the AER error, so all reads return 0xFFFFFFFF, which is also the behavior if ASPM is enabled and the link doesn't come back.

I checked the latest logs Tim added; ASPM is not enabled. It appears that even with the before and after, the ixgbe device is not enabled and hasn't changed state that I can see.

Have a look at the 00:03.1 upstream bridge port, which is the parent port for 05:00.0/1. It's showing an AER error for 0501, which I assume is 5:00.1.

The device is definitely configured for 32-bit BARs, not 64-bit.

What if we just turn off AER? Boot with noaer?

Created attachment 305879 [details]
before lspci
Created attachment 305880 [details]
after lspci
Created attachment 305881 [details]
dmesg after
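The comment above ties the probe failure to reads returning 0xFFFFFFFF and notes that the device is configured for 32-bit BARs. The sketch below is a rough Python illustration of both ideas on the config-space side only: it checks whether the first config dword reads back as all ones and decodes the low type bits of a memory BAR. It does not reproduce the driver's MMIO reads, and the helper names and the example address 0000:05:00.0 (PCI domain 0000 is an assumption) are made up for this example.

```python
import struct
from pathlib import Path


def looks_unreachable(config: bytes) -> bool:
    """True when the vendor/device ID dword at offset 0 reads as all ones."""
    (id_dword,) = struct.unpack_from("<I", config, 0)
    return id_dword == 0xFFFFFFFF


def read_bars(config: bytes) -> list[int]:
    """Return the six raw BAR dwords of a type 0 header (offsets 0x10 to 0x24)."""
    return list(struct.unpack_from("<6I", config, 0x10))


def describe_bar(raw: int) -> str:
    """Decode the low bits of a raw BAR value."""
    if raw & 0x1:  # bit 0 set: I/O space BAR
        return "I/O"
    mem_type = (raw >> 1) & 0x3  # bits 2:1 select the memory BAR width
    width = {0b00: "32-bit", 0b10: "64-bit"}.get(mem_type, "reserved")
    prefetch = "prefetchable" if raw & 0x8 else "non-prefetchable"
    return f"memory, {width}, {prefetch}"


def read_config(bdf: str) -> bytes:
    """Read config space from sysfs; bdf looks like '0000:05:00.0' (assumed address)."""
    return Path(f"/sys/bus/pci/devices/{bdf}/config").read_bytes()
```

On an affected machine, something like `[describe_bar(b) for b in read_bars(read_config("0000:05:00.0"))]` would list the BAR types. A 64-bit BAR occupies two consecutive dwords, so the entry following it is the upper half of the address rather than a separate BAR.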
Created attachment 305887 [details]
dmesg with pci=noaer

from https://forum.proxmox.com/threads/proxmox-8-kernel-6-2-16-4-pve-ixgbe-driver-fails-to-load-due-to-pci-device-probing-failure.131203/post-634945

From attachment 305887 [details] (comment #12), this looks wrong:

Command line: ... pci=noaer
acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR DPC]

Linux apparently requested and was granted DPC control *without* requesting AER control, but if the OS requests DPC control, it is required to also request AER control (PCI Firmware r3.3, sec 4.5.1).

This looks like a Linux bug here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/pci_root.c?id=v6.7#n530.

I don't know if this has any connection to this problem, but I posted a patch for it: https://lore.kernel.org/linux-pci/20240220235520.1514548-1-helgaas@kernel.org/T/#u

Later on, it's still an issue in kernel 6.8.4 in Proxmox 8.2.

The Intel X553 NIC:

First, it didn't get a link at all.

Then I downloaded ixgbe 5.20.3 from SourceForge and tried to compile it, which failed with errors. After patching around the errors I got it to link, but the speeds are very unreliable. It ran at 100-300mb/s with wild variations for 25 minutes and then dropped to 40-100mb/s, and this is on a 10G SFP+ link.

I had to replace strlcpy with strncpy, otherwise it wouldn't compile. In ethtool.c I had to comment out 2 array elements to get it to compile.

Please investigate this and fix it.
Also make the fix compatible with kernel 6.8.4.

(In reply to Jonathan from comment #14)
> Later on it's still an issue in kernel 6.8.4 in proxmox 8.2
>
> The intel x553 NIC
>
> First it didn't get a link at all.

So, you're not having the problem reported in this bug? The one where the driver won't load?

> Then I download ixgbe 5.20.3 from sourceforge and try to compile it with
> errors.

Thanks for troubleshooting with the out-of-tree (OOT) driver, but we need to fix this in-kernel!

> Please investigate this and fix it.
> Also make the fix compatible with kernel 6.8.4

If you're reporting a new link issue, we have a similar report on the mailing list right now. It links back to this issue, but I'm not sure you're aware of it: https://lore.kernel.org/netdev/cbe874db-9ac9-42b8-afa0-88ea910e1e99@intel.com/
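
The _OSC observation from the pci=noaer log earlier in this thread (DPC control granted without AER control) can be checked mechanically against a saved dmesg. The Python sketch below assumes the "OS now controls [...]" line format quoted in that comment and assumes that AER control, when granted, appears as the token "AER" in the bracketed list. The function names are hypothetical.

```python
import re

# Matches the bracketed list in lines such as:
#   acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug ... DPC]
_OSC_CONTROL = re.compile(r"_OSC: OS now controls \[([^\]]*)\]")


def granted_controls(log_text: str) -> list[set[str]]:
    """Return one set of granted features per 'OS now controls' line."""
    return [set(m.group(1).split()) for m in _OSC_CONTROL.finditer(log_text)]


def dpc_without_aer(controls: set[str]) -> bool:
    """True when DPC was granted but AER (assumed token name) was not."""
    return "DPC" in controls and "AER" not in controls


if __name__ == "__main__":
    # The two _OSC lines quoted from attachment 305887 in this thread.
    log = (
        "acpi PNP0A08:00: _OSC: OS supports "
        "[ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]\n"
        "acpi PNP0A08:00: _OSC: OS now controls "
        "[PCIeHotplug SHPCHotplug PME PCIeCapability LTR DPC]\n"
    )
    for controls in granted_controls(log):
        print(sorted(controls), "DPC without AER:", dpc_without_aer(controls))
```

This only flags the inconsistency in the log; whether it has any connection to the probe failure is left open, as in the comment itself.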