Kernel booting "fails" on a certain type of chassis we have in production use. It has a forcedeth NIC on the board (two, actually!). The system does not respond to SysRq combinations during its big pause; it appears to be stuck in a loop for about seven hours, after which it boots perfectly normally.

Our chassis is: http://www.supermicro.com/Aplus/system/1U/1011/AS-1011M-T2.cfm

dmesg output from the driver:

[ 35.309834] forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
[ 35.317568] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
[ 35.323305] ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23 (level, low) -> IRQ 23
[ 35.332026] forcedeth: using HIGHDMA
[ 35.335620] 0000:00:08.0: link timer on.
[ 35.339537] 0000:00:08.0: mgmt unit is running. mac in use 40000000.

Thanks :)
Original downstream report: https://bugs.gentoo.org/show_bug.cgi?id=197561

There's a codepath in the forcedeth probe routines that can potentially delay boot up to 7 hours.

Inside nv_probe:

    if (id->driver_data & DEV_HAS_MGMT_UNIT) {
        [...]
        for (i = 0; i < 5000; i++) {
            [...]
            if (nv_mgmt_acquire_sema(dev)) {

So, nv_mgmt_acquire_sema() may be called 5000 times.

Inside nv_mgmt_acquire_sema():

    for (i = 0; i < 10; i++) {
        [...]
        msleep(500);
    }

So, nv_mgmt_acquire_sema() may sleep for up to 5 seconds.

5 seconds * 5000 = almost 7 hours

Alex's hardware hits this exact code path: nv_mgmt_acquire_sema() never manages to acquire the semaphore, so is called 5000 times, and boot is delayed.

We tried disabling the loop, i.e.

    for (i = 0; i < 5000; i++) {

becomes:

    for (i = 0; i < 0; i++) {

In this case, the system boots as normal (no delay) and networking works fine.
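For anyone who wants to check the arithmetic, here is a minimal user-space sketch that multiplies out the loop bounds quoted above. The helper names (worst_case_sleep_ms, print_worst_case) and the macro names are made up for illustration and are not part of forcedeth. It also assumes every iteration sleeps the full 500 ms and ignores everything else the loops do.

```c
#include <stdio.h>

/* Loop bounds and sleep length as quoted in the comment above. */
#define PROBE_RETRIES 5000L /* outer loop in nv_probe */
#define SEMA_ATTEMPTS 10L   /* inner loop in nv_mgmt_acquire_sema() */
#define SEMA_SLEEP_MS 500L  /* msleep(500) per inner iteration */

/* Hypothetical helper: worst-case total sleep time in milliseconds,
 * assuming the semaphore is never acquired. */
long worst_case_sleep_ms(long probe_retries, long sema_attempts, long sleep_ms)
{
    return probe_retries * sema_attempts * sleep_ms;
}

/* Hypothetical helper: print a total in seconds and hours. */
void print_worst_case(const char *label, long total_ms)
{
    printf("%s: %ld s (%.2f h)\n", label, total_ms / 1000,
           (double)total_ms / 3600000.0);
}
```

Calling worst_case_sleep_ms(PROBE_RETRIES, SEMA_ATTEMPTS, SEMA_SLEEP_MS) gives 25,000,000 ms, i.e. 25,000 seconds or roughly 6.9 hours. With a single pass through the outer loop the same helper gives 5,000 ms.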
rofl. This looks pretty simple to fix.
I'm not sure I see the architectural purpose to the loop if after running 5000 times and delaying my boot by seven hours, it falls through anyway without even a warning *and* my network interface works fine ... :) Unfortunately a fix more complex than simply removing the loop is beyond me! Thus far I've not noticed any anomalous behaviour when the code within the loop isn't being run, but I've only been using it that way for 2-3 hours on <=4 boxes! I'm more than happy to tinker and test any patches you guys come up with.
Andrew, well, the question still remains, why did forcedeth on Alex's hardware fail to acquire this hardware-based semaphore even after 7 hours? I hope someone with more knowledge of the hardware can comment. But yes, I agree that even if Alex didn't have this system in this strange situation, it should be regarded as a bug anyway that we have a codepath that can take 7 hours. :)
Thanks for catching this. The IPMI firmware is holding the semaphore. This is most likely a bug in the firmware. You can try to upgrade the firmware by contacting Supermicro. But, from a driver point of view, yes, we need to reduce the timeout :) The extra outer loop is not needed.
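To illustrate why dropping the outer loop bounds the delay, here is a rough user-space model of the situation described in the comments above: a semaphore that the firmware never releases, polled with the 10 x 500 ms pattern. Everything here is a simplified assumption and not driver code: fake_msleep, fake_sema_is_free, model_acquire_sema and model_probe_wait_ms are hypothetical names, the sleep is simulated by a counter, and the real loops do more than this.

```c
#include <stdbool.h>

/* Simulated clock: stands in for msleep() so nothing actually sleeps. */
static long simulated_ms;

void fake_msleep(long ms)
{
    simulated_ms += ms;
}

/* Assumption for this model: the firmware holds the semaphore forever. */
bool fake_sema_is_free(void)
{
    return false;
}

/* Model of the inner loop: up to 10 attempts, 500 ms apart. */
bool model_acquire_sema(void)
{
    for (int i = 0; i < 10; i++) {
        if (fake_sema_is_free())
            return true;
        fake_msleep(500);
    }
    return false;
}

/* Model of the probe path: retry the acquire up to outer_retries times
 * and return the simulated time spent waiting, in milliseconds. */
long model_probe_wait_ms(int outer_retries)
{
    simulated_ms = 0;
    for (int i = 0; i < outer_retries; i++) {
        if (model_acquire_sema())
            break;
    }
    return simulated_ms;
}
```

In this model, model_probe_wait_ms(5000) waits 25,000,000 ms, model_probe_wait_ms(1) waits 5,000 ms, and model_probe_wait_ms(0) (the experiment from the downstream report) waits 0 ms. A single attempt is one way to read "the extra outer loop is not needed"; the actual patch may differ in detail.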
Created attachment 13490: This patch removes the outer loop in mgmt unit detection code.
Ayaz, will you push this to Linus? It's still not upstream AFAICS.
I just sent the patch to the netdev and stable kernel lists.