Bug 13757
Created attachment 22300
Extract from /var/log/messages
Update: just booting both systems and leaving them running for about twenty minutes is sufficient to reproduce this problem.

Another update: I have not yet seen this report with 2.6.29.4 on the same setup.

Created attachment 22303
Patch for triggering this issue sooner (provided by Roland)
Created attachment 22304
Lockdep report after having applied the provided patch.
Created attachment 22305
Lockdep report after having applied the provided patch.
Created attachment 22308
Proposed short-term solution (provided by Roland)
See also the discussion at http://lists.openfabrics.org/pipermail/general/2009-July/060644.html.

Created attachment 22534
Lockdep locking inversion report for 2.6.30.3 kernel with workaround patch applied
Unfortunately, yesterday I found that the proposed workaround does not solve all locking inversion issues. The attached locking inversion report was obtained while testing removal of the ib_srpt module.
(In reply to comment #9)
> Created an attachment (id=22534)
> Lockdep locking inversion report for 2.6.30.3 kernel with workaround patch
> applied
>
> Yesterday I found out that the proposed workaround doesn't solve all locking
> inversion issues unfortunately. The attached locking inversion report was
> obtained while testing module removal for ib_srpt.

Update: the locking inversion report referred to above was obtained with a kernel on which only the second patch (workaround.patch) had been applied, not the first (ib-lockdep-trigger.patch). I will retest this issue with a kernel on which both patches have been applied.

Created attachment 22535
Lockdep complaint about a hardirq unsafe lock order.
This report was generated about four seconds after OpenSM reported the "SUBNET UP" event, on a system equipped with an IB HCA and connected back-to-back to another system equipped with an IB HCA.
Created attachment 22624
Proposed fix (provided by Roland).
Created attachment 22625
Locking inversion report for 2.6.30.4 + patch in attachment 22624.

Unfortunately, the newly proposed patch does not seem to fix all locking inversion issues. The attached locking inversion report was triggered by repeatedly running "/etc/init.d/openibd restart" on the system connected back-to-back to the system on which the lockdep report was generated.

Created attachment 22631
Locking inversion report for 2.6.30.4 + patches in attachments 22303 and 22624.
As requested, I ran a new test with both patches in attachments 22303 and 22624 applied.
Created attachment 22721
Fix for a (hard to trigger) locking cycle detected by lockdep.
This no longer occurs on a 2.6.30.4 kernel with the three attached patches applied.

Created attachment 23083
(Deleted)
Apparently there are still locking inversion complaints with the latest infiniband.git/for-next tree. This report was generated during shutdown.
Comment on attachment 23083
(Deleted)
(Deleted)
(In reply to comment #17)
> Created an attachment (id=23083)
> Locking inversion report for infiniband.git/for-next of 2009-09-05 16:38:12
> (2.6.31-rc9)
>
> Apparently there are still locking inversion complaints with the latest
> infiniband.git/for-next tree. This report was generated during shutdown.

Please ignore the above: I have not observed any lockdep complaints with recent infiniband.git/for-next trees. The above lockdep complaint was generated by a 2.6.31 kernel.
Created attachment 22299
Kernel config

Kernel: 2.6.30.1 with the SCST zero-copy transfer completion notification and scsi_execute_fifo patches applied. These two patches do not modify any InfiniBand code.

Setup:
- Two servers connected back-to-back via InfiniBand.
- OpenSM is running on one of the two servers.

After one of the two servers was shut down, lockdep complained about a possible irq lock inversion.