Description Bart Van Assche 2009-07-10 16:34:00 UTC
Created attachment 22299 [details]
Kernel config

Kernel: 126.96.36.199 with SCST zero-copy transfer completion notification and scsi_execute_fifo patches applied. These two patches do not modify any InfiniBand code.

Setup:
- Two servers connected back-to-back via InfiniBand.
- OpenSM is running on one of the two servers.

After having shut down one of the two servers, lockdep complained about a possible irq lock inversion.
Comment 1 Bart Van Assche 2009-07-10 16:34:36 UTC
Created attachment 22300 [details] Extract from /var/log/messages
Comment 2 Bart Van Assche 2009-07-10 16:45:47 UTC
Update: just booting both systems and leaving them running for about twenty minutes is sufficient to reproduce this phenomenon.
Comment 3 Bart Van Assche 2009-07-10 16:47:02 UTC
Another update: I have not yet seen this report with 188.8.131.52 on the same setup.
Comment 4 Bart Van Assche 2009-07-10 19:23:58 UTC
Created attachment 22303 [details] Patch for triggering this issue sooner (provided by Roland)
Comment 5 Bart Van Assche 2009-07-10 19:26:56 UTC
Created attachment 22304 [details] Lockdep report after having applied the provided patch.
Comment 6 Bart Van Assche 2009-07-10 19:28:12 UTC
Created attachment 22305 [details] Lockdep report after having applied the provided patch.
Comment 7 Bart Van Assche 2009-07-11 09:08:42 UTC
Created attachment 22308 [details] Proposed short-term solution (provided by Roland)
Comment 8 Bart Van Assche 2009-07-30 07:27:10 UTC
See also the discussion at http://lists.openfabrics.org/pipermail/general/2009-July/060644.html.
Comment 9 Bart Van Assche 2009-07-30 09:42:32 UTC
Created attachment 22534 [details]
Lockdep locking inversion report for 184.108.40.206 kernel with workaround patch applied

Yesterday I found out that, unfortunately, the proposed workaround does not solve all locking inversion issues. The attached locking inversion report was obtained while testing module removal for ib_srpt.
Comment 10 Bart Van Assche 2009-07-30 10:36:33 UTC
(In reply to comment #9)
> Created an attachment (id=22534) [details]
> Lockdep locking inversion report for 220.127.116.11 kernel with workaround patch
> applied
>
> Yesterday I found out that the proposed workaround doesn't solve all locking
> inversion issues unfortunately. The attached locking inversion report was
> obtained while testing module removal for ib_srpt.

Update: the locking inversion report referred to above has been obtained with a kernel on which only the second patch (workaround.patch) was applied, and not the first (ib-lockdep-trigger.patch). I will retest this issue with a kernel on which both patches have been applied.
Comment 11 Bart Van Assche 2009-07-30 11:07:21 UTC
Created attachment 22535 [details]
Lockdep complaint about a hardirq unsafe lock order.

This report was generated on a system equipped with an IB HCA and connected back-to-back to another system equipped with an IB HCA. It appeared about four seconds after OpenSM generated the "SUBNET UP" event.
Comment 12 Bart Van Assche 2009-08-06 09:43:52 UTC
Created attachment 22624 [details] Proposed fix (provided by Roland).
Comment 13 Bart Van Assche 2009-08-06 09:54:41 UTC
Created attachment 22625 [details]
Locking inversion report for 18.104.22.168 + patch in attachment 22624 [details].

Unfortunately the newly proposed patch does not seem to fix all locking inversion issues. The attached locking inversion report was triggered by running "/etc/init.d/openibd restart" repeatedly on the system connected back-to-back to the system on which the lockdep report was generated.
Comment 14 Bart Van Assche 2009-08-07 09:40:50 UTC
Created attachment 22631 [details]
Locking inversion report for 22.214.171.124 + patches in attachments 22303 and 22624.

As requested, I ran a new test with both patches in attachments 22303 and 22624 applied.
Comment 15 Bart Van Assche 2009-08-15 06:26:41 UTC
Created attachment 22721 [details] Fix for a (hard to trigger) locking cycle detected by lockdep.
Comment 16 Bart Van Assche 2009-08-16 15:48:46 UTC
This issue no longer occurs on a 126.96.36.199 kernel with the three attached patches applied.
Comment 17 Bart Van Assche 2009-09-13 07:43:08 UTC
Created attachment 23083 [details] (Deleted) Apparently there are still locking inversion complaints with the latest infiniband.git/for-next tree. This report was generated during shutdown.
Comment 18 Bart Van Assche 2009-09-13 07:45:06 UTC
Comment on attachment 23083 [details] (Deleted) (Deleted)
Comment 19 Bart Van Assche 2009-09-13 07:46:54 UTC
(In reply to comment #17)
> Created an attachment (id=23083) [details]
> Locking inversion report for infiniband.git/for-next of 2009-09-05 16:38:12
> (2.6.31-rc9)
>
> Apparently there are still locking inversion complaints with the latest
> infiniband.git/for-next tree. This report was generated during shutdown.

Please ignore the above -- I have not observed any lockdep complaints with recent infiniband.git/for-next trees. The above lockdep complaint was generated by a 2.6.31 kernel.