Bug 13757

Summary: Lockdep complains about possible irq lock inversion dependency
Product: Drivers
Reporter: Bart Van Assche (bvanassche)
Component: Infiniband/RDMA
Assignee: drivers_infiniband-rdma
Status: CLOSED CODE_FIX
Severity: normal
CC: alan
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.30.1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Kernel config
Extract from /var/log/messages
Patch for triggering this issue sooner (provided by Roland)
Lockdep report after having applied the provided patch.
Lockdep report after having applied the provided patch.
Proposed short-term solution (provided by Roland)
Lockdep locking inversion report for 2.6.30.3 kernel with workaround patch applied
Lockdep complaint about a hardirq unsafe lock order.
Proposed fix (provided by Roland).
Locking inversion report for 2.6.30.4 + patch in attachment 22624.
Locking inversion report for 2.6.30.4 + patches in attachments 22303 and 22624.
Fix for a (hard to trigger) locking cycle detected by lockdep.
(Deleted)

Description Bart Van Assche 2009-07-10 16:34:00 UTC
Created attachment 22299 [details]
Kernel config

Kernel: 2.6.30.1 with SCST zero-copy transfer completion notification and scsi_execute_fifo patches applied. These two patches do not modify any InfiniBand code.

Setup:
- Two servers connected back-to-back via InfiniBand.
- OpenSM is running on one of the two servers.

After one of the two servers was shut down, lockdep complained about a possible irq lock inversion dependency.
Comment 1 Bart Van Assche 2009-07-10 16:34:36 UTC
Created attachment 22300 [details]
Extract from /var/log/messages
Comment 2 Bart Van Assche 2009-07-10 16:45:47 UTC
Update: just booting both systems and leaving them running for about twenty minutes is sufficient to reproduce this phenomenon.
Comment 3 Bart Van Assche 2009-07-10 16:47:02 UTC
Another update: I have not yet seen this report with 2.6.29.4 on the same setup.
Comment 4 Bart Van Assche 2009-07-10 19:23:58 UTC
Created attachment 22303 [details]
Patch for triggering this issue sooner (provided by Roland)
Comment 5 Bart Van Assche 2009-07-10 19:26:56 UTC
Created attachment 22304 [details]
Lockdep report after having applied the provided patch.
Comment 6 Bart Van Assche 2009-07-10 19:28:12 UTC
Created attachment 22305 [details]
Lockdep report after having applied the provided patch.
Comment 7 Bart Van Assche 2009-07-11 09:08:42 UTC
Created attachment 22308 [details]
Proposed short-term solution (provided by Roland)
Comment 8 Bart Van Assche 2009-07-30 07:27:10 UTC
See also the discussion at http://lists.openfabrics.org/pipermail/general/2009-July/060644.html.
Comment 9 Bart Van Assche 2009-07-30 09:42:32 UTC
Created attachment 22534 [details]
Lockdep locking inversion report for 2.6.30.3 kernel with workaround patch applied

Yesterday I found out that the proposed workaround doesn't solve all locking inversion issues unfortunately. The attached locking inversion report was obtained while testing module removal for ib_srpt.
Comment 10 Bart Van Assche 2009-07-30 10:36:33 UTC
(In reply to comment #9)
> Created an attachment (id=22534) [details]
> Lockdep locking inversion report for 2.6.30.3 kernel with workaround patch
> applied
> 
> Yesterday I found out that the proposed workaround doesn't solve all locking
> inversion issues unfortunately. The attached locking inversion report was
> obtained while testing module removal for ib_srpt.

Update: the locking inversion report referred to above was obtained with a kernel to which only the second patch (workaround.patch) had been applied, not the first (ib-lockdep-trigger.patch). I will retest this issue with a kernel to which both patches have been applied.
Comment 11 Bart Van Assche 2009-07-30 11:07:21 UTC
Created attachment 22535 [details]
Lockdep complaint about a hardirq unsafe lock order.

This report was generated on a system equipped with an IB HCA and connected back-to-back to another system equipped with an IB HCA. It appeared about four seconds after OpenSM generated the "SUBNET UP" event.
Comment 12 Bart Van Assche 2009-08-06 09:43:52 UTC
Created attachment 22624 [details]
Proposed fix (provided by Roland).
Comment 13 Bart Van Assche 2009-08-06 09:54:41 UTC
Created attachment 22625 [details]
Locking inversion report for 2.6.30.4 + patch in attachment 22624.

Unfortunately, the newly proposed patch does not seem to fix all locking inversion issues. The attached locking inversion report was triggered by running "/etc/init.d/openibd restart" repeatedly on the system connected back-to-back to the system on which the lockdep report was generated.
Comment 14 Bart Van Assche 2009-08-07 09:40:50 UTC
Created attachment 22631 [details]
Locking inversion report for 2.6.30.4 + patches in attachments 22303 and 22624.

As requested, I ran a new test with both patches (attachments 22303 and 22624) applied.
Comment 15 Bart Van Assche 2009-08-15 06:26:41 UTC
Created attachment 22721 [details]
Fix for a (hard to trigger) locking cycle detected by lockdep.
Comment 16 Bart Van Assche 2009-08-16 15:48:46 UTC
This issue no longer occurs on a 2.6.30.4 kernel with the three attached patches applied.
Comment 17 Bart Van Assche 2009-09-13 07:43:08 UTC
Created attachment 23083 [details]
(Deleted)

Apparently there are still locking inversion complaints with the latest infiniband.git/for-next tree. This report was generated during shutdown.
Comment 18 Bart Van Assche 2009-09-13 07:45:06 UTC
Comment on attachment 23083 [details]
(Deleted)

(Deleted)
Comment 19 Bart Van Assche 2009-09-13 07:46:54 UTC
(In reply to comment #17)
> Created an attachment (id=23083) [details]
> Locking inversion report for infiniband.git/for-next of 2009-09-05 16:38:12
> (2.6.31-rc9)
> 
> Apparently there are still locking inversion complaints with the latest
> infiniband.git/for-next tree. This report was generated during shutdown.

Please ignore the above -- I have not observed any lockdep complaints with recent infiniband.git/for-next trees. The above lockdep complaint was generated by a 2.6.31 kernel.