Distribution: debian Hardware Environment: Software Environment: Problem Description: Using POSIX threads makes posix_locks_deadlock() loop infinitely. Steps to reproduce: There are 3 processes; "A", "B" and "C". Each process opens "file1", "file2" and "file3". "A" creates 3 POSIX threads; "A1", "A2" and "A3". (These threads' PIDs are the same.) "B" creates 2 POSIX threads; "B1" and "B2". (These threads' PIDs are the same.) "C" creates 1 POSIX thread; "C1". 1. "A1" locks "file1" exclusively. 2. "B1" locks "file2" exclusively. 3. "C1" locks "file3" exclusively. 4. "A2" trys to lock "file2" and sleeps (blocked by "B1"). 5. "A3" trys to lock "file3" and sleeps (blocked by "C1"). 6. "B2" trys to lock "file1". Here, posix_locks_deadlock() must detect deadlock, but it does not. Then "B2" sleeps (blocked "A1"). "A" "B" | | +--A1(LOCK)---->(file1)<-B2(*deadlock)-+ | | | | +--A2(BLOCK)--->(file2)<---B1(LOCK)----+ | | +--A3(BLOCK)--->(file3)<---C1(LOCK)----"C" block_list -> A3(BLOCK) -> A2(BLOCK) 7. "C1" unlocks "file3". ("A3" wakes up and locks "file3".) 8. "A3" unlocks "file3". 9. Another process locks "file1". This makes posix_locks_deadlock() loop infinitely.
Can you tell is this problem is still present in recent kernels?
Recent kernel works fine. Thank you.
Sorry, my previous report was not correct. It seems that this problem is still present in debien's 2.6.8-3-686 kernel. I do not test newer vanilla kernel such as 2.6.17.
It'd be interesting to get results from 2.6.17, as 2.6.8 was released on August 2004
I tested 2.6.17 on uni processer PC. This kernel also has this problem.
:/ CC'ing Ingo Molnar
We simulated the above given situation using lockf and fcntl functions to aquire locks on files by threads on 2.6.17.6 kernel on i386 arch. We found that lockf generates error EDEADLK which tells that : The command was T_LOCK and this lock operation would cause a deadlock. Hence the deadlock situation is already handled in this case. Also while debugging we found that the 'posix_locks_deadlock()' function is not looping infinitely. Regards, Tanmay and Pranav (Calsoftinc, India)
i-sakamoto, are you still having locking problem with recent kernels? #7 from Tinmay suggested that the problem was supposedly fixed. Thanks.
Created attachment 13337 [details] fix infinite loop in posix_locks_deadlock The attached patch should address the infinite loop which can be hit in the case described above; submitted for 2.6.24 and 2.6.23.x. The problem is that (as noted in the comment above posix_locks_deadlock()), the loop there only terminates if the existing lock dependency graph only terminates if the existing lock dependency graph is acyclic. But the posix_locks_deadlock() function is only guaranteed to detect (and prevent) such cycles if each node of that graph has outdegree 1. In the example above, the node corresponding to lock owner A has outdegree 2 (because it's blocking on two locks at once--something not possible in the presence only of traditional posix processes)--so posix_locks_deadlock() misses the cycle that's created at step 6, which allows it to go into an infinite loop when it's called again at step 9.
It looks like the code was commited (and later reworked further). Closing the bug, thanks.