Bug 197605 - dmatest leaves dangling pointer when timeout expires which leads to deadlock if test is re-run
Summary: dmatest leaves dangling pointer when timeout expires which leads to deadlock ...
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-31 16:20 UTC by Adam Wallis
Modified: 2017-11-10 17:47 UTC (History)
5 users (show)

See Also:
Kernel Version: v4.14-rc7
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Adam Wallis 2017-10-31 16:20:26 UTC
Using the dmatest module (I was testing with it built-in), if the timeout parameter is set such that an actual timeout occurs before a DMA operation completes in the dmatest (dmatest.c), the "done_wait" structure will be left with a dangling pointer that is not handled. 

an example of a message that might be printed out right before we lockup the system:

[  249.596529] dmatest: dma5chan0-copy0: result #10: 'test timed out' with src_off=0x9f300 dst_off=0xdf00

Often after this situation occurs, we arrive at what appears to be a deadlock (sometimes just by rerunning the same test)...but this is due to the fact that the spinlock structure for the wait queues was corrupted by the unfreed pointer that gets reinitialized on the stack.

Surprisingly, the fact that this bug exists is documented in the very code where the issue occurs: http://elixir.free-electrons.com/linux/v4.14-rc7/source/drivers/dma/dmatest.c#L698 

Further information on the race condition can be found in this original patch: https://lkml.org/lkml/2011/11/21/381

At the very least, the kernel should probably BUG out if (!done.done)...but, would ideally be fixed properly
Comment 1 Shunyong Yang 2017-11-02 01:24:10 UTC
In stress test, this occurs in some low possibility. Shall we move the wait queue to thread info of each thread?
Comment 2 Adam Wallis 2017-11-08 15:44:15 UTC
It happens with increased regularity if 1)You are running on a slower system (e.g., emulation, simulation) 2) You use an unrealistically low timeout value 3) Your DMA legitimately has an issue and times out

Moving the wait queue to each thread sounds like it could work.
Comment 3 Adam Wallis 2017-11-10 17:47:22 UTC
Proposed patch to fix the issue submitted: https://patchwork.kernel.org/patch/10053507/

Note You need to log in before you can comment on or make changes to this bug.