Bug 8223
Summary: | RACE: Lock is expected before calling ata_dev_select, but in some call chains not held! | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Ding (yuanding02) |
Component: | Serial ATA | Assignee: | Tejun Heo (htejun) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | alan, bugzilla.kernel.org, bunk, htejun, jgarzik, tj |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.20.1 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Ding
2007-03-16 21:59:19 UTC
Hmmm... Am I missing something? How does ata_dev_revalidate() end up calling ata_bus_probe()? The call path is ata_dev_revalidate() -> ata_dev_read_id() -> ata_exec_internal[_sg](), this one acquires lock -> .qc_issue(). Thanks for your prompt reply! The comments before "ata_dev_select" says: "LOCKING: caller." The call chain: ata_dev_revalidate() -> ata_dev_read_id() -> ata_dev_select(), and then it calls ata_exec_internal[_sg], so shall I assume the lock is acquired AFTER calling ata_dev_select(), which violates the comment? Sorry in previous I messed up earlier version of kernel (2.6.6) No, what code are you looking at? ata_dev_read_id() calls ata_exec_internal() which calls ata_exec_internal_sg() which grabs the host lock and then calls ->qc_issue which in turn calls ata_dev_select(). Nothing wrong there. I'm closing the bug as INVALID. I think I am looking at version 2.6.20.1, and following is the link of it on cross-reference. http://lxr.linux.no/source/drivers/ata/libata-core.c#L1448 If I am right, I find on line 1462 ata_dev_read_id calls ata_dev_select(), and on line 1483, it further calls ata_exec_internal(). So when calling ata_dev_select() seems to me no lock is acquired. Am I making mistake? Thanks. No you aren't. I was mistaken. My apologies. Anyways, yeah, there is a race there. The direct call to ata_dev_select() in ata_dev_read_id() is there for mostly historical reason as ->qc_issue() is responsible for that anyway. It also can be dangerous on a PATA channel. Thanks for reporting this. Not sure whether that should be wrapped in host lock or just be killed. I'll post a message to linux-ide mailing list. Thanks for your all your prompt replies! I am also suspecting the following functions may be potential race case: ata_pio_devchk ata_devchk ata_mmio_devchk From the comment we see they all require the caller to hold the lock, but in the call chain: ata_bus_reset -> ata_pio_devchk -> ata_mmio_devchk|ata_pio_devchk, seems there is no lock. Am I making mistakes? Thanks. I might be too prompt today (writing faster than thinking). Anyways, yeah, it seems like a class of issues. We should be holding host lock whether we're calling from EH or not. Entering EH does take care of exclusion in the port but nothing is guaranteed host-wide and register accesses should be guaranteed host-wide exclusion. In practice, as EH actions don't happen frequently && modern controllers implement each port very independently, this doesn't cause real problem, but there still is a slim chance that EH operation going on one channel interferes operation going on another channel. For the record, I don't think fixing this problem is of very high priority as it has been working well until now. I'll post a patch for this in a few days. Thanks a lot. Thanks! Status ? Has this been fixed upstream ? Hello ping ?? The problem is still there. It definitely needs fixing but priority is still rather low. Ping ? Pong. Planning on working on it when moving EH to block layer before the end of this year, hopefully. :-) Its now march so ping again Ping again - Tejun did you look at this yet - its almost a year now ? This being so tedious, I was trying to think of some clever way to do it without changing every exception processing functions. Ehhh... I think I should just bite and do it. Will work on it. Thanks for reminding this. FYI: Tejun posted a patch and Jeff applied it: http://thread.gmane.org/gmane.linux.ide/47481 |