My timings suggest there is no fair queueing mechanism for SCSI commands issued by multiple processes to separate devices on a shared bus (e.g. SCSI or USB). For example, with just four processes each issuing 16 slow I/Os at a time to USB flash devices, a single I/O can easily sit starved in the queue for more than half a minute. If I hold back I/Os in my application and limit each process to 2 I/Os queued in the kernel at a time, average and maximum end-to-end latency drop to more tolerable levels.

Much like memory, CPU or network hogging, unfair queueing of storage commands between processes would be considered by most people a performance defect. Shared use, particularly of a congested bus, calls for time or bandwidth slicing to prevent starvation.

The SCSI Generic HOWTO doesn't go into this, but is it the current kernel design? If so, would libaio be a more viable route to concurrent I/O?
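The application-side workaround (capping each process at 2 I/Os queued in the kernel) amounts to a simple in-flight counter. Below is a minimal sketch under stated assumptions. `io_throttle`, `throttle_try_acquire`, `throttle_release` and `run_throttled` are hypothetical names. The completion model is a toy stand-in: the comments mark where real code would submit and reap through the sg driver's asynchronous `write()`/`read()` interface or libaio's `io_submit()`/`io_getevents()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-process limiter: at most `cap` I/Os are handed to the
 * kernel at once; the rest wait in the application's own queue. */
struct io_throttle {
    unsigned in_flight; /* I/Os currently queued in the kernel */
    unsigned cap;       /* e.g. 2, as in the experiment described above */
};

/* Returns true (and takes a slot) if one more I/O may be submitted now. */
static bool throttle_try_acquire(struct io_throttle *t)
{
    if (t->in_flight >= t->cap)
        return false;
    t->in_flight++;
    return true;
}

/* Call when the completion of a previously submitted I/O is reaped. */
static void throttle_release(struct io_throttle *t)
{
    assert(t->in_flight > 0);
    t->in_flight--;
}

/* Drives `pending` I/Os through the throttle using a toy completion model
 * (one in-flight I/O completes per step). Stores the number of completed
 * I/Os in *completed_out and returns the peak number in flight at once. */
static unsigned run_throttled(unsigned pending, unsigned cap,
                              unsigned *completed_out)
{
    struct io_throttle t = { .in_flight = 0, .cap = cap };
    unsigned submitted = 0, completed = 0, peak = 0;

    assert(cap > 0); /* a zero cap would never let anything through */

    while (completed < pending) {
        /* Top up to the cap. Real code would write() an sg_io_hdr to the
         * sg file descriptor, or call io_submit(), at this point. */
        while (submitted < pending && throttle_try_acquire(&t))
            submitted++;
        if (t.in_flight > peak)
            peak = t.in_flight;

        /* Reap one completion. Real code would read() from the sg file
         * descriptor, or call io_getevents(), at this point. */
        throttle_release(&t);
        completed++;
    }
    *completed_out = completed;
    return peak;
}
```

With `run_throttled(16, 2, &done)`, all 16 I/Os complete but never more than 2 are queued in the kernel at once. Note that this only bounds what one process contributes to the shared queue. It does not give the kernel a fair scheduler, so it relies on every process on the bus cooperating.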