Hello!

On Tue 14-01-20 17:12:13, Christoph Hellwig wrote:
> Asynchronous read/write operations currently use a rather magic locking
> scheme, where access to file data is normally protected using a rw_semaphore,
> but if we are doing aio where the syscall returns to userspace before the
> I/O has completed we also use an atomic_t to track the outstanding aio
> ops.  This scheme has led to lots of subtle bugs in file systems that
> didn't wait for the count to reach zero, and due to its ad hoc nature also
> means we have to serialize direct I/O writes that are smaller than the
> file system block size.
>
> All this is solved by releasing i_rwsem only when the I/O has actually
> completed, but doing so is against two mantras of Linux locking primitives:
>
>  (1) no unlocking by another process than the one that acquired it
>  (2) no return to userspace with locks held

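For anyone following along, the current scheme described in the quote can be
modelled roughly as below. This is a userspace Python toy, not kernel code:
the class ToyInode and its methods are hypothetical names made up for
illustration, a plain threading.Lock stands in for the exclusive side of the
rw_semaphore, and an integer guarded by a condition variable stands in for
the atomic_t. Only the names i_rwsem and i_dio_count come from the
discussion itself.

```python
import threading


class ToyInode:
    """Toy model of 'rwsem for submission + counter for in-flight aio'."""

    def __init__(self):
        # Stand-in for i_rwsem (exclusive mode only; an assumption of this toy).
        self.i_rwsem = threading.Lock()
        # Stand-in for the atomic_t counting outstanding aio operations.
        self.i_dio_count = 0
        self._cv = threading.Condition()

    def aio_submit(self):
        """Hold the lock only while submitting, then return with I/O in flight."""
        with self.i_rwsem:
            with self._cv:
                self.i_dio_count += 1
        # The lock is released here although the I/O has not completed.

    def aio_complete(self):
        """Completion path: may run in a different context than the submitter."""
        with self._cv:
            self.i_dio_count -= 1
            if self.i_dio_count == 0:
                self._cv.notify_all()

    def dio_wait(self, timeout=None):
        """Wait for the count to reach zero; returns False if the wait timed out."""
        with self._cv:
            return self._cv.wait_for(lambda: self.i_dio_count == 0, timeout)

    def truncate(self, timeout=None):
        """An operation that must not race with in-flight I/O.

        Taking i_rwsem alone is not enough in this scheme; the caller also
        has to wait for the counter, which is the step that is easy to forget.
        """
        with self.i_rwsem:
            return self.dio_wait(timeout)
```

In this model, aio_submit() returns with the lock free and the counter at
one, so a truncate() that skipped dio_wait() would run while the I/O is still
outstanding. That is the class of bug the quoted text refers to.
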
I'd like to note that using i_dio_count also has one advantage you didn't
mention. For the AIO case, if you need to hold i_rwsem in exclusive mode,
holding it just for the submission part is a significant performance
advantage (shorter lock hold times allow for higher IO parallelism). I guess
this could be mitigated by downgrading the lock to shared mode once the IO is
submitted. But there will still be some degradation visible for the cases of
mixed exclusive and shared acquisitions, because shared holders will be
blocking exclusive ones for a longer time. This may be especially painful for
filesystems that don't implement DIO overwrites with i_rwsem in shared
mode...

								Honza
-- 
Jan Kara <j...@suse.com>
SUSE Labs, CR