On Tue, 29.03.11 03:20, fykc...@gmail.com (fykc...@gmail.com) wrote:

> 2011/3/28 Lennart Poettering <lenn...@poettering.net>:
> > On Sun, 20.03.11 05:28, fykc...@gmail.com (fykc...@gmail.com) wrote:
> >> Current readahead implementation has some problems:
> >> 1. It can't separate *real* block read requests from all read
> >> requests (which include the additional blocks read by the kernel's
> >> readahead logic).
> >
> > Shouldn't make a big difference, since on replay we turn off additional
> > kernel-side readahead.
> >
> > However, it is true that the file will only ever increase, never
> > decrease in size.
>
> For collect, it can't filter out:
> 1. Kernel-side readahead, whether the readahead is initiated by the
> kernel (when there is no /.readahead data) or by the replay process.

That is true. But is that really a problem? Usually kernel readahead
should be a useful optimization which shouldn't hurt much. And we will
only apply it once, during the original run. It will not be done again
on replay, since we disable it explicitly then.

> 2. Written blocks of files (opened as "r+", "w+", "a"). The written
> blocks reside in memory at boot time.

Actually, now that I am looking into this, it might actually be
possible to distinguish read and write accesses to files, by using
FAN_CLOSE_NOWRITE/FAN_CLOSE_WRITE instead of FAN_OPEN. I do wonder
though why that isn't symmetric here...

> IMHO, the kernel lacks some APIs to notify about each *real* read
> request, e.g. it could be done by tracking each read syscall (mmap
> seems not easy to handle, though).

The kernel has quite a number of APIs, for example there is blktrace,
and there are the newer syscall tracing APIs. But fanotify is actually
the most useful of all of them.
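Just to illustrate the idea (this is only a rough sketch, not the
actual readahead-collect code; the whole-root mount mark, the buffer
size and the output format are arbitrary choices, and it needs
CAP_SYS_ADMIN):

#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/fanotify.h>
#include <unistd.h>

int main(void) {
    /* Buffer for fanotify events, aligned for the metadata struct */
    char buf[4096] __attribute__((aligned(8)));
    char path[PATH_MAX], proc[64];

    int fan = fanotify_init(FAN_CLOEXEC | FAN_CLASS_NOTIF,
                            O_RDONLY | O_LARGEFILE);
    if (fan < 0) {
        perror("fanotify_init");
        return 1;
    }

    /* Watch the whole root mount: FAN_CLOSE_NOWRITE fires for files
     * that were only read, FAN_CLOSE_WRITE for files that were
     * written. */
    if (fanotify_mark(fan, FAN_MARK_ADD | FAN_MARK_MOUNT,
                      FAN_CLOSE_NOWRITE | FAN_CLOSE_WRITE,
                      AT_FDCWD, "/") < 0) {
        perror("fanotify_mark");
        return 1;
    }

    for (;;) {
        ssize_t n = read(fan, buf, sizeof(buf));
        if (n <= 0)
            break;

        struct fanotify_event_metadata *m = (void *) buf;
        for (; FAN_EVENT_OK(m, n); m = FAN_EVENT_NEXT(m, n)) {
            if (m->fd < 0)
                continue;

            /* Resolve the file name behind the event fd */
            snprintf(proc, sizeof(proc), "/proc/self/fd/%d", m->fd);
            ssize_t l = readlink(proc, path, sizeof(path) - 1);
            if (l >= 0) {
                path[l] = 0;
                printf("%s %s\n",
                       (m->mask & FAN_CLOSE_WRITE) ? "written" : "read-only",
                       path);
            }
            close(m->fd);
        }
    }

    close(fan);
    return 0;
}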
> >> 2. It just gives advice on how to do the kernel's readahead, which
> >> causes the first read of a file to take more time.
> >
> > Hmm?
>
> posix_fadvise(...) may make each read do more readahead (more than the
> kernel's guessing would), and thus take more time, e.g.:
> * When there is no replay, someone reads part A of file X --> does
> some work --> reads part B of file X.
> * When there is a replay, both parts A and B of file X are read in one
> go, causing more I/O usage. Other services may then spend more time
> waiting for I/O. (This can be observed in the bootchart diagram.)

The idea of readahead is to load as many IO requests into the kernel as
possible, so that the IO elevator can decide what to read when, and can
reorder things as it likes and thinks is best.

> BTW, does posix_fadvise apply globally or just for the process which
> calls it?

The kernel caches each block only once.

> > We do that too. We use "idle" on SSD, and "realtime" on HDD.
>
> Why "realtime" on HDD?

Because on HDD seeks are very expensive. The idea of readahead is to
rearrange our reads so that no seeks happen, i.e. we read things
linearly in one big chunk. If accesses of other processes are
interleaved with this, then disk access will be practically random and
the seeks will hurt. On SSD seeks are basically free, hence all we do
is tell the kernel early what might be needed later, so that it reads
it when it has nothing else to do.

> BTW, according to my test, the "idle" class is not really *idle*, see
> the attachment. That means 'replay' will always impact other
> processes' I/O. For 'replay' in the idle I/O class on HDD, other
> processes' I/O performance drops by half, according to the test.

That's probably something to fix in the elevator in the kernel?

Lennart

-- 
Lennart Poettering - Red Hat, Inc.
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
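For the "idle" vs. "realtime" I/O class point discussed above, a
minimal sketch (not systemd's actual code) of how a process can put
itself into one of these classes. glibc provides no ioprio_set()
wrapper, so the raw syscall is used; the constants below are copied
from linux/ioprio.h, and the realtime class needs privileges:

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from linux/ioprio.h */
#define IOPRIO_CLASS_SHIFT       13
#define IOPRIO_PRIO_VALUE(c, d)  (((c) << IOPRIO_CLASS_SHIFT) | (d))
#define IOPRIO_CLASS_RT          1
#define IOPRIO_CLASS_IDLE        3
#define IOPRIO_WHO_PROCESS       1

int main(void) {
    /* Put the calling process (who == 0) into the idle I/O class, as
     * done on SSD; for the realtime class on HDD one would pass
     * IOPRIO_CLASS_RT with a priority level of 0-7 instead. */
    if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)) < 0) {
        perror("ioprio_set");
        return 1;
    }

    /* ... do the bulk reads here ... */
    return 0;
}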