1. Perhaps a human-readable log, being write-only and possibly buffered in user space or in the kernel, is more efficient because small writes are accumulated in a buffer (cheap) before actually being pushed to disk (less cheap)? If you mmap'd it instead, how do you think it would behave?
2. Did you read the post about mmapping that I linked to? The guy knows more about it than I'll probably ever know, and he says it's not that simple. He's saying mmap is not a magic answer to anything. This bit may be relevant: "APPLICATION BUFFERS WHICH EASILY FIT IN THE L2 CACHE COST VIRTUALLY NOTHING ON A MODERN CPU!" (NB: the post is from 2004, so with the gap between cache/CPU and RAM/disk speeds growing since then, it may be even more true now.) Kafka messages can be largish, so perhaps that suggests why they use it for data files.

If this comes across as a bit rude, that wasn't intended. I can't really answer your question, just suggest a bit of reading and some guesswork.

cheers

jan

On 13/02/2018, YuFeng Shen <v...@hotmail.com> wrote:
> If that is like what you said , why index file use the memory mapped file?
>
> ________________________________
> From: jan <rtm4...@googlemail.com>
> Sent: Monday, February 12, 2018 9:26 PM
> To: users@kafka.apache.org
> Subject: Re: why kafka index file use memory mapped files ,however log file
> doesn't
>
> A human-readable log file is likely to have much less activity in it
> (it was a year ago I was using kafka and we could eat up gigs for the
> data files but the log files were a few meg). So there's perhaps
> little to gain.
>
> Also if the power isn't pulled and the OS doesn't crash, log messages
> will be, I guess, buffered by the OS then written out as a full
> buffer, or perhaps every nth tick if the buffer fills up very slowly.
> So it's still reasonably efficient.
>
> Adding a few hundred context switches a second for the human log
> probably isn't a big deal. I remember seeing several tens of
> thousands/sec when using kafka (although it was other processes
> running on those multicore machines to be fair). I guess logging
> overhead is down in the noise, though that's just a guess.
>
> Also I remember reading a rather surprising post about mmaping. Just
> found it
> <https://lists.freebsd.org/pipermail/freebsd-questions/2004-June/050371.html>.
> Sniplets:
> "There are major hardware related overheads to the use of mmap(), on
> *ANY* operating system, that cannot be circumvented"
> -and-
> "you are assuming that copying is always bad (it isn't), that copying
> is always horrendously expensive (it isn't), that memory mapping is
> always cheap (it isn't cheap),"
>
> A bit vague on my part, but HTH anyway
>
> jan
>
>
> On 12/02/2018, YuFeng Shen <v...@hotmail.com> wrote:
>> Hi jan ,
>>
>> I think the reason is the same as why index file using memory mapped
>> file.
>>
>> As the memory mapped file can avoid the data copy between user and kernel
>> buffer space, so it can improve the performance for the index file IO
>> operation ,right? If it is ,why Log file cannot achieve the same
>> performance improvement as memory mapped index file?
>>
>>
>> Jacky
>>
>>
>> ________________________________
>> From: jan <rtm4...@googlemail.com>
>> Sent: Saturday, February 10, 2018 8:33 PM
>> To: users@kafka.apache.org
>> Subject: Re: why kafka index file use memory mapped files ,however log
>> file doesn't
>>
>> I'm not sure I can answer your question, but may I pose another in
>> return: why do you feel having a memory mapped log file would be a
>> good thing?
>>
>>
>> On 09/02/2018, YuFeng Shen <v...@hotmail.com> wrote:
>>> Hi Experts,
>>>
>>> We know that kafka use memory mapped files for it's index files ,however
>>> it's log files don't use the memory mapped files technology.
>>>
>>> May I know why index files use memory mapped files, however log files
>>> don't use the same technology?
>>>
>>>
>>> Jacky
>>>
>>
>
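
To make the two access patterns discussed in this thread a bit more concrete, here is a rough Python sketch. It is only an illustration, not how Kafka actually implements anything: the function names, the 64 KiB buffer size and the 8-byte entry layout (message number, byte position) are all assumptions made up for the example. It appends data through an ordinary user-space buffer, so many small writes become few system calls, and it keeps a small fixed-size index in a memory-mapped file that is read by binary search.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical index entry: (message number, byte position), 4 bytes each.
ENTRY = struct.Struct(">II")


def append_messages(log_path, messages, buffer_size=64 * 1024):
    """Append messages through a user-space buffer; return each start position."""
    positions = []
    pos = 0
    with open(log_path, "wb", buffering=buffer_size) as log:
        for msg in messages:
            positions.append(pos)
            log.write(msg)  # usually just a copy into the buffer, no system call
            pos += len(msg)
    # Leaving the with-block flushes whatever is still buffered to the kernel.
    return positions


def write_index(index_path, positions):
    """Store fixed-size entries in a preallocated file through a memory mapping."""
    size = ENTRY.size * len(positions)
    with open(index_path, "wb") as f:
        f.truncate(size)  # preallocate so there is something to map
    if size == 0:
        return  # an empty file cannot be mapped
    with open(index_path, "r+b") as f, mmap.mmap(f.fileno(), size) as mm:
        for n, pos in enumerate(positions):
            ENTRY.pack_into(mm, n * ENTRY.size, n, pos)
        mm.flush()


def lookup(index_path, target):
    """Binary-search the mapped index for message number `target`."""
    if os.path.getsize(index_path) == 0:
        return None
    with open(index_path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        lo, hi = 0, len(mm) // ENTRY.size - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            n, pos = ENTRY.unpack_from(mm, mid * ENTRY.size)
            if n == target:
                return pos
            if n < target:
                lo = mid + 1
            else:
                hi = mid - 1
    return None
```

With three messages of 10, 20 and 5 bytes, append_messages returns the start positions [0, 10, 30], and after write_index a call to lookup(index_path, 2) returns 30. The sketch shows the difference in access pattern only: sequential appends versus small random reads of fixed-size records. Which approach is cheaper in practice would have to be measured.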