Re: why kafka index file use memory mapped files ,however log file doesn't

jan Tue, 13 Feb 2018 06:54:45 -0800

1. Perhaps a human-readable log, which is write-only and may be
buffered on the user side or in the kernel, is more efficient
because small writes are accumulated in a buffer (cheap) before
actually being pushed to disk (less cheap)? If you mmap'd this
instead, how do you think it would behave?
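A minimal Python sketch of the buffering idea in point 1. It assumes an in-memory sink stands in for the file, and `CountingRaw` and `append_lines` are hypothetical names (not Kafka or OS code). It only counts how many write calls reach the sink with and without a user-side buffer (the standard `io.BufferedWriter`). It does not measure disk cost.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory 'file' that counts write() calls reaching it."""

    def __init__(self):
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += b  # copy the bytes: the caller may reuse its buffer
        return len(b)


def append_lines(n, buffered=True, buffer_size=8192):
    """Append n small log lines; return (write calls seen by sink, bytes)."""
    raw = CountingRaw()
    if buffered:
        # Small writes accumulate in a user-side buffer and reach the
        # sink in a few large chunks; closing flushes the remainder.
        with io.BufferedWriter(raw, buffer_size=buffer_size) as out:
            for i in range(n):
                out.write(f"log line {i}\n".encode())
    else:
        # Every small write goes straight to the sink.
        for i in range(n):
            raw.write(f"log line {i}\n".encode())
    return raw.calls, bytes(raw.data)
```

With 1000 lines the buffered path reaches the sink only a handful of times, while the unbuffered path makes 1000 calls; the bytes written are identical either way.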


2. Did you read the link to the post about mmapping? The guy knows
more about it than I'll probably ever know, and he says it's not that
simple. He's saying mmap is not a magic answer to anything.
This bit may be relevant: "APPLICATION BUFFERS WHICH EASILY FIT IN THE
L2 CACHE COST VIRTUALLY NOTHING ON A MODERN CPU!" (NB: the post is from
2004, and since the speed gap between CPU/cache and RAM/disk has grown
larger, it may be even more true now.)
Kafka messages can be largish so perhaps that suggests why they use it
for data files.
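To make the copy-versus-mmap contrast concrete, here's a small Python sketch that reads the same fixed-width record from a file two ways: an explicit read, which copies bytes into a user-space buffer, and a memory map via the standard `mmap` module. The 8-byte record layout and the function names are hypothetical, chosen for illustration; this is not Kafka's actual index format, and it says nothing about which approach is faster.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical 8-byte record: two big-endian 32-bit ints.
RECORD = struct.Struct(">ii")


def write_records(path, pairs):
    """Write (a, b) pairs as consecutive fixed-width records."""
    with open(path, "wb") as f:
        for a, b in pairs:
            f.write(RECORD.pack(a, b))


def read_with_copy(path, k):
    """Explicit read: the k-th record is copied into a user-space buffer."""
    with open(path, "rb") as f:
        f.seek(k * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))


def read_with_mmap(path, k):
    """Memory map: the file's pages are mapped into the address space
    and the record is decoded directly from the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return RECORD.unpack_from(m, k * RECORD.size)
```

Both paths return the same record; which one is cheaper depends on access pattern, file size and hardware, which is the point the post makes.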

If this comes across as a bit rude, that wasn't intended. I can't
really answer your question, just suggest a bit of reading and some
guesswork.

cheers

jan

On 13/02/2018, YuFeng Shen <v...@hotmail.com> wrote:
> If it is as you say, why do the index files use memory-mapped files?
>
> ________________________________
> From: jan <rtm4...@googlemail.com>
> Sent: Monday, February 12, 2018 9:26 PM
> To: users@kafka.apache.org
> Subject: Re: why kafka index file use memory mapped files ,however log file
> doesn't
>
> A human-readable log file is likely to have much less activity in it
> (it was a year ago that I was using Kafka, and we could eat up gigs for
> the data files while the log files were only a few megs). So there's
> perhaps little to gain.
>
> Also, if the power isn't pulled and the OS doesn't crash, log messages
> will, I guess, be buffered by the OS and then written out as a full
> buffer, or perhaps every nth tick if the buffer fills up very slowly.
> So it's still reasonably efficient.
>
> Adding a few hundred context switches a second for the human-readable
> log probably isn't a big deal. I remember seeing several tens of
> thousands per second when using Kafka (although, to be fair, other
> processes were also running on those multicore machines). I guess the
> logging overhead is down in the noise, though that's just a guess.
>
> Also, I remember reading a rather surprising post about mmapping. I
> just found it:
> <https://lists.freebsd.org/pipermail/freebsd-questions/2004-June/050371.html>.
> Snippets:
> "There are major hardware related overheads to the use of mmap(), on
> *ANY* operating system, that cannot be circumvented"
> -and-
> "you are assuming that copying is always bad (it isn't), that copying
> is always horrendously expensive (it isn't), that memory mapping is
> always cheap (it isn't cheap),"
>
> A bit vague on my part, but HTH anyway
>
> jan
>
>
> On 12/02/2018, YuFeng Shen <v...@hotmail.com> wrote:
>> Hi jan,
>>
>> I think the reason would be the same as why the index files use
>> memory-mapped files.
>>
>> Since a memory-mapped file avoids copying data between the kernel and
>> user buffer space, it can improve the performance of index file I/O
>> operations, right? If so, why can't the log files get the same
>> performance improvement as the memory-mapped index files?
>>
>>
>> Jacky
>>
>>
>> ________________________________
>> From: jan <rtm4...@googlemail.com>
>> Sent: Saturday, February 10, 2018 8:33 PM
>> To: users@kafka.apache.org
>> Subject: Re: why kafka index file use memory mapped files ,however log
>> file
>> doesn't
>>
>> I'm not sure I can answer your question, but may I pose another in
>> return: why do you feel having a memory mapped log file would be a
>> good thing?
>>
>>
>> On 09/02/2018, YuFeng Shen <v...@hotmail.com> wrote:
>>> Hi Experts,
>>>
>>> We know that Kafka uses memory-mapped files for its index files;
>>> however, its log files don't use memory-mapped files.
>>>
>>> May I know why the index files use memory-mapped files while the log
>>> files don't use the same technique?
>>>
>>>
>>> Jacky
>>>
>>
>
