Please check this:
https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/

It depends on your OS; please research those parameters and check what they
are set to on your current system. Based on those parameters, the background
syncing will be done by the OS. But I would recommend not completely stopping
the OS from asynchronously writing the file cache to disk, as that would
cause long pauses when it is finally forced to write because the file cache
has outgrown the available RAM.
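
For example, on a typical Linux system you can inspect and adjust these via
sysctl; the values below are illustrative placeholders, not a tuned
recommendation:

  # show the current writeback thresholds
  sysctl vm.dirty_background_ratio vm.dirty_ratio

  # illustrative values: start background writeback earlier so each flush
  # stays small; tune for your RAM size and workload
  sudo sysctl -w vm.dirty_background_ratio=5
  sudo sysctl -w vm.dirty_ratio=15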

On Fri, Aug 14, 2015 at 8:00 PM, Yuheng Du <yuheng.du.h...@gmail.com> wrote:

> Thank you Kishore, I see that the end-to-end latency may not be reduced by
> resetting the flush time manually.
>
> But if the default flush.ms is Long.MAX_VALUE, why do I see the disk usage
> of the brokers constantly increasing when the producer is pushing in data?
> Shouldn't that happen only once in a while? The OS page cache usage should
> not show up in the output of "watch df -h", am I correct?
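>
> For what it's worth, one way to watch on-disk usage and the page cache side
> by side, assuming the default log directory /tmp/kafka-logs:
>
>   watch -n 1 'df -h /tmp/kafka-logs; free -h'
>
> free's "cached" figure includes the page cache, while df reports the
> filesystem's block usage.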
>
> Thanks.
>
> On Fri, Aug 14, 2015 at 10:12 PM, Kishore Senji <kse...@gmail.com> wrote:
>
> > Actually, in 0.8.2 flush.ms & flush.messages are recommended to be left
> > at their defaults (Long.MAX_VALUE):
> > http://kafka.apache.org/documentation.html (search for flush.ms)
> >
> > The disk flush and the committed offset are two independent things. As
> > long as you have replication, the recommended thing is to leave the
> > flushing to the OS. But if you choose to flush manually, the interval at
> > which you flush may not influence the end-to-end latency from the
> > producer to the consumer; it can, however, influence the throughput of
> > the broker.
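> >
> > For reference, these map to the broker settings in server.properties; a
> > sketch, where leaving them unset keeps the Long.MAX_VALUE defaults (the
> > per-topic equivalents are flush.messages / flush.ms):
> >
> >   # commented out on purpose: unset leaves flushing to the OS
> >   #log.flush.interval.messages=9223372036854775807
> >   #log.flush.interval.ms=9223372036854775807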
> >
> > On Fri, Aug 14, 2015 at 9:20 AM Yuheng Du <yuheng.du.h...@gmail.com>
> > wrote:
> >
> > > So if I understand correctly, even if I delay flushing, the consumer
> > > will get the messages as soon as the broker receives them and puts them
> > > into the pagecache (assuming the producer doesn't wait for acks from
> > > the brokers)?
> > >
> > > And will decreasing the log.flush interval help reduce the latency
> > > between producer and consumer?
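> > >
> > > For reference, the flush interval can also be overridden per topic; a
> > > sketch with an assumed topic name and ZooKeeper address:
> > >
> > >   bin/kafka-topics.sh --zookeeper localhost:2181 --alter \
> > >     --topic mytopic --config flush.ms=1000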
> > >
> > > Thanks.
> > >
> > >
> > > On Fri, Aug 14, 2015 at 11:57 AM, Kishore Senji <kse...@gmail.com> wrote:
> > >
> > > > Thank you Gwen for correcting me. This document
> > > > (https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Replication),
> > > > in its "Writes" section, also specifies the same thing as you have
> > > > mentioned. One thing is not clear to me: what happens when the
> > > > replicas add the message to memory but the leader fails before acking
> > > > to the producer? Later, if one of those replicas is chosen to be the
> > > > leader for the partition, it will advance the HW to its LEO (which
> > > > has the message). The producer can then resend the same message
> > > > thinking it failed, and there will be a duplicate message. Is my
> > > > understanding correct here?
> > > >
> > > > On Thu, Aug 13, 2015 at 10:50 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > >
> > > > > On Thu, Aug 13, 2015 at 4:10 PM, Kishore Senji <kse...@gmail.com> wrote:
> > > > >
> > > > > > Consumers can only fetch data up to the committed offset, and the
> > > > > > reason is reliability and durability on a broker crash (some
> > > > > > consumers might get the new data and some may not, as the data is
> > > > > > not yet committed and would be lost). Data will be committed when
> > > > > > it is flushed. So if you delay the flushing, consumers won't get
> > > > > > those messages until that time.
> > > > > >
> > > > >
> > > > > As far as I know, this is not accurate.
> > > > >
> > > > > A message is considered committed when all ISR replicas received it
> > > > > (this much is documented). This doesn't need to include writing to
> > > > > disk, which will happen asynchronously.
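> > > > >
> > > > > So the producer-side durability knob is acks; a minimal
> > > > > producer.properties sketch for the Java producer, with assumed
> > > > > broker addresses:
> > > > >
> > > > >   bootstrap.servers=broker1:9092,broker2:9092
> > > > >   # "all" (equivalently -1) waits until the message is committed,
> > > > >   # i.e. received by all in-sync replicas; no disk flush required
> > > > >   acks=all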
> > > > >
> > > > >
> > > > > >
> > > > > > Even though you flush periodically based on
> > > > > > log.flush.interval.messages and log.flush.interval.ms, if the
> > > > > > segment file is in the pagecache, the consumers will still benefit
> > > > > > from that pagecache and the OS wouldn't read it again from disk.
> > > > > >
> > > > > > On Thu, Aug 13, 2015 at 2:54 PM Yuheng Du <yuheng.du.h...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > As I understand it, Kafka brokers will store the incoming
> > > > > > > messages in the pagecache as much as possible and then flush
> > > > > > > them to disk, right?
> > > > > > >
> > > > > > > But in my experiment, where 90 producers are publishing data
> > > > > > > into 6 brokers, I see that the log directory on disk where the
> > > > > > > broker stores the data is constantly growing (every second). So
> > > > > > > why is this happening? Does it have to do with the default
> > > > > > > "log.flush.interval" setting?
> > > > > > >
> > > > > > > I want the broker to write to disk less often when serving some
> > > > > > > on-line consumers, to reduce latency. I tested the disk write
> > > > > > > speed on my broker; it is around 110 MB/s.
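> > > > > > >
> > > > > > > Sequential write throughput is commonly measured with something
> > > > > > > like the following; the test path is an assumption, and
> > > > > > > conv=fdatasync makes dd include the final flush in its timing:
> > > > > > >
> > > > > > >   dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 conv=fdatasync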
> > > > > > >
> > > > > > > Thanks for any replies.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
