Thanks for reporting the results. Maybe you could submit a PR that updates
the ops section?

https://github.com/apache/kafka/blob/trunk/docs/ops.html

Ismael

On Fri, Jul 21, 2017 at 2:49 PM, Ovidiu-Cristian MARCU <
ovidiu-cristian.ma...@inria.fr> wrote:

> After some tuning, I got better results. What I changed, as suggested:
>
> dirty_ratio = 10 (previously 20)
> dirty_background_ratio = 3 (previously 10)
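> 
> For reference, a minimal sketch of how these two values could be read and
> applied at runtime on Linux (not from the original thread; it assumes root
> access and the standard /proc/sys/vm files, and the change does not persist
> across reboots):
> 
>     # Sketch: read and, if needed, set vm.dirty_ratio / vm.dirty_background_ratio.
>     # Writing these files requires root; values are percentages of total memory.
>     from pathlib import Path
> 
>     SETTINGS = {
>         "/proc/sys/vm/dirty_ratio": 10,
>         "/proc/sys/vm/dirty_background_ratio": 3,
>     }
> 
>     for path, wanted in SETTINGS.items():
>         p = Path(path)
>         current = int(p.read_text().strip())
>         print(f"{path} = {current}")
>         if current != wanted:
>             p.write_text(str(wanted))  # takes effect immediately, lost on reboot
>             print(f"  -> set to {wanted}")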
>
> The result is that disk read I/O is almost completely zero (I have enough
> cache; the consumer is keeping up with the producer).
>
> - producer throughput remains constant ~ 400K/s;
> - consumer throughput (a Flink app) stays in the interval [300K/s, 500K/s]
> even when the cache is filled (there are some variations, but they are not
> influenced by the system’s cache);
>
> I don’t know whether Kafka’s documentation already says anything about this,
> but it could be added somewhere in the documentation if you can reproduce my
> tests and consider it useful.
>
> Thanks,
> Ovidiu
>
> > On 21 Jul 2017, at 01:57, Apurva Mehta <apu...@confluent.io> wrote:
> >
> > Hi Ovidiu,
> >
> > The see-saw behavior is inevitable with Linux when you have concurrent
> > reads and writes. However, tuning the following two settings may help
> > achieve more stable performance (from Jay's link):
> >
> > dirty_ratio
> > Defines a percentage value. Writeout of dirty data begins (via pdflush)
> > when dirty data comprises this percentage of total system memory. The
> > default value is 20. Red Hat recommends a slightly lower value of 15 for
> > database workloads.
> >
> > dirty_background_ratio
> > Defines a percentage value. Writeout of dirty data begins in the background
> > (via pdflush) when dirty data comprises this percentage of total memory.
> > The default value is 10. For database workloads, Red Hat recommends a
> > lower value of 3.
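> >
> > As a quick check against the numbers above, the current values can be read
> > from /proc (a sketch only; 15 and 3 are the Red Hat recommendations quoted
> > above, not Kafka-specific defaults):
> >
> >     # Sketch: compare current vm.* settings to the values quoted above.
> >     recommended = {"dirty_ratio": 15, "dirty_background_ratio": 3}
> >
> >     for name, limit in recommended.items():
> >         with open(f"/proc/sys/vm/{name}") as f:
> >             current = int(f.read())
> >         status = "ok" if current <= limit else f"above the suggested {limit}"
> >         print(f"vm.{name} = {current} ({status})")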
> >
> > Thanks,
> > Apurva
> >
> >
> > On Thu, Jul 20, 2017 at 12:25 PM, Ovidiu-Cristian MARCU <
> > ovidiu-cristian.ma...@inria.fr> wrote:
> > Yes, I’m using Debian Jessie 2.6 installed on this hardware [1].
> >
> > It is also my understanding that Kafka relies on the system’s page cache
> > (Linux in this case), which uses Clock-Pro as its page replacement policy
> > and does complex things for general workloads. I will check the tuning
> > parameters, but I was hoping for some advice on avoiding disk reads
> > entirely, considering the system's cache is used completely by Kafka and is
> > huge (~128GB) - that is, on tuning Clock-Pro to be smarter when used for
> > streaming access patterns.
> >
> > Thanks,
> > Ovidiu
> >
> > [1] https://www.grid5000.fr/mediawiki/index.php/Rennes:Hardware#Dell_Poweredge_R630_.28paravance.29
> >
> > > On 20 Jul 2017, at 21:06, Jay Kreps <j...@confluent.io> wrote:
> > >
> > > I suspect this is on Linux, right?
> > >
> > > The way Linux works is that it uses a percentage of memory to buffer new
> > > writes; at a certain point it decides it has too much buffered data and
> > > gives high priority to writing that out. The good news about this is that
> > > the writes are very linear, well laid out, and high-throughput. The
> > > problem is that it leads to a bit of see-saw behavior.
> > >
> > > Now obviously the drop in performance isn't wrong. When your disk is
> > > writing data out it is doing work, and obviously the read throughput will
> > > be higher when you are just reading and not writing than when you are
> > > doing both reading and writing simultaneously. So you can't get the
> > > no-writing performance when you are also writing (unless you add I/O
> > > capacity).
> > >
> > > But still, these big see-saws in performance are not ideal. You'd rather
> > > have more constant performance all the time than have Linux bounce back
> > > and forth between writing nothing and frantically writing full bore.
> > > Fortunately, Linux provides a set of pagecache tuning parameters that let
> > > you control this a bit.
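> > >
> > > One way to see this cycle while the benchmark runs is to watch the dirty
> > > and writeback counters (a sketch, not from the thread; the Dirty and
> > > Writeback fields in /proc/meminfo are standard on Linux, reported in kB):
> > >
> > >     # Sketch: sample Dirty/Writeback from /proc/meminfo once per second.
> > >     import time
> > >
> > >     def meminfo_kb(field):
> > >         with open("/proc/meminfo") as f:
> > >             for line in f:
> > >                 if line.startswith(field + ":"):
> > >                     return int(line.split()[1])  # value reported in kB
> > >         return 0
> > >
> > >     while True:
> > >         dirty = meminfo_kb("Dirty")
> > >         writeback = meminfo_kb("Writeback")
> > >         print(f"Dirty={dirty} kB  Writeback={writeback} kB")
> > >         time.sleep(1)
> > >
> > > A spike in Dirty followed by a burst of Writeback (coinciding with the
> > > read I/O dip) is the see-saw described above.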
> > >
> > > I think these docs cover some of the parameters:
> > > https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-tunables.html
> > >
> > > -Jay
> > >
> > > On Thu, Jul 20, 2017 at 10:24 AM, Ovidiu-Cristian MARCU <
> > > ovidiu-cristian.ma...@inria.fr> wrote:
> > > Hi guys,
> > >
> > > I’m relatively new to Kafka’s world. I have an issue I describe below;
> > > maybe you can help me understand this behaviour.
> > >
> > > I’m running a benchmark using the following setup: one producer sends
> > > data to a topic and, concurrently, one consumer pulls it and writes it to
> > > another topic.
> > > Measuring the consumer throughput, I observe values around 500K records/s
> > > only until the system’s cache gets filled - from that moment the consumer
> > > throughput drops to ~200K (2.5 times lower).
> > > Looking at disk usage, I observe disk read I/O which corresponds to the
> > > moment the consumer throughput drops.
> > > After some time, I kill the producer and immediately the consumer
> > > throughput goes back up to the initial ~500K records/s.
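> > >
> > > For context, a minimal sketch of how per-second consumer throughput could
> > > be sampled (this assumes the kafka-python client and a hypothetical topic
> > > name; it is not the actual benchmark code):
> > >
> > >     # Sketch: count records per second consumed from a Kafka topic.
> > >     import time
> > >     from kafka import KafkaConsumer  # kafka-python client
> > >
> > >     consumer = KafkaConsumer(
> > >         "input",                          # hypothetical topic name
> > >         bootstrap_servers="localhost:9092",
> > >         auto_offset_reset="earliest",
> > >     )
> > >
> > >     count, window_start = 0, time.time()
> > >     for record in consumer:
> > >         count += 1
> > >         now = time.time()
> > >         if now - window_start >= 1.0:
> > >             print(f"{count / (now - window_start):.0f} records/s")
> > >             count, window_start = 0, now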
> > >
> > > What can I do to avoid this throughput drop?
> > >
> > > Attached is an image showing disk I/O and CPU usage. I have about 128GB
> > > of RAM on that server, which gets filled at time ~2300.
> > >
> > > Thanks,
> > > Ovidiu
> > >
> > > <consumer-throughput-drops.png>
> > >
> >
> >
>
>
