Hey all,

Just wanted to confirm: this was exactly our issue.  Thanks so much, Todd and
Matt; our cluster is much more stable now.

Apache Kafka folks:  I know 0.8.3 is slated to come out soon, but this is a
pretty serious bug.  I would think it would merit a minor release just to
get it out there, so that others don't run into this problem.  0.8.2.1
basically does not work at scale with snappy compression.  I will add a
comment to https://issues.apache.org/jira/browse/KAFKA-2189 noting this too.
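
For anyone who wants a number for the size of the regression, here is a rough
Python sketch that averages the hourly .log size figures from the original
message quoted below, split at the "Upgrade to 0.8.2.1 starts here" marker.
It assumes the figures are bytes (the message does not state the unit), and
the variable names are only illustrative.

```python
# Hourly .log size figures copied from the original message quoted below.
# Assumption: the values are bytes; the message does not state the unit.
before = [  # 0.8.1.1, 2015-08-10T04 through T14
    38119109383, 46172089174, 46172182745, 53151490032,
    53151892928, 55836248198, 57984054557, 63353197416,
    68184938548, 69259218741, 79567698089,
]
after = [  # 0.8.2.1, 2015-08-10T15 through 2015-08-11T01
    133643184876, 168515916825, 181394338213, 177097927553,
    183530782549, 178706680082, 178712665924, 171741495606,
    169049665348, 163682183241, 165292426510,
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Ratio of the average size after the upgrade to the average before it.
ratio = mean(after) / mean(before)

print(f"mean before: {mean(before) / 1e9:.1f} GB")
print(f"mean after:  {mean(after) / 1e9:.1f} GB")
print(f"growth:      {ratio:.2f}x")
```

The ratio comes out just under 3x, which lines up with the roughly tripled
network and disk IO described in the original message.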

Thanks so much!
-Andrew

On Tue, Aug 11, 2015 at 3:43 PM, Matthew Bruce <mbr...@blackberry.com>
wrote:

> Hi Andrew,
>
>
>
> I work with Todd and did our 0.8.2.1 testing with him.  I believe that the
> Kafka 0.8.x brokers recompress the messages once they receive them, in
> order to assign the offsets to the messages (see the ‘Compression in Kafka’
> section of:
> http://nehanarkhede.com/2013/03/28/compression-in-kafka-gzip-or-snappy/).
> I expect that you will see an improvement with Snappy 1.1.1.7  (FWIW, our
> load generator’s version of Snappy didn’t change between our 0.8.1.1 and
> 0.8.2.1 testing, and we still saw the IO hit on the broker side, which
> seems to confirm this).
>
>
>
> Thanks,
>
> Matt Bruce
>
>
>
>
>
> *From:* Andrew Otto [mailto:ao...@wikimedia.org]
> *Sent:* Tuesday, August 11, 2015 3:15 PM
> *To:* users@kafka.apache.org
> *Cc:* Dan Andreescu <dandree...@wikimedia.org>; Joseph Allemandou <
> jalleman...@wikimedia.org>
> *Subject:* Re: 0.8.2.1 upgrade causes much more IO
>
>
>
> Hi Todd,
>
>
>
> We are using snappy!  And we are using version 1.1.1.6 as of our upgrade
> to 0.8.2.1 yesterday.  However, as far as I can tell, that is only relevant
> for Java producers, right?  Our main producers use librdkafka (the Kafka C
> lib) to produce, and in doing so use a built-in C version of snappy[1].
>
>
>
> Even so, your issue sounds very similar to mine, and I don’t have a full
> understanding of how brokers deal with compression, so I have updated the
> snappy java version to 1.1.1.7 on one of our brokers.  We’ll have to wait a
> while to see if the log sizes are actually smaller for data written to this
> broker.
>
>
>
> Thanks!
>
>
>
>
>
>
>
>
>
> [1] https://github.com/edenhill/librdkafka/blob/0.8.5/src/snappy.c
>
> On Aug 11, 2015, at 12:58, Todd Snyder <tsny...@blackberry.com> wrote:
>
>
>
> Hi Andrew,
>
>
>
> Are you using Snappy Compression by chance?  When we tested the 0.8.2.1
> upgrade initially we saw similar results and tracked it down to a problem
> with Snappy version 1.1.1.6 (
> https://issues.apache.org/jira/browse/KAFKA-2189).  We’re running with
> Snappy 1.1.1.7 now and the performance is back to where it used to be.
>
>
>
>
>
> Sent from my BlackBerry 10 smartphone on the TELUS network.
>
> *From: *Andrew Otto
>
> *Sent: *Tuesday, August 11, 2015 12:26 PM
>
> *To: *users@kafka.apache.org
>
> *Reply To: *users@kafka.apache.org
>
> *Cc: *Dan Andreescu; Joseph Allemandou
>
> *Subject: *0.8.2.1 upgrade causes much more IO
>
>
>
> Hi all!
>
>
>
> Yesterday I did a production upgrade of our 4 broker Kafka cluster from
> 0.8.1.1 to 0.8.2.1.
>
>
>
> When we did so, we were running our (varnishkafka) producers with
> request.required.acks = -1.  After switching to 0.8.2.1, producers saw
> produce response RTTs of >60 seconds.  I then switched to
> request.required.acks = 1, and producers settled down.  However, we then
> started seeing flapping ISRs about every 10 minutes.  We run Camus every 10
> minutes.  If we disable Camus, then ISRs don’t flap.
>
>
>
> All of these issues seem to be a side effect of a larger problem.  The
> total amount of network and disk IO that Kafka brokers are doing after the
> upgrade to 0.8.2.1 has tripled.  We were previously seeing about 20 MB/s
> incoming on broker interfaces; 0.8.2.1 knocks this up to around 60 MB/s.
> Disk writes have tripled accordingly.  Disk reads have also increased by a
> huge amount, although I suspect this is a consequence of more data flying
> around somehow dirtying the disk cache.
>
>
>
> You can see these changes in this dashboard:
> http://grafana.wikimedia.org/#/dashboard/db/kafka-0821-upgrade
>
>
>
> The upgrade started at around 2015-08-10 14:30, and was completed on all 4
> brokers within a couple of hours.
>
>
>
> Probably the most relevant is network rx_bytes on brokers.
>
>
>
>
>
>
>
> We looked at Kafka .log file sizes and noticed that file sizes are indeed
> much larger than they were before this upgrade:
>
>
>
> # 0.8.1.1
> 2015-08-10T04 38119109383
> 2015-08-10T05 46172089174
> 2015-08-10T06 46172182745
> 2015-08-10T07 53151490032
> 2015-08-10T08 53151892928
> 2015-08-10T09 55836248198
> 2015-08-10T10 57984054557
> 2015-08-10T11 63353197416
> 2015-08-10T12 68184938548
> 2015-08-10T13 69259218741
> 2015-08-10T14 79567698089
> # Upgrade to 0.8.2.1 starts here
> 2015-08-10T15 133643184876
> 2015-08-10T16 168515916825
> 2015-08-10T17 181394338213
> 2015-08-10T18 177097927553
> 2015-08-10T19 183530782549
> 2015-08-10T20 178706680082
> 2015-08-10T21 178712665924
> 2015-08-10T22 171741495606
> 2015-08-10T23 169049665348
> 2015-08-11T00 163682183241
> 2015-08-11T01 165292426510
>
>
>
>
>
> Aside from the request.required.acks change I mentioned above, we haven’t
> made any config changes on brokers, producers, or consumers.  Our
> server.properties file is here:
> https://gist.github.com/ottomata/cdd270102287661c176a
>
>
>
> Has anyone seen this before?  What could be the cause of more data here?
> Perhaps there is some compression config change that we missed that is
> causing this data to be sent or saved uncompressed?  (Sent uncompressed is
> unlikely, as we would probably notice a larger network change on the
> producers than we do, unless I’m looking at that wrong right now… :))  Is
> there a quick way to tell if the data is compressed?
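
On the "quick way to tell" question, one option is to look at the attributes
byte of the first entry in a segment file. Below is a minimal Python sketch.
It assumes the 0.8.x on-disk layout (8-byte offset, 4-byte size, 4-byte CRC,
1-byte magic, 1-byte attributes, all big-endian) and assumes the codec id sits
in the low three bits of the attributes byte (0 none, 1 gzip, 2 snappy,
3 lz4). The function name and the command-line handling are hypothetical;
please check the layout against your broker version before trusting the
result.

```python
import struct
import sys

# Codec ids in the low bits of the attributes byte (assumed 0.8.x values).
CODECS = {0: "none", 1: "gzip", 2: "snappy", 3: "lz4"}

# Assumed entry header: offset (int64), size (int32), crc (uint32),
# magic (int8), attributes (int8), all big-endian with no padding.
HEADER = struct.Struct(">qiIbb")


def first_entry_codec(data):
    """Return the codec name recorded on the first log entry in `data`."""
    if len(data) < HEADER.size:
        raise ValueError("not enough bytes for one log entry header")
    _offset, _size, _crc, _magic, attributes = HEADER.unpack_from(data)
    return CODECS.get(attributes & 0x07, "unknown")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python codec_check.py /path/to/segment.log
    with open(sys.argv[1], "rb") as segment:
        print(first_entry_codec(segment.read(HEADER.size)))
```

If the layout assumption holds, a compressed message set shows up as a single
wrapper entry with the codec set, so reading only the first entry of a segment
should be enough for a quick check.
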
>
>
>
>
>
> Thanks!
>
> -Andrew Otto
>
>
>
>
>
> ---------------------------------------------------------------------
> This transmission (including any attachments) may contain confidential
> information, privileged material (including material protected by the
> solicitor-client or other applicable privileges), or constitute non-public
> information. Any use of this information by anyone other than the intended
> recipient is prohibited. If you have received this transmission in error,
> please immediately reply to the sender and delete this information from
> your system. Use, dissemination, distribution, or reproduction of this
> transmission by unintended recipients is not authorized and may be unlawful.
>
>
>
