GitHub user original-brownbear opened a pull request: https://github.com/apache/kafka/pull/2570
KAFKA-4196: Improved Test Stability by Disabling ZK Fsync and Fixed KafkaAPI Error Response

This addresses https://issues.apache.org/jira/browse/KAFKA-4196

Every failure I saw from this test came with the warning below. I reproduced the instability by putting the system under load:

```sh
[2017-02-18 16:17:42,892] WARN fsync-ing the write ahead log in SyncThread:0 took 20632ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide (org.apache.zookeeper.server.persistence.FileTxnLog:338)
```

In tests, ZK sometimes stalls for several seconds. This happens in other tests too, but for some reason it is very frequent in this one. Here the stall (about 20s) lasted longer than the test's timeout. The test waits only 15s (`org.apache.kafka.test.TestUtils#DEFAULT_MAX_WAIT_MS`) for the path `/admin/delete_topic/topic` to be deleted.

As far as I know, the only portable way to really fix this is to turn off ZK fsyncing, which unit tests don't really need anyway. The problem should mainly hit ext3 users. I did that as described in the ZooKeeper Administrator's Guide (https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html) by setting:

```scala
sys.props.put("zookeeper.observer.syncEnabled", "false")
```

In my opinion, this should also help general test performance.
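Below is a minimal Scala sketch of one way to apply such test-only defaults once per test JVM, before any embedded ZooKeeper server starts. `ZkTestDefaults` is a hypothetical helper, not part of Kafka. It uses `getOrElseUpdate`, so a value passed explicitly with `-D` still wins. The sketch sets the property from the description and also sets `zookeeper.forceSync` to `no`. The ZooKeeper Administrator's Guide documents `forceSync` as the setting that stops requiring transaction-log updates to be synced to media. By contrast, it documents `zookeeper.observer.syncEnabled` as controlling whether observers log transactions and write snapshots. Treat the exact set of properties as an assumption to check against the ZooKeeper version in use.

```scala
// Hypothetical helper for test harnesses; not part of Kafka or ZooKeeper.
object ZkTestDefaults {
  // Property used in the pull request description.
  val ObserverSyncEnabled = "zookeeper.observer.syncEnabled"
  // Documented in the ZooKeeper Administrator's Guide: "no" means updates are
  // not required to be synced to the transaction log's media.
  val ForceSync = "zookeeper.forceSync"

  /** Sets each default only if the property is not already defined (e.g. via -D)
    * and returns the effective value of every property it manages. */
  def install(): Map[String, String] = {
    val defaults = Map(ObserverSyncEnabled -> "false", ForceSync -> "no")
    defaults.map { case (key, value) => key -> sys.props.getOrElseUpdate(key, value) }
  }
}
```

ZooKeeper may read these properties only once, when the relevant classes are first loaded. Calling something like `ZkTestDefaults.install()` from shared harness setup is therefore safer than calling it inside an individual test.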
I also fixed a problem where `KafkaApis` did not log the resulting error properly. I only ever observed this here. No type parameter was given to `request.body` in the changed line:

```scala
error("Error when handling request %s".format(request.body), e)
```

That line then threw:

```sh
java.lang.ClassCastException: Expected request with type class scala.runtime.Nothing$, but found class org.apache.kafka.common.requests.UpdateMetadataRequest
	at kafka.network.RequestChannel$Request.body(RequestChannel.scala:118)
	at kafka.server.KafkaApis$$anonfun$handle$4.apply(KafkaApis.scala:120)
	at kafka.server.KafkaApis$$anonfun$handle$4.apply(KafkaApis.scala:120)
	at kafka.utils.Logging$class.error(Logging.scala:105)
	at kafka.server.KafkaApis.error(KafkaApis.scala:56)
	at kafka.server.KafkaApis.handle(KafkaApis.scala:120)
	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:64)
	at java.lang.Thread.run(Thread.java:745)
```

I added the type hint there. With it (and without the fsync fix), the errors were logged properly :)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/original-brownbear/kafka KAFKA-4196

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2570.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2570

----
commit 99dd9c62f63960bac35effd6a2514cd5ba61d66a
Author: Armin Braun <m...@obrown.io>
Date:   2017-02-18T16:24:31Z

    KAFKA-4196 Improved Test Stability by Disabling ZK Fsync and Fixed KafkaAPI Error Response

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
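The `ClassCastException` quoted in the pull request comes from Scala type inference. A generic accessor such as `body[T]` may be called without a type argument when `T` appears only in its result type and in an implicit `ClassTag`. In that case scalac infers `T = Nothing`, so the runtime check compares the request against `scala.runtime.Nothing$`. The sketch below reproduces this in miniature. `AbstractRequest`, `UpdateMetadataRequest`, `Request` and `BodyTypeParamDemo` are hypothetical stand-ins, not Kafka's real classes, and `body` only approximates the check suggested by the stack trace.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-ins for Kafka's request classes.
abstract class AbstractRequest
final class UpdateMetadataRequest extends AbstractRequest {
  override def toString: String = "UpdateMetadataRequest"
}

// Hypothetical, simplified model of a typed accessor like RequestChannel.Request#body.
final class Request(payload: AbstractRequest) {
  def body[T <: AbstractRequest](implicit ct: ClassTag[T]): T =
    if (ct.runtimeClass.isInstance(payload)) payload.asInstanceOf[T]
    else throw new ClassCastException(
      s"Expected request with type ${ct.runtimeClass}, but found ${payload.getClass}")
}

object BodyTypeParamDemo {
  val request = new Request(new UpdateMetadataRequest)

  // Buggy: no type argument, so T is inferred as Nothing and the
  // isInstance check against scala.runtime.Nothing$ fails at runtime.
  def buggyMessage(): String =
    "Error when handling request %s".format(request.body)

  // Fixed: pass the common supertype explicitly, so any request passes the check.
  def fixedMessage(): String =
    "Error when handling request %s".format(request.body[AbstractRequest])

  def main(args: Array[String]): Unit = {
    println(fixedMessage())
    try buggyMessage()
    catch { case e: ClassCastException => println(e.getMessage) }
  }
}
```

Passing the supertype keeps the log statement independent of the concrete request type. That matters here because the error handler runs for every kind of request.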