[ https://issues.apache.org/jira/browse/KAFKA-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506528#comment-16506528 ]

ASF GitHub Bot commented on KAFKA-7022:
---------------------------------------

rajinisivaram opened a new pull request #5167: KAFKA-7022: Validate that 
log.segment.bytes is big enough for one batch
URL: https://github.com/apache/kafka/pull/5167
 
 
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
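
For reference, a minimal sketch of the kind of check the PR title describes, assuming the
validation compares segment.bytes against the largest batch the broker may have to append
(for example max.message.bytes); the names below are illustrative, and the actual logic is
in the PR linked above.

{code:scala}
import org.apache.kafka.common.config.ConfigException

// Illustrative sketch only: reject a segment.bytes value that cannot hold a
// single record batch of up to maxMessageBytes bytes.
object SegmentBytesValidation {
  def validate(segmentBytes: Int, maxMessageBytes: Int): Unit = {
    if (segmentBytes < maxMessageBytes)
      throw new ConfigException(
        s"segment.bytes ($segmentBytes) must be at least as large as " +
        s"max.message.bytes ($maxMessageBytes) so that one batch fits in a segment")
  }
}
{code}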
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Setting a very small segment.bytes can cause ReplicaFetcherThread to crash 
> and in turn high number of under-replicated partitions
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7022
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7022
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.0.1
>            Reporter: Koelli Mungee
>            Assignee: Rajini Sivaram
>            Priority: Major
>
> The topic configuration segment.bytes was changed to 14 bytes using the alter 
> command. This caused the ReplicaFetcher threads to die with the following 
> exception:
> {code:java}
> [2018-06-07 21:02:15,669] ERROR [ReplicaFetcher replicaId=7, leaderId=9, 
> fetcherId=0] Error due to (kafka.server.ReplicaFetcherThread)
> kafka.common.KafkaException: Error processing data for partition test-11 
> offset 2362
>         at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:204)
>         at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:169)
>         at scala.Option.foreach(Option.scala:257)
>         at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:169)
>         at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:166)
>         at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>         at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:166)
>         at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:166)
>         at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:166)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
>         at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:164)
>         at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> Caused by: kafka.common.KafkaException: Trying to roll a new log segment for 
> topic partition test-11 with start offset 2362 while it already exists.
>         at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1349)
>         at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1316)
>         at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
>         at kafka.log.Log.roll(Log.scala:1316)
>         at kafka.log.Log.kafka$log$Log$$maybeRoll(Log.scala:1303)
>         at kafka.log.Log$$anonfun$append$2.apply(Log.scala:726)
>         at kafka.log.Log$$anonfun$append$2.apply(Log.scala:640)
>         at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
>         at kafka.log.Log.append(Log.scala:640)
>         at kafka.log.Log.appendAsFollower(Log.scala:623)
>         at 
> kafka.cluster.Partition$$anonfun$appendRecordsToFollower$1.apply(Partition.scala:560)
>         at 
> kafka.cluster.Partition$$anonfun$appendRecordsToFollower$1.apply(Partition.scala:560)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
>         at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:256)
>         at 
> kafka.cluster.Partition.appendRecordsToFollower(Partition.scala:559)
>         at 
> kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:112)
>         at 
> kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:43)
>         at 
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:183)
>         ... 13 more
> [2018-06-07 21:02:15,669] INFO [ReplicaFetcher replicaId=7, leaderId=9, 
> fetcherId=0] Stopped (kafka.server.ReplicaFetcherThread)
> {code}
> In order to fix the issue, the topic configuration must be changed back to a 
> reasonable value, and brokers whose ReplicaFetcher threads died must be 
> restarted one at a time to recover the under-replicated partitions. 
> A value like 14 bytes is too small to store a message in the log segment. An 
> ls -al of the topic partition directory would look something like:
> {code:java}
> -rw-r--r--. 1 root root 10M Jun 7 21:53 00000000000000002362.index 
> -rw-r--r--. 1 root root 0 Jun 7 21:02 00000000000000002362.log 
> -rw-r--r--. 1 root root 10M Jun 7 21:53 00000000000000002362.timeindex 
> -rw-r--r--. 1 root root 4 Jun 7 21:53 leader-epoch-checkpoint
> {code}
> It would be good to add a check to prevent this configuration from being set 
> to such a small value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
