On 2019/06/27 15:39:29, Jan Kosecki <jan.koseck...@gmail.com> wrote: 
> Hi,
> 
> I have a hyperledger fabric cluster that uses a cluster of 3 kafka nodes +
> 3 zookeepers.
> Fabric doesn't support pruning yet so any change to recorded offsets is
> detected by it and it fails to reconnect to kafka cluster.
> Today morning all kafka nodes have gone offline and then slowly restarted.
> After that fabric doesn't work anymore. After checking the logs I've learnt
> that fabric has recorded last offset of 13356
> 
> 2019-06-27 10:34:51.325 UTC [orderer/consensus/kafka] newChain -> INFO 144
> [channel: audit] Starting chain with last persisted offset 13356 and last
> recorded block 2881
> 
> ,however when I compared it with kafka logs of the first node that started
> (kafka-0), I've found out that, although initially highwater mark was
> around this value, for some reason, during reboot, the node truncated the
> topic and I've lost all messages.
> 
> The topic audit has 1 partition and replication factor of 3.
> I've added logs from kafka-0 that contain any reference to audit topic.
> 
> Any suggestions, why the topic's offset has been truncated to 1?
> 
> Thanks in advance,
> Jan
> 

Failed to attach logs.
Fixing it here:

[2019-06-27 08:47:06,967] WARN [Log partition=audit-0, 
dir=/var/lib/kafka/data/topics] Found a corrupted index file corresponding to 
log file /var/lib/kafka/data/topics/audit-0/00000000000000012225.log due to 
Corrupt time index found, time index file 
(/var/lib/kafka/data/topics/audit-0/00000000000000012225.timeindex) has 
non-zero size but the last timestamp is 0 which is less than the first 
timestamp 1561534356375}, recovering segment and rebuilding index files... 
(kafka.log.Log)
[2019-06-27 08:47:06,967] INFO [Log partition=audit-0, 
dir=/var/lib/kafka/data/topics] Loading producer state till offset 12225 with 
message format version 2 (kafka.log.Log)
[2019-06-27 08:47:06,967] INFO [ProducerStateManager partition=audit-0] Loading 
producer state from snapshot file 
'/var/lib/kafka/data/topics/audit-0/00000000000000012225.snapshot' 
(kafka.log.ProducerStateManager)
[2019-06-27 08:47:06,997] INFO [ProducerStateManager partition=audit-0] Writing 
producer snapshot at offset 13358 (kafka.log.ProducerStateManager)
[2019-06-27 08:47:06,997] INFO [Log partition=audit-0, 
dir=/var/lib/kafka/data/topics] Recovering unflushed segment 12225 
(kafka.log.Log)
[2019-06-27 08:47:06,997] INFO [Log partition=audit-0, 
dir=/var/lib/kafka/data/topics] Loading producer state till offset 12225 with 
message format version 2 (kafka.log.Log)
[2019-06-27 08:47:07,001] INFO [ProducerStateManager partition=audit-0] Loading 
producer state from snapshot file 
'/var/lib/kafka/data/topics/audit-0/00000000000000012225.snapshot' 
(kafka.log.ProducerStateManager)
[2019-06-27 08:47:07,016] INFO [ProducerStateManager partition=audit-0] Writing 
producer snapshot at offset 13358 (kafka.log.ProducerStateManager)
[2019-06-27 08:47:07,021] INFO [Log partition=audit-0, 
dir=/var/lib/kafka/data/topics] Loading producer state till offset 13358 with 
message format version 2 (kafka.log.Log)
[2019-06-27 08:47:07,022] INFO [ProducerStateManager partition=audit-0] Loading 
producer state from snapshot file 
'/var/lib/kafka/data/topics/audit-0/00000000000000013358.snapshot' 
(kafka.log.ProducerStateManager)
[2019-06-27 08:47:07,022] INFO [Log partition=audit-0, 
dir=/var/lib/kafka/data/topics] Completed load of log with 2 segments, log 
start offset 0 and log end offset 13358 in 61 ms (kafka.log.Log)
[2019-06-27 08:49:19,360] INFO [Controller id=0] Current list of topics in the 
cluster: Set(audit) (kafka.controller.KafkaController)
[2019-06-27 08:49:19,382] INFO [Controller id=0] List of topics ineligible for 
deletion: audit(kafka.controller.KafkaController)
[2019-06-27 08:49:20,209] INFO [ReplicaStateMachine controllerId=0] Started 
replica state machine with initial state -> 
Map([Topic=audit,Partition=0,Replica=0] -> OnlineReplica, 
[Topic=audit,Partition=0,Replica=2] -> OnlineReplica, 
[Topic=audit,Partition=0,Replica=1] -> ReplicaDeletionIneligible) 
(kafka.controller.ReplicaStateMachine)
[2019-06-27 08:49:22,943] INFO [PartitionStateMachine controllerId=0] Started 
partition state machine with initial state -> Map(audit-0 -> OnlinePartition) 
(kafka.controller.PartitionStateMachine)
[2019-06-27 08:49:26,337] INFO Replica loaded for partition audit-0 with 
initial high watermark 0 (kafka.cluster.Replica)
[2019-06-27 08:49:26,337] INFO Replica loaded for partition audit-0 with 
initial high watermark 0 (kafka.cluster.Replica)
[2019-06-27 08:49:26,349] INFO Replica loaded for partition audit-0 with 
initial high watermark 13358 (kafka.cluster.Replica)
[2019-06-27 08:49:27,504] INFO [ReplicaFetcherManager on broker 0] Removed 
fetcher for partitions Set(audit-0) (kafka.server.ReplicaFetcherManager)
[2019-06-27 08:49:27,715] INFO [ReplicaFetcherManager on broker 0] Added 
fetcher to broker BrokerEndPoint(id=2, 
host=kafka-2.infrastructure.svc.cluster.local:9092) for partitions Map(audit-0 
-> (offset=13358, leaderEpoch=39)) (kafka.server.ReplicaFetcherManager)
[2019-06-27 08:49:29,156] INFO The cleaning for partition audit-0 is aborted 
and paused (kafka.log.LogCleaner)
[2019-06-27 08:49:29,156] INFO [Log partition=audit-0, 
dir=/var/lib/kafka/data/topics] Truncating to offset 1 (kafka.log.Log)
[2019-06-27 08:49:29,156] INFO [Log partition=audit-0, 
dir=/var/lib/kafka/data/topics] Scheduling log segment [baseOffset 12225, size 
3153148] for deletion. (kafka.log.Log)
[2019-06-27 08:49:29,161] INFO [Log partition=audit-0, 
dir=/var/lib/kafka/data/topics] Loading producer state till offset 1 with 
message format version 2 (kafka.log.Log)
[2019-06-27 08:49:29,162] INFO [ProducerStateManager partition=audit-0] Writing 
producer snapshot at offset 1 (kafka.log.ProducerStateManager)
[2019-06-27 08:49:29,168] INFO Compaction for partition audit-0 is resumed 
(kafka.log.LogManager)
[2019-06-27 08:49:29,322] WARN [ReplicaFetcher replicaId=0, leaderId=2, 
fetcherId=0] Truncating audit-0 to offset 1 below high watermark 13358 
(kafka.server.ReplicaFetcherThread)
[2019-06-27 08:49:34,977] INFO [Log partition=audit-0, 
dir=/var/lib/kafka/data/topics] Rolled new log segment at offset 1 in 5 ms. 
(kafka.log.Log)
[2019-06-27 08:49:38,253] INFO [GroupCoordinator 0]: Loading group metadata for 
audits with generation 85 (kafka.coordinator.group.GroupCoordinator)
[2019-06-27 08:50:29,157] INFO [Log partition=audit-0, 
dir=/var/lib/kafka/data/topics] Deleting segment 12225 (kafka.log.Log)
[2019-06-27 08:50:29,159] INFO Deleted log 
/var/lib/kafka/data/topics/audit-0/00000000000000012225.log.deleted. 
(kafka.log.LogSegment)
[2019-06-27 08:50:29,159] INFO Deleted offset index 
/var/lib/kafka/data/topics/audit-0/00000000000000012225.index.deleted. 
(kafka.log.LogSegment)
[2019-06-27 08:50:29,159] INFO Deleted time index 
/var/lib/kafka/data/topics/audit-0/00000000000000012225.timeindex.deleted. 
(kafka.log.LogSegment)
[2019-06-27 09:00:00,951] INFO [Admin Manager on Broker 0]: Error processing 
create topic request for topic audit with arguments (numPartitions=1, 
replicationFactor=3, replicasAssignments={}, configs={}) 
(kafka.server.AdminManager)

Reply via email to