On 2019/06/27 15:39:29, Jan Kosecki <jan.koseck...@gmail.com> wrote:
> Hi,
>
> I have a hyperledger fabric cluster that uses a cluster of 3 kafka nodes +
> 3 zookeepers.
> Fabric doesn't support pruning yet so any change to recorded offsets is
> detected by it and it fails to reconnect to kafka cluster.
> Today morning all kafka nodes have gone offline and then slowly restarted.
> After that fabric doesn't work anymore. After checking the logs I've learnt
> that fabric has recorded last offset of 13356
>
> 2019-06-27 10:34:51.325 UTC [orderer/consensus/kafka] newChain -> INFO 144
> [channel: audit] Starting chain with last persisted offset 13356 and last
> recorded block 2881
>
> ,however when I compared it with kafka logs of the first node that started
> (kafka-0), I've found out that, although initially highwater mark was
> around this value, for some reason, during reboot, the node truncated the
> topic and I've lost all messages.
>
> The topic audit has 1 partition and replication factor of 3.
> I've added logs from kafka-0 that contain any reference to audit topic.
>
> Any suggestions, why the topic's offset has been truncated to 1?
>
> Thanks in advance,
> Jan
>
Failed to attach logs.
Fixing it here:
[2019-06-27 08:47:06,967] WARN [Log partition=audit-0,
dir=/var/lib/kafka/data/topics] Found a corrupted index file corresponding to
log file /var/lib/kafka/data/topics/audit-0/00000000000000012225.log due to
Corrupt time index found, time index file
(/var/lib/kafka/data/topics/audit-0/00000000000000012225.timeindex) has
non-zero size but the last timestamp is 0 which is less than the first
timestamp 1561534356375}, recovering segment and rebuilding index files...
(kafka.log.Log)
[2019-06-27 08:47:06,967] INFO [Log partition=audit-0,
dir=/var/lib/kafka/data/topics] Loading producer state till offset 12225 with
message format version 2 (kafka.log.Log)
[2019-06-27 08:47:06,967] INFO [ProducerStateManager partition=audit-0] Loading
producer state from snapshot file
'/var/lib/kafka/data/topics/audit-0/00000000000000012225.snapshot'
(kafka.log.ProducerStateManager)
[2019-06-27 08:47:06,997] INFO [ProducerStateManager partition=audit-0] Writing
producer snapshot at offset 13358 (kafka.log.ProducerStateManager)
[2019-06-27 08:47:06,997] INFO [Log partition=audit-0,
dir=/var/lib/kafka/data/topics] Recovering unflushed segment 12225
(kafka.log.Log)
[2019-06-27 08:47:06,997] INFO [Log partition=audit-0,
dir=/var/lib/kafka/data/topics] Loading producer state till offset 12225 with
message format version 2 (kafka.log.Log)
[2019-06-27 08:47:07,001] INFO [ProducerStateManager partition=audit-0] Loading
producer state from snapshot file
'/var/lib/kafka/data/topics/audit-0/00000000000000012225.snapshot'
(kafka.log.ProducerStateManager)
[2019-06-27 08:47:07,016] INFO [ProducerStateManager partition=audit-0] Writing
producer snapshot at offset 13358 (kafka.log.ProducerStateManager)
[2019-06-27 08:47:07,021] INFO [Log partition=audit-0,
dir=/var/lib/kafka/data/topics] Loading producer state till offset 13358 with
message format version 2 (kafka.log.Log)
[2019-06-27 08:47:07,022] INFO [ProducerStateManager partition=audit-0] Loading
producer state from snapshot file
'/var/lib/kafka/data/topics/audit-0/00000000000000013358.snapshot'
(kafka.log.ProducerStateManager)
[2019-06-27 08:47:07,022] INFO [Log partition=audit-0,
dir=/var/lib/kafka/data/topics] Completed load of log with 2 segments, log
start offset 0 and log end offset 13358 in 61 ms (kafka.log.Log)
[2019-06-27 08:49:19,360] INFO [Controller id=0] Current list of topics in the
cluster: Set(audit) (kafka.controller.KafkaController)
[2019-06-27 08:49:19,382] INFO [Controller id=0] List of topics ineligible for
deletion: audit(kafka.controller.KafkaController)
[2019-06-27 08:49:20,209] INFO [ReplicaStateMachine controllerId=0] Started
replica state machine with initial state ->
Map([Topic=audit,Partition=0,Replica=0] -> OnlineReplica,
[Topic=audit,Partition=0,Replica=2] -> OnlineReplica,
[Topic=audit,Partition=0,Replica=1] -> ReplicaDeletionIneligible)
(kafka.controller.ReplicaStateMachine)
[2019-06-27 08:49:22,943] INFO [PartitionStateMachine controllerId=0] Started
partition state machine with initial state -> Map(audit-0 -> OnlinePartition)
(kafka.controller.PartitionStateMachine)
[2019-06-27 08:49:26,337] INFO Replica loaded for partition audit-0 with
initial high watermark 0 (kafka.cluster.Replica)
[2019-06-27 08:49:26,337] INFO Replica loaded for partition audit-0 with
initial high watermark 0 (kafka.cluster.Replica)
[2019-06-27 08:49:26,349] INFO Replica loaded for partition audit-0 with
initial high watermark 13358 (kafka.cluster.Replica)
[2019-06-27 08:49:27,504] INFO [ReplicaFetcherManager on broker 0] Removed
fetcher for partitions Set(audit-0) (kafka.server.ReplicaFetcherManager)
[2019-06-27 08:49:27,715] INFO [ReplicaFetcherManager on broker 0] Added
fetcher to broker BrokerEndPoint(id=2,
host=kafka-2.infrastructure.svc.cluster.local:9092) for partitions Map(audit-0
-> (offset=13358, leaderEpoch=39)) (kafka.server.ReplicaFetcherManager)
[2019-06-27 08:49:29,156] INFO The cleaning for partition audit-0 is aborted
and paused (kafka.log.LogCleaner)
[2019-06-27 08:49:29,156] INFO [Log partition=audit-0,
dir=/var/lib/kafka/data/topics] Truncating to offset 1 (kafka.log.Log)
[2019-06-27 08:49:29,156] INFO [Log partition=audit-0,
dir=/var/lib/kafka/data/topics] Scheduling log segment [baseOffset 12225, size
3153148] for deletion. (kafka.log.Log)
[2019-06-27 08:49:29,161] INFO [Log partition=audit-0,
dir=/var/lib/kafka/data/topics] Loading producer state till offset 1 with
message format version 2 (kafka.log.Log)
[2019-06-27 08:49:29,162] INFO [ProducerStateManager partition=audit-0] Writing
producer snapshot at offset 1 (kafka.log.ProducerStateManager)
[2019-06-27 08:49:29,168] INFO Compaction for partition audit-0 is resumed
(kafka.log.LogManager)
[2019-06-27 08:49:29,322] WARN [ReplicaFetcher replicaId=0, leaderId=2,
fetcherId=0] Truncating audit-0 to offset 1 below high watermark 13358
(kafka.server.ReplicaFetcherThread)
[2019-06-27 08:49:34,977] INFO [Log partition=audit-0,
dir=/var/lib/kafka/data/topics] Rolled new log segment at offset 1 in 5 ms.
(kafka.log.Log)
[2019-06-27 08:49:38,253] INFO [GroupCoordinator 0]: Loading group metadata for
audits with generation 85 (kafka.coordinator.group.GroupCoordinator)
[2019-06-27 08:50:29,157] INFO [Log partition=audit-0,
dir=/var/lib/kafka/data/topics] Deleting segment 12225 (kafka.log.Log)
[2019-06-27 08:50:29,159] INFO Deleted log
/var/lib/kafka/data/topics/audit-0/00000000000000012225.log.deleted.
(kafka.log.LogSegment)
[2019-06-27 08:50:29,159] INFO Deleted offset index
/var/lib/kafka/data/topics/audit-0/00000000000000012225.index.deleted.
(kafka.log.LogSegment)
[2019-06-27 08:50:29,159] INFO Deleted time index
/var/lib/kafka/data/topics/audit-0/00000000000000012225.timeindex.deleted.
(kafka.log.LogSegment)
[2019-06-27 09:00:00,951] INFO [Admin Manager on Broker 0]: Error processing
create topic request for topic audit with arguments (numPartitions=1,
replicationFactor=3, replicasAssignments={}, configs={})
(kafka.server.AdminManager)