Kafka Hdfs Connect Flush Size

chinchu chinchu Wed, 23 Jan 2019 13:57:57 -0800

Hey folks,
I have been going through the HDFS connector code and have one question.


Is the flush size in the connector config the number of records read from a
Kafka partition, or the number of records written to an HDFS path?
It looks like the recordCounter in TopicPartitionWriter is incremented for
every record received from a Kafka partition.
If the flush size were applied at the HDFS file level, how would the
connector handle records from the same Kafka partition that go to two
different HDFS paths?
After looking through the code and running the TopicPartitionWriter test
cases, I think the flush size is the number of records written to HDFS
from a Kafka partition. I ran the test case
testWriteRecordFieldPartitioner() in TopicPartitionWriterTest and saw the
same. Can someone clarify whether my understanding is right?
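
To make the question concrete, here is a minimal Java sketch of the behavior
I believe I am seeing. It is a simplified model under my own assumptions, not
the connector's actual code: the class PartitionWriterModel, its methods, and
the example paths are hypothetical names used only for illustration. The only
detail taken from the connector is the idea of a single recordCounter per
Kafka partition that is compared against the flush size.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of one writer per Kafka topic partition.
// This is NOT the connector's real TopicPartitionWriter.
class PartitionWriterModel {
    private final int flushSize;

    // Assumption: one counter for the whole Kafka partition,
    // shared by every HDFS path that partition writes to.
    private int recordCounter = 0;

    // Records buffered per HDFS path since the last commit.
    private final Map<String, Integer> openFiles = new LinkedHashMap<>();

    // Each entry is "path:recordCount" for one committed file.
    private final List<String> committed = new ArrayList<>();

    PartitionWriterModel(int flushSize) {
        this.flushSize = flushSize;
    }

    // Buffer one record for the given path and count it against the partition.
    void write(String hdfsPath) {
        openFiles.merge(hdfsPath, 1, Integer::sum);
        recordCounter++;
        if (recordCounter >= flushSize) {
            commitAll();
        }
    }

    // Commit every open file at once and reset the shared counter.
    private void commitAll() {
        for (Map.Entry<String, Integer> e : openFiles.entrySet()) {
            committed.add(e.getKey() + ":" + e.getValue());
        }
        openFiles.clear();
        recordCounter = 0;
    }

    List<String> committedFiles() {
        return committed;
    }
}
```

In this model, with a flush size of 3, writing two records to one path and
one record to another triggers a commit of both files, so the committed files
hold 2 records and 1 record rather than 3 each. If that matches the real
connector, flush size bounds the records per Kafka partition between commits,
not the records per HDFS file.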

https://docs.confluent.io/current/connect/kafka-connect-hdfs/configuration_options.html


flush.size
Number of records written to store before invoking file commits.

Type: int
Importance: high

Thanks,
Chinchu
