Hey folks, I have been going through the HDFS connector code and have one question.
Is the flush size in the connector config the number of records read from a Kafka partition, or the number of records written to an HDFS path? It looks like the recordCounter in TopicPartitionWriter is incremented for every record received from a Kafka partition. If the flush size applied at the HDFS file level, how would the connector handle records from the same Kafka partition that go to two different HDFS paths?

After looking through the code and running the TopicPartitionWriter test cases, I think the flush size is the number of records written to HDFS from a Kafka partition. I ran the test case testWriteRecordFieldPartitioner() in TopicPartitionWriterTest and saw the same behavior. Can someone clarify whether my understanding is right?

For reference, the docs at https://docs.confluent.io/current/connect/kafka-connect-hdfs/configuration_options.html say:

flush.size
  Number of records written to store before invoking file commits.
  Type: int
  Importance: high

Thanks,
Chinchu
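
P.S. To make the question concrete, here is a toy Python model of the behavior described above. It is only a sketch and not the connector's code: the class PartitionWriterModel, its methods, and the HDFS paths are all hypothetical. It assumes a single record counter per Kafka partition, and that every open file for that partition is committed once the counter reaches the flush size.

```python
from collections import defaultdict


class PartitionWriterModel:
    """Toy model of the writer for ONE Kafka partition (hypothetical, not connector code)."""

    def __init__(self, flush_size):
        self.flush_size = flush_size
        self.record_counter = 0               # assumption: one counter per Kafka partition
        self.open_files = defaultdict(list)   # HDFS path -> records buffered for that path
        self.committed = []                   # (path, record_count) for each committed file

    def write(self, record, path):
        # Every record bumps the same counter, whichever path it is routed to.
        self.open_files[path].append(record)
        self.record_counter += 1
        if self.record_counter >= self.flush_size:
            self.commit_all()

    def commit_all(self):
        # Assumption: all open files are committed together, then the counter resets.
        for path, records in sorted(self.open_files.items()):
            self.committed.append((path, len(records)))
        self.open_files.clear()
        self.record_counter = 0


writer = PartitionWriterModel(flush_size=3)
for offset in range(6):
    # Alternate between two made-up field-partitioner paths.
    path = "/topics/t/int=16" if offset % 2 == 0 else "/topics/t/int=17"
    writer.write({"offset": offset}, path)

for path, count in writer.committed:
    print(path, count)
```

If this model matches the real behavior, a flush size of 3 yields committed files holding fewer than 3 records each whenever one Kafka partition fans out to several paths, which is what prompted the question.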
