Hi Ananth,

Unlike files, Kafka is usually used for streaming use cases. Correct me if I'm wrong, but your use case seems like batch processing. We didn't consider an end offset in our Kafka input operator design, though it could be a useful feature. Unfortunately, as far as I know, there is no easy way to extend the existing operator to achieve that.
OffsetManager is not designed for end offsets. It's only a customizable callback for updating the committed offsets, and the start offsets it loads are intended for stateful application restarts. Can you create a ticket and elaborate on your use case there? Thanks!

Regards,
Siyuan

On Friday, June 10, 2016, Ananth Gundabattula <agundabatt...@gmail.com> wrote:
> Hello All,
>
> I was wondering what the community's thoughts would be on the following:
>
> We are using the Kafka 0.9 input operator to read from a few topics, and
> we are using this stream to generate a Parquet file. This approach works
> well for a beginner's use case. At a later point in time, we would like to
> "merge" all of the previously generated Parquet files, and for this I
> would like to reprocess data starting exactly from a particular offset
> inside each of the partitions. Each partition will have its own start and
> end offsets that I need to process.
>
> I was wondering if there is an easy way to extend the Kafka 0.9 operator
> (perhaps along the lines of the OffsetManager in the 0.8 versions of the
> Kafka operator). Thoughts please?
>
> Regards,
> Ananth
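For what it's worth, here is a rough sketch of the bounded-range stop condition Ananth describes, assuming per-partition start and end offsets are known up front. This is plain Java with no Apex or Kafka client dependencies, and all the names are hypothetical; it only models when a bounded read should terminate, not how it would be wired into the operator's consumer loop:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: tracks per-partition end offsets and reports when every
// partition has been consumed up to (and including) its end offset. A bounded
// Kafka read would call record() for each message and stop once allDone().
public class BoundedOffsetTracker {
    private final Map<Integer, Long> endOffsets = new HashMap<>(); // partition -> last offset to process
    private final Map<Integer, Long> lastSeen = new HashMap<>();   // partition -> last offset consumed

    public BoundedOffsetTracker(Map<Integer, Long> endOffsets) {
        this.endOffsets.putAll(endOffsets);
    }

    // Record a consumed offset; returns true if this partition is finished.
    public boolean record(int partition, long offset) {
        lastSeen.put(partition, offset);
        Long end = endOffsets.get(partition);
        return end != null && offset >= end;
    }

    // True once every partition has reached its end offset.
    public boolean allDone() {
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            Long seen = lastSeen.get(e.getKey());
            if (seen == null || seen < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, Long> ends = new HashMap<>();
        ends.put(0, 100L);
        ends.put(1, 50L);
        BoundedOffsetTracker tracker = new BoundedOffsetTracker(ends);
        tracker.record(0, 100L);
        System.out.println(tracker.allDone()); // prints false: partition 1 not finished
        tracker.record(1, 50L);
        System.out.println(tracker.allDone()); // prints true: all partitions reached their end
    }
}
```

In practice, the start offsets could come from something like OffsetManager's load mechanism, while the end offsets would have to be supplied by the application, since the current operator has no notion of them.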