Thanks for the thoughts, Siyuan. Yes, I agree that the problem is inherently batch-oriented. We are hoping to build on the window concepts to simulate a batch design. (The primary reason is that we do not want two different ETL processing pipeline platforms within our ecosystem.)
We are using Kafka as the data source that multiple data processing frameworks (ETL, M/L frameworks, etc.) read from. Hence Kafka is being used for both streaming (primarily ETL, on the Apex system) and batch use cases (primarily M/L).

I shall create a ticket.

Regards,
Ananth

On Sat, Jun 11, 2016 at 7:15 AM, [email protected] <[email protected]> wrote:

> Hi Ananth,
>
> Unlike files, Kafka is usually used for streaming cases. Correct me if I'm
> wrong, but your use case seems like batch processing. We didn't consider an
> end offset in our Kafka input operator design, but it could be a useful
> feature. Unfortunately there is no easy way, as far as I know, to extend
> the existing operator to achieve that.
>
> OffsetManager is not designed for end offsets. It's only
> a customizable callback to update the committed offsets, and the start
> offsets it loads are meant for stateful application restarts.
>
> Can you create a ticket and elaborate on your use case there? Thanks!
>
> Regards,
> Siyuan
>
> On Friday, June 10, 2016, Ananth Gundabattula <[email protected]>
> wrote:
>
>> Hello All,
>>
>> I was wondering what the community's thoughts would be on the following.
>>
>> We are using the Kafka 0.9 input operator to read from a few topics. We are
>> using this stream to generate a parquet file. This approach is fine
>> for a beginner's use case. At a later point in time, we would like to
>> "merge" all of the parquet files previously generated, and for this I would
>> like to reprocess data starting exactly from a particular offset inside each
>> of the partitions. Each partition will have its own starting and ending
>> offsets that I need to process.
>>
>> I was wondering if there is an easy way to extend the Kafka 0.9 operator
>> (perhaps along the lines of the offset manager in the 0.8 versions of the
>> Kafka operator). Thoughts, please?
>>
>> Regards,
>> Ananth
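
The bounded replay asked about in the original question (each partition having its own starting and ending offset) can be sketched roughly as follows. This is only a minimal illustration in plain Python, not Apex or Kafka client code: `OffsetRange` and `bounded_replay` are hypothetical names, records are modelled as `(partition, offset, value)` tuples, and the end offset is assumed to be inclusive. A real operator would more likely seek each partition to its start offset instead of filtering from the beginning.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple

Record = Tuple[int, int, str]  # (partition, offset, value); a stand-in for a Kafka record


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical per-partition bounds; both ends are assumed inclusive."""
    start: int
    end: int


def bounded_replay(records: Iterable[Record],
                   ranges: Dict[int, OffsetRange]) -> Iterator[Record]:
    """Yield only records inside each partition's range, then stop.

    Consumption ends as soon as every partition has reached its end offset,
    which is the "batch over a stream" behaviour discussed in the thread.
    """
    # Partitions that still have records left to emit.
    pending = {p for p, r in ranges.items() if r.start <= r.end}
    if not pending:
        return
    for partition, offset, value in records:
        if partition in pending:
            rng = ranges[partition]
            if offset > rng.end:
                # Already past the end (e.g. the end offset itself was missing).
                pending.discard(partition)
            elif offset >= rng.start:
                yield (partition, offset, value)
                if offset == rng.end:
                    pending.discard(partition)
        if not pending:
            return  # every partition is done; stop reading the stream


if __name__ == "__main__":
    ranges = {0: OffsetRange(2, 3), 1: OffsetRange(0, 1)}
    stream = [(0, 0, "a"), (1, 0, "b"), (0, 1, "c"), (1, 1, "d"),
              (0, 2, "e"), (1, 2, "f"), (0, 3, "g"), (0, 4, "h")]
    for rec in bounded_replay(stream, ranges):
        print(rec)
```

The point of tracking `pending` partitions is that the reader can terminate on its own once all ranges are exhausted, so the same streaming source can serve a one-off merge job without a separate batch pipeline.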
