Thanks for the thoughts, Siyuan.

Yes, I agree that the problem is inherently batch oriented. We are
hoping to build upon the window concepts to simulate a batch design. (The
primary reason is that we do not want two different ETL processing pipeline
platforms within our ecosystem.)

We are using Kafka as the source of data for multiple data
processing frameworks (ETL, M/L frameworks, etc.). Hence Kafka
is being used both for streaming (primarily ETL, on the Apex system) and for
batch use cases (primarily M/L).

I shall create a ticket.
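
To make the request concrete, below is a rough sketch of the per-partition
bookkeeping a bounded read would need. It is only an illustration under
assumptions: BoundedOffsetTracker and its methods are hypothetical names, not
part of the Apex Kafka operator or the Kafka client, and the end offset is
treated as exclusive. Wiring it into a consumer (seeking each partition to its
start offset and stopping once everything is done) is not shown.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: tracks a [start, end) offset range per partition and
 * reports when every partition has been read up to its end offset.
 */
class BoundedOffsetTracker {
    // Per partition: {start (inclusive), end (exclusive)}.
    private final Map<Integer, long[]> ranges = new HashMap<>();
    // Per partition: next offset that has not been accepted yet.
    private final Map<Integer, Long> nextOffset = new HashMap<>();

    void addPartition(int partition, long start, long end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range " + start + ".." + end);
        }
        ranges.put(partition, new long[] {start, end});
        nextOffset.put(partition, start);
    }

    /** Offset to seek to when starting or restarting the given partition. */
    long seekOffset(int partition) {
        Long next = nextOffset.get(partition);
        if (next == null) {
            throw new IllegalArgumentException("unknown partition " + partition);
        }
        return next;
    }

    /** Returns true if the record lies in the range and was not seen before. */
    boolean accept(int partition, long offset) {
        long[] range = ranges.get(partition);
        if (range == null) {
            return false; // partition is not part of this bounded read
        }
        if (offset >= range[1]) {
            // Reached or passed the end offset: mark the partition as finished.
            nextOffset.put(partition, range[1]);
            return false;
        }
        if (offset < nextOffset.get(partition)) {
            return false; // before the start offset, or a replayed duplicate
        }
        nextOffset.put(partition, offset + 1);
        return true;
    }

    boolean isPartitionDone(int partition) {
        long[] range = ranges.get(partition);
        return range != null && nextOffset.get(partition) >= range[1];
    }

    boolean isAllDone() {
        for (int partition : ranges.keySet()) {
            if (!isPartitionDone(partition)) {
                return false;
            }
        }
        return true;
    }
}
```

The tracker tolerates gaps in the offsets (for example on compacted topics),
because a record at or past the end offset also marks its partition as done.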

Regards,
Ananth



On Sat, Jun 11, 2016 at 7:15 AM, [email protected] <[email protected]> wrote:

> Hi Ananth,
> Unlike files, Kafka is usually used for streaming cases. Correct me if I'm
> wrong, but your use case seems like batch processing. We didn't consider an
> end offset in our Kafka input operator design, but it could be a useful
> feature. Unfortunately there is no easy way, as far as I know, to extend
> the existing operator to achieve that.
>
> OffsetManager is not designed for end offsets. It's only
> a customizable callback to update the committed offsets, and the start
> offsets it loads are meant for stateful application restart.
>
> Can you create a ticket and elaborate your use case there? Thanks!
>
> Regards,
> Siyuan
>
>
>
>
>
> On Friday, June 10, 2016, Ananth Gundabattula <[email protected]>
> wrote:
>
>> Hello All,
>>
>> I was wondering what the community's thoughts would be on the following.
>>
>> We are using the Kafka 0.9 input operator to read from a few topics. We are
>> using this stream to generate a Parquet file. This approach is fine
>> for a beginner's use case. At a later point in time, we would like to
>> "merge" all of the Parquet files previously generated, and for this I would
>> like to reprocess data starting exactly from a particular offset in each of
>> the partitions. Each partition will have its own starting and ending
>> offsets that I need to process.
>>
>> I was wondering if there is an easy way to extend the Kafka 0.9 operator
>> (perhaps along the lines of the offset manager in the 0.8 versions of the
>> Kafka operator). Thoughts please?
>>
>> Regards,
>> Ananth
>>
>
