Re: Detecting "done" on a bounded input dataset

2015-10-14 Thread Yi Pan
Hi Kishore, First, I want some clarification on your use case. 1) Scenario 1: you still want the Samza jobs to keep running continuously, and simply want to detect the end of a certain stream. On detection, do you need to unsubscribe from the stream? Or are you still OK receiving more messages from the

Detecting "done" on a bounded input dataset

2015-10-14 Thread Kishore N C
Hi, Our data processing pipeline consists of a set of Samza jobs that form a DAG. Sometimes, we have to throw finite datasets into the Kafka topic that acts as the entry point to the pipeline. Given that different Samza jobs in the DAG could have varying latencies in terms of processing the

Re: Detecting "done" on a bounded input dataset

2015-10-14 Thread Kishore N C
Hi Yi, Detecting both scenarios will be useful. Scenario 1: To detect that the reprocessing job has "caught up" with the stream, as described by Jay Kreps here, so that we can make the app point to the new DB and tear