Thanks a lot for the answers Thomas. The blog entry really helps. Regarding the comment about offset management initialization, we are using APPLICATION_OR_EARLIEST. I was interpreting APPLICATION in this configuration as the "ApplicationID" ( and hence my question regarding originalAppId parameter ) and the behavior for the Kafka 0.9 operator seems to be interpreting it as the application name. I shall wait for Siyuan's thoughts on this.
Regards, Ananth On Wed, Jun 15, 2016 at 7:57 AM, Thomas Weise <[email protected]> wrote: > You can also have a look at this blog and linked example that specifically > covers exactly-once with input from Kafka: > > https://www.datatorrent.com/blog/end-to-end-exactly-once-with-apache-apex/ > > > On Tue, Jun 14, 2016 at 2:47 PM, Thomas Weise <[email protected]> > wrote: > >> See response below: >> >> On Tue, Jun 14, 2016 at 12:41 PM, Ananth Gundabattula < >> [email protected]> wrote: >> >>> Hello Siyuan/All, >>> >>> I have a couple of questions regarding the Kafka 0.9 operator. Could you >>> please help me in understanding this operator a bit better? >>> >>> >>> - As stated in >>> http://www.slideshare.net/ApacheApex/apache-apex-kafka-input-operator >>> , kafka 0.9 operator stores it "check-pointed offsets" in Kafka itself >>> using the App name ? It sounds like -originalAppID is not used by this >>> operator at all - In other words, I cant force an app to process starting >>> from the beginning until I change the App name if the App is based on the >>> Kafka 0.9 operator as the input operator >>> >>> The start offset configuration option should determine where the >> operator starts consuming on cold start (earliest, latest, last consumed). >> If that's not the case then it would be a bug. Siyuan, please comment. >> >>> >>> - >>> - How does the kafka 0.9 operator handle downstream operators >>> failure ? By this I mean, an Apex downstream operator fails, and is >>> brought >>> back up by STRAM. However this operator was significantly lagging behind >>> the current window of the kafka 0.9 operator window. Does the buffer >>> server >>> within the Kafka 0.9 operator buffer many windows to handle this >>> situation >>> ? ( and hence replays accordingly ? ) . I ask this to fine tune the >>> buffer >>> memory property. >>> >>> The upstream buffer server will hold the data until processed by the >> downstream operator. The buffer server, by default, will start to spool to >> disk when the allocated memory is used up. Back pressure will cause the >> consumer to slow down accordingly. >> >>> >>> - Is EXACTLY_ONCE processing supported in this operator ? if yes, is >>> it fair to assume that HDFS would be used to manage this type of >>> configuration ? >>> >>> Yes, when you enable idempotency on the operator, exactly once >> processing semantics in downstream operators are supported (affects those >> that write to external systems). To enable this you can configure to use >> the window data manager that writes to HDFS, essentially it will keep track >> of the consumer offsets for each window. >> >>> >>> - >>> - Is EXACTLY_ONCE based off the streaming window or the Application >>> Window in Apex ? >>> >>> The operator only sees the "application window". Make sure to align the >> checkpoint window interval. >> >> For more information about the Kafka input operator, please see: >> http://www.slideshare.net/ApacheApex/apache-apex-kafka-input-operator >> >> >> >> >
