OK, got it. Thanks.

2015-04-15 0:50 GMT+08:00 Gwen Shapira <[email protected]>:
> Flume is an at-least-once system. This means we will never lose data, but
> you may get duplicate events on errors.
> In the cases you pointed out - the events were written but we still
> BACKOFF - you will get duplicate events in the channel or in HDFS.
>
> You probably want to write a small script to de-duplicate the data in
> HDFS, like we do in this example:
>
> https://github.com/hadooparchitecturebook/clickstream-tutorial/blob/master/03_processing/01_dedup/pig/dedup.pig
>
> Gwen
>
> On Tue, Apr 14, 2015 at 9:17 AM, Tao Li <[email protected]> wrote:
> > Hi all:
> >
> > I have a question about "Transaction". For example, the KafkaSource code
> > looks like this:
> >
> > try {
> >     getChannelProcessor().processEventBatch(eventList);
> >     consumer.commitOffsets();
> >     return Status.READY;
> > } catch (Exception e) {
> >     return Status.BACKOFF;
> > }
> >
> > If processEventBatch() succeeds but commitOffsets() fails, it will return
> > BACKOFF. But the eventList has already been written to the channel.
> >
> > ----------------------------------
> >
> > Also, the HDFSEventSink code looks like this:
> >
> > try {
> >     bucketWriter.append(event);
> >     bucketWriter.flush();
> >     transaction.commit();
> >     return Status.READY;
> > } catch (Exception e) {
> >     transaction.rollback();
> >     return Status.BACKOFF;
> > }
> >
> > If bucketWriter.flush() succeeds but transaction.commit() fails, it will
> > call transaction.rollback() and return BACKOFF. But the event has already
> > been flushed to HDFS.
> >
> > 2015-04-15 0:09 GMT+08:00 Tao Li <[email protected]>:
> >>
> >> Hi all:
> >>
> >> I have a question about "Transaction". For example, KafkaSource code
> >> like this:
> >>
> >> try {
> >>     getChannelProcessor().processEventBatch(eventList);
> >>     consumer.commitOffsets();
> >> }
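
To make the duplicate window concrete, here is the KafkaSource-style loop
from the question with comments marking where duplicates can arise. The names
(getChannelProcessor, consumer, Status) follow the snippet quoted above; this
is an illustrative sketch, not the verbatim Flume source.

    try {
        // Step 1: commit the batch to the channel. Once this returns, the
        // channel (and any downstream sink) owns the events.
        getChannelProcessor().processEventBatch(eventList);

        // Step 2: only now advance the Kafka offset. If the process dies or
        // commitOffsets() throws between steps 1 and 2, the offset stays
        // put, so the same batch is re-fetched and re-written on the next
        // poll: duplicates, but never loss. Committing in the opposite
        // order would risk loss instead, which is why the source is
        // at-least-once.
        consumer.commitOffsets();

        return Status.READY;
    } catch (Exception e) {
        // BACKOFF only delays the retry; it cannot undo step 1, so any
        // events already committed to the channel stay there.
        return Status.BACKOFF;
    }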
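
The HDFSEventSink case has the same shape on the sink side: flush() makes the
data durable in HDFS before the channel transaction commits, and a rollback
cannot un-flush it. Again a commented sketch using the names from the quoted
snippet, not the actual source:

    try {
        bucketWriter.append(event);
        bucketWriter.flush();    // the event is durable in HDFS from here on
        transaction.commit();    // if this throws, we fall into the catch...
        return Status.READY;
    } catch (Exception e) {
        transaction.rollback();  // ...but rollback only returns the event to
        return Status.BACKOFF;   // the channel; the retry writes it to HDFS
    }                            // a second time: a duplicate, not a loss.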
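
The dedup job Gwen links to is written in Pig. As a minimal sketch of the
same idea in Java - assuming each record begins with a unique event id, a
hypothetical key the producer would have to stamp on, since Flume does not
generate one itself:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// De-duplicate "<eventId>\t<payload>" lines by keeping the first copy of
// each id. The eventId field is an assumption: some producer-side unique
// key is needed for any dedup pass to work.
public class Dedup {
    public static List<String> dedup(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String id = line.split("\t", 2)[0]; // unique key = first field
            if (seen.add(id)) {                 // add() is false on repeats
                out.add(line);
            }
        }
        return out;
    }
}

Which copy survives does not matter here: duplicates produced by these
retries are byte-identical re-deliveries of the same event, so any one of
them will do.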
