Your assumption is correct: in a failure scenario, the retried transaction can write some elements again, so the output will contain duplicates.
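
To illustrate the retry behaviour, here is a minimal sketch. It is a simulation under stated assumptions, not the sink's actual code: `FlakySink`, `deliver`, and `fail_after` are hypothetical names, and a plain list stands in for the HDFS file. A batch that fails partway through is retried from the start, so the elements written before the failure show up twice.

```python
# Hypothetical simulation of at-least-once delivery; not the real sink code.
class FlakySink:
    """Appends events to an output list and fails once partway through."""

    def __init__(self, fail_after):
        self.output = []              # stands in for the HDFS file
        self.fail_after = fail_after  # successful writes before the one-time failure
        self.failed = False

    def write(self, event):
        # Fail exactly once, after `fail_after` events have been written.
        if not self.failed and len(self.output) == self.fail_after:
            self.failed = True
            raise IOError("simulated write failure")
        self.output.append(event)


def deliver(sink, batch):
    """Retry the whole batch until every event in it has been written."""
    while True:
        try:
            for event in batch:
                sink.write(event)
            return  # "commit": the whole batch was written
        except IOError:
            continue  # "rollback": retry the batch from the start


sink = FlakySink(fail_after=2)
deliver(sink, ["a", "b", "c"])
print(sink.output)  # ['a', 'b', 'a', 'b', 'c']
```

Already-written output is not undone on rollback in this sketch, which is why "a" and "b" appear twice. Avoiding that would take deduplication downstream or idempotent writes.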

Thanks,
Rufus

On Tue, Sep 8, 2015 at 4:10 AM, Aljoscha Krettek <[email protected]> wrote:
> Hi,
> as I understand it the HDFS sink uses the transaction system to verify
> that all the elements in a transaction are written. This is what I would
> call at-least-once semantics.
>
> My question is now what happens if the writing fails in the middle of
> writing the elements in the transaction. When the transaction is retried
> some of the elements might be written again, i.e. the output contains
> duplicates. Is this assumption correct or is there something in place that
> prevents this from happening?
>
> Thanks for your time,
> Aljoscha
