Raja,

Please also confirm how you are using partitioning. If, for example, your DAG shuffles the data received from Kafka in a way that differs from the original Kafka partitioning, then multiple downstream partitions could process data from a single Kafka partition concurrently, and therefore out of order.
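
To make the shuffle scenario concrete, here is a minimal plain-Java sketch. It does not use any Apex or Kafka API: the class `PartitionRouting`, the `Msg` type, and both routing functions are hypothetical names made up for illustration. It assumes two downstream partitions and an interleaved input that still respects per-partition order, then checks whether a routing scheme spreads one Kafka partition over more than one downstream partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionRouting {

    /** Hypothetical record: the Kafka partition it came from and its position there. */
    static final class Msg {
        final int kafkaPartition;
        final long offset;

        Msg(int kafkaPartition, long offset) {
            this.kafkaPartition = kafkaPartition;
            this.offset = offset;
        }
    }

    /** Interleaved input that keeps per-partition order: p1.1, p2.1, p1.2, p1.3, p3.1, p2.2. */
    static List<Msg> sampleInput() {
        List<Msg> in = new ArrayList<>();
        in.add(new Msg(1, 1));
        in.add(new Msg(2, 1));
        in.add(new Msg(1, 2));
        in.add(new Msg(1, 3));
        in.add(new Msg(3, 1));
        in.add(new Msg(2, 2));
        return in;
    }

    /** Round-robin shuffle: the i-th record goes to downstream partition i % n. */
    static List<Integer> assignRoundRobin(List<Msg> in, int n) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            out.add(i % n);
        }
        return out;
    }

    /** Keyed routing: every record of one Kafka partition goes to the same downstream partition. */
    static List<Integer> assignByKafkaPartition(List<Msg> in, int n) {
        List<Integer> out = new ArrayList<>();
        for (Msg m : in) {
            out.add(Math.floorMod(m.kafkaPartition, n));
        }
        return out;
    }

    /** True if some Kafka partition's records end up on more than one downstream partition. */
    static boolean splitsAnyKafkaPartition(List<Msg> in, List<Integer> assignment) {
        Map<Integer, Integer> firstSeen = new HashMap<>();
        for (int i = 0; i < in.size(); i++) {
            Integer target = assignment.get(i);
            Integer prev = firstSeen.putIfAbsent(in.get(i).kafkaPartition, target);
            if (prev != null && !prev.equals(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Msg> in = sampleInput();
        System.out.println("round-robin splits a Kafka partition: "
                + splitsAnyKafkaPartition(in, assignRoundRobin(in, 2)));
        System.out.println("keyed routing splits a Kafka partition: "
                + splitsAnyKafkaPartition(in, assignByKafkaPartition(in, 2)));
    }
}
```

When a Kafka partition is split this way, its records are handled by independent downstream instances, so nothing constrains the relative order in which they are processed. Keeping each Kafka partition on a single downstream partition avoids that.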
Thomas

On Tue, Jun 7, 2016 at 6:33 PM, Raja.Aravapalli <[email protected]> wrote:

>
> Yes Devendra.
>
> p1.10 is read before p1.1 !!
>
> Sure I shall check that. Thanks a lot for your response.
>
>
> Regards,
> Raja.
>
> From: Devendra Tagare <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, June 7, 2016 at 7:59 PM
>
> To: "[email protected]" <[email protected]>
> Subject: Re: kafka input is processing records in a jumbled order
>
> Hi Raja,
>
> Just to be clear are you suggesting that p1.10 is being read before p1.1 ?
>
> If thats the case can you use a console consumer that comes packed with
> kafka and verify the ordering based on timestamps ?
>
> Thanks,
> Dev
>
>
>
> On Tue, Jun 7, 2016 at 5:31 PM, Raja.Aravapalli <
> [email protected]> wrote:
>
>>
>> Thanks a lot Devendra Tagare for the response.
>>
>> What you said is very clear and understandable. But, wondering, I am NOT
>> getting that partition level order!! My operator is processing the records
>> in jumbled order rather than in sequence!
>> And, I am saying this because, I am generating timestamps upon tuple
>> receipt and emitting that timestamp to my destination, which is clearly
>> showing the records are receiving to operator in a shuffled order.
>>
>> I get records at milli second level differences!! Will that be a problem ?
>>
>>
>> Regards,
>> Raja.
>>
>> From: Devendra Tagare <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Tuesday, June 7, 2016 at 7:12 PM
>>
>> To: "[email protected]" <[email protected]>
>> Subject: Re: kafka input is processing records in a jumbled order
>>
>> Hi Raja,
>>
>> When you apply ONE_TO_MANY partitioning scheme, one instance of the
>> operator consumes from many partitions of a kafka topic.
>>
>> When you look at the consumed data, all the events coming from a given
>> partition would be ordered but there are no ordering guarantees across
>> partitions since kafka does not guarantee that
>>
>> eg : If 3 partitions of a topic p1,p2,p3 having 10 messages each are
>> connected to one physical partition of the KafkaInputOperator , then the
>> ordering guarantee of p1.1 to p1.10 is honored.ie message 10 of p1 be
>> consumed only after messages 1 through 9 are consumed but the operator
>> could consumer messages in a order like p1.1,p2.1,p1.2,p1.3,p3.1,p2.2.....
>> which still follows the guarantees per partition.
>>
>> Thanks,
>> Dev
>>
>> On Tue, Jun 7, 2016 at 5:00 PM, Raja.Aravapalli <
>> [email protected]> wrote:
>>
>>>
>>> Thanks for the response Thomas.
>>>
>>> My quick doubt is..
>>>
>>> I have around 30 partitions of kafka topic, And all of them have
>>> messages ordered at partition level.
>>>
>>> So, when I consume those messages using single consumer[with ONE_TO_MANY
>>> strategy set], still the ordering doesn’t work ?
>>>
>>>
>>> My messages in topic are guaranteed to be ordered at partition level.
>>>
>>> Thanks a lot in advance for your response.
>>>
>>>
>>> Regards,
>>> Raja.
>>>
>>> From: Thomas Weise <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Tuesday, June 7, 2016 at 5:52 PM
>>> To: "[email protected]" <[email protected]>
>>> Subject: Re: kafka input is processing records in a jumbled order
>>>
>>> Raja,
>>>
>>> Are you expecting ordering across multiple Kafka partitions?
>>>
>>> All messages from a given Kafka partition are received by the same
>>> consumer and thus will be ordered. However, when messages come from
>>> multiple partitions there is no such guarantee.
>>>
>>> Thomas
>>>
>>>
>>> On Tue, Jun 7, 2016 at 3:34 PM, Raja.Aravapalli <
>>> [email protected]> wrote:
>>>
>>>>
>>>> Hi
>>>>
>>>> I have built a DAG, that reads from kafka and in the next operators,
>>>> does lookup to a hbase table and update hbase table based on some business
>>>> logic.
>>>>
>>>> Some times my operator which does hbase lookup and update in the same
>>>> operator(Custom written), is processing the records it receives from kafka
>>>> in a jumbled order, which is causing, many records being ignored from
>>>> processing!!
>>>>
>>>> I am not using any parallel partitions/instance, and with
>>>> KafkaInputOperator I am using only partition strategy ONE_TO_MANY.
>>>>
>>>> I am very new to Apex. I expected, Apex will guarantee the ordering.
>>>>
>>>> Can someone pls share your knowledge on the issue…?
>>>>
>>>>
>>>> Thanks a lot in advance…
>>>>
>>>>
>>>> Regards,
>>>> Raja.
>>>>
>>>
>>>
>>
>
