Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-30 Thread Aljoscha Krettek
Yes, in the end the requests to HBase are the bottleneck, and the latency will manifest in different places of the job depending on where there is a queue. If there is a queue between map and flatMap, elements will sit there and wait, and you’ll see latency there. If map and flatMap are chained

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-29 Thread sohimankotia
A few last doubts:
1. If I increase parallelism, will latency decrease because the load gets distributed?
2. But if the load increases, will latency also increase even when parallelism is higher?
3. Let's say I remove the partitioner and the HBase op is still there in the flatMap. Then also this latency
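
The first two questions can be explored with a small queueing model. This is a rough sketch, not Flink code: the function name `max_wait_ms`, the `key % parallelism` routing, and all timings are illustrative assumptions, and the buffer is treated as unbounded, whereas a real job would apply backpressure once its buffers fill up.

```python
def max_wait_ms(keys, parallelism, emit_interval_ms, lookup_ms):
    """Worst-case time (ms) a record waits between map and flatMap.

    Assumptions (hypothetical, for illustration only):
    - map emits record i at time i * emit_interval_ms
    - a custom partitioner routes each record to flatMap instance key % parallelism
    - each flatMap instance handles one record at a time and blocks on the lookup
    """
    free_at = [0] * parallelism          # time at which each flatMap instance is idle again
    worst = 0
    for i, key in enumerate(keys):
        emitted_at = i * emit_interval_ms
        worker = key % parallelism       # stand-in for the partitioner
        read_at = max(emitted_at, free_at[worker])
        worst = max(worst, read_at - emitted_at)
        free_at[worker] = read_at + lookup_ms
    return worst


four_keys = [i % 4 for i in range(1000)]   # load spread evenly over 4 key values
one_key = [0] * 1000                       # only one key value enabled

print(max_wait_ms(four_keys, 1, 1, 2))     # 999: a single instance falls behind
print(max_wait_ms(four_keys, 4, 1, 2))     # 0: each instance keeps up with its share
print(max_wait_ms(one_key, 4, 1, 2))       # 999: all records still hit one instance
```

In this model, more parallelism only helps when the partitioner actually spreads records over several instances. With a single key value, every record lands on the same flatMap instance and the wait is unchanged.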

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-29 Thread Aljoscha Krettek
Yes, this is exactly right!

> On 29. Jun 2017, at 17:42, sohimankotia wrote:
>
> So , it means when elements leave
>
> map => sit in buffer (due to partitioner) => enter flatmap
>
> Since Hbase op in flat map are taking time lets say 1 sec per operation ,
> next

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-29 Thread sohimankotia
So, it means that when elements leave

map => sit in buffer (due to partitioner) => enter flatMap

Since the HBase ops in the flatMap are taking time, let's say 1 sec per operation, the next element will not be read from the buffer until the HBase op is done. Due to this Hbase op , time to enter to flat map from map

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-29 Thread Aljoscha Krettek
Even if the request time to HBase is just a couple of milliseconds, this will add up, and the elements sitting in the buffer between the map and flatMap will have high perceived latency, yes.

> On 28. Jun 2017, at 16:54, sohimankotia wrote:
>
> I had same concern
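
How a few milliseconds per lookup can add up is easiest to see with numbers. The sketch below is a simplified model rather than Flink code: the function name `wait_times_ms` and the timings are made up for illustration, and the buffer between map and flatMap is assumed to be unbounded (a real job would eventually slow the source down through backpressure).

```python
def wait_times_ms(n_records, emit_interval_ms, lookup_ms):
    """Per-record time (ms) between leaving map and entering flatMap.

    Model: map stamps record i at time i * emit_interval_ms; a single
    flatMap instance reads the next record only after the previous
    lookup has returned.
    """
    waits = []
    flatmap_free_at = 0
    for i in range(n_records):
        emitted_at = i * emit_interval_ms            # timestamp attached in map
        read_at = max(emitted_at, flatmap_free_at)   # flatMap reads it once idle
        waits.append(read_at - emitted_at)           # difference measured in flatMap
        flatmap_free_at = read_at + lookup_ms        # instance is busy with the lookup
    return waits


print(wait_times_ms(5, 10, 20))            # [0, 10, 20, 30, 40]
print(wait_times_ms(120_000, 1, 2)[-1])    # 119999 ms, i.e. about 2 minutes
print(max(wait_times_ms(5, 10, 5)))        # 0: lookups faster than arrivals never queue
```

In the second call each lookup takes only 2 ms, but records arrive every 1 ms, so the last of 120,000 records waits about two minutes in the buffer. The measured wait comes from the backlog, not from any single slow lookup.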

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-28 Thread sohimankotia
I had the same concern regarding HBase, so I also added a metric to measure the HBase op time in the flatMap (basically the complete flatMap op). From the metrics I see that approx. 96% of the time the op time was under 1 sec. (I can still do a dummy run without the HBase op, but do these timings make sense?)

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-28 Thread Aljoscha Krettek
I see, what I consider highly likely here is that the lookup to HBase is the bottleneck. If the lookup takes too long, events “sit in a queue” between the map and flatMap operations. If you replace the HBase lookup by some dummy code you should see the latency go away. The reason you don’t see

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-28 Thread sohimankotia
Source is Kafka. FlatMap has an HBase lookup. Sink is Kafka. I tried to get stats over the days. I see that almost 40% had a latency of 0 seconds, 10% 0-30 sec, approx. 10% 30-60 sec, 10% around 60-120 sec, and 30% around 120-210 sec.

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-28 Thread Aljoscha Krettek
I think then there is something going wrong somewhere. Usually people get millisecond latencies even when they have a “keyBy” or shuffle in-between operations (which are not different to a custom partitioner at the system level). What kind of sources/sinks is your program using? Best,

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-27 Thread sohimankotia
So, in the following execution flow:

source -> map -> partitioner -> flatmap -> sink

I am attaching the current time to the tuple while emitting from the map function, and then extracting that timestamp value from the tuple in the flatMap as the very first step. Then I am calculating difference between time attached

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-27 Thread Aljoscha Krettek
Hi,

What do you mean by latency and how are you measuring this in your job?

Best,
Aljoscha

> On 22. Jun 2017, at 14:23, sohimankotia wrote:
>
> Hi Chesnay,
>
> I have data categorized on some attribute(Key in partition ) which will be
> having n possible values. As

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-22 Thread sohimankotia
Hi Chesnay,

I have data categorized on some attribute (the key in the partitioner), which will have n possible values. As of now the job is enabled for only one value of that attribute. In a couple of days we will enable all values of the attribute with more parallelism so each attribute's type data get

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-22 Thread Chesnay Schepler
So let's get the obvious question out of the way: Why are you adding a partitioner when your parallelism is 1?

On 22.06.2017 11:58, sohimankotia wrote:
I have a execution flow (Streaming Job) with parallelism 1. source -> map -> partitioner -> flatmap -> sink Since adding partitioner will

Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-22 Thread sohimankotia
I have an execution flow (streaming job) with parallelism 1:

source -> map -> partitioner -> flatmap -> sink

Adding a partitioner starts a new thread, but the partitioner is spending an average of 2 to 4 minutes while moving data from map to flatMap. For more details about this :