Yes, in the end the requests to HBase are the bottleneck, and the latency will
manifest in different places of the job depending on where there is a queue. If
there is a queue between map and flatMap, elements will sit there and wait, and
you'll see latency there. If map and flatMap are chained, there is no queue
between them, so the waiting instead shows up as backpressure further upstream.
A few last doubts:
1. So if I increase parallelism, will latency decrease because the load will
get distributed? (See the sketch after these questions.)
2. But if the load increases, will latency also increase even with higher
parallelism?
3. Let's say I remove the partitioner, and the HBase op is still there in the
flatMap. Would this latency still be there?
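Regarding doubts 1 and 2, a minimal sketch of what raising only the flatMap's
parallelism would look like (names are illustrative, carried over from the
earlier sketch):

    source.map(new Enricher())
          .partitionCustom(new MyPartitioner(), 0)
          .flatMap(new HBaseLookup())
          .setParallelism(8)  // 8 flatMap subtasks => up to 8 lookups in flight
          .addSink(sink);

Note that with partitionCustom the partitioner alone decides which subtask
receives each record, so higher parallelism only reduces latency if the
partitioner actually spreads the keys; a single hot key still queues behind
one subtask's lookups.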
Yes, this is exactly right!
> On 29. Jun 2017, at 17:42, sohimankotia wrote:
>
> So, it means when elements leave:
>
> map => sit in buffer (due to partitioner) => enter flatMap
>
> Since the HBase ops in the flatMap are taking, let's say, 1 sec per operation,
> the next element will not be read from the buffer until the HBase op is done.
So, it means when elements leave:
map => sit in buffer (due to partitioner) => enter flatMap
Since the HBase ops in the flatMap are taking, let's say, 1 sec per operation,
the next element will not be read from the buffer until the HBase op is done.
Due to this HBase op, the time to get from the map into the flatMap will
increase.
Even if the request time to HBase is just a couple of milliseconds, this will
add up, and the elements sitting in the buffer between the map and flatMap will
have high perceived latency, yes.
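As a rough back-of-the-envelope example of how this adds up: at 1 second per
lookup, a queue of just 120 records in front of the flatMap means the newest
record in that queue waits around 120 seconds before it is even looked at,
which is the order of magnitude of the 120-210 second latencies reported
earlier in the thread. Even at 50 ms per lookup, a few thousand queued records
would still produce latencies of minutes.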
> On 28. Jun 2017, at 16:54, sohimankotia wrote:
>
> I had the same concern regarding HBase, so I also added a metric to measure
> the HBase op time in the flatMap.
I had the same concern regarding HBase, so I also added a metric to measure the
HBase op time in the flatMap (basically the complete flatMap op).
From the metrics I see that approximately 96% of the time the op time was under
1 sec. (I could still do a dummy run without the HBase op, but do these timings
make sense?)
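For reference, one way such a measurement could be taken, as a sketch using
Flink's metric group together with the Dropwizard histogram wrapper (from
flink-metrics-dropwizard); the HBase call itself is stubbed out and all names
are made up:

    import com.codahale.metrics.SlidingWindowReservoir;
    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.dropwizard.metrics.DropwizardHistogramWrapper;
    import org.apache.flink.metrics.Histogram;
    import org.apache.flink.util.Collector;

    public class TimedHBaseLookup extends RichFlatMapFunction<String, String> {
        private transient Histogram lookupMillis;

        @Override
        public void open(Configuration parameters) {
            // register a histogram of per-record lookup times
            lookupMillis = getRuntimeContext().getMetricGroup().histogram(
                "hbaseLookupMillis",
                new DropwizardHistogramWrapper(
                    new com.codahale.metrics.Histogram(
                        new SlidingWindowReservoir(500))));
        }

        @Override
        public void flatMap(String key, Collector<String> out) throws Exception {
            long start = System.currentTimeMillis();
            String value = lookupInHBase(key);  // hypothetical HBase Get
            lookupMillis.update(System.currentTimeMillis() - start);
            out.collect(value);
        }

        private String lookupInHBase(String key) { /* stub */ return key; }
    }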
I see. What I consider highly likely here is that the lookup to HBase is the
bottleneck. If the lookup takes too long, events "sit in a queue" between the
map and flatMap operations. If you replace the HBase lookup with some dummy
code, you should see the latency go away.
The reason you don't see latency when the operators are chained is that there
is no buffer between them for elements to wait in.
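A sketch of that experiment: keep the topology exactly the same, but swap the
flatMap body for dummy code (DummyLookup is a made-up name):

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.util.Collector;

    // same slot in the pipeline as the HBase flatMap, but no external call;
    // if end-to-end latency collapses, the lookup was the bottleneck
    public class DummyLookup implements FlatMapFunction<String, String> {
        @Override
        public void flatMap(String value, Collector<String> out) {
            out.collect(value);
        }
    }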
The source is Kafka.
The flatMap has the HBase lookup.
The sink is Kafka.
I tried to get stats over a few days. I see that almost 40% of elements had a
latency of 0 seconds, 10% 0-30 sec, approx. 10% 30-60 sec, 10% around
60-120 sec, and 30% around 120-210 sec.
I think then there is something going wrong somewhere. Usually people get
millisecond latencies even when they have a "keyBy" or shuffle in between
operations (which are no different from a custom partitioner at the system
level).
What kind of sources/sinks is your program using?
Best,
Aljoscha
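At the API level, the two variants mentioned above look like this (a sketch;
Enricher, HBaseLookup, and MyPartitioner are placeholder names carried over
from the earlier sketches, and keying on tuple field 0 is illustrative):

    // a keyBy is, at the system level, just a shuffle with a hash partitioner,
    // so both of these put a network channel between map and flatMap
    source.map(new Enricher())
          .keyBy(0)                                 // hash partitioning
          .flatMap(new HBaseLookup());

    source.map(new Enricher())
          .partitionCustom(new MyPartitioner(), 0)  // custom partitioning
          .flatMap(new HBaseLookup());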
So, in the following execution flow:
source -> map -> partitioner -> flatmap -> sink
I am attaching the current time to the tuple while emitting from the map
function, and then extracting that timestamp value from the tuple in the
flatMap as the very first step. Then I am calculating the difference between
the time attached in the map and the current time in the flatMap, and reporting
that difference as the latency.
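A sketch of that measurement, assuming tuples of (payload, emitTimestamp);
all names are made up:

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.util.Collector;

    // in the map: stamp the outgoing tuple with the current time
    DataStream<Tuple2<String, Long>> stamped = source.map(
        new MapFunction<String, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(String value) {
                return Tuple2.of(value, System.currentTimeMillis());
            }
        });

    // in the flatMap: as the very first step, compute time spent in transit
    stamped.partitionCustom(new MyPartitioner(), 0)
           .flatMap(new FlatMapFunction<Tuple2<String, Long>, String>() {
               @Override
               public void flatMap(Tuple2<String, Long> t, Collector<String> out) {
                   long latencyMs = System.currentTimeMillis() - t.f1;
                   // record latencyMs in a metric, then do the HBase lookup
                   out.collect(t.f0);
               }
           });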
Hi,
What do you mean by latency and how are you measuring this in your job?
Best,
Aljoscha
> On 22. Jun 2017, at 14:23, sohimankotia wrote:
>
> Hi Chesnay,
>
> I have data categorized on some attribute (the key in the partition) which
> will have n possible values. As of now the job is enabled for only one value
> of that attribute.
Hi Chesnay,
I have data categorized on some attribute (the key in the partition) which
will have n possible values. As of now the job is enabled for only one value
of that attribute. In a couple of days we will enable all values of the
attribute with more parallelism, so that each attribute value's data gets
processed by its own subtask.
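A sketch of what such a partitioner could look like (AttributePartitioner is
a made-up name; the real routing logic depends on the attribute's values):

    import org.apache.flink.api.common.functions.Partitioner;

    // route each attribute value to a fixed subtask, so that once parallelism
    // is raised each value is handled by its own flatMap instance
    public class AttributePartitioner implements Partitioner<String> {
        @Override
        public int partition(String attributeValue, int numPartitions) {
            return Math.abs(attributeValue.hashCode() % numPartitions);
        }
    }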
So let's get the obvious question out of the way:
Why are you adding a partitioner when your parallelism is 1?
On 22.06.2017 11:58, sohimankotia wrote:
I have an execution flow (streaming job) with parallelism 1.
source -> map -> partitioner -> flatmap -> sink
Since adding the partitioner will shuffle data between the map and the flatMap,
will it increase latency?
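Written out, that flow would look roughly like this (a sketch; kafkaSource,
kafkaSink, and the function names are placeholders, and connector setup is
omitted):

    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);

    env.addSource(kafkaSource)                          // Kafka source
       .map(new Enricher())                             // timestamp attached here
       .partitionCustom(new AttributePartitioner(), 0)  // custom partitioner
       .flatMap(new HBaseLookup())                      // HBase lookup per record
       .addSink(kafkaSink);                             // Kafka sink

    env.execute("source-map-partitioner-flatmap-sink");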