Re: How to tell when an insertion has "finished"

Heather, James (ELS) Thu, 28 Jul 2016 22:59:46 -0700

I don't really know enough about the low level details to know which 
replication I was referring to...


Let me ask the higher level question:

1. Am I right in thinking that after you insert a large number of rows, the 
performance of the cluster (and maybe of those rows in particular) will be 
initially slow while some stuff is still happening at a lower level in the 
background?

2. If so, how do you tell when that stuff has finished, and when your query 
performance will reach a steady state?

James

On 29 July 2016 12:05:30 a.m. James Taylor wrote:

That's a good point, Mujtaba. Not sure which replication he meant either.

On Thu, Jul 28, 2016 at 4:02 PM, Mujtaba Chohan wrote:
Oh sorry I thought OP was referring to HDFS level replication.

On Thu, Jul 28, 2016 at 3:48 PM, James Taylor wrote:
I believe you can also measure the depth of the replication queue to know 
what's pending. HBase replication is asynchronous, so you're right that Phoenix 
would return while replication may still be occurring.
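
A minimal sketch of checking the replication queue depth, assuming you have 
captured the text printed by `status 'replication'` in the HBase shell. The 
`PeerID=` and `SizeOfLogQueue=` field names are an assumption about that 
output and may differ between HBase versions; the function names are 
hypothetical.

```python
import re
from typing import Dict

# Assumed line shape: "SOURCE: PeerID=1, ..., SizeOfLogQueue=3, ..."
_SOURCE_LINE = re.compile(
    r"PeerID=(?P<peer>[^,\s]+).*?SizeOfLogQueue=(?P<depth>\d+)"
)


def replication_queue_depths(status_text: str) -> Dict[str, int]:
    """Return the largest reported log-queue depth for each peer."""
    depths: Dict[str, int] = {}
    for line in status_text.splitlines():
        match = _SOURCE_LINE.search(line)
        if match:
            peer = match.group("peer")
            depth = int(match.group("depth"))
            # Several region servers may report the same peer; keep the worst.
            depths[peer] = max(depths.get(peer, 0), depth)
    return depths


def replication_pending(status_text: str, max_depth: int = 0) -> bool:
    """True if any peer's queue is deeper than max_depth."""
    return any(d > max_depth for d in replication_queue_depths(status_text).values())
```

Calling `replication_pending(text)` before a benchmark run gives a rough signal 
of whether replication work is still queued. What depth counts as "drained" 
depends on the cluster, so `max_depth` is left as a parameter.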

On Thu, Jul 28, 2016 at 12:06 PM, Mujtaba Chohan wrote:
A query running for the first time will be slower because the data is not yet 
in the HBase cache, not because things haven't settled. Replication shouldn't 
be putting load on the cluster, which you can check by turning replication off. 
On the HBase side, the way to force things to be optimal before running perf 
queries is to do a major compaction and wait for the compaction to complete.

- mujtaba
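
The "wait for compaction to complete" step can be sketched as a polling loop. 
This is a sketch under stated assumptions, not an HBase client: `get_state` is 
a hypothetical callable you supply that returns the table's current compaction 
state as a string, and the value "NONE" is assumed to mean no compaction is 
running (mirroring HBase's compaction state names). Because a compaction 
request is asynchronous, the loop asks for several consecutive idle readings 
before it reports completion.

```python
import time
from typing import Callable


def wait_for_compaction(
    get_state: Callable[[], str],
    poll_seconds: float = 5.0,
    stable_polls: int = 2,
    max_polls: int = 720,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until `stable_polls` consecutive readings are "NONE".

    Returns the number of polls made. Raises TimeoutError after `max_polls`.
    """
    idle_streak = 0
    for poll in range(1, max_polls + 1):
        if get_state() == "NONE":
            idle_streak += 1
            if idle_streak >= stable_polls:
                return poll
        else:
            # A compaction is (still or again) running; start counting afresh.
            idle_streak = 0
        sleep(poll_seconds)
    raise TimeoutError(f"compaction still running after {max_polls} polls")
```

A benchmark script might request the major compaction, call 
`wait_for_compaction(...)`, and then discard the first (cold-cache) run of each 
query before recording timings.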

On Thu, Jul 28, 2016 at 8:09 AM, Heather, James (ELS) wrote:

If you upsert lots of rows into a table, presumably Phoenix will return as soon 
as HBase has received the data, but before the data has been replicated?


Is there a way to tell when everything has "settled", i.e., when everything has 
finished replicating or whatever it needs to do?


The reason I ask is that this might affect our benchmarking. If we add lots of 
rows, and then run some sample queries straight away, they might return more 
slowly initially, if the replication is still taking place.


(Does this make sense? I'm not completely clear on how HBase replication works 
anyway.)


James

________________________________

Elsevier Limited. Registered Office: The Boulevard, Langford Lane, Kidlington, 
Oxford, OX5 1GB, United Kingdom, Registration No. 1982084, Registered in 
England and Wales.





