I don't really know enough about the low-level details to know which replication I was referring to...
Let me ask the higher level question:

1. Am I right in thinking that after you insert a large number of rows, the performance of the cluster (and maybe of those rows in particular) will be initially slow while some stuff is still happening at a lower level in the background?
2. If so, how do you tell when that stuff has finished, and when your query performance will reach a steady state?

James

On 29 July 2016 12:05:30 a.m. James Taylor <[email protected]> wrote:

That's a good point, Mujtaba. Not sure which replication he meant either.

On Thu, Jul 28, 2016 at 4:02 PM, Mujtaba Chohan <[email protected]> wrote:

Oh sorry I thought OP was referring to HDFS level replication.

On Thu, Jul 28, 2016 at 3:48 PM, James Taylor <[email protected]> wrote:

I believe you can also measure the depth of the replication queue to know what's pending. HBase replication is asynchronous, so you're right that Phoenix would return while replication may still be occurring.

On Thu, Jul 28, 2016 at 12:06 PM, Mujtaba Chohan <[email protected]> wrote:

Query running first time would be slower since data is not in HBase cache rather than things being not settled. Replication shouldn't be putting load on cluster which you can check by turning replication off. On HBase side to force things to be optimal before running perf queries is to do a major compaction and wait for compaction to complete.

- mujtaba

On Thu, Jul 28, 2016 at 8:09 AM, Heather, James (ELS) <[email protected]> wrote:

If you upsert lots of rows into a table, presumably Phoenix will return as soon as HBase has received the data, but before the data has been replicated? Is there a way to tell when everything has "settled", i.e., when everything has finished replicating or whatever it needs to do?

The reason I ask is that this might affect our benchmarking.
If we add lots of rows, and then run some sample queries straight away, they might return more slowly initially, if the replication is still taking place. (Does this make sense? I'm not completely clear on how HBase replication works anyway.)

James

________________________________
Elsevier Limited. Registered Office: The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084, Registered in England and Wales.
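
The "do a major compaction and wait for compaction to complete" advice in the thread could be scripted roughly as below. This is only a sketch, not something anyone in the thread posted: `wait_for_compaction` and its `get_state` argument are hypothetical names, and it assumes you supply your own callable that returns the table's compaction state as one of HBase's state names (`NONE`, `MINOR`, `MAJOR`, `MAJOR_AND_MINOR`), however you choose to obtain it on your HBase version. Since a compaction request is asynchronous, the sketch waits for several idle readings in a row rather than trusting the first one.

```python
import time
from typing import Callable

# HBase's name for "no compaction in progress" on a table.
IDLE_STATE = "NONE"


def wait_for_compaction(
    get_state: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    idle_polls_required: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the table reports no compaction for several polls in a row.

    get_state is a hypothetical, caller-supplied callable that returns the
    table's current compaction state as a string.
    Returns True once the table looks idle, False if the timeout passes first.
    """
    deadline = time.monotonic() + timeout_seconds
    idle_streak = 0
    while time.monotonic() < deadline:
        if get_state() == IDLE_STATE:
            idle_streak += 1
            if idle_streak >= idle_polls_required:
                return True
        else:
            # A compaction is running (or has just started): start counting again.
            idle_streak = 0
        sleep(poll_seconds)
    return False
```

A benchmark run would then be: upsert the rows, request the compaction (for example with `major_compact` in the HBase shell), call `wait_for_compaction(...)`, and only then start timing queries. Given the point above about the HBase cache, it is probably also worth running each query once untimed before measuring.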
