Please ignore the same query sent from my other email id. I was getting 
failure notifications while sending emails from that id, but after a few hours 
they somehow showed up. Sorry for the spam.

Thanks,
Ankur Jain

From: Ankur Jain <aj...@quadanalytix.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Monday, 28 March 2016 1:03 pm
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Slow metadata update queries during upsert

Hi

We are using Phoenix as our transactional data store (though we are not using 
its latest transaction feature yet). Earlier we had our own custom query layer 
built on top of HBase, which we are now trying to replace.

During tests we found that inserts are very slow compared to regular HBase 
puts. There is always 7-8 ms of additional time associated with each UPSERT 
query. This time is spent mostly in the validate phase, where the client cache 
is updated with the latest table metadata. Is there a way to avoid refreshing 
this cache every time?

Out of ~15 ms for a typical UPSERT query in our case, 11 ms go into just 
updating the metadata cache of that table. The remaining 3 ms are spent in the 
actual HBase batch call, and 1 ms in all other Phoenix processing.
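
For reference, here is roughly how we are measuring this (a minimal sketch; 
the table, columns, and quorum host are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class UpsertTiming {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
            conn.setAutoCommit(false);
            PreparedStatement stmt =
                conn.prepareStatement("UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)");
            long t0 = System.nanoTime();
            stmt.setLong(1, 1L);
            stmt.setString(2, "v");
            stmt.executeUpdate();   // builds the mutation client-side
            long t1 = System.nanoTime();
            conn.commit();          // validate (metadata cache update) + HBase batch call
            long t2 = System.nanoTime();
            System.out.printf("execute: %d us, commit: %d us%n",
                    (t1 - t0) / 1000, (t2 - t1) / 1000);
            conn.close();
        }
    }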

We have two use cases:

1. Our table metadata is always static and we know we are not going to add any 
new columns, at least at runtime.
    We would like to avoid the cost of this metadata update so that our 
inserts are faster. Is this possible with the existing code base? (See the 
first sketch below.)

2. We add columns to our tables on the fly.
    Adding new columns on the fly is generally a rare event. Is there a 
control with which we can explicitly invalidate the cache when a column is 
added, while otherwise caching metadata indefinitely? (See the second sketch 
below.)
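
For use case 1, the UPDATE_CACHE_FREQUENCY table property (documented for 
newer Phoenix releases, if I am reading the docs correctly) looks like what we 
want. A sketch, reusing the conn from the timing example above (MY_TABLE is 
made up):

    import java.sql.Statement;

    // Use case 1 sketch: declare the schema static so the client stops
    // checking SYSTEM.CATALOG for newer metadata on every commit.
    try (Statement ddl = conn.createStatement()) {
        ddl.execute("ALTER TABLE MY_TABLE SET UPDATE_CACHE_FREQUENCY = 'NEVER'");
    }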
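
For use case 2, short of an explicit invalidation API (I could not find one), 
a time-bounded cache might be an acceptable middle ground. A sketch, again 
reusing conn (the 900000 ms / 15 minute window is an arbitrary choice):

    import java.sql.Statement;

    // Use case 2 sketch: cache metadata for a bounded window rather than
    // forever, so a rare schema change is picked up within 15 minutes at worst.
    try (Statement ddl = conn.createStatement()) {
        ddl.execute("ALTER TABLE MY_TABLE SET UPDATE_CACHE_FREQUENCY = 900000");
        // Rare, on-the-fly schema change; this client's own cache is updated
        // immediately, other clients see it once their window expires.
        ddl.execute("ALTER TABLE MY_TABLE ADD NEW_COL VARCHAR");
    }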

Is the metadata cache at the connection level or at a global level? This 
matters because we are always creating new connections.
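
In case it matters, I understand the client-side default can also be supplied 
per connection via Properties; a sketch ("phoenix.default.update.cache.frequency" 
is the client property documented alongside UPDATE_CACHE_FREQUENCY, and 
"zk-host" is a placeholder):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    // Sketch: default cache frequency for every table resolved through this
    // connection, set at connection-creation time.
    Properties props = new Properties();
    props.setProperty("phoenix.default.update.cache.frequency", "900000");
    Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host", props);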

I have also observed that CsvToKeyValueMapper is fast because it avoids the 
connection.commit() step and does all the validation upfront, thereby avoiding 
the update-cache step during commit.
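
The pattern I am referring to, as I read it in the bulk-load code, looks 
roughly like this (a sketch, not the exact mapper code; conn is an open 
Phoenix connection with the UPSERTs already executed but not committed):

    import java.util.Iterator;
    import java.util.List;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.util.Pair;
    import org.apache.phoenix.util.PhoenixRuntime;

    // Pull the uncommitted KeyValues out of the connection instead of calling
    // commit(), so the per-commit validate/update-cache step never runs.
    Iterator<Pair<byte[], List<KeyValue>>> it =
            PhoenixRuntime.getUncommittedDataIterator(conn, true);
    while (it.hasNext()) {
        Pair<byte[], List<KeyValue>> tableKvs = it.next();
        // tableKvs.getSecond() holds the KeyValues to write out directly
        // (e.g. as HFiles for bulk load).
    }
    conn.rollback(); // discard the mutations still held client-side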

To add another analysis where Phoenix inserts are much slower than native 
HBase puts, see https://issues.apache.org/jira/browse/YARN-2928; the attached 
TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf states this clearly. 
I believe it might be related.

Thanks,
Ankur Jain
