Hi
We are using Phoenix as our transactional data store (though we are not yet
using its new transactions feature). Earlier we had our own custom query
layer built on top of HBase, which we are now trying to replace.
During tests we found that inserts are very slow compared to regular HBase
puts. There is always 7-8 ms of additional time associated with each upsert
query. Most of this time is spent in the validate phase, where the cache is
updated with the latest table metadata. Is there a way to avoid refreshing
this cache every time?
Out of ~15 ms for a typical upsert query in our case, 11 ms are spent just
updating the metadata cache for that table. The remaining 3 ms go into the
actual HBase batch call and 1 ms into all other Phoenix processing.
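
For reference, this is roughly how we time a single upsert; the JDBC URL and
table name below are placeholders, not our real ones:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class UpsertTiming {
    public static void main(String[] args) throws Exception {
        // Placeholder URL/table; this only mirrors the code path we are measuring.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps =
                     conn.prepareStatement("UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)")) {
                long start = System.nanoTime();
                ps.setLong(1, 1L);
                ps.setString(2, "value");
                ps.executeUpdate();  // queues the mutation on the connection
                conn.commit();       // validate (metadata cache refresh) + HBase batch call
                System.out.println("upsert took "
                        + (System.nanoTime() - start) / 1_000_000 + " ms");
            }
        }
    }
}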
We have two use cases:
1. Our table metadata is static, and we know we are not going to add any new
columns, at least at runtime. We would like to avoid the cost of this
metadata update so that our inserts are faster. Is this possible with the
existing code base?
2. We add columns to our tables on the fly. Adding new columns on the fly is
generally a rare event. Is there a control where we can explicitly invalidate
the cache when a column is added, while otherwise caching the metadata
indefinitely? (A sketch of both patterns is below.)
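
To make the two cases concrete, here is a minimal sketch, assuming a
placeholder table MY_TABLE, a placeholder column NEW_COL, and a local
ZooKeeper quorum; our real code differs:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class UseCases {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            conn.setAutoCommit(false);

            // Use case 1: schema is static, so ideally the table metadata would be
            // fetched once and reused for every subsequent upsert.
            try (PreparedStatement ps =
                     conn.prepareStatement("UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)")) {
                for (long i = 0; i < 1000; i++) {
                    ps.setLong(1, i);
                    ps.setString(2, "value-" + i);
                    ps.executeUpdate();
                    conn.commit();  // today every commit refreshes the table metadata again
                }
            }

            // Use case 2: a column is added on the fly (rare). At this point we
            // would be happy to invalidate any cached metadata explicitly ourselves.
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("ALTER TABLE MY_TABLE ADD NEW_COL VARCHAR");
            }
        }
    }
}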
Is the metadata cache at the connection level or at the global level? I ask
because we are always creating new connections.
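
Our access pattern is roughly the following (again with a placeholder URL and
table), i.e. a fresh connection per write, so a purely connection-scoped
cache would never stay warm for us:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PerRequestWrite {
    // Called once per incoming request; we do not reuse Phoenix connections.
    static void writeRow(long id, String val) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             PreparedStatement ps =
                     conn.prepareStatement("UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)")) {
            ps.setLong(1, id);
            ps.setString(2, val);
            ps.executeUpdate();
            conn.commit();
        }
    }
}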
I have also observed that CsvToKeyValueMapper is fast because it avoids the
connection.commit() step and does all the validations upfront, so the
update-cache step during commit never happens.
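
If I read the bulk-load code correctly, the pattern there is roughly as
follows; the method names are from memory, so please treat this as an
approximation rather than the exact mapper code:

import java.sql.Connection;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.phoenix.util.PhoenixRuntime;

public class UncommittedKeyValues {
    // Run upserts on the connection without committing, then drain the pending
    // mutations as KeyValues, so the commit-time validate/update-cache step
    // never runs. This mirrors (approximately) what the bulk-load mapper does.
    static void drainPendingMutations(Connection conn) throws Exception {
        Iterator<Pair<byte[], List<KeyValue>>> it =
                PhoenixRuntime.getUncommittedDataIterator(conn);
        while (it.hasNext()) {
            Pair<byte[], List<KeyValue>> tableAndKeyValues = it.next();
            // ... write the KeyValues out directly (e.g. as HFiles)
        }
        conn.rollback();  // discard the pending mutations from the connection
    }
}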
To add another data point, an analysis where Phoenix inserts are much slower
than native HBase puts is https://issues.apache.org/jira/browse/YARN-2928;
the attachment TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf
clearly shows this. I believe it might be related.
Thanks,
Ankur Jain