Re: Phoenix Performances & Uses Cases

Josh Elser Mon, 29 Oct 2018 09:29:54 -0700

On 10/29/18 11:39 AM, Nicolas Paris wrote:

Thanks Josh,


On Mon, Oct 29, 2018 at 10:47:42AM -0400, Josh Elser wrote:

Use Hive when Hive does things well, and use Phoenix when Phoenix does
it well.


That would be great. My concern is the phoenix "joins" do not compete
with postgresql in my actual tests.
Phoenix + hive is ok, however
Phoenix + hive + postgres is not.


Am I wrong with the bad performances of joins in the context of large
tables (> 10M) ?

I think trying to phrase "JOIN efficiency" in terms of data sets is the
wrong way to go about an appropriate explanation.

There are limitations that Phoenix has which I would summarize as
"things HBase can handle as push-downs" and "the lack of a distributed
execution engine".

For example, you found few-to-many joins worked well with Phoenix, but
you would find that (in most cases) many-to-many joins will be slow. This
is largely because of the constructs that HBase provides as a data store
and what Phoenix can "work with". When Phoenix can push down one side of
the join, you get a fast, (often) parallelized scan from Phoenix. When
both sides of the relation are large, you end up running a sort-merge
join which pulls everything back to the client.
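
To make the difference between the two cases concrete, here is a rough
sketch in plain Python over in-memory lists of dicts. It is not Phoenix's
implementation; the function names and the sample rows are made up for
illustration. The first function builds a hash table on the small side
and streams the large side past it, which is roughly the shape of the
"push down one side" case. The second sorts both inputs and merges them
in one place, which is why it hurts when both sides are large.

```python
from collections import defaultdict


def hash_join(small, large, key):
    """Build a hash table on the small side, then stream the large side past it."""
    table = defaultdict(list)
    for row in small:
        table[row[key]].append(row)
    for row in large:
        # .get() avoids creating empty entries for keys that have no match
        for match in table.get(row[key], []):
            yield {**match, **row}


def sort_merge_join(left, right, key):
    """Sort both sides on the join key, then walk them in lockstep."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows on the right that share this key
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with that key against the whole run
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    yield {**left[i], **r}
                i += 1
            j = j_end


# Hypothetical sample data: a small dimension table and a larger fact table
customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
orders = [{"id": 1, "item": "x"}, {"id": 1, "item": "y"}, {"id": 3, "item": "z"}]

hashed = list(hash_join(customers, orders, "id"))
merged = list(sort_merge_join(customers, orders, "id"))
```

Both functions return the same rows. What differs is how much data has to
be held and sorted in one place, which is the part that matters when
neither side is small.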

The first step is understanding what Phoenix is actually doing to run
your query (JOIN or otherwise) and then understanding if you can
rephrase your JOIN (or really, the application-level "question") in such
a way that Phoenix can run an efficient execution over it.
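
One way to see what Phoenix is doing is to prefix the query with EXPLAIN
and read the plan. The helper below is a minimal sketch that assumes a
Python DB-API style cursor (for example one obtained through the phoenixdb
package and the Phoenix Query Server, which is an assumption about your
setup). The helper name and the table names in the usage comment are
hypothetical.

```python
def explain(cursor, sql):
    """Run EXPLAIN for a query and return the plan steps as a list of strings.

    Assumes `cursor` follows the Python DB-API (execute/fetchall) and that
    the first column of each returned row holds the plan text.
    """
    cursor.execute("EXPLAIN " + sql)
    return [row[0] for row in cursor.fetchall()]


# Usage sketch (not executed here; connection URL and tables are placeholders):
#
#   import phoenixdb
#   conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
#   for step in explain(conn.cursor(),
#                       "SELECT * FROM ORDERS o JOIN CUSTOMERS c ON o.CID = c.ID"):
#       print(step)
```

Comparing the plan before and after rephrasing the JOIN shows whether the
change altered how Phoenix executes it.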


Hope that helps.
