Regarding your last two points about windowing, transforming,
grouping, etc.: my current opinion is that Hive handles certain analytical
operations much better than Phoenix. Personally, I don't think it
makes sense for Phoenix to try to "catch up". It would take us years
to build capabilities on par with what Hive has.
Some of us have been working to ease data access between Hive and
Phoenix via the PhoenixStorageHandler for Hive. The goal is to make
it easier for you to use the right tool for the job: use Hive where
Hive does things well, and use Phoenix where Phoenix does them well.
(Again, this is my opinion. It is not meant to be a declaration of
direction by the entire Apache Phoenix community.)
On 10/27/18 7:50 AM, Nicolas Paris wrote:
Hi,
I am benchmarking Phoenix to better understand its strengths and
weaknesses. My baselines are PostgreSQL for OLTP workloads and
Hive LLAP for OLAP workloads. I am testing on a 10-computer cluster
with Hive (2.1) and Phoenix (4.8), 220 GB RAM / 32 CPUs, versus a
PostgreSQL (9.6) instance with 128 GB RAM / 32 CPUs.
Right now, my opinion is:
- when getting a subset of a large table, Phoenix performs
best
- when getting a subset from multiple large tables, PostgreSQL performs
best
- when getting a subset from a large table joined to one or many small
tables, Phoenix performs best
- when ingesting high-frequency data, Phoenix performs best
- for GROUP BY queries: Hive > PostgreSQL > Phoenix
- for windowing, transforming, and grouping, Hive performs best and
Phoenix the worst
My conclusion is that Phoenix is not intended at all for analytical
queries such as grouping, windowing, and joining large tables. It is
well suited to very specific use cases, such as maintaining a very large
table, possibly joined with a few small tables (e.g. time-series data, or
binary data stored with HBase MOB enabled).
Am I missing something?
Thanks,