Re: Phoenix as a source for Spark processing

2018-03-08 Thread Josh Elser
How large is each row in this case? Or, better yet, how large is the table in HBase? You're spreading out approximately 7 "clients" to each Regionserver fetching results (100/14). So, you should have pretty decent saturation from Spark into HBase. I'd be taking a look at the EXPLAIN plan
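
For reference, a minimal sketch of pulling the EXPLAIN plan mentioned above, assuming a Phoenix Query Server at a placeholder URL and a made-up table/column (not from the thread), using python-phoenixdb:

    import phoenixdb

    # Placeholder Query Server endpoint; table and column names are invented.
    conn = phoenixdb.connect('http://localhost:8765/', autocommit=True)
    cursor = conn.cursor()

    # EXPLAIN reports whether Phoenix plans a FULL SCAN or a RANGE SCAN and
    # how many parallel chunks it fans out across the regionservers.
    cursor.execute("EXPLAIN SELECT COL_A, COL_B FROM MY_TABLE WHERE TENANT_ID = 'abc'")
    for row in cursor.fetchall():
        print(row[0])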

Re: Phoenix as a source for Spark processing

2018-03-08 Thread Josh Elser
I would guess that Hive would always be capable of out-matching what HBase/Phoenix can do for this type of workload (bulk-transformation). That said, I'm not ready to tell you that you can't get the Phoenix-Spark integration performing better. See the other thread where you provide more
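
As a point of reference, reading a Phoenix table into Spark through the phoenix-spark connector looks roughly like the following in PySpark (table name and ZooKeeper quorum are placeholders; the connector jar must be on the Spark classpath):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("phoenix-read").getOrCreate()

    # Load a Phoenix table as a DataFrame via the phoenix-spark connector.
    df = (spark.read
              .format("org.apache.phoenix.spark")
              .option("table", "MY_TABLE")          # placeholder table name
              .option("zkUrl", "zk-host:2181")      # placeholder ZK quorum
              .load())

    # Column pruning and simple predicates are pushed down to Phoenix, so
    # each Spark partition scans only its slice of the table.
    df.select("ID", "COL_A").filter(df.COL_A > 100).show()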

Re: Direct HBase vs. Phoenix query performance

2018-03-08 Thread James Taylor
Hi Marcell, It'd be helpful to see the table DDL and the query too along with an idea of how many regions might be involved in the query. If a query is a commonly run query, usually you'll design the row key around optimizing it. If you have other, simpler queries that have determined your row
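
To illustrate the row-key-design point, a hedged sketch with a hypothetical EVENTS table and query (not the poster's schema): leading the primary key with the columns the hot query filters on lets Phoenix use a range scan rather than a full scan.

    import phoenixdb

    conn = phoenixdb.connect('http://localhost:8765/', autocommit=True)
    cursor = conn.cursor()

    # Leading the row key with CUSTOMER_ID, then EVENT_TIME, matches the
    # common query's filter; adding an EVENT_TIME range would narrow the
    # scan further within that customer's slice of the key space.
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS EVENTS (
            CUSTOMER_ID VARCHAR NOT NULL,
            EVENT_TIME  TIMESTAMP NOT NULL,
            EVENT_TYPE  VARCHAR,
            PAYLOAD     VARCHAR
            CONSTRAINT PK PRIMARY KEY (CUSTOMER_ID, EVENT_TIME)
        )
    """)

    # The frequently run query filters on the leading PK column.
    cursor.execute(
        "SELECT EVENT_TYPE, COUNT(*) FROM EVENTS "
        "WHERE CUSTOMER_ID = ? GROUP BY EVENT_TYPE",
        ['cust-42'])
    print(cursor.fetchall())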

Direct HBase vs. Phoenix query performance

2018-03-08 Thread Marcell Ortutay
Hi, I am using Phoenix at my company for a large query that is meant to be run in real time as part of our application. The query involves several aggregations, anti-joins, and an inner query. Here is the (anonymized) query plan:
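
The plan itself is omitted from this digest, but for readers unfamiliar with the terms, a purely illustrative Phoenix query of the same general shape (aggregation plus an anti-join expressed as NOT EXISTS plus an inner query), with made-up table names, might look like:

    # Illustrative only; not the poster's (anonymized) query.
    sql = """
        SELECT o.CUSTOMER_ID, SUM(o.AMOUNT) AS TOTAL
        FROM ORDERS o
        WHERE NOT EXISTS (
            SELECT 1 FROM REFUNDS r WHERE r.ORDER_ID = o.ORDER_ID
        )
        AND o.CUSTOMER_ID IN (
            SELECT CUSTOMER_ID FROM ACTIVE_CUSTOMERS
        )
        GROUP BY o.CUSTOMER_ID
    """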

Re: Runtime DDL supported?

2018-03-08 Thread James Taylor
Thanks for digging that up, Miles. I've added a comment on the JIRA on how to go about implementing it here: https://issues.apache.org/jira/browse/PHOENIX-3547?focusedCommentId=16391739&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16391739 That would be a good first
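
For background on the subject line: DDL statements go through Phoenix's ordinary SQL interface, so they can be issued while an application is running. A minimal sketch with placeholder names (unrelated to the specific approach discussed in the JIRA comment, which is truncated here):

    import phoenixdb

    conn = phoenixdb.connect('http://localhost:8765/', autocommit=True)
    cursor = conn.cursor()

    # Create a view at runtime over an existing base table (placeholder names).
    cursor.execute(
        "CREATE VIEW IF NOT EXISTS TENANT_A_EVENTS "
        "AS SELECT * FROM EVENTS WHERE CUSTOMER_ID = 'tenant-a'")

    # Columns can later be added to the view without touching the base table.
    cursor.execute("ALTER VIEW TENANT_A_EVENTS ADD IF NOT EXISTS EXTRA_TAG VARCHAR")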

Re: [DISCUSS] Include python-phoenixdb into Phoenix

2018-03-08 Thread Ankit Singhal
Thanks, Lucas and Josh. I'm now putting up the formal thread for voting. On Fri, Mar 2, 2018 at 2:50 AM, Josh Elser wrote: > He appears! Thanks for weighing in. Comments inline.. > > On Thu, Mar 1, 2018 at 3:55 PM, Lukáš Lalinský wrote: > > I'm fine
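
For readers who haven't used it, python-phoenixdb is a DB-API 2.0 driver that talks to the Phoenix Query Server; its basic usage (endpoint and table are placeholders) looks roughly like:

    import phoenixdb

    conn = phoenixdb.connect('http://localhost:8765/', autocommit=True)
    cursor = conn.cursor()
    cursor.execute(
        "CREATE TABLE IF NOT EXISTS USERS (ID INTEGER PRIMARY KEY, USERNAME VARCHAR)")
    cursor.execute("UPSERT INTO USERS VALUES (?, ?)", (1, 'admin'))
    cursor.execute("SELECT * FROM USERS")
    print(cursor.fetchall())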