How large is each row in this case? Or, better yet, how large is the
table in HBase?
You're spreading roughly 7 "clients" across each RegionServer
fetching results (100/14), so you should have pretty decent saturation
from Spark into HBase.
I'd take a look at the EXPLAIN plan.
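For context, Phoenix will print its execution plan (scan ranges, server-side aggregation, join strategy) if you prepend EXPLAIN to the statement. A minimal sketch; the table and column names below are hypothetical, not from the thread:

```sql
-- Hypothetical example: prepend EXPLAIN to see how Phoenix will
-- execute the query (full scan vs. range scan, where aggregation runs).
EXPLAIN
SELECT host, COUNT(*)
FROM metrics
WHERE created_date > TO_DATE('2018-03-01')
GROUP BY host;
```

A plan that reports a FULL SCAN rather than a RANGE SCAN is usually the first hint that the row key doesn't match the query's filter.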
I would guess that Hive will always be capable of outperforming what
HBase/Phoenix can do for this type of workload (bulk transformation).
That said, I'm not ready to tell you that you can't get the
Phoenix-Spark integration performing better. See the other thread where
you provide more
Hi Marcell,
It'd be helpful to see the table DDL and the query, too, along with an idea
of how many regions might be involved in the query. If a query is
commonly run, you'll usually design the row key around optimizing it.
If you have other, simpler queries that have determined your row
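To illustrate the row-key advice above: leading the primary key with the columns the common query filters on lets Phoenix bound the scan to a key range. A hedged sketch; the table, columns, and bucket count are invented for illustration:

```sql
-- Hypothetical DDL: the composite row key leads with the columns the
-- common query filters on (host, then date), so that query becomes a
-- bounded range scan instead of a full-table scan.
CREATE TABLE metrics (
    host         VARCHAR NOT NULL,
    created_date DATE    NOT NULL,
    metric_id    VARCHAR NOT NULL,
    value        DOUBLE,
    CONSTRAINT pk PRIMARY KEY (host, created_date, metric_id)
) SALT_BUCKETS = 14;
```

The trade-off is that queries filtering only on a trailing key column (e.g. `metric_id` alone) can't use the key range and may need a secondary index.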
Hi,
I am using Phoenix at my company for a large query that is meant to be run
in real time as part of our application. The query involves several
aggregations, anti-joins, and an inner query. Here is the (anonymized)
query plan:
Thanks for digging that up, Miles. I've added a comment on the JIRA on how
to go about implementing it here:
https://issues.apache.org/jira/browse/PHOENIX-3547?focusedCommentId=16391739&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16391739
That would be a good first
Thanks, Lukáš and Josh. I'm now putting up the formal thread for voting.
On Fri, Mar 2, 2018 at 2:50 AM, Josh Elser wrote:
> He appears! Thanks for weighing in. Comments inline..
>
> On Thu, Mar 1, 2018 at 3:55 PM, Lukáš Lalinský wrote:
> > I'm fine