The target table will contain
about 100 million records.
HBase has 14 region servers; both tables are salted with SALT_BUCKETS=42.
The Spark job runs via YARN.
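For readers unfamiliar with salting: SALT_BUCKETS makes Phoenix prepend a hash-derived byte to every row key so that sequential writes spread across regions. A minimal sketch of the idea (the hash below is illustrative, not Phoenix's exact hash function):

```python
# Sketch of salted row keys: a leading byte derived from a hash of the
# key spreads otherwise-sequential keys across SALT_BUCKETS regions.
# NOTE: the rolling hash here is illustrative, not Phoenix's real one.
SALT_BUCKETS = 42  # matches the tables described above

def salt_byte(row_key: bytes, buckets: int = SALT_BUCKETS) -> int:
    """Derive the leading salt byte from a hash of the row key."""
    h = 0
    for b in row_key:
        h = (h * 31 + b) & 0x7FFFFFFF  # keep it a positive 31-bit int
    return h % buckets

def salted_key(row_key: bytes) -> bytes:
    """Prepend the salt byte; scans must now fan out across all buckets."""
    return bytes([salt_byte(row_key)]) + row_key

# Sequential keys no longer sort next to each other:
keys = [salted_key(f"user{i:08d}".encode()) for i in range(5)]
print(sorted({k[0] for k in keys}))  # distinct leading salt bytes
```

This is also why a full scan of a salted table runs one scan per bucket, which matters for how Spark partitions the read.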
-Original Message-
From: Josh Elser [mailto:els...@apache.org]
Sent: Monday, March 5, 2018 9:14 PM
To: user@phoenix.apache.org
-Original Message-
From: Josh Elser [mailto:els...@apache.org]
Sent: Friday, March 9, 2018 2:17 AM
To: user@phoenix.apache.org
Subject: Re: Phoenix as a source for Spark processing
How large is each row in this case? Or, better yet, how large is the table
in HBase?
You're spreadi
s.
HBase has 14 region servers, both tables salted with SALT_BUCKETS=42.
Spark's job running via Yarn.
-Original Message-
From: Josh Elser [mailto:els...@apache.org]
Sent: Monday, March 5, 2018 9:14 PM
To: user@phoenix.apache.org
Subject: Re: Phoenix as a source for Spark processing
I would guess that Hive would always be capable of out-matching what
HBase/Phoenix can do for this type of workload (bulk-transformation).
That said, I'm not ready to tell you that you can't get the
Phoenix-Spark integration performing better. See the other thread where
you provide more details.
Some more details... We have run some simple tests to compare read/write
performance of Spark+Hive versus Spark+Phoenix. So far we have the following results:
Copying a table (no transformations, about 800 million records):
Hive (TEZ) - 752 sec
Spark:
From Hive to Hive: 2463 sec
From Phoenix to H
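For reference, with the Phoenix 4.x phoenix-spark plugin the Phoenix-to-Hive copy measured above looks roughly like the sketch below. Table name, ZooKeeper quorum, and target database are placeholders, not the actual job:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("phoenix-to-hive-copy")
         .enableHiveSupport()
         .getOrCreate())

# Read the salted Phoenix table through the phoenix-spark connector.
# Spark partitions follow the Phoenix scan splits (salt buckets/guideposts).
df = (spark.read
      .format("org.apache.phoenix.spark")
      .option("table", "SOURCE_TABLE")   # placeholder table name
      .option("zkUrl", "zk-host:2181")   # placeholder ZooKeeper quorum
      .load())

# Plain copy, no transformations, into a Hive-managed table.
df.write.mode("overwrite").saveAsTable("target_db.target_table")
```

The number of read partitions (and hence read parallelism) is one of the first things to check when the Phoenix source lags behind the Hive source.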
Sent: Monday, March 5, 2018 9:14 PM
To: user@phoenix.apache.org
Subject: Re: Phoenix as a source for Spark processing
Hi Stepan,
Can you better ballpark the Phoenix-Spark performance you've seen (e.g.
how much hardware do you have, how many spark executors did you use, how
many region servers)? Also, what versions of software are you using?
I don't think there are any firm guidelines on how you can solve thi
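For concreteness, the knobs Josh is asking about are set at submit time. The values below are purely illustrative (sized loosely against the 14 region servers mentioned in the thread), not a recommendation:

```shell
# Illustrative spark-submit for a YARN-based copy job; tune executor
# count/cores/memory against your region servers and salt buckets.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 14 \
  --executor-cores 3 \
  --executor-memory 8g \
  copy_job.py
```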
In our software we need to combine fast interactive access to the data with
quite complex data processing. I know that Phoenix is intended for fast access,
but I hoped that I could also use Phoenix as a source for complex
processing with Spark. Unfortunately, Phoenix + Spark shows ver