Re: phoenix spark options not supporting query in dbtable

2017-08-17 Thread Josh Mahonin
You're mostly at the mercy of HBase and Phoenix to ensure that your data is evenly distributed in the underlying regions. You could look at pre-splitting or salting [1] your tables, as well as adjusting the guidepost parameters [2] if you need finer-tuned control. If you end up with more idle
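A minimal sketch of those two knobs, assuming a hypothetical EVENTS table, Scala over the Phoenix JDBC driver, and an illustrative bucket count and guidepost width (adjust all of these for your own schema and data volume):

    import java.sql.DriverManager

    object TuneParallelism {
      def main(args: Array[String]): Unit = {
        // "zk-host:2181" is a placeholder ZooKeeper quorum
        val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
        val stmt = conn.createStatement()
        // SALT_BUCKETS spreads row keys (and hence regions) across N buckets
        stmt.execute(
          """CREATE TABLE IF NOT EXISTS EVENTS (
            |  ID BIGINT NOT NULL PRIMARY KEY,
            |  PAYLOAD VARCHAR
            |) SALT_BUCKETS = 8""".stripMargin)
        // A smaller guidepost width yields more guideposts, hence more
        // parallel scans (per-table GUIDE_POSTS_WIDTH needs Phoenix 4.9+)
        stmt.execute("ALTER TABLE EVENTS SET GUIDE_POSTS_WIDTH = 10485760")
        // Recollect statistics so the new width takes effect
        stmt.execute("UPDATE STATISTICS EVENTS")
        conn.close()
      }
    }

Note that a smaller guidepost width trades more parallel scans for more statistics overhead to collect and track.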

Phoenix Storage Not Working on AWS EMR 5.8.0

2017-08-17 Thread Steve Terrell
I'm running EMR 5.8.0 with these applications installed: Pig 0.16.0, Phoenix 4.11.0, HBase 1.3.1. Here is my Pig script (try.pig): REGISTER /usr/lib/phoenix/phoenix-4.11.0-HBase-1.3-client.jar; A = load '/steve/a.txt' as (TXT:chararray); store A into 'hbase://A_TABLE' using

Re: phoenix spark options not supporting query in dbtable

2017-08-17 Thread Kanagha
Thanks for the details. I tested it out and saw that the number of partitions equals the number of parallel scans run upon DataFrame load in Phoenix 4.10. Also, how can we ensure that the read gets evenly distributed as tasks across the number of executors set for the job? I'm running phoenixTableAsDataFrame
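A sketch of that partition check, assuming the phoenix-spark Scala API around 4.10 with a placeholder table name and ZooKeeper quorum:

    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.phoenix.spark._

    object PartitionCheck {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("partition-check"))
        val sqlContext = new SQLContext(sc)
        // "EVENTS" and "zk-host:2181" are placeholders
        val df = sqlContext.phoenixTableAsDataFrame(
          "EVENTS", Seq("ID", "PAYLOAD"), zkUrl = Some("zk-host:2181"),
          conf = new Configuration())
        // One partition per parallel Phoenix scan; Spark then schedules these
        // tasks across executors, using each split's region location as a
        // locality hint
        println(s"partitions = ${df.rdd.getNumPartitions}")
      }
    }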

Re: Custom Connector for Prestodb

2017-08-17 Thread Josh Mahonin
Hi Luqman, I just responded to another query on the list about phoenix-spark that may help shed some light. In addition, the preferred locations the phoenix-spark connector exposes are determined in the general PhoenixInputFormat MapReduce code [1]. I'm not very familiar with PrestoDB, but if
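A rough sketch of inspecting those preferred locations through the MapReduce API directly; the class and method names below are my best understanding of phoenix-mapreduce, so verify them against your Phoenix version:

    import scala.collection.JavaConverters._
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.Job
    import org.apache.phoenix.mapreduce.{PhoenixInputFormat, PhoenixRecordWritable}
    import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil

    object SplitLocations {
      def main(args: Array[String]): Unit = {
        // Assumes hbase-site.xml (with the ZK quorum) is on the classpath
        val job = Job.getInstance(new Configuration())
        PhoenixMapReduceUtil.setInput(
          job, classOf[PhoenixRecordWritable],
          "EVENTS",                          // placeholder table
          "SELECT ID, PAYLOAD FROM EVENTS")  // placeholder query
        val splits = new PhoenixInputFormat[PhoenixRecordWritable]().getSplits(job)
        // Each split maps to one Phoenix scan; getLocations() returns the
        // region server host(s) a scheduler can use for data locality
        splits.asScala.foreach(s => println(s.getLocations.mkString(",")))
      }
    }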

Re: phoenix spark options not supporting query in dbtable

2017-08-17 Thread Josh Mahonin
Hi, Phoenix is able to parallelize queries based on the underlying HBase region splits, as well as its own internal guideposts based on statistics collection [1]. The phoenix-spark connector exposes those splits to Spark for the RDD / DataFrame parallelism. In order to test this out, you can try
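One way to observe the resulting parallelism (a sketch, assuming a Spark shell with sqlContext available and a placeholder table name and quorum; the DataSource read path is backed by the same splits):

    // In spark-shell with the phoenix-spark client jar on the classpath
    val df = sqlContext.read
      .format("org.apache.phoenix.spark")
      .option("table", "TABLE1")        // placeholder table name
      .option("zkUrl", "zk-host:2181")  // placeholder ZK quorum
      .load()
    // Expect one partition per region split / guidepost-bounded scan
    println(df.rdd.getNumPartitions)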

Custom Connector for Prestodb

2017-08-17 Thread Luqman Ghani
Hi, We are evaluating the possibility of writing a custom connector for Phoenix to access tables stored in HBase. However, we need some help. The connector for Presto should be able to read from the HBase cluster using parallel collections. For that, the connector has a "ConnectorSplitManager"

Re: phoenix spark options not supporting query in dbtable

2017-08-17 Thread Kanagha
Also, I'm using the phoenixTableAsDataFrame API to read from a pre-split Phoenix table. How can we ensure the read is parallelized across all executors? Would salting/pre-splitting tables help in providing parallelism? Appreciate any inputs. Thanks Kanagha On Wed, Aug 16, 2017 at 10:16 PM, kanagha
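For reference, a sketch of both DDL options in question, again via the Phoenix JDBC driver, with hypothetical tables, split points, and bucket count:

    import java.sql.DriverManager

    object PreSplitExample {
      def main(args: Array[String]): Unit = {
        // "zk-host:2181" is a placeholder ZooKeeper quorum
        val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
        val stmt = conn.createStatement()
        // Option 1: explicit pre-split points on the row key (placeholders)
        stmt.execute(
          """CREATE TABLE IF NOT EXISTS USERS (
            |  NAME VARCHAR NOT NULL PRIMARY KEY,
            |  CITY VARCHAR
            |) SPLIT ON ('E', 'M', 'T')""".stripMargin)
        // Option 2: salting, which pre-splits into SALT_BUCKETS regions and
        // spreads sequential keys evenly across them
        stmt.execute(
          """CREATE TABLE IF NOT EXISTS METRICS (
            |  TS DATE NOT NULL PRIMARY KEY,
            |  VAL DOUBLE
            |) SALT_BUCKETS = 8""".stripMargin)
        conn.close()
      }
    }

Either way, more regions means more parallel scans, which phoenix-spark surfaces as more partitions for Spark's scheduler to spread across executors.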