Hi Josh,

Thanks for the info. Views do work, and that helps with some of our use cases anyway.
I was hoping to use Spark as a means of running a long-running query without overly taxing everything, but since both the Spark and MapReduce integrations use the same underlying JDBC connections (as far as I can tell from the code), I've instead just bumped up the timeouts as suggested in a few other threads/forum posts.

Thanks again for the help,

*Craig Roberts*
*Senior Developer*
*FrogAsia Sdn Bhd (A YTL Company)* | Unit 9, Level 2, D6 at Sentul East | 801, Jalan Sentul, 51000 Kuala Lumpur | 01125618093 | Twitter <http://www.twitter.com/FrogAsia> | Facebook <http://www.facebook.com/FrogAsia> | Website <http://www.frogasia.com/>

On 24 February 2017 at 23:06, Josh Mahonin <jmaho...@gmail.com> wrote:

> Hi Craig,
>
> I think this is an open issue in PHOENIX-2648
> (https://issues.apache.org/jira/browse/PHOENIX-2648)
>
> There seems to be a workaround by using a 'VIEW' instead, as mentioned in
> that ticket.
>
> Good luck,
>
> Josh
>
> On Thu, Feb 23, 2017 at 11:56 PM, Craig Roberts <craig.robe...@frogasia.com> wrote:
>
>> Hi all,
>>
>> I've got a (very) basic Spark application in Python that selects some
>> basic information from my Phoenix table.
>> I can't quite figure out how (or even if I can) select dynamic columns
>> through this, however.
>>
>> Here's what I have:
>>
>>     from pyspark import SparkContext, SparkConf
>>     from pyspark.sql import SQLContext
>>
>>     conf = SparkConf().setAppName("pysparkPhoenixLoad").setMaster("local")
>>     sc = SparkContext(conf=conf)
>>     sqlContext = SQLContext(sc)
>>
>>     df = sqlContext.read.format("org.apache.phoenix.spark") \
>>         .option("table", """MYTABLE("dynamic_column" VARCHAR)""") \
>>         .option("zkUrl", "127.0.0.1:2181:/hbase-unsecure") \
>>         .load()
>>
>>     df.show()
>>     df.printSchema()
>>
>> I get an "org.apache.phoenix.schema.TableNotFoundException" error for
>> the above.
>>
>> If I try to load the data frame as a table and query that with SQL:
>>
>>     sqlContext.registerDataFrameAsTable(df, "test")
>>     sqlContext.sql("""SELECT * FROM test("dynamic_column" VARCHAR)""")
>>
>> I get a bit of a strange exception:
>>
>>     py4j.protocol.Py4JJavaError: An error occurred while calling o37.sql.
>>     : java.lang.RuntimeException: [1.19] failure: ``union'' expected but `(' found
>>
>>     SELECT * FROM test("dynamic_column" VARCHAR)
>>
>> Does anybody have a pointer on whether this is supported and how I might
>> be able to query a dynamic column? I haven't found much information on the
>> wider Internet about Spark + Phoenix integration for this kind of thing;
>> simple selects are working.
>>
>> Final note: I have (rather stupidly) lower-cased my column names in
>> Phoenix, so I need to quote them when I execute a query (I'll be changing
>> this as soon as possible).
>>
>> Any assistance would be appreciated :)
>>
>> *-- Craig*
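For reference, the 'VIEW' workaround Josh points to in PHOENIX-2648 can be sketched roughly as follows. This is an untested sketch, not the exact DDL from the ticket: it reuses the `MYTABLE` / `"dynamic_column"` names from the example above and assumes Phoenix's `CREATE VIEW` and `ALTER VIEW ... ADD` statements:

```sql
-- Create a view over the base table (MYTABLE is the table from the
-- example in this thread).
CREATE VIEW MYVIEW AS SELECT * FROM MYTABLE;

-- Promote the dynamic column to a declared view column, so it shows
-- up in the schema that the phoenix-spark integration reads.
ALTER VIEW MYVIEW ADD "dynamic_column" VARCHAR;
```

The Spark side would then point at the view, e.g. `.option("table", "MYVIEW")`, rather than embedding the dynamic-column syntax in the `table` option.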
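On the final note about lower-cased column names: Phoenix, like standard SQL, folds unquoted identifiers to upper case, which is why the lower-case names have to be double-quoted in every statement. A small hypothetical helper (not part of any Phoenix or Spark API) for building such queries without hand-quoting each name:

```python
def quote_ident(name):
    """Double-quote a Phoenix identifier, escaping embedded quotes.

    Phoenix upper-cases unquoted identifiers, so lower-case column
    names such as dynamic_column must be quoted wherever they appear.
    """
    return '"' + name.replace('"', '""') + '"'


if __name__ == "__main__":
    cols = ["dynamic_column", "another_col"]
    # Build a SELECT list with every identifier safely quoted.
    query = "SELECT {} FROM MYTABLE".format(
        ", ".join(quote_ident(c) for c in cols)
    )
    print(query)  # SELECT "dynamic_column", "another_col" FROM MYTABLE
```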