Hi Xindian,

The phoenix-spark integration is based on the Phoenix MapReduce layer,
which doesn't support aggregate functions. However, as you mentioned, both
filter predicates and column pruning are pushed down to Phoenix. Once an
RDD or DataFrame is loaded, all of Spark's aggregation methods are
available to you.
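
For example, something along these lines (an untested sketch; the table
name, column names, and ZooKeeper quorum are placeholders for your own):

    import org.apache.spark.sql.functions._

    // Load a Phoenix table as a DataFrame; filters and column pruning
    // are pushed down to Phoenix by the data source.
    val df = sqlContext.read
      .format("org.apache.phoenix.spark")
      .option("table", "MY_TABLE")
      .option("zkUrl", "zk-host:2181")
      .load()

    // The aggregation itself runs in Spark, not in Phoenix.
    df.filter(df("REGION") === "EAST")
      .groupBy("REGION")
      .agg(count("*").as("cnt"), avg("AMOUNT").as("avg_amount"))
      .show()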

Although the Spark JDBC data source supports the full complement of
Phoenix's supported queries, it achieves parallelism by splitting the
query across a number of workers and connections based on a numeric
'partitionColumn' with a 'lowerBound' and 'upperBound'. If your use case
has numeric primary keys, that is potentially a good solution for you. [1]
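
A rough sketch of that approach (again untested; the JDBC URL, table,
partition column, and bounds are just placeholders):

    // Spark's generic JDBC data source, parallelized over a numeric column.
    val jdbcDf = sqlContext.read
      .format("jdbc")
      .option("url", "jdbc:phoenix:zk-host:2181")
      .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
      .option("dbtable", "MY_TABLE")      // or a query wrapped as "(SELECT ...) AS t"
      .option("partitionColumn", "ID")    // must be numeric
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "10")
      .load()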

The phoenix-spark parallelism is based on the splits provided by the
Phoenix query planner, and does not require specifying partition columns
or upper/lower bounds. It's up to you to evaluate which approach is right
for your use case. [2]

Good luck,

Josh

[1]
http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
[2] https://phoenix.apache.org/phoenix_spark.html


On Wed, Jun 8, 2016 at 6:01 PM, Long, Xindian <xindian.l...@sensus.com>
wrote:

> The Spark JDBC data source supports specifying a query as the “dbtable”
> option.
>
> I assume the whole query specified that way is pushed down to the
> database instead of being done in Spark.
>
> The phoenix-spark plugin seems not to support that. Why is that? Any
> plans to support it in the future?
>
> I know phoenix-spark does support an optional select clause and
> predicate pushdown in some cases, but it is limited.
>
> Thanks
>
> Xindian
>
> -------------------------------------------
>
> Xindian “Shindian” Long
>
> Mobile:  919-9168651
>
> Email: xindian.l...@gmail.com
