If you are using YARN as the resource negotiator, you will get containers (CPU + memory) allocated from all the nodes. FYI:
http://spark.apache.org/docs/latest/running-on-yarn.html
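To make the container idea concrete, here is a minimal sketch of sizing Spark executors on YARN from the Python API. The app name, executor counts, and sizes are illustrative placeholders, and it needs a live YARN cluster (with `HADOOP_CONF_DIR` pointing at its configs) to actually run:

```python
# Sketch: each executor becomes a YARN container (CPU + memory),
# negotiated across the cluster's nodes. All sizes are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("yarn-sizing-example")
    .master("yarn")                           # let YARN negotiate resources
    .config("spark.executor.instances", "4")  # four containers across the nodes
    .config("spark.executor.memory", "4g")    # memory per container
    .config("spark.executor.cores", "2")      # CPU cores per container
    .getOrCreate()
)
```

The same settings can be passed on the command line, e.g. `spark-submit --master yarn --num-executors 4 --executor-memory 4g --executor-cores 2 app.py`.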
It's a scalable parallel calculation. MapReduce (Phoenix) will do the same thing; it's just that its way of doing the calculation is
If I were to use Spark (via the Python API, for example), would the query be processed on my web servers or on a separate server, as in Phoenix?
Regards,
Cheyenne Forbes
Chief Executive Officer
Avapno Omnitech
Chief Operating Officer
Avapno Solutions, Co.
Chairman
Avapno Assets, LLC
Bethel Town
Hi Cheyenne,
That's a very interesting question. If secondary indexes are created well on the Phoenix table, HBase will use a coprocessor to do the join operation (still a Java-based MapReduce job, if I understand correctly) and then return the result. By contrast, Spark is famous for its great
I've been thinking: is Spark SQL faster than Phoenix (or phoenix-spark) for selects with joins on large data (for example, Instagram's size)?
Regards,
Cheyenne Forbes
Hi Dalin,
Thanks for the information. I'm glad to hear that the Spark integration is working well for your use case.
Josh
On Mon, Sep 12, 2016 at 8:15 PM, dalin.qin wrote:
Hi Josh,
Before the project kicked off, we got the idea that HBase is more suitable for massive writing than for batch full-table reading (I forget where the idea came from; maybe some benchmark testing posted on a website). So we decided to read HBase only based on the primary key for small
Hi Dalin,
That's great to hear. Have you also tried reading those rows back through Spark for a larger "batch processing" job? I'm curious whether you have any experiences or insight there from operating on a large dataset.
Thanks!
Josh
On Mon, Sep 12, 2016 at 10:29 AM, dalin.qin
Hi,
I've used a Phoenix table to store billions of rows. Rows are incrementally inserted into Phoenix by Spark every day, and the table serves instant queries from a web page by primary key. So far, so good.
Thanks
Dalin
On Mon, Sep 12, 2016 at 10:07 AM, Cheyenne Forbes <
Thanks everyone. I will be using Phoenix for simple input/output and
the phoenix-spark plugin (https://phoenix.apache.org/phoenix_spark.html)
for more complex queries. Is that the smart thing to do?
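That split can be sketched roughly as follows: the phoenix-spark plugin loads a Phoenix table as a Spark DataFrame, and the heavier query then runs in Spark. The table name, column names, and ZooKeeper URL below are placeholders, and this assumes a live HBase/Phoenix cluster with the phoenix-spark jar on the Spark classpath, so it is illustrative only:

```python
# Hypothetical sketch: plain Phoenix SQL handles simple point reads/writes,
# while Spark handles the "more complex" queries over the same table.
# "MESSAGES", "user_id", and the ZooKeeper URL are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("phoenix-read-example").getOrCreate()

df = (
    spark.read
    .format("org.apache.phoenix.spark")   # the phoenix-spark connector
    .option("table", "MESSAGES")          # Phoenix table name (placeholder)
    .option("zkUrl", "zkhost:2181")       # ZooKeeper quorum (placeholder)
    .load()
)

# An aggregation that would be heavy as a single serving-path query
# can then run on Spark's executors:
df.createOrReplaceTempView("messages")
spark.sql("SELECT user_id, COUNT(*) AS n FROM messages GROUP BY user_id").show()
```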
Regards,
Cheyenne Forbes
Just to add to James' comment, they're indeed complementary and it all
comes down to your own use case. Phoenix offers a convenient SQL interface
over HBase, which is capable of doing very fast queries. If you're just
doing insert / retrieval, it's unlikely that Spark will help you much there.
It's not an either/or with Phoenix and Spark - often companies use both as
they're very complementary. See this [1] blog for an example. Spark is a
processing engine while Phoenix+HBase is a database/store. You'll need to
store your data somewhere.
Thanks,
James
[1]
Thank you. For a project as big as Facebook or Snapchat, would you
recommend using Spark or Phoenix for things such as message
retrieval/insert, user search, user feed retrieval/insert, etc., and what
are the pros and cons?
Regards,
Cheyenne
On Sun, Sep 11, 2016 at 8:31 AM, John Leach
Spark has a robust execution model with the following features that are not part of Phoenix:
* Scalability
* Fault tolerance with lineage (handles large intermediate results)
* Memory management for tasks
* Resource management (fair scheduling)
* Additional
I realized there is a Spark plugin for Phoenix. Are there any use cases? Why would I use Spark with Phoenix instead of Phoenix by itself?