At the end of the day, the more data you pull from multiple physical
nodes, the (relatively) slower your response time to queries.
Until you reach a point where that response time exceeds your business
requirements, keep it simple. As volumes grow with distributed data sources
to
Apache Spark supports integration with HBase (which has a REST API).
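
To make the REST point concrete, here is a minimal sketch of reading one row through the HBase REST server from plain Python. The base URL, table name and row key are hypothetical placeholders, and it assumes a REST server is running and returning its JSON row format, in which the row key, column names and cell values are base64-encoded.

```python
import base64
import json
import urllib.parse
import urllib.request


def row_url(base_url, table, row_key):
    """Build the URL for a single row: <base>/<table>/<row key>."""
    return "{}/{}/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(table, safe=""),
        urllib.parse.quote(row_key, safe=""),
    )


def decode_row(payload):
    """Turn a JSON row response into a {column: value} dict.

    Column names and cell values arrive base64-encoded, so both are decoded.
    Values are assumed to be UTF-8 text, which is an assumption of this sketch.
    """
    doc = json.loads(payload)
    cells = {}
    for row in doc.get("Row", []):
        for cell in row.get("Cell", []):
            column = base64.b64decode(cell["column"]).decode("utf-8")
            value = base64.b64decode(cell["$"]).decode("utf-8")
            cells[column] = value
    return cells


def fetch_row(base_url, table, row_key):
    """GET one row as JSON and decode it (needs a reachable REST server)."""
    request = urllib.request.Request(
        row_url(base_url, table, row_key),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return decode_row(response.read())
```

Something like `fetch_row("http://localhost:8080", "users", "row1")` would then return the row's cells, assuming that host, table and row exist.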
How much data do you want to store in this system?
Cheers
On Tue, Jan 20, 2015 at 3:40 AM, Alec Taylor alec.tayl...@gmail.com wrote:
I am architecting a platform incorporating: recommender systems,
information
Small amounts in a one node cluster (at first).
As it scales, I'll be looking at running various O(nk) algorithms,
where n is the number of distinct users and k is the number of
overlapping features I want to consider.
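
As a rough illustration of what an O(nk) pass looks like, here is a minimal single-machine sketch in plain Python. The function name, the user data and the feature list are all made up for the example. It does one loop over n users and, for each, k constant-time set lookups.

```python
def overlap_counts(user_features, considered):
    """Count, per user, how many of the considered features the user has.

    user_features: dict mapping user -> set of that user's features (n users)
    considered:    list of the k features of interest
    Each membership test on a set is O(1) on average, so the total work
    is roughly n * k checks.
    """
    counts = {}
    for user, features in user_features.items():  # n iterations
        counts[user] = sum(1 for f in considered if f in features)  # k checks
    return counts


# Hypothetical data, for illustration only.
users = {
    "alice": {"sci-fi", "jazz", "hiking"},
    "bob": {"jazz", "cooking"},
    "carol": set(),
}
considered = ["jazz", "hiking", "chess"]

print(overlap_counts(users, considered))
# prints {'alice': 2, 'bob': 1, 'carol': 0}
```

The per-user work is independent, which is why this kind of job can later be split across nodes once one machine is no longer enough.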
Is Apache Spark good as a general database as well as for its fancier
features? E.g.:
bq. Is Apache Spark good as a general database
I don't think Spark itself is a general database, though there are connectors
to various NoSQL databases, including HBase.
bq. using their graph database features?
Sure. Take a look at http://spark.apache.org/graphx/
Cheers
On Tue, Jan 20, 2015 at