Re: Low-latency queries, HDFS exclusively or should I go, e.g.: MongoDB?

2015-01-21 Thread daemeon reiydelle
At the end of the day, the more data that is pulled from multiple physical nodes, the (relatively) slower your response time to queries. Until you reach a point where that response time exceeds your business requirements, keep it simple. As volumes grow with distributed data sources to

Re: Low-latency queries, HDFS exclusively or should I go, e.g.: MongoDB?

2015-01-20 Thread Ted Yu
Apache Spark supports integration with HBase (which has a REST API). How much data do you want to store in this system? Cheers On Tue, Jan 20, 2015 at 3:40 AM, Alec Taylor alec.tayl...@gmail.com wrote: I am architecting a platform incorporating: recommender systems, information

Re: Low-latency queries, HDFS exclusively or should I go, e.g.: MongoDB?

2015-01-20 Thread Alec Taylor
Small amounts in a one-node cluster (at first). As it scales I'll be looking at running various O(nk) algorithms, where n is the number of distinct users and k is the number of overlapping features I want to consider. Is Apache Spark good as a general database as well as for its fancier features? - E.g.:
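
To make the O(nk) cost mentioned above concrete, here is a minimal Python sketch. It is not taken from the thread: the function name overlap_scores, the sample users and the feature names are all hypothetical, and it assumes each user's features fit in an in-memory set. For each of the n users it performs k membership checks against the features of interest.

```python
from typing import Dict, Set


def overlap_scores(user_features: Dict[str, Set[str]],
                   target: Set[str]) -> Dict[str, int]:
    """Count, for every user, how many of the target features they have."""
    scores = {}
    for user, features in user_features.items():  # n iterations, one per user
        # k membership checks per user; a set lookup is O(1) on average,
        # so the whole pass is O(n * k).
        scores[user] = sum(1 for f in target if f in features)
    return scores


# Hypothetical sample data, for illustration only.
users = {
    "alice": {"jazz", "hiking", "python"},
    "bob": {"jazz", "chess"},
    "carol": {"cooking"},
}
scores = overlap_scores(users, {"jazz", "python", "chess"})
```

At small volumes on a single node, a pass like this runs comfortably in memory; the same per-user computation is the kind of work that could later be distributed across a cluster.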

Re: Low-latency queries, HDFS exclusively or should I go, e.g.: MongoDB?

2015-01-20 Thread Ted Yu
bq. Is Apache Spark good as a general database I don't think Spark itself is a general database, though there are connectors to various NoSQL databases, including HBase. bq. using their graph database features? Sure. Take a look at http://spark.apache.org/graphx/ Cheers On Tue, Jan 20, 2015 at