Re: Experiences about NoSQL databases with Spark

Yu Zhang Sat, 28 Nov 2015 21:38:37 -0800

BTW, if you decide to try MongoDB, please use version 3.0 or later with the
WiredTiger storage engine.


On Sat, Nov 28, 2015 at 11:30 PM, Yu Zhang <yuz1...@iastate.edu> wrote:

> If you need to build multiple indexes, HBase will perform better: MongoDB
> writes slow down when a collection has many indexes, and the memory cost
> is huge!
>
> On the other hand, MongoDB works easily with JavaScript and with
> visualization tools like D3.js, which makes that part of the work a
> breeze.
>
> Could you provide more details about the data size and the number of
> operations your program needs? I believe this is quite a general
> question, and I hope to hear other comments and thoughts.
>
> On Tue, Nov 24, 2015 at 9:50 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> You should consider using HBase as the NoSQL database.
>> w.r.t. 'The data in the DB should be indexed', you need to design the
>> schema in HBase carefully so that the retrieval is fast.
>>
>> Disclaimer: I work on HBase.
>>
>> On Tue, Nov 24, 2015 at 4:46 AM, sparkuser2345 <hm.spark.u...@gmail.com>
>> wrote:
>>
>>> I'm interested in knowing which NoSQL databases you use with Spark and
>>> what your experiences have been.
>>>
>>> On a general level, I would like to use Spark Streaming to process
>>> incoming data, fetch relevant aggregated data from the database, and
>>> update the aggregates in the DB based on the incoming records. The data
>>> in the DB should be indexed so that the relevant data can be fetched
>>> quickly and the data can be visualized interactively.
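The fetch / merge / write-back loop described in the paragraph above can be
sketched roughly as follows. This is only an illustration under stated
assumptions: InMemoryStore is a hypothetical stand-in for whichever NoSQL
table is chosen, update_aggregates is a made-up helper name, records are
assumed to be (key, value) pairs, and the aggregate is assumed to be a
(count, sum) pair per key. No Spark, HBase, or MongoDB API is used here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Aggregate = Tuple[int, float]  # assumed aggregate shape: (count, sum) per key


class InMemoryStore:
    """Hypothetical stand-in for the NoSQL table, looked up by record key."""

    def __init__(self) -> None:
        self._rows: Dict[str, Aggregate] = {}

    def get_many(self, keys: Iterable[str]) -> Dict[str, Aggregate]:
        # Fetch only the aggregates this batch touches
        # (this would be an indexed lookup in a real database).
        return {k: self._rows[k] for k in keys if k in self._rows}

    def put_many(self, rows: Dict[str, Aggregate]) -> None:
        # Write merged aggregates back (an upsert in a real database).
        self._rows.update(rows)


def update_aggregates(
    store: InMemoryStore, batch: Iterable[Tuple[str, float]]
) -> Dict[str, Aggregate]:
    # 1. Pre-aggregate the incoming micro-batch by key, so each key
    #    costs one read and one write regardless of batch size.
    partial: Dict[str, Aggregate] = defaultdict(lambda: (0, 0.0))
    for key, value in batch:
        count, total = partial[key]
        partial[key] = (count + 1, total + value)

    # 2. Fetch the existing aggregates for just those keys.
    existing = store.get_many(partial.keys())

    # 3. Merge old and new, then write the result back.
    merged: Dict[str, Aggregate] = {}
    for key, (count, total) in partial.items():
        old_count, old_total = existing.get(key, (0, 0.0))
        merged[key] = (old_count + count, old_total + total)
    store.put_many(merged)
    return merged
```

In a Spark Streaming job, a function like this would typically be called
per partition from inside DStream.foreachRDD / RDD.foreachPartition, with
the store replaced by a real client. Note that the read-modify-write here
is not atomic, so concurrent writers to the same key would need the
database's own atomic increment or conditional-update support.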
>>>
>>> I've been reading about MongoDB+Spark and I've got the impression that
>>> there are some challenges in fetching data by indices and in updating
>>> documents, but things are moving so fast that I don't know if these are
>>> still relevant. Do you find any benefit from using HBase with Spark,
>>> given that HBase is built on top of HDFS?
>>>
>>>
>>>
>>
>
