Hey Mike,

I quickly looked through the example and found a major performance issue.
You are collecting the RDDs to the driver and then sending them to Mongo in
a foreach, so every write goes through a single node. Why not do a
distributed push to Mongo from the executors instead?

val mongoConnection = ...


rdd.foreachPartition { iterator =>
  // one connection per partition, created on the executor
  val connection = createConnection()
  iterator.foreach { ... push partition using connection ... }
}
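
To make the per-partition pattern above a bit more concrete, here is a
minimal sketch of the body that could run inside foreachPartition. The Sink
trait, writePartition, openSink and batchSize are all hypothetical names
made up for illustration; they are not part of Reactive Mongo or any Mongo
driver, and a real implementation would wrap whatever client you end up
using.

```scala
// Hypothetical stand-in for a real MongoDB client (an assumption, not a library API).
trait Sink[T] {
  def writeBatch(batch: Seq[T]): Unit
  def close(): Unit
}

object PartitionWriter {
  // Opens one sink per partition, pushes the records in batches,
  // and always closes the sink. Returns the number of records written.
  def writePartition[T](records: Iterator[T],
                        openSink: () => Sink[T],
                        batchSize: Int = 500): Int = {
    if (!records.hasNext) {
      0 // empty partition: do not open a connection at all
    } else {
      val sink = openSink()
      try {
        var written = 0
        records.grouped(batchSize).foreach { batch =>
          sink.writeBatch(batch)
          written += batch.size
        }
        written
      } finally {
        sink.close() // release the connection even if a write fails
      }
    }
  }
}
```

On the Spark side this would be called as
rdd.foreachPartition(records => PartitionWriter.writePartition(records, openSink)),
and for a DStream the same call would sit inside dstream.foreachRDD. The
point of passing openSink as a function is that the connection is created
on the executor, so nothing non-serializable has to be shipped from the
driver.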

On Thu, Feb 26, 2015 at 1:25 PM, Mike Trienis <mike.trie...@orcsol.com> wrote:

> Hi All,
> I have Spark Streaming set up to write data to a replicated MongoDB
> database and would like to understand if there would be any issues using
> the Reactive Mongo library to write directly to MongoDB. My stack is
> Apache Spark sitting on top of Cassandra for the datastore, so my thinking
> is that the MongoDB connector for Hadoop will not be particularly useful for
> me since I'm not using HDFS. Is there anything that I'm missing?
> Here is an example of code that I'm planning on using as a starting point
> for my implementation.
> LogAggregator
> <https://github.com/chimpler/blog-spark-streaming-log-aggregation/blob/master/src/main/scala/com/chimpler/sparkstreaminglogaggregation/LogAggregator.scala>
> Thanks, Mike.
