Use foreachPartition and batch the writes.

On Sat, Jul 25, 2015 at 9:14 AM, <nib...@free.fr> wrote:
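In case it helps, here is a minimal sketch of the pattern. The batching helper is plain Python so the logic is clear; the `mongo_bulk_insert` name and the exact PySpark call in the comment are illustrative assumptions, not code from this thread:

```python
# Hedged sketch: batch writes inside foreachPartition instead of writing
# one message at a time. Each partition accumulates records and flushes
# them in bulk, so you open one DB connection and issue a handful of
# bulk writes per partition rather than one write per message.

def write_in_batches(records, write_batch, batch_size=100):
    """Consume a partition iterator, flushing `batch_size` records at a
    time through `write_batch` (e.g. a MongoDB insert_many wrapper)."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) >= batch_size:
            write_batch(batch)
            batch = []
    if batch:  # flush the remainder
        write_batch(batch)

# In Spark (hypothetical usage) this would run once per partition:
#   rdd.foreachPartition(lambda it: write_in_batches(it, mongo_bulk_insert))
# where mongo_bulk_insert opens its own connection inside the partition
# (connections are not serializable, so never create them on the driver).

# Demonstration with a stand-in writer:
batches = []
write_in_batches(range(7), batches.append, batch_size=3)
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The same shape works for the HDFS side: group the enriched messages per partition and write them in one pass instead of opening a file per message.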
> Hello,
> I am a new user of Spark and need to know the best practice for the
> following scenario:
>
> - Spark Streaming receives XML messages from Kafka
> - Spark transforms each message of the RDD (xml2json + some enrichments)
> - Spark stores the transformed/enriched messages in MongoDB and HDFS
>   (Mongo key as file name)
>
> Basically, I would say that I have to handle the messages one by one
> inside a foreach loop over the RDD and write each message individually
> to MongoDB and HDFS.
> Do you think this is the best way to do it?
>
> Thanks,
> Nicolas