Use foreachPartition and batch the writes.

On Sat, Jul 25, 2015 at 9:14 AM, <nib...@free.fr> wrote:
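In case it helps, here is a minimal sketch of the pattern. The batching helper is plain Python so the logic is clear; the `mongo_bulk_insert` name and the exact PySpark call in the comment are illustrative assumptions, not code from this thread:

```python
# Hedged sketch: batch writes inside foreachPartition instead of writing
# one message at a time. Each partition accumulates records and flushes
# them in bulk, so you open one DB connection and issue a handful of
# bulk writes per partition rather than one write per message.

def write_in_batches(records, write_batch, batch_size=100):
    """Consume a partition iterator, flushing `batch_size` records at a
    time through `write_batch` (e.g. a MongoDB insert_many wrapper)."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) >= batch_size:
            write_batch(batch)
            batch = []
    if batch:  # flush the remainder
        write_batch(batch)

# In Spark (hypothetical usage) this would run once per partition:
#   rdd.foreachPartition(lambda it: write_in_batches(it, mongo_bulk_insert))
# where mongo_bulk_insert opens its own connection inside the partition
# (connections are not serializable, so never create them on the driver).

# Demonstration with a stand-in writer:
batches = []
write_in_batches(range(7), batches.append, batch_size=3)
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The same shape works for the HDFS side: group the enriched messages per partition and write them in one pass instead of opening a file per message.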
> Hello,
> I am a new user of Spark and need to know the best practice for the
> following scenario:
>
> - Spark Streaming receives XML messages from Kafka
> - Spark transforms each message of the RDD (xml2json + some enrichments)
> - Spark stores the transformed/enriched messages in MongoDB and HDFS
>   (Mongo key as file name)
>
> Basically, I would say that I have to handle the messages one by one
> inside a foreach loop over the RDD and write each message individually
> to MongoDB and HDFS.
> Do you think this is the best way to do it?
>
> Thanks,
> Nicolas