Hello, I am a new user of Spark, and I would like to know the best practice for the following scenario:
- Spark Streaming receives XML messages from Kafka.
- Spark transforms each message of the RDD (xml2json + some enrichments).
- Spark stores the transformed/enriched messages in both MongoDB and HDFS (using the Mongo key as the file name).

Basically, I assume I would have to handle the messages one by one inside a foreach over the RDD, and write each message individually to MongoDB and HDFS (see the sketch below). Do you think that is the best way to do it?

Thanks,
Nicolas
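For reference, here is a minimal Scala sketch of the shape I have in mind. The source is a placeholder (in the real job it would be a Kafka DStream), and xmlToJson, enrich, and the MongoDB/HDFS writes are stubs I made up for illustration, not a specific library API:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object XmlToMongoHdfs {

  // Hypothetical helpers -- stand-ins for a real XML parser, the
  // enrichment logic, and the actual MongoDB/HDFS writers.
  def xmlToJson(xml: String): String = xml                       // stub
  def enrich(json: String): (String, String) = ("someKey", json) // stub: (mongoKey, document)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("xml-to-mongo-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Placeholder source: in the real job this would come from Kafka,
    // e.g. KafkaUtils.createDirectStream, mapped down to the message value.
    val xmlStream: DStream[String] = ssc.socketTextStream("localhost", 9999)

    xmlStream.foreachRDD { rdd =>
      val docs = rdd.map(xml => enrich(xmlToJson(xml)))

      docs.foreachPartition { partition =>
        // One connection per partition, not per message: client objects
        // are generally not serializable, and connecting per message
        // would dominate the runtime.
        // val mongo = ...open a MongoDB connection here...
        partition.foreach { case (mongoKey, doc) =>
          // insert `doc` into MongoDB, then write it to HDFS as a
          // file named after `mongoKey`
          println(s"$mongoKey -> $doc") // stand-in for the two writes
        }
        // ...close the connection...
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

(I used foreachPartition so that one MongoDB connection is shared across a partition instead of being opened once per message; the per-message writes themselves are what I am asking about.)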