I have a bunch of JSON files stored in HDFS that I want to read in, modify,
and write back out. I'm new to all this and am not sure if this is even the
right thing to do.

Basically, my JSON files contain my raw data, and I want to calculate some
derived data and add it to the existing data.

So first, is my basic approach to the problem flawed? Should I be placing
derived data somewhere else?

If not, how do I modify the existing JSON files?

Note: I have been able to read the JSON files into an RDD using
sqlContext.jsonFile, and save them back using RDD.saveAsTextFile(). But this
creates new files. Is there a way to overwrite the original files?
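
For concreteness, here is a minimal PySpark sketch of the read-modify-write
cycle I'm describing. It parses each line with Python's json module instead
of sqlContext.jsonFile, so each record is a plain dict I can modify; the
paths and the derived field name (score_squared) are just placeholders:

    import json
    from pyspark import SparkContext

    sc = SparkContext(appName="EnrichJson")

    # One JSON object per line, which is the layout that
    # jsonFile reads and saveAsTextFile writes.
    raw = sc.textFile("hdfs:///data/raw").map(json.loads)

    # Compute a derived field and attach it to each record.
    # 'score' and 'score_squared' are placeholder names.
    def enrich(record):
        record["score_squared"] = record.get("score", 0) ** 2
        return record

    # saveAsTextFile refuses to write into an existing directory,
    # so this writes to a new path instead of overwriting the input.
    raw.map(enrich).map(json.dumps).saveAsTextFile("hdfs:///data/derived")

So I end up with a second copy of everything under /data/derived, which is
what prompts the overwrite question.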

Thanks!


