I have a bunch of JSON files stored in HDFS that I want to read in, modify, and write back out. I'm new to all this and am not sure if this is even the right thing to do.
Basically, my JSON files contain my raw data, and I want to calculate some derived data and add it to the existing data. So first, is my basic approach to the problem flawed? Should I be placing the derived data somewhere else? If not, how do I modify the existing JSON files?

Note: I have been able to read the JSON files into an RDD using sqlContext.jsonFile and save them back using RDD.saveAsTextFile(), but this creates new files. Is there a way to overwrite the original files? Thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Updating-exising-JSON-files-tp12211.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
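For concreteness, here is a minimal sketch of the read-modify-write pattern the question describes. The `add_derived` function and the `total` field are hypothetical stand-ins for whatever derived calculation applies to the real data; the commented lines show how it would plug into the Spark 1.x API mentioned above (`sc.textFile` / `saveAsTextFile`), with assumed HDFS paths.

```python
import json

def add_derived(line):
    """Parse one JSON record, add a derived field, and re-serialize.

    'total' is a hypothetical example of derived data computed from
    existing fields 'a' and 'b'.
    """
    record = json.loads(line)
    record["total"] = record.get("a", 0) + record.get("b", 0)
    return json.dumps(record, sort_keys=True)

# In Spark 1.x this would be applied per line of the raw JSON files, e.g.:
#   rdd = sc.textFile("hdfs:///data/raw")          # assumed input path
#   rdd.map(add_derived).saveAsTextFile("hdfs:///data/enriched")
# Note that saveAsTextFile refuses to write into an existing directory,
# so the enriched output has to go to a new path rather than overwriting
# the originals in place.

print(add_derived('{"a": 1, "b": 2}'))  # → {"a": 1, "b": 2, "total": 3}
```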