How do I read a Parquet file that is saved in HDFS and convert it to an RDD?
On Thu, Nov 5, 2015 at 10:02 AM, Igor Berman <igor.ber...@gmail.com> wrote:

> Hi,
> we are using Avro with compression (Snappy). As long as you have enough
> partitions, saving won't be a problem, IMHO.
> In general HDFS is pretty fast; S3 is less so.
> The issue with storing data is that you will lose your partitioner (even
> though the RDD has it) at load time. There is a PR that tries to solve this.
>
> On 5 November 2015 at 01:09, swetha <swethakasire...@gmail.com> wrote:
>
>> Hi,
>>
>> What is an efficient approach to save an RDD as a file in HDFS and
>> retrieve it back? I was choosing between Avro, Parquet and
>> SequenceFileFormat. We currently use SequenceFileFormat for one of our
>> use cases.
>>
>> Any example of how to store and retrieve an RDD in the Avro and Parquet
>> file formats would be of great help.
>>
>> Thanks,
>> Swetha
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Efficient-approach-to-store-an-RDD-as-a-file-in-HDFS-and-read-it-back-as-an-RDD-tp25279.html
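
For the Parquet round trip asked about in this thread, here is a minimal sketch, not a definitive answer. It assumes `spark` is a SparkSession (or an SQLContext on Spark 1.4+), both of which expose `createDataFrame` and `read`. The helper names, the column list and the HDFS path are hypothetical and only there for illustration.

```python
# Sketch only: `spark` is assumed to be a SparkSession (or SQLContext on
# Spark 1.4+). Function names and paths are hypothetical, not from the thread.

def save_rdd_as_parquet(spark, rdd, columns, path):
    """Write an RDD of tuples to `path` as Parquet.

    `columns` is a list of column names, one per tuple field.
    """
    # Parquet needs a schema, so the RDD is first turned into a DataFrame.
    df = spark.createDataFrame(rdd, columns)
    # "overwrite" replaces any existing output at `path`.
    df.write.mode("overwrite").parquet(path)


def load_parquet_as_rdd(spark, path):
    """Read the Parquet data at `path` and return it as an RDD of Row objects."""
    # read.parquet returns a DataFrame; .rdd exposes its underlying RDD.
    return spark.read.parquet(path).rdd
```

With a running session this might be called as `save_rdd_as_parquet(spark, rdd, ["id", "name"], "hdfs:///tmp/example.parquet")` followed by `load_parquet_as_rdd(spark, "hdfs:///tmp/example.parquet")`. As noted in the quoted reply, the RDD you get back should not be expected to keep the partitioner of the RDD that was saved, so repartition it after loading if you rely on one.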