Re: Efficient approach to store an RDD as a file in HDFS and read it back as an RDD?

swetha kasireddy Thu, 05 Nov 2015 11:14:38 -0800

How do I convert a Parquet file saved in HDFS into an RDD after reading it
from HDFS?


On Thu, Nov 5, 2015 at 10:02 AM, Igor Berman <igor.ber...@gmail.com> wrote:

> Hi,
> We are using Avro with Snappy compression. As long as you have enough
> partitions, saving won't be a problem, IMHO.
> In general, HDFS is pretty fast; S3 less so.
> The issue with storing data is that you lose your partitioner when you
> load it back (even though the RDD had one when it was saved). There is a
> PR that tries to solve this.
>
>
> On 5 November 2015 at 01:09, swetha <swethakasire...@gmail.com> wrote:
>
>> Hi,
>>
>> What is an efficient approach to saving an RDD as a file in HDFS and
>> retrieving it later? I was deciding between Avro, Parquet and
>> SequenceFile. We currently use SequenceFile for one of our use cases.
>>
>> Any example of how to store and retrieve an RDD in Avro and Parquet
>> formats would be a great help.
>>
>> Thanks,
>> Swetha
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Efficient-approach-to-store-an-RDD-as-a-file-in-HDFS-and-read-it-back-as-an-RDD-tp25279.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
