Re: Efficient approach to store an RDD as a file in HDFS and read it back as an RDD?

Stefano Baghino Wed, 04 Nov 2015 23:52:54 -0800

What scenario would you like to optimize for? If you have something more
specific regarding your use case, the mailing list can surely provide you
with some very good advice.

If you just want to save an RDD as Avro, you can use the spark-avro module
from Databricks (the README on GitHub <https://github.com/databricks/spark-avro>
includes some examples). Otherwise, Parquet is natively supported by
Spark SQL, and the official documentation contains useful examples
<http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files>.

On Thu, Nov 5, 2015 at 12:09 AM, swetha <swethakasire...@gmail.com> wrote:

> Hi,
>
> What is an efficient approach to save an RDD as a file in HDFS and
> retrieve it back? I was deciding between Avro, Parquet and
> SequenceFileFormat. We currently use SequenceFileFormat for one of our
> use cases.
>
> Any example of how to store and retrieve an RDD in the Avro and Parquet
> file formats would be of great help.
>
> Thanks,
> Swetha

-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit
