Hi All,

   - I am writing a batch application using Spark SQL and DataFrames. The
   application performs a number of file joins, and at intermediate points I
   need to drop a file for downstream applications to consume.
   - The problem is that all these downstream applications are still on
   legacy systems, so they require us to drop them a plain text file. As you
   all probably know, a DataFrame stores its data in a columnar format
   internally.

The only way I have found to do this, and it looks awfully slow, is the
following:

// .toDF() needs the implicits in scope (import spark.implicits._)
val myDF = sc.textFile("inputpath").toDF()
// repartition(1) shuffles every row to a single task so one text file comes out
myDF.rdd.repartition(1).saveAsTextFile("mypath/output")

Is there any better way to do this?
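
For concreteness, the one alternative I could see is going through the
DataFrame writer API directly. This is only a sketch and assumes Spark 2.x,
where the CSV writer is built in; the delimiter and paths are placeholders:

// Sketch, assuming Spark 2.x: write through the DataFrame writer instead of
// dropping to the RDD API. coalesce(1) narrows to one partition without the
// full shuffle that repartition(1) triggers, though it also collapses
// upstream parallelism, so it is not guaranteed to be faster.
myDF.coalesce(1)
  .write
  .option("sep", "|")        // placeholder delimiter
  .csv("mypath/output_csv")  // placeholder path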

P.S.: The other workaround would be to use RDDs for all my operations, but I
am wary of doing that, since the documentation says DataFrames are much
faster because of the Catalyst optimizer running behind the scenes.
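
A middle-ground sketch, if it helps frame the question: keep all the heavy
joins in DataFrames and only fall back to the RDD API at the boundary where
the legacy text file has to be dropped (the delimiter is a placeholder):

// Sketch: DataFrame operations stay as they are; each Row is converted to a
// delimited line only at the drop point.
myDF.rdd
  .map(row => row.mkString("|"))   // "|" is a placeholder delimiter
  .coalesce(1)                     // still needed if one output file is required
  .saveAsTextFile("mypath/output_text")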

Please share your suggestions if any of you have tried something similar.
