Hi All,

   - I am writing a batch application using Spark SQL and DataFrames. The
   application performs a number of file joins, and at intermediate points I
   need to drop a file for downstream applications to consume.
   - The problem is that all these downstream applications are still on
   legacy systems, so they require us to drop them a plain text file. As you
   all probably know, a DataFrame stores its data in a columnar format
   internally.

The only way I have found to do this, and it looks awfully slow, is the
following:

// .toDF() needs the implicits in scope (import spark.implicits._)
val myDF = sc.textFile("inputpath").toDF()
// repartition(1) shuffles every row to a single task so one text file comes out
myDF.rdd.repartition(1).saveAsTextFile("mypath/output")

Is there any better way to do this?
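
For concreteness, the one alternative I could see is going through the
DataFrame writer API directly. This is only a sketch and assumes Spark 2.x,
where the CSV writer is built in; the delimiter and paths are placeholders:

// Sketch, assuming Spark 2.x: write through the DataFrame writer instead of
// dropping to the RDD API. coalesce(1) narrows to one partition without the
// full shuffle that repartition(1) triggers, though it also collapses
// upstream parallelism, so it is not guaranteed to be faster.
myDF.coalesce(1)
  .write
  .option("sep", "|")        // placeholder delimiter
  .csv("mypath/output_csv")  // placeholder path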

P.S.: The other workaround would be to use RDDs for all my operations, but I
am wary of doing that, since the documentation says DataFrames are much
faster because of the Catalyst optimizer running behind the scenes.
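
A middle-ground sketch, if it helps frame the question: keep all the heavy
joins in DataFrames and only fall back to the RDD API at the boundary where
the legacy text file has to be dropped (the delimiter is a placeholder):

// Sketch: DataFrame operations stay as they are; each Row is converted to a
// delimited line only at the drop point.
myDF.rdd
  .map(row => row.mkString("|"))   // "|" is a placeholder delimiter
  .coalesce(1)                     // still needed if one output file is required
  .saveAsTextFile("mypath/output_text")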

Please share your suggestions if any of you have tried something similar.
