You can first filter the existing data to drop the rows for the dates you are rewriting, and then append the new data.
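The filter-then-append idea can be sketched with plain Java collections standing in for DataFrames (the `Row` record, dates, and values below are made up for illustration; with Spark you would express the same filter on the DataFrame itself):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class FilterThenAppend {
    // Hypothetical stand-in for one row of the DataFrame.
    record Row(LocalDate date, String value) {}

    // Rewrite one date: drop its old rows first, then append the fresh ones.
    static List<Row> rewriteDate(List<Row> existing, LocalDate date, List<Row> fresh) {
        List<Row> kept = existing.stream()
                .filter(r -> !r.date().equals(date))  // filter out the date being rewritten
                .collect(Collectors.toList());
        List<Row> result = new ArrayList<>(kept);
        result.addAll(fresh);                         // append the new data
        return result;
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2015, 8, 19);
        List<Row> existing = List.of(
                new Row(d, "old"),
                new Row(LocalDate.of(2015, 8, 18), "keep"));
        List<Row> updated = rewriteDate(existing, d, List.of(new Row(d, "new")));
        System.out.println(updated.size());  // 2
        updated.forEach(r -> System.out.println(r.date() + " " + r.value()));
    }
}
```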

Cheng

On 8/20/15 4:59 PM, Hemant Bhanawat wrote:
How can I overwrite only a given partition or manually remove a partition before writing?

I don't think there is a way to do that with a save mode alone. But doesn't manually deleting the directory of the particular partition before writing help? For the directory structure, check this out...

http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
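With the layout described there (e.g. `parquetDir/date=2015-08-19/...`), removing one partition is just a recursive delete of that directory before the append. A minimal sketch on the local filesystem, assuming the partition column is named `date` (for data on HDFS or S3 you would use Hadoop's `FileSystem.delete(path, true)` instead):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class DropPartition {
    // Recursively delete one partition directory, e.g. parquetDir/date=2015-08-19.
    static void dropPartition(Path parquetDir, String date) throws IOException {
        Path partition = parquetDir.resolve("date=" + date);
        if (!Files.exists(partition)) {
            return;  // nothing to delete
        }
        try (Stream<Path> walk = Files.walk(partition)) {
            walk.sorted(Comparator.reverseOrder())  // delete children before parents
                .forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a fake partition layout, then drop one date.
        Path dir = Files.createTempDirectory("parquetDir");
        Path part = Files.createDirectories(dir.resolve("date=2015-08-19"));
        Files.createFile(part.resolve("part-00000.parquet"));

        dropPartition(dir, "2015-08-19");
        System.out.println(Files.exists(part));  // false
    }
}
```

After the delete, writing with `SaveMode.Append` repopulates only that partition without touching the others.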


On Wed, Aug 19, 2015 at 8:18 PM, Romi Kuntsman <r...@totango.com> wrote:

    Hello,

    I have a DataFrame, with a date column which I want to use as a
    partition.
    Each day I want to write the data for the same date in Parquet,
    and then read a dataframe for a date range.

    I'm using:

    myDataframe.write()
        .partitionBy("date")
        .mode(SaveMode.Overwrite)
        .parquet(parquetDir);

    If I use SaveMode.Append, then writing data for an existing
    partition duplicates the rows already there.
    If I use SaveMode.Overwrite, then writing data for a single
    partition removes the data of all the other partitions as well.

    How can I overwrite only a given partition or manually remove a
    partition before writing?

    Many thanks!
    Romi K.
