Hi, Patrick,   

I think https://issues.apache.org/jira/browse/SPARK-1677 is talking about the 
same thing?

How about assigning it to me?  

I think I missed the configuration part in my previous commit, though I did 
mention it in the PR description.
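
For anyone hitting this in the meantime, the usual workaround is to delete the 
output directory yourself before writing. A rough sketch (assuming a 
SparkContext `sc` and an RDD `rdd` are already in scope; the output path here 
is just a placeholder):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical output location; replace with your own path.
val output = new Path("hdfs:///tmp/my-output")

// Go through the Hadoop FileSystem API so the same code works for
// HDFS, S3, or the local filesystem.
val fs = output.getFileSystem(sc.hadoopConfiguration)
if (fs.exists(output)) {
  fs.delete(output, true) // recursive delete of the old part- files
}

rdd.saveAsTextFile(output.toString)
```

Because the whole directory is removed rather than individual files being 
overwritten, this also avoids the stale part- file problem Nicholas describes 
below.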

Best,  

--  
Nan Zhu


On Monday, June 2, 2014 at 3:03 PM, Patrick Wendell wrote:

> Hey There,
>  
> The issue was that the old behavior could cause users to silently
> overwrite data, which is pretty bad, so to be conservative we decided
> to enforce the same checks that Hadoop does.
>  
> This was documented by this JIRA:
> https://issues.apache.org/jira/browse/SPARK-1100
> https://github.com/apache/spark/commit/3a8b698e961ac05d9d53e2bbf0c2844fcb1010d1
>  
> However, it would be very easy to add an option that allows preserving
> the old behavior. Is anyone here interested in contributing that? I
> created a JIRA for it:
>  
> https://issues.apache.org/jira/browse/SPARK-1993
>  
> - Patrick
>  
> On Mon, Jun 2, 2014 at 9:22 AM, Pierre Borckmans
> <pierre.borckm...@realimpactanalytics.com> wrote:
> > Indeed, the behavior has changed, for good or for bad. I mean, I agree with
> > the danger you mention, but I'm not sure it's happening like that. Isn't
> > there an overwrite mechanism in Hadoop that automatically removes the part
> > files, writes to a _temporary folder, and only then writes the part files
> > along with the _SUCCESS marker?
> >  
> > In any case this change of behavior should be documented IMO.
> >  
> > Cheers
> > Pierre
> >  
> > Message sent from a mobile device - excuse typos and abbreviations
> >  
> > On 2 June 2014 at 17:42, Nicholas Chammas <nicholas.cham...@gmail.com>
> > wrote:
> >  
> > What I've found using saveAsTextFile() against S3 (prior to Spark 1.0.0) is
> > that files get overwritten automatically. There is one danger to this,
> > though: if I save to a directory that already has 20 part- files, but this
> > time around I'm only saving 15 part- files, then there will be 5 leftover
> > part- files from the previous set mixed in with the 15 newer files. This is
> > potentially dangerous.
> >  
> > I haven't checked to see if this behavior has changed in 1.0.0. Are you
> > saying it has, Pierre?
> >  
> > On Mon, Jun 2, 2014 at 9:41 AM, Pierre B
> > <pierre.borckm...@realimpactanalytics.com>
> > wrote:
> > >  
> > > Hi Michaël,
> > >  
> > > Thanks for this. We could indeed do that.
> > >  
> > > But I guess the question is more about the change of behaviour from 0.9.1
> > > to
> > > 1.0.0.
> > > We never had to care about that in previous versions.
> > >  
> > > Does that mean we have to manually remove existing files, or is there a
> > > way to automatically overwrite when using saveAsTextFile?
> > >  
> > >  
> > >  
> > > --
> > > View this message in context:
> > > http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-make-Spark-1-0-saveAsTextFile-to-overwrite-existing-file-tp6696p6700.html
> > > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> > >  
> >  
> >  
>  
>  
>  