Thanks for pointing that out. I've assigned you to SPARK-1677 (I think I accidentally assigned myself way back when I created it). This should be an easy fix.
On Mon, Jun 2, 2014 at 12:19 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
> Hi, Patrick,
>
> I think https://issues.apache.org/jira/browse/SPARK-1677 is talking about
> the same thing?
>
> How about assigning it to me?
>
> I think I missed the configuration part in my previous commit, though I
> declared that in the PR description....
>
> Best,
>
> --
> Nan Zhu
>
> On Monday, June 2, 2014 at 3:03 PM, Patrick Wendell wrote:
>
> Hey There,
>
> The issue was that the old behavior could cause users to silently
> overwrite data, which is pretty bad, so to be conservative we decided
> to enforce the same checks that Hadoop does.
>
> This was documented by this JIRA:
> https://issues.apache.org/jira/browse/SPARK-1100
> https://github.com/apache/spark/commit/3a8b698e961ac05d9d53e2bbf0c2844fcb1010d1
>
> However, it would be very easy to add an option that allows preserving
> the old behavior. Is anyone here interested in contributing that? I
> created a JIRA for it:
>
> https://issues.apache.org/jira/browse/SPARK-1993
>
> - Patrick
>
> On Mon, Jun 2, 2014 at 9:22 AM, Pierre Borckmans
> <pierre.borckm...@realimpactanalytics.com> wrote:
>
> Indeed, the behavior has changed, for better or for worse. I mean, I agree
> with the danger you mention, but I'm not sure it happens like that. Isn't
> there a mechanism for overwrite in Hadoop that automatically removes the
> part files, then writes a _temporary folder, and only then the part files
> along with the _success folder?
>
> In any case, this change of behavior should be documented IMO.
>
> Cheers
> Pierre
>
> On June 2, 2014, at 17:42, Nicholas Chammas <nicholas.cham...@gmail.com>
> wrote:
>
> What I've found using saveAsTextFile() against S3 (prior to Spark 1.0.0) is
> that files get overwritten automatically. There is one danger to this, though.
> If I save to a directory that already has 20 part- files, but this time
> around I'm only saving 15 part- files, then there will be 5 leftover part-
> files from the previous set mixed in with the 15 newer files. This is
> potentially dangerous.
>
> I haven't checked to see if this behavior has changed in 1.0.0. Are you
> saying it has, Pierre?
>
> On Mon, Jun 2, 2014 at 9:41 AM, Pierre B
> <pierre.borckm...@realimpactanalytics.com> wrote:
>
> Hi Michaël,
>
> Thanks for this. We could indeed do that.
>
> But I guess the question is more about the change of behaviour from 0.9.1
> to 1.0.0.
> We never had to care about that in previous versions.
>
> Does that mean we have to manually remove existing files, or is there a way
> to "automatically" overwrite when using saveAsTextFile?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-make-Spark-1-0-saveAsTextFile-to-overwrite-existing-file-tp6696p6700.html
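
For anyone following along, the leftover part- file problem Nicholas describes can be reproduced without a cluster. The sketch below is only an illustration on the local filesystem using the Python standard library; it does not call Spark or Hadoop. The helper names (write_parts, write_parts_clean, stale_parts) and the "run-N" payloads are hypothetical, made up for this example. It writes 20 part files, then 15 on top of them, and then shows the manual "remove the directory first" workaround asked about in the thread.

```python
import shutil
import tempfile
from pathlib import Path


def write_parts(out_dir, n_parts, payload):
    """Mimic a silent overwrite: write part files without clearing out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(n_parts):
        (out_dir / f"part-{i:05d}").write_text(payload)


def write_parts_clean(out_dir, n_parts, payload):
    """Manual workaround: delete any existing output directory, then write."""
    if out_dir.exists():
        shutil.rmtree(out_dir)
    write_parts(out_dir, n_parts, payload)


def stale_parts(out_dir, payload):
    """Return the names of part files that were not written by the latest run."""
    return sorted(
        p.name for p in out_dir.glob("part-*") if p.read_text() != payload
    )


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output"

    # First job writes 20 parts; the second writes only 15 into the same place.
    write_parts(out, 20, "run-1")
    write_parts(out, 15, "run-2")
    leftover = stale_parts(out, "run-2")
    print(len(leftover), leftover[0], leftover[-1])  # 5 part-00015 part-00019

    # Clearing the directory first leaves only the 15 new parts.
    write_parts_clean(out, 15, "run-3")
    clean_count = len(list(out.glob("part-*")))
    print(clean_count, stale_parts(out, "run-3"))  # 15 []
```

In a real job the delete step would have to go through whatever filesystem holds the output (HDFS, S3, ...) rather than shutil, and it carries exactly the risk Patrick mentions: anything already in that directory is gone.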