subject:"ETL on pyspark"

RE: ETL on pyspark

2014-02-25 Thread Adrian Mocanu

From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
Sent: February-25-14 3:02 PM
To: user@spark.apache.org
Cc: u...@spark.incubator.apache.org
Subject: Re: ETL on pyspark
It will only move a file to the final directory when it's successfully finished writing it, so the file shouldn't have any
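The behaviour described in the reply above (a file only shows up in the final directory once it has been written completely) follows a general write-then-move pattern. Here is a minimal sketch of that pattern in plain Python. It is not Spark's or Hadoop's actual code; the function name `write_then_move` and the `_staging_` prefix are made up for illustration.

```python
import os
import shutil
import tempfile


def write_then_move(final_dir, name, lines):
    """Write `lines` to a staging file, then move it into `final_dir`.

    Hypothetical helper illustrating the pattern only; Spark's own
    output committer is more involved than this.
    """
    os.makedirs(final_dir, exist_ok=True)
    # Stage inside final_dir so the move stays on one filesystem.
    staging_dir = tempfile.mkdtemp(prefix="_staging_", dir=final_dir)
    staged_path = os.path.join(staging_dir, name)
    try:
        with open(staged_path, "w") as out:
            for line in lines:
                out.write(line + "\n")
        final_path = os.path.join(final_dir, name)
        # The file becomes visible under its final name only here,
        # after every line has been written.
        os.replace(staged_path, final_path)
        return final_path
    finally:
        # Remove the staging directory whether or not the write succeeded.
        shutil.rmtree(staging_dir, ignore_errors=True)
```

If the write fails part-way, the staged file is discarded and nothing appears under the final name, so a retry starts from a clean state rather than appending to a partial file.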

Re: ETL on pyspark

2014-02-25 Thread Matei Zaharia

after recovery from the failure does
> it continue where it left off or will there be duplicates in the file?
>
> -A
> From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
> Sent: February-24-14 4:20 PM
> To: u...@spark.incubator.apache.org
> Subject: Re: ETL on pyspark
>

RE: ETL on pyspark

2014-02-25 Thread Adrian Mocanu

on pyspark
collect() means to bring all the data back to the master node, and there might just be too much of it for that. How big is your file? If you can't bring it back to the master node, try saveAsTextFile to write it out to a filesystem (in parallel).
Matei
On Feb 24, 2014, at 1:
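A minimal sketch of the suggestion above, assuming `sc` is an existing SparkContext (for example the one the pyspark shell provides). The function name `export_lines`, the whitespace-stripping transform, and both paths are placeholders, not anything from the original thread.

```python
def export_lines(sc, input_path, output_dir):
    """Clean a text file and write the result out in parallel.

    `sc` is assumed to be a live SparkContext; `input_path` and
    `output_dir` are hypothetical paths chosen by the caller.
    """
    lines = sc.textFile(input_path)  # RDD of strings, split into partitions
    cleaned = (
        lines.map(lambda line: line.strip())
             .filter(lambda line: line != "")
    )
    # cleaned.collect() would pull every record back to the master node.
    # saveAsTextFile() has each partition write its own part file instead.
    cleaned.saveAsTextFile(output_dir)
    return output_dir
```

From the pyspark shell this could be called as `export_lines(sc, "input.txt", "cleaned-output")`. The output is a directory of part files rather than a single file, and `saveAsTextFile` is expected to fail if that directory already exists.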

3 matches
