Hm, thanks.
Do you know what this setting means:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L1178
?
Thanks,
Peter Rudenko
On 2015-05-08 17:48, ayan guha wrote:
From S3, because the lineage of df still depends on the S3 source, and
because RDDs are not replicated.
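If the goal is for recomputation to read from HDFS instead, one option is to load the saved Parquet files into a new DataFrame and run the action on that. The following is only a minimal sketch, assuming the Spark 1.3-era API (`saveAsParquetFile`, `sqlContext.parquetFile`); the object name, the path arguments, and the use of `count()` as the action are hypothetical placeholders, since the thread elides the real paths and action.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical driver object, for illustration only.
object ReloadFromParquet {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("reload-from-parquet"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // enables rdd.toDF()

    // Placeholder paths supplied by the caller; the original message elides them.
    val s3Path = args(0)
    val hdfsPath = args(1)

    val data = sc.textFile(s3Path)
    val df = data.toDF()            // df's lineage starts at the S3 text files

    // Writes a copy to HDFS; df itself still traces back to S3.
    df.saveAsParquetFile(hdfsPath)

    // A new DataFrame whose lineage starts at the Parquet files on HDFS,
    // so lost partitions would be re-read from HDFS rather than from S3.
    val fromHdfs = sqlContext.parquetFile(hdfsPath)
    println(fromHdfs.count())       // stand-in for someAction

    sc.stop()
  }
}
```

In later Spark versions the equivalent calls are `df.write.parquet(...)` and `sqlContext.read.parquet(...)`.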
On 8 May 2015 23:02, "Peter Rudenko" <petro.rude...@gmail.com> wrote:
Hi, I have the following question:
    val data = sc.textFile("s3:///")
    val df = data.toDF
    df.saveAsParquetFile("hdfs://")
    df.someAction(...)
If some workers die during someAction, would the recomputation
re-read the files from S3 or from the Parquet files on HDFS?
Thanks,
Peter Rudenko