Re: [SQL][Dataframe] Change data source after saveAsParquetFile

Michael Armbrust Fri, 08 May 2015 11:45:14 -0700

That's a feature flag for a new code path for reading Parquet files.  It's
only there in case bugs are found in the old path and will be removed once
we are sure the new path is solid.


On Fri, May 8, 2015 at 8:04 AM, Peter Rudenko <petro.rude...@gmail.com>
wrote:

>  Hm, thanks.
> Do you know what this setting means:
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L1178
> ?
>
> Thanks,
> Peter Rudenko
>
>
> On 2015-05-08 17:48, ayan guha wrote:
>
> From S3, because df still depends on the S3 source and RDDs are not
> replicated.
> On 8 May 2015 23:02, "Peter Rudenko" <petro.rude...@gmail.com> wrote:
>
>>  Hi, I have the following question:
>>
>> val data = sc.textFile("s3:///")
>> val df = data.toDF
>> df.saveAsParquetFile("hdfs://")
>> df.someAction(...)
>>
>> If some workers die during someAction, would recomputation download the
>> files from S3 or read the Parquet files from HDFS?
>>
>> Thanks,
>> Peter Rudenko
>> 
>>
>
>
