Re: Spark with data on NFS v HDFS

Tobias Pfeiffer Thu, 05 Mar 2015 17:07:20 -0800

Hi,

On Thu, Mar 5, 2015 at 10:58 PM, Ashish Mukherjee <
ashish.mukher...@gmail.com> wrote:
>
> I understand Spark can be used with Hadoop or standalone. I have certain
> questions related to use of the correct FS for Spark data.
>
> What is the efficiency trade-off in feeding data to Spark from NFS v HDFS?
>


As I understand it, one performance advantage of using HDFS is that each
task can be scheduled on a cluster node that already has its part of the
data on local disk, so the computation goes to where the data is. With NFS,
all data must first be fetched over the network from the file server(s), so
there is no such thing as "data locality".
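A toy model of the difference, as a minimal sketch. It assumes Spark workers
run on the same machines as the HDFS DataNodes and treats the NFS server as a
separate host, so under NFS no worker holds a local copy of any block. The
`Block` class and `schedule` function are hypothetical illustrations, not
Spark's scheduler, which is more involved (for example, it waits up to
`spark.locality.wait` for a data-local slot before falling back).

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    size_mb: int
    # Worker nodes holding a replica of this block (HDFS-style).
    # Empty for NFS: the data lives only on the separate file server.
    replicas: set = field(default_factory=set)


def schedule(blocks, nodes):
    """Assign one task per block, preferring a node that holds a replica.

    Hypothetical toy scheduler: among candidate nodes, pick the least loaded.
    Returns (assignments, mb_read_over_network).
    """
    load = {n: 0 for n in nodes}
    assignments = []
    remote_mb = 0
    for b in blocks:
        local = [n for n in nodes if n in b.replicas]
        candidates = local or nodes  # fall back to any node if none is local
        node = min(candidates, key=lambda n: load[n])
        load[node] += 1
        assignments.append(node)
        if node not in b.replicas:
            remote_mb += b.size_mb  # block must cross the network
    return assignments, remote_mb


nodes = ["n1", "n2", "n3"]
# Three 128 MB blocks, each replicated on two of the worker nodes.
hdfs_blocks = [Block(128, {"n1", "n2"}), Block(128, {"n2", "n3"}), Block(128, {"n3", "n1"})]
# The same data served from an NFS server that is not a worker.
nfs_blocks = [Block(128) for _ in range(3)]

print("HDFS:", schedule(hdfs_blocks, nodes))
print("NFS: ", schedule(nfs_blocks, nodes))
```

In this model every HDFS task reads its block from local disk, while every NFS
task pulls its whole block from the file server, which also becomes a shared
bottleneck as the cluster grows.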

Tobias
