Re: hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-18 Thread Calvin
to the cluster to decide how many containers can be run in parallel. Yong Date: Fri, 15 Aug 2014 12:30:09 -0600 Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems From: iphcal...@gmail.com To: user@hadoop.apache.org Thanks for the responses! To clarify, I'm not using any

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-18 Thread Calvin
run. Yong Date: Fri, 15 Aug 2014 12:30:09 -0600 Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems From: iphcal...@gmail.com To: user@hadoop.apache.org Thanks for the responses! To clarify, I'm not using any special FileSystem implementation. An example input

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-15 Thread jay vyas
, assuming you have a lot of small files. Yong From: ha...@cloudera.com Date: Fri, 15 Aug 2014 16:45:02 +0530 Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems To: user@hadoop.apache.org Does your non-HDFS filesystem implement a getBlockLocations API that MR

RE: hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-15 Thread Harsh J
can get good utilization of your cluster, assuming you have a lot of small files. Yong From: ha...@cloudera.com Date: Fri, 15 Aug 2014 16:45:02 +0530 Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems To: user@hadoop.apache.org Does your non-HDFS filesystem

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-15 Thread Calvin
16:45:02 +0530 Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems To: user@hadoop.apache.org Does your non-HDFS filesystem implement a getBlockLocations API that MR relies on to know how to split files? The API is at http://hadoop.apache.org/docs/stable2/api

RE: hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-15 Thread java8964
file, so you can get good utilization of your cluster, assuming you have a lot of small files. Yong From: ha...@cloudera.com Date: Fri, 15 Aug 2014 16:45:02 +0530 Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems To: user@hadoop.apache.org Does your

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-14 Thread Calvin
such parameters? Thanks, Calvin [1] https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems On Tue, Aug 12, 2014 at 12:29 PM, Calvin iphcal...@gmail.com wrote: Hi all, I've instantiated a Hadoop 2.4.1 cluster and I've found that running MapReduce

hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-12 Thread Calvin
Hi all, I've instantiated a Hadoop 2.4.1 cluster and I've found that running MapReduce applications will parallelize differently depending on what kind of filesystem the input data is on. Using HDFS, a MapReduce job will spawn enough containers to maximize use of all available memory. For