to the cluster to decide how many containers can be
run in parallel.
Yong
Date: Fri, 15 Aug 2014 12:30:09 -0600
Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems
From: iphcal...@gmail.com
To: user@hadoop.apache.org
Thanks for the responses!
To clarify, I'm not using any special FileSystem implementation. An
example input
file, so you can get good
utilization of your cluster, assume you have a lot of small files.
Yong
From: ha...@cloudera.com
Date: Fri, 15 Aug 2014 16:45:02 +0530
Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems
To: user@hadoop.apache.org
Does your non-HDFS filesystem implement a getBlockLocations API, that
MR relies on to know how to split files?
The API is at
http://hadoop.apache.org/docs/stable2/api
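[Editor's note for readers outside the thread: the split arithmetic being discussed lives in FileInputFormat. In the sketch below, the max/min formula and the two property names in the comments are the real Hadoop 2.x ones; everything else — the file sizes, and the simplified split count, which ignores Hadoop's 10% slop on the last split — is illustrative only.]

```java
// Sketch of FileInputFormat-style split sizing (Hadoop 2.x).
// Real pieces: the max(min(...)) formula and the property names in
// comments. The scaffolding and numbers are hypothetical.
public class SplitSizeSketch {

    // minSize <- mapreduce.input.fileinputformat.split.minsize (default 1)
    // maxSize <- mapreduce.input.fileinputformat.split.maxsize (default Long.MAX_VALUE)
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate split (= map task) count for one file; real Hadoop
    // also lets the last split run up to 10% oversized.
    static long countSplits(long fileLen, long splitSize) {
        return (fileLen + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long fileLen = 1280 * mb; // hypothetical 1.25 GB input file

        // HDFS-style 128 MB blocks -> 10 splits -> 10 mappers.
        long hdfs = countSplits(fileLen,
                computeSplitSize(128 * mb, 1L, Long.MAX_VALUE));

        // A filesystem that reports the whole file as one block
        // collapses this to a single split -> a single mapper.
        long flat = countSplits(fileLen,
                computeSplitSize(fileLen, 1L, Long.MAX_VALUE));

        System.out.println(hdfs + " vs " + flat); // prints "10 vs 1"
    }
}
```

Because maxSize caps the split size in that formula, lowering mapreduce.input.fileinputformat.split.maxsize is the usual lever to force more splits when the reported block size is unhelpfully large.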
such parameters?
Thanks,
Calvin
[1]
https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
On Tue, Aug 12, 2014 at 12:29 PM, Calvin iphcal...@gmail.com wrote:
Hi all,
I've instantiated a Hadoop 2.4.1 cluster and I've found that running
MapReduce applications will parallelize differently depending on what
kind of filesystem the input data is on.
Using HDFS, a MapReduce job will spawn enough containers to maximize
use of all available memory. For
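[Editor's note: the "enough containers to maximize use of all available memory" observation can be made concrete with back-of-the-envelope arithmetic. The two property names in the comments are the standard YARN/MR2 memory knobs; all the numbers below are hypothetical.]

```java
// Back-of-the-envelope: how many map containers a YARN cluster can run
// at once, and why the split count caps actual parallelism.
// Property names in comments are real; every value is hypothetical.
public class ContainerMath {

    static long maxConcurrentContainers(long nodes, long nodeMemMb, long mapMemMb) {
        // per-node slots = yarn.nodemanager.resource.memory-mb
        //                  / mapreduce.map.memory.mb
        return nodes * (nodeMemMb / mapMemMb);
    }

    public static void main(String[] args) {
        long nodes = 4;          // cluster size (hypothetical)
        long nodeMemMb = 8192;   // yarn.nodemanager.resource.memory-mb
        long mapMemMb = 1024;    // mapreduce.map.memory.mb
        long slots = maxConcurrentContainers(nodes, nodeMemMb, mapMemMb); // 32

        // Parallelism is also bounded by the number of input splits:
        long splits = 3; // e.g. 3 files on a filesystem yielding one split each
        System.out.println(Math.min(slots, splits)); // prints "3"
        // With HDFS-style many splits, the same job could fill all 32 slots.
    }
}
```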