unexplained time between map 100% reduce 100% and job completion

2014-09-09 Thread Calvin
tasks, but I'm not doing any heavy writing via the reducers (it's a simple wordcount variant):

    File Input Format Counters
        Bytes Read=334624311758
    File Output Format Counters
        Bytes Written=107785

Any ideas regarding this behavior or where I should look first? Thanks, Calvin
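
The gap between map/reduce hitting 100% and job completion is usually spent in output commit and job cleanup rather than in user code, so the job counters and the aggregated container logs are the first places to check. A minimal sketch of that (the job and application IDs below are placeholders):

    # print final state and counters for a finished job
    mapred job -status job_1409270000000_0001
    # with log aggregation enabled, look for the commit/cleanup phase in the logs
    yarn logs -applicationId application_1409270000000_0001 | grep -i commit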

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-18 Thread Calvin
Oops, one of the settings should read "yarn.nodemanager.vmem-check-enabled". The blog post has a typo and a comment pointed that out as well. Thanks, Calvin

On Mon, Aug 18, 2014 at 4:45 PM, Calvin wrote:
> OK, I figured out exactly what was happening.
>
> I had set the
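
For reference, the corrected property, and the related ratio setting, both live in yarn-site.xml; a minimal sketch with illustrative values (disable the virtual-memory check outright, or keep it and raise the allowed ratio):

    <!-- yarn-site.xml: stop the NodeManager from killing containers on vmem usage -->
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>
    <!-- or keep the check but allow more virtual memory per unit of physical -->
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>4</value>
    </property>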

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-18 Thread Calvin
had set such a high setting was due to containers being killed because of virtual memory usage. The Cloudera folks have a good blog post [1] on this topic (see #6) and I wish I had read that sooner. With the above configuration values, I can now utilize the cluster at 100%. Thanks for everyone's

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-15 Thread Calvin
> unsplittable, just considered as one single block.
>
> If the FS in use has its advantages it's better to implement a proper
> interface to it making use of them, than to rely on the LFS by mounting it.
> This is what we do with HDFS.
>
> On Aug 15, 2014 8:52 PM, "
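
For what it's worth, pointing Hadoop at a proper FileSystem implementation (instead of going through the mounted local FS) only requires registering the class against a URI scheme; a sketch, where the scheme and class are hypothetical:

    <!-- core-site.xml: map myfs:// URIs to a custom implementation
         (org.example.fs.MyFileSystem is a hypothetical subclass of
          org.apache.hadoop.fs.FileSystem) -->
    <property>
      <name>fs.myfs.impl</name>
      <value>org.example.fs.MyFileSystem</value>
    </property>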

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-14 Thread Calvin
r to tweak such parameters? Thanks, Calvin

[1] https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems

On Tue, Aug 12, 2014 at 12:29 PM, Calvin wrote:
> Hi all,
>
> I've instantiated a Hadoop 2.4.1 cluster and I've found that run
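
Since the split computation is what decides the number of map tasks, one lever that works regardless of the filesystem's reported block size is the split-size bound; a sketch, assuming the job's driver goes through ToolRunner so -D options are honored (jar and class names are placeholders):

    # cap splits at 128 MB so a large input produces many map tasks
    hadoop jar myjob.jar MyJob \
      -D mapreduce.input.fileinputformat.split.maxsize=134217728 \
      /input /output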

hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-12 Thread Calvin
effectively maximize resource utilization? Thanks, Calvin

Spoofing Ganglia Metrics

2014-01-24 Thread Calvin Jia
Is there a way to configure hdfs/hbase/mapreduce to spoof the ganglia metrics being sent? This is because the machines are behind a NAT and the monitoring box is outside, so all the metrics are recognized as coming from the same machine. Thanks!
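
For context, the stock metrics2-to-Ganglia wiring looks like the sketch below (sink class and property names as shipped with Hadoop; host and port illustrative); whether the sink itself can spoof the source address, rather than doing it at the gmond level, is the open question:

    # hadoop-metrics2.properties: send daemon metrics to a unicast gmond
    *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
    *.sink.ganglia.period=10
    namenode.sink.ganglia.servers=monitor.example.com:8649
    datanode.sink.ganglia.servers=monitor.example.com:8649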

Re: warnings from log4j -- where to look to resolve?

2013-01-28 Thread Calvin
Hi Harsh, No self logger object is being used.

On Mon, Jan 28, 2013 at 8:21 AM, Harsh J wrote:
>
> Hi,
>
> Does your code instantiate and work with a self logger object? If so,
> is it a log4j instance or a commons logging one?
>
> On Mon, Jan 28, 2013 at 4:06 PM,

warnings from log4j -- where to look to resolve?

2013-01-28 Thread Calvin
Hi, Each time I run a job, I get thousands of these warnings. I've looked at conf/log4j.properties and found nothing out of the ordinary (a diff with the default log4j.properties shows no lines of consequence changed). Has anyone else come across this problem? Where else would I look to find potentially
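
For anyone hitting the same warnings, the two usual fixes both live in conf/log4j.properties: make sure the root logger has an appender, and raise the level of whichever logger is flooding. A minimal sketch (the quieted logger name is illustrative):

    # ensure a root appender exists so log4j itself has nothing to warn about
    log4j.rootLogger=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    # quiet one chatty logger without touching everything else
    log4j.logger.org.apache.hadoop.conf.Configuration=ERROR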