Re: How to make HOD apply more than one core on each machine?

2010-04-16 Thread Hemanth Yamijala
Song, I know that is the way to set the capacity of each node; however, I want to know how we can make the Torque manager run more than one mapred task on each machine. If we don't do this, Torque will assign the other cores on this machine to other tasks, which may cause a

Re: Distributed Cache with New API

2010-04-16 Thread Larry Compton
Thanks. That clears it up. Larry On Fri, Apr 16, 2010 at 1:05 AM, Amareshwari Sri Ramadasu amar...@yahoo-inc.com wrote: Hi, @Ted, the code below is internal code. Users are not expected to call DistributedCache.getLocalCache(), nor can they use it; they do not know all the parameters.
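The thread above concerns the Java-side API, but for completeness: files can also be shipped into the distributed cache from the command line via the `-files` generic option (handled by `GenericOptionsParser`), without calling `DistributedCache` directly. A minimal sketch — `myjob.jar`, `MyDriver`, and the paths are hypothetical placeholders, and the leading `echo` makes this a dry run (drop it to actually submit against a cluster):

```shell
# Dry run: print the submission command rather than executing it.
# -files copies lookup.txt into the distributed cache; each task can
# then open it by its base name ("lookup.txt") in the working directory.
echo hadoop jar myjob.jar MyDriver \
    -files lookup.txt \
    /user/hadoop/input /user/hadoop/output
```

Note the generic options must appear before the job's own arguments for `GenericOptionsParser` to pick them up.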

o.a.h.mapreduce API and SequenceFile encoding format

2010-04-16 Thread Bo Shi
Hey Folks, No luck on IRC; trying here: I was playing around with 0.20.x and SequenceFileOutputFormat. The documentation doesn't specify any particular file encoding, but I had just assumed it was some sort of raw binary format. After inspecting the output, I see that it was a false

Jetty returning 404s for everything

2010-04-16 Thread Robert Crocombe
I have a cluster running Cloudera's 0.20.1+152-1 version of Hadoop. All was well, but there was an unfortunate power outage that affected just the namenode. Everything seemed largely normal upon resumption (I did have to recreate the local version of hadoop.tmp.dir to get the namenode to

Splitting input for mapper and contiguous data

2010-04-16 Thread Andrew Nguyen
As I may have mentioned, my main goal currently is processing physiologic data using Hadoop and MR. The steps are: convert ADC units to physical units (input is sample num, raw value; output is sample num, physical value); perform peak detection to detect the systolic blood pressure

Extremely slow HDFS after upgrade

2010-04-16 Thread Scott Carey
I have two clusters upgraded to CDH2. One is performing fine, and the other is EXTREMELY slow. Some jobs that formerly took 90 seconds, take 20 to 50 minutes. It is an HDFS issue from what I can tell. The simple DFS benchmark with one map task shows the problem clearly. I have looked at

Re: Extremely slow HDFS after upgrade

2010-04-16 Thread Todd Lipcon
Hey Scott, This is indeed really strange... if you do a straight hadoop fs -put with dfs.replication set to 1 from one of the DNs, does it upload slowly? That would take the network out of the equation. -Todd On Fri, Apr 16, 2010 at 5:29 PM, Scott Carey sc...@richrelevance.com wrote: I have two

Re: Extremely slow HDFS after upgrade

2010-04-16 Thread Scott Carey
Ok, so here is a ... fun result. I have dfs.replication.min set to 2, so I can't just do hadoop fs -Ddfs.replication=1 -put someFile someFile, since that will fail. So here are two results that are fascinating: $ time hadoop fs -Ddfs.replication=3 -put test.tar test.tar real 1m53.237s user
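Scott's experiment can be sketched as a loop that times a put at each allowed replication factor, to see whether the slowdown scales with the number of pipeline hops. The file name and destination are placeholders, and the `echo` makes this a dry run (remove it to run against a real cluster):

```shell
# Dry run (echo): time an HDFS put at each replication factor.
# With dfs.replication.min=2 on this cluster, replication=1 would be
# rejected, so start at 2. test.tar is a placeholder file.
for rep in 2 3; do
  echo time hadoop fs -Ddfs.replication=$rep -put test.tar test-rep$rep.tar
done
```

If the time grows sharply with replication, the write pipeline (and hence the network between specific nodes) is the likely culprit rather than local disk.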

Re: Extremely slow HDFS after upgrade

2010-04-16 Thread Scott Carey
More info -- this is not a Hadoop issue. The network performance issue can be replicated with SSH only on the links where Hadoop has a problem, and only in the direction with a problem. HDFS is slow to transfer data in certain directions from certain machines. So, for example, copying from
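To isolate per-direction throughput without Hadoop in the picture, one common sketch is to push a fixed amount of data over ssh in each direction and time it. The hostnames are placeholders and the `echo` makes this a dry run (drop it to actually measure):

```shell
# Dry run (echo): time 100 MB of zeros over ssh in each direction.
# If only one direction is slow, suspect the NIC, cable, or switch
# port on that path rather than HDFS. nodeA is a placeholder hostname.
echo "time dd if=/dev/zero bs=1M count=100 | ssh nodeA 'cat > /dev/null'"
echo "time ssh nodeA 'dd if=/dev/zero bs=1M count=100' > /dev/null"
```

Note ssh adds encryption overhead, so compare the two directions against each other rather than against raw link speed.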

Re: Extremely slow HDFS after upgrade

2010-04-16 Thread Todd Lipcon
Checked link autonegotiation with ethtool? Sometimes GigE will autonegotiate down to 10 Mb half duplex if there's a bad cable, NIC, or switch port. -Todd On Fri, Apr 16, 2010 at 8:08 PM, Scott Carey sc...@richrelevance.com wrote: More info -- this is not a Hadoop issue. The network performance issue can
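Todd's check can be sketched as a small filter over `ethtool`'s report. So it can be shown without assuming a real NIC, the function below reads ethtool-style output on stdin; in practice you would pipe the real thing, e.g. `ethtool eth0 | check_link` (the interface name is a placeholder):

```shell
# Flag a link that has negotiated below gigabit full duplex.
# Expects ethtool-style "Speed:" and "Duplex:" lines on stdin.
check_link() {
  awk '/Speed:/ {s=$2} /Duplex:/ {d=$2} END {
    if (s == "1000Mb/s" && d == "Full") print "OK: " s " " d;
    else print "WARN: renegotiated to " s " " d
  }'
}

# Example with canned input simulating a degraded link:
printf 'Speed: 100Mb/s\nDuplex: Half\n' | check_link
# -> WARN: renegotiated to 100Mb/s Half
```

A warning here would match the symptom in this thread: only certain links, and only in one direction, are slow.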