Re: Need to re replicate

2010-01-27 Thread Brian Bockelman
Hey Ananth - Unfortunately, if your under-replication count isn't actively going down (at least by one per minute; on a large cluster, several hundred per minute), something is wrong. Brian On Jan 27, 2010, at 9:02 PM, Ananth T. Sarathy wrote: > ok, it probably will take some time. I will ch

Re: Need to re replicate

2010-01-27 Thread Ananth T. Sarathy
ok, it probably will take some time. I will check again in the morning! Thanks Ananth T Sarathy On Wed, Jan 27, 2010 at 10:00 PM, Brian Bockelman wrote: > Hey Ananth, > > Replication happens automatically. If it doesn't (should start within > seconds after the node is declared dead on the web

Re: Need to re replicate

2010-01-27 Thread Brian Bockelman
Hey Ananth, Replication happens automatically. If it doesn't (should start within seconds after the node is declared dead on the web interface), something is wrong. Check your NN logfile for error messages. Brian On Jan 27, 2010, at 8:56 PM, Ananth T. Sarathy wrote: > when I run it, i get >

Re: Need to re replicate

2010-01-27 Thread Ananth T. Sarathy
When I run it, I get: Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved The cluster is balanced. Exiting... But fsck is still giving me this: /setup_procypherevaluation.exe: Under replicated blk_-6330660892317301772_3341. Target Replicas is 3 but
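For anyone watching whether the under-replication count is actually going down, here is a minimal Python sketch that counts the "Under replicated" lines in saved fsck output. The regular expression is based only on the single line quoted in this message, so treat the exact format as an assumption; the function name `under_replicated_blocks` is made up for illustration. How you capture the text (for example by redirecting `hadoop fsck /` to a file) is left out.

```python
import re

# Pattern modeled on the one fsck line quoted in this thread, e.g.
# "/some/file: Under replicated blk_-6330660892317301772_3341. Target Replicas is 3 but ..."
# The exact output format is an assumption; adjust it to what your version prints.
UNDER_REPLICATED = re.compile(
    r"^(?P<path>.+?):\s+Under replicated (?P<block>blk_-?\d+_\d+)\."
)


def under_replicated_blocks(fsck_output):
    """Return (path, block_id) pairs, one per under-replicated block line."""
    found = []
    for line in fsck_output.splitlines():
        match = UNDER_REPLICATED.match(line.strip())
        if match:
            found.append((match.group("path"), match.group("block")))
    return found
```

Running this on two fsck dumps taken a few minutes apart and comparing the lengths of the returned lists gives a rough view of whether re-replication is making progress.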

Re: Need to re replicate

2010-01-27 Thread Raymond Jennings III
I would try running the rebalance utility. I would be curious to see what that will do and if that will fix it. --- On Wed, 1/27/10, Ananth T. Sarathy wrote: > From: Ananth T. Sarathy > Subject: Need to re replicate > To: common-user@hadoop.apache.org > Date: Wednesday, January 27, 2010, 9:28

Need to re replicate

2010-01-27 Thread Ananth T. Sarathy
One of our datanodes went bye bye. We added a bunch more data nodes, but when I do an fsck I get a report that a bunch of files are only replicated on 2 servers, which makes sense, because we had 3 and lost one. Now that we have 6 more, is there anything I need to do replicate the those files are wi

verifying that lzo compression is being used

2010-01-27 Thread Vasilis Liaskovitis
I am trying to use lzo for intermediate map compression and gzip for output compression in my hadoop-0.20.1 jobs. For lzo usage, I've compiled the .jar and JNI/native library from http://code.google.com/p/hadoop-gpl-compression/ (version 0.1.0). I am also using native lzo library v2.03. Is there an easy w

Re: When exactly is combiner invoked?

2010-01-27 Thread Le Zhao
Gang, Jeff and Amogh, Thanks for all the replies. It seems that no matter how many times combiners are invoked internally, the output for one specific map task will be *totally* partitioned and combined. Then, the data is shuffled/sent to reducers. That's good to know, because if combining isn't

Re: Installing in local Maven repository

2010-01-27 Thread Ryan Smith
SS, If you just want to use hadoop jars in your maven projects, run your own caching archive repository manager like Nexus. http://nexus.sonatype.org/ Deploy your hadoop and other 3rd party jars along with your own custom deployed jars here, then your maven projects can build using the jars deplo

Re: Installing in local Maven repository

2010-01-27 Thread Stuart Sierra
On Wed, Jan 27, 2010 at 2:43 PM, Eli Collins wrote: > ant mvn-install works for me on latest trunk. What error are you getting? Thanks. I want a released version, so I can release my own projects that depend on it. In release 0.20.1 of hadoop-core, there is no mvn-install target. Same with 0.2

Re: Installing in local Maven repository

2010-01-27 Thread Eli Collins
On Wed, Jan 27, 2010 at 8:38 AM, Stuart Sierra wrote: > Hello, > > Does anyone have up-to-date instructions for installing hadoop-core in > a local Maven repository?  The instructions at > http://wiki.apache.org/hadoop/HowToContribute do not work (the > mvn-install target is not defined). > > Than

Re: Question on GroupingComparatorClass

2010-01-27 Thread Amogh Vasekar
Hi, I think the combiner gets only the key's sort comparator, not the grouping comparator. So I believe the default grouping is used on the combiner, but the custom one on the reducer. Here's a relevant snippet of code: { super(inputCounter, conf, reporter); combinerClass = cls; keyClass = (Class)

Re: fine granularity operation on HDFS

2010-01-27 Thread Amogh Vasekar
Hi, >>now that I can get the splits of a file in hadoop, is it possible to name >>some splits (not all) as the input to mapper? I'm assuming when you say "splits of a file in hadoop" you mean splits generated from the inputformat and not the blocks stored in HDFS. The [File]InputFormat you use gi

Re: When exactly is combiner invoked?

2010-01-27 Thread Amogh Vasekar
Hi, To elaborate a little on Gang's point, the buffer threshold is set by io.sort.spill.percent; spills are created once the buffer fills past it. If the number of spills is more than min.num.spills.for.combine, the combiner gets invoked on the spills created before writing to disk. I'm not sure what exactly you
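The two settings named in this message can be put into a small Python sketch. It is a simplified model, not the actual MapTask code: it ignores how the buffer is divided between record data and record metadata, and the function names are made up for illustration. One assumption to flag: the message says "more than" min.num.spills.for.combine, while the sketch uses "at least", which is my reading of the 0.20 merge step and should be checked against your version's source.

```python
def spill_threshold_bytes(io_sort_mb, io_sort_spill_percent):
    """Approximate buffered bytes at which a spill to disk is triggered.

    Simplified: treats the whole io.sort.mb buffer as one pool.
    """
    return int(io_sort_mb * 1024 * 1024 * io_sort_spill_percent)


def combiner_runs_during_merge(num_spills, min_num_spills_for_combine,
                               combiner_configured=True):
    """Whether the combiner is applied again when spill files are merged.

    Assumption: the comparison is "at least", not "strictly more than".
    """
    return combiner_configured and num_spills >= min_num_spills_for_combine
```

With the defaults I recall for 0.20 (io.sort.mb of 100, io.sort.spill.percent of 0.80, min.num.spills.for.combine of 3), a map task would start spilling at roughly 80 MB of buffered output, and the merge-time combine would only apply once three or more spill files exist. Verify those defaults against your own mapred-default.xml.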

Re: When exactly is combiner invoked?

2010-01-27 Thread Jeff Eastman
But be careful, since combiners may execute "zero or more times" depending upon mysterious internal logic. Relying upon combiners to do significant work, as some of the Mahout clustering algorithms used to do, will bite you. Jeff Gang Luo wrote: > When the map function generate the intermediate

Re: When exactly is combiner invoked?

2010-01-27 Thread Gang Luo
When the map function generates intermediate results and first sends them to the buffer, partitioning and sorting start working and, if you specify a combiner, it will be invoked at this time. This process runs in parallel with the map function. When the map function finishes, all the spills on

Re: Failed to install Hadoop on WinXP

2010-01-27 Thread Ed Mazur
I tried running 0.20.0 on XP too a few weeks ago and got stuck at the same spot. No problems with standalone mode. Any insight would be appreciated, thanks. Ed On Wed, Jan 27, 2010 at 11:41 AM, Yura Taras wrote: > Hi all > I'm trying to deploy pseudo-distributed cluster on my devbox which > runs und

When exactly is combiner invoked?

2010-01-27 Thread Le Zhao
Hi - the combiner operates on a chunk of mapper output data, but where exactly is the chunk cut off, or when exactly will the chunk be fed to the combiner? 1. Will it be after the mapper finishes processing an input record? 2. Will it be after the mapper outputs a key value pair that hits the memor

Failed to install Hadoop on WinXP

2010-01-27 Thread Yura Taras
Hi all, I'm trying to deploy a pseudo-distributed cluster on my devbox, which runs under WinXP. I did the following steps: 1. Installed cygwin with ssh, configured ssh 2. Downloaded hadoop and extracted it, set JAVA_HOME and HADOOP_HOME env vars (I made a symlink to java home, so it doesn't contain spaces) 3

Installing in local Maven repository

2010-01-27 Thread Stuart Sierra
Hello, Does anyone have up-to-date instructions for installing hadoop-core in a local Maven repository? The instructions at http://wiki.apache.org/hadoop/HowToContribute do not work (the mvn-install target is not defined). Thanks, -SS

Re: do all mappers finish before reducer starts

2010-01-27 Thread Allen Wittenauer
This is a tunable, btw. You can set slowstart to something higher than the default 5%. For shared grids, this should likely be 50% or more. Otherwise your reduce slots may get filled by jobs that aren't using them efficiently. On 1/26/10 6:55 PM, "Eason.Lee" wrote: > No,Reduce will start a
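To make the slowstart numbers concrete, here is a small Python sketch of how the fraction translates into a count of finished map tasks. It assumes the scheduler rounds the product up, which is my recollection of the JobTracker logic rather than something stated in this thread, and I believe the property in question is mapred.reduce.slowstart.completed.maps; the function name is made up for illustration.

```python
import math


def maps_before_reduce_start(total_maps, slowstart_fraction=0.05):
    """Number of map tasks that must complete before reduces are scheduled.

    Assumption: the fraction times the map count is rounded up.
    """
    if not 0.0 <= slowstart_fraction <= 1.0:
        raise ValueError("slowstart_fraction must be between 0 and 1")
    return int(math.ceil(slowstart_fraction * total_maps))
```

For a job with 1000 maps, raising the fraction from the 5% default to 0.5 means reduce slots stay free for other jobs until about half the maps are done.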

Too many fetch-failures - reduce task problem

2010-01-27 Thread Nachiket Vaidya
Hi all, My problem is the same as http://issues.apache.org/jira/browse/HADOOP-3362, and no solution is given there :( 1. I am using hadoop 20.1. My structure is very simple. I have two machines (both are Ubuntu machines) machine1 = namenode, jobtracker and also datanode and tasktracker. (We