Hey Ananth -
Unfortunately, if your under-replication count isn't actively going down (at
least by one per minute; on a large cluster, several hundred per minute),
something is wrong.
Brian
On Jan 27, 2010, at 9:02 PM, Ananth T. Sarathy wrote:
ok, it probably will take some time. I will check again in the morning!
Thanks
Ananth T Sarathy
On Wed, Jan 27, 2010 at 10:00 PM, Brian Bockelman wrote:
Hey Ananth,
Replication happens automatically. If it doesn't (should start within seconds
after the node is declared dead on the web interface), something is wrong.
Check your NN logfile for error messages.
Brian
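If it helps, one quick way to scan the NameNode log is something like the following sketch. The log directory and filename pattern are assumptions for a typical tarball install; adjust them for your setup.

```shell
# Assumed log location for a tarball install; your path may differ.
# Scan the NameNode log for recent errors and replication activity:
grep -iE "error|exception|replicat" \
    "$HADOOP_HOME"/logs/hadoop-*-namenode-*.log | tail -n 50
```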
On Jan 27, 2010, at 8:56 PM, Ananth T. Sarathy wrote:
When I run it, I get:
Time Stamp    Iteration#    Bytes Already Moved    Bytes Left To Move    Bytes Being Moved
The cluster is balanced. Exiting...
but the fsck is still giving me this:
/setup_procypherevaluation.exe: Under replicated
blk_-6330660892317301772_3341. Target Replicas is 3 but
I would try running the rebalance utility. I would be curious to see what that
will do and if that will fix it.
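For reference, a sketch of invoking the balancer on 0.20.x; the threshold value here is just an example, not a recommendation.

```shell
# Run the balancer in the background; -threshold is the allowed
# deviation (percent) of each datanode's utilization from the
# cluster average.
"$HADOOP_HOME"/bin/start-balancer.sh -threshold 5

# Or run it in the foreground to watch its progress:
"$HADOOP_HOME"/bin/hadoop balancer -threshold 5
```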
--- On Wed, 1/27/10, Ananth T. Sarathy wrote:
> From: Ananth T. Sarathy
> Subject: Need to re replicate
> To: common-user@hadoop.apache.org
> Date: Wednesday, January 27, 2010, 9:28
One of our datanodes went bye-bye. We added a bunch more datanodes, but
when I do an fsck I get a report that a bunch of files are only replicated on
2 servers, which makes sense, because we had 3 and lost one. Now that we
have 6 more, is there anything I need to do to replicate those files are
wi
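In case it's useful, a sketch of how to check the situation and, if needed, re-assert the replication factor so the NameNode schedules the copies; paths and the factor of 3 are examples.

```shell
# Report which blocks are still under-replicated:
"$HADOOP_HOME"/bin/hadoop fsck / | grep -i "under replicated"

# Re-replication should happen automatically, but you can also
# explicitly re-assert the target replication factor:
"$HADOOP_HOME"/bin/hadoop fs -setrep -R 3 /
```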
I am trying to use LZO for intermediate map compression and gzip for
output compression in my hadoop-0.20.1 jobs. For LZO, I've
compiled the .jar and JNI/native library from
http://code.google.com/p/hadoop-gpl-compression/ (version 0.1.0). I am also
using the native LZO library v2.03.
Is there an easy w
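For what it's worth, the relevant 0.20 job properties can be set as command-line overrides, sketched below. This assumes the GPL LzoCodec jar and native library are on the task classpath/library path, and that the job uses ToolRunner/GenericOptionsParser; the jar, class, and path names are placeholders.

```shell
# Hypothetical job jar and class names; substitute your own.
hadoop jar myjob.jar MyJob \
    -D mapred.compress.map.output=true \
    -D mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec \
    -D mapred.output.compress=true \
    -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
    input output
```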
Gang, Jeff and Amogh,
Thanks for all the replies.
It seems that no matter how many times combiners are invoked internally, the
output of one specific map task will be *totally* partitioned and
combined. Then, the data is shuffled/sent to the reducers.
That's good to know, because if combining isn't
SS,
If you just want to use hadoop jars in your maven projects, run your own
caching archive repository manager like Nexus.
http://nexus.sonatype.org/
Deploy your hadoop and other 3rd party jars along with your own custom
deployed jars here, then your maven projects can build using the jars
deplo
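A sketch of pushing a jar into such a repository with the standard Maven deploy plugin; the URL, repository id, and file path are placeholders for your own Nexus install.

```shell
# Deploy a third-party jar to a hosted Nexus repository
# (URL and repositoryId are placeholders):
mvn deploy:deploy-file \
    -Dfile=hadoop-0.20.1-core.jar \
    -DgroupId=org.apache.hadoop \
    -DartifactId=hadoop-core \
    -Dversion=0.20.1 \
    -Dpackaging=jar \
    -Durl=http://nexus.example.com/content/repositories/thirdparty \
    -DrepositoryId=thirdparty
```

The matching `<server>` credentials for that `repositoryId` go in your `settings.xml`.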
On Wed, Jan 27, 2010 at 2:43 PM, Eli Collins wrote:
> ant mvn-install works for me on latest trunk. What error are you getting?
Thanks. I want a released version, so I can release my own projects
that depend on it.
In release 0.20.1 of hadoop-core, there is no mvn-install target.
Same with 0.2
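Since the jar ships in the release tarball, one workaround sketch is to install it into the local repository by hand; the Maven coordinates below are my guess, not official ones.

```shell
# Install the released hadoop-core jar into the local ~/.m2 repo
# (file path and coordinates are assumptions; adjust as needed):
mvn install:install-file \
    -Dfile=hadoop-0.20.1/hadoop-0.20.1-core.jar \
    -DgroupId=org.apache.hadoop \
    -DartifactId=hadoop-core \
    -Dversion=0.20.1 \
    -Dpackaging=jar
```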
On Wed, Jan 27, 2010 at 8:38 AM, Stuart Sierra
wrote:
Hi,
I think the combiner gets only the key sort comparator, not the grouping
comparator. So I believe the default grouping is used in the combiner, but the
custom one in the reducer.
Here's a relevant snippet of code:
{
super(inputCounter, conf, reporter);
combinerClass = cls;
keyClass = (Class)
Hi,
>>now that I can get the splits of a file in hadoop, is it possible to name
>>some splits (not all) as the input to mapper?
I'm assuming when you say "splits of a file in hadoop" you mean splits
generated from the inputformat and not the blocks stored in HDFS.
The [File]InputFormat you use gi
Hi,
To elaborate a little on Gang's point, the buffer threshold is controlled by
io.sort.spill.percent; when it is crossed, spills are created. If the number of
spills is more than min.num.spills.for.combine, the combiner gets invoked on the
spills created before writing to disk.
I'm not sure what exactly you
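For context, the knobs mentioned above, sketched as per-job command-line overrides; the values shown are the 0.20 defaults as far as I recall, not tuning advice, and the jar/class names are placeholders.

```shell
# Hypothetical job invocation showing the spill/combine knobs:
hadoop jar myjob.jar MyJob \
    -D io.sort.mb=100 \
    -D io.sort.spill.percent=0.80 \
    -D min.num.spills.for.combine=3 \
    input output
```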
But be careful, since combiners may execute "zero or more times"
depending upon mysterious internal logic. Relying upon combiners to do
significant work, as some of the Mahout clustering algorithms used to
do, will bite you.
Jeff
Gang Luo wrote:
When the map function generates the intermediate results and first sends them to
the buffer, the partitioning and sorting start working and, if you specify a
combiner, it will be invoked at this time. This process runs in parallel with the
map function. When the map function finishes, all the spills on
I tried running 0.20.0 on XP too a few weeks ago and got stuck at the same
spot. No problems with standalone mode. Any insight would be
appreciated, thanks.
Ed
On Wed, Jan 27, 2010 at 11:41 AM, Yura Taras wrote:
Hi - the combiner operates on a chunk of mapper output data, but what
exactly is the chunk cutoff, or when exactly will the chunk be fed to
the combiner?
1. Will it be after the mapper finishes processing an input record?
2. Will it be after the mapper outputs a key-value pair that hits the
memor
Hi all
I'm trying to deploy a pseudo-distributed cluster on my devbox, which
runs under WinXP. I did the following steps:
1. Installed cygwin with ssh, configured ssh
2. Downloaded hadoop and extracted it, set JAVA_HOME and HADOOP_HOME
env vars (I made a symlink to the java home, so it doesn't contain spaces)
3
Hello,
Does anyone have up-to-date instructions for installing hadoop-core in
a local Maven repository? The instructions at
http://wiki.apache.org/hadoop/HowToContribute do not work (the
mvn-install target is not defined).
Thanks,
-SS
This is a tunable, btw. You can set slowstart to something higher than the
default 5%. For shared grids, this should likely be 50% or more. Otherwise
your reduce slots may get filled by jobs that aren't using them efficiently.
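Concretely, the property in question on 0.20, set here per-job; the 0.50 value mirrors the 50% suggestion above, and the jar/class names are placeholders.

```shell
# Delay reduce task launch until 50% of maps have completed:
hadoop jar myjob.jar MyJob \
    -D mapred.reduce.slowstart.completed.maps=0.50 \
    input output
```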
On 1/26/10 6:55 PM, "Eason.Lee" wrote:
> No,Reduce will start a
Hi all,
My problem is the same as
http://issues.apache.org/jira/browse/HADOOP-3362 and no solution is
given there :(
1. I am using hadoop 0.20.1. My setup is very simple. I have two machines
(both are Ubuntu machines):
machine1 = namenode, jobtracker, and also datanode and tasktracker. (We