>>>> As far as I've played with the test cluster, the balancer takes care
>>>> of replica placement. I just don't want to fall into the situation
>>>> where HDFS sits in safemode for hours and users can't use Hadoop.
Unless something has changed recently it won't automatically relocate
the blocks. When I did something similar I had a script that walked
through the whole set of mis-replicated files, increased the
replication factor, and then dropped it back down. This triggered
relocation of the blocks to match the corrected placement policy.
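The bump-and-restore trick described here can be sketched as a small dry-run script. It only prints the `hadoop fs -setrep` commands it would run (the path and the current factor are placeholder inputs; on a real cluster you would feed in the files fsck reports as mis-replicated):

```shell
# Dry-run sketch of the replication bump-and-restore trick.
# Prints the commands instead of executing them.
bump_replication() {
  local path="$1" current="$2"
  # Raise the factor by one (-w waits until the extra copy is placed)...
  echo "hadoop fs -setrep -w $((current + 1)) $path"
  # ...then restore the original factor; the namenode drops a replica,
  # leaving placement that satisfies the rack-awareness policy.
  echo "hadoop fs -setrep $current $path"
}

bump_replication /data/logs/part-00000 3
```

Remove the `echo`s (or pipe the output to `sh`) to actually run it.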
The problem I've run into more than memory is having the system CPU
time get out of control. My guess is that the threshold for what is
considered "overloaded" is going to be dependent on your system setup,
what you're running on it, and what bounds your jobs.
On Tue, Jan 17, 2012 at 22:06, Arun
> Of course, the little script only works if the replication factor is 3 on
> all the files. If it's a variable amount you should use the Java API to get
> the existing factor, increase it by one, and then decrease it.
>
> Jeff
>
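A variable-factor version can also stay in the shell rather than the Java API, assuming your Hadoop version's `hadoop fs -stat %r` prints a file's replication factor (check `hadoop fs -help stat` first). Again a dry-run sketch that prints the commands:

```shell
# Per-file version of the trick when replication factors vary.
# Reads the current factor, then prints bump-and-restore commands.
rebalance_file() {
  local path="$1"
  local current
  # Assumption: the %r format specifier prints the replication factor.
  current=$(hadoop fs -stat %r "$path")
  echo "hadoop fs -setrep -w $((current + 1)) $path"
  echo "hadoop fs -setrep $current $path"
}
```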
On Thu, Oct 20, 2011 at 8:44 AM, John Meagher wrote:
After a hardware move with an unfortunately mis-configured rack-awareness
script, our Hadoop cluster has a large number of mis-replicated blocks.
After about a week things haven't gotten better on their own.
Is there a good way to trigger the namenode to fix the mis-replicated blocks?
Here's what I'm u
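To find which files are affected in the first place, `hadoop fsck` can be filtered for them. The exact marker text fsck prints per mis-replicated file varies by version, so the pattern below is an assumption to adapt:

```shell
# Extract files flagged as mis-replicated from fsck output so they can be
# fed to a setrep-based fix. The marker string is an assumption -- check
# what your fsck version actually prints per affected file.
list_misreplicated() {
  grep 'Replica placement policy is violated' \
    | sed 's/:.*//' \
    | sort -u
}

# On a real cluster:
#   hadoop fsck / -files -blocks -locations | list_misreplicated
```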
The counter names are created dynamically in mapred.Task:
/**
 * Counters to measure the usage of the different file systems.
 * Always returns a String array with two elements: the first is the name
 * of the BYTES_READ counter and the second is the name of the
 * BYTES_WRITTEN counter.
 */
protected
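The dynamically built names follow a simple scheme-derived pattern; the SCHEME_BYTES_READ / SCHEME_BYTES_WRITTEN shape is what shows up in job output (e.g. HDFS_BYTES_READ, FILE_BYTES_WRITTEN). A sketch of the pattern only, not Hadoop's code:

```shell
# Illustration of the naming pattern, not Hadoop's implementation:
# both counter names are derived from the file-system scheme.
fs_counter_names() {
  local scheme_upper
  scheme_upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  echo "${scheme_upper}_BYTES_READ"
  echo "${scheme_upper}_BYTES_WRITTEN"
}

fs_counter_names hdfs
```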
Another case is augmenting data. This is sometimes done outside of MR
in an ETL flow, but it can be done as an MR job. Doing it this way uses
Hadoop to handle the scaling issues, but it really isn't what MR is
intended for.
A real example of this is:
* Input: standard apache weblog
* Data
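The shape of such an augmentation pass over standard apache log records can be sketched outside of MR. The lookup table and the appended field here are hypothetical, purely to show the transform; only the common-log field layout (client IP first) is assumed from the input format:

```shell
# Augment apache-log records: join each line's client IP ($1) against a
# hypothetical ip->region table and append the matched value (or "unknown").
augment_log() {
  awk '
    NR == FNR { geo[$1] = $2; next }   # first file: ip -> region table
    { print $0, (($1 in geo) ? geo[$1] : "unknown") }
  ' "$1" -
}

# Usage: augment_log geo.tsv < access.log
```

In an actual MR job the table would typically be shipped to mappers via the distributed cache rather than read from a local file.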