On Mar 18, 2013, at 6:17 PM, Bertrand Dechoux <decho...@gmail.com> wrote:

> And by "active", do you mean that it actually stops by itself?
> Otherwise it might mean that the throttling/limit is an issue with regard
> to the data volume or velocity.
> 

This "else" is probably what's happening. I just checked the logs. Its active 
almost all the time. 
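
(If the throttle is the issue: as far as I can tell, in 0.20 the balancer's
per-datanode bandwidth is capped by dfs.balance.bandwidthPerSec in
hdfs-site.xml, defaulting to 1 MB/s; the value below is illustrative, not
what we run:)

    <!-- hdfs-site.xml: raise the balancer bandwidth cap per datanode -->
    <property>
      <name>dfs.balance.bandwidthPerSec</name>
      <!-- bytes per second; default is 1048576 (1 MB/s), here 10 MB/s -->
      <value>10485760</value>
    </property>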


> What threshold is used?

I don't know what this is. How can I find out?
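
(From the balancer manual you linked, it looks like this is the balancer's
-threshold argument: how far, in percentage points, a node's utilization may
deviate from the cluster average before blocks are moved; the default is 10.
The value below is illustrative:)

    # Balance until every datanode is within 5 percentage points of the
    # cluster-average utilization (default threshold is 10):
    hadoop balancer -threshold 5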

> 
> About the small and big datanodes, how are they distributed with regard to
> racks?

We haven't considered rack awareness for our cluster. It is currently treated
as a single rack. I am going through some docs to figure out how I can
implement this after the upgrade (rough sketch below).
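
(From what I've read so far, it is a matter of pointing
topology.script.file.name in core-site.xml at a script that maps datanode
addresses to rack paths; the sketch below is illustrative, not our actual
layout:)

    <!-- core-site.xml -->
    <property>
      <name>topology.script.file.name</name>
      <value>/etc/hadoop/conf/rack-topology.sh</value>
    </property>

    #!/bin/sh
    # rack-topology.sh (illustrative): Hadoop passes one or more datanode
    # IPs/hostnames as arguments and expects one rack path per argument
    # on stdout; unknown addresses fall back to a catch-all rack.
    while [ $# -gt 0 ]; do
      case "$1" in
        10.1.1.*) echo "/dc1/rack1" ;;
        10.1.2.*) echo "/dc1/rack2" ;;
        *)        echo "/default-rack" ;;
      esac
      shift
    done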

> About files, how are the replication factor(s) and block size(s) used?

The replication factor is 2.
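
(For reference, per-file replication and block sizes can be inspected with
fsck, and replication changed with setrep; the paths below are illustrative:)

    # Show files, sizes, block lists, and per-block replication:
    hadoop fsck /some/path -files -blocks

    # Raise replication recursively on existing data:
    hadoop fs -setrep -R 3 /some/path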

> 
> Surely trivial questions again.
> 

Not really :)

Thanks
-Tapas


> Bertrand
> 
> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <tapas.sara...@gmail.com> 
> wrote:
> Hi,
> 
> Sorry about that, I meant to mention it but thought it was obvious.
> Yes, the balancer is active and running on the namenode (invocation below).
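> 
> (For the record, the usual ways to run it in 0.20, shown illustratively:)
> 
>     # Run as a daemon on the namenode; logs go under $HADOOP_LOG_DIR:
>     bin/start-balancer.sh
> 
>     # Or run it in the foreground:
>     hadoop balancer
> 
>     # Stop a running balancer:
>     bin/stop-balancer.sh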
> 
> -Tapas
> 
> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <decho...@gmail.com> wrote:
> 
>> Hi,
>> 
>> You didn't say explicitly, but did you use the balancer?
>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>> 
>> Regards
>> 
>> Bertrand
>> 
>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <tapas.sara...@gmail.com> 
>> wrote:
>> Hello,
>> 
>> We are using an old legacy version (0.20) of Hadoop for our cluster.
>> We have scheduled an upgrade to a newer version within a couple of
>> months, but I would like to understand a couple of things before moving
>> forward with the upgrade plan.
>> 
>> We have about 200 datanodes, and some of them have larger storage than
>> others. The storage per datanode varies between 12 TB and 72 TB.
>> 
>> We found that the disk-used percentage is not uniform across the
>> datanodes. On the larger-storage nodes the percentage of disk space used
>> is much lower than on the nodes with smaller storage. On the larger nodes
>> the used percentage varies, but averages about 30-50%; on the smaller
>> nodes it is as high as 99.9%. Is this expected? If so, we are not using a
>> lot of the disk space effectively. Is this addressed in a later release?
>> 
>> If not, I would like to know whether there are any checks/debugging steps
>> one can run to improve things on the current version, or whether
>> upgrading Hadoop should solve this problem. (An illustrative way to see
>> the per-node numbers is below.)
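>> 
>> (Per-datanode capacity and usage come from the dfsadmin report; the grep
>> pattern is illustrative, and the field labels may differ slightly between
>> versions:)
>> 
>>     # Per-datanode capacity and usage:
>>     hadoop dfsadmin -report | grep -E 'Name:|Configured Capacity|DFS Used%'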
>> 
>> I am happy to provide additional information if needed.
>> 
>> Thanks for any help.
>> 
>> -Tapas
>> 
> 
> 
> 
> 
> -- 
> Bertrand Dechoux