On Mar 18, 2013, at 6:17 PM, Bertrand Dechoux <decho...@gmail.com> wrote:
> And by active, do you mean that it does actually stop by itself?
> Else it might mean that the throttling/limit might be an issue with regard
> to the data volume or velocity.

This "else" is probably what's happening. I just checked the logs. It's
active almost all the time.

> What threshold is used?

I don't know what that is. How can I find out?

> About the small and big datanodes, how are they distributed with regard
> to racks?

We haven't considered rack awareness for our cluster; it is currently
treated as one rack. I am going through some docs to figure out how I can
implement this after the upgrade.

> About files, how are the replication factor(s) and block size(s) used?

The replication factor is 2.

> Surely trivial questions again.

Not really :)

Thanks
-Tapas

> Bertrand
>
> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <tapas.sara...@gmail.com>
> wrote:
> Hi,
>
> Sorry about that, had it written, but thought it was obvious.
> Yes, the balancer is active and running on the namenode.
>
> -Tapas
>
> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <decho...@gmail.com> wrote:
>
>> Hi,
>>
>> It is not explicitly said, but did you use the balancer?
>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>
>> Regards
>>
>> Bertrand
>>
>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <tapas.sara...@gmail.com>
>> wrote:
>> Hello,
>>
>> I am using one of the old legacy versions (0.20) of Hadoop for our
>> cluster. We have an upgrade to a newer version scheduled within a couple
>> of months, but I would like to understand a couple of things before
>> moving ahead with the upgrade plan.
>>
>> We have about 200 datanodes and some of them have larger storage than
>> others. The storage for the datanodes varies between 12 TB and 72 TB.
>>
>> We found that the disk-used percentage is not symmetric across all the
>> datanodes. For larger storage nodes the percentage of disk space used is
>> much lower than that of other nodes with smaller storage space.
>> In larger storage nodes the percentage of used disk space varies, but on
>> average it is about 30-50%. For the smaller storage nodes this number is
>> as high as 99.9%. Is this expected? If so, then we are not using a lot of
>> the disk space effectively. Is this solved in a future release?
>>
>> If not, I would like to know if there are any checks/debug steps one can
>> do to find an improvement with the current version, or whether upgrading
>> Hadoop should solve this problem.
>>
>> I am happy to provide additional information if needed.
>>
>> Thanks for any help.
>>
>> -Tapas
>>
>
>
>
> --
> Bertrand Dechoux
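On the threshold question: the balancer's threshold is the allowed deviation, in percentage points, of each datanode's used/capacity ratio from the cluster-wide average utilization. If I remember the 0.20 defaults correctly, it defaults to 10 and can be passed explicitly as `hadoop balancer -threshold N`. A rough sketch of how nodes get classified, using made-up 72 TB and 12 TB node numbers (not figures from this thread):

```shell
# Sketch of the balancer's over/under-utilized classification.
# Capacities and usage below are hypothetical, chosen to resemble a mixed
# 72 TB / 12 TB cluster; the threshold shown is the documented default.
report=$(awk 'BEGIN {
  # pairs of: capacity_TB used_TB, one pair per datanode
  split("72 25  72 30  12 11.9  12 11.9", f)
  for (i = 1; i <= 8; i += 2) { cap += f[i]; used += f[i+1] }
  avg = 100 * used / cap                     # cluster-wide average utilization
  printf "cluster average: %.1f%%\n", avg
  threshold = 10                             # default deviation, in points
  for (i = 1; i <= 8; i += 2) {
    util = 100 * f[i+1] / f[i]
    state = (util > avg + threshold) ? "over-utilized" : (util < avg - threshold) ? "under-utilized" : "balanced"
    printf "node %d: %.1f%% (%s)\n", (i+1)/2, util, state
  }
}')
printf '%s\n' "$report"
```

The point of the sketch: small nodes at ~99% sit far above the cluster average, so the balancer should be moving blocks off them continuously. If it runs all the time and never converges, the per-datanode transfer throttle that Bertrand alluded to (`dfs.balance.bandwidthPerSec`, which I believe defaults to 1 MB/s in 0.20) may simply be too low for your write velocity.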
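And on the rack-awareness point: in 0.20 a cluster becomes rack-aware by pointing `topology.script.file.name` (in core-site.xml) at a script that Hadoop invokes with one or more datanode addresses and that prints one rack path per address. A minimal sketch; the function name, subnets, and rack names below are invented for illustration:

```shell
#!/bin/sh
# Sketch of a Hadoop topology script. Hadoop passes datanode IPs/hostnames
# as arguments and reads one rack path per line from stdout. The
# subnet-to-rack mapping here is made up; adapt it to your own layout.
map_rack() {
  for node in "$@"; do
    case "$node" in
      192.168.1.*) echo /rack1 ;;          # hypothetical subnet for rack 1
      192.168.2.*) echo /rack2 ;;          # hypothetical subnet for rack 2
      *)           echo /default-rack ;;   # fallback for unknown nodes
    esac
  done
}
map_rack "$@"
```

Until such a script is configured, HDFS treats the whole cluster as one rack (`/default-rack`), which is what you are seeing now.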