On Mar 19, 2013, at 5:00 AM, Алексей Бабутин <zorlaxpokemon...@gmail.com> wrote:
> node A=12TB
> node B=72TB
> How many A nodes and B from 200 do you have?

We have more A nodes than B. The ratio is about 80:20. Note that not all the B nodes are 72 TB; that is a maximum value. Similarly, 12 TB is a minimum for the A nodes.

> If you have more B than A you can deactivate A, clear it and apply again.

Apply what? That may not be an option for an active system, and it could cripple us for days.

> I suppose that cluster about 3-5 Tb. Run balancer with threshold 0.2 or 0.1.

If you meant 3-5 PB, then you are about right. What does this threshold do exactly? We are not setting the threshold manually, but isn't Hadoop's default 0.1?

> Different servers in one rack is bad idea. You should rebuild cluster with multiple racks.

Why is it a bad idea? We are using Hadoop as a file system, not as a scheduler. How are multiple racks going to help balance disk usage across the datanodes?

-Tapas

> 2013/3/19 Tapas Sarangi <tapas.sara...@gmail.com>
>
> Hello,
>
> I am using one of the old legacy versions (0.20) of Hadoop for our cluster. We have scheduled an upgrade to a newer version within a couple of months, but I would like to understand a couple of things before moving forward with the upgrade plan.
>
> We have about 200 datanodes, and some of them have larger storage than others. The storage per datanode varies between 12 TB and 72 TB.
>
> We found that the disk-used percentage is not uniform across the datanodes. For the larger storage nodes the percentage of disk space used is much lower than for nodes with smaller storage. On the larger nodes the used percentage varies, but averages about 30-50%. For the smaller storage nodes this number is as high as 99.9%. Is this expected? If so, we are not using a lot of the disk space effectively. Is this solved in a later release?
> If not, I would like to know whether there are any checks or debugging steps one can do to improve things with the current version, or whether upgrading Hadoop should solve this problem.
>
> I am happy to provide additional information if needed.
>
> Thanks for any help.
>
> -Tapas
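For anyone following the threshold discussion above, here is a minimal sketch of what the balancer's threshold means: the balancer tries to move blocks until every datanode's utilization is within the threshold of the cluster-wide average utilization. The node names and numbers below are hypothetical (chosen to resemble the mixed 12 TB / 72 TB cluster described), and the threshold is expressed as a fraction here for clarity; this is an illustration of the concept, not the balancer's actual code.

```python
# Sketch: which datanodes an HDFS-style balancer would flag as out of
# balance. Capacities and usage (in TB) are hypothetical; 'threshold'
# is a fraction of capacity (e.g. 0.1 = 10%).

def out_of_balance(nodes, threshold):
    """Return {name: utilization} for nodes whose utilization deviates
    from the cluster average by more than `threshold`."""
    total_used = sum(used for used, cap in nodes.values())
    total_cap = sum(cap for used, cap in nodes.values())
    avg = total_used / total_cap
    return {name: used / cap
            for name, (used, cap) in nodes.items()
            if abs(used / cap - avg) > threshold}

# Hypothetical cluster: small nodes nearly full, large nodes under-used.
nodes = {
    "small-1": (11.9, 12.0),   # 12 TB node at ~99%
    "small-2": (11.8, 12.0),
    "large-1": (28.8, 72.0),   # 72 TB node at 40%
    "large-2": (21.6, 72.0),   # 72 TB node at 30%
}
print(out_of_balance(nodes, 0.1))
```

With these numbers the cluster average is about 44% used, so both nearly-full small nodes and the 30%-used large node fall outside a 10% band, while the 40%-used large node does not; the balancer would move blocks off the small nodes toward the under-used large ones.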