-setBalancerBandwidth <bandwidth in bytes per second>

So the value is in bytes per second. If the balancer is running and then exiting, it means it has completed the balancing.
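To put the 2x10^9 figure from the thread into more familiar units, here is a small Python sketch. It only does unit arithmetic. It assumes decimal prefixes (1 GB = 10^9 bytes), and the helper name `describe` is hypothetical. According to the Hadoop default configuration docs, the setting caps the bandwidth each datanode may use for balancing, so treat it as a ceiling rather than a guaranteed rate.

```python
# Sketch: interpret the dfs.balance.bandwidthPerSec value quoted in the thread.
# Assumption: decimal prefixes (1 GB = 10**9 bytes, 1 Gbit = 10**9 bits).

BANDWIDTH_BYTES_PER_SEC = 2 * 10**9  # value from the thread (2x10^9)


def describe(bytes_per_sec: int) -> dict:
    """Express a bytes-per-second cap in a few common units (hypothetical helper)."""
    return {
        "GB/s": bytes_per_sec / 10**9,        # gigabytes per second
        "Gbit/s": bytes_per_sec * 8 / 10**9,  # gigabits per second
        "MB/s": bytes_per_sec / 10**6,        # megabytes per second
    }


as_bytes = describe(BANDWIDTH_BYTES_PER_SEC)
# If the same number were read as bits per second, divide by 8 to get bytes first.
as_bits = describe(BANDWIDTH_BYTES_PER_SEC // 8)

print(as_bytes)  # {'GB/s': 2.0, 'Gbit/s': 16.0, 'MB/s': 2000.0}
print(as_bits)   # {'GB/s': 0.25, 'Gbit/s': 2.0, 'MB/s': 250.0}
```

Read as bytes, the configured cap is 2 GB/s (16 Gbit/s) per datanode; even the bits reading would still allow 250 MB/s, so in neither case is the cap a small number.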
On 24 March 2013 11:32, Tapas Sarangi <tapas.sara...@gmail.com> wrote:
> Yes, we are running the balancer, though a balancer process runs for almost a day or more before exiting and starting over.
> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it is in bits, then we have a problem.
> What's the unit for "dfs.balance.bandwidthPerSec"?
>
> -----
>
> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
>
> Are you running the balancer? If the balancer is running and it is slow, try increasing the balancer bandwidth.
>
> On 24 March 2013 09:21, Tapas Sarangi <tapas.sara...@gmail.com> wrote:
>
>> Thanks for the follow-up. I don't know whether attachments will pass through this mailing list, but I am attaching a PDF that contains the usage of all live nodes.
>>
>> All nodes starting with the letter "g" are the ones with smaller storage space, whereas nodes starting with the letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space.
>>
>> Recently we have frequently been hitting a crisis where HDFS goes into a mode in which it cannot write any further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...
>>
>> Thanks
>> ------
>>
>> The balancer knows about topology, but when it calculates balancing it operates only on nodes, not on racks.
>> You can see how it works in Balancer.java, in the BalancerDatanode class, around line 509.
>>
>> I was wrong about the 350 TB / 35 TB figures. The balancer calculates it this way:
>>
>> For example:
>> cluster_capacity = 3.5 PB
>> cluster_dfsused = 2 PB
>>
>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
>> Then we know each node's utilization (node_dfsused / node_capacity * 100). The balancer considers everything fine if avgutil - 10 <= node_utilization < avgutil + 10.
>>
>> The ideal case is that every node uses avgutil of its capacity, but for a 12 TB node that is only about 6.9 TB, and for a 72 TB node it is about 41 TB.
>>
>> The balancer can't help you.
>>
>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>
>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will only be able to store 12 TB of replicated data.
>>>
>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>
>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack must have identical capacity, and racks must have identical capacity.
>>> For example:
>>>
>>> rack1: 1 node with 72 TB
>>> rack2: 6 nodes with 12 TB
>>> rack3: 3 nodes with 24 TB
>>>
>>> It helps with balancing, because the duplicate copy of a block must go to another rack.
>>>
>>> The same question I asked earlier in this message: do multiple racks with the default balancer threshold minimize the difference between racks?
>>>
>>> Why did you select HDFS? Maybe Lustre, CephFS or something else is a better choice.
>>>
>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.
>>>
>>> --
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>>
>
--
http://balajin.net/blog
http://flic.kr/balajijegan
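The arithmetic discussed in the thread can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the threshold rule follows the description given in the thread (avgutil plus or minus 10 percentage points), not the actual code in Balancer.java; the names `Node`, `is_balanced`, `ideal_used_tb` and `replicated_upper_bound_tb` are hypothetical; and the "g-node"/"s-node" usage figures are made up. The 3.5 PB / 2 PB cluster and the rack capacities come from the thread. The capacity bound for replication factor 2 ignores racks and block granularity.

```python
# Sketch of the balancing rule as described in this thread, plus a simple
# capacity bound for replication factor 2. Names are hypothetical; this is
# not the code in Balancer.java.
from dataclasses import dataclass

THRESHOLD = 10.0  # percentage points (the default threshold discussed above)


@dataclass
class Node:
    name: str
    capacity_tb: float
    used_tb: float

    @property
    def utilization(self) -> float:
        """Node utilization in percent: node_dfsused / node_capacity * 100."""
        return self.used_tb / self.capacity_tb * 100


def avg_utilization(cluster_used: float, cluster_capacity: float) -> float:
    """avgutil = cluster_dfsused / cluster_capacity * 100."""
    return cluster_used / cluster_capacity * 100


def is_balanced(node: Node, avgutil: float, threshold: float = THRESHOLD) -> bool:
    """True if avgutil - threshold <= node utilization < avgutil + threshold."""
    return avgutil - threshold <= node.utilization < avgutil + threshold


def ideal_used_tb(node: Node, avgutil: float) -> float:
    """Usage the node would have if it sat exactly at the cluster average."""
    return node.capacity_tb * avgutil / 100


def replicated_upper_bound_tb(capacities: list) -> float:
    """Upper bound on stored data (before replication) with replication factor 2.

    Each block needs two copies on two different nodes, so the data is at most
    half the raw capacity; the largest node can hold only one copy of each
    block, so the data is also at most the capacity of all the other nodes.
    Racks and block granularity are ignored.
    """
    total = sum(capacities)
    return min(total / 2, total - max(capacities))


avgutil = avg_utilization(cluster_used=2.0, cluster_capacity=3.5)  # PB, from the thread
print(round(avgutil, 2))  # 57.14

# Hypothetical nodes: a full 12 TB node and a 72 TB node with 40 TB used.
for node in (Node("g-node", 12, 12), Node("s-node", 72, 40)):
    print(node.name, round(node.utilization, 1), is_balanced(node, avgutil),
          round(ideal_used_tb(node, avgutil), 2))
# g-node 100.0 False 6.86
# s-node 55.6 True 41.14

print(replicated_upper_bound_tb([12, 72]))                    # 12
print(replicated_upper_bound_tb([72] + [12] * 6 + [24] * 3))  # 108.0
```

With exactly two nodes the bound collapses to the smaller node (12 TB), matching the two-node claim in the thread; with more nodes, such as the three-rack layout above (216 TB raw), the bound is set by half the total instead, which is consistent with the reply that the two-node limit does not carry over to larger clusters.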