Yes

On Mar 24, 2013 9:25 PM, "Tapas Sarangi" <tapas.sara...@gmail.com> wrote:
> Thanks. Does this need a restart of hadoop in the nodes where this
> modification is made?
>
> -----
>
> On Mar 24, 2013, at 8:06 PM, Jamal B <jm151...@gmail.com> wrote:
>
> dfs.datanode.du.reserved
>
> You could tweak that param on the smaller nodes to "force" the flow of
> blocks to other nodes. A short term hack at best, but should help the
> situation a bit.
> On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <tapas.sara...@gmail.com> wrote:
>
>>
>> On Mar 24, 2013, at 4:34 PM, Jamal B <jm151...@gmail.com> wrote:
>>
>> It shouldn't cause further problems since most of your small nodes are
>> already at their capacity. You could set or increase the dfs reserved
>> property on your smaller nodes to force the flow of blocks onto the larger
>> nodes.
>>
>>
>> Thanks. Can you please specify which dfs properties we can set or modify
>> to direct the flow of blocks towards the larger nodes rather than the
>> smaller nodes?
>>
>> -----
>>
>> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <tapas.sara...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the idea, I will give this a try and report back.
>>>
>>> My worry is if we decommission a small node (one at a time), will it
>>> move the data to larger nodes or choke other smaller nodes? In principle
>>> it should distribute the blocks, the point is it is not distributing the
>>> way we expect it to, so do you think this may cause further problems?
>>>
>>> ---------
>>>
>>> On Mar 24, 2013, at 3:37 PM, Jamal B <jm151...@gmail.com> wrote:
>>>
>>> Then I think the only way around this would be to decommission the
>>> smaller nodes, 1 at a time, and ensure that the blocks are moved to the
>>> larger nodes.
>>>
>>> And once complete, bring back in the smaller nodes, but maybe only after
>>> you tweak the rack topology to match your disk layout more than network
>>> layout to compensate for the unbalanced nodes.
>>>
>>> Just my 2 cents
>>>
>>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi
>>> <tapas.sara...@gmail.com> wrote:
>>>
>>>> Thanks. We have a 1-1 configuration of drives and folders in all the
>>>> datanodes.
>>>>
>>>> -Tapas
>>>>
>>>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm151...@gmail.com> wrote:
>>>>
>>>> On both types of nodes, what is your dfs.data.dir set to? Does it
>>>> specify multiple folders on the same sets of drives, or is it 1-1 between
>>>> folder and drive? If it's set to multiple folders on the same drives, it
>>>> is probably multiplying the amount of "available capacity" incorrectly in
>>>> that it assumes a 1-1 relationship between folder and total capacity of
>>>> the drive.
>>>>
>>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi
>>>> <tapas.sara...@gmail.com> wrote:
>>>>
>>>>> Yes, thanks for pointing that out, but I already know that it is
>>>>> completing the balancing when exiting; otherwise it shouldn't exit.
>>>>> Your answer doesn't solve the problem I mentioned earlier in my
>>>>> message. 'hdfs' is stalling and hadoop is not writing unless space is
>>>>> cleared up from the cluster, even though "df" shows the cluster has
>>>>> about 500 TB of free space.
>>>>>
>>>>> -------
>>>>>
>>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்)
>>>>> <bal...@balajin.net> wrote:
>>>>>
>>>>> -setBalancerBandwidth <bandwidth in bytes per second>
>>>>>
>>>>> So the value is bytes per second. If it is running and exiting, it
>>>>> means it has completed the balancing.
>>>>>
>>>>> On 24 March 2013 11:32, Tapas Sarangi <tapas.sara...@gmail.com> wrote:
>>>>>
>>>>>> Yes, we are running balancer, though a balancer process runs for
>>>>>> almost a day or more before exiting and starting over.
>>>>>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>>>> that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable?
>>>>>> If it is in Bits then we have a problem.
>>>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>>>>
>>>>>> -----
>>>>>>
>>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்)
>>>>>> <li...@balajin.net> wrote:
>>>>>>
>>>>>> Are you running balancer? If balancer is running and if it is slow,
>>>>>> try increasing the balancer bandwidth
>>>>>>
>>>>>> On 24 March 2013 09:21, Tapas Sarangi <tapas.sara...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the follow up. I don't know whether attachment will pass
>>>>>>> through this mailing list, but I am attaching a pdf that contains the
>>>>>>> usage of all live nodes.
>>>>>>>
>>>>>>> All nodes starting with letter "g" are the ones with smaller storage
>>>>>>> space, whereas nodes starting with letter "s" have larger storage
>>>>>>> space. As you will see, most of the "gXX" nodes are completely full
>>>>>>> whereas "sXX" nodes have a lot of unused space.
>>>>>>>
>>>>>>> Recently, we have been facing crises frequently, as 'hdfs' goes into
>>>>>>> a mode where it is not able to write any further even though the total
>>>>>>> space available in the cluster is about 500 TB. We believe this has
>>>>>>> something to do with the way it is balancing the nodes, but don't
>>>>>>> understand the problem yet. Maybe the attached PDF will help some of
>>>>>>> you (experts) to see what is going wrong here...
>>>>>>>
>>>>>>> Thanks
>>>>>>> ------
>>>>>>>
>>>>>>> The balancer knows about topology, but when it calculates balancing
>>>>>>> it operates only with nodes, not with racks.
>>>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>>>> line 509.
>>>>>>>
>>>>>>> I was wrong about 350 TB and 35 TB; it calculates it this way:
>>>>>>>
>>>>>>> For example:
>>>>>>> cluster_capacity=3.5 PB
>>>>>>> cluster_dfsused=2 PB
>>>>>>>
>>>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster
>>>>>>> capacity used
>>>>>>> Then we know each node's utilization
>>>>>>> (node_dfsused/node_capacity*100). The balancer thinks all is good if
>>>>>>> avgutil+10 > node_utilization >= avgutil-10.
>>>>>>>
>>>>>>> In the ideal case every node uses avgutil of its capacity, but for a
>>>>>>> 12 TB node that is only 6.5 TB and for a 72 TB node it is about 40 TB.
>>>>>>>
>>>>>>> The balancer can't help you.
>>>>>>>
>>>>>>> Show me
>>>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>>>> you can.
>>>>>>>
>>>>>>>>
>>>>>>>> In the ideal case with replication factor 2, with two nodes of 12 TB
>>>>>>>> and 72 TB you will be able to store only 12 TB of replicated data.
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB
>>>>>>>> and 72 TB, but not true for more than two nodes in the cluster.
>>>>>>>>
>>>>>>>>
>>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a
>>>>>>>> rack must have identical capacity. Racks must have identical capacity.
>>>>>>>> For example:
>>>>>>>>
>>>>>>>> rack1: 1 node with 72 TB
>>>>>>>> rack2: 6 nodes with 12 TB
>>>>>>>> rack3: 3 nodes with 24 TB
>>>>>>>>
>>>>>>>> It helps with balancing, because a duplicated block must be on
>>>>>>>> another rack.
>>>>>>>>
>>>>>>>>
>>>>>>>> The same question I asked earlier in this message: do multiple
>>>>>>>> racks with the default threshold for the balancer minimize the
>>>>>>>> difference between racks?
>>>>>>>>
>>>>>>>> Why did you select hdfs? Maybe lustre, cephfs or something else is a
>>>>>>>> better choice.
>>>>>>>>
>>>>>>>>
>>>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>>>> to this cluster and trying to understand a few issues. I will explore
>>>>>>>> other options as you mentioned.
>>>>>>>>
>>>>>>>> --
>>>>>>>> http://balajin.net/blog
>>>>>>>> http://flic.kr/balajijegan
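
The balancer arithmetic quoted earlier in the thread (average cluster utilization, and a node counting as balanced when it lies within 10 points of that average) can be sketched in a few lines of Python. This is only an illustration of the formula as stated in the thread, not the code in Balancer.java. The function names are hypothetical, the 10-point threshold is the one quoted above, and the conversion 1 PB = 1000 TB is an assumption that does not affect the ratio.

```python
# Sketch of the balancer check as described in the thread, not Hadoop code.
# All function names here are hypothetical.

def average_utilization(cluster_dfs_used_tb, cluster_capacity_tb):
    """Cluster-wide utilization in percent (avgutil in the thread)."""
    return cluster_dfs_used_tb / cluster_capacity_tb * 100.0


def is_balanced(node_dfs_used_tb, node_capacity_tb, avg_util, threshold=10.0):
    """True if avgutil - threshold <= node utilization < avgutil + threshold."""
    node_util = node_dfs_used_tb / node_capacity_tb * 100.0
    return avg_util - threshold <= node_util < avg_util + threshold


def ideal_usage_tb(node_capacity_tb, avg_util):
    """Space a node would use if it sat exactly at the cluster average."""
    return node_capacity_tb * avg_util / 100.0


# Figures from the thread: 3.5 PB capacity, 2 PB used (assuming 1 PB = 1000 TB).
avg_util = average_utilization(2000.0, 3500.0)
print(f"avgutil = {avg_util:.2f}%")

for capacity_tb in (12.0, 72.0):
    ideal = ideal_usage_tb(capacity_tb, avg_util)
    print(f"{capacity_tb:.0f} TB node: ideal usage about {ideal:.1f} TB")

# A completely full 12 TB node is far above the band, so it is over-utilized.
print(is_balanced(12.0, 12.0, avg_util))
# A 72 TB node holding 40 TB sits inside the band.
print(is_balanced(40.0, 72.0, avg_util))
```

Under these assumptions the ideal usage comes out slightly above the rough 6.5 TB and 40 TB figures quoted in the thread. The check is purely percentage based, so a small node and a large node at the same utilization are treated the same regardless of how much absolute free space each has.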