Hi Vikas,

The size imbalance does create an imbalance in MapReduce, but with your configuration it may not be a big issue. The balancer places data blocks on nodes based on their available-space percentage, so the bigger datanodes will inevitably end up with more blocks. That means more mappers will get spawned on the bigger node; but if the total map capacity is the same for every node, then once all the smaller datanodes have finished processing their local blocks and the bigger node is busy to its map capacity, those smaller datanodes will have to pull blocks off the big one to run mappers.

I don't know if this will work well, but you can try increasing the max map capacity of the bigger datanode and reducing its max reduce capacity. Let's say your default is 8+8 (maps+reduces) for every node; you could make the big node 12+4, or even try 14+2. Let us know how that works.

-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.
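For what it's worth, Ayon's 12+4 example corresponds to the per-TaskTracker slot settings in a Hadoop 0.20/1.x-era cluster (current at the time of this thread). A minimal sketch of the change, applied to mapred-site.xml on the big datanode only, might look like this (the slot values are just the example numbers from above; the TaskTracker needs a restart to pick them up):

```xml
<!-- mapred-site.xml on the bigger datanode only: skew its slots toward maps.
     Values 12 and 4 are Ayon's example numbers, not a recommendation. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>12</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```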
________________________________
From: Vikas Srivastava <vikas.srivast...@one97.net>
To: user@hive.apache.org; Ayon Sinha <ayonsi...@yahoo.com>; sonalgoy...@gmail.com
Sent: Tuesday, September 13, 2011 11:04 PM
Subject: Re: Data migration in Hadoop

Thanks Ayon and Sonal,

One more question: does a size imbalance among the datanodes in the cluster create a problem or have any bad impact?

Going by what you are saying, my cluster would be 10 datanodes of 2 TB HDD each and 1 datanode of 8 TB HDD. Does this have any bad impact? Please suggest. All of this is configured with 16 GB RAM.

Regards
Vikas Srivastava

On Tue, Sep 13, 2011 at 11:20 PM, Ayon Sinha <ayonsi...@yahoo.com> wrote:

What you can do for each node:
>1. Decommission the node (or 2 nodes if you want to do this faster). You can do this with the excludes file.
>2. Wait for the blocks to be moved off the decommissioned node(s).
>3. Replace the disks and put the node back in service.
>4. Repeat until done.
>
>-Ayon
>See My Photos on Flickr
>Also check out my Blog for answers to commonly asked questions.
>
>________________________________
>From: Vikas Srivastava <vikas.srivast...@one97.net>
>To: user@hive.apache.org
>Sent: Tuesday, September 13, 2011 5:27 AM
>Subject: Re: Data migration in Hadoop
>
>Hey Sonal!
>
>Actually, right now we have an 11-node cluster, each node having 8 disks of 300 GB and 8 GB RAM.
>
>Now what we want to do is replace those 300 GB disks with 1 TB disks so that we can have more space per server.
>
>We have replication factor 2.
>
>My suggestion is:
>1. Add a node of 8 TB to the cluster and run the balancer to balance the load.
>2. Free any one node (the replacement node)...
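To see why the big node ends up with more blocks (and hence more locally spawned mappers), here is a small back-of-the-envelope sketch. This is not the actual HDFS balancer code, just the proportional-to-capacity placement idea Ayon describes, applied to the 10 × 2 TB + 1 × 8 TB layout from this thread; the block total is an arbitrary illustrative number:

```python
# Illustrative model only: if every node is kept at the same used
# percentage, blocks end up distributed in proportion to capacity.
# (The real HDFS balancer equalizes utilization; this is just the arithmetic.)

def blocks_per_node(capacities_tb, total_blocks):
    """Distribute total_blocks proportionally to each node's capacity."""
    total_capacity = sum(capacities_tb)
    return [total_blocks * c / total_capacity for c in capacities_tb]

# 10 datanodes with 2 TB each, plus 1 datanode with 8 TB (the planned cluster)
capacities = [2] * 10 + [8]
counts = blocks_per_node(capacities, total_blocks=28000)

small, big = counts[0], counts[-1]
print(small, big)      # 2000.0 8000.0 -- the 8 TB node holds 4x the blocks
print(big / small)     # 4.0
```

With the same 8-slot map capacity everywhere, the big node exhausts its local map slots long before its 4x share of blocks is processed, which is why the smaller nodes end up pulling blocks over the network, and why the reply above suggests giving the big node extra map slots.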
>
>Question: does a size imbalance among the datanodes in the cluster create a problem or have any bad impact?
>
>Regards
>Vikas Srivastava
>
>On Tue, Sep 13, 2011 at 5:37 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>
>Hi Vikas,
>>
>>This was discussed in the groups recently:
>>
>>http://lucene.472066.n3.nabble.com/Fixing-a-bad-HD-tt2863634.html#none
>>
>>Are you looking at replacing all your datanodes, or only a few? How big is your cluster?
>>
>>Best Regards,
>>Sonal
>>Crux: Reporting for HBase
>>Nube Technologies
>>
>>On Tue, Sep 13, 2011 at 1:52 PM, Vikas Srivastava <vikas.srivast...@one97.net> wrote:
>>
>>Hi,
>>>
>>>Can anyone tell me how we can migrate Hadoop data, or replace the old hard disks with new, bigger HDDs?
>>>
>>>Actually, I need to replace old HDDs of 300 GB with 1 TB ones, so how can I do this efficiently?
>>>
>>>The problem is migrating the data from one HDD to another.
>>>
>>>--
>>>With Regards
>>>Vikas Srivastava
>>>
>>>DWH & Analytics Team
>>>Mob: +91 9560885900
>>>One97 | Let's get talking!
>>>
>>
>
>--
>With Regards
>Vikas Srivastava
>
>DWH & Analytics Team
>Mob: +91 9560885900
>One97 | Let's get talking!
>
>

--
With Regards
Vikas Srivastava

DWH & Analytics Team
Mob: +91 9560885900
One97 | Let's get talking!