There wasn't an attachment, I pasted all the lines from all the NN logs that contain that particular block number inline.
As for CPU/IO, first there is nothing else running on those servers, second, CPU utilization on the slaves at peak load was around 40% and disk IO utilization less than 20%. That's the strange thing about it (I have another thread going about the performance), there is no bottleneck I could identify and yet the performance was relatively low, compared to the numbers I see quoted for HBase in other places. The first line of the NN log says: BLOCK* NameSystem.allocateBlock: /hbase/.logs/hadoop1-s01.farm-ny.gigya.com ,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com %3A60020.1302434963279.blk_1213779416283711358_54194 So it looks like a file name is: /hbase/.logs/hadoop1-s01.farm-ny.gigya.com ,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279 Is there a better way to associate a file with a block? -eran On Mon, Apr 11, 2011 at 21:00, Stack <st...@duboce.net> wrote: > On Sun, Apr 10, 2011 at 11:30 PM, Eran Kutner <e...@gigya.com> wrote: > > Hi St.Ack and J-D, > > Thanks for looking into this. > > > > It can definitely be a configuration problem, but I seriously doubt it > > is a network or infrastructure problem. It's our own operated > > infrastructure (not a cloud) and we have a lot of other services > > running on it without any problem. > > Services that could be stealing i/o and cpu from hbase cluster? Is > that possible? > > > Note that Hbase is complaining > > about multiple data nodes (10.1.104.1, 10.1.104.2, 10.1.104.5), I > > attached the logs from just one of them but it's more or less the same > > on all. Please see the NN log for the same block below. > > > > An attachment? Did it come through? Perhaps pastebin it and then add link > here? > > > > We are using Hadoop 0.20.2-CDH3B4 > > and Hbase Version 0.90.2-SNAPSHOT, rUnknown, Wed Mar 23 06:09:51 EDT > > 2011 (I built that from the 0.90.2 branch to try to fix a problem with > > replication) > > > > > > > > </configuration> > > > > Your configuration looks fine. > > Can you associate the block with a file? I don't see the association > in the below. I see us trying to delete the block (would like to know > why -- file deleteion?) and then it does exist on .2 for whatever > reason. > > St.Ack > > > > > This is the NN log for the same block: > > 2011-04-10 07:29:23,835 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* NameSystem.allocateBlock: > > /hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/ > hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279. > > blk_1213779416283711358_54194 > > 2011-04-10 10:12:55,749 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* blk_1213779416283711358_54194 recovery started, > > primary=10.1.104.1:50010 > > 2011-04-10 10:12:58,292 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* NameSystem.addStoredBlock: Targets updated: block > > blk_1213779416283711358_54249 on 10.1.104.1:50010 is added as a target > > for block blk_1213779416283711358_54194 with size 162696 > > 2011-04-10 10:12:58,293 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* NameSystem.addStoredBlock: Targets updated: block > > blk_1213779416283711358_54249 on 10.1.104.5:50010 is added as a target > > for block blk_1213779416283711358_54194 with size 162696 > > 2011-04-10 10:12:58,294 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* NameSystem.addStoredBlock: Targets updated: block > > blk_1213779416283711358_54249 on 10.1.104.2:50010 is added as a target > > for block blk_1213779416283711358_54194 with size 162696 > > 2011-04-10 10:12:58,295 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=54249, newlength=162696, > > newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010], > > closeFile=true, deleteBlock=false) > > 2011-04-10 10:12:58,340 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(newblock=blk_1213779416283711358_54249, > > file=/hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/ > hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279, > > newgenerationstamp=54249, newlength=162696, > > newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010]) > > successful > > 2011-04-10 10:12:58,773 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54249, > > newgenerationstamp=54250, newlength=162696, > > newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010], > > closeFile=false, deleteBlock=false) > > 2011-04-10 10:12:58,798 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(blk_1213779416283711358_54250) successful > > 2011-04-10 10:12:58,833 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.2:50010 > > is added to blk_1213779416283711358_54250 size 162696 > > 2011-04-10 10:12:58,834 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.5:50010 > > is added to blk_1213779416283711358_54250 size 162696 > > 2011-04-10 10:12:58,834 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.1:50010 > > is added to blk_1213779416283711358_54250 size 162696 > > 2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added > > to invalidSet of 10.1.104.2:50010 > > 2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added > > to invalidSet of 10.1.104.5:50010 > > 2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added > > to invalidSet of 10.1.104.1:50010 > > 2011-04-10 10:13:29,811 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* ask 10.1.104.2:50010 to delete blk_-826484977486799080_54240 > > blk_-3859382357539098123_54247 blk_6239319017569949674_54242 > > blk_2498532690928849052_54243 blk_-4133675075382826161_54234 > > blk_-6299622736350943739_54246 blk_1213779416283711358_54250 > > blk_-7407505773873780456_54244 blk_1070089292787245089_54237 > > 2011-04-10 10:13:32,813 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* ask 10.1.104.1:50010 to delete blk_-826484977486799080_54240 > > blk_-9184330013088281910_54231 blk_842002398362359267_54239 > > blk_6239319017569949674_54242 blk_-4133675075382826161_54234 > > blk_-2420408100786396293_54245 blk_1213779416283711358_54250 > > blk_1475680055217098335_54241 blk_-8623426430834046353_54236 > > blk_-3859382357539098123_54247 blk_2498532690928849052_54243 > > blk_-6299622736350943739_54246 blk_-7407505773873780456_54244 > > blk_1070089292787245089_54237 > > 2011-04-10 10:13:32,813 INFO org.apache.hadoop.hdfs.StateChange: > > BLOCK* ask 10.1.104.5:50010 to delete blk_-8623426430834046353_54236 > > blk_-3859382357539098123_54247 blk_-2420408100786396293_54245 > > blk_-4133675075382826161_54234 blk_1213779416283711358_54250 > > 2011-04-10 10:14:29,820 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:29,820 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 8 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@b8682c8) from > > 10.1.104.2:46012: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:29,835 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:29,836 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 6 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@2bf41e61) from > > 10.1.104.2:46012: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:30,862 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:30,862 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 2 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@27f688d2) from > > 10.1.104.2:46012: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:31,888 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:31,888 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 5 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@2a90a77d) from > > 10.1.104.2:46012: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:32,913 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:32,913 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 0 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@7ee4155f) from > > 10.1.104.2:46012: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:33,943 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:33,943 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 3 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@3037510e) from > > 10.1.104.2:46012: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:34,968 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:34,968 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 9 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@7faa86dc) from > > 10.1.104.1:42703: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:35,992 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:35,993 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 6 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@147c16ff) from > > 10.1.104.1:42703: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:37,005 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:37,005 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 9 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@5ec9c9da) from > > 10.1.104.1:42703: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:38,016 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:38,016 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 5 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@5c664878) from > > 10.1.104.1:42703: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:39,029 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:39,029 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 2 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@3f1fedac) from > > 10.1.104.1:42703: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:40,039 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:40,039 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 2 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@1efd5b14) from > > 10.1.104.1:42703: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:41,050 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:41,050 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 3 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@b668df1) from > > 10.1.104.5:54533: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:42,060 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:42,060 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 6 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@79f0cf0f) from > > 10.1.104.5:54533: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:43,069 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:43,069 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 0 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@30aa2054) from > > 10.1.104.5:54533: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:44,078 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:44,078 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 8 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@713083c1) from > > 10.1.104.5:54533: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:45,088 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:45,088 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 8 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@4ebf4464) from > > 10.1.104.5:54533: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > 2011-04-10 10:14:46,098 INFO > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > > deleteBlock=true) > > 2011-04-10 10:14:46,098 INFO org.apache.hadoop.ipc.Server: IPC Server > > handler 5 on 8020, call > > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@6f23299f) from > > 10.1.104.5:54533: error: java.io.IOException: Block > > (=blk_1213779416283711358_54194) not found > > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > > > > > > > -eran > > > > > > > > > > On Sun, Apr 10, 2011 at 22:27, Jean-Daniel Cryans <jdcry...@apache.org> > wrote: > >> My experience in debugging those kind of issues is that 95% of the > >> time it's a configuration issue, 4.99% of the time it's environment > >> and network issues (network splits, lost packets, etc), and the > >> remaining 0.01% is actual HDFS issues. > >> > >> The fact that you're saying that you had issues even with no load > >> makes me think it's a configuration issue. Can we see your hdfs > >> config? > >> > >> BTW the HBase log was pointing at 10.1.104.1 as the one having an > >> issue, is that the log we are looking at? (it doesn't seem so) > >> > >> Thx, > >> > >> J-D > >> > >> On Sun, Apr 10, 2011 at 12:05 PM, Eran Kutner <e...@gaigya.com> wrote: > >>> This is how it's configured in /etc/security/limits.con on all the > >>> slaves in the cluster: > >>> hadoop - nofile 32768 > >>> hdfs - nofile 32768 > >>> hbase - nofile 32768 > >>> hadoop - nproc 32000 > >>> hdfs - nproc 32000 > >>> hbase - nproc 32000 > >>> > >>> When hbase is loading it prints: > >>> ulimit -n 32768 > >>> > >>> > >>> -eran > >> > > >