1. It looks like it's an underlying HDFS issue. I'm not familiar with that message; maybe it's a new 2.7 thing? I'm not sure how well tested we are against Hadoop 2.7, especially with Accumulo 1.6, so that could be a factor.

2. You don't need tablets to major compact in order to use locality groups.

3. The shell was waiting for the major compaction to finish because you gave it the -w flag. If you don't want the shell to wait, just leave that flag off.
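For (2) and (3), a quick sketch in the shell of what I mean (the group and column family names here are made up; page_content is the table from your own transcript):

root@orkash page_content> setgroups group1=cf1,cf2
root@orkash page_content> getgroups
root@orkash page_content> compact

setgroups only updates the table configuration, so files written from that point on are organized into those groups, and a plain compact (no -w) just queues the rewrite of the existing files and returns right away instead of leaving the shell blocked on that IO.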
On Wed, Aug 5, 2015 at 5:28 AM mohit.kaushik <[email protected]> wrote:

> After being stuck for a long time, the compaction completed for the table, but the question is still the same: why does the shell get stuck on I/O for so long?
>
> 2015-08-05 12:28:50,583 [Shell.audit] INFO : root@orkash page_content> compact -w
> 2015-08-05 12:28:50,586 [shell.Shell] INFO : Compacting table ...
> 2015-08-05 12:30:51,563 [impl.ThriftTransportPool] WARN : Thread "shell" stuck on IO to orkash4:9999 (0) for at least 120031 ms
> 2015-08-05 13:26:24,301 [impl.ThriftTransportPool] INFO : Thread "shell" no longer stuck on IO to orkash4:9999 (0) sawError = false
> 2015-08-05 13:26:24,319 [shell.Shell] INFO : Compaction of table page_content completed for given range
>
> On 08/05/2015 12:20 PM, mohit.kaushik wrote:
>
> These errors are shown in the logs of the Hadoop namenode and slaves...
>
> Namenode log:
>
> 2015-08-05 12:05:14,518 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 391508
> 2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask 192.168.10.121:50010 to replicate blk_1073780327_39560 to datanode(s) 192.168.10.122:50010
> 2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask 192.168.10.121:50010 to replicate blk_1073780379_39612 to datanode(s) 192.168.10.122:50010
> 2015-08-05 12:05:24,621 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073782847_42080 size 134217728
> 2015-08-05 12:05:26,665 INFO BlockStateChange: BLOCK* ask 192.168.10.121:50010 to replicate blk_1073780611_39844 to datanode(s) 192.168.10.122:50010
> 2015-08-05 12:05:27,232 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073793941_53178 size 134217728
> 2015-08-05 12:05:27,950 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073783859_43092 size 134217728
> 2015-08-05 12:05:28,798 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073793387_52620 size 22496
> 2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask 192.168.10.123:50010 to replicate blk_1073780678_39911 to datanode(s) 192.168.10.121:50010
> 2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask 192.168.10.121:50010 to replicate blk_1073780682_39915 to datanode(s) 192.168.10.122:50010
> 2015-08-05 12:05:32,002 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW], ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW], ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]} size 0
> 2015-08-05 12:05:32,072 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.121:50010 is added to blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW], ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW], ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]} size 0
> 2015-08-05 12:05:32,129 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.123:50010 is added to blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW], ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW], ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]} size 0
> ... and more
>
> Slave log (too many):
>
> ...k_1073794728_53972 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
> 2015-08-05 11:50:30,438 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
> 2015-08-05 11:50:31,024 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794728_53972 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
> 2015-08-05 11:50:31,027 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
> 2015-08-05 11:50:31,095 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
> 2015-08-05 11:50:31,105 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
> 2015-08-05 11:50:31,136 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
> 2015-08-05 11:50:31,136 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
>
> I am using locality groups, so I do NEED to compact tables. Please explain how I can get rid of the suspicious blocks.
>
> Thanks
>
> On 08/05/2015 10:53 AM, mohit.kaushik wrote:
>
> Yes, one of my datanodes was down because its disk was detached for some time, and the tserver on that node was lost, but it is up and running again.
>
> fsck shows that the file system is healthy, but there are many messages reporting under-replicated blocks; my replication factor is 3, yet it says the required number is 5.
>
> /user/root/.Trash/Current/accumulo/tables/+r/root_tablet/delete+A0000d29.rf+F0000d28.rf: Under replicated BP-2102462487-192.168.10.124-1436956492274:blk_1073796198_55442. Target Replicas is 5 but found 3 replica(s).
>
> Thanks & Regards
> Mohit Kaushik
>
> On 08/04/2015 09:18 PM, John Vines wrote:
>
> It looks like an HDFS issue. Did a datanode go down? Did you turn replication down to 1? The combination of those two would definitely cause the problems you're seeing, as the latter disables any sort of robustness in the underlying filesystem.
>
> On Tue, Aug 4, 2015 at 8:10 AM mohit.kaushik <[email protected]> wrote:
>
>> On 08/04/2015 05:35 PM, mohit.kaushik wrote:
>>
>> Hello All,
>>
>> I am using Apache Accumulo 1.6.3 with Apache Hadoop 2.7.0 on a 3-node cluster. When I give the compact command from the shell, it gives the following warning:
>>
>> root@orkash testScan> compact -w
>> 2015-08-04 17:10:52,702 [Shell.audit] INFO : root@orkash testScan> compact -w
>> 2015-08-04 17:10:52,706 [shell.Shell] INFO : Compacting table ...
>> 2015-08-04 17:12:53,986 [impl.ThriftTransportPool] WARN : Thread "shell" stuck on IO to orkash4:9999 (0) for at least 120034 ms
>>
>> The tablet servers show a problem reading a data block, which looks like HDFS-8659 <https://issues.apache.org/jira/browse/HDFS-8659>:
>>
>> 2015-08-04 15:00:27,825 [hdfs.DFSClient] WARN : Failed to connect to /192.168.10.121:50010 for block, add to deadNodes and continue.
>> java.io.IOException: Got error, status message opReadBlock BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911, for OP_READ_BLOCK, self=/192.168.10.121:38752, remote=/192.168.10.121:50010, for file /accumulo/tables/h/t-000016s/F000016t.rf, for pool BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911
>> java.io.IOException: Got error, status message opReadBlock BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911, for OP_READ_BLOCK, self=/192.168.10.121:38752, remote=/192.168.10.121:50010, for file /accumulo/tables/h/t-000016s/F000016t.rf, for pool BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911
>>         at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
>>         at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
>>         at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
>>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:814)
>>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)
>>         at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:352)
>>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
>>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
>>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
>>         at java.io.DataInputStream.read(DataInputStream.java:149)
>>         at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:104)
>>         at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:100)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:100)
>>         at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
>>         at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
>>         at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$IndexBlock.readFields(MultiLevelIndex.java:269)
>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.getIndexBlock(MultiLevelIndex.java:724)
>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.access$100(MultiLevelIndex.java:497)
>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNext(MultiLevelIndex.java:587)
>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNextNode(MultiLevelIndex.java:593)
>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.getNextNode(MultiLevelIndex.java:616)
>>         at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.next(MultiLevelIndex.java:659)
>>         at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._next(RFile.java:559)
>>
>> Regards
>> Mohit Kaushik
>>
>> And the compaction never completes.
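On the HDFS side (point 1), it may also help to run fsck against the Accumulo directory itself, something like:

hdfs fsck /accumulo -files -blocks -locations

(that path assumes the default Accumulo directory in HDFS). That should show exactly which files sit on under-replicated, corrupt, or missing blocks. The "Target Replicas is 5" lines are expected, if I remember right: Accumulo's root/metadata tablet files request a replication factor of 5 by default, so fsck on a 3-node cluster will always flag them as under-replicated. Genuinely under-replicated blocks should re-replicate on their own now that the third datanode is back; it's anything fsck reports as corrupt or missing that's worth chasing down.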
