After being stuck for a long time, the compaction did complete for the table, but the question is still the same: why does the shell sit stuck on IO for so long?
2015-08-05 12:28:50,583 [Shell.audit] INFO : root@orkash page_content> compact -w
2015-08-05 12:28:50,586 [shell.Shell] INFO : Compacting table ...
2015-08-05 12:30:51,563 [impl.ThriftTransportPool] WARN : Thread "shell" stuck on IO to orkash4:9999 (0) for at least 120031 ms
2015-08-05 13:26:24,301 [impl.ThriftTransportPool] INFO : Thread "shell" no longer stuck on IO to orkash4:9999 (0) sawError = false
2015-08-05 13:26:24,319 [shell.Shell] INFO : Compaction of table page_content completed for given range
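For what it's worth, my understanding is that the -w flag makes the shell wait until the whole major compaction finishes, so the ThriftTransportPool warning above seems to be nothing more than the shell's RPC thread blocking while it waits. A rough sketch of starting the same compaction without waiting, via the Java client API (the instance name, zookeeper host and credentials below are placeholders, not my real values):

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class CompactNoWait {
    public static void main(String[] args) throws Exception {
        // Placeholders: substitute the real instance name, zookeepers and credentials.
        ZooKeeperInstance inst = new ZooKeeperInstance("orkash-instance", "orkash1:2181");
        Connector conn = inst.getConnector("root", new PasswordToken("secret"));

        // flush=true flushes in-memory data first; wait=false returns immediately
        // instead of blocking the client thread until the major compaction finishes.
        conn.tableOperations().compact("page_content", null, null, true, false);
    }
}

If that reading is right, the shell is not actually hung; it is simply blocked until the tablet servers report the compaction complete.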

On 08/05/2015 12:20 PM, mohit.kaushik wrote:
These errors are shown in the logs of the Hadoop namenode and slaves...

*Namenode log*
2015-08-05 12:05:14,518 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 391508
2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask 192.168.10.121:50010 to replicate blk_1073780327_39560 to datanode(s) 192.168.10.122:50010
2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask 192.168.10.121:50010 to replicate blk_1073780379_39612 to datanode(s) 192.168.10.122:50010
2015-08-05 12:05:24,621 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073782847_42080 size 134217728
2015-08-05 12:05:26,665 INFO BlockStateChange: BLOCK* ask 192.168.10.121:50010 to replicate blk_1073780611_39844 to datanode(s) 192.168.10.122:50010
2015-08-05 12:05:27,232 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073793941_53178 size 134217728
2015-08-05 12:05:27,950 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073783859_43092 size 134217728
2015-08-05 12:05:28,798 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073793387_52620 size 22496
2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask 192.168.10.123:50010 to replicate blk_1073780678_39911 to datanode(s) 192.168.10.121:50010
2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask 192.168.10.121:50010 to replicate blk_1073780682_39915 to datanode(s) 192.168.10.122:50010
2015-08-05 12:05:32,002 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW], ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW], ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]} size 0
2015-08-05 12:05:32,072 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.121:50010 is added to blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW], ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW], ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]} size 0
2015-08-05 12:05:32,129 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.123:50010 is added to blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW], ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW], ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]} size 0
... and more

*Slave log* (too many)
...k_1073794728_53972 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:30,438 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:31,024 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794728_53972 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:31,027 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:31,095 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:31,105 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:31,136 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:31,136 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.


I am using locality groups, so I really do need to compact tables. Please explain how I can get rid of these suspicious blocks.
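For reference, this is roughly how I am trying to narrow down which Accumulo files those suspicious/under-replicated blocks belong to, using only the public HDFS client API (a sketch; the namenode address is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class FindUnderReplicated {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.10.124:9000");  // placeholder namenode address
        FileSystem fs = FileSystem.get(conf);

        // Walk every file under the Accumulo tables directory.
        RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path("/accumulo/tables"), true);
        while (files.hasNext()) {
            LocatedFileStatus file = files.next();
            short target = file.getReplication();
            for (BlockLocation loc : file.getBlockLocations()) {
                // A block hosted on fewer datanodes than the file's target replication
                // is the kind of block fsck reports as under-replicated.
                if (loc.getHosts().length < target) {
                    System.out.printf("%s offset=%d live=%d target=%d%n",
                            file.getPath(), loc.getOffset(), loc.getHosts().length, target);
                }
            }
        }
    }
}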

Thanks

On 08/05/2015 10:53 AM, mohit.kaushik wrote:
Yes, one of my datanodes was down because its disk was detached for some time, and the tserver on that node was lost, but it is up and running again.

fsck shows that the file system is healthy, but there are many messages reporting under-replicated blocks: my replication factor is 3, yet it says the required replication is 5.

/user/root/.Trash/Current/accumulo/tables/+r/root_tablet/delete+A0000d29.rf+F0000d28.rf: Under replicated BP-2102462487-192.168.10.124-1436956492274:blk_1073796198_55442. Target Replicas is 5 but found 3 replica(s).
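If those really are just old root-tablet files sitting in the Trash with a replication target of 5 (which a 3-datanode cluster can never satisfy), would it be reasonable to simply lower their target replication to 3? A sketch of what I mean, using the path from the fsck output above; whether this is the right fix is exactly what I would like confirmed:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LowerReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // File reported by fsck with "Target Replicas is 5 but found 3 replica(s)".
        Path p = new Path("/user/root/.Trash/Current/accumulo/tables/+r/root_tablet/delete+A0000d29.rf+F0000d28.rf");

        // Ask HDFS to track only 3 replicas for this file, matching the cluster size,
        // so it is no longer counted as under-replicated.
        boolean accepted = fs.setReplication(p, (short) 3);
        System.out.println("setReplication accepted: " + accepted);
    }
}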

Thanks & Regards
Mohit Kaushik

On 08/04/2015 09:18 PM, John Vines wrote:
It looks like an HDFS issue. Did a datanode go down? Did you turn replication down to 1? The combination of those two errors would definitely cause the problems you're seeing, as the latter disables any sort of robustness in the underlying filesystem.
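A quick way to sanity-check both of those from the client side could be a small snippet like the one below (just a sketch; it prints the default replication the client would use for new files and the replication recorded for the rfile from your tablet server log):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Path taken from the tablet server error below.
        Path rfile = new Path("/accumulo/tables/h/t-000016s/F000016t.rf");

        // Default replication used for newly created files (driven by dfs.replication).
        System.out.println("default replication = " + fs.getDefaultReplication(rfile));

        // Replication recorded in the namenode for the file that failed to read.
        System.out.println("replication of " + rfile + " = "
                + fs.getFileStatus(rfile).getReplication());
    }
}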

On Tue, Aug 4, 2015 at 8:10 AM mohit.kaushik <[email protected]> wrote:

    On 08/04/2015 05:35 PM, mohit.kaushik wrote:
    Hello All,

    I am using Apache Accumulo-1.6.3 with Apache Hadoop-2.7.0 on a 3-node cluster. When I run the compact command from the shell, it gives the following warning.

    root@orkash testScan> compact -w
    2015-08-04 17:10:52,702 [Shell.audit] INFO : root@orkash testScan> compact -w
    2015-08-04 17:10:52,706 [shell.Shell] INFO : Compacting table ...
    2015-08-04 17:12:53,986 [impl.ThriftTransportPool] WARN : Thread "shell" stuck on IO to orkash4:9999 (0) for at least 120034 ms


    The tablet servers show a problem reading a data block, which looks like HDFS-8659 <https://issues.apache.org/jira/browse/HDFS-8659>.

    2015-08-04 15:00:27,825 [hdfs.DFSClient] WARN : Failed to connect to /192.168.10.121:50010 for block, add to deadNodes and continue. java.io.IOException: Got error, status message opReadBlock BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911, for OP_READ_BLOCK, self=/192.168.10.121:38752, remote=/192.168.10.121:50010, for file /accumulo/tables/h/t-000016s/F000016t.rf, for pool BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911
    java.io.IOException: Got error, status message opReadBlock BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911, for OP_READ_BLOCK, self=/192.168.10.121:38752, remote=/192.168.10.121:50010, for file /accumulo/tables/h/t-000016s/F000016t.rf, for pool BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911
            at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
            at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
            at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
            at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:814)
            at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)
            at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:352)
            at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
            at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
            at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
            at java.io.DataInputStream.read(DataInputStream.java:149)
            at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:104)
            at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:100)
            at java.security.AccessController.doPrivileged(Native Method)
            at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:100)
            at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
            at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
            at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
            at java.io.FilterInputStream.read(FilterInputStream.java:83)
            at java.io.DataInputStream.readInt(DataInputStream.java:387)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$IndexBlock.readFields(MultiLevelIndex.java:269)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.getIndexBlock(MultiLevelIndex.java:724)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.access$100(MultiLevelIndex.java:497)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNext(MultiLevelIndex.java:587)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNextNode(MultiLevelIndex.java:593)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.getNextNode(MultiLevelIndex.java:616)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.next(MultiLevelIndex.java:659)
            at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._next(RFile.java:559)
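    To see whether this read failure also happens outside Accumulo, I am thinking of reading the same rfile straight through the HDFS client, roughly like the sketch below (the path is the one from the error above; if every replica of a block is gone, the read should fail with a similar exception, and if it succeeds the other replicas are still readable):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadRFile {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/accumulo/tables/h/t-000016s/F000016t.rf");

            long total = 0;
            byte[] buf = new byte[1 << 20];
            try (FSDataInputStream in = fs.open(p)) {
                int n;
                // Read the whole file; the DFS client will try the other replicas
                // if one datanode reports ReplicaNotFoundException.
                while ((n = in.read(buf)) > 0) {
                    total += n;
                }
            }
            System.out.println("read " + total + " bytes from " + p);
        }
    }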

    Regards
    Mohit Kaushik


    And Compaction never completes



--

*Mohit Kaushik*
Software Engineer
A Square, Plot No. 278, Udyog Vihar, Phase 2, Gurgaon 122016, India
*Tel:* +91 (124) 4969352 | *Fax:* +91 (124) 4033553

