After being stuck for a long time, the compaction did complete for the table, but the question is still the same: why does the shell sit stuck on IO for so long?
2015-08-05 12:28:50,583 [Shell.audit] INFO : root@orkash page_content> compact -w
2015-08-05 12:28:50,586 [shell.Shell] INFO : Compacting table ...
2015-08-05 12:30:51,563 [impl.ThriftTransportPool] WARN : Thread "shell" stuck on IO to orkash4:9999 (0) for at least 120031 ms
2015-08-05 13:26:24,301 [impl.ThriftTransportPool] INFO : Thread "shell" no longer stuck on IO to orkash4:9999 (0) sawError = false
2015-08-05 13:26:24,319 [shell.Shell] INFO : Compaction of table page_content completed for given range
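For what it's worth, my understanding is that the -w flag makes the shell wait until the whole major compaction finishes, so the ThriftTransportPool warning above seems to be nothing more than the shell's RPC thread blocking while it waits. A rough sketch of starting the same compaction without waiting, via the Java client API (the instance name, zookeeper host and credentials below are placeholders, not my real values):

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class CompactNoWait {
    public static void main(String[] args) throws Exception {
        // Placeholders: substitute the real instance name, zookeepers and credentials.
        ZooKeeperInstance inst = new ZooKeeperInstance("orkash-instance", "orkash1:2181");
        Connector conn = inst.getConnector("root", new PasswordToken("secret"));

        // flush=true flushes in-memory data first; wait=false returns immediately
        // instead of blocking the client thread until the major compaction finishes.
        conn.tableOperations().compact("page_content", null, null, true, false);
    }
}

If that reading is right, the shell is not actually hung; it is simply blocked until the tablet servers report the compaction complete.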

On 08/05/2015 12:20 PM, mohit.kaushik wrote:
These errors are shown in the logs of the Hadoop namenode and slaves...

*Namenode log*
2015-08-05 12:05:14,518 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 391508
2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask 192.168.10.121:50010 to replicate blk_1073780327_39560 to datanode(s) 192.168.10.122:50010
2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask 192.168.10.121:50010 to replicate blk_1073780379_39612 to datanode(s) 192.168.10.122:50010
2015-08-05 12:05:24,621 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073782847_42080 size 134217728
2015-08-05 12:05:26,665 INFO BlockStateChange: BLOCK* ask 192.168.10.121:50010 to replicate blk_1073780611_39844 to datanode(s) 192.168.10.122:50010
2015-08-05 12:05:27,232 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073793941_53178 size 134217728
2015-08-05 12:05:27,950 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073783859_43092 size 134217728
2015-08-05 12:05:28,798 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073793387_52620 size 22496
2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask 192.168.10.123:50010 to replicate blk_1073780678_39911 to datanode(s) 192.168.10.121:50010
2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask 192.168.10.121:50010 to replicate blk_1073780682_39915 to datanode(s) 192.168.10.122:50010
2015-08-05 12:05:32,002 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.122:50010 is added to blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW], ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW], ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]} size 0
2015-08-05 12:05:32,072 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.121:50010 is added to blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW], ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW], ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]} size 0
2015-08-05 12:05:32,129 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 192.168.10.123:50010 is added to blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW], ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW], ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]} size 0
... and more

*Slave log* (too many)
...k_1073794728_53972 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:30,438 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:31,024 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794728_53972 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:31,027 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:31,095 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:31,105 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:31,136 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.
2015-08-05 11:50:31,136 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning suspicious block BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is disabled.


I am using locality groups, so I really do need to compact tables. Please explain how I can get rid of these suspicious blocks.
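For reference, this is roughly how I am trying to narrow down which Accumulo files those suspicious/under-replicated blocks belong to, using only the public HDFS client API (a sketch; the namenode address is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class FindUnderReplicated {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.10.124:9000");  // placeholder namenode address
        FileSystem fs = FileSystem.get(conf);

        // Walk every file under the Accumulo tables directory.
        RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path("/accumulo/tables"), true);
        while (files.hasNext()) {
            LocatedFileStatus file = files.next();
            short target = file.getReplication();
            for (BlockLocation loc : file.getBlockLocations()) {
                // A block hosted on fewer datanodes than the file's target replication
                // is the kind of block fsck reports as under-replicated.
                if (loc.getHosts().length < target) {
                    System.out.printf("%s offset=%d live=%d target=%d%n",
                            file.getPath(), loc.getOffset(), loc.getHosts().length, target);
                }
            }
        }
    }
}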

Thanks

On 08/05/2015 10:53 AM, mohit.kaushik wrote:
Yes, one of my datanodes was down because its disk was detached for some time, and the tserver on that node was lost, but it is up and running again.

fsck shows that the file system is healthy, but there are many messages reporting under-replicated blocks: my replication factor is 3, yet it says the required replication is 5.

/user/root/.Trash/Current/accumulo/tables/+r/root_tablet/delete+A0000d29.rf+F0000d28.rf: Under replicated BP-2102462487-192.168.10.124-1436956492274:blk_1073796198_55442. Target Replicas is 5 but found 3 replica(s).
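If those really are just old root-tablet files sitting in the Trash with a replication target of 5 (which a 3-datanode cluster can never satisfy), would it be reasonable to simply lower their target replication to 3? A sketch of what I mean, using the path from the fsck output above; whether this is the right fix is exactly what I would like confirmed:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LowerReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // File reported by fsck with "Target Replicas is 5 but found 3 replica(s)".
        Path p = new Path("/user/root/.Trash/Current/accumulo/tables/+r/root_tablet/delete+A0000d29.rf+F0000d28.rf");

        // Ask HDFS to track only 3 replicas for this file, matching the cluster size,
        // so it is no longer counted as under-replicated.
        boolean accepted = fs.setReplication(p, (short) 3);
        System.out.println("setReplication accepted: " + accepted);
    }
}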

Thanks & Regards
Mohit Kaushik

On 08/04/2015 09:18 PM, John Vines wrote:
It looks like an HDFS issue. Did a datanode go down? Did you turn replication down to 1? The combination of those two errors would definitely cause the problems you're seeing, as the latter disables any sort of robustness in the underlying filesystem.
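A quick way to sanity-check both of those from the client side could be a small snippet like the one below (just a sketch; it prints the default replication the client would use for new files and the replication recorded for the rfile from your tablet server log):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Path taken from the tablet server error below.
        Path rfile = new Path("/accumulo/tables/h/t-000016s/F000016t.rf");

        // Default replication used for newly created files (driven by dfs.replication).
        System.out.println("default replication = " + fs.getDefaultReplication(rfile));

        // Replication recorded in the namenode for the file that failed to read.
        System.out.println("replication of " + rfile + " = "
                + fs.getFileStatus(rfile).getReplication());
    }
}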

On Tue, Aug 4, 2015 at 8:10 AM mohit.kaushik <[email protected]> wrote:

    On 08/04/2015 05:35 PM, mohit.kaushik wrote:
    Hello All,

    I am using Apache Accumulo-1.6.3 with Apache Hadoop-2.7.0 on a 3-node cluster. When I run the compact command from the shell, it gives the following warning.

    root@orkash testScan> compact -w
    2015-08-04 17:10:52,702 [Shell.audit] INFO : root@orkash testScan> compact -w
    2015-08-04 17:10:52,706 [shell.Shell] INFO : Compacting table ...
    2015-08-04 17:12:53,986 [impl.ThriftTransportPool] WARN : Thread "shell" stuck on IO to orkash4:9999 (0) for at least 120034 ms


    The tablet servers show a problem reading a data block, which looks like HDFS-8659 <https://issues.apache.org/jira/browse/HDFS-8659>.

    2015-08-04 15:00:27,825 [hdfs.DFSClient] WARN : Failed to connect to /192.168.10.121:50010 for block, add to deadNodes and continue. java.io.IOException: Got error, status message opReadBlock BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911, for OP_READ_BLOCK, self=/192.168.10.121:38752, remote=/192.168.10.121:50010, for file /accumulo/tables/h/t-000016s/F000016t.rf, for pool BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911
    java.io.IOException: Got error, status message opReadBlock BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911, for OP_READ_BLOCK, self=/192.168.10.121:38752, remote=/192.168.10.121:50010, for file /accumulo/tables/h/t-000016s/F000016t.rf, for pool BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911
            at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
            at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
            at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
            at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:814)
            at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)
            at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:352)
            at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
            at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
            at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
            at java.io.DataInputStream.read(DataInputStream.java:149)
            at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:104)
            at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:100)
            at java.security.AccessController.doPrivileged(Native Method)
            at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:100)
            at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
            at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
            at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
            at java.io.FilterInputStream.read(FilterInputStream.java:83)
            at java.io.DataInputStream.readInt(DataInputStream.java:387)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$IndexBlock.readFields(MultiLevelIndex.java:269)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.getIndexBlock(MultiLevelIndex.java:724)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.access$100(MultiLevelIndex.java:497)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNext(MultiLevelIndex.java:587)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNextNode(MultiLevelIndex.java:593)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.getNextNode(MultiLevelIndex.java:616)
            at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.next(MultiLevelIndex.java:659)
            at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._next(RFile.java:559)
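    To see whether this read failure also happens outside Accumulo, I am thinking of reading the same rfile straight through the HDFS client, roughly like the sketch below (the path is the one from the error above; if every replica of a block is gone, the read should fail with a similar exception, and if it succeeds the other replicas are still readable):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadRFile {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/accumulo/tables/h/t-000016s/F000016t.rf");

            long total = 0;
            byte[] buf = new byte[1 << 20];
            try (FSDataInputStream in = fs.open(p)) {
                int n;
                // Read the whole file; the DFS client will try the other replicas
                // if one datanode reports ReplicaNotFoundException.
                while ((n = in.read(buf)) > 0) {
                    total += n;
                }
            }
            System.out.println("read " + total + " bytes from " + p);
        }
    }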

    Regards
    Mohit Kaushik


    And Compaction never completes



--

*Mohit Kaushik*
Software Engineer
A Square, Plot No. 278, Udyog Vihar, Phase 2, Gurgaon 122016, India
*Tel:* +91 (124) 4969352 | *Fax:* +91 (124) 4033553

