Hi Ted,

Thanks for your reply. I read the first few review comments on the JIRA, but in my case "/apps/hbase/data/.hbase-snapshot/.tmp/" is already empty, so I am not really sure what I can sideline. Please let me know if I am missing something.
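For reference, this is roughly how I checked on the HDFS side, plus the ZooKeeper znodes I was planning to look at next. I am not certain the znode paths below are right for our setup (the parent defaults to /hbase but differs on some distributions), so please treat this as a sketch:

    # List the snapshot directory and its working area under the HBase root dir
    hdfs dfs -ls /apps/hbase/data/.hbase-snapshot/
    hdfs dfs -ls /apps/hbase/data/.hbase-snapshot/.tmp/

    # Check ZooKeeper for leftover snapshot coordination znodes
    # (assuming the default znode parent /hbase; adjust for your config)
    hbase zkcli ls /hbase/online-snapshot/acquired
    hbase zkcli ls /hbase/online-snapshot/reached
    hbase zkcli ls /hbase/online-snapshot/abort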
~Anil

On Sat, Feb 10, 2018 at 8:35 AM, Ted Yu <[email protected]> wrote:

> Please see the first few review comments of HBASE-16464.
>
> You can sideline the corrupt snapshots (according to the master log).
>
> You can also contact the vendor for a HOTFIX.
>
> Cheers
>
> On Sat, Feb 10, 2018 at 8:13 AM, anil gupta <[email protected]> wrote:
>
> > Hi Folks,
> >
> > We are running HBase 1.1.2. It seems like we are hitting
> > https://issues.apache.org/jira/browse/HBASE-16464 in our production
> > cluster. Our oldWALs folder has grown to 9.5 TB. I am aware that this is
> > fixed in releases after 2016, but unfortunately we need to operate this
> > production cluster for a few more months. (We are already migrating to a
> > newer version of HBase.)
> >
> > I have verified that we don't have any snapshots in this cluster. We
> > have also removed all the replication peers from that cluster. We
> > already restarted the HBase master a few days ago, but it didn't help.
> > We have TBs of oldWALs and tens of thousands of recovered-edits files
> > (assuming recovered-edits files are also cleaned up by the chore
> > cleaner). The problem seems to have started around mid-December, but we
> > didn't make any major changes to the cluster at that time.
> >
> > Is there a workaround for HBASE-16464? Are there any references left to
> > those deleted snapshots in HDFS or ZooKeeper? If yes, how can I clean
> > them up?
> >
> > I keep seeing this in the HMaster logs:
> >
> > 2018-02-07 09:10:08,514 ERROR
> > [hdpmaster6.bigdataprod1.wh.truecarcorp.com,60000,1517601353645_ChoreService_3]
> > snapshot.SnapshotHFileCleaner: Exception while checking if files were
> > valid, keeping them just in case.
> > org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info
> > from:hdfs://PRODNN/apps/hbase/data/.hbase-snapshot/.tmp/LEAD_SALES-1517979610/.snapshotinfo
> >   at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:313)
> >   at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:328)
> >   at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner$1.filesUnderSnapshot(SnapshotHFileCleaner.java:85)
> >   at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getSnapshotsInProgress(SnapshotFileCache.java:303)
> >   at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getUnreferencedFiles(SnapshotFileCache.java:194)
> >   at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:62)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:233)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:157)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
> >   at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:185)
> >   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> >   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> >   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >   at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.io.FileNotFoundException: File does not exist:
> > /apps/hbase/data/.hbase-snapshot/.tmp/LEAD_SALES-1517979610/.snapshotinfo
> >   at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
> >   at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
> >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
> >   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:652)
> >   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
> >   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> >   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> >   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> >   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
> >   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
> >   at java.security.AccessController.doPrivileged(Native Method)
> >   at javax.security.auth.Subject.doAs(Subject.java:415)
> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> >   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
> >
> >   at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
> >   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> >   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> >   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> >   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
> >   at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1242)
> >   at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1227)
> >   at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1215)
> >   at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:303)
> >   at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:269)
> >   at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:261)
> >   at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1540)
> >   at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:303)
> >   at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
> >   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> >   at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:299)
> >   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
> >   at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:306)
> >   ... 26 more
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>

--
Thanks & Regards,
Anil Gupta
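P.S. For anyone else who lands on this thread: below is a rough sketch of how to size the backlog and how a stale snapshot working directory could be sidelined the way Ted suggests, in case it is useful. The quarantine path is just an example location, not a standard one, and I am not suggesting deleting anything under oldWALs by hand:

    # Size the oldWALs backlog that the cleaner chore is not deleting
    hdfs dfs -du -s -h /apps/hbase/data/oldWALs
    hdfs dfs -count /apps/hbase/data/oldWALs

    # If a stale working directory (e.g. the LEAD_SALES-1517979610 dir named
    # in the master log) shows up under .tmp, move it aside rather than
    # deleting it outright; "/apps/hbase/sidelined-snapshots" is only an example
    hdfs dfs -mkdir -p /apps/hbase/sidelined-snapshots
    hdfs dfs -mv /apps/hbase/data/.hbase-snapshot/.tmp/LEAD_SALES-1517979610 \
        /apps/hbase/sidelined-snapshots/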
