Hey,
We're trying to restore a snapshot of a relatively large table (20 TB) using
HBase 0.94.6-cdh4.5.0, and the operation keeps timing out. We increased the
timeout settings (hbase.snapshot.master.timeoutMillis,
hbase.snapshot.region.timeout, hbase.snapshot.master.timeout.millis) to 10
minutes, but we're still hitting the timeouts. Here's the error and stack
trace (table name obfuscated just because):

ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss*****-1443136710408 table=******** type=FLUSH } had an error. kiji.prod.table.site.DI-1019-1443136710408 not found in proclist []
        at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:360)
        at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:2075)
        at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
Caused by: org.apache.hadoop.hbase.errorhandling.TimeoutException via timer-java.util.Timer@8ad0d5c:org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! Source:Timeout caused Foreign Exception Start:1443136713121, End:1443137313121, diff:600000, max:600000 ms
        at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:85)
        at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:285)
        at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:350)
        ... 6 more
Caused by: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! Source:Timeout caused Foreign Exception Start:1443136713121, End:1443137313121, diff:600000, max:600000 ms
        at org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector$1.run(TimeoutExceptionInjector.java:68)
        at java.util.TimerThread.mainLoop(Timer.java:555)
        at java.util.TimerThread.run(Timer.java:505)
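As a sanity check, the Start/End values in the TimeoutException message look
like epoch milliseconds (our assumption), and their difference is exactly the
10-minute limit we configured, so the new setting does seem to have been
picked up. Here's a throwaway Java sketch of that arithmetic (not HBase code;
the class and method names are our own, hypothetical ones):

```java
import java.util.concurrent.TimeUnit;

// Throwaway sanity check, not HBase code. Assumes the Start/End values in the
// TimeoutException message are epoch milliseconds; class and method names are
// hypothetical.
public class SnapshotTimeoutWindow {
    static final long START_MS = 1443136713121L; // "Start:" from the trace
    static final long END_MS = 1443137313121L;   // "End:" from the trace

    // Milliseconds between the Start and End timestamps.
    static long elapsedMillis(long startMs, long endMs) {
        return endMs - startMs;
    }

    // The same window, truncated to whole minutes.
    static long elapsedMinutes(long startMs, long endMs) {
        return TimeUnit.MILLISECONDS.toMinutes(elapsedMillis(startMs, endMs));
    }

    public static void main(String[] args) {
        System.out.println("diff ms:  " + elapsedMillis(START_MS, END_MS));
        System.out.println("diff min: " + elapsedMinutes(START_MS, END_MS));
    }
}
```

Running it prints 600000 ms and 10 min, which matches the "max:600000 ms"
reported by the timer.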


We could increase the timeout again, but we'd like to get some feedback
before trying that. First, does the timeout necessarily mean that the
restore failed, or could it still be running asynchronously and eventually
complete? Second, what does a snapshot restore involve, and which of those
steps should inform the timeout value that would be appropriate for this
operation?

Thanks!

-- 
Alex

Reply via email to