Hi Nick,

This is something that the client does recover from internally.  I expect the 
exception would not propagate to the caller, and therefore the reading process 
would continue successfully.  There is a potential latency hit from the extra 
NameNode RPC required for recovery.

On opening a file, the client issues an RPC to the NameNode to determine block 
locations for reading the blocks from DataNodes.  In a secured cluster, this 
RPC response also includes a block access token for each block, which the 
client needs for subsequent authorization to the DataNodes hosting replicas of 
those blocks.  In a typical configuration, this information (block locations 
and access tokens) is pre-fetched for up to 10 blocks, capped by the file size 
of course.  Block access tokens have an expiration time.  Suppose, for example, 
that the application does not read all the way through to the tenth block 
before the block access token expires.  At that point, the client recovers 
internally by issuing another RPC to the NameNode to fetch a fresh block access 
token with a new expiration, and then the read continues as normal.
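To make that recovery path concrete, here is a minimal sketch of a plain Java 
HDFS read loop (illustrative code, not something from this thread; the file 
path argument and the artificial sleep are assumptions for the example).  If a 
pre-fetched block access token expires partway through, DFSInputStream 
refetches block locations and tokens from the NameNode internally, so the loop 
below simply keeps reading and at most pays the latency of that extra RPC.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SlowSequentialRead {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        // Opening the file triggers the NameNode RPC that returns block
        // locations plus (on a secured cluster) block access tokens.
        try (FSDataInputStream in = fs.open(new Path(args[0]))) {
          int n;
          while ((n = in.read(buf)) != -1) {
            total += n;
            // Simulate an application that reads slowly enough that tokens
            // for later blocks expire before they are reached; the sleep is
            // purely illustrative.
            try {
              Thread.sleep(10);
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
              break;
            }
          }
        }
        System.out.println("Read " + total + " bytes");
      }
    }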

Does this explanation jibe with what you're seeing?  Are the reads successful, 
but experiencing undesired latency (the "response too slow" warning you 
mentioned)?  If so, then maybe this hints at a feature request for HDFS.  For 
example, maybe it would be desirable to detect "almost expired" block access 
tokens in a background thread and refetch them proactively.  That way, the 
caller wouldn't absorb the latency hit of the extra RPC as part of its read 
call.  Please feel free to file an HDFS JIRA if this makes sense, or if you 
have something else like it in mind.
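Purely to illustrate that idea (this is not existing HDFS code, and the 
interface and method names below are invented for the sketch), a background 
refresher could look roughly like this:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    class ProactiveTokenRefresher {
      // Hypothetical view of the client's cached block access tokens; neither
      // method exists in HDFS today.
      interface TokenView {
        boolean isAlmostExpired(long withinMs); // check cached expiry dates
        void refetchBlockTokens();              // extra NameNode RPC
      }

      static ScheduledExecutorService start(final TokenView tokens) {
        ScheduledExecutorService pool =
            Executors.newSingleThreadScheduledExecutor();
        // Every 30 seconds, refresh any token that will expire within the next
        // minute, so the reading thread never pays the extra RPC inside read().
        pool.scheduleAtFixedRate(new Runnable() {
          @Override
          public void run() {
            if (tokens.isAlmostExpired(TimeUnit.MINUTES.toMillis(1))) {
              tokens.refetchBlockTokens();
            }
          }
        }, 30, 30, TimeUnit.SECONDS);
        return pool;
      }
    }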

--Chris Nauroth

From: Dejan Menges <dejan.men...@gmail.com>
Date: Wednesday, January 20, 2016 at 3:26 AM
To: Nick Dimiduk <ndimi...@apache.org>, "user@hadoop.apache.org" 
<user@hadoop.apache.org>
Cc: hbase-user <u...@hbase.apache.org>
Subject: Re: HDFS short-circuit tokens expiring

Hi Nick,

I had exactly the same case, and in ours it was that the tokens were expiring 
too quickly. What we increased was 
dfs.client.read.shortcircuit.streams.cache.size and 
dfs.client.read.shortcircuit.streams.cache.expiry.ms.
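For reference, a minimal sketch of bumping those two settings (the values here 
are placeholders rather than tested recommendations, and in practice they would 
normally be raised in the region servers' hdfs-site.xml rather than in code):

    import org.apache.hadoop.conf.Configuration;

    public class ShortCircuitCacheTuning {
      public static Configuration tuned() {
        Configuration conf = new Configuration();
        // How many short-circuit stream descriptors the client keeps cached.
        conf.setInt("dfs.client.read.shortcircuit.streams.cache.size", 1000);
        // How long an unused cached short-circuit stream is kept before it
        // expires, in milliseconds.
        conf.setLong("dfs.client.read.shortcircuit.streams.cache.expiry.ms",
            5L * 60 * 1000);
        return conf;
      }
    }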

Hope this helps.

Best,
Dejan

On Wed, Jan 20, 2016 at 12:15 AM Nick Dimiduk <ndimi...@apache.org> wrote:
Hi folks,

This looks like it sits in the intersection between HDFS and HBase. My region 
server logs are flooded with messages like "SecretManager$InvalidToken: access 
control error while attempting to set up short-circuit access to 
<hdfs-file-path> ... is expired" [0].

These logs correspond with responseTooSlow WARNings from the region server.

Maybe I have misconfigured short-circuit reads? Such an expiration seems like 
something the client, or the client's consumer, should handle by re-negotiating 
the token.

Thanks a lot,
-n

[0]

2016-01-19 22:10:14,432 INFO  [B.defaultRpcServer.handler=4,queue=1,port=16020] 
shortcircuit.ShortCircuitCache: ShortCircuitCache(0x71bdc547): could not load 
1074037633_BP-1145309065-XXX-1448053136416 due to InvalidToken exception.
org.apache.hadoop.security.token.SecretManager$InvalidToken: access control 
error while attempting to set up short-circuit access to <hfile path> token 
with block_token_identifier (expiryDate=1453194430724, keyId=1508822027, 
userId=hbase, blockPoolId=BP-1145309065-XXX-1448053136416, blockId=1074037633, 
access modes=[READ]) is expired.
at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591)
at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:678)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1372)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:614)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:267)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:181)
at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:256)
at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:817)
at org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:792)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:621)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5486)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5637)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5424)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5410)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver$1.next(GroupedAggregateRegionObserver.java:510)
at org.apache.phoenix.coprocessor.BaseRegionScanner.next(BaseRegionScanner.java:40)
at org.apache.phoenix.coprocessor.BaseRegionScanner.nextRaw(BaseRegionScanner.java:60)
at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2395)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
