That’s great.

Regards,
Yi Liu
From: Zesheng Wu [mailto:wuzeshen...@gmail.com]
Sent: Wednesday, September 10, 2014 8:25 PM
To: user@hadoop.apache.org
Subject: Re: HDFS: Couldn't obtain the locations of the last block

Hi Yi,

I went through HDFS-4516, and it really solves our problem. Thanks very much!

2014-09-10 16:39 GMT+08:00 Zesheng Wu <wuzeshen...@gmail.com>:

Thanks Yi, I will look into HDFS-4516.

2014-09-10 15:03 GMT+08:00 Liu, Yi A <yi.a....@intel.com>:

Hi Zesheng,

I learned from your offline email that your Hadoop version is 2.0.0-alpha, and that “the block is allocated successfully in NN, but isn’t created in DN”. Yes, 2.0.0-alpha can have this issue, and I suspect it is similar to HDFS-4516. Can you try Hadoop 2.4 or later? You should not be able to reproduce it on those versions.

From your description, the second block was allocated successfully: the NN flushed the edit-log entry to the shared journal, and the shared storage may have persisted it, but the RPC acknowledgment back to the NN timed out. So the block exists in the shared edit log, yet no DN ever created it. On restart, the client can fail because, in that Hadoop version, the client retries only when the NN reports a non-zero size for the last block, i.e. when the block was synced (see HDFS-4516 for details).

Regards,
Yi Liu

From: Zesheng Wu [mailto:wuzeshen...@gmail.com]
Sent: Tuesday, September 09, 2014 6:16 PM
To: user@hadoop.apache.org
Subject: HDFS: Couldn't obtain the locations of the last block

Hi,

These days we encountered a critical bug in HDFS that prevents HBase from starting normally. The scenario is as follows:

1. rs1 writes data to HDFS file f1, and the first block is written successfully.
2. rs1 successfully requests allocation of the second block; at this moment nn1 (the active NN) crashes because writing to the journal times out.
3. nn2 (the standby NN) cannot become active because zkfc2 is in an abnormal state.
4. nn1 is restarted and becomes active again.
5. While nn1 is restarting, rs1 crashes because it writes to a NN (nn1) that is still in safe mode.
6. As a result, file f1 is left in an abnormal state and the HBase cluster can no longer serve requests.

We can list the file with the command-line shell:

-rw------- 3 hbase_srv supergroup 134217728 2014-09-05 11:32 /hbase/lgsrv-push/xxx

But when we try to download the file from HDFS, the DFS client complains:

14/09/09 18:12:11 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 3 times
14/09/09 18:12:15 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 2 times
14/09/09 18:12:19 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 1 times
get: Could not obtain the last block locations.

Can anyone help on this?

--
Best Wishes!
Yours, Zesheng
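The behavior described in the thread, a bounded retry loop that only keeps waiting for DataNode block reports when the NameNode reports a non-zero last-block size, can be sketched in a few lines of Python. This is an illustrative sketch only, not Hadoop's actual code: `get_locations` and `last_block_size` are stand-ins for the real NameNode RPC and its reported metadata.

```python
import time

def fetch_last_block_locations(get_locations, last_block_size,
                               retries=3, delay_sec=4):
    """Fetch the last block's locations with a bounded retry loop.

    Illustrative sketch, not Hadoop's API: `get_locations` stands in
    for the NameNode RPC, and `last_block_size` is the size the NN
    reports for the last block (0 if no DataNode ever synced it).
    """
    # Per the explanation in the thread (see HDFS-4516): the client
    # keeps retrying only when the NN reports a non-zero last-block
    # size, i.e. the block was synced. A block that exists only in
    # the shared edit log is reported with size 0, so there is no
    # point waiting for DataNode block reports.
    attempts = retries if last_block_size > 0 else 1
    for remaining in range(attempts, 0, -1):
        locations = get_locations()
        if locations:  # DataNodes have reported the block
            return locations
        print("WARN hdfs.DFSClient: Last block locations not available. "
              "Datanodes might not have reported blocks completely. "
              "Will retry for %d times" % remaining)
        time.sleep(delay_sec)
    raise IOError("Could not obtain the last block locations.")
```

With a non-zero reported size, the loop prints the same countdown warnings seen in the log above before giving up; with a zero size (the broken state in this thread), it fails after a single attempt.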