Stanley, 

Thanks. 

Btw, I found this JIRA, HDFS-2246, which probably matches what I am looking for.

Demai on the run

On Aug 28, 2014, at 11:34 PM, Stanley Shi <s...@pivotal.io> wrote:

> BP-13-7914115-10.122.195.197-14909166276345 is the block pool ID;
> blk_1073742025 is the block name;
> 
> these names are "private" to the HDFS system and users should not use them,
> right?
> But if you really want to know this, you can check the fsck code to see
> whether they are available;
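> 
> For reference, a rough sketch like the one below (untested, and it relies on
> HDFS-internal classes such as DFSClient and LocatedBlock, which are not a
> stable public API and may change between releases) should print roughly the
> same block pool id / block name / replica info that fsck shows:
> 
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hdfs.DistributedFileSystem;
> import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
> import org.apache.hadoop.hdfs.protocol.LocatedBlock;
> import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
> 
> public class ListBlockNames {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     Path file = new Path("/tmp/list2.txt"); // the file from your fsck example
>     // the cast only works when fs.defaultFS points at HDFS
>     DistributedFileSystem dfs = (DistributedFileSystem) file.getFileSystem(conf);
>     FileStatus stat = dfs.getFileStatus(file);
>     LocatedBlocks blocks =
>         dfs.getClient().getLocatedBlocks(file.toUri().getPath(), 0, stat.getLen());
>     for (LocatedBlock lb : blocks.getLocatedBlocks()) {
>       // e.g. BP-...:blk_1073742025 followed by the replica hosts
>       StringBuilder line = new StringBuilder();
>       line.append(lb.getBlock().getBlockPoolId())
>           .append(":")
>           .append(lb.getBlock().getBlockName());
>       for (DatanodeInfo dn : lb.getLocations()) {
>         line.append(" ").append(dn.getHostName());
>       }
>       System.out.println(line);
>     }
>   }
> }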
> 
> 
> On Fri, Aug 29, 2014 at 8:13 AM, Demai Ni <nid...@gmail.com> wrote:
>> Stanley and all,
>> 
>> Thanks. I will write a client application to explore this path. A quick
>> question again:
>> Using the fsck command, I can retrieve all the necessary info
>> $ hadoop fsck /tmp/list2.txt -files -blocks -racks
>> .....
>>  BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025 len=8 repl=2
>> [/default/10.122.195.198:50010, /default/10.122.195.196:50010]
>> 
>> However, using getFileBlockLocations(), I can't get the block name/id info, 
>> such as  BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025
>> It seems BlockLocation doesn't expose this info publicly:
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/BlockLocation.html
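>> 
>> For reference, this is roughly the public-API call I am using (a minimal,
>> untested sketch); it only gives me offsets, lengths and hosts, not the block
>> pool or block name:
>> 
>> import java.util.Arrays;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.BlockLocation;
>> import org.apache.hadoop.fs.FileStatus;
>> import org.apache.hadoop.fs.FileSystem;
>> import org.apache.hadoop.fs.Path;
>> 
>> public class ShowBlockLocations {
>>   public static void main(String[] args) throws Exception {
>>     Configuration conf = new Configuration();
>>     FileSystem fs = FileSystem.get(conf);
>>     FileStatus stat = fs.getFileStatus(new Path("/tmp/list2.txt"));
>>     for (BlockLocation loc : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
>>       // per block: offset within the file, length, and the datanode hosts
>>       System.out.println("offset=" + loc.getOffset()
>>           + " len=" + loc.getLength()
>>           + " hosts=" + Arrays.toString(loc.getHosts()));
>>     }
>>   }
>> }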
>> 
>> Is there another entry point, perhaps something fsck is using? Thanks.
>> 
>> Demai
>> 
>> 
>> 
>> 
>> On Wed, Aug 27, 2014 at 11:09 PM, Stanley Shi <s...@pivotal.io> wrote:
>>> As far as I know, there's no combination of Hadoop APIs that can do that.
>>> You can easily get the location of the block (on which DN), but there's no
>>> way to get the local path of that block file.
>>> 
>>> 
>>> 
>>> On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni <nid...@gmail.com> wrote:
>>>> Yehia,
>>>> 
>>>> No problem at all. I really appreciate your willingness to help. Yeah, now
>>>> I am able to get such information in two steps: the first step is either
>>>> hadoop fsck or getFileBlockLocations(), and then I search the local
>>>> filesystem; my cluster is using the CDH default, which is /dfs/dn.
>>>> 
>>>> I would like to do it programmatically, so I am wondering whether someone
>>>> has already done it, or, better yet, whether there is a Hadoop API call
>>>> already implemented for this exact purpose.
>>>> 
>>>> Demai
>>>> 
>>>> 
>>>> On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater <y.z.elsha...@gmail.com> 
>>>> wrote:
>>>>> Hi Demai,
>>>>> 
>>>>> Sorry, I missed that you already tried this out. I think you can
>>>>> construct the block location on the local file system if you have the
>>>>> block pool id and the block id. If you are using the Cloudera
>>>>> distribution, the default location is under /dfs/dn (the value of the
>>>>> dfs.data.dir / dfs.datanode.data.dir configuration keys).
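>>>>> 
>>>>> If you want to do that lookup programmatically rather than with "find", a
>>>>> minimal sketch like this should work (untested; it assumes the CDH default
>>>>> data dir /dfs/dn and simply walks the tree for the block file, since the
>>>>> exact subdirectory layout under current/<block pool id>/current/finalized
>>>>> can vary):
>>>>> 
>>>>> import java.io.IOException;
>>>>> import java.nio.file.FileVisitResult;
>>>>> import java.nio.file.Files;
>>>>> import java.nio.file.Path;
>>>>> import java.nio.file.Paths;
>>>>> import java.nio.file.SimpleFileVisitor;
>>>>> import java.nio.file.attribute.BasicFileAttributes;
>>>>> 
>>>>> public class FindBlockFile {
>>>>>   public static void main(String[] args) throws IOException {
>>>>>     // args: [0] datanode data dir, [1] block name such as blk_1073742025
>>>>>     Path dataDir = Paths.get(args.length > 0 ? args[0] : "/dfs/dn");
>>>>>     final String blockName = args.length > 1 ? args[1] : "blk_1073742025";
>>>>>     Files.walkFileTree(dataDir, new SimpleFileVisitor<Path>() {
>>>>>       @Override
>>>>>       public FileVisitResult visitFile(Path f, BasicFileAttributes attrs) {
>>>>>         if (f.getFileName().toString().equals(blockName)) {
>>>>>           System.out.println(f.toAbsolutePath()); // local path of the block
>>>>>         }
>>>>>         return FileVisitResult.CONTINUE;
>>>>>       }
>>>>>     });
>>>>>   }
>>>>> }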
>>>>> 
>>>>> Thanks
>>>>> Yehia 
>>>>> 
>>>>> 
>>>>> On 27 August 2014 21:20, Yehia Elshater <y.z.elsha...@gmail.com> wrote:
>>>>>> Hi Demai,
>>>>>> 
>>>>>> You can use the fsck utility like the following:
>>>>>> 
>>>>>> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>>>>>> 
>>>>>> This will display all the information you need about the blocks of your 
>>>>>> file.
>>>>>> 
>>>>>> Hope it helps.
>>>>>> Yehia
>>>>>> 
>>>>>> 
>>>>>> On 27 August 2014 20:18, Demai Ni <nid...@gmail.com> wrote:
>>>>>>> Hi, Stanley,
>>>>>>> 
>>>>>>> Many thanks. Your method works. For now, I can use a two-step approach:
>>>>>>> 1) getFileBlockLocations() to grab the HDFS BlockLocation[]
>>>>>>> 2) use a local file system call (like the find command) to match the
>>>>>>> block to files on the local file system.
>>>>>>> 
>>>>>>> Maybe there is an existing Hadoop API that already returns such info?
>>>>>>> 
>>>>>>> Demai on the run
>>>>>>> 
>>>>>>> On Aug 26, 2014, at 9:14 PM, Stanley Shi <s...@pivotal.io> wrote:
>>>>>>> 
>>>>>>>> I am not sure this is what you want, but you can try this shell command:
>>>>>>>> 
>>>>>>>> find [DATANODE_DIR] -name [blockname]
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <nid...@gmail.com> wrote:
>>>>>>>>> Hi, folks,
>>>>>>>>> 
>>>>>>>>> I am new to this area and hoping to get a couple of pointers.
>>>>>>>>> 
>>>>>>>>> I am using CentOS and have Hadoop set up using CDH 5.1 (Hadoop 2.3).
>>>>>>>>> 
>>>>>>>>> I am wondering whether there is an interface to get each HDFS block's
>>>>>>>>> information in terms of the local file system.
>>>>>>>>> 
>>>>>>>>> For example, I can use "hadoop fsck /tmp/test.txt -files -blocks
>>>>>>>>> -racks" to get the block ID and its replicas on the nodes, such as:
>>>>>>>>> repl=3 [/rack/hdfs01, /rack/hdfs02...]
>>>>>>>>> 
>>>>>>>>> With such info, is there a way to
>>>>>>>>> 1) log in to hdfs01 and read the block directly at the local file
>>>>>>>>> system level?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> Demai on the run
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> Regards,
>>>>>>>> Stanley Shi,
>>>>>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Regards,
>>> Stanley Shi,
>>> 
> 
> 
> 
> -- 
> Regards,
> Stanley Shi,
> 
