BP-13-7914115-10.122.195.197-14909166276345 is the blockpool information; blk_1073742025 is the block name.
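As an aside, the composite name Stanley describes is just `<blockpool id>:blk_<block id>`; a minimal illustrative sketch of splitting it (plain string handling, not a Hadoop API):

```python
def parse_fsck_block_name(name):
    # Split "<blockpool id>:blk_<block id>" as printed by
    # `hadoop fsck -files -blocks` into its two components.
    pool_id, _, block = name.partition(":")
    return pool_id, int(block[len("blk_"):])

pool, block_id = parse_fsck_block_name(
    "BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025")
# pool     -> "BP-13-7914115-10.122.195.197-14909166276345"
# block_id -> 1073742025
```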
These names are "private" to the HDFS system and users should not use them, right? But if you really want to know this, you can check the fsck code to see whether they are available.

On Fri, Aug 29, 2014 at 8:13 AM, Demai Ni <nid...@gmail.com> wrote:

> Stanley and all,
>
> Thanks. I will write a client application to explore this path. A quick
> question again.
> Using the fsck command, I can retrieve all the necessary info:
>
> $ hadoop fsck /tmp/list2.txt -files -blocks -racks
> .....
> BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025 len=8 repl=2
> [/default/10.122.195.198:50010, /default/10.122.195.196:50010]
>
> However, using getFileBlockLocations(), I can't get the block name/id
> info, such as BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025;
> it seems that BlockLocation doesn't expose this publicly:
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/BlockLocation.html
>
> Is there another entry point, perhaps something fsck is using? Thanks.
>
> Demai
>
> On Wed, Aug 27, 2014 at 11:09 PM, Stanley Shi <s...@pivotal.io> wrote:
>
>> As far as I know, there's no combination of Hadoop APIs that can do that.
>> You can easily get the location of the block (on which DN), but there's
>> no way to get the local address of that block file.
>>
>> On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni <nid...@gmail.com> wrote:
>>
>>> Yehia,
>>>
>>> No problem at all. I really appreciate your willingness to help. Yeah,
>>> now I am able to get such information in two steps: the first step is
>>> either hadoop fsck or getFileBlockLocations(), and then I search the
>>> local filesystem; my cluster is using the default from CDH, which is
>>> /dfs/dn.
>>>
>>> I would like to do it programmatically, so I am wondering whether someone
>>> has already done it,
>>> or, better, whether a Hadoop API call is already implemented for this
>>> exact purpose?
>>>
>>> Demai
>>>
>>> On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater <y.z.elsha...@gmail.com>
>>> wrote:
>>>
>>>> Hi Demai,
>>>>
>>>> Sorry, I missed that you had already tried this out. I think you can
>>>> construct the block location on the local file system if you have the
>>>> block pool id and the block id. If you are using the Cloudera
>>>> distribution, the default location is under /dfs/dn (the value of the
>>>> dfs.data.dir / dfs.datanode.data.dir configuration keys).
>>>>
>>>> Thanks
>>>> Yehia
>>>>
>>>> On 27 August 2014 21:20, Yehia Elshater <y.z.elsha...@gmail.com> wrote:
>>>>
>>>>> Hi Demai,
>>>>>
>>>>> You can use the fsck utility like the following:
>>>>>
>>>>> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>>>>>
>>>>> This will display all the information you need about the blocks of
>>>>> your file.
>>>>>
>>>>> Hope it helps.
>>>>> Yehia
>>>>>
>>>>> On 27 August 2014 20:18, Demai Ni <nid...@gmail.com> wrote:
>>>>>
>>>>>> Hi Stanley,
>>>>>>
>>>>>> Many thanks. Your method works. For now, I have a two-step approach:
>>>>>> 1) getFileBlockLocations() to grab the HDFS BlockLocation[]
>>>>>> 2) a local file system call (like the find command) to match the
>>>>>> block to files on the local file system.
>>>>>>
>>>>>> Maybe there is an existing Hadoop API that returns such info already?
>>>>>>
>>>>>> Demai on the run
>>>>>>
>>>>>> On Aug 26, 2014, at 9:14 PM, Stanley Shi <s...@pivotal.io> wrote:
>>>>>>
>>>>>> I am not sure this is what you want, but you can try this shell
>>>>>> command:
>>>>>>
>>>>>> find [DATANODE_DIR] -name [blockname]
>>>>>>
>>>>>> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <nid...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> I am new in this area and hoping to get a couple of pointers.
>>>>>>> I am using CentOS and have Hadoop set up using CDH 5.1 (Hadoop 2.3).
>>>>>>>
>>>>>>> I am wondering whether there is an interface to get each HDFS
>>>>>>> block's information in terms of the local file system.
>>>>>>>
>>>>>>> For example, I can use "hadoop fsck /tmp/test.txt -files -blocks
>>>>>>> -racks" to get the blockID and its replicas on the nodes, such as:
>>>>>>> repl=3 [/rack/hdfs01, /rack/hdfs02...]
>>>>>>>
>>>>>>> With such info, is there a way to
>>>>>>> 1) log in to hdfs01, and read the block directly at the local file
>>>>>>> system level?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Demai on the run
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Stanley Shi

--
Regards,
Stanley Shi
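To tie the thread together: a minimal Python sketch of Stanley's `find [DATANODE_DIR] -name [blockname]` suggestion. It assumes only that finalized blocks live somewhere under the DataNode data directory (e.g. /dfs/dn on CDH) as files named blk_<id>, each with a blk_<id>_<genstamp>.meta checksum file beside it; the exact subdirectory layout varies across Hadoop versions, which is why a recursive search is safer than trying to construct the path from the block pool id.

```python
import os

def find_block_files(data_dir, block_id):
    # Recursively search a DataNode data directory (e.g. /dfs/dn) for a
    # block's files: the block data itself (blk_<id>) and its checksum
    # file (blk_<id>_<genstamp>.meta).
    needle = "blk_%d" % block_id
    hits = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name == needle or name.startswith(needle + "_"):
                hits.append(os.path.join(root, name))
    return hits
```

This would run on the DataNode itself (hdfs01 in the example above), after obtaining the block id from fsck or a client application. Note that reading the raw block file directly bypasses HDFS checksumming and permissions, so it is a debugging aid rather than a supported access path.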