Stanley, Thanks.
Btw, I found this jira HDFS-2246, which probably matches what I am looking for.

Demai on the run

On Aug 28, 2014, at 11:34 PM, Stanley Shi <s...@pivotal.io> wrote:

> BP-13-7914115-10.122.195.197-14909166276345 is the blockpool information;
> blk_1073742025 is the block name.
>
> These names are "private" to the HDFS system and users should not use them,
> right? But if you really want to know this, you can check the fsck code to
> see whether they are available.
>
> On Fri, Aug 29, 2014 at 8:13 AM, Demai Ni <nid...@gmail.com> wrote:
>> Stanley and all,
>>
>> Thanks. I will write a client application to explore this path. A quick
>> question again.
>>
>> Using the fsck command, I can retrieve all the necessary info:
>>
>> $ hadoop fsck /tmp/list2.txt -files -blocks -racks
>> .....
>> BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025 len=8 repl=2
>> [/default/10.122.195.198:50010, /default/10.122.195.196:50010]
>>
>> However, using getFileBlockLocations(), I can't get the block name/id
>> info, such as BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025.
>> It seems BlockLocation doesn't expose that info publicly:
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/BlockLocation.html
>>
>> Is there another entry point? Something fsck is using? Thanks.
>>
>> Demai
>>
>> On Wed, Aug 27, 2014 at 11:09 PM, Stanley Shi <s...@pivotal.io> wrote:
>>> As far as I know, there's no combination of Hadoop APIs that can do
>>> that. You can easily get the location of the block (on which DN), but
>>> there's no way to get the local address of that block file.
>>>
>>> On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni <nid...@gmail.com> wrote:
>>>> Yehia,
>>>>
>>>> No problem at all. I really appreciate your willingness to help. Yeah,
>>>> now I am able to get such information through two steps, and the first
>>>> step will be either hadoop fsck or getFileBlockLocations(),
>>>> and then search the local filesystem; my cluster is using the default
>>>> from CDH, which is /dfs/dn.
>>>>
>>>> I would like to do it programmatically, so I am wondering whether
>>>> someone has already done it? Or maybe better, is there a Hadoop API
>>>> call already implemented for this exact purpose?
>>>>
>>>> Demai
>>>>
>>>> On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater <y.z.elsha...@gmail.com>
>>>> wrote:
>>>>> Hi Demai,
>>>>>
>>>>> Sorry, I missed that you had already tried this out. I think you can
>>>>> construct the block location on the local file system if you have the
>>>>> block pool id and the block id. If you are using the Cloudera
>>>>> distribution, the default location is under /dfs/dn (the value of the
>>>>> dfs.data.dir, dfs.datanode.data.dir configuration keys).
>>>>>
>>>>> Thanks
>>>>> Yehia
>>>>>
>>>>> On 27 August 2014 21:20, Yehia Elshater <y.z.elsha...@gmail.com> wrote:
>>>>>> Hi Demai,
>>>>>>
>>>>>> You can use the fsck utility like the following:
>>>>>>
>>>>>> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>>>>>>
>>>>>> This will display all the information you need about the blocks of
>>>>>> your file.
>>>>>>
>>>>>> Hope it helps.
>>>>>> Yehia
>>>>>>
>>>>>> On 27 August 2014 20:18, Demai Ni <nid...@gmail.com> wrote:
>>>>>>> Hi Stanley,
>>>>>>>
>>>>>>> Many thanks. Your method works. For now, I can have a two-step
>>>>>>> approach:
>>>>>>> 1) getFileBlockLocations() to grab the HDFS BlockLocation[]
>>>>>>> 2) use a local file system call (like the find command) to match
>>>>>>> the block to files on the local file system.
>>>>>>>
>>>>>>> Maybe there is an existing Hadoop API that returns such info already?
>>>>>>>
>>>>>>> Demai on the run
>>>>>>>
>>>>>>> On Aug 26, 2014, at 9:14 PM, Stanley Shi <s...@pivotal.io> wrote:
>>>>>>>
>>>>>>>> I am not sure this is what you want, but you can try this shell
>>>>>>>> command:
>>>>>>>>
>>>>>>>> find [DATANODE_DIR] -name [blockname]
>>>>>>>>
>>>>>>>> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <nid...@gmail.com> wrote:
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> New in this area. Hoping to get a couple of pointers.
>>>>>>>>>
>>>>>>>>> I am using CentOS and have Hadoop set up using CDH 5.1 (Hadoop 2.3).
>>>>>>>>>
>>>>>>>>> I am wondering whether there is an interface to get each HDFS
>>>>>>>>> block's information in terms of the local file system.
>>>>>>>>>
>>>>>>>>> For example, I can use "hadoop fsck /tmp/test.txt -files -blocks
>>>>>>>>> -racks" to get the blockID and its replicas on the nodes, such as:
>>>>>>>>> repl=3 [/rack/hdfs01, /rack/hdfs02...]
>>>>>>>>>
>>>>>>>>> With such info, is there a way to
>>>>>>>>> 1) log in to hdfs01, and read the block directly at the local file
>>>>>>>>> system level?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Demai on the run
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Stanley Shi
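[Editor's note] On the "is there another entry point?" question above: in Hadoop 2.x, when the underlying FileSystem is HDFS, the BlockLocation objects returned by getFileBlockLocations() are actually HdfsBlockLocation instances, which wrap a LocatedBlock carrying the block pool id and block name that fsck prints. A minimal sketch, assuming a Hadoop 2.x client classpath and an HDFS default filesystem (untested here, since it needs a live cluster):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.HdfsBlockLocation;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;

import java.util.Arrays;

public class ListBlockIds {
    public static void main(String[] args) throws Exception {
        Path path = new Path(args[0]);              // e.g. /tmp/list2.txt
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus st = fs.getFileStatus(path);
        for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
            // The cast is an assumption: it holds when fs is a
            // DistributedFileSystem, not for other FileSystem implementations.
            if (loc instanceof HdfsBlockLocation) {
                ExtendedBlock b =
                    ((HdfsBlockLocation) loc).getLocatedBlock().getBlock();
                // Prints e.g. BP-...:blk_1073742025 plus the replica hosts
                System.out.println(b.getBlockPoolId() + ":" + b.getBlockName()
                    + " hosts=" + Arrays.toString(loc.getHosts()));
            }
        }
    }
}
```

With the block pool id and block name in hand, Stanley's `find [DATANODE_DIR] -name [blockname]` step on the datanode (under /dfs/dn on a default CDH install) locates the replica file itself.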
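[Editor's note] Alternatively, the fsck route discussed above can be automated by parsing fsck's block report lines. A small self-contained sketch; the sample line is taken verbatim from the thread, while the class name, `parse` helper, and regex are illustrative assumptions about the output format shown:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FsckBlockLine {
    // Matches block lines from "hadoop fsck <path> -files -blocks -racks", e.g.
    // BP-...:blk_1073742025 len=8 repl=2 [/default/10.122.195.198:50010, ...]
    private static final Pattern BLOCK_LINE = Pattern.compile(
        "(BP-[^:]+):(blk_\\d+)\\s+len=(\\d+)\\s+repl=(\\d+)\\s+\\[(.*)\\]");

    /** Returns {blockPoolId, blockName, replica...}, or null if no match. */
    static String[] parse(String line) {
        Matcher m = BLOCK_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] replicas = m.group(5).split(",\\s*");
        String[] out = new String[2 + replicas.length];
        out[0] = m.group(1);                         // block pool id
        out[1] = m.group(2);                         // block name
        System.arraycopy(replicas, 0, out, 2, replicas.length);
        return out;
    }

    public static void main(String[] args) {
        String line = "BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025"
            + " len=8 repl=2"
            + " [/default/10.122.195.198:50010, /default/10.122.195.196:50010]";
        System.out.println(Arrays.toString(parse(line)));
    }
}
```

This only scrapes fsck's human-readable output, so it is fragile across Hadoop versions; the HdfsBlockLocation route is the more robust option where available.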