Re: HDFS interfaces

2013-06-04 Thread Mahmood Naderan
There are many instances of getFileBlockLocations in hadoop/fs. Can you explain which one is the main one? "It must be combined with a method of logically splitting the input data along block boundaries, and of launching tasks on worker nodes that are close to the data splits." Is this a user-level
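
The user-level entry point is FileSystem.getFileBlockLocations. A minimal sketch, assuming a reachable cluster; the path below is a placeholder:

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockLocations {
      public static void main(String[] args) throws Exception {
        // Resolves to DistributedFileSystem when the default FS points at HDFS.
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus st = fs.getFileStatus(new Path("/user/mahmood/mytext.txt")); // placeholder path
        // One BlockLocation per block; each lists the hosts holding a replica.
        for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
          System.out.println("offset " + loc.getOffset() + " -> " + Arrays.toString(loc.getHosts()));
        }
      }
    }

The splitting and locality logic the quoted javadoc mentions lives in the framework (e.g. FileInputFormat.getSplits and the task scheduler), not in this call itself.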

Re: how to locate the replicas of a file in HDFS?

2013-06-04 Thread Mahmood Naderan
hadoop fsck mytext.txt -files -locations -blocks. I expect something like a tag attached to each block (say block X) that shows the position of the replicated block of X. The method you mentioned is a user-level task. Am I right? Regards, Mahmood

HDFS edit log NPE

2013-06-04 Thread Robert Dyer
I recently upgraded from 1.0.4 to 1.1.2. Now, however, my HDFS won't start up. There appears to be something wrong in the edits file. Obviously I can roll back to a previous checkpoint; however, it appears checkpointing has been failing for some time, and my last checkpoint is over a month old.

Re: HDFS interfaces

2013-06-04 Thread Jay Vyas
Looking in the source, it appears that in HDFS the Namenode supports getting this info directly via the client, and ultimately communicates block locations to the DFSClient, which is used by the DistributedFileSystem. /** * @see ClientProtocol#getBlockLocations(String, long, long) */

Re: MapReduce on Local FileSystem

2013-06-04 Thread Kun Ling
Hi Agarwal, I once had similar questions and have done some experiments. Here is my experience: 1. For some applications over MR, like HBase and Hive, which do not need to submit additional files to HDFS, file:/// could work well without any problem (according to my tests). 2. For simple MR
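
A sketch of the configuration this refers to, assuming Hadoop 1.x property names; the values go in core-site.xml and mapred-site.xml:

    <!-- core-site.xml: make the local filesystem the default instead of HDFS -->
    <property>
      <name>fs.default.name</name>
      <value>file:///</value>
    </property>

    <!-- mapred-site.xml: "local" runs the whole job in one JVM;
         point this at a JobTracker host:port to stay distributed -->
    <property>
      <name>mapred.job.tracker</name>
      <value>local</value>
    </property>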

Re: How to start developing!

2013-06-04 Thread Lokesh Basu
Thanks a lot John *Lokesh Chandra Basu* B. Tech Computer Science and Engineering Indian Institute of Technology, Roorkee India (GMT +5hr 30min) On Mon, Jun 3, 2013 at 10:29 PM, John Lilley john.lil...@redpoint.net wrote: I had asked a similar question recently: First, follow

Re: how to locate the replicas of a file in HDFS?

2013-06-04 Thread Azuryy Yu
ClientProtocol namenode = DFSClient.createNamenode(conf);
HdfsFileStatus hfs = namenode.getFileInfo(your_hdfs_file_name);
LocatedBlocks lbs = namenode.getBlockLocations(your_hdfs_file_name, 0, hfs.getLen());
for (LocatedBlock lb : lbs.getLocatedBlocks()) {
  DatanodeInfo[] info =
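
A sketch completing the truncated loop, against the same (private, 1.x-era) APIs; getLocations() returns the datanodes holding each replica of the block:

    for (LocatedBlock lb : lbs.getLocatedBlocks()) {
      DatanodeInfo[] info = lb.getLocations();
      StringBuilder sb = new StringBuilder(lb.getBlock().getBlockName() + " ->");
      for (DatanodeInfo dn : info) {
        sb.append(' ').append(dn.getName()); // host:port of a datanode holding a replica
      }
      System.out.println(sb);
    }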

RE: how to locate the replicas of a file in HDFS?

2013-06-04 Thread zangxiangyu
hadoop fsck <path> -files -blocks -locations -racks. Replace <path> with the real path. :) From: 一凡 李 [mailto:zhuazhua_...@yahoo.com.cn] Sent: Tuesday, June 04, 2013 12:49 PM To: user@hadoop.apache.org Subject: how to locate the replicas of a file in HDFS? Hi, Could you tell me how to

Re: how to locate the replicas of a file in HDFS?

2013-06-04 Thread Sandeep Nemuri
Try this command: hadoop fsck <file path> -files -blocks. On Tue, Jun 4, 2013 at 3:41 PM, zangxiangyu zangxian...@qiyi.com wrote: hadoop fsck <path> -files -blocks -locations -racks. Replace <path> with the real path. :) From: 一凡 李 [mailto:zhuazhua_...@yahoo.com.cn]

Re:

2013-06-04 Thread Lanati, Matteo
Hi again, unfortunately my problem is not solved. I downloaded Hadoop v. 1.1.2 and made a basic configuration as suggested in [1]. No security, no ACLs, default scheduler... The files are attached. I still have the same error message. I also tried another Java version (6u45 instead of 7u21).

Re:

2013-06-04 Thread Alexander Alten-Lorenz
Hi Matteo, Are you able to add more space to your test machines? Also, what does the pi example say (hadoop jar hadoop-examples pi 10 10)? - Alex On Jun 4, 2013, at 4:34 PM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi again, unfortunately my problem is not solved. I downloaded Hadoop v.

Re:

2013-06-04 Thread Lanati, Matteo
Hi Alex, you gave me the right perspective... pi works ;-). It's finally satisfying to see it at work. The job finished without problems. I'll try some other test programs, such as grep, to check that there are no problems with input files. Thanks, Matteo On Jun 4, 2013, at 5:43 PM,

Reducer to output only json

2013-06-04 Thread Chengi Liu
Hi, I have the following reducer class: public static class TokenCounterReducer extends Reducer<Text, Text, Text, Text> { public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException { //String[] fields = s.split("\t", -1)

RE: HDFS interfaces

2013-06-04 Thread John Lilley
When you use the HDFS client interface to read a file, it automatically figures out which datanodes to contact for reading which blocks. There isn't really a "main" one. However, I have read that the first location listed for each block is the recommended one for an outside client to read from.
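
For ordinary reads none of this is exposed; a sketch (placeholder path), where the client library picks a replica per block behind the scenes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PlainRead {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // The DFS client resolves block locations itself and reads each block
        // from a chosen replica (the first/closest one in the located list).
        FSDataInputStream in = fs.open(new Path("/user/foo/mytext.txt")); // placeholder path
        byte[] buf = new byte[4096];
        while (in.read(buf) > 0) { /* consume the bytes */ }
        in.close();
      }
    }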

Docs 404

2013-06-04 Thread Uri Laserson
http://hadoop.apache.org/docs/current/hdfs_user_guide.html -- Uri Laserson, PhD Data Scientist, Cloudera Twitter/GitHub: @laserson +1 617 910 0447 laser...@cloudera.com

Re: How to get the intermediate mapper output file name

2013-06-04 Thread dvohra
The part-m-00000, part-m-00001 file names are Hadoop naming conventions. To use custom output file names, use the MultipleOutputs class. http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html With MultipleOutputs the file name may be customized as
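
A sketch against the old (mapred) API the link points to; the named output "summary" is a placeholder, and its files come out as summary-r-00000 instead of part-r-00000:

    // Driver: register a named output before submitting the job.
    JobConf conf = new JobConf(MyJob.class);
    MultipleOutputs.addNamedOutput(conf, "summary", TextOutputFormat.class, Text.class, Text.class);

    // Reducer: fetch a collector for the named output and write to it.
    public static class MyReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
      private MultipleOutputs mos;
      public void configure(JobConf conf) { mos = new MultipleOutputs(conf); }
      public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> out, Reporter reporter)
          throws IOException {
        mos.getCollector("summary", reporter).collect(key, values.next());
      }
      public void close() throws IOException { mos.close(); }
    }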

Re: Reducer to output only json

2013-06-04 Thread Mohammad Tariq
If you need to save the JSON as it is, you could implement OutputFormat to create your custom output format that'll allow you to write the data as you wish. Warm Regards, Tariq cloudfront.blogspot.com On Tue, Jun 4, 2013 at 11:39 PM, Chengi Liu chengi.liu...@gmail.com wrote: Hi, I

Re: Reducer to output only json

2013-06-04 Thread Niels Basjes
Have you tried something like this (I do not have a PC here to check this code): context.write(NullWritable.get(), new Text(jsn.toString())); On Jun 4, 2013 8:10 PM, Chengi Liu chengi.liu...@gmail.com wrote: Hi, I have the following reducer class public static class TokenCounterReducer
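
Spelled out as a full reducer (a sketch; note NullWritable.get() for the singleton instance, and the job must also call setOutputKeyClass(NullWritable.class)):

    public static class TokenCounterReducer extends Reducer<Text, Text, NullWritable, Text> {
      @Override
      public void reduce(Text key, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        JSONObject jsn = new JSONObject(); // as in the original post; assumes a JSON library on the classpath
        // ... populate jsn from the values ...
        // TextOutputFormat suppresses NullWritable keys, so each output line is just the JSON string.
        context.write(NullWritable.get(), new Text(jsn.toString()));
      }
    }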

Re: Reducer to output only json

2013-06-04 Thread Mohammad Tariq
Yes... this should do the trick. Warm Regards, Tariq cloudfront.blogspot.com On Wed, Jun 5, 2013 at 1:38 AM, Niels Basjes ni...@basjes.nl wrote: Have you tried something like this (I do not have a PC here to check this code): context.write(NullWritable.get(), new Text(jsn.toString())); On Jun 4,

RE: built hadoop! please help with next steps?

2013-06-04 Thread John Lilley
Answered my own question. The Eclipse install that ships with CentOS 6 (via yum) seems to have this problem. A direct download of Eclipse for Java EE works fine. John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Monday, June 03, 2013 5:49 PM To: user@hadoop.apache.org; Deepak Vohra

Re: Reducer to output only json

2013-06-04 Thread Shahab Yunus
Chengi, You can also see this for pointers: http://java.dzone.com/articles/hadoop-practice Regards, Shahab On Tue, Jun 4, 2013 at 4:15 PM, Mohammad Tariq donta...@gmail.com wrote: Yes... this should do the trick. Warm Regards, Tariq cloudfront.blogspot.com On Wed, Jun 5, 2013 at 1:38

Re: yarn-site.xml and aux-services

2013-06-04 Thread Rahul Bhattacharjee
Going by what I have read, I think it's a general-purpose hook in the YARN architecture for running any service inside the node managers. Hadoop uses this for the shuffle service; other YARN-based applications might use it as well. Thanks, Rahul On Wed, Jun 5, 2013 at 4:00 AM, John Lilley john.lil...@redpoint.net wrote:
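
The hook is wired up per node manager in yarn-site.xml; a sketch of the usual MapReduce shuffle entry (service name and class as in the 2.0.x-era docs; later releases renamed the service mapreduce_shuffle):

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce.shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>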

is time sync required among all nodes?

2013-06-04 Thread Ben Kim
Hi, This is a very basic, fundamental question: does the time on all nodes need to be synced? I've never even thought about timing in a Hadoop cluster, but recently my servers went out of sync time-wise. I know HBase requires time to be synced due to its timestamp-based operations, but I wonder whether any of