Re: are we able to decommission multi nodes at one time?

2013-04-03 Thread Yanbo Liang
Does that mean some replicas may stay in an under-replicated state? 2013/4/3 Azuryy Yu azury...@gmail.com bq. then the namenode starts to copy the block replicas on DN-2 to another DN, supposed DN-2. Sorry for the typo; correction: then the namenode starts to copy the block replicas on DN-1 to

Re: hadoop datanode kernel build and HDFS multiplier factor

2013-04-03 Thread Yanbo Liang
I have done similar experiments tuning Hadoop performance. Many factors influence performance, such as the Hadoop configuration, the JVM, and the OS. For Linux kernel-related factors, we found two main points of attention: 1. every file system read operation will trigger one disk write
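
A common mitigation for that read-triggers-write behavior (the atime update on every file access) is mounting the data disks with noatime; a sketch, where the device and mount point are illustrative:

    # /etc/fstab entry; /dev/sdb1 and /data are illustrative
    /dev/sdb1  /data  ext4  defaults,noatime  0  0

    # or remount a live filesystem without rebooting
    mount -o remount,noatime /data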

Re: are we able to decommission multi nodes at one time?

2013-04-03 Thread Azuryy Yu
Not at all, so don't worry about that. On Wed, Apr 3, 2013 at 2:04 PM, Yanbo Liang yanboha...@gmail.com wrote: Does that mean some replicas may stay in an under-replicated state? 2013/4/3 Azuryy Yu azury...@gmail.com bq. then the namenode starts to copy the block replicas on DN-2 to another

Re: MapReduce on Local files

2013-04-03 Thread Mohammad Tariq
Hello Harsh, Thank you for the response. I am sorry for being unclear. Actually, I was talking about the backup files which end with ~. I mean, these files are not normally visible, but my job is able to see them. Does FileInputFormat behave in the same way for the ~ suffix as it does in the case of .

Re: MapReduce on Local files

2013-04-03 Thread Harsh J
You've been misled by the GUI you use, I'm afraid. Many DEs (Desktop Environments) consider ~-suffixed files hidden, but the general standards do not (try ls, for example, or even shell expansions: they will ignore . prefixes, but not ~ suffixes) :) To answer specifically though, no, the base
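
If you want the job to skip those backup files yourself, one option is a custom input path filter; a minimal sketch against the new (mapreduce) API, where NoBackupFilter is an illustrative name:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    // Skips editor backup files ending in '~'; everything else passes through.
    public class NoBackupFilter implements PathFilter {
        public boolean accept(Path path) {
            return !path.getName().endsWith("~");
        }
    }

    // In the driver; this is applied in addition to the default hidden-file filter:
    // FileInputFormat.setInputPathFilter(job, NoBackupFilter.class);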

Re: MapReduce on Local files

2013-04-03 Thread Mohammad Tariq
I see. Thank you so much for the clarification, Harsh :) Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Wed, Apr 3, 2013 at 3:28 PM, Harsh J ha...@cloudera.com wrote: You've been misled by the GUI you use, I'm afraid. Many DEs (Desktop Environments) consider ~-suffixed files

Re: MapReduce on Local files

2013-04-03 Thread Mohammad Tariq
Thank you Azuryy. It was about the files ending with a tilde (~). These files are actually backup files, hidden from the user, but my job was able to see them. I am working on Ubuntu (GNOME DE). Nothing serious, just out of curiosity :) Warm Regards, Tariq https://mtariq.jux.com/

Re: Job log location and retention

2013-04-03 Thread MARCOS MEDRADO RUBINELLI
Zheyi, To save memory, the jobtracker doesn't keep a reference to the job, but you may still find it in the filesystem. For a default CDH3 installation, it will be in the jobtracker's local filesystem, at /var/log/hadoop-0.20/history/done/ Logs from individual tasks are a little trickier to
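
For example, to locate a finished job's history file under that default path (a sketch; the job id is illustrative):

    ls /var/log/hadoop-0.20/history/done/ | grep job_201304031205_0042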

NameNode failure and recovery!

2013-04-03 Thread Rahul Bhattacharjee
Hi all, I was reading about Hadoop and got to know that there are two ways to protect against namenode failure. 1) Write to an NFS mount along with the usual local disk, or 2) use the secondary namenode. In case of NN failure, the SNN can take charge. My questions: 1) SNN is

Re: NameNode failure and recovery!

2013-04-03 Thread Rahul Bhattacharjee
Or are both options used together: NFS + SNN? On Wed, Apr 3, 2013 at 8:10 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: Hi all, I was reading about Hadoop and got to know that there are two ways to protect against namenode failure. 1) Write to an NFS mount along

RE: NameNode failure and recovery!

2013-04-03 Thread Vijay Thakorlal
Hi Rahul, The SNN does not act as a backup / standby NameNode in the event of failure. The sole purpose of the Secondary NameNode (or, as it is otherwise and more correctly known, the Checkpoint Node) is to perform checkpointing of the current state of HDFS: the SNN retrieves the
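
For reference, the checkpoint interval that drives the SNN is configurable; a sketch for a Hadoop 1.x core-site.xml, using the default values:

    <!-- checkpoint every hour, or sooner once the edit log reaches 64 MB -->
    <property>
      <name>fs.checkpoint.period</name>
      <value>3600</value>
    </property>
    <property>
      <name>fs.checkpoint.size</name>
      <value>67108864</value>
    </property>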

Re: NameNode failure and recovery!

2013-04-03 Thread Mohammad Tariq
Hello Rahul, It's always better to have both 1 and 2 together. One common misconception is that the SNN is a backup of the NN, which is wrong. The SNN is a helper node to the NN. In case of any failure, the SNN is not gonna take the NN's spot. Yes, we can't guarantee that the SNN fsimage replica will

Re: NameNode failure and recovery!

2013-04-03 Thread Mohammad Tariq
@Vijay : We seem to be in 100% sync though :) Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Wed, Apr 3, 2013 at 8:27 PM, Mohammad Tariq donta...@gmail.com wrote: Hello Rahul, It's always better to have both 1 and 2 together. One common misconception is that

Re: getAllocatedContainers() is not returning when ran against 2.0.3-alpha

2013-04-03 Thread Hitesh Shah
If I understand your question, you are expecting all the containers to be allocated in one go? Or are you seeing your application hang because it asked for 10 containers but it only received a total of 9 even after repeated calls to the RM? There is no guarantee that you will be allocated
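
In other words, the AM has to accumulate containers across allocate() heartbeats rather than expect one batch; a rough sketch inside the AM's main loop, assuming the YARN 2.x AMRMClient API (needed and amRMClient are illustrative):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.Container;

    // Each allocate() call is a heartbeat; it may return zero, some,
    // or all of the outstanding containers.
    List<Container> acquired = new ArrayList<Container>();
    while (acquired.size() < needed) {
        AllocateResponse response = amRMClient.allocate(0.1f);
        acquired.addAll(response.getAllocatedContainers());
        Thread.sleep(1000);
    }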

Re: NameNode failure and recovery!

2013-04-03 Thread Rahul Bhattacharjee
Thanks to all of you for the precise and complete responses. So in case of failure we have to bring another backup system up with the fsimage and edit logs from the NFS filer. The SNN stays as-is for the new NN. Thanks, Rahul On Wed, Apr 3, 2013 at 8:38 PM, Azuryy Yu azury...@gmail.com wrote: for

Re: NameNode failure and recovery!

2013-04-03 Thread Harsh J
There is a 3rd, most excellent way: Use HDFS's own HA, see http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html :) On Wed, Apr 3, 2013 at 8:10 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: Hi all, I was reading about Hadoop and got to
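
The core of the QJM setup from that page is a shared edits directory on a JournalNode quorum; a sketch of the hdfs-site.xml pieces, with illustrative nameservice and host names:

    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
    </property>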

Re: Linux io scheduler

2013-04-03 Thread Patai Sangbutsarakum
Thanks Chris for the response, I would love to try it out. What would be the metric(s) we should use to measure whether we have improved things? On Tue, Apr 2, 2013 at 11:24 AM, Chris Embree cemb...@gmail.com wrote: I assume you're talking about the I/O scheduler. Based on normal advice, only change this

Re: NameNode failure and recovery!

2013-04-03 Thread shashwat shriparv
If you are not in a position to go for HA, just keep your checkpoint period shorter so that recent data is recoverable from the SNN. You also always have the option of hadoop namenode -recover; try it on a test cluster and get versed in it. And take a backup of the image on some solid state storage. ∞ Shashwat
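
A sketch of both suggestions from the command line (paths are illustrative; run -recover on a test cluster first):

    # interactive recovery from a corrupt or truncated edit log
    hadoop namenode -recover

    # back up the latest image from the NN metadata directory (dfs.name.dir in 1.x)
    cp /data/dfs/name/current/fsimage /backup/fsimage.$(date +%Y%m%d)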

Error while running MapR program on multinode configuration

2013-04-03 Thread Varsha Raveendran
Hello, I am facing this error while trying to run a jar file on Hadoop: 13/04/04 01:48:01 INFO mapred.JobClient: Cleaning up the staging area hdfs://MT2012158:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201304032344_0008 Exception in thread "main"

Re: Error while running MapR program on multinode configuration

2013-04-03 Thread Mohammad Tariq
Hello ma'am, Please make sure that you don't have a : in your file names. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Thu, Apr 4, 2013 at 1:50 AM, Varsha Raveendran varsha.raveend...@gmail.com wrote: Hello, I am facing this error while trying to run a jar file on

Re: Error while running MapR program on multinode configuration

2013-04-03 Thread Varsha Raveendran
But I do not have a : in any of my file names. What could the other reasons be? I am not able to debug the error. Thank you for replying. On Thu, Apr 4, 2013 at 2:02 AM, Mohammad Tariq donta...@gmail.com wrote: Hello ma'am, Please make sure that you don't have a : in your file names. Warm

Re: Error while running MapR program on multinode configuration

2013-04-03 Thread Mohammad Tariq
Which version are you using? Could you please show me your code, if possible? Also, how are you running the job? For a detailed explanation you might find this https://issues.apache.org/jira/browse/HDFS-13 useful. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Thu,

Re: Error while running MapR program on multinode configuration

2013-04-03 Thread Varsha Raveendran
Thanks for replying! Version: Hadoop 1.1.1. I am creating a jar file called ga_test.jar and then placing it in the build path of another program. Also, I am using Eclipse to create another jar file, including ga_test.jar as a referenced library. I do not know why hadoop is taking the filename as

Re: Error while running MapR program on multinode configuration

2013-04-03 Thread Mohammad Tariq
It seems to be a non-hadoop issue to me. Is the jar which you are finally creating a runnable jar? We can't reference external classes (jars) in a runnable jar file. Everything must be contained inside the jar itself. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Thu, Apr
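
One packaging pattern worth knowing for Hadoop job jars specifically: dependency jars placed in a lib/ directory inside the job jar are added to the task classpath by the framework, so everything stays inside one jar. A sketch of building such a jar from the command line (names and paths are illustrative):

    mkdir -p build/lib
    cp ga_test.jar build/lib/              # dependencies live in lib/ inside the job jar
    jar cvf ga_job.jar -C build/classes . -C build lib
    hadoop jar ga_job.jar com.example.Driver input output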

Re: Error while running MapR program on multinode configuration

2013-04-03 Thread Mohammad Tariq
Try to run your code directly from Eclipse once (without creating the jar). Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Thu, Apr 4, 2013 at 4:06 AM, Mohammad Tariq donta...@gmail.com wrote: It seems to be a non-hadoop issue to me. Is the jar which you are finally

Re: Error while running MapR program on multinode configuration

2013-04-03 Thread Varsha Raveendran
Thank you! You are right! I created the jar file using the command line instead of Eclipse, and it worked! On Thu, Apr 4, 2013 at 4:06 AM, Mohammad Tariq donta...@gmail.com wrote: It seems to be a non-hadoop issue to me. Is the jar which you are finally creating a runnable jar? We can't

Re: Error while running MapR program on multinode configuration

2013-04-03 Thread Mohammad Tariq
As expected :) Actually this is something related to Eclipse. When it comes to the creation of executable jars containing external jars, we need to be a bit careful. One thing which you could probably try is to select the 'Package required libraries into generated JAR' option while exporting your

Re: Error while running MapR program on multinode configuration

2013-04-03 Thread yypvsxf19870706
Hi all, However, I think the exported jar containing the external libraries is too huge to be submitted to the MR cluster by the client. So it comes to this conclusion: the jar without the external jars would suit the situation; however, the runnable jar brings the errors. How to solve the

Re: Error while running MapR program on multinode configuration

2013-04-03 Thread Mohammad Tariq
DistributedCache?? Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Thu, Apr 4, 2013 at 5:01 AM, yypvsxf19870706 yypvsxf19870...@gmail.comwrote: Hi all However,I do think the export jar contains external libraries is too huge to be submitted to the MP by the
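
For the size concern, shipping the dependency jar via the distributed cache instead of bundling it into every job jar is one option; a sketch, assuming the driver goes through ToolRunner/GenericOptionsParser (names are illustrative):

    hadoop jar ga_job.jar com.example.Driver -libjars ga_test.jar input output

    # or programmatically, assuming the jar is already on HDFS:
    # DistributedCache.addFileToClassPath(new Path("/libs/ga_test.jar"), conf);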

Re: fsImage editsLog questions

2013-04-03 Thread Sai Sai
1. Will fsImage maintain the data/metadata of the namenode? 2. Will any input files be stored in fsImage? 3. When a namenode goes down, will all the data on the namenode go down, or just the metadata only, and what will happen to the fsimage/edits log? 5. Is the fsimage file, which is also maintained as

Re: NameNode failure and recovery!

2013-04-03 Thread Rahul Bhattacharjee
That's also doable. Reducing the checkpoint period would still leave some amount of edit log loss, and how short the checkpoint interval should be has to be evaluated. I think the good way to go, in case HA is not doable, is SNN plus secondary storage on NFS. Thanks, Rahul On Thu, Apr 4, 2013 at

Re: Error while running MapR program on multinode configuration

2013-04-03 Thread yypvsxf19870706
Hi Mohammad, Thanks for your response. As far as I know, the jars are still submitted by the client to the MR cluster over the network in DistributedCache mode, because of jar localization. The point that I am aiming at is that we need to reduce the size of the objects we submit over the network, which will

Re: are we able to decommission multi nodes at one time?

2013-04-03 Thread Henry Junyoung Kim
Thanks, all. My strategy for going from 15 DNs to 8 DNs: 1. Kill two DNs at the same time: the NN will detect that the nodes are down and will try to restore the replication factor of the lost blocks. 2. Check your NN web UI; there is a count of under-replicated blocks. 3. If it is
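
For reference, the graceful alternative to killing DNs outright is decommissioning through the exclude file, which lets the NN drain each node's blocks first; a sketch with illustrative hostnames and paths (hdfs-site.xml must point dfs.hosts.exclude at the file):

    echo "dn-09.example.com" >> /etc/hadoop/conf/dfs.exclude
    echo "dn-10.example.com" >> /etc/hadoop/conf/dfs.exclude
    hadoop dfsadmin -refreshNodes    # nodes show as Decommissioning in the NN web UI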