Re: Hadoop HA

2012-05-22 Thread Todd Lipcon
Hi Martinus, Hadoop HA is available in Hadoop 2.0.0. This release is currently being voted on in the community. You can read more here: http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/ -Todd On Mon, May 21, 2012 at 11:24 PM, Martinus Martinus

Re: Hadoop HA

2012-05-22 Thread Martinus Martinus
Hi Todd, Thanks for your answer. Will that have the same capability as the commercial M5 of MapR: http://www.mapr.com/products/why-mapr ? Thanks. On Tue, May 22, 2012 at 2:26 PM, Todd Lipcon t...@cloudera.com wrote: Hi Martinus, Hadoop HA is available in Hadoop 2.0.0. This release is

Re: Hadoop HA

2012-05-22 Thread Todd Lipcon
On Tue, May 22, 2012 at 12:08 AM, Martinus Martinus martinus...@gmail.com wrote: Hi Todd, Thanks for your answer. Will that have the same capability as the commercial M5 of MapR: http://www.mapr.com/products/why-mapr ? I can't speak to a closed source product's feature set. But, the 2.0.0

Re: namenode directory disappear after machines restart

2012-05-22 Thread Mohammad Tariq
Hello Brendan, Do as suggested by Marcos. If you do not set these properties, Hadoop uses the /tmp directory by default. Apart from setting these properties in your hdfs-site.xml file, add the following property to your core-site.xml file: <property><name>hadoop.tmp.dir</name>
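
For reference, a minimal sketch of how these properties might be set; the /app/hadoop/* paths below are hypothetical placeholders for any durable directory outside /tmp:

  <!-- core-site.xml -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>   <!-- hypothetical path -->
  </property>

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.name.dir</name>
    <value>/app/hadoop/name</value>  <!-- hypothetical path -->
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/app/hadoop/data</value>  <!-- hypothetical path -->
  </property>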

Re: Hadoop HA

2012-05-22 Thread Ted Dunning
No. 2.0.0 will not have the same level of HA as MapR. Specifically, the JobTracker hasn't been addressed and the namenode issues have only been partially addressed. On May 22, 2012, at 8:08 AM, Martinus Martinus martinus...@gmail.com wrote: Hi Todd, Thanks for your answer. Will that

RE: namenode directory disappear after machines restart

2012-05-22 Thread Brendan cheng
Thanks, it works! I wonder where we can find all the settings. I checked the code for hdfs-default.xml but it doesn't have the settings you mentioned. Brendan From: donta...@gmail.com Date: Tue, 22 May 2012 13:03:17 +0530 Subject: Re: namenode

Hadoop Debugging in LocalMode (Breakpoints not reached)

2012-05-22 Thread Björn-Elmar Macek
Hi there, I am currently trying to get rid of bugs in my Hadoop program by debugging it. Everything went fine until some point yesterday. I don't know what exactly happened, but my program no longer stops at breakpoints within the Reducer, nor within the RawComparator for the values
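
A minimal sketch of what usually has to hold for such breakpoints to be reached: the job must run through the LocalJobRunner, i.e. entirely inside the IDE's JVM. Property names below are the pre-YARN Hadoop 1.x ones; the class and job name are illustrative:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class LocalDebugRun {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Run map and reduce tasks in this JVM (LocalJobRunner) instead of
      // spawning separate task JVMs, so IDE breakpoints in the Mapper,
      // Reducer and RawComparator are actually hit.
      conf.set("mapred.job.tracker", "local");
      // Read from the local filesystem; no HDFS daemons needed.
      conf.set("fs.default.name", "file:///");
      Job job = new Job(conf, "local-debug-run");
      // ... configure mapper/reducer/input/output exactly as in the real job ...
      job.waitForCompletion(true);
    }
  }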

Re: namenode directory disappear after machines restart

2012-05-22 Thread Mohammad Tariq
That's great. The best way to get this kind of info is to ask questions on the mailing list whenever we face any problem. There are certain things that are not documented anywhere. I faced a lot of problems initially, but the community and the people are really great. There are so many

Re: Storing millions of small files

2012-05-22 Thread Wasif Riaz Malik
Hi Brendan, The number of files that can be stored in HDFS is limited by the size of the NameNode's RAM. The downside of storing small files is that you would saturate the NameNode's RAM with a small data set (the sum of the sizes of all your small files). However, you can store around 100

Re: Storing millions of small files

2012-05-22 Thread Mohammad Tariq
Hi Brendan, Every file, directory and block in HDFS is represented as an object in the namenode’s memory, each of which occupies 150 bytes. When we store many small files in HDFS, these small files occupy a large portion of the namespace (a large overhead on the namenode). As a consequence, the
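
To put the 150-byte figure in perspective, a rough back-of-the-envelope estimate (ignoring everything else on the namenode's heap): 1 GB of heap holds about 1,000,000,000 / 150 ≈ 6.7 million objects, and since each small file needs at least two objects (the file itself plus one block), that is only on the order of 3 million small files per GB of namenode RAM.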

RE: Hadoop Debugging in LocalMode (Breakpoints not reached)

2012-05-22 Thread Jayaseelan E
From: Björn-Elmar Macek [mailto:ma...@cs.uni-kassel.de] Sent: Tuesday, May 22, 2012 3:12 PM To: hdfs-user@hadoop.apache.org Subject: Hadoop Debugging in LocalMode (Breakpoints not reached) Hi there, I am currently trying to get rid of bugs in my Hadoop program

Re: Storing millions of small files

2012-05-22 Thread Harsh J
Brendan, The issue with using lots of small files is that your processing overhead increases (repeated, avoidable file open-read(little)-close calls). HDFS is also used by those who also wish to heavily process the data they've stored, and with a huge number of files such a process is not gonna be
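
One common workaround (a sketch, not from the thread, using the classic SequenceFile API; the output path is hypothetical) is to pack the small files into a single SequenceFile keyed by filename, so both the namenode and the readers deal with one large file:

  import java.io.File;
  import java.nio.file.Files;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      // One container file on HDFS instead of millions of tiny ones.
      SequenceFile.Writer writer = SequenceFile.createWriter(
          fs, conf, new Path("/user/demo/packed.seq"),  // hypothetical output
          Text.class, BytesWritable.class);
      try {
        // args[0]: local directory holding the small files to pack.
        for (File f : new File(args[0]).listFiles()) {
          byte[] data = Files.readAllBytes(f.toPath());
          writer.append(new Text(f.getName()), new BytesWritable(data));
        }
      } finally {
        writer.close();
      }
    }
  }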

Re: namenode directory disappear after machines restart

2012-05-22 Thread Harsh J
Brendan, The hdfs-default.xml does have dfs.name.dir listed: http://hadoop.apache.org/common/docs/current/hdfs-default.html. The configuration is also mentioned in the official tutorial: http://hadoop.apache.org/common/docs/current/cluster_setup.html#Configuration+Files On Tue, May 22, 2012 at

Re: Storing millions of small files

2012-05-22 Thread Keith Wiley
In addition to the responses already provided, there is another downside to using Hadoop with numerous files: it takes much longer to run a Hadoop job! Starting a Hadoop job involves communication between the driver (which runs on a client machine outside the cluster) and the namenode to

Re: Storing millions of small files

2012-05-22 Thread M. C. Srivas
Brendan, since you are looking for a distributed file system that can store many millions of files, try out MapR. A few customers have actually crossed over 1 trillion files without hitting problems. Small files or large files are handled equally well. Of course, if you are doing map-reduce, it is