Hadoop cluster hardware configuration

2012-06-04 Thread praveenesh kumar
Hello all, I am looking to build a 5-node Hadoop cluster with the following configuration per machine: -- 1. Intel Xeon E5-2609 (2.40GHz/4-core) 2. 32 GB RAM (8GB 1Rx4 PC3) 3. 5 x 900GB 6G SAS 10K hard disks (total 4.5 TB storage/machine) 4. 1GbE Ethernet connection I would like the

Grouping comparator

2012-06-04 Thread Ajay Srivastava
Hi, Can someone please explain the default implementation of the grouping comparator, i.e. if I do not specify a custom grouping comparator, which comparator is used to decide the grouping for the reducer? I searched a lot on the web but could not find a satisfactory explanation for its default

SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread ramon.pin
Hello. I'm facing an issue when trying to configure my SecondaryNameNode on a different machine than my NameNode. When both are on the same machine everything works fine, but after moving the secondary to a new machine I get: 2012-05-28 09:57:36,832 ERROR

RE: datanode security (v 1.0.3)

2012-06-04 Thread sathyavageeswaran
Can someone guide me on how to plug the leakage of excess water flow from Pureit on complete consumption of chlorine -Original Message- From: Sheeba George [mailto:sheeba.geo...@gmail.com] Sent: 04 June 2012 10:59 To: common-user@hadoop.apache.org Subject: Re: datanode security (v 1.0.3) Hi

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
I am not sure what the exact issue could be, but when pointing a secondary NN at a NN, you need to tell your SNN where the actual NN resides. Try adding dfs.http.address in hdfs-site.xml on your secondary namenode machine, with the value NN:port. The port should be the one your NN URL opens on -
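Praveenesh's suggestion, sketched as an hdfs-site.xml fragment (the hadoop00 hostname and 50070 port are assumptions drawn from later messages in this thread; substitute your own NameNode host and web UI port):

```xml
<!-- hdfs-site.xml on the SecondaryNameNode machine (sketch; host and port
     are assumptions -- use your NameNode's hostname and web UI port) -->
<property>
  <name>dfs.http.address</name>
  <value>hadoop00:50070</value>
</property>
```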

RE: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread ramon.pin
I configured dfs.http.address in the SNN's hdfs-site.xml but still get: / STARTUP_MSG: Starting SecondaryNameNode STARTUP_MSG: host = hadoop01/192.168.0.11 STARTUP_MSG: args = [-checkpoint, force] STARTUP_MSG: version = 1.0.3

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
Try giving a value to dfs.secondary.http.address in hdfs-site.xml on your SNN. In your logs, it's starting the SNN webserver at 0.0.0.0:50090. It's better if we specify which IP it should start on. Also, I am assuming you don't have any firewalls enabled between these 2 machines, right? Regards,
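As a sketch, binding the SNN web server to a concrete address instead of 0.0.0.0 would look like this in hdfs-site.xml (the hadoop01 hostname is an assumption based on this thread's setup):

```xml
<!-- hdfs-site.xml on the SNN machine (sketch; hostname is an assumption) -->
<property>
  <name>dfs.secondary.http.address</name>
  <value>hadoop01:50090</value>
</property>
```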

RE: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread ramon.pin
Right. No firewalls. This is my 'toy' environment running as virtual machines on my desktop computer. I'm playing with this here because I have the same problem on my real cluster. Will try to explicitly configure the starting IP for this SNN. -Original Message- From: praveenesh kumar

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
Also, can you share the /etc/hosts file of both VMs? Regards, Praveenesh On Mon, Jun 4, 2012 at 5:35 PM, ramon@accenture.com wrote: Right. No firewalls. This is my 'toy' environment running as virtual machines on my desktop computer. I'm playing with this here because have the same

RE: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread ramon.pin
/etc/hosts:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.0.10 hadoop00
192.168.0.11 hadoop01
192.168.0.12 hadoop02
-Original Message- From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: Monday, 04 June 2012

Re: Hadoop cluster hardware configuration

2012-06-04 Thread Nitin Pawar
if you tell us the purpose of this cluster, it would be easier to say exactly how good it is On Mon, Jun 4, 2012 at 3:57 PM, praveenesh kumar praveen...@gmail.com wrote: Hello all, I am looking forward to build a 5 node hadoop cluster with the following configurations per machine. --

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
I would say not to use 127.0.0.1 in distributed mode. Comment out the first 2 lines of your /etc/hosts. Rather, have your /etc/hosts file like this - suppose you are on hadoop00; there /etc/hosts would look like 192.168.0.10 hadoop00 localhost 192.168.0.11 hadoop01 192.168.0.12 hadoop02 On
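Spelled out, the suggested layout for each VM would be a sketch like the following (addresses are the ones from this thread; aliasing the machine's own name to localhost is Praveenesh's suggestion, not a general requirement):

```
# /etc/hosts on hadoop00 (sketch)
192.168.0.10 hadoop00 localhost
192.168.0.11 hadoop01
192.168.0.12 hadoop02

# /etc/hosts on hadoop01 (sketch)
192.168.0.11 hadoop01 localhost
192.168.0.10 hadoop00
192.168.0.12 hadoop02
```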

RE: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread ramon.pin
Now I see the SNN machine name in the logs. It still refuses to connect to the NN, but now I get a different message: PriviledgedActionException as:hadoop cause:java.io.FileNotFoundException: http://hadoop00:50030/getimage?getimage=1 Maybe something is missing in my NN configuration? 12/06/04 14:13:08 INFO

Re: Hadoop cluster hardware configuration

2012-06-04 Thread praveenesh kumar
On a very high level... we would be utilizing the cluster not only for Hadoop but for other I/O-bound or in-memory operations. That is the reason we are going for SAS hard disks. We also need to perform lots of computational tasks, which is why we have kept RAM at 32 GB, which can be increased. So on

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
It's trying to connect to your NN on port 50030... I think it should be 50070. In your hdfs-site.xml, for dfs.http.address, I am assuming you have given hadoop01:50070, right? Regards, Praveenesh On Mon, Jun 4, 2012 at 5:50 PM, ramon@accenture.com wrote: Now I see SNN machine name on

Re: What happens when I do not output anything from my mapper

2012-06-04 Thread praveenesh kumar
You can control your map outputs based on any condition you want. I have done that and it worked for me. It could be a problem in your code that it's not working for you. Can you please share your map code or cross-check whether your conditions are correct? Regards, Praveenesh On Mon, Jun 4, 2012 at

RE: Grouping comparator

2012-06-04 Thread Devaraj k
If you don't specify a grouping comparator for your job, it uses the output key comparator class for grouping. This comparator should be provided if the equivalence rules for sorting the intermediate keys are different from those for grouping keys. Thanks Devaraj
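The behaviour Devaraj describes can be sketched in plain Java (the class and method names below are made up for illustration; java.util comparators stand in for Hadoop's RawComparator): by default the sort comparator also decides grouping, so two keys reach the same reduce() call exactly when they compare as equal under it.

```java
import java.util.*;

// Sketch (hypothetical names): how grouping decides which sorted records
// share one reduce() call. With no grouping comparator set, Hadoop reuses
// the output key comparator, so only keys that sort as equal are grouped.
public class GroupingSketch {
    // Composite key: {natural key, secondary field}.
    public static final Comparator<int[]> SORT =
            Comparator.<int[]>comparingInt(k -> k[0]).thenComparingInt(k -> k[1]);
    // A custom grouping comparator that looks only at the natural key.
    public static final Comparator<int[]> GROUP =
            Comparator.comparingInt(k -> k[0]);

    // Walk the sorted keys; start a new group whenever the comparator says
    // the current key differs from the group's first key.
    public static List<List<int[]>> group(List<int[]> sorted, Comparator<int[]> cmp) {
        List<List<int[]>> groups = new ArrayList<>();
        for (int[] k : sorted) {
            if (groups.isEmpty()
                    || cmp.compare(groups.get(groups.size() - 1).get(0), k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<int[]> keys = new ArrayList<>(Arrays.asList(
                new int[]{1, 9}, new int[]{1, 3}, new int[]{2, 5}));
        keys.sort(SORT);
        // Default: the sort comparator doubles as the grouper, so (1,3) and
        // (1,9) compare unequal -> three reduce() calls.
        System.out.println(group(keys, SORT).size());  // 3
        // Custom grouper on the natural key -> two reduce() calls.
        System.out.println(group(keys, GROUP).size()); // 2
    }
}
```

In Hadoop itself, a custom grouping comparator is registered with job.setGroupingComparatorClass(...) in the new API, or JobConf.setOutputValueGroupingComparator(...) in the old one.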

RE: What happens when I do not output anything from my mapper

2012-06-04 Thread Devaraj k
Hi Murat, As Praveenesh explained, you can control the map outputs as you want. The map() function is called for each input, i.e. map() is invoked multiple times with different inputs in the same mapper. You can add logging in the map function to check what is happening in it.

Re: What happens when I do not output anything from my mapper

2012-06-04 Thread murat migdisoglu
Hi, Thanks for your answer. After I read your emails, I decided to completely clear my mapper method to see if I can disable the output of the mapper class at all, but it seems it did not work. So, here is my mapper method: @Override public void map(ByteBuffer key, SortedMap<ByteBuffer,

RE: datanode security (v 1.0.3)

2012-06-04 Thread Tony Dean
Thank you. That did the trick. -Original Message- From: Sheeba George [mailto:sheeba.geo...@gmail.com] Sent: Monday, June 04, 2012 1:29 AM To: common-user@hadoop.apache.org Subject: Re: datanode security (v 1.0.3) Hi Tony , Please take a look at

RE: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread ramon.pin
Right. Silly mistake. Now using 50070 and IT WORKS!!! Thanks a lot, Praveenesh. I will replicate this solution to my real cluster. -Original Message- From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: Monday, 04 June 2012 14:25 To: common-user@hadoop.apache.org Subject: Re:

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread shashwat shriparv
Did you configure dfs.namenode.secondary.http-address in hdfs-site.xml? On Mon, Jun 4, 2012 at 7:53 PM, ramon@accenture.com wrote: Right. Silly mistake Now using 50070 and IT WORKS!!! Thx a lot Praveenesh. I will replicate this solution to my real cluster. -Original
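For reference, a sketch of the property shashwat names (note this is the newer property name; on 1.0.x releases like the one in this thread, the equivalent is dfs.secondary.http.address, and the hadoop01 hostname is an assumption from earlier messages):

```xml
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>hadoop01:50090</value>
</property>
```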

Re: Yahoo Hadoop Tutorial with new APIs?

2012-06-04 Thread Robert Evans
I am happy to announce that I was able to get the license on the Yahoo! Hadoop tutorial updated from Creative Commons Attribution 3.0 Unported License to Apache 2.0. I have filed HADOOP-8477 https://issues.apache.org/jira/browse/HADOOP-8477 to pull the tutorial into the Hadoop project, and to

Re: What happens when I do not output anything from my mapper - Solution

2012-06-04 Thread murat migdisoglu
Ok, for the ones who face this problem, here is how I solved it: First of all, there was a task created for this in Hadoop: https://issues.apache.org/jira/browse/HADOOP-4927 and http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html#Lazy+Output+Creation explains how to
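The lazy-output idea behind HADOOP-4927 can be sketched in plain Java (LazyWriter is a made-up name illustrating the mechanism, not one of Hadoop's actual classes): the output file is created on the first write, so a task that emits nothing leaves no empty part file behind.

```java
import java.io.*;

// Illustration only (hypothetical class): defer creating the output file
// until the first record arrives, mirroring Lazy Output Creation.
class LazyWriter implements Closeable {
    private final File target;
    private PrintWriter out; // created lazily on the first write()

    LazyWriter(File target) {
        this.target = target;
    }

    void write(String record) {
        try {
            if (out == null) {
                out = new PrintWriter(new FileWriter(target));
            }
            out.println(record);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        if (out != null) {
            out.close();
        }
    }
}
```

In Hadoop itself the equivalent is LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class), which wraps the real output format so its record writer is only created when the first record is actually written.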

Trying to put 16gb file onto hdfs

2012-06-04 Thread Barry, Sean F
I am trying to put a 16GB file onto HDFS but I was given all of these messages and I don't know why this is happening. Can someone please shed some light on this scenario? Thanks in advance hduser@master:~ hadoop fs -put ~/tests/wiki16gb.txt /user/hduser/wiki/16gb.txt 12/06/04 10:52:05 WARN

Re: Yahoo Hadoop Tutorial with new APIs?

2012-06-04 Thread Jagat
Hello Bobby, Great news!! Thanks for your efforts in handling those legal issues. I will assign myself a few JIRAs. To start off, we can divide the documentation into the same modules as the original Yahoo tutorials and add the relevant features which have been incorporated into new

Cannot start name node after turning on hadoop security

2012-06-04 Thread Allan Yan
My local environment: single ubuntu 11.10 desktop version, oracle jdk 7.0_04, MIT kerberos 5, apache hadoop-1.0.2. I am able to get kerberos working, here is my key:

RE: Trying to put 16gb file onto hdfs

2012-06-04 Thread ramon.pin
Hi Sean, It seems your HDFS has not started properly. Go through your HDFS web console to verify that the NN and all DNs are up. You can access it at http://your-namenode-ip:50070 Also ensure that your NN has left safe mode before you start moving data to HDFS. -Original

Fwd: Cannot start name node after turning on hadoop security

2012-06-04 Thread Allan Yan
I found these two threads in the mailing list archives: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201202.mbox/browser http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201108.mbox/browser At least they were able to get the name node up. Can someone please point out why I am

Fwd: Cannot start name node after turning on hadoop security

2012-06-04 Thread Allan Yan
Sorry, the links should be: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201202.mbox/%3CCAMAD20=oKVRy_pDX6FWm=xvpz1pal0qcfqagssaxq8xugp7...@mail.gmail.com%3E http://lucene.472066.n3.nabble.com/Starting-datanode-in-secure-mode-td3297090.html -Hailun Yan -- Forwarded

mini node in a cluster

2012-06-04 Thread Pat Ferrel
I have a machine that is part of the cluster, but I'd like to dedicate it to being the web server and running the db, while still having access to starting jobs and getting data out of HDFS. In other words, I'd like the cores, memory, and disk to be only minimally affected by running jobs on the

Re: mini node in a cluster

2012-06-04 Thread Tom Melendez
Hi Pat, Sounds like you would just turn off the datanode and the tasktracker. Your config will still point to the Namenode and JT, so you can still launch jobs and read/write from HDFS. You'll probably want to replicate the data off it first, of course. Thanks, Tom On Mon, Jun 4, 2012 at 2:06 PM,

Re: mini node in a cluster

2012-06-04 Thread Pat Ferrel
Hi Tom, Sounds like the trick. This node is a slave, so its datanode and tasktracker are started from the master. - how do I start the cluster without starting the datanode and the tasktracker on the mini-node slave? Remove it from slaves? - what do I minimally need to start on the

Re: mini node in a cluster

2012-06-04 Thread Tom Melendez
Hi Pat, Sounds like the trick. This node is a slave so its datanode and tasktracker are started from the master. - how do I start the cluster without starting the datanode and the tasktracker on the mini-node slave? Remove it from slaves? There's no main cluster software, just don't start

Re: Hadoop cluster hardware configuration

2012-06-04 Thread Nitin Pawar
if you are doing computations using hadoop on a miniscale, yes, this hardware is good enough. Normally Hadoop clusters are kept busy with heavy loads, so they are not shared for multiple uses unless your Hadoop utilization is on the lower side and you want to reuse the hardware. On

Re: Trying to put 16gb file onto hdfs

2012-06-04 Thread praveenesh kumar
Check your DataNode logs, or run hadoop fsck / or hadoop dfsadmin -report to get more details about your HDFS. It seems like the DN is down. Regards, Praveenesh On Tue, Jun 5, 2012 at 12:13 AM, ramon@accenture.com wrote: Hi Sean, It seems your HDFS has not properly started. Go through your

RE: What happens when I do not output anything from my mapper

2012-06-04 Thread Devaraj k
The output files should be 0 KB in size if you use FileOutputFormat/TextOutputFormat. I think your output format writer is writing some metadata into those files. Can you check what data is present in those files? Can you tell me which output format you are using? Thanks Devaraj