high availability

2013-10-11 Thread Koert Kuipers
I have been playing with high availability using journalnodes and 2 masters, both running namenode and hbase master. When I kill the namenode and hbase-master processes on the active master, the failover is perfect: hbase never stops and a running map-reduce job keeps going. This is impressive!

Re: Hadoop Jobtracker heap size calculation and OOME

2013-10-11 Thread Reyane Oukpedjo
Hi there, I had a similar issue with hadoop-1.2.0: the JobTracker kept crashing until I set HADOOP_HEAPSIZE="2048". I did not have this kind of issue with previous versions, but you can try this if you have the memory and see. In my case the issue was gone after I set it as above. Thanks, Reyane OUKPEDJO
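For readers hitting the same OOME, a minimal sketch of the change Reyane describes, assuming a Hadoop 1.x layout where daemon heap is set in conf/hadoop-env.sh (size the value to your own job history load):

```shell
# conf/hadoop-env.sh (Hadoop 1.x): raise the daemon JVM heap from the
# default 1000 MB to 2048 MB, as in the fix described above.
export HADOOP_HEAPSIZE=2048   # max heap, in MB, for Hadoop daemon JVMs
```

Note this sizes every daemon started via the hadoop scripts on that host, not just the JobTracker.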

Re: Create Multiple VM's on Mac

2013-10-11 Thread Andre Kelpe
Have a look at our vagrant hadoop cluster, which does just that (using ubuntu though): https://github.com/Cascading/vagrant-cascading-hadoop-cluster -- André On Sat, Oct 12, 2013 at 12:33 AM, Raj Hadoop wrote: > All, > > I have a CentOS VM image and want to replicate it four times on my Mac > co

Re: Create Multiple VM's on Mac

2013-10-11 Thread Yusaku Sako
Raj & Gary, For setting up multiple VMs on a local computer from a VM image, I highly recommend Vagrant (http://www.vagrantup.com/). It lets you easily create and start up multiple VMs with unique IP addresses and host names from a single image, save/revert to named snapshots, etc. Ambari Quick S
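A minimal multi-machine Vagrantfile along the lines Yusaku describes might look like the sketch below. The box name "centos64" and the 192.168.33.x private IPs are placeholder assumptions, not from the thread:

```ruby
# Hedged sketch: four identically-configured VMs from one base box, each
# with its own hostname and private IP, suitable as Hadoop cluster nodes.
Vagrant.configure("2") do |config|
  config.vm.box = "centos64"                      # assumed local box name
  (1..4).each do |i|
    config.vm.define "node#{i}" do |node|
      node.vm.hostname = "node#{i}"
      node.vm.network "private_network", ip: "192.168.33.#{10 + i}"
    end
  end
end
```

`vagrant up` then boots all four; `vagrant snapshot` (or the sahara plugin on older Vagrant) covers the save/revert workflow mentioned above.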

Re: Create Multiple VM's on Mac

2013-10-11 Thread Gary B
Hi Raj, I want to do the same. Can we collaborate? Thanks, Gary 7327636549 On Oct 11, 2013 6:34 PM, "Raj Hadoop" wrote: > All, > > I have a CentOS VM image and want to replicate it four times on my Mac > computer. How > can I set it up so that I can have 4 individual machines that can be used > as

Create Multiple VM's on Mac

2013-10-11 Thread Raj Hadoop
All, I have a CentOS VM image and want to replicate it four times on my Mac computer. How can I set it up so that I have 4 individual machines that can be used as nodes in my Hadoop cluster? Please advise. Thanks, Raj

Writing to multiple directories in hadoop

2013-10-11 Thread jamal sasha
Hi, I am trying to separate my output from the reducer into different folders. My driver has the following code: FileOutputFormat.setOutputPath(job, new Path(output)); //MultipleOutputs.addNamedOutput(job, namedOutput, outputFormatClass, keyClass, valueClass) //MultipleOutputs
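For reference, a hedged sketch of how MultipleOutputs is typically wired up to split reducer output across sub-directories; the named outputs "even"/"odd" and the partitioning rule are illustrative, not from jamal's driver:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Driver side (after setting the job's main output path):
//   MultipleOutputs.addNamedOutput(job, "even", TextOutputFormat.class,
//           Text.class, IntWritable.class);
//   MultipleOutputs.addNamedOutput(job, "odd", TextOutputFormat.class,
//           Text.class, IntWritable.class);

public class PartitioningReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
                          Context context)
            throws IOException, InterruptedException {
        for (IntWritable v : values) {
            // The 4th argument is a base output path relative to the job's
            // output directory, so files land in even/ or odd/ subfolders.
            if (v.get() % 2 == 0) {
                mos.write("even", key, v, "even/part");
            } else {
                mos.write("odd", key, v, "odd/part");
            }
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close(); // required, or buffered output may be lost
    }
}
```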

Re: Multiple context.write inmapper

2013-10-11 Thread jamal sasha
never mind.. found a bug :D On Fri, Oct 11, 2013 at 12:54 PM, jamal sasha wrote: > Hi.. > > In my mapper function.. > Can i have multiple context.write()... > > So... > > public void map(LongWritable key, Text value, Context context) throws > IOException, InterruptedException ,NullPointerExcep

Multiple context.write inmapper

2013-10-11 Thread jamal sasha
Hi.. In my mapper function, can I have multiple context.write() calls? So... public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException, NullPointerException { ... // processing ... context.write(k1,v1); context.write(k2,v2); } I thought we could do th
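For the record: yes, multiple context.write() calls per map() invocation are allowed; each emitted pair is just another record in the map output and goes through shuffle independently. A hedged sketch, with illustrative key/value names:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MultiEmitMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final Text TOTAL = new Text("__total__"); // assumed key
    private final Text word = new Text();
    private final IntWritable one = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (token.isEmpty()) continue;
            word.set(token);
            context.write(word, one);   // first emit: per-word count
            context.write(TOTAL, one);  // second emit: overall record count
        }
    }
}
```

Since write() serializes immediately, reusing the same Writable objects between calls is safe.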

Re: Map Reduce Job fails

2013-10-11 Thread Srinivas Chamarthi
The issue was with the /etc/hosts files. Thanks for letting me explore on my own; I understood a lot of internals. On Fri, Oct 11, 2013 at 3:28 AM, Srinivas Chamarthi < srinivas.chamar...@gmail.com> wrote: > from the stack trace, I believe, it is trying to start/connect the > ApplicationMaster and fails to connect
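For anyone hitting the same ApplicationMaster connection failure, a hedged example of the kind of /etc/hosts layout that avoids it; the IPs below are placeholders. The key point is that each cluster hostname must resolve to its real interface IP on every node, and must not be bound to 127.0.0.1 or 127.0.1.1:

```
127.0.0.1     localhost
192.168.1.10  hdp1
192.168.1.11  hdp2
```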

Hadoop Jobtracker heap size calculation and OOME

2013-10-11 Thread Viswanathan J
Hi, I'm running a 14-node Hadoop cluster with datanodes and tasktrackers running on all nodes. *Apache Hadoop:* 1.2.1 It currently shows the heap size as follows: *Cluster Summary (Heap Size is 5.7/8.89 GB)* In the above summary, what does the *8.89* GB define? Does the *8.89* define the maximum
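A hedged interpretation, worth confirming against the 1.2.1 JobTracker source: the second number in "Heap Size is 5.7/8.89 GB" is the JVM's maximum heap (what -Xmx / HADOOP_HEAPSIZE permits) and the first is the heap currently in use. In plain Java terms:

```java
// Sketch of how the cluster-summary numbers map onto java.lang.Runtime.
public class HeapSummary {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // the "5.7 GB" part
        long max  = rt.maxMemory();                     // the "8.89 GB" part
        System.out.printf("Heap Size is %.2f/%.2f GB%n",
                used / 1e9, max / 1e9);
    }
}
```

If 8.89 GB is larger than the HADOOP_HEAPSIZE you configured, check whether mapred.child.java.opts or an explicit -Xmx elsewhere is overriding it.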

Re: State of Art in Hadoop Log aggregation

2013-10-11 Thread Sandy Ryza
Just a clarification: Cloudera Manager is now free for any number of nodes. Ref: http://www.cloudera.com/content/cloudera/en/products/cloudera-manager.html -Sandy On Fri, Oct 11, 2013 at 7:05 AM, DSuiter RDX wrote: > Sagar, > > It sounds like you want a management console. We are using Clouder

Hadoop Jobtracker cluster summary of heap size and OOME

2013-10-11 Thread Viswanathan J
Hi, I'm running a 14-node Hadoop cluster with tasktrackers running on all nodes. I have set the jobtracker default memory size in hadoop-env.sh: *HADOOP_HEAPSIZE="1024"* and have set the mapred.child.java.opts value in mapred-site.xml as: mapred.child.java.opts -Xmx2048m -- Regards, Viswa.J
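For readers, the mapred-site.xml setting above written out as a sketch. Note the two settings are unrelated: HADOOP_HEAPSIZE in hadoop-env.sh sizes the daemon JVMs (jobtracker, tasktracker), while mapred.child.java.opts sizes each spawned task JVM:

```xml
<!-- conf/mapred-site.xml: 2 GB max heap per map/reduce task JVM -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
```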

Re: State of Art in Hadoop Log aggregation

2013-10-11 Thread DSuiter RDX
Sagar, It sounds like you want a management console. We are using Cloudera Manager, but for 200 nodes you would need to license it; it is only free up to 50 nodes. The FOSS equivalent is Ambari, IIRC. http://incubator.apache.org/ambari/ Flume will provide a Hadoop-integrated pipeline for in

Re: State of Art in Hadoop Log aggregation

2013-10-11 Thread Alexander Alten-Lorenz
Hi, http://flume.apache.org - Alex On Oct 11, 2013, at 7:36 AM, Sagar Mehta wrote: > Hi Guys, > > We have fairly decent sized Hadoop cluster of about 200 nodes and was > wondering what is the state of art if I want to aggregate and visualize > Hadoop ecosystem logs, particularly > Tasktrack

RE: State of Art in Hadoop Log aggregation

2013-10-11 Thread Smith, Joshua D.
I've used Splunk in the past for log aggregation. It's commercial/proprietary, but I think there's a free version. http://www.splunk.com/ From: Raymond Tay [mailto:raymondtay1...@gmail.com] Sent: Friday, October 11, 2013 1:39 AM To: user@hadoop.apache.org Subject: Re: State of Art in Hadoop Log

Re: Job initialization failed: java.lang.NullPointerException at resolveAndAddToTopology

2013-10-11 Thread DSuiter RDX
It looks like you are correct, and I did not have the right solution, I apologize. I'm not sure if the other nodes need to be involved either. Now I'm hoping someone with deeper knowledge will step in, because I'm curious also! Some of the most knowledgeable people on here are on US Pacific Time, s

Re: Job initialization failed: java.lang.NullPointerException at resolveAndAddToTopology

2013-10-11 Thread fab wol
this line: 2013-10-11 10:24:53,033 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred (auth:SIMPLE) cause:java.io.IOException: java.lang.NullPointerException is, IMHO, indicating that I am using the user "mapred" for execution (fyi: submitting the job from t

Re: Job initialization failed: java.lang.NullPointerException at resolveAndAddToTopology

2013-10-11 Thread DSuiter RDX
The user running the job (might not be your username depending on your setup) does not appear to have executable permissions on the jobtracker cluster topology python script - I'm basing this on the lines: 2013-10-11 10:24:53,035 WARN org.apache.hadoop.net.ScriptBasedMapping: Exception running /ru
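A hedged fix along those lines. The script path and name below are guesses based on the ScriptBasedMapping warning; on a CDH cluster, check what net.topology.script.file.name actually points at and chmod that file. The sketch creates a stand-in script purely to demonstrate the permission check:

```shell
# Stand-in rack-topology script; on a real cluster you would chmod the
# existing script named by net.topology.script.file.name instead.
SCRIPT=./topology.sh                           # hypothetical path
printf '#!/bin/sh\necho /default-rack\n' > "$SCRIPT"
chmod +x "$SCRIPT"                             # the missing execute bit
"$SCRIPT" somehost                             # prints /default-rack
```

Verify as the job-running user (here assumed to be mapred): `sudo -u mapred test -x /path/to/script && echo OK`.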

Job initialization failed: java.lang.NullPointerException at resolveAndAddToTopology

2013-10-11 Thread fab wol
Hey everyone, I've been supplied with a decent ten-node CDH 4.4 cluster, only 7 days old, and someone tried some HBase stuff on it. Now I wanted to try some MR stuff on it, but starting a job is already not possible (even the wordcount example). The error log of the jobtracker produces a log 700k li

Has anyone ever mounted hdfs out via nfs?

2013-10-11 Thread douxin
Hi guys, I am working on mounting HDFS on a remote host (say HDFS is on hostA and I need to mount it to a local path on hostB). I noticed hdfs-nfs-proxy (https://github.com/cloudera/hdfs-nfs-proxy) could make that happen, but I have some doubts: 1, when I mount remote hdfs to mo
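For context, once an NFS proxy/gateway for HDFS is running on hostA, the client side on hostB is an ordinary NFSv3 mount. A hedged sketch; the export root "/" and the mount options are assumptions, so check the hdfs-nfs-proxy README for what it actually requires:

```shell
# On hostB: mount the HDFS-backed NFS export served from hostA.
sudo mkdir -p /mnt/hdfs
sudo mount -t nfs -o vers=3,proto=tcp,nolock hostA:/ /mnt/hdfs
ls /mnt/hdfs   # should list the HDFS root directories
```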

Re: Improving MR job disk IO

2013-10-11 Thread DSuiter RDX
So, perhaps this has been thought of, but perhaps not. It is my understanding that grep usually processes things one line at a time. As I am currently experimenting with Avro, I am finding that the local grep function does not handle it well at all, because it is essentially one long line, so wor

Re: Map Reduce Job fails

2013-10-11 Thread Srinivas Chamarthi
from the stack trace, I believe it is trying to start/connect to the ApplicationMaster and fails to connect to it. I am not sure if this is related to the ec2 loopback adapter. On Fri, Oct 11, 2013 at 12:22 AM, Srinivas Chamarthi < srinivas.chamar...@gmail.com> wrote: > I have a 2 node cluster (HDP1,

Map Reduce Job fails

2013-10-11 Thread Srinivas Chamarthi
I have a 2-node cluster (HDP1, HDP2) laid out as follows. HDP1: 1. name node, 2. data node, 3. node manager, 4. resource manager. HDP2: 1. node manager, 2. data node. When I submit the map reduce job on HDP1, the job runs on node HDP2, which is fine. But the job fails, and in the userlogs/syslogs of