Re: JavaDocs for DistCp (or similar)

2010-02-17 Thread Tsz Wo (Nicholas), Sze
Oops, DistCp.main(..) calls System.exit(..) at the end, so it would also terminate your Java program, which is probably not desirable. You may still use code similar to that in DistCp.main(..), as shown below; however, these are not stable APIs. //DistCp.main public static void main(Stri
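
For reference, a minimal sketch of what Nicholas describes: mirror the body of DistCp.main(..) but return the exit code instead of calling System.exit(..). The DistCp(Configuration) constructor and its Tool implementation are 0.20-era internals, not a stable API, so treat this as an assumption-laden sketch:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.tools.DistCp;
    import org.apache.hadoop.util.ToolRunner;

    public class InProcessDistCp {
      // Runs a distcp in-process and returns the exit code
      // rather than exiting the JVM.
      public static int copy(String src, String dst) throws Exception {
        JobConf job = new JobConf(DistCp.class);
        // Same strings you would pass on the "hadoop distcp" command line.
        String[] args = { src, dst };
        return ToolRunner.run(new DistCp(job), args);
      }
    }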

Re: JavaDocs for DistCp (or similar)

2010-02-17 Thread Tsz Wo (Nicholas), Sze
Hi Balu, Unfortunately, DistCp does not have a public Java API. One simple way is to invoke DistCp.main(args) in your Java program, where args is an array of the string arguments you would pass on the command line. Hope this helps. Nicholas Sze - Original Message > From: Balu Vell
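
A minimal sketch of that approach (the hdfs:// paths are hypothetical; note the System.exit(..) caveat in the follow-up above):

    import org.apache.hadoop.tools.DistCp;

    public class DistCpViaMain {
      public static void main(String[] ignored) throws Exception {
        // The same strings you would pass to "hadoop distcp" on the command line.
        String[] args = {
            "hdfs://src-namenode:8020/path/src",   // hypothetical source
            "hdfs://dst-namenode:8020/path/dst" }; // hypothetical destination
        // Caveat: DistCp.main(..) calls System.exit(..) when it finishes,
        // so nothing after this call will run.
        DistCp.main(args);
      }
    }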

JavaDocs for DistCp (or similar)

2010-02-17 Thread Balu Vellanki
Hi Folks, Currently we use distCp to transfer files between two Hadoop clusters. I have a Perl script which calls the system command "hadoop distcp" to achieve this. Is there a Java API for distCp, so that we can avoid system calls from our Java code? Thanks Balu

Developing cross-component patches post-split

2010-02-17 Thread tiru

Re: Hadoop Streaming File-not-found error on Cloudera's training VM

2010-02-17 Thread Dan Starr
Todd, Thanks! This solved it. -Dan On Wed, Feb 17, 2010 at 8:00 PM, Todd Lipcon wrote: > Hi Dan, > > This is actually a bug in the release you're using. Please run: > > $ sudo apt-get update > $ sudo apt-get install hadoop-0.20 > > Then restart the daemons (or the entire VM) and give it another

Re: Hadoop Streaming File-not-found error on Cloudera's training VM

2010-02-17 Thread Todd Lipcon
Hi Dan, This is actually a bug in the release you're using. Please run: $ sudo apt-get update $ sudo apt-get install hadoop-0.20 Then restart the daemons (or the entire VM) and give it another go. Thanks -Todd On Wed, Feb 17, 2010 at 7:56 PM, Dan Starr wrote: > Yes, I have tried that when pas

Re: Hadoop Streaming File-not-found error on Cloudera's training VM

2010-02-17 Thread Dan Starr
Yes, I have tried that when passing the script. Just now I tried: hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-0.20.1+133-streaming.jar -mapper blah.py -reducer org.apache.hadoop.mapred.lib.IdentityReducer -input test_input/* -output output -file blah.py And got this error for a map

Re: Hadoop Streaming File-not-found error on Cloudera's training VM

2010-02-17 Thread Todd Lipcon
Are you passing the Python script to the cluster using the -file option? e.g. -mapper foo.py -file foo.py Thanks -Todd On Wed, Feb 17, 2010 at 7:45 PM, Dan Starr wrote: > Hi, I've tried posting this to Cloudera's community support site, but > the community website getsatisfaction.com returns vario

Hadoop Streaming File-not-found error on Cloudera's training VM

2010-02-17 Thread Dan Starr
Hi, I've tried posting this to Cloudera's community support site, but the community website getsatisfaction.com returns various server errors at the moment.  I believe the following is an issue related to my environment within Cloudera's Training virtual machine. Despite having success running Had

Re: Pass the TaskId from map to Reduce

2010-02-17 Thread ANKITBHATNAGAR
Hi Don, Thanks for your reply. I already tried this approach; however, the issue I am facing is that I was expecting all the maps to finish before any reduce starts. This is not happening for me. It looks like as soon as one map finishes, a reduce starts. That's why I called close(). Could you tell me wh

Re: Pass the TaskId from map to Reduce

2010-02-17 Thread Don Bosco
Hi Ankit, For your problem, you can use "getJobId();" in reduce(); then you will have the unique name and you can process the file in the reduce. ANKITBHATNAGAR wrote: > > Hi, > > I was working on a scenario wherein I am generating a file in the close() > function of my Map implementation. >
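
For reference, a minimal sketch of one way to get that unique name in the old (0.20-era) API. Don's getJobId() may be a helper from his own code; the framework-set configuration properties mapred.job.id and mapred.task.id carry the same information:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class IdAwareReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {
      private String jobId;   // e.g. job_201002171234_0001; same for every task
      private String taskId;  // unique per task attempt

      @Override
      public void configure(JobConf job) {
        // Both properties are set by the framework before the task runs.
        jobId = job.get("mapred.job.id");
        taskId = job.get("mapred.task.id");
      }

      public void reduce(Text key, Iterator<Text> values,
          OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // jobId can name the files the maps wrote, e.g. /tmp/<jobId>/part-...
      }
    }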

Re: MiniDFSCluster accessed via hdfs:// URL

2010-02-17 Thread Philip Zeyliger
Out of curiosity, what was the crux of the problem? -- Philip On Wed, Feb 17, 2010 at 4:17 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > Ok, I got this working... Thanks Philip! > > On Wed, Feb 17, 2010 at 4:01 PM, Jason Rutherglen > wrote: > > Philip, > > > > Thanks... I examined

Re: MiniDFSCluster accessed via hdfs:// URL

2010-02-17 Thread Jason Rutherglen
Ok, I got this working... Thanks Philip! On Wed, Feb 17, 2010 at 4:01 PM, Jason Rutherglen wrote: > Philip, > > Thanks... I examined your patch, however I don't see the difference > between it and what I've got currently which is: > > Configuration conf = new Configuration(); > MiniDFSCluster dfs

Re: MiniDFSCluster accessed via hdfs:// URL

2010-02-17 Thread Jason Rutherglen
Philip, Thanks... I examined your patch, however I don't see the difference between it and what I've got currently which is: Configuration conf = new Configuration(); MiniDFSCluster dfs = new MiniDFSCluster(conf, 1, true, null); URI uri = dfs.getFileSystem().getUri(); System.out.println("uri:" +
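
For reference, a complete minimal sketch of the pattern under discussion, using the 0.20-era MiniDFSCluster test class: start one datanode, print the cluster's hdfs:// URI, and connect back through it as an external client would:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;

    public class MiniClusterUri {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // 1 datanode, format the namespace, default rack assignments.
        MiniDFSCluster dfs = new MiniDFSCluster(conf, 1, true, null);
        try {
          // The cluster picks a free port; the URI looks like hdfs://localhost:PORT
          URI uri = dfs.getFileSystem().getUri();
          System.out.println("uri: " + uri);
          // Reconnect through the hdfs:// URI rather than the returned handle.
          FileSystem fs = FileSystem.get(uri, conf);
          fs.mkdirs(new Path("/test"));
          System.out.println("exists: " + fs.exists(new Path("/test")));
        } finally {
          dfs.shutdown();
        }
      }
    }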

Re: LZO compression for Map output in Hadoop 0.20+?

2010-02-17 Thread Arun C Murthy
Use the following knobs: mapred.compress.map.output = true mapred.map.output.compression.codec = org.apache.hadoop.io.compress.LzoCodec or call jobConf.setMapOutputCompressorClass(LzoCodec.class); You will need the native hadoop-gpl-compression library installed on all machines from http
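
A minimal sketch of the JobConf route (the codec class follows Arun's message; note that some LZO packages ship the codec as com.hadoop.compression.lzo.LzoCodec instead, so adjust the import to whatever your build provides):

    import org.apache.hadoop.io.compress.LzoCodec;
    import org.apache.hadoop.mapred.JobConf;

    public class LzoMapOutput {
      public static void configure(JobConf job) {
        // Equivalent to mapred.compress.map.output = true
        job.setCompressMapOutput(true);
        // Equivalent to mapred.map.output.compression.codec = ...LzoCodec
        job.setMapOutputCompressorClass(LzoCodec.class);
      }
    }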

Question about Join.java example

2010-02-17 Thread Raymond Jennings III
Is there a typo in the Join.java example that comes with Hadoop? It has the line: JobConf jobConf = new JobConf(getConf(), Sort.class); Shouldn't that be Join.class? Is there an equivalent example that uses the newer API instead of the deprecated calls?
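
Presumably the corrected line would read:

    JobConf jobConf = new JobConf(getConf(), Join.class);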

Re: Why is $JAVA_HOME/lib/tools.jar in the classpath?

2010-02-17 Thread Aaron Kimball
Thomas, What version of Hadoop are you building Debian packages for? If you're taking Cloudera's existing debs and modifying them, these include a backport of Sqoop (from Apache's trunk) which uses the JDK's tools.jar to compile auto-generated code at runtime. Later versions of Sqoop (including the o

Re: Hadoop automatic job status check and notification?

2010-02-17 Thread Edward Capriolo
On Wed, Feb 17, 2010 at 1:03 PM, jiang licht wrote: > Amogh, this really helps me a lot! Thanks! > > So, in summary, I guess there are the following options to do job > notification or more generally job management stuff. I also guess Oozie / > cascading is the better choice when we need to hand

Re: LZO compression for Map output in Hadoop 0.20+?

2010-02-17 Thread himanshu chandola
Haven't seen a part 2; I think this was complete. Morpheus: Do you believe in fate, Neo? Neo: No. Morpheus: Why Not? Neo: Because I don't like the idea that I'm not in control of my life. - Original Message From: jiang licht To: common-user@hadoop.apache.org Sent: Wed, February 17

Re: Hadoop automatic job status check and notification?

2010-02-17 Thread jiang licht
Amogh, this really helps me a lot! Thanks! So, in summary, I guess there are the following options to do job notification or more generally job management stuff. I also guess Oozie / cascading is the better choice when we need to handle these externally. Anyway, without deep exploration of all

Re: Difficulty connecting Hadoop JMX service

2010-02-17 Thread Edward Capriolo
On Wed, Feb 17, 2010 at 11:22 AM, viral shah wrote: > I want to monitor my Hadoop cluster services using the check_jmx Nagios plugin. > I use the following env. variables in the hadoop-env.sh file: > export HADOOP_OPTS="-Dcom.sun.management.jmxremote.authenticate=false > -Dcom.sun.management.jmxremote.ssl=

Difficulty connecting Hadoop JMX service

2010-02-17 Thread viral shah
I want to monitor my Hadoop cluster services using the check_jmx Nagios plugin. I use the following env. variables in the hadoop-env.sh file: export HADOOP_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" # Command specific options appended to HADOOP_OPTS whe

Re: Issue with Hadoop cluster on Amazon ec2

2010-02-17 Thread Steve Loughran
viral shah wrote: Hi, We have deployed a Hadoop cluster on EC2, Hadoop version 0.20.1. We have a couple of datanodes. We want to get some files from a datanode on an Amazon EC2 instance to our local instance using a Java application, which in turn uses SequenceFile.Reader to

Issue with Hadoop cluster on Amazon ec2

2010-02-17 Thread viral shah
Hi, We have deployed a Hadoop cluster on EC2, Hadoop version 0.20.1. We have a couple of datanodes. We want to get some files from a datanode on an Amazon EC2 instance to our local instance using a Java application, which in turn uses SequenceFile.Reader to read the file. The prob
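
A minimal sketch of the client-side read being described (assuming "SequentialFile.reader" means org.apache.hadoop.io.SequenceFile.Reader; the namenode address and path are hypothetical, and the client also needs direct network access to the datanodes' ports, which is the usual sticking point from outside EC2):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class RemoteSequenceFileRead {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical EC2 namenode address.
        FileSystem fs = FileSystem.get(
            URI.create("hdfs://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:8020"),
            conf);
        Path path = new Path("/user/hadoop/data.seq"); // hypothetical file
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
          // Instantiate the key/value types recorded in the file header.
          Writable key = (Writable)
              ReflectionUtils.newInstance(reader.getKeyClass(), conf);
          Writable value = (Writable)
              ReflectionUtils.newInstance(reader.getValueClass(), conf);
          while (reader.next(key, value)) {
            System.out.println(key + "\t" + value);
          }
        } finally {
          reader.close();
        }
      }
    }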

Need your Help sir

2010-02-17 Thread tiru murugan
Dear sir, I want your help: I want to deploy hadoop core using Eclipse. Hadoop core is now divided into hadoop-common, hadoop-hdfs, and hadoop-mapreduce. I have tried many times; hadoop-common and hadoop-mapreduce build successfully, and hadoop-hdfs also builds successfully. My doubt is, when I b

Re: Reducer stuck at pending state

2010-02-17 Thread Song Liu
Hi Todd, I'm using Hadoop 0.20.1, Apache distribution. I didn't set the property you mentioned, so I think it remains at the default (1G?). The cluster I'm playing with has four master nodes and 96 slave nodes physically. Hadoop uses one master node for the namenode and jobtracker, and picks 12 nodes

Re: Hadoop automatic job status check and notification?

2010-02-17 Thread Amogh Vasekar
Hi, In our case we launched Pig from a Perl script and handled re-execution, clean-up, etc. from there. If you need to implement a workflow or DAG-like model, consider looking at Oozie / Cascading. If you are interested in diving a little deeper, you can try embedded Pig. Amogh On 2/17/10 1:53 PM,

Re: LZO compression for Map output in Hadoop 0.20+?

2010-02-17 Thread jiang licht
Thanks Himanshu. Is there a part 2? -- Michael --- On Tue, 2/16/10, himanshu chandola wrote: From: himanshu chandola Subject: Re: LZO compression for Map output in Hadoop 0.20+? To: common-user@hadoop.apache.org Date: Tuesday, February 16, 2010, 11:35 PM You might want to check out this: htt

Re: Hadoop automatic job status check and notification?

2010-02-17 Thread jiang licht
Thanks Amogh. So, I think the following will do the job: public void setJobEndNotificationURI(String uri). But what about Hadoop jobs written in Pig scripts? Since Pig will take control, is there some convenient way to do the same thing as well? Thanks! -- Michael --- On Wed, 2/17/10, Amogh Va
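
A minimal sketch of that notification hook in the old API (the callback URL is hypothetical; Hadoop issues an HTTP GET to it when the job finishes, substituting the $jobId and $jobStatus placeholders):

    import org.apache.hadoop.mapred.JobConf;

    public class NotifyingJob {
      public static void configure(JobConf job) {
        // Equivalent to setting the job.end.notification.url property;
        // $jobId and $jobStatus are replaced with actual values on callback.
        job.setJobEndNotificationURI(
            "http://monitor.example.com/hadoop?jobId=$jobId&status=$jobStatus");
      }
    }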