Re: Re: Doubt in Hadoop

2009-11-26 Thread aa225
Hi, I am running the job from the command line. The job runs fine in local mode but something happens when I try to run it in distributed mode. Abhishek Agrawal SUNY- Buffalo (716-435-7122) On Fri 11/27/09 2:31 AM , Jeff Zhang zjf...@gmail.com sent: > Do you run the map reduce job

Re: Doubt in Hadoop

2009-11-26 Thread Jeff Zhang
Do you run the map reduce job in the command line or an IDE? In map reduce mode, you should put the jar containing the map and reduce classes in your classpath Jeff Zhang On Fri, Nov 27, 2009 at 2:19 PM, wrote: > Hello Everybody, >I have a doubt in Hadoop and was wondering if anybody

Re: RE: please help in setting hadoop

2009-11-26 Thread aa225
Hi, There should be a folder called logs in $HADOOP_HOME. Also try going through http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29. This is a pretty good tutorial. Abhishek Agrawal SUNY- Buffalo (716-435-7122) On Fri 11/27/09 1:18 AM , "Krishna Kum

Doubt in Hadoop

2009-11-26 Thread aa225
Hello Everybody, I have a doubt in Hadoop and was wondering if anybody has faced a similar problem. I have a package called test. Inside that I have classes called A.java, Map.java, Reduce.java. In A.java I have the main method where I am trying to initialize the JobConf object. I h

RE: please help in setting hadoop

2009-11-26 Thread Krishna Kumar
I have tried, but didn't get any success. Btw, can you please tell the exact path of the log file which I have to refer to. Thanks and Best Regards, Krishna Kumar Senior Storage Engineer Why do we have to die? If we had to die, and everything is gone after that, then nothing else matters on this earth -

Re: please help in setting hadoop

2009-11-26 Thread aa225
Hi, Just a thought, but you do not need to set up the temp directory in conf/hadoop-site.xml, especially if you are running basic examples. Give that a shot; maybe it will work out. Otherwise see if you can find additional info in the logs. Thank You Abhishek Agrawal SUNY- Buffalo (716-435-712

please help in setting hadoop

2009-11-26 Thread Krishna Kumar
Dear All, Can anybody please help me in getting out of these error messages: [r...@master hadoop]# hadoop jar /usr/lib/hadoop/hadoop-0.18.3-14.cloudera.CH0_3-examples.jar wordcount test test-op 09/11/26 17:15:45 INFO mapred.FileInputFormat: Total input paths to process : 4 0

Re: log files on the cluster?

2009-11-26 Thread Mark Kerzner
Thank you, that pretty much does it, the logs on EC2 are in /mnt/hadoop/logs On Thu, Nov 26, 2009 at 10:43 PM, Siddu wrote: > On Fri, Nov 27, 2009 at 6:28 AM, Mark Kerzner > wrote: > > > Hi, > > > > it is probably described somewhere in the manuals, but > > > > > > 1. Where are the log files,

Re: Hadoop 0.20 map/reduce Failing for old API

2009-11-26 Thread Rekha Joshi
The exit status of 1 usually indicates configuration issues or incorrect command invocation in hadoop 0.20 (incorrect params), if not a JVM crash. In your logs there is no indication of a crash, but some paths/commands can be the cause. Can you check if your lib paths/data paths are correct? If it is a

Re: log files on the cluster?

2009-11-26 Thread Siddu
On Fri, Nov 27, 2009 at 6:28 AM, Mark Kerzner wrote: > Hi, > > it is probably described somewhere in the manuals, but > > > 1. Where are the log files, especially those that show my > System.out.println() and errors; and > Look at the logs directory ... > 2. Do I need to log in to every ma

RE: Good idea to run NameNode and JobTracker on same machine?

2009-11-26 Thread Srigurunath Chakravarthi
Raymond, Load-wise, it should be very safe to run both JT and NN on a single node for small clusters (< 40 Task Trackers and/or Data Nodes). They don't use much CPU as such. This may even work for larger clusters depending on the type of hardware you have and the Hadoop job mix. We usually obs

Re: part-00000.deflate as output

2009-11-26 Thread Mark Kerzner
It worked! But why is it "for testing"? I only have one job, so I need my output as text; can I use this fix all the time? Thank you, Mark On Thu, Nov 26, 2009 at 1:10 AM, Tim Kiefer wrote: > For testing purposes you can also try to disable the compression: > > conf.setBoolean("mapred.output.
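For readers of this thread: the fix being discussed (disabling compressed job output so the result is plain text rather than a part-00000.deflate file) can also be set in the job configuration instead of in code. A minimal sketch, assuming the pre-0.21 property name used by this Hadoop generation:

```xml
<!-- hadoop-site.xml (or per-job config): emit plain-text output
     instead of .deflate-compressed files -->
<property>
  <name>mapred.output.compress</name>
  <value>false</value>
</property>
```

Compression of final output is an optimization, not a requirement, so leaving it off for all jobs is fine when the output must be human-readable text.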

log files on the cluster?

2009-11-26 Thread Mark Kerzner
Hi, it is probably described somewhere in the manuals, but 1. Where are the log files, especially those that show my System.out.println() and errors; and 2. Do I need to log in to every machine on the cluster? Thank you, Mark
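For anyone landing on this thread later: in a stock 0.18/0.20 install, daemon logs and per-task output live under the TaskTracker's log directory on each node. A sketch of where to look (paths assume the default layout; adjust $HADOOP_HOME for your install):

```shell
# daemon logs (namenode, jobtracker, tasktracker, datanode)
ls $HADOOP_HOME/logs/

# per-task output: System.out.println() from map/reduce tasks goes to
# each task attempt's stdout file
ls $HADOOP_HOME/logs/userlogs/
```

The JobTracker web UI (port 50030 by default) links to the same per-task logs, which avoids logging in to every machine by hand.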

Re: Good idea to run NameNode and JobTracker on same machine?

2009-11-26 Thread John Martyniak
I have a cluster of 4 machines plus one machine to run nn & jt. I have heard that 5 or 6 is the magic #. I will see when I add the next batch of machines. And it seems to be running fine. -John On Nov 26, 2009, at 11:38 AM, Yongqiang He wrote: I think it is definitely not a good idea to

Re: Processing 10MB files in Hadoop

2009-11-26 Thread CubicDesign
Are the record processing steps bound by a local machine resource - cpu, disk io or other? Some disk I/O, but not much compared with the CPU. Basically it is CPU bound. This is why each machine has 16 cores. What I often do when I have lots of small files to handle is use the NlineInputFo

Hadoop 0.20 map/reduce Failing for old API

2009-11-26 Thread Arv Mistry
Hi, We've recently upgraded to hadoop 0.20. Writing to HDFS seems to be working fine, but the map/reduce jobs are failing with the following exception. Note, we have not moved to the new map/reduce API yet. In the client that launches the job, the only change I have made is to now load the three f

Re: Good idea to run NameNode and JobTracker on same machine?

2009-11-26 Thread Yongqiang He
I think it is definitely not a good idea to combine these two in production environment. Thanks Yongqiang On 11/26/09 6:26 AM, "Raymond Jennings III" wrote: > Do people normally combine these two processes onto one machine? Currently I > have them on separate machines but I am wondering they us

Re: Processing 10MB files in Hadoop

2009-11-26 Thread Yongqiang He
Try CombineFileInputFormat. Thanks Yongqiang On 11/26/09 4:02 AM, "Cubic" wrote: > Hi list. > > I have small files containing data that has to be processed. A file > can be small, even down to 10MB (but it can also be 100-600MB large) > and contains at least 3 records to be processed. > Proc

Re: Processing 10MB files in Hadoop

2009-11-26 Thread Jason Venner
Are the record processing steps bound by a local machine resource - cpu, disk io or other? What I often do when I have lots of small files to handle is use the NLineInputFormat, as data locality for the input files is a much lesser issue than short task run times in that case. Each line of my inpu
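The NLineInputFormat approach described here boils down to one input-format choice plus one setting. A minimal sketch, assuming the old-API class and property name of this Hadoop generation (in streaming, the input format is selected with -inputformat instead):

```xml
<!-- with org.apache.hadoop.mapred.lib.NLineInputFormat selected as the
     input format, feed each map task this many input lines -->
<property>
  <name>mapred.line.input.format.linespermap</name>
  <value>1</value>
</property>
```

With one line per map, a file listing record identifiers fans out into one short-lived task per record, which suits CPU-heavy records far better than one mapper per small file.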

Re: KeyValueTextInputFormat and Hadoop 0.20.1

2009-11-26 Thread Jeff Zhang
It's in trunk, maybe this is not added in hadoop 0.20.1 On Thu, Nov 26, 2009 at 8:13 AM, Matthias Scherer wrote: > Sorry, but I can't find it in the version control system for release > 0.20.1: > http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1/src/mapred/org/apache/hadoop/map

Re: The name of the current input file during a map

2009-11-26 Thread Owen O'Malley
On Nov 25, 2009, at 11:27 PM, Saptarshi Guha wrote: I'm using Hadoop 0.21 and its context object In the new API you can re-write that as: ((FileSplit) context.getInputSplit()).getPath() -- Owen

Re: KeyValueTextInputFormat and Hadoop 0.20.1

2009-11-26 Thread Matthias Scherer
Sorry, but I can't find it in the version control system for release 0.20.1: http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1/src/mapred/org/apache/hadoop/mapreduce/lib/input/ Do you have another distribution? Regards, Matthias > -Original Message- > From: Jeff

Re: Processing 10MB files in Hadoop

2009-11-26 Thread Jeff Zhang
Quote from the wiki doc: "The number of map tasks can also be increased manually using the JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but will not set the number below that which Hadoop determines via splittin

Re: Processing 10MB files in Hadoop

2009-11-26 Thread CubicDesign
But the documentation DOES recommend setting it: http://wiki.apache.org/hadoop/HowManyMapsAndReduces PS: I am using streaming Jeff Zhang wrote: Actually, you do not need to set the number of map tasks; the InputFormat will compute it for you according to your input data set. Jeff Zhang On Thu,

Re: Processing 10MB files in Hadoop

2009-11-26 Thread Jeff Zhang
Actually, you do not need to set the number of map tasks; the InputFormat will compute it for you according to your input data set. Jeff Zhang On Thu, Nov 26, 2009 at 7:39 AM, CubicDesign wrote: > > The number of mappers is determined by your InputFormat. >> >> In common case, if file is smaller t

Re: Processing 10MB files in Hadoop

2009-11-26 Thread CubicDesign
The number of mappers is determined by your InputFormat. In the common case, if a file is smaller than one block size (which is 64M by default), there is one mapper for this file; if the file is larger than one block size, hadoop will split this large file, and the number of mappers for this file will be ceiling (

Re: Processing 10MB files in Hadoop

2009-11-26 Thread CubicDesign
Sorry for deviating from the question, but curious to know what does "core" here refer to? http://en.wikipedia.org/wiki/Multi-core

Re: KeyValueTextInputFormat and Hadoop 0.20.1

2009-11-26 Thread Jeff Zhang
There's a KeyValueTextInputFormat under package org.apache.hadoop.mapreduce.lib.input which is for the hadoop new API Jeff Zhang On Thu, Nov 26, 2009 at 7:10 AM, Matthias Scherer wrote: > Hi, > > I started my first experimental Hadoop project with Hadoop 0.20.1 an run > in the following problem: > >

KeyValueTextInputFormat and Hadoop 0.20.1

2009-11-26 Thread Matthias Scherer
Hi, I started my first experimental Hadoop project with Hadoop 0.20.1 and ran into the following problem: Job job = new Job(new Configuration(), "Myjob"); job.setInputFormatClass(KeyValueTextInputFormat.class); The last line throws the following error: "The method setInputFormatClass(Class) in the t

Re: Good idea to run NameNode and JobTracker on same machine?

2009-11-26 Thread Jeff Zhang
It depends on the size of your cluster. I think you can combine them together if your cluster has less than 10 machines. Jeff Zhang On Thu, Nov 26, 2009 at 6:26 AM, Raymond Jennings III wrote: > Do people normally combine these two processes onto one machine? Currently > I have them on sep

Good idea to run NameNode and JobTracker on same machine?

2009-11-26 Thread Raymond Jennings III
Do people normally combine these two processes onto one machine? Currently I have them on separate machines, but I am wondering whether they use that much CPU processing time and maybe I should combine them and create another DataNode.

Re: Processing 10MB files in Hadoop

2009-11-26 Thread Jeff Zhang
The number of mappers is determined by your InputFormat. In the common case, if a file is smaller than one block size (which is 64M by default), there is one mapper for this file; if the file is larger than one block size, hadoop will split this large file, and the number of mappers for this file will be ceiling ( (si
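The ceiling rule described here can be sketched as plain arithmetic (illustrative only; the real split logic lives in FileInputFormat and also honors min/max split-size settings):

```java
public class MapperCount {
    // One mapper per HDFS block, rounding up for a trailing partial block.
    static long numMappers(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long block = 64L << 20; // 64 MB default block size
        System.out.println(numMappers(10L << 20, block));  // 10 MB file  -> 1 mapper
        System.out.println(numMappers(600L << 20, block)); // 600 MB file -> 10 mappers
    }
}
```

This is why a 10 MB input file occupies only a single map slot regardless of cluster size: with long per-record processing times, splitting the records across more maps (e.g. via NLineInputFormat) matters more than the file count.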

Re: Processing 10MB files in Hadoop

2009-11-26 Thread Siddu
On Thu, Nov 26, 2009 at 5:32 PM, Cubic wrote: > Hi list. > > I have small files containing data that has to be processed. A file > can be small, even down to 10MB (but it can also be 100-600MB large) > and contains at least 3 records to be processed. > Processing one record can take 30 second

Processing 10MB files in Hadoop

2009-11-26 Thread Cubic
Hi list. I have small files containing data that has to be processed. A file can be small, even down to 10MB (but it can also be 100-600MB large) and contains at least 3 records to be processed. Processing one record can take 30 seconds to 2 minutes. My cluster is about 10 nodes. Each node has

Re: The name of the current input file during a map

2009-11-26 Thread Amogh Vasekar
Use "map.input.file" instead of "mapred.input.file" - that should work. Amogh On 11/26/09 12:57 PM, "Saptarshi Guha" wrote: Hello again, I'm using Hadoop 0.21 and its context object e.g public void setup(Context context) { Configuration cfg = context.getConfiguration(); System.out.println("mapred.input.f