Tasks that take a long time to finish?

2011-07-14 Thread felix gao
Recently we had some resolver issues, so I added the IPs of all the slaves, the namenode, and the jobtracker to the /etc/hosts file on all the slaves, the namenode, and the jobtracker. This is one of the 5000+ task attempts; it seems every task is taking around 6 minutes to process. I don't have the number in hand b
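A consistent /etc/hosts replicated to every node keeps DNS lookups out of the heartbeat and scheduling paths. A minimal sketch, with hypothetical hostnames and addresses:

```
# /etc/hosts -- hypothetical names/addresses; keep identical on every node
10.0.0.10  namenode.example.com    namenode
10.0.0.11  jobtracker.example.com  jobtracker
10.0.0.21  slave01.example.com     slave01
10.0.0.22  slave02.example.com     slave02
```

Forward and reverse resolution should agree with what the daemons report in their heartbeats, or tasks can still stall on lookups.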

Re: Job takes a very long time to complete

2011-07-14 Thread felix gao
t. > > --Bobby > > > On 7/14/11 3:45 PM, "felix gao" wrote: > > we didn't do anything on the cluster end; the company hosting our cluster > did a BGP update (whatever that means) and a full reset. (I think just a reboot > of the switches) > > On Th

Re: Job takes a very long time to complete

2011-07-14 Thread felix gao
configuration? What did you > do to fix the “networking issues”? > > --Bobby Evans > > > On 7/14/11 2:46 PM, "felix gao" wrote: > > recently we had some network issues with our cluster. this job used to > take only a few minutes to complete and now it is taking o

Job takes a very long time to complete

2011-07-14 Thread felix gao
Recently we had some network issues with our cluster. This job used to take only a few minutes to complete, and now it is taking over half an hour. When looking at the jobtracker's log, I see it slowly getting all the splits information (the list is not exhaustive): 2011-07-14 14:42:51,434 INFO org.apache.h

Re: How to speed up my slaves

2011-03-02 Thread felix gao
formatted log? On Wed, Mar 2, 2011 at 10:19 AM, felix gao wrote: > Hello experts, > > I have recently been testing a set of logs that I converted to Avro format in > Hadoop. I am noticing really slow performance compared to raw > logs. The map logs shown below seem to indicate

How to speed up my slaves

2011-03-02 Thread felix gao
Hello experts, I have recently been testing a set of logs that I converted to Avro format in Hadoop. I am noticing really slow performance compared to raw logs. The map logs shown below seem to indicate that setting up the JVM took the longest time. I am wondering if there is anything I can tweak in
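When JVM startup dominates short map tasks, task-JVM reuse (available since Hadoop 0.19) usually helps: the same JVM runs several tasks of the same job in sequence. A hedged configuration sketch (values illustrative; set per job or in mapred-site.xml):

```xml
<!-- Sketch: amortize JVM startup across tasks of a job.
     -1 means unlimited reuse within a single job; the default is 1. -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
```

Note this only removes startup overhead; if Avro decoding itself is slower than scanning raw text, that cost remains in the map phase.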

Re: Best practice for batch file conversions

2011-02-09 Thread felix gao
che/hadoop/mapreduce/lib/output? Or is there more magic under the hood than that? Felix On Wed, Feb 9, 2011 at 4:26 PM, felix gao wrote: > Sonal, > > can you tell me how to use MultipleOutputFormat in my Mapper? I want > to read a line of text and convert it to some other format an

Re: Best practice for batch file conversions

2011-02-09 Thread felix gao
in.linkedin.com/in/sonalgoyal> > > > > > > On Wed, Feb 9, 2011 at 5:22 AM, felix gao wrote: > >> I am stuck again. The binary files are stored in HDFS under some >> pre-defined structure like >> root/ >> |-- dir1 >> | |-- file1 >>

Re: Best practice for batch file conversions

2011-02-08 Thread felix gao
8, 2011 at 9:43 AM, felix gao wrote: > thanks a lot for the pointer. I will play around with it. > > > On Mon, Feb 7, 2011 at 10:55 PM, Sonal Goyal wrote: > >> Hi, >> >> You can use FileStreamInputFormat which returns the file stream as the >> value. >

Re: Best practice for batch file conversions

2011-02-08 Thread felix gao
> Extend FileInputFormat, and write your own binary-format based >> implementation of it, and make it non-splittable (isSplitable should >> return false). This way, a Mapper would get a whole file, and you >> shouldn't have block-splitting issues. >> >> On Tue, Fe

Best practice for batch file conversions

2011-02-07 Thread felix gao
Hello users of Hadoop, I have a task to convert large binary files from one format to another. I am wondering what the best practice is to do this. Basically, I am trying to get one mapper to work on each binary file, and I am not sure how to do that in Hadoop properly. Thanks, Felix
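The advice given later in this thread (extend FileInputFormat and make it non-splittable) can be sketched roughly as below, against the old org.apache.hadoop.mapred API of that era. The class names WholeFileInputFormat/WholeFileRecordReader are hypothetical, not part of Hadoop:

```java
// Sketch: a non-splittable input format so each mapper receives exactly
// one whole binary file as a single (NullWritable, BytesWritable) record.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.*;

public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;  // one split per file => one mapper per file
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }
}

class WholeFileRecordReader
    implements RecordReader<NullWritable, BytesWritable> {
  private final FileSplit split;
  private final JobConf conf;
  private boolean processed = false;

  WholeFileRecordReader(FileSplit split, JobConf conf) {
    this.split = split;
    this.conf = conf;
  }

  public boolean next(NullWritable key, BytesWritable value)
      throws IOException {
    if (processed) return false;
    // Read the entire file into the value (assumes files fit in memory).
    byte[] contents = new byte[(int) split.getLength()];
    Path file = split.getPath();
    FSDataInputStream in = file.getFileSystem(conf).open(file);
    try {
      IOUtils.readFully(in, contents, 0, contents.length);
    } finally {
      IOUtils.closeStream(in);
    }
    value.set(contents, 0, contents.length);
    processed = true;
    return true;
  }

  public NullWritable createKey() { return NullWritable.get(); }
  public BytesWritable createValue() { return new BytesWritable(); }
  public long getPos() { return processed ? split.getLength() : 0; }
  public float getProgress() { return processed ? 1.0f : 0.0f; }
  public void close() {}
}
```

This pattern assumes each binary file fits comfortably in a task's heap; very large files would need a streaming record reader instead.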

Streaming job in Python that reports progress

2011-01-28 Thread felix gao
mighty user group, I am trying to write a streaming job that does a lot of I/O in a Python program. I know that if I don't report back every x minutes, the job will be terminated. How do I report back to the tasktracker from my streaming Python job when it is in the middle of a gzip operation, for example? Thanks,
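Hadoop Streaming tasks report liveness by writing specially formatted lines to stderr: `reporter:status:<message>` updates the task status, and `reporter:counter:<group>,<counter>,<amount>` increments a counter; either one resets the task timeout (mapred.task.timeout, 10 minutes by default). A minimal sketch, where the chunked processing loop is a hypothetical stand-in for the long-running gzip work:

```python
import sys

def report_status(message):
    """Tell the tasktracker this task is alive (Hadoop Streaming stderr protocol)."""
    sys.stderr.write("reporter:status:%s\n" % message)
    sys.stderr.flush()

def report_counter(group, counter, amount=1):
    """Increment a custom job counter; this also resets the task timeout."""
    sys.stderr.write("reporter:counter:%s,%s,%d\n" % (group, counter, amount))
    sys.stderr.flush()

def process(stream, chunk_size=64 * 1024):
    """Hypothetical long-running loop: report periodically so the
    task timeout never fires while we do heavy I/O."""
    chunks = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # ... expensive work on `chunk` (e.g. decompression) goes here ...
        chunks += 1
        if chunks % 100 == 0:
            report_status("processed %d chunks" % chunks)
    return chunks
```

Anything written to stderr that does not match the `reporter:` prefix just lands in the task's stderr log, so ordinary debug output is unaffected.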

Re: How does Hadoop work in detail

2011-01-13 Thread felix gao
you'll need to wait a bit for the release I'm working > towards to use the max-limit feature. It's present in > hadoop-0.21/hadoop-0.22 presently, but not in hadoop-0.20. > > Arun > > On Jan 12, 2011, at 4:55 PM, felix gao wrote: > > Arun, > > I went t

Re: How does Hadoop work in detail

2011-01-12 Thread felix gao
tever we decide > to call it: > > http://www.mail-archive.com/general@hadoop.apache.org/msg02670.html > > Arun > > On Jan 12, 2011, at 9:40 AM, felix gao wrote: > > Arun, > > The information is very helpful. What scheduler do you suggest when we > have mixed of

Re: How does Hadoop work in detail

2011-01-12 Thread felix gao
> On Dec 29, 2010, at 2:43 PM, felix gao wrote: > > Hi all, >> >> I am trying to figure out what exactly happens inside the job. >> >> 1) When the jobtracker launches a task to be run, how does it impact the >> currently running jobs if the current running j

How does Hadoop work in detail

2010-12-29 Thread felix gao
Hi all, I am trying to figure out what exactly happens inside a job. 1) When the jobtracker launches a task to be run, how does it impact the currently running jobs if those jobs have higher, the same, or lower priorities using the default queue? 2) What if a low priority job is runn

How to record the bad records encountered by hadoop

2010-12-20 Thread felix gao
All, Not sure if this is the right mailing list for this question. I am using Pig to do some data analysis, and I am wondering if there is a way to tell Hadoop, when it encounters a bad log file, whether due to decompression failures or whatever else caused the job to die, to record the line and, if possible, th
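Hadoop's MapReduce layer has a "skipping bad records" mode that may cover part of this: after repeated task failures it bisects the input to isolate the offending records, skips them, and writes them out as sequence files (by default under _logs/skip in the output directory, unless mapred.skip.out.dir is set). A hedged configuration sketch with illustrative values:

```xml
<!-- Sketch: enable bad-record skipping for map tasks. -->
<property>
  <name>mapred.skip.map.max.skip.records</name>
  <value>1</value>  <!-- acceptable number of records to skip around each failure -->
</property>
<property>
  <name>mapred.skip.attempts.to.start.skipping</name>
  <value>2</value>  <!-- start skipping mode after this many failed attempts -->
</property>
```

Note this operates at the record level, so it will not help when an entire file fails to decompress; that case still fails the whole split.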

Question about copyFromLocal

2010-12-14 Thread felix gao
Hi all, I have a couple of boxes that need to periodically copy data from their local disks to HDFS via the HDFS client, by issuing the hadoop fs -copyFromLocal src dest command. The files are rather large, and I am wondering whether there is any way to make Hadoop transmit the data compressed, and when it
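HDFS does not compress data in transit for you, so the usual approach is to compress locally before (or while) uploading. A hedged CLI sketch with hypothetical paths; hadoop fs -put reads from stdin when the source is "-":

```shell
# Compress locally, then stream the result straight into HDFS.
gzip -c /var/log/app/access.log | hadoop fs -put - /logs/access.log.gz
```

Keep in mind that plain gzip files are not splittable, so each file will be handled by a single mapper; for large inputs a splittable codec (bzip2) or a container format (SequenceFile, Avro) is often a better target.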

Re: newbie question on Hadoop's NoClassDefFoundError

2009-11-17 Thread felix gao
had...@gmail.com] > *Sent:* Tuesday, November 17, 2009 1:26 AM > *To:* mapreduce-user@hadoop.apache.org > *Subject:* Re: newbie question on Hadoop's NoClassDefFoundError > > > > Your eclipse instance doesn't have the jar files in the lib directory of > your hadoop

newbie question on Hadoop's NoClassDefFoundError

2009-11-14 Thread felix gao
I wrote some simple code in Eclipse: Text t = new Text("hadoop"); System.out.println((char)t.charAt(2)); When I try to run this I get: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory at org.apache.hadoop.io.Text.(Text.java:49) at com.exam

Hadoop 0.20.* way of writing the LineIndexer example

2009-10-30 Thread felix gao
Hi all, I am trying to learn how to use Hadoop, and I figured that since I am learning, I might as well learn the latest syntax for it. The code for LineIndexer is available online: public class LineIndexMapper extends MapReduceBase implements Mapper here is the method signature for the mapper only i
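In the 0.20 "new" API (org.apache.hadoop.mapreduce), the mapper is an abstract class rather than an interface, and MapReduceBase/OutputCollector/Reporter are replaced by a single Context object. A hedged sketch of the LineIndexer mapper ported to that API (the word@file-offset output format follows the classic example, but details may differ from the original tutorial code):

```java
// Sketch: LineIndexer mapper on the Hadoop 0.20 org.apache.hadoop.mapreduce API.
// Emits (word, "filename@offset") pairs for each token on the line.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class LineIndexMapper
    extends Mapper<LongWritable, Text, Text, Text> {

  private final Text word = new Text();
  private final Text location = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // In the new API the current input file comes from the context,
    // not from a JobConf passed to configure().
    FileSplit split = (FileSplit) context.getInputSplit();
    String fileName = split.getPath().getName();
    location.set(fileName + "@" + offset.get());
    for (String token : line.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        word.set(token);
        context.write(word, location);  // replaces OutputCollector.collect()
      }
    }
  }
}
```

The driver side changes similarly: Job replaces JobConf/JobClient, and the mapper is registered with job.setMapperClass(LineIndexMapper.class).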

Re: Question regarding wordCount example

2009-10-26 Thread felix gao
blic void map(Long key, Text value, OutputCollector > output, > Reporter reporter) throws IOException { > // TODO Auto-generated method stub > > } > } > > use generics as much as you can > > > > Jeff Zhang > > > > On Mon, Oct 26, 200

Question regarding wordCount example

2009-10-25 Thread felix gao
Hi all, I have some questions regarding how to compile a simple Hadoop program. Setup: Java 1.6, Ubuntu 9.02, Hadoop 0.19.2. //below is the mapper class import java.io.IOException; import java.util.StringTokenizer; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.LongWritable; im
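For a setup like the one described, compiling and running a job from the command line looks roughly like the sketch below. The jar name, class names, and paths are illustrative and depend on the actual install:

```shell
# Sketch: compile against the Hadoop 0.19.2 core jar, package, and run.
mkdir -p classes
javac -classpath $HADOOP_HOME/hadoop-0.19.2-core.jar -d classes \
    WordCountMapper.java WordCountReducer.java WordCount.java
jar cf wordcount.jar -C classes .
hadoop jar wordcount.jar WordCount input/ output/
```

The common pitfall is compiling against a different Hadoop version than the one on the cluster, which surfaces later as class-incompatibility errors at job submission.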