Tasks that take a long time to finish?

2011-07-14 Thread felix gao
Recently we had some resolver issues, so I added the IPs of all the slaves, the namenode, and the jobtracker to the /etc/hosts file on all the slaves, the namenode, and the jobtracker. This is one of the 5000+ task attempts; it seems every task is taking around 6 minutes to process. I don't have the number in hand b
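A consistent /etc/hosts replicated to every node keeps DNS lookups out of the heartbeat and scheduling paths. A minimal sketch, with hypothetical hostnames and addresses:

```
# /etc/hosts -- hypothetical names/addresses; keep identical on every node
10.0.0.10  namenode.example.com    namenode
10.0.0.11  jobtracker.example.com  jobtracker
10.0.0.21  slave01.example.com     slave01
10.0.0.22  slave02.example.com     slave02
```

Forward and reverse resolution should agree with what the daemons report in their heartbeats, or tasks can still stall on lookups.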

Re: Job takes a very long time to complete

2011-07-14 Thread felix gao
t. > > --Bobby > > > On 7/14/11 3:45 PM, "felix gao" wrote: > > we didn't do anything on the cluster end; the company hosting our cluster > did a BGP update (whatever that means) and a full reset. (I think just a reboot > of the switches) > > On Th

Re: Job takes a very long time to complete

2011-07-14 Thread felix gao
configuration? What did you > do to fix the “networking issues”? > > --Bobby Evans > > > On 7/14/11 2:46 PM, "felix gao" wrote: > > recently we had some network issues with our cluster. this job used to > take only a few minutes to complete and now it is taking o

Job takes a very long time to complete

2011-07-14 Thread felix gao
Recently we had some network issues with our cluster. This job used to take only a few minutes to complete, and now it is taking over half an hour. When looking at the jobtracker's log, I see it slowly getting all the splits information (the list is not exhaustive): 2011-07-14 14:42:51,434 INFO org.apache.h

Re: How to speed up my slaves

2011-03-02 Thread felix gao
formatted log? On Wed, Mar 2, 2011 at 10:19 AM, felix gao wrote: > Hello experts, > > I have recently been testing a set of logs that I converted to Avro format in > Hadoop. I am noticing really slow performance compared to raw > logs. The map logs shown below seem to indicate

How to speed up my slaves

2011-03-02 Thread felix gao
Hello experts, I have recently been testing a set of logs that I converted to Avro format in Hadoop. I am noticing really slow performance compared to raw logs. The map logs shown below seem to indicate that setting up the JVM took the longest time. I am wondering if there is anything I can tweak in
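When JVM startup dominates short map tasks, task-JVM reuse (available since Hadoop 0.19) usually helps: the same JVM runs several tasks of the same job in sequence. A hedged configuration sketch (values illustrative; set per job or in mapred-site.xml):

```xml
<!-- Sketch: amortize JVM startup across tasks of a job.
     -1 means unlimited reuse within a single job; the default is 1. -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
```

Note this only removes startup overhead; if Avro decoding itself is slower than scanning raw text, that cost remains in the map phase.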

Re: Best practice for batch file conversions

2011-02-09 Thread felix gao
che/hadoop/mapreduce/lib/output? Or is there more magic under the hood than that? Felix On Wed, Feb 9, 2011 at 4:26 PM, felix gao wrote: > Sonal, > > can you tell me how to use MultipleOutputFormat in my Mapper? I want > to read a line of text and convert it to some other format an

Re: Best practice for batch file conversions

2011-02-09 Thread felix gao
in.linkedin.com/in/sonalgoyal> > > > > > > On Wed, Feb 9, 2011 at 5:22 AM, felix gao wrote: > >> I am stuck again. The binary files are stored in HDFS under some >> pre-defined structure like >> root/ >> |-- dir1 >> | |-- file1 >>

Re: Best practice for batch file conversions

2011-02-08 Thread felix gao
8, 2011 at 9:43 AM, felix gao wrote: > thanks a lot for the pointer. I will play around with it. > > > On Mon, Feb 7, 2011 at 10:55 PM, Sonal Goyal wrote: > >> Hi, >> >> You can use FileStreamInputFormat which returns the file stream as the >> value. >

Re: Best practice for batch file conversions

2011-02-08 Thread felix gao
> Extend FileInputFormat, and write your own binary-format based >> implementation of it, and make it non-splittable (isSplitable should >> return false). This way, a Mapper would get a whole file, and you >> shouldn't have block-splitting issues. >> >> On Tue, Fe

Best practice for batch file conversions

2011-02-07 Thread felix gao
Hello users of Hadoop, I have a task to convert large binary files from one format to another. I am wondering what the best practice is to do this. Basically, I am trying to get one mapper to work on each binary file, and I am not sure how to do that in Hadoop properly. Thanks, Felix
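The advice given later in this thread (extend FileInputFormat and make it non-splittable) can be sketched roughly as below, against the old org.apache.hadoop.mapred API of that era. The class names WholeFileInputFormat/WholeFileRecordReader are hypothetical, not part of Hadoop:

```java
// Sketch: a non-splittable input format so each mapper receives exactly
// one whole binary file as a single (NullWritable, BytesWritable) record.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.*;

public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;  // one split per file => one mapper per file
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }
}

class WholeFileRecordReader
    implements RecordReader<NullWritable, BytesWritable> {
  private final FileSplit split;
  private final JobConf conf;
  private boolean processed = false;

  WholeFileRecordReader(FileSplit split, JobConf conf) {
    this.split = split;
    this.conf = conf;
  }

  public boolean next(NullWritable key, BytesWritable value)
      throws IOException {
    if (processed) return false;
    // Read the entire file into the value (assumes files fit in memory).
    byte[] contents = new byte[(int) split.getLength()];
    Path file = split.getPath();
    FSDataInputStream in = file.getFileSystem(conf).open(file);
    try {
      IOUtils.readFully(in, contents, 0, contents.length);
    } finally {
      IOUtils.closeStream(in);
    }
    value.set(contents, 0, contents.length);
    processed = true;
    return true;
  }

  public NullWritable createKey() { return NullWritable.get(); }
  public BytesWritable createValue() { return new BytesWritable(); }
  public long getPos() { return processed ? split.getLength() : 0; }
  public float getProgress() { return processed ? 1.0f : 0.0f; }
  public void close() {}
}
```

This pattern assumes each binary file fits comfortably in a task's heap; very large files would need a streaming record reader instead.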

Streaming job in Python that reports progress

2011-01-28 Thread felix gao
mighty user group, I am trying to write a streaming job that does a lot of I/O in a Python program. I know that if I don't report back every x minutes, the job will be terminated. How do I report back to the tasktracker from my streaming Python job when it is in the middle of a gzip operation, for example? Thanks,
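Hadoop Streaming tasks report liveness by writing specially formatted lines to stderr: `reporter:status:<message>` updates the task status, and `reporter:counter:<group>,<counter>,<amount>` increments a counter; either one resets the task timeout (mapred.task.timeout, 10 minutes by default). A minimal sketch, where the chunked processing loop is a hypothetical stand-in for the long-running gzip work:

```python
import sys

def report_status(message):
    """Tell the tasktracker this task is alive (Hadoop Streaming stderr protocol)."""
    sys.stderr.write("reporter:status:%s\n" % message)
    sys.stderr.flush()

def report_counter(group, counter, amount=1):
    """Increment a custom job counter; this also resets the task timeout."""
    sys.stderr.write("reporter:counter:%s,%s,%d\n" % (group, counter, amount))
    sys.stderr.flush()

def process(stream, chunk_size=64 * 1024):
    """Hypothetical long-running loop: report periodically so the
    task timeout never fires while we do heavy I/O."""
    chunks = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # ... expensive work on `chunk` (e.g. decompression) goes here ...
        chunks += 1
        if chunks % 100 == 0:
            report_status("processed %d chunks" % chunks)
    return chunks
```

Anything written to stderr that does not match the `reporter:` prefix just lands in the task's stderr log, so ordinary debug output is unaffected.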

Re: How does Hadoop work in detail

2011-01-13 Thread felix gao
you'll need to wait a bit for the release I'm working > towards to use the max-limit feature. It's present in > hadoop-0.21/hadoop-0.22 presently, but not in hadoop-0.20. > > Arun > > On Jan 12, 2011, at 4:55 PM, felix gao wrote: > > Arun, > > I went t

Re: How does Hadoop work in detail

2011-01-12 Thread felix gao
tever we decide > to call it: > > http://www.mail-archive.com/general@hadoop.apache.org/msg02670.html > > Arun > > On Jan 12, 2011, at 9:40 AM, felix gao wrote: > > Arun, > > The information is very helpful. What scheduler do you suggest when we > have mixed of

Re: How does Hadoop work in detail

2011-01-12 Thread felix gao
> On Dec 29, 2010, at 2:43 PM, felix gao wrote: > > Hi all, >> >> I am trying to figure out what exactly happens inside the job. >> >> 1) When the jobtracker launches a task to be run, how does it impact the >> currently running jobs if the current running j

How does Hadoop work in detail

2010-12-29 Thread felix gao
Hi all, I am trying to figure out what exactly happens inside a job. 1) When the jobtracker launches a task to be run, how does it impact the currently running jobs if those jobs have higher, the same, or lower priorities using the default queue? 2) What if a low priority job is runn

How to record the bad records encountered by hadoop

2010-12-20 Thread felix gao
All, Not sure if this is the right mailing list for this question. I am using Pig to do some data analysis, and I am wondering if there is a way to tell Hadoop, when it encounters a bad log file, whether due to decompression failures or whatever else caused the job to die, to record the line and, if possible, th
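Hadoop's MapReduce layer has a "skipping bad records" mode that may cover part of this: after repeated task failures it bisects the input to isolate the offending records, skips them, and writes them out as sequence files (by default under _logs/skip in the output directory, unless mapred.skip.out.dir is set). A hedged configuration sketch with illustrative values:

```xml
<!-- Sketch: enable bad-record skipping for map tasks. -->
<property>
  <name>mapred.skip.map.max.skip.records</name>
  <value>1</value>  <!-- acceptable number of records to skip around each failure -->
</property>
<property>
  <name>mapred.skip.attempts.to.start.skipping</name>
  <value>2</value>  <!-- start skipping mode after this many failed attempts -->
</property>
```

Note this operates at the record level, so it will not help when an entire file fails to decompress; that case still fails the whole split.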

Question about copyFromLocal

2010-12-14 Thread felix gao
Hi all, I have a couple of boxes that need to periodically copy data from their local disks to HDFS via the HDFS client, by issuing the hadoop fs -copyFromLocal src dest command. The files are rather large, and I am wondering whether there is any way to make Hadoop transmit the data compressed, and when it
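HDFS does not compress data in transit for you, so the usual approach is to compress locally before (or while) uploading. A hedged CLI sketch with hypothetical paths; hadoop fs -put reads from stdin when the source is "-":

```shell
# Compress locally, then stream the result straight into HDFS.
gzip -c /var/log/app/access.log | hadoop fs -put - /logs/access.log.gz
```

Keep in mind that plain gzip files are not splittable, so each file will be handled by a single mapper; for large inputs a splittable codec (bzip2) or a container format (SequenceFile, Avro) is often a better target.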

Re: newbie question on Hadoop's NoClassDefFoundError

2009-11-17 Thread felix gao
had...@gmail.com] > *Sent:* Tuesday, November 17, 2009 1:26 AM > *To:* mapreduce-user@hadoop.apache.org > *Subject:* Re: newbie question on Hadoop's NoClassDefFoundError > > > > Your eclipse instance doesn't have the jar files in the lib directory of > your hadoop

newbie question on Hadoop's NoClassDefFoundError

2009-11-14 Thread felix gao
I wrote some simple code in Eclipse: Text t = new Text("hadoop"); System.out.println((char)t.charAt(2)); When I try to run this I get: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory at org.apache.hadoop.io.Text.(Text.java:49) at com.exam

Hadoop 0.20.* way of writing the LineIndexer example

2009-10-30 Thread felix gao
Hi all, I am trying to learn how to use Hadoop, and I figured that since I am learning, I might as well learn the latest syntax for it. The code for LineIndexer is available online: public class LineIndexMapper extends MapReduceBase implements Mapper here is the method signature for the mapper only i
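In the 0.20 "new" API (org.apache.hadoop.mapreduce), the mapper is an abstract class rather than an interface, and MapReduceBase/OutputCollector/Reporter are replaced by a single Context object. A hedged sketch of the LineIndexer mapper ported to that API (the word@file-offset output format follows the classic example, but details may differ from the original tutorial code):

```java
// Sketch: LineIndexer mapper on the Hadoop 0.20 org.apache.hadoop.mapreduce API.
// Emits (word, "filename@offset") pairs for each token on the line.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class LineIndexMapper
    extends Mapper<LongWritable, Text, Text, Text> {

  private final Text word = new Text();
  private final Text location = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // In the new API the current input file comes from the context,
    // not from a JobConf passed to configure().
    FileSplit split = (FileSplit) context.getInputSplit();
    String fileName = split.getPath().getName();
    location.set(fileName + "@" + offset.get());
    for (String token : line.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        word.set(token);
        context.write(word, location);  // replaces OutputCollector.collect()
      }
    }
  }
}
```

The driver side changes similarly: Job replaces JobConf/JobClient, and the mapper is registered with job.setMapperClass(LineIndexMapper.class).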

Re: Question regarding wordCount example

2009-10-26 Thread felix gao
blic void map(Long key, Text value, OutputCollector > output, > Reporter reporter) throws IOException { > // TODO Auto-generated method stub > > } > } > > use generics as much as you can > > > > Jeff Zhang > > > > On Mon, Oct 26, 200

Question regarding wordCount example

2009-10-25 Thread felix gao
Hi all, I have some questions regarding how to compile a simple Hadoop program. Setup: Java 1.6, Ubuntu 9.02, Hadoop 0.19.2. //below is the mapper class import java.io.IOException; import java.util.StringTokenizer; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.LongWritable; im
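For a setup like the one described, compiling and running a job from the command line looks roughly like the sketch below. The jar name, class names, and paths are illustrative and depend on the actual install:

```shell
# Sketch: compile against the Hadoop 0.19.2 core jar, package, and run.
mkdir -p classes
javac -classpath $HADOOP_HOME/hadoop-0.19.2-core.jar -d classes \
    WordCountMapper.java WordCountReducer.java WordCount.java
jar cf wordcount.jar -C classes .
hadoop jar wordcount.jar WordCount input/ output/
```

The common pitfall is compiling against a different Hadoop version than the one on the cluster, which surfaces later as class-incompatibility errors at job submission.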