Re: Inconsistent times in Hadoop web interface

2010-12-26 Thread Harsh J
Hey, On Mon, Dec 27, 2010 at 10:44 AM, yipeng wrote: > Hi guys, > > I am having some inconsistent timing in the web interface. The job finish > time as below is 47 secs but the Map & Reduce took significantly longer. I > don't think I did anything that could have caused this. Any ideas what might

Inconsistent times in Hadoop web interface

2010-12-26 Thread yipeng
Hi guys, I am having some inconsistent timing in the web interface. The job finish time as below is 47 secs but the Map & Reduce took significantly longer. I don't think I did anything that could have caused this. Any ideas what might have? Hadoop Job job_201012271216_0002 on History Viewer User:

Re: How to simulate network delay on 1 node

2010-12-26 Thread yipeng
Hi guys, this is all very helpful. Appreciate it. I will look into them. Cheers, Yipeng On Mon, Dec 27, 2010 at 1:04 PM, Ted Dunning wrote: > See also https://github.com/toddlipcon/gremlins > > > > On Sun, Dec 26, 2010 at 11:26 AM, Konstantin Boudnik > wrote: > > > Hi there. > > > > What are

Re: How to simulate network delay on 1 node

2010-12-26 Thread Ted Dunning
See also https://github.com/toddlipcon/gremlins On Sun, Dec 26, 2010 at 11:26 AM, Konstantin Boudnik wrote: > Hi there. > > What you are looking at is fault injection. > I am not sure what version of Hadoop you're looking at, but here's what > you can take a look at in 0.21 and forward: > - Herriot s

Re: Custom input split

2010-12-26 Thread Lance Norskog
Please don't use attachments. They should be stripped by the Apache mailer. There are a bunch of mail archiver sites which don't save attachments. Lance On Sun, Dec 26, 2010 at 8:20 AM, Harsh J wrote: > Hi, > > On Sun, Dec 26, 2010 at 6:29 PM, Black, Michael (IS) > wrote: >> I assume there's a

Fwd: How to simulate network delay on 1 node

2010-12-26 Thread Konstantin Boudnik
Hi there. What you are looking at is fault injection. I am not sure what version of Hadoop you're looking at, but here's what you can take a look at in 0.21 and forward: - Herriot system testing framework (which does code instrumentation to add special APIs) on real clusters. Here's some starting poin

Re: How to simulate network delay on 1 node

2010-12-26 Thread Ashwin Jayaprakash
Maybe this will help on Linux - http://daniel.haxx.se/blog/2010/12/14/add-latency-to-localhost/

Re: Custom input split

2010-12-26 Thread Harsh J
Hi, On Sun, Dec 26, 2010 at 6:29 PM, Black, Michael (IS) wrote: > I assume there's a way to make a specific # of splits and add each document > to the separate splits...but I'll be darned if I can find the docs or an > example to show this. Would CombineFileInputFormat and CombineFileSplit be

Re: How to simulate network delay on 1 node

2010-12-26 Thread Nan Zhu
If you're studying how Hadoop makes copies of the blocks, I'm not familiar with HDFS, so I have no idea about that. If you're studying the data transfer between Map and Reduce, I think org.apache.hadoop.mapred.TaskTracker.MapOutputServlet would be helpful. Bests, Nan On Sun, Dec 26, 2010 at 8:56

Re: Custom input split

2010-12-26 Thread Black, Michael (IS)
You mean the file is "not trusted". I was using Outlook and my company automatically puts a digital certificate on all emails. I'm using webmail right now which doesn't. That certificate is installed by default on all company computers so it looks trusted to us without having to explicitly t

Re: How to simulate network delay on 1 node

2010-12-26 Thread yipeng
I'm trying to explore how Hadoop performs certain tasks (data deduplication actually) under such conditions. Cheers, Yipeng On Sun, Dec 26, 2010 at 8:31 PM, Nan Zhu wrote: > Why would you like to *simulate* network delay? I haven't got your point, > > Bests, > Nan > > > > On Sun, Dec 26, 201

Re: How to simulate network delay on 1 node

2010-12-26 Thread Nan Zhu
Why would you like to *simulate* network delay? I haven't got your point, Bests, Nan On Sun, Dec 26, 2010 at 8:25 PM, yipeng wrote: > Hi everyone, > > I would like to simulate network delay on 1 node in my cluster, perhaps by > putting the thread to sleep every time it transfers data non-loca

How to simulate network delay on 1 node

2010-12-26 Thread yipeng
Hi everyone, I would like to simulate network delay on 1 node in my cluster, perhaps by putting the thread to sleep every time it transfers data non-locally. I'm looking at the source but am not sure where to place the code. Is there a better way to do it... a tool perhaps? Or could someone point
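The "put the thread to sleep every time it transfers data" idea from the question can be sketched outside Hadoop as a stream wrapper. This is only an illustration: DelayedOutputStream and its delayMillis parameter are hypothetical names, not Hadoop classes, and where such a wrapper would be hooked into Hadoop's transfer path is exactly the open question in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;

// Hypothetical helper (not part of Hadoop): sleeps before every write
// to imitate a slow link on the wrapped stream.
final class DelayedOutputStream extends FilterOutputStream {
    private final long delayMillis;

    DelayedOutputStream(OutputStream out, long delayMillis) {
        super(out);
        this.delayMillis = delayMillis;
    }

    @Override
    public void write(int b) throws IOException {
        pause();
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // One pause per call, then a single bulk write to the wrapped stream.
        pause();
        out.write(b, off, len);
    }

    private void pause() throws IOException {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and surface the failure as an I/O error.
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while simulating delay");
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DelayedOutputStream slow = new DelayedOutputStream(sink, 100)) {
            long start = System.nanoTime();
            slow.write(new byte[] {1, 2, 3}, 0, 3);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("wrote " + sink.size() + " bytes in about " + elapsedMs + " ms");
        }
    }
}
```

A wrapper like this delays each write call rather than each byte or packet, so it is a coarse model. The OS-level and fault-injection tools suggested in the replies avoid touching the source at all.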

Re: How to do Secondary Sort on a String and a float?

2010-12-26 Thread Harsh J
Hi, You can use WritableComparator for "Writable" serializations. Docs here: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/WritableComparator.html The issue lies with how you're encoding your String and float pair. If you know sizes defined for each (or have a marker byte between, etc
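To make the encoding point concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes a hypothetical key layout of a 4-byte length, the UTF-8 bytes of the string, then a 4-byte float; CompositeKeyDemo, encode and compare are illustrative names. Because the length is known up front, the two fields can be located and compared directly in the serialized bytes, which is the kind of logic a raw-byte compare override in a WritableComparator subclass would carry.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical composite key: [int length][UTF-8 string bytes][float].
final class CompositeKeyDemo {

    static byte[] encode(String name, float score) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + utf8.length + 4);
        buf.putInt(utf8.length).put(utf8).putFloat(score);
        return buf.array();
    }

    // Orders by the string bytes first, then by the float, without
    // deserializing the key into objects.
    static int compare(byte[] a, byte[] b) {
        ByteBuffer x = ByteBuffer.wrap(a);
        ByteBuffer y = ByteBuffer.wrap(b);
        int lenX = x.getInt();
        int lenY = y.getInt();
        int common = Math.min(lenX, lenY);
        for (int i = 0; i < common; i++) {
            // Compare as unsigned bytes so non-ASCII UTF-8 sorts consistently.
            int cmp = (a[4 + i] & 0xff) - (b[4 + i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        if (lenX != lenY) {
            return lenX - lenY; // the shorter string is a prefix, so it sorts first
        }
        // Strings are equal: the float sits right after the string bytes.
        return Float.compare(x.getFloat(4 + lenX), y.getFloat(4 + lenY));
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        keys.add(encode("b", 1.0f));
        keys.add(encode("a", 2.5f));
        keys.add(encode("a", 0.5f));
        keys.sort(CompositeKeyDemo::compare);
        for (byte[] key : keys) {
            ByteBuffer buf = ByteBuffer.wrap(key);
            int len = buf.getInt();
            String name = new String(key, 4, len, StandardCharsets.UTF_8);
            System.out.println(name + "\t" + buf.getFloat(4 + len));
        }
    }
}
```

A marker byte between the fields would work too, but a length prefix avoids having to pick a byte that can never occur inside the string.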