JobTracker goes into seemingly infinite loop

2011-05-05 Thread rakesh kothari
Hi, I am using Hadoop 0.20.1. Recently we had a JobTracker outage because of the following: JobTracker tries to write a file to HDFS but its connection to the primary datanode gets disrupted. It then enters a retry loop (that goes on for hours). I see the following message i

Re: Is there any way I could use to reduce the cost of Mapper and Reducer setup and cleanup in an iterative MapReduce chain?

2011-05-05 Thread Stanley Xu
Thanks a lot, Ted. Checking HaLoop and Plume now. I can always get the answer from you. :-) On Thu, May 5, 2011 at 10:42 PM, Ted Dunning wrote: > Stanley, > > The short answer is that this is a real problem. > > Try this: > > *Spark: Cluster Computing with Working Sets.* Matei Zaharia, Moshara

Fwd: What is GEO_RSS_URI ???

2011-05-05 Thread praveenesh kumar
-- Forwarded message -- From: praveenesh kumar Date: Wed, May 4, 2011 at 4:51 PM Subject: What is GEO_RSS_URI ??? To: common-u...@hadoop.apache.org Hello Hadoop users, I came across some MapReduce examples on Google Code. Here is the link: http://code.google.com/p/hadoop-map-r

Re: How hadoop parse input files into (Key,Value) pairs ??

2011-05-05 Thread Joey Echeverria
Hadoop uses an InputFormat class to parse files and generate key-value pairs for your Mapper. An InputFormat is any class that extends the base abstract class: http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapreduce/InputFormat.html The default InputFormat parses text files
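
To make the key-value idea concrete, here is a minimal sketch in plain Java that imitates the kind of records a line-oriented text input format hands to a Mapper: the key is the byte offset at which a line starts and the value is the line itself. This is an illustration only. It uses no Hadoop classes, and `LineRecordSketch`, `Record`, and `parse` are hypothetical names introduced for this example; it also ignores details a real InputFormat must handle, such as input splits and `\r\n` line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: plain Java, not the Hadoop API.
final class LineRecordSketch {

    // Hypothetical holder for one (key, value) pair.
    static final class Record {
        final long offset;  // key: byte offset where the line starts
        final String line;  // value: line contents without the newline

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Splits UTF-8 bytes on '\n' and pairs each line with its starting byte offset.
    static List<Record> parse(byte[] data) {
        List<Record> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                String line = new String(data, start, i - start, StandardCharsets.UTF_8);
                records.add(new Record(start, line));
                start = i + 1;
            }
        }
        // Emit a final line that has no trailing newline.
        if (start < data.length) {
            String line = new String(data, start, data.length - start, StandardCharsets.UTF_8);
            records.add(new Record(start, line));
        }
        return records;
    }
}
```

A Mapper written against such records would typically ignore the offset key and work only on the line value.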

Re: Use different process method to process data by input file name?

2011-05-05 Thread Harsh J
Moving this to mapreduce-user@ since that is more appropriate for hadoop-mapreduce questions (bcc: common-user@). 2011/5/5 王志强 : > Hi, guys > As the topic shows, how can I use different process methods to process > data according to input file name in map function? I.e., may I get the input

Is there any way I could use to reduce the cost of Mapper and Reducer setup and cleanup in an iterative MapReduce chain?

2011-05-05 Thread Stanley Xu
Dear All, Our team is trying to implement a parallelized LDA with Gibbs sampling. We are using the algorithm mentioned by plda, http://code.google.com/p/plda/ The problem is that, with the MapReduce method the paper describes, we need to run a MapReduce job for every Gibbs sampling iteration, and norma