Re: How read index and data file?

2010-10-14 Thread Gregory Lawrence
ke to build an example to read the data file with the help of the index file, and I don't know how to do it. - What is the difference between org.apache.hadoop.mapred.IFile.Reader and org.apache.hadoop.fs.FSDataInputStream? Thanks, On Thu, Oct 14, 2010 at 6:21 PM, Gregory Lawrence wr

Re: How read index and data file?

2010-10-14 Thread Gregory Lawrence
Pedro, I'm not sure I fully understand your question but if you are asking how to read in an index file in addition to the standard job input, you should look into writing your own setup function. It may look something like the following: public void setup(Context context) throws IOException, I

Re: Pipelining Mappers and Reducers

2010-07-27 Thread Gregory Lawrence
Shai, It's hard to determine what the best solution would be without knowing more about your problem. In general, combiner functions work well but they will be of little value if each mapper output contains a unique key. This is because combiner functions only "combine" multiple values associat
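To illustrate the point about unique keys, here is a minimal sketch in plain Java. It does not use the Hadoop API: `CombinerSketch`, `combine`, and `records` are hypothetical names invented for this illustration, and the "combiner" is assumed to be a simple per-key sum. It only shows that combining shrinks the map output when keys repeat and leaves the record count unchanged when every key is unique.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a sum combiner; plain Java, no Hadoop classes involved.
class CombinerSketch {

    // Sums the values of all records that share a key, keeping first-seen key order.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> record : mapOutput) {
            combined.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return combined;
    }

    // Builds one (key, 1) record per given key, like a word-count style mapper output.
    static List<Map.Entry<String, Integer>> records(String... keys) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String key : keys) {
            out.add(new AbstractMap.SimpleEntry<>(key, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Repeated keys: four records collapse to two.
        System.out.println(combine(records("a", "b", "a", "a")));
        // Unique keys: three records in, three records out, so nothing is saved.
        System.out.println(combine(records("a", "b", "c")));
    }
}
```

In the second call the combiner does the same work but emits as many records as it received, which is the case described above where it adds little value.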

Re: Speculative Execution and Streaming

2010-05-28 Thread Gregory Lawrence
the run without speculative mode on? Cheers, /R On 5/28/10 2:07 AM, "Gregory Lawrence" wrote: Hi, Does anybody know whether or not speculative execution works with Hadoop streaming? If so, I have a script that does not appear to ever launch redundant mappers for the slow performers. Th

Speculative Execution and Streaming

2010-05-27 Thread Gregory Lawrence
Hi, Does anybody know whether or not speculative execution works with Hadoop streaming? If so, I have a script that does not appear to ever launch redundant mappers for the slow performers. This may be due to the fact that each mapper quickly reports (inaccurately) that it is 100% complete. I

Re: Reduce gets struck at 99%

2010-04-08 Thread Gregory Lawrence
Hi, I have also experienced this problem. Have you tried speculative execution? Also, I have had jobs that took a long time for one mapper / reducer because of a record that was significantly larger than those contained in the other filesplits. Do you know if it always slows down for the same f

Setting the group for output files

2010-03-11 Thread Gregory Lawrence
Hi, Is there a way to set the output group for a MapReduce job (or an HDFS fs operation)? For example, -Ddfs.umaskmode=027 successfully sets the permissions. I would think that -Dgroup.name=GROUP would do a similar thing for the file's group. However, this does not appear to be the case. Any help wo