Re: Programming Multiple rounds of mapreduce

2011-06-13 Thread Sean Owen
You could have a look at the MapReduce pipelines in Apache Mahout (http://mahout.apache.org). See for instance org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. This shows how most Mahout jobs construct and run a series of MapReduce rounds to accomplish a task. Each job feeds into one or mo
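To illustrate the general idea of one round feeding the next, here is a minimal in-memory sketch in plain Python. It does not use Hadoop or Mahout, and it is not how RecommenderJob is written; run_round, run_pipeline, and the mapper/reducer names are all hypothetical, chosen only for this example. Round 1 counts words, and round 2 takes round 1's output as its input and groups words by count.

```python
from collections import defaultdict


def run_round(records, mapper, reducer):
    """Run one in-memory MapReduce round: map, group by key, reduce.

    Hypothetical helper for illustration; not a Hadoop API.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    # Iterate keys in sorted order, loosely mimicking the sorted shuffle.
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Round 1: count how often each word occurs.
def count_mapper(line):
    for word in line.split():
        yield word, 1


def count_reducer(word, ones):
    yield word, sum(ones)


# Round 2: consume round 1's (word, count) pairs and group words by count.
def invert_mapper(pair):
    word, count = pair
    yield count, word


def invert_reducer(count, words):
    yield count, sorted(words)


def run_pipeline(lines):
    """Chain the two rounds: the output of round 1 is the input of round 2."""
    word_counts = run_round(lines, count_mapper, count_reducer)
    return run_round(word_counts, invert_mapper, invert_reducer)


if __name__ == "__main__":
    print(run_pipeline(["a b a", "b c a"]))
```

In a real Hadoop pipeline each round would be a separate job, with the intermediate output written to a filesystem path that the next job reads, rather than passed as a Python list.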

Re: How is reduce completion % calculated?

2011-06-08 Thread Sean Owen
Exactly: the reducer will show that it's in the "copy" phase here, which is exactly what it can do before the mappers have finished. It's not true that a single reducer's completion can only be 0, 0.33, 0.67, or 1.0 -- of course it makes progress through copy, sort, shuffle, and reduce by chunk, by records, so c
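As a rough sketch of the arithmetic, assume each reducer's progress is split into three equally weighted phases (copy, sort, reduce), which is consistent with the 0.33 and 0.67 values mentioned above. The function and parameter names are hypothetical, and this is an illustration of the weighting only, not Hadoop's actual code.

```python
def reducer_progress(copy_fraction, sort_fraction, reduce_fraction):
    """Overall progress of one reducer, assuming three equal-weight phases.

    Each argument is the completed fraction of that phase, from 0.0 to 1.0.
    """
    phases = (copy_fraction, sort_fraction, reduce_fraction)
    for fraction in phases:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("phase fractions must be between 0.0 and 1.0")
    return sum(phases) / 3.0


def job_reduce_progress(reducers):
    """Average the per-reducer progress over all reducers in the job.

    `reducers` is a list of (copy, sort, reduce) fraction triples.
    """
    if not reducers:
        return 0.0
    return sum(reducer_progress(*r) for r in reducers) / len(reducers)


if __name__ == "__main__":
    # A reducer halfway through copying map output, while mappers still run.
    print(reducer_progress(0.5, 0.0, 0.0))
    # Copy and sort done, a quarter of the records reduced.
    print(reducer_progress(1.0, 1.0, 0.25))
```

Under this assumption a reducer that is still copying can report any value from 0 up to about 0.33, which is why the reduce percentage moves before the map phase reaches 100%.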

Re: Input examples

2011-06-07 Thread Sean Owen
Not sure if it's quite what you mean, but Apache Mahout is essentially all applications of Hadoop for machine learning: a bunch of runnable jobs (some with example data too). See mahout.apache.org

On Tue, Jun 7, 2011 at 3:54 PM, Francesco De Luca wrote:
> Where i can find some hadoop map reduce app

Re: How to write a custom input format and record reader to read multiple lines of text from files

2009-11-30 Thread Sean Owen
to do in the
> implemented class constructor - MultiLineFileInputFormat?
>
> i was following the sample provided on this yahoo page:
>
> http://developer.yahoo.com/hadoop/tutorial/module5.html#fileformat
>
> On Tue, 2009-12-01 at 06:45 +, Sean Owen wrote:
>

Re: How to write a custom input format and record reader to read multiple lines of text from files

2009-11-30 Thread Sean Owen
It sounds like you have not provided a no-arg constructor in MultiLineFileInputFormat.

On Tue, Dec 1, 2009 at 6:17 AM, Kunal Gupta wrote:
> Can someone explain how to override the "FileInputFormat" and
> "RecordReader" in order to be able to read multiple lines of text from
> input files in a sing

Re: reuse of MultipleOutputFormat with new API

2009-10-21 Thread Sean Owen
FWIW, this same sort of thing is blocking Apache Mahout from making progress on implementations using Hadoop. I imagine the whole migration is far more involved than it appears, so it makes sense that it is taking time. But yeah, making all the new APIs compatible with the new APIs would be a great step forward f

Re: Efficient sort -u + merge, in Hadoop M/R?

2009-09-01 Thread Sean Owen
(This isn't what you're asking for, but if the input is already sorted, the 'uniq' command would likely be faster than sort. I am not sure how much of a difference it makes.) I've recently joined the list so will indulge myself and put forth a longer opinion about this case, hope I am not spamming

How to migrate to Hadoop 0.20.0 / new Job class?

2009-08-29 Thread Sean Owen
I think I am missing something big. But, I can't figure out how to migrate to Mapreduce 0.20.0. Here's what I think I know:
- The org.apache.hadoop.mapred.* classes are deprecated -- so we shouldn't use them.
- The examples still use org.apache.hadoop.mapred.* classes
- But it's not terribly hard