You could have a look at the MapReduce pipelines in Apache Mahout
(http://mahout.apache.org). See for instance
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. It shows how
most of Mahout's jobs construct and run a series of MapReduce rounds to
accomplish a task. Each job feeds into one or mo
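For illustration only, here is an in-memory Java sketch of that chaining pattern, where the output of one round is the input of the next. It is not Mahout's code and does not use Hadoop; `ChainedRounds`, `round()` and `run()` are hypothetical names, and a `TreeMap` stands in for the shuffle/sort between map and reduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

/** In-memory sketch of chained MapReduce rounds (hypothetical; not Hadoop or Mahout code). */
public class ChainedRounds {

    /** One round: map every record, group values by key in sorted order, reduce each group. */
    static <I, K extends Comparable<K>, V, O> List<O> round(
            List<I> input,
            Function<I, List<Map.Entry<K, V>>> mapper,
            BiFunction<K, List<V>, O> reducer) {
        SortedMap<K, List<V>> groups = new TreeMap<>();  // stands in for the shuffle/sort
        for (I record : input) {
            for (Map.Entry<K, V> kv : mapper.apply(record)) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        List<O> output = new ArrayList<>();
        for (Map.Entry<K, List<V>> group : groups.entrySet()) {
            output.add(reducer.apply(group.getKey(), group.getValue()));
        }
        return output;
    }

    /** Round 1 counts words; round 2 reads round 1's output and groups words by count. */
    static List<String> run(List<String> lines) {
        List<Map.Entry<String, Integer>> counts = round(
                lines,
                (String line) -> {
                    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
                    for (String word : line.split("\\s+")) {
                        if (!word.isEmpty()) {
                            pairs.add(Map.entry(word, 1));
                        }
                    }
                    return pairs;
                },
                (String word, List<Integer> ones) -> Map.entry(word, ones.size()));
        // The second round consumes the first round's output directly
        return round(
                counts,
                (Map.Entry<String, Integer> wc) -> List.of(Map.entry(wc.getValue(), wc.getKey())),
                (Integer count, List<String> words) -> count + ":" + String.join(",", words));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a b a", "c b a")));  // [1:c, 2:b, 3:a]
    }
}
```

In a real Hadoop pipeline each round would be a separate job whose output directory is the next job's input path; the in-memory lists here just make the data flow easy to follow.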
Exactly. The reducer will show that it's in the "copy" phase here, which is
exactly what it can do before the mappers have finished.
It's not true that a single reducer's completion can only be 0, 0.33, 0.67 or 1.0
-- of course it makes progress through copy, sort, shuffle and reduce, by
chunk and by record, so c
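To make the arithmetic concrete, here is a small Java sketch of how an overall reduce progress figure can combine phase and within-phase progress. It assumes three equally weighted phases (copy, sort, reduce), which is where the 0.33/0.67 steps come from; `ReduceProgress` and `overall()` are hypothetical names, not Hadoop's API.

```java
/** Sketch: overall reduce-task progress from the current phase and progress within it. */
public class ReduceProgress {
    // Phases a reduce task passes through, in order (assumed equal weight)
    enum Phase { COPY, SORT, REDUCE }

    /**
     * @param phase the phase currently running
     * @param fractionDone how far through that phase, in [0, 1]
     * @return overall progress in [0, 1]
     */
    static double overall(Phase phase, double fractionDone) {
        if (fractionDone < 0.0 || fractionDone > 1.0) {
            throw new IllegalArgumentException("fractionDone must be in [0, 1]");
        }
        int phases = Phase.values().length;
        // Completed phases count fully; the current phase counts in proportion
        return (phase.ordinal() + fractionDone) / phases;
    }

    public static void main(String[] args) {
        // Halfway through copying, e.g. 10 of 20 map outputs fetched
        System.out.println(overall(Phase.COPY, 10 / 20.0));
    }
}
```

So a reducer that has fetched half of its map outputs would report roughly 0.17, not 0.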
Not sure if it's quite what you mean, but Apache Mahout consists essentially
entirely of Hadoop applications for machine learning: a bunch of runnable jobs
(some with example data, too).
mahout.apache.org
On Tue, Jun 7, 2011 at 3:54 PM, Francesco De Luca wrote:
> Where can I find some Hadoop MapReduce app
to do in the
> implemented class constructor - MultiLineFileInputFormat?
>
> I was following the sample provided on this Yahoo page:
>
> http://developer.yahoo.com/hadoop/tutorial/module5.html#fileformat
>
> On Tue, 2009-12-01 at 06:45 +, Sean Owen wrote:
It sounds like you have not provided a no-arg constructor in
MultiLineFileInputFormat.
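Hadoop instantiates the configured InputFormat reflectively (through `ReflectionUtils.newInstance`), which looks up the class's no-argument constructor. The plain-Java sketch below mimics that lookup so the failure is easy to reproduce; `WithArgOnly` and `WithNoArg` are hypothetical stand-ins, not the poster's class.

```java
import java.lang.reflect.Constructor;

/** Shows why a reflectively created class needs a no-arg constructor. */
public class NoArgConstructorDemo {
    // Hypothetical stand-in: only a constructor that takes an argument
    static class WithArgOnly {
        private final int linesPerRecord;
        WithArgOnly(int linesPerRecord) { this.linesPerRecord = linesPerRecord; }
    }

    // Hypothetical stand-in: also offers a no-arg constructor with a default
    static class WithNoArg {
        private final int linesPerRecord;
        WithNoArg() { this(4); }
        WithNoArg(int linesPerRecord) { this.linesPerRecord = linesPerRecord; }
    }

    /** Instantiates the class the way a framework would: via its no-arg constructor. */
    static boolean canInstantiate(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();  // throws if none exists
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiate(WithArgOnly.class)); // false
        System.out.println(canInstantiate(WithNoArg.class));   // true
    }
}
```

A non-static inner class fails the same way, because its constructors implicitly take the enclosing instance as a parameter, so the class should be top-level or `static`.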
On Tue, Dec 1, 2009 at 6:17 AM, Kunal Gupta wrote:
> Can someone explain how to override the "FileInputFormat" and
> "RecordReader" in order to be able to read multiple lines of text from
> input files in a sing
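The core of a multi-line reader is just grouping consecutive lines into one value. Below is a plain-Java sketch of that grouping, assuming a fixed number of lines per record; `MultiLineRecords` and `group()` are hypothetical names. In Hadoop this logic would sit in the custom RecordReader's `next()` (old API) or `nextKeyValue()` (new API), calling an underlying line reader n times; that wiring is omitted here.

```java
import java.util.ArrayList;
import java.util.List;

/** Groups every n consecutive lines into one record (hypothetical RecordReader stand-in). */
public class MultiLineRecords {
    static List<String> group(List<String> lines, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int linesInCurrent = 0;
        for (String line : lines) {
            if (linesInCurrent > 0) {
                current.append('\n');  // keep the original line breaks inside the record
            }
            current.append(line);
            linesInCurrent++;
            if (linesInCurrent == n) {
                records.add(current.toString());
                current.setLength(0);
                linesInCurrent = 0;
            }
        }
        if (linesInCurrent > 0) {
            records.add(current.toString());  // trailing partial record
        }
        return records;
    }

    public static void main(String[] args) {
        for (String record : group(List.of("a", "b", "c", "d", "e"), 2)) {
            System.out.println("[" + record + "]");
        }
    }
}
```

One thing to watch in the real InputFormat is a record straddling two input splits; a simple (if less parallel) option is overriding `isSplitable` to return false.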
FWIW, this same sort of thing is blocking Apache Mahout's progress
on implementations using Hadoop. I imagine the whole migration is far
more involved than it appears, so it makes sense that it is taking time. But
yeah making all the new APIs compatible with the new APIs would be a
great step forward f
(This isn't what you're asking for, but if the input is already
sorted, the 'uniq' command would likely be faster than sort. I am not
sure how much of a difference it makes.)
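The reason `uniq` can be cheaper is that it only collapses *adjacent* duplicate lines, so on already sorted input it is a single linear pass, while sorting costs O(n log n). A minimal Java sketch of the same idea (the `Uniq` class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

/** Drops adjacent duplicate lines in one pass, like the Unix uniq command. */
public class Uniq {
    static List<String> uniq(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            // Compare only with the last line kept; non-adjacent repeats survive
            if (out.isEmpty() || !out.get(out.size() - 1).equals(line)) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(uniq(List.of("a", "a", "b", "c", "c")));  // [a, b, c]
        System.out.println(uniq(List.of("a", "b", "a")));            // [a, b, a]
    }
}
```

The second call shows why the input must already be sorted: the repeated "a" is not adjacent, so it is kept.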
I've recently joined the list, so I will indulge myself and put forth a
longer opinion about this case; I hope I am not spamming.
I think I am missing something big, but I can't figure out how to
migrate to MapReduce 0.20.0. Here's what I think I know:
- The org.apache.hadoop.mapred.* classes are deprecated -- so we
shouldn't use them.
- The examples still use org.apache.hadoop.mapred.* classes
- But it's not terribly hard