nope, if i recall the data is randomly generated (the task itself requires
fixed-length binary strings to be sorted)
Miles
2009/7/22 Harish Mallipeddi
> On Wed, Jul 22, 2009 at 8:52 PM, Rares Vernica wrote:
>
> > Hello,
> >
> > I wonder how did the Yahoo! developers generate the Task Timeline
here is a part of a shell script i wrote which deals with compressed input
and produces compressed output (for streaming)
hadoop dfs -rmr $4
hadoop jar /usr/local/share/hadoop/contrib/streaming/hadoop-*-streaming.jar \
  -mapper $1 -reducer $2 -input $3/* -output $4 \
  -file $1 -file $2 -jobconf mapre
we have 7.B T data nodes and soon will be getting 9 T nodes.
and the good news is that it all works well. one wrinkle i've noticed is
that should a disk or two fill up then the entire machine can get
blacklisted
(if you have smaller capacity machines then this is probably the correct
behaviour)
if you have <key, value> pairs, then have your mapper emit
<value, key>
this will result in your data being re-sorted by the value
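
a minimal sketch of that swap for a streaming job, assuming each input line is
"key<TAB>value" (for example a tag and its count). the function name swap_kv is
made up for illustration and is not part of hadoop:

```shell
#!/bin/sh
# swap_kv: hypothetical helper for a streaming mapper.
# Reads "key<TAB>value" lines on stdin and writes "value<TAB>key",
# so the shuffle sorts on what used to be the value.
# Lines with fewer than two tab-separated fields are dropped.
swap_kv() {
  awk -F '\t' 'NF >= 2 { print $2 "\t" $1 }'
}
```

a mapper script would end with a bare call to swap_kv so that it reads stdin.
note that streaming compares keys as text by default, as far as i know, so
counts such as 10 and 2 will not come out in numeric order unless the counts
are zero-padded or a numeric comparator is configured.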
Miles
2009/7/9 Marcus Herou
> Really? Will that work?
>
> input something like this
>
> tag
> tag2
> tag
> tag2
> tag3
> ...
> produces output
>
> tag 2
> tag2 2
> tag3 1
>
>
> Sw