From: David Rio [mailto:driodei...@gmail.com]
Sent: Sunday, May 17, 2009 8:34 AM
To: core-user@hadoop.apache.org
Subject: Re: sort example
On Sun, May 17, 2009 at 10:18 AM, Ricky Ho r...@adobe.com wrote:
I think using a single reducer causes the sorting to be done
sequentially and hence defeats
,
Ricky
-----Original Message-----
From: Peter Skomoroch [mailto:peter.skomor...@gmail.com]
Sent: Saturday, May 16, 2009 9:42 PM
To: core-user@hadoop.apache.org
Subject: Re: sort example
I just copied and pasted that comparator option from the docs; the -n part is
what you want in this case.
On Sun
Thanks for the reply, Peter, but that's not it.
I use the comparator class to pass the -n flag but the shuffling does not
sort the keys numerically.
Tell me if this is wrong:
1. input (text file):
1324
212
123123
2332
145455
.
2. The mapper job will spawn a process that will run my ruby code
On Sun, May 17, 2009 at 10:18 AM, Ricky Ho r...@adobe.com wrote:
I think using a single reducer causes the sorting to be done sequentially and
hence defeats the purpose of using Hadoop in the first place.
I agree, but this is just for testing.
Actually I used two reducers in my example.
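A note on the two-reducer setup: each reducer sorts only the keys routed to it, so the concatenated output files are globally ordered only if the partitioning itself respects key order. The sketch below is a plain-Python simulation, not Hadoop code. The values come from the small_numbers.txt listing in the thread; `modulo_partition`, `range_partition`, and the 5,000,000 boundary are hypothetical stand-ins chosen for illustration.

```python
# Values from the small_numbers.txt listing in the thread.
keys = [9971681, 9686036, 2592322, 4518219, 1467363]
NUM_REDUCERS = 2


def modulo_partition(key, n):
    """Hypothetical stand-in for a hash-style partitioner (ignores key order)."""
    return key % n


def range_partition(key, n, boundary=5_000_000):
    """Hypothetical range partitioner for n == 2: small keys go to reducer 0."""
    return 0 if key < boundary else 1


def run(partition_fn):
    """Route every key to a reducer, then sort inside each reducer."""
    parts = [[] for _ in range(NUM_REDUCERS)]
    for k in keys:
        parts[partition_fn(k, NUM_REDUCERS)].append(k)
    return [sorted(p) for p in parts]


hashed = run(modulo_partition)
ranged = run(range_partition)

# Concatenate the per-reducer outputs in reducer order.
flat_hashed = [k for part in hashed for k in part]
flat_ranged = [k for part in ranged for k in part]

print(flat_hashed == sorted(keys))  # False: sorted per reducer, not globally
print(flat_ranged == sorted(keys))  # True: ranges preserve global order
```

With the order-agnostic partitioner each output file is sorted on its own, yet the files cannot simply be concatenated into a sorted whole.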
the sorting, trim out the preceding zeros ...
Rgds,
Ricky
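The message above is cut off, but it appears to describe a padding workaround: left-pad the numeric keys with zeros so that a plain text sort orders them numerically, then trim the leading zeros afterwards. A minimal sketch of that idea follows, assuming non-negative integer keys; the `WIDTH` of 10 is an assumed upper bound on the digit count and is not from the thread.

```python
# Sample keys from the input listing earlier in the thread.
raw_keys = ["1324", "212", "123123", "2332", "145455"]

WIDTH = 10  # assumption: no key has more than 10 digits

# Step 1 (map side): zero-pad so every key has the same length.
padded = [k.zfill(WIDTH) for k in raw_keys]

# Step 2 (shuffle): an ordinary text sort now matches numeric order,
# because equal-length digit strings compare like the numbers they spell.
sorted_padded = sorted(padded)

# Step 3 (reduce side): strip the padding; keep a lone "0" if the key was zero.
restored = [p.lstrip("0") or "0" for p in sorted_padded]

print(restored)
```

The trade-off is that the width must be chosen in advance; a key longer than `WIDTH` would silently sort out of order.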
-----Original Message-----
From: David Rio [mailto:driodei...@gmail.com]
Sent: Sunday, May 17, 2009 8:34 AM
To: core-user@hadoop.apache.org
Subject: Re: sort example
On Sun, May 17, 2009 at 10:18 AM, Ricky Ho r...@adobe.com wrote
Hi,
I am trying to sort some data with Hadoop (streaming mode). The input looks
like:
$ cat small_numbers.txt
9971681
9686036
2592322
4518219
1467363
To send my job to the cluster I use:
$ hadoop jar /home/drio/hadoop-0.20.0/contrib/streaming/hadoop-0.20.0-streaming.jar \
-D mapred.reduce.tasks=2 \
BTW,
Basically, this is the Unix equivalent of what I am trying to do:
$ cat input_file.txt | sort -n
-drd
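To make the difference between the two orderings concrete, here is a small Python sketch (not part of the streaming job itself) that sorts the sample values from the input listing earlier in the thread both ways: as text, and numerically as `sort -n` would.

```python
# Sample values from the input listing earlier in the thread.
lines = ["1324", "212", "123123", "2332", "145455"]

# Text comparison, character by character: "123123" sorts before "212"
# because "1" < "2", regardless of how many digits follow.
lexicographic = sorted(lines)

# Numeric comparison, the behaviour of `sort -n`.
numeric = sorted(lines, key=int)

print(lexicographic)  # ['123123', '1324', '145455', '212', '2332']
print(numeric)        # ['212', '1324', '2332', '123123', '145455']
```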
On Sat, May 16, 2009 at 11:10 PM, David Rio driodei...@gmail.com wrote:
Hi,
I am trying to sort some data with hadoop(streaming mode). The input looks
like:
$ cat small_numbers.txt
1) It does an alphabetical sort by default. You can force Hadoop streaming
to sort numerically with:
-D mapred.text.key.comparator.options=-k2,2nr \
See the section "A Useful Comparator Class" in the streaming docs:
http://hadoop.apache.org/core/docs/current/streaming.html
and
I just copied and pasted that comparator option from the docs; the -n part is
what you want in this case.
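For reference, the `-k2,2nr` option quoted from the docs follows Unix `sort` key syntax: compare on field 2 only, numerically (`n`), in reverse order (`r`). The sketch below mimics that ordering in plain Python and is not Hadoop code. The tab separator, the sample records, and the `field` helper are assumptions made for illustration.

```python
# Hypothetical tab-separated records: <name> TAB <count>.
records = ["a\t10", "b\t9", "c\t100"]


def field(record, index, sep="\t"):
    """Return the 1-based field `index` of a separated record (illustrative helper)."""
    return record.split(sep)[index - 1]


# -k2,2nr: key is field 2, compared as a number, largest first.
by_k2_nr = sorted(records, key=lambda r: int(field(r, 2)), reverse=True)

# -k2,2 without n: field 2 compared as text, so "100" sorts before "9".
by_k2_text = sorted(records, key=lambda r: field(r, 2))

print(by_k2_nr)
print(by_k2_text)
```

For a single-column input like the one in this thread, only the numeric part of the option matters, which is the point made above about `-n`.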
On Sun, May 17, 2009 at 12:40 AM, Peter Skomoroch peter.skomor...@gmail.com
wrote:
1) It is doing alphabetical sort by default, you can force Hadoop streaming
to sort numerically with:
-D