Re: sort example

2009-05-18 Thread David Rio
Rio [mailto:driodei...@gmail.com] Sent: Sunday, May 17, 2009 8:34 AM To: core-user@hadoop.apache.org Subject: Re: sort example On Sun, May 17, 2009 at 10:18 AM, Ricky Ho r...@adobe.com wrote: I think using a single reducer causes the sorting to be done sequentially and hence defeats

RE: sort example

2009-05-17 Thread Ricky Ho
, Ricky -Original Message- From: Peter Skomoroch [mailto:peter.skomor...@gmail.com] Sent: Saturday, May 16, 2009 9:42 PM To: core-user@hadoop.apache.org Subject: Re: sort example I just copy and pasted that comparator option from the docs, the -n part is what you want in this case. On Sun

Re: sort example

2009-05-17 Thread David Rio
Thanks for the reply Peter but that's not it. I use the comparator class to pass the -n flag but the shuffling does not sort the keys numerically. Tell me if this is wrong:
1. input (text file): 1324 212 123123 2332 145455 .
2. The mapper job will spawn a process that will run my ruby code

Re: sort example

2009-05-17 Thread David Rio
On Sun, May 17, 2009 at 10:18 AM, Ricky Ho r...@adobe.com wrote: I think using a single reducer causes the sorting to be done sequentially and hence defeats the purpose of using Hadoop in the first place. I agree, but this is just for testing. Actually I used two reducers in my example.
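A note on the two-reducer case: each reducer sorts only the keys routed to it, so whether the concatenated output files are globally ordered depends on how keys are partitioned. The sketch below simulates this in plain Python and is not Hadoop code. `modulo_partition` is a hypothetical stand-in for hash-style partitioning, and the split point of 5000000 is an arbitrary assumption chosen for the five sample keys from the original post.

```python
from bisect import bisect_right


def partition_and_sort(keys, num_reducers, partition):
    """Route each key to a reducer bucket, then sort every bucket on its own."""
    buckets = [[] for _ in range(num_reducers)]
    for key in keys:
        buckets[partition(key, num_reducers)].append(key)
    return [sorted(bucket) for bucket in buckets]


def modulo_partition(key, num_reducers):
    # Hypothetical stand-in for hash partitioning: ignores key order entirely.
    return key % num_reducers


def make_range_partition(split_points):
    # Keys below the first split point go to reducer 0, the next range to 1, etc.
    def range_partition(key, num_reducers):
        return bisect_right(split_points, key)
    return range_partition


def concatenate(buckets):
    """Read the reducer outputs back in reducer order (part 0, part 1, ...)."""
    return [key for bucket in buckets for key in bucket]


keys = [9971681, 9686036, 2592322, 4518219, 1467363]

hashed = concatenate(partition_and_sort(keys, 2, modulo_partition))
ranged = concatenate(partition_and_sort(keys, 2, make_range_partition([5000000])))

print(hashed == sorted(keys))  # False: each part is sorted, the whole is not
print(ranged == sorted(keys))  # True: ranges keep the parts in global order
```

With range-based routing every key in part 0 is smaller than every key in part 1, which is why reading the parts in order gives a fully sorted result.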

RE: sort example

2009-05-17 Thread Ricky Ho
Rio [mailto:driodei...@gmail.com] Sent: Sunday, May 17, 2009 8:34 AM To: core-user@hadoop.apache.org Subject: Re: sort example On Sun, May 17, 2009 at 10:18 AM, Ricky Ho r...@adobe.com wrote: I think using a single reducer causes the sorting to be done sequentially and hence defeats

Re: sort example

2009-05-17 Thread David Rio
To: core-user@hadoop.apache.org Subject: Re: sort example On Sun, May 17, 2009 at 10:18 AM, Ricky Ho r...@adobe.com wrote: I think using a single reducer causes the sorting to be done sequentially and hence defeats the purpose of using Hadoop in the first place. I agree, but this is just

Re: sort example

2009-05-17 Thread Chuck Lam
the sorting, trim out the preceding zeros ... Rgds, Ricky -Original Message- From: David Rio [mailto:driodei...@gmail.com] Sent: Sunday, May 17, 2009 8:34 AM To: core-user@hadoop.apache.org Subject: Re: sort example On Sun, May 17, 2009 at 10:18 AM, Ricky Ho r...@adobe.com wrote
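The snippet above is cut off, so the following is only a guess at the idea behind "trim out the preceding zeros": pad every key with leading zeros to a fixed width so that a plain text sort gives numeric order, then strip the zeros after sorting. This is a minimal Python sketch; the helper names and the width of 10 are assumptions, and the sample values are the ones quoted elsewhere in the thread.

```python
def pad_key(value, width=10):
    """Left-pad a non-negative integer string with zeros to a fixed width."""
    return str(value).zfill(width)


def unpad_key(key):
    """Strip the leading zeros again, keeping a lone "0" intact."""
    return key.lstrip("0") or "0"


values = ["1324", "212", "123123", "2332", "145455"]

# A plain string sort on the padded keys matches numeric order,
# as long as no value is wider than the chosen width.
padded = sorted(pad_key(v) for v in values)
restored = [unpad_key(k) for k in padded]

print(restored)
```

The trick only holds when the width is at least as large as the longest key, and it does not cover negative numbers.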

sort example

2009-05-16 Thread David Rio
Hi, I am trying to sort some data with hadoop(streaming mode). The input looks like:
    $ cat small_numbers.txt
    9971681
    9686036
    2592322
    4518219
    1467363
To send my job to the cluster I use:
    hadoop jar /home/drio/hadoop-0.20.0/contrib/streaming/hadoop-0.20.0-streaming.jar \
      -D mapred.reduce.tasks=2 \

Re: sort example

2009-05-16 Thread David Rio
BTW, Basically, this is the unix equivalent to what I am trying to do:
    $ cat input_file.txt | sort -n
-drd
On Sat, May 16, 2009 at 11:10 PM, David Rio driodei...@gmail.com wrote: Hi, I am trying to sort some data with hadoop(streaming mode). The input looks like: $ cat small_numbers.txt

Re: sort example

2009-05-16 Thread Peter Skomoroch
1) It is doing alphabetical sort by default, you can force Hadoop streaming to sort numerically with:
    -D mapred.text.key.comparator.options=-k2,2nr\
see the section A Useful Comparator Class in the streaming docs: http://hadoop.apache.org/core/docs/current/streaming.html and
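To make the alphabetical-versus-numeric point concrete, here is a small Python illustration. It is not the Hadoop comparator itself, only a stand-in showing why text keys come out in an unexpected order; the sample values are the ones David quotes in his follow-up.

```python
keys = ["1324", "212", "123123", "2332", "145455"]

# Text (byte-wise) comparison looks at characters left to right,
# so "123123" sorts before "212" because "1" < "2".
text_order = sorted(keys)

# Numeric comparison converts each key to an integer first,
# which corresponds to the "n" modifier in a sort key spec.
numeric_order = sorted(keys, key=int)

# Adding the "r" modifier reverses the order (largest first).
numeric_reverse = sorted(keys, key=int, reverse=True)

print(text_order)
print(numeric_order)
print(numeric_reverse)
```

Note that the option copied from the docs sorts on the second field in reverse numeric order (`-k2,2nr`); for single-column input like this, only the numeric part matters.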

Re: sort example

2009-05-16 Thread Peter Skomoroch
I just copy and pasted that comparator option from the docs, the -n part is what you want in this case. On Sun, May 17, 2009 at 12:40 AM, Peter Skomoroch peter.skomor...@gmail.com wrote: 1) It is doing alphabetical sort by default, you can force Hadoop streaming to sort numerically with: -D