Writing a simple sort application for Hadoop

2010-02-28 Thread aa225
Hello, I am trying to write a simple sorting application for Hadoop. This is what I have thought of so far. Suppose I have 100 lines of data and 10 mappers; each of the 10 mappers will sort the data given to it. But what I am unable to figure out is how to join these outputs into one big sorted arra

Hadoop: Divide and Conquer Algorithms

2010-02-28 Thread aa225
Hello Everybody, I have a small question. I want to know how one would implement divide and conquer algorithms in Hadoop. For example, suppose I want to implement merge sort on 100 lines in Hadoop. There will be 10 mappers, each sorting 10 lines. Now comes the tough part. In the tradition

Re: Writing a simple sort application for Hadoop

2010-02-28 Thread Ed Mazur
Hi Abhishek, If you use input lines as your output keys in map, Hadoop internals will do the work for you and the keys will appear in sorted order in your reduce (you can use IdentityReducer). This needs a slight adjustment if your input lines aren't unique. If you have R reducers, this will crea
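The idea in this reply can be sketched outside Hadoop. The following is a minimal plain-Python simulation, not Hadoop code: `map_phase`, `shuffle_sort`, `identity_reduce` and `sort_lines` are hypothetical names invented for illustration, and `shuffle_sort` only stands in for the sorting the framework does between map and reduce. Writing each key once per value is one possible way to keep duplicate lines, which is an assumption and not necessarily the adjustment the reply has in mind. Only a single reducer is modelled.

```python
from itertools import groupby


def map_phase(lines):
    # Emit each input line as the key; the value is just a placeholder.
    return [(line, 1) for line in lines]


def shuffle_sort(pairs):
    # Stand-in for the framework: sort by key, then group values per key.
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=lambda kv: kv[0])
    ]


def identity_reduce(grouped):
    # Pass keys through unchanged, once per value, so duplicates survive.
    output = []
    for key, values in grouped:
        output.extend([key] * len(values))
    return output


def sort_lines(lines):
    # Full pipeline for a single simulated reducer.
    return identity_reduce(shuffle_sort(map_phase(lines)))
```

Calling `sort_lines(["pear", "apple", "fig", "apple"])` returns the lines in sorted order with both copies of "apple" kept.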

Re: Hadoop: Divide and Conquer Algorithms

2010-02-28 Thread Mikhail Yakshin
Hi, > I have a small question. I want to know how one would implement > divide and conquer algorithms in Hadoop. For example, suppose I want to > implement merge sort on 100 lines in Hadoop. There will be 10 mappers, each sorting 10 lines. > Now comes the tough part > > In the traditio

no complete sort

2010-02-28 Thread Gang Luo
Hi all, here is a weird observation. The keys in the result of *ONE* reducer are ordered like this: 18166 18169 1817 18171 18172 Why does key "1817" come after "18169"? It would make sense if that key were "18170", but it isn't! Why does it happen and basically, how does hadoop tell k

Re: Hadoop: Divide and Conquer Algorithms

2010-02-28 Thread Darren Govoni
I'm not sure this sort of problem will be efficient in Hadoop, but it's the kind of problem WaveFS[1] is designed for. It propagates intermediate values across the cluster, allowing algorithms to run in parallel but coalesce shared products from distributed calculations. Without the need to for

Add File Header Using FileOutputFormat

2010-02-28 Thread Song Liu
Hi all! I'm using Hadoop to make huge Weka learning files. As you know, a Weka file (ARFF) has some special file headers. Is there a way for me to add them at the beginning of the file using FileOutputFormat? If so, how can I do that? Thanks! Regards Song Liu

Re: no complete sort

2010-02-28 Thread Prateek Jindal
Hi Gang, It is sorting it lexicographically. --Prateek. On Sun, Feb 28, 2010 at 3:23 PM, Gang Luo wrote: > Hi all, > here is a weird observation. The keys in the result of *ONE* reducer are > ordered like this: > 18166 > 18169 > 1817 > 18171 > 18172 > > Why does key "1817" come after "18169"? It

Re: no complete sort

2010-02-28 Thread Ed Mazur
Hi Gang, What's your reduce output key type? It looks like you're using Text instead of IntWritable, causing your keys to be sorted lexicographically instead of numerically. Sorting is done with a comparator that defines how an arbitrary element compares to another. Hashing serves a different pur
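The difference between the two orderings can be reproduced with the keys from the original question. This is a plain-Python sketch, not Hadoop code: Python string comparison stands in for the ordering of text keys, and `key=int` stands in for the ordering of integer keys. The variable names are illustrative only.

```python
# The five keys from the question, deliberately shuffled.
keys = ["18172", "1817", "18166", "18171", "18169"]

# Compared as text: character by character, so "18169" < "1817"
# because '6' < '7' at the fourth position.
as_text = sorted(keys)

# Compared as numbers: 1817 is the smallest value and comes first.
as_numbers = sorted(keys, key=int)

print(as_text)     # ['18166', '18169', '1817', '18171', '18172']
print(as_numbers)  # ['1817', '18166', '18169', '18171', '18172']
```

The text ordering matches the reducer output reported in the question.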

[ANNOUNCE] muCommander adds support for HDFS

2010-02-28 Thread Maxence Bernard
Hi all, I just wanted to let you guys know that HDFS support has been added to the recently released version 0.8.5 of muCommander ( http://www.mucommander.com/ ), allowing you to browse, read and write to an HDFS cluster with the convenience of a graphical user interface. I'm considering adding

Re: no complete sort

2010-02-28 Thread Gang Luo
Thanks to Ed and Prateek, who pointed this out in the previous mails. Yes, I use Text instead of IntWritable. It makes sense if it is sorted in lexicographical order. -Gang - Original Message From: Ed Mazur To: common-user@hadoop.apache.org Sent: 2010/2/28 (Sun) 4:28:46 PM Subject: Re: no complete sort Hi Gang,

Re: Add File Header Using FileOutputFormat

2010-02-28 Thread Jeff Zhang
Write the header in the setup method if you are using the new Hadoop API. On Sun, Feb 28, 2010 at 1:26 PM, Song Liu wrote: > Hi all! > I'm using hadoop to make huge weka learning files. As you know, weka file > (ARFF) has some special file headers. Is there a way for me to add it at > the beginning o

Re: Re: Writing a simple sort application for Hadoop

2010-02-28 Thread aa225
Hi, Is there any way we can chain the reducers? That is, initially the reducers work on some data; the output of these reducers is then sent to the same reducers again, and so on, similar to how the conquer step takes place in divide and conquer algorithms. I hope you got what I am trying to ask
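The "conquer" step being asked about can be pictured as repeated rounds of merging. The sketch below is plain Python under stated assumptions, not a description of how Hadoop chains anything: each call to the hypothetical `merge_round` stands for one pass over the data, and `conquer` simply repeats passes until a single sorted run remains. Both function names are invented for illustration.

```python
import heapq


def merge_round(runs):
    # One pass: merge neighbouring sorted runs two at a time.
    merged = []
    for i in range(0, len(runs), 2):
        pair = runs[i:i + 2]  # the last slice may hold a single run
        merged.append(list(heapq.merge(*pair)))
    return merged


def conquer(runs):
    # Repeat passes until one run is left; also report how many passes ran.
    rounds = 0
    while len(runs) > 1:
        runs = merge_round(runs)
        rounds += 1
    return (runs[0] if runs else []), rounds
```

With five sorted runs of two elements each, the run count shrinks from 5 to 3 to 2 to 1, so `conquer` needs three passes.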