Re: Query regarding Hadoop Partitioning

2012-02-19 Piyush Kansal
Thanks, Harsh. But will it also sort the data, as the Partitioner does? On Sun, Feb 19, 2012 at 10:54 PM, Harsh J wrote: > Hi, > > You would find it easier to use the Java API's MultipleOutputs (and/or > MultipleOutputFormat, which directly works on a configured key field), > to write each key-partit

Re: Query regarding Hadoop Partitioning

2012-02-19 Piyush Kansal
Thanks, Utkarsh. But I can't find such a function in Hadoop. Moreover, is there any reason why the default partitioning won't work? I mean, if it does not work, why is it even there? Maybe I am missing something. On Sun, Feb 19, 2012 at 10:40 PM, Utkarsh Gupta wrote: > Hi Piyush, > > I t
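One likely reason the default partitioning does not fit here is how Hadoop's default `HashPartitioner` works: it picks a reducer as `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, hashing the whole key. The plain-Java sketch below mimics that formula without a Hadoop dependency. The class name `DefaultPartitionDemo` and the `"partition|text|number"` string encoding of the composite key are assumptions for illustration, and `String.hashCode` stands in for the real key type's `hashCode` (Hadoop's `Text`, for instance, hashes its bytes differently), so the reducer numbers will not match a real job.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: mirrors the formula used by Hadoop's default
// HashPartitioner, without depending on Hadoop itself.
class DefaultPartitionDemo {

    // Same arithmetic as HashPartitioner: mask off the sign bit so the
    // result is non-negative, then take it modulo the reducer count.
    static int hashPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Assumed encoding of the composite key: "partition|text|number".
        List<String> keys = Arrays.asList("1|apple|3", "1|banana|7", "1|cherry|9");
        for (String k : keys) {
            // All three keys share partition field "1", but the whole key is
            // hashed, so nothing forces them onto the same reducer.
            System.out.println(k + " -> reducer " + hashPartition(k, 3));
        }
    }
}
```

Because the text and number fields feed into the hash, two keys with the same partition field can be sent to different reducers. The default partitioner does work; it just balances keys by hash rather than grouping them by one field.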

Re: Query regarding Hadoop Partitioning

2012-02-19 Harsh J
Hi, You would find it easier to use the Java API's MultipleOutputs (and/or MultipleOutputFormat, which works directly on a configured key field) to write each key-partition out to its own file. On Mon, Feb 20, 2012 at 7:38 AM, Piyush Kansal wrote: > Hi Friends, > > I have to sort huge amount of
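On the sorting question: within a single reduce task the framework hands keys to the reducer in sorted order, so if each key is written to an output named after its partition field (which `MultipleOutputs.write(key, value, baseOutputPath)` in the new `org.apache.hadoop.mapreduce` API allows), each of those outputs is sorted as well. The sketch below imitates that routing in plain Java, with an in-memory map standing in for the output files. It is not the Hadoop API; the class name, the `"part" + field` naming, and the `"partition|text|number"` key encoding are hypothetical. Real MultipleOutputs files also carry a per-task suffix, so unless keys with the same partition field are kept on one reducer, a partition can end up split across several files, each sorted on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a reducer that uses MultipleOutputs: the map's
// keys play the role of output file names, its values the file contents.
class PerPartitionOutputDemo {

    // Routes already-sorted keys to an "output file" named after the first
    // (partition) field of the assumed "partition|text|number" encoding.
    static Map<String, List<String>> route(List<String> sortedKeys) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String key : sortedKeys) {
            String partitionField = key.split("\\|", 2)[0];
            files.computeIfAbsent("part" + partitionField, f -> new ArrayList<>()).add(key);
        }
        return files;
    }

    public static void main(String[] args) {
        // Simulates reducer input: in a real job the framework sorts keys
        // before reduce() is called; natural String order stands in here.
        List<String> input = new ArrayList<>(Arrays.asList("2|b|15", "1|a|3", "2|a|12", "1|c|9"));
        Collections.sort(input);
        System.out.println(route(input));
    }
}
```

Appending in arrival order preserves the sort, so no extra sorting step is needed inside each output.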

RE: Query regarding Hadoop Partitioning

2012-02-19 Utkarsh Gupta
Hi Piyush, I think you need to override the built-in partitioning function. You can use a function like (first field of key) % 3. This will send all the keys with the same first field to the same reducer, with each first-field value going to a separate one. Please correct me if I am wrong. Thanks, Utkarsh From: Piyush Kansal [mailto:piyush.kan...@gmail
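Utkarsh's suggestion can be sketched as follows. In a real job this logic would live in the `getPartition(key, value, numPartitions)` method of a class extending `org.apache.hadoop.mapreduce.Partitioner`, registered with `job.setPartitionerClass(...)`; here it is plain Java so it runs without Hadoop. The class name, the `"partition|text|number"` key encoding, and taking the modulus by `numPartitions` instead of a hard-coded 3 are assumptions.

```java
// Hypothetical plain-Java version of a custom partitioner that looks only at
// the first field of the composite key "partition|text|number".
class FirstFieldPartitioner {

    static int getPartition(String key, int numPartitions) {
        // Parse the leading partition field, e.g. "2" in "2|kiwi|14".
        int firstField = Integer.parseInt(key.substring(0, key.indexOf('|')));
        // Keys that share a first field always map to the same reducer.
        // Math.floorMod keeps the result non-negative even for negative fields.
        return Math.floorMod(firstField, numPartitions);
    }

    public static void main(String[] args) {
        String[] keys = {"1|apple|3", "1|pear|8", "2|kiwi|14", "3|fig|25"};
        for (String k : keys) {
            System.out.println(k + " -> reducer " + getPartition(k, 3));
        }
    }
}
```

With three reducers, partition fields 1, 2 and 3 map to reducers 1, 2 and 0, so each partition gets its own reducer; with more distinct partition values than reducers, several would share one.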

Query regarding Hadoop Partitioning

2012-02-19 Piyush Kansal
Hi Friends, I have to sort a huge amount of data in the minimum possible time, probably using partitioning. The key is composed of 3 fields (partition, text and number). This is how the partition is defined:
- Partition "1" for range 1-10
- Partition "2" for range 11-20
- Partition "3" for range 21-
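If the partition field is derived from ranges of the number field as listed, the ranges themselves give a total order: send partition p to reducer p - 1, let each reducer's input be sorted, and reading the reducer outputs in reducer order yields globally sorted data. Hadoop's `TotalOrderPartitioner` generalizes the same idea using sampled split points. The sketch below simulates this in plain Java; the 10-wide ranges beyond those listed, the class name, and the `Key` type are assumptions, and it requires at least as many reducers as partitions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation of range partitioning followed by a per-reducer sort.
class RangePartitionDemo {

    // Assumed composite key: (partition, text, number).
    static final class Key {
        final int partition;
        final String text;
        final int number;

        Key(String text, int number) {
            // Assumption: ranges are 10 wide and start at 1 (1-10 -> 1, 11-20 -> 2, ...).
            this.partition = (number - 1) / 10 + 1;
            this.text = text;
            this.number = number;
        }

        @Override
        public String toString() {
            return partition + "|" + text + "|" + number;
        }
    }

    // Composite-key order: partition, then text, then number.
    static final Comparator<Key> ORDER = Comparator.comparingInt((Key k) -> k.partition)
            .thenComparing(k -> k.text)
            .thenComparingInt(k -> k.number);

    // Sends partition p to reducer p - 1, sorts each reducer's input (the
    // framework does this in a real job), then concatenates reducer outputs.
    static List<Key> sortAll(List<Key> records, int numReducers) {
        List<List<Key>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new ArrayList<>());
        }
        for (Key k : records) {
            reducers.get(k.partition - 1).add(k); // assumes partition <= numReducers
        }
        List<Key> result = new ArrayList<>();
        for (List<Key> reducerInput : reducers) {
            reducerInput.sort(ORDER);
            result.addAll(reducerInput);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Key> data = new ArrayList<>();
        data.add(new Key("b", 25));
        data.add(new Key("a", 3));
        data.add(new Key("c", 12));
        data.add(new Key("a", 17));
        System.out.println(sortAll(data, 3));
    }
}
```

Concatenating the reducer outputs reproduces a single sort of the whole input only because the partition field is the leading sort component and reducers are numbered in range order; default hash partitioning does not give that property.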