Re: Custom input split

2010-12-26 Thread Lance Norskog
Please don't use attachments. The Apache mailer is supposed to strip them, and a number of mail archive sites don't save attachments anyway.

Lance

On Sun, Dec 26, 2010 at 8:20 AM, Harsh J wrote:
> Hi,
>
> On Sun, Dec 26, 2010 at 6:29 PM, Black, Michael (IS)
> wrote:
>> I assume there's a

Re: Custom input split

2010-12-26 Thread Harsh J
Hi,

On Sun, Dec 26, 2010 at 6:29 PM, Black, Michael (IS) wrote:
> I assume there's a way to make a specific # of splits and add each document
> to the separate splits...but I'll be darned if I can find the docs or an
> example to show this.

Would CombineFileInputFormat and CombineFileSplit be

Re: Custom input split

2010-12-26 Thread Black, Michael (IS)
el D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems

From: 蔡超 [mailto:toppi...@gmail.com]
Sent: Sat 12/25/2010 10:32 AM
To: common-user@hadoop.apache.org
Subject: EXTERNAL:Re: Custom input split

What is the file you have a

Re: Custom input split

2010-12-25 Thread 蔡超
What is the file you have attached? It is not safe. I don't know the format of a Lucene index; could you please give an example?

On Sat, Dec 25, 2010 at 12:34 AM, Black, Michael (IS) <michael.bla...@ngc.com> wrote:
> Using hadoop-0.20
>
> I'm doing custom input splits from a Lucene index.

Custom input split

2010-12-24 Thread Black, Michael (IS)
Using hadoop-0.20

I'm doing custom input splits from a Lucene index. I want to split the document IDs across N mappers (I'm testing the scalability of the problem across 4 nodes and 8 cores). The key is the document number, and the document numbers are not sequential. At this point I'm using splits.add to add eac
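The core of what the original post describes, assigning a fixed set of non-sequential document IDs to exactly N splits, is a partitioning step. Below is a minimal, Hadoop-free sketch of just that step, assuming round-robin assignment is acceptable (it keeps group sizes within one of each other but ignores per-document cost). The class name `DocIdSplitter`, the method `partition`, and the sample IDs are hypothetical. In an actual job, each group would presumably be wrapped in a custom `InputSplit` (which typically also implements `Writable` so it can be shipped to the task) and added to the list returned from `getSplits()`; that wrapping is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: distribute non-sequential document IDs across
 * numSplits groups, one group per (not shown) custom InputSplit / mapper.
 */
public class DocIdSplitter {

    /**
     * Assigns docIds round-robin to numSplits groups, so group sizes differ
     * by at most one. IDs keep their input order within each group.
     */
    public static List<List<Integer>> partition(int[] docIds, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        List<List<Integer>> groups = new ArrayList<>(numSplits);
        for (int i = 0; i < numSplits; i++) {
            groups.add(new ArrayList<>());
        }
        // The i-th ID goes to group i mod numSplits, regardless of its value,
        // so gaps in the ID sequence do not unbalance the groups.
        for (int i = 0; i < docIds.length; i++) {
            groups.get(i % numSplits).add(docIds[i]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up, non-sequential IDs for illustration only.
        int[] docIds = {3, 17, 42, 8, 99, 5, 61};
        List<List<Integer>> groups = partition(docIds, 4);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("split " + i + ": " + groups.get(i));
        }
    }
}
```

Running `main` prints four groups: `[3, 99]`, `[17, 5]`, `[42, 61]` and `[8]`. Round-robin was chosen over contiguous ranges because the IDs are not sequential; if documents vary a lot in processing cost, a size-aware assignment may balance the mappers better.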