Re: Processing multiple files - need to identify in map

2008-03-05 Thread Tarandeep Singh
he/hadoop/mapred/JobConf.html#addInputPath(org.apache.hadoop.fs.Path) > > Thanks, > Lohit > > > > - Original Message > From: Tarandeep Singh <[EMAIL PROTECTED]> > To: core-user@hadoop.apache.org > Sent: Tuesday, March 4, 2008 5:38:41 PM > Subject: Pro

Re: Processing multiple files - need to identify in map

2008-03-04 Thread lohit
g Sent: Tuesday, March 4, 2008 5:38:41 PM Subject: Processing multiple files - need to identify in map Hi, I need to identify which file a key came from in the map phase. Is it possible? What I have is multiple types of log files in one directory that I need to process for my application. Rig

Re: Processing multiple files - need to identify in map

2008-03-04 Thread Chris K Wensel
More specifically, call jobConf.get("map.input.file") in the configure(JobConf conf) method of your mapper. There are some cases where this won't work, but in general it works fine. And yes, you can add many input directories with jobConf.addInputPath(...). On Mar 4, 2008, at 5:54 PM, Ted Dunning wro
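Once the mapper has the input file name from "map.input.file", the log type can be derived from the name instead of from the line contents. What follows is only a minimal sketch of that step in plain Java. The class LogSource, the method typeOf, and the naming convention (file names that begin with the log type, such as "weblog-2008-03-04.log") are assumptions made for illustration; none of them come from the thread or from Hadoop.

```java
import java.util.Locale;

/** Hypothetical helper (not part of Hadoop): guesses a log type from an input file path. */
class LogSource {

    /**
     * Assumed convention: the file name starts with the log type, followed by
     * '-', '_' or '.', for example "weblog-2008-03-04.log" gives "weblog".
     * Returns "unknown" when no type can be derived.
     */
    static String typeOf(String inputPath) {
        if (inputPath == null || inputPath.isEmpty()) {
            return "unknown";
        }
        // Keep only the last path component; this also handles full URIs
        // such as "hdfs://host/dir/file".
        String name = inputPath.substring(inputPath.lastIndexOf('/') + 1);

        // Cut at the first separator to drop dates and extensions.
        int end = name.length();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '-' || c == '_' || c == '.') {
                end = i;
                break;
            }
        }
        String type = name.substring(0, end).toLowerCase(Locale.ROOT);
        return type.isEmpty() ? "unknown" : type;
    }
}
```

In a mapper, configure(JobConf conf) could store LogSource.typeOf(conf.get("map.input.file")) in a field, and map() could then branch on that field. If your files follow a different naming scheme, only typeOf needs to change.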

Re: Processing multiple files - need to identify in map

2008-03-04 Thread Ted Dunning
Yes. Use the configure method, which is called once per map task, that is, each time a new input split (normally a file or part of one) is given to the map. Save the file name in a field of the mapper. The other alternative is to derive a new InputFormat that remembers the input file name. On 3/4/08 5:38 PM, "Tarandeep Singh" <[EMAIL PROTECTED]> wrote: > Hi,

Re: Processing multiple files - need to identify in map

2008-03-04 Thread Aaron Kimball
The Reporter object given to the map() method can get you the InputSplit that is being mapped over. If this is a FileSplit (or a subclass of it), you can grab the path name from there. - Aaron Tarandeep Singh wrote: Hi, I need to identify which file a key came from in the map phase. Is it possib

Processing multiple files - need to identify in map

2008-03-04 Thread Tarandeep Singh
Hi, I need to identify which file a key came from in the map phase. Is it possible? What I have is multiple types of log files in one directory that I need to process for my application. Right now, I am relying on the structure of the log files (e.g. if a line starts with "weblog", the lin