Multiple Input Paths

2009-11-02 Thread Mark Vigeant
Hey, quick question: I'm writing a program that parses data from 2 different files and puts the data into a table. Currently I have 2 different map functions and so I submit 2 separate jobs to the job client. Would it be more efficient to add both paths to the same mapper and only submit one jo

Re: Multiple Input Paths

2009-11-02 Thread L
Mark, Is the structure of both files the same? It makes even more sense to combine the files, if you can, as I have seen a considerable speed up when I've done that (at least when I've had small files to deal with). Lajos Mark Vigeant wrote: Hey, quick question: I'm writing a program that

RE: Multiple Input Paths

2009-11-02 Thread Mark Vigeant
nt: Monday, November 02, 2009 10:27 AM To: common-user@hadoop.apache.org Subject: Re: Multiple Input Paths Mark, Is the structure of both files the same? It makes even more sense to combine the files, if you can, as I have seen a considerable speed up when I've done that (at least when I'v

Re: Multiple Input Paths

2009-11-02 Thread Amogh Vasekar
Mark, Set-up for a mapred job consumes a considerable amount of time and resources, so a single job is preferred if possible. You can add multiple paths to your job, and if you need different processing logic depending upon the input being consumed, you can use parameter map.input.file in yo
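A minimal sketch of the branching idea described above, written as plain Java so it runs without a Hadoop cluster. It assumes the mapper has already obtained the current input path as a string (in a real job this would typically come from the map.input.file parameter in the job configuration). The class name InputDispatch, the Source enum, and the path fragments are hypothetical and only illustrate the dispatch; they are not part of Hadoop or of the original poster's code.

```java
// Sketch only: inputFile stands in for the value of map.input.file.
final class InputDispatch {

    // Hypothetical labels for the two kinds of input file.
    enum Source { FIRST, SECOND, UNKNOWN }

    // Decide which parsing logic applies, based on the input path.
    // The path fragments below are made-up examples.
    static Source classify(String inputFile) {
        if (inputFile == null) {
            return Source.UNKNOWN;
        }
        if (inputFile.contains("/input/first/")) {
            return Source.FIRST;
        }
        if (inputFile.contains("/input/second/")) {
            return Source.SECOND;
        }
        return Source.UNKNOWN;
    }

    // Stand-in for the body of map(): branch on the source of the record.
    static String process(String inputFile, String record) {
        switch (classify(inputFile)) {
            case FIRST:
                return "first:" + record.trim();
            case SECOND:
                return "second:" + record.trim().toUpperCase();
            default:
                throw new IllegalArgumentException(
                        "Unexpected input file: " + inputFile);
        }
    }
}
```

Since the input file does not change within one map task, classify could be called once when the task is set up and the result cached, rather than once per record.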

RE: Multiple Input Paths

2009-11-02 Thread Mark Vigeant
Ok, thank you very much Amogh, I will redesign my program. -Original Message- From: Amogh Vasekar [mailto:am...@yahoo-inc.com] Sent: Monday, November 02, 2009 11:45 AM To: common-user@hadoop.apache.org Subject: Re: Multiple Input Paths Mark, Set-up for a mapred job consumes a

RE: Multiple Input Paths

2009-11-02 Thread Vipul Sharma
Mark, were you able to concatenate both the XML files together? What did you do to keep the resulting XML well formed? Regards, Vipul Sharma, Cell: 281-217-0761

RE: Multiple Input Paths

2009-11-03 Thread Mark Vigeant
Hey Vipul, no, I haven't concatenated my files yet, and I was just thinking over how to approach the issue of multiple input paths. I actually did what Amandeep hinted at, which was to write our own XMLInputFormat and XMLRecordReader. When configuring the job in my driver

RE: Multiple Input Paths

2009-11-03 Thread vipul sharma
Mark, thanks for the pointer. So as far as I understand, you are not using Hadoop's default split but your own split of one record, defined as everything between the starting tag and the end tag in your XML? So in a way you have one map per record? In my case this will not be efficien
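To make the "everything between the starting tag and the end tag" idea concrete, here is a rough, standalone Java sketch of that record-splitting logic. It is not the XMLRecordReader from this thread (that code is not shown); TagRecordSplitter and its split method are hypothetical names, and the sketch works on an in-memory string rather than on a Hadoop input split. It also assumes the start tag appears literally (no attributes) and that records of the same tag are not nested.

```java
import java.util.ArrayList;
import java.util.List;

final class TagRecordSplitter {

    // Return every substring that begins at startTag and ends at the next
    // endTag, scanning left to right. Nested records with the same tag
    // are not handled in this sketch.
    static List<String> split(String text, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = text.indexOf(startTag, pos);
            if (start < 0) {
                break; // no further records
            }
            int end = text.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // incomplete trailing record: skip it
            }
            end += endTag.length();
            records.add(text.substring(start, end));
            pos = end;
        }
        return records;
    }
}
```

In a record reader built along these lines, each extracted chunk would be handed to one map() call, which is a separate matter from how many map tasks run: the number of tasks follows the input splits, not the number of records.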

Re: Multiple Input Paths

2009-11-03 Thread Amogh Vasekar
o approach the issue of multiple input paths. I actually did what Amandeep hinted at which was we wrote our own XMLInputFormat and XMLRecordReader. When configuring the job in my driver I set job.setInputFormatClass(XMLFileInputFormat.class) and what it does is send chunks of XML to the mappe

RE: Multiple Input Paths

2009-11-04 Thread Mark Vigeant
Amogh, That sounds so awesome! Yeah, I wish I had that class now. Do you have any tips on how to create such a delegating class? The best I can come up with is to just submit both files to the mapper using multiple input paths and then have an if statement at the beginning of the map that

Re: Multiple Input Paths

2009-11-08 Thread Tom White
s now. Do you have any > tips on how to create such a delegating class? The best I can come up with is > to just submit both files to the mapper using multiple input paths and then > having an if statement at the beginning of the map that checks which file it's > dealing with bu