Hey, quick question:
I'm writing a program that parses data from 2 different files and puts the data
into a table. Currently I have 2 different map functions and so I submit 2
separate jobs to the job client. Would it be more efficient to add both paths
to the same mapper and only submit one jo
Mark,
Is the structure of both files the same? It makes even more sense to
combine the files, if you can, as I have seen a considerable speedup
when I've done that (at least when I've had small files to deal with).
Lajos
Mark Vigeant wrote:
Hey, quick question:
I'm writing a program that
Sent: Monday, November 02, 2009 10:27 AM
To: common-user@hadoop.apache.org
Subject: Re: Multiple Input Paths
Mark,
Is the structure of both files the same? It makes even more sense to
combine the files, if you can, as I have seen a considerable speed up
when I've done that (at least when I'v
Mark,
Set-up for a mapred job consumes a considerable amount of time and resources,
so a single job is preferred where possible.
You can add multiple paths to your job, and if you need different processing
logic depending on the input being consumed, you can use the parameter
map.input.file in yo
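
A minimal sketch of branching on the input file inside one map step, not Amogh's actual code. It assumes the value of map.input.file has already been read into a String (with the old org.apache.hadoop.mapred API this is commonly done with job.get("map.input.file") in configure(JobConf)). The class name InputRouter, the two directory names, and the placeholder parsers are hypothetical, and plain Java is used so the example runs without Hadoop on the classpath.

```java
// Hypothetical helper: chooses the parsing logic from the input file path.
final class InputRouter {
    // Assumed directory names; the thread does not say where the two files live.
    static final String FIRST_DIR = "/input/first/";
    static final String SECOND_DIR = "/input/second/";

    // inputFile stands in for the value of map.input.file.
    static String mapLine(String inputFile, String line) {
        if (inputFile.contains(FIRST_DIR)) {
            return parseFirst(line);
        } else if (inputFile.contains(SECOND_DIR)) {
            return parseSecond(line);
        }
        throw new IllegalArgumentException("Unexpected input file: " + inputFile);
    }

    // Placeholder parsers; the real ones would build the table rows.
    static String parseFirst(String line) {
        return "first:" + line.trim();
    }

    static String parseSecond(String line) {
        return "second:" + line.trim();
    }
}
```

With this shape, both paths go to one job and the per-file logic stays in two small methods rather than two separate mappers.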
Ok, thank you very much Amogh, I will redesign my program.
-Original Message-
From: Amogh Vasekar [mailto:am...@yahoo-inc.com]
Sent: Monday, November 02, 2009 11:45 AM
To: common-user@hadoop.apache.org
Subject: Re: Multiple Input Paths
Mark,
Set-up for a mapred job consumes a
Mark,
Were you able to concatenate both the XML files together? What did you do to
keep the resulting XML well formed?
Regards,
Vipul Sharma
Hey Vipul
No, I haven't concatenated my files yet; I was just thinking over how to
approach the issue of multiple input paths.
I actually did what Amandeep hinted at: we wrote our own
XMLInputFormat and XMLRecordReader. When configuring the job in my driver
Mark,
Thanks for the pointer. So, as far as I understand, you are not using Hadoop's
default split but your own split of one record, defined as
everything between the starting tag and the end tag in your XML? So in a way
you have one map per record? In my case this will not be efficien
[...] how to
approach the issue of multiple input paths.
I actually did what Amandeep hinted at which was we wrote our own
XMLInputFormat and XMLRecordReader. When configuring the job in my driver I set
job.setInputFormatClass(XMLFileInputFormat.class) and what it does is send
chunks of XML to the mappe
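
A rough sketch of what "chunks of XML" between a start tag and an end tag could look like. This is not the XMLInputFormat or XMLRecordReader from the thread: a real record reader works on a byte stream and has to respect split boundaries. The class name XmlChunker and the tag names in the usage note are hypothetical, and the matching is a plain string search, so a start tag with attributes would not be found.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: cuts a string into records delimited by a tag pair.
final class XmlChunker {
    static List<String> chunks(String xml, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(startTag, from);
            if (start < 0) {
                break; // no further record
            }
            int end = xml.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record: ignore the tail
            }
            end += endTag.length();
            // Each record includes its own start and end tag.
            records.add(xml.substring(start, end));
            from = end;
        }
        return records;
    }
}
```

For example, XmlChunker.chunks(text, "<rec>", "</rec>") returns one string per record, and each of those strings would be handed to the mapper as a single value.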
Amogh,
That sounds so awesome! Yeah, I wish I had that class now. Do you have any tips
on how to create such a delegating class? The best I can come up with is to
just submit both files to the mapper using multiple input paths and then have
an if statement at the beginning of the map that
> [...] class now. Do you have any
> tips on how to create such a delegating class? The best I can come up with is
> to just submit both files to the mapper using multiple input paths and then
> having an if statement at the beginning of the map that checks which file it's
> dealing with bu
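
One possible shape for a delegating class, sketched in plain Java under stated assumptions. It is not the class Amogh described, which is not shown in the thread. DelegatingLineMapper, register, and the path prefixes in the usage note are hypothetical names. The idea is to register one handler per input path prefix and let a single map step look up the handler for the file being read.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical delegating mapper: one handler per input path prefix.
final class DelegatingLineMapper {
    // LinkedHashMap keeps registration order, so earlier prefixes win.
    private final Map<String, Function<String, String>> delegates = new LinkedHashMap<>();

    DelegatingLineMapper register(String pathPrefix, Function<String, String> delegate) {
        delegates.put(pathPrefix, delegate);
        return this;
    }

    // inputFile is the path of the file the current record came from.
    String map(String inputFile, String line) {
        for (Map.Entry<String, Function<String, String>> entry : delegates.entrySet()) {
            if (inputFile.startsWith(entry.getKey())) {
                return entry.getValue().apply(line);
            }
        }
        throw new IllegalArgumentException("No delegate registered for " + inputFile);
    }
}
```

A driver could then call new DelegatingLineMapper().register("/in/a/", parserA).register("/in/b/", parserB), which keeps the file check out of the parsing code itself.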