Can anyone help me?
http://stackoverflow.com/questions/26341913/hadoop-multipleinputs
he euclidean coordinates
> of the cities. We need to pass this single line to each mapper, which will
> then process it. How can we do this so that we achieve parallelism in
> a Hadoop cluster? Is there any way to generate multiple input splits from
> the single input file?
>
> Thanks
>
> Sharat
>
Sharat,
To answer your specific question:
> Is there any way to generate multiple input splits from the single input file?
Yes, there is. Use the NLineInputFormat class with an N value of 1.
For a single file of N lines (duplicated or not), you should then get N map
tasks.
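
To illustrate the effect of the N value, here is a minimal plain-Java sketch of the grouping behaviour only. It does not use Hadoop; the class and method names (NLineSplitSketch, splitByLines) are made up for this example. In an actual job using the newer org.apache.hadoop.mapreduce API, the equivalent setup would be job.setInputFormatClass(NLineInputFormat.class) together with NLineInputFormat.setNumLinesPerSplit(job, 1), assuming your Hadoop version ships that class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NLineSplitSketch {
    // Hypothetical helper: group the input lines into splits of at most n lines.
    // Each returned inner list stands in for one input split, i.e. one map task.
    static List<List<String>> splitByLines(List<String> lines, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            int end = Math.min(i + n, lines.size());
            splits.add(new ArrayList<>(lines.subList(i, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three copies of the same city-coordinates line (placeholder text).
        List<String> lines = Arrays.asList("cities copy 1", "cities copy 2", "cities copy 3");
        System.out.println(splitByLines(lines, 1).size() + " splits with N=1");
        System.out.println(splitByLines(lines, 2).size() + " splits with N=2");
    }
}
```

With N=1 every line becomes its own split, so a three-line file yields three map tasks; a larger N trades parallelism for fewer, larger tasks.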
On Sun, Jun 10, 2
ra.com/cdh/3/hadoop/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html
>
> Usage examples are part of the API doc.
>
> Regards,
> Dino Kečo
>
>
> On Wed, Aug 10, 2011 at 6:08 PM, Jian Fang
> wrote:
>
>> Hi,
>>
>> I am working on a pr
Hi,
I am working on a project that requires multiple input formats and
multiple output formats. Basically, I store some sales rank data in a
Cassandra cluster, and I get a sales rank update file each day to update the
ranks in Cassandra. In the meantime, I need to find all the products
I found my issue. For future readers:
I forgot to append the 3rd argument of generateFileNameForKeyValue
(the String) to the returned filenames. That String is the
"part0001", etc., which gives each mapper's output a unique filename.
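
For anyone hitting the same thing, here is a minimal plain-Java sketch of the difference. It does not extend the real Hadoop class; the two methods only mirror the (key, value, name) parameter shape of generateFileNameForKeyValue, and the class name, the "-" separator, and the sample leaf names are assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class FileNameSketch {
    // Ignores the per-task leaf name: every task ends up with the same filename.
    static String withoutLeafName(String key, String value, String name) {
        return "file1";
    }

    // Appends the per-task leaf name so each task gets its own file.
    // The "-" separator is an arbitrary choice for this sketch.
    static String withLeafName(String key, String value, String name) {
        return "file1-" + name;
    }

    public static void main(String[] args) {
        // Illustrative leaf names, one per map task.
        String[] leafNames = {"part-00000", "part-00001"};
        Set<String> clashing = new LinkedHashSet<>();
        Set<String> unique = new LinkedHashSet<>();
        for (String leaf : leafNames) {
            clashing.add(withoutLeafName(null, "some value", leaf));
            unique.add(withLeafName(null, "some value", leaf));
        }
        System.out.println(clashing); // [file1]
        System.out.println(unique);   // [file1-part-00000, file1-part-00001]
    }
}
```

Two tasks collapse to a single name in the first case and stay distinct in the second, which is the collision described above.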
On Thu, Jan 6, 2011 at 2:51 PM, Brett Hoerner wrote:
> Hello,
>
> I'm
Hello,
I'm running a very simple job that returns the input with a null key
and uses no reducer (see below). I'm using
MultipleSequenceFileOutputFormat to "split" the input into different
files, but for simplicity's sake right now I'm just returning "file1"
from generateFileNameForKeyValue so tha