Hi,

Currently, the task capacity of the cluster should be larger than the number of blocks or files in the input dataset. An alternative is to merge the files into one using the hadoop fs -getmerge command.
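For reference, the merge can be done by pulling all the small HDFS files down into one local file and pushing that single file back. The paths below are hypothetical examples, not from the original thread:

```shell
# Merge all files under an HDFS input directory into one local file,
# then upload the merged file back to HDFS so the job reads a single input.
hadoop fs -getmerge /user/behroz/input merged.txt
hadoop fs -put merged.txt /user/behroz/input-merged/merged.txt
```

With a single merged input file, the number of tasks is driven by the file's block count rather than the original file count.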
--
Best Regards, Edward J. Yoon

-----Original Message-----
From: Behroz Sikander [mailto:[email protected]]
Sent: Tuesday, May 26, 2015 1:14 AM
To: [email protected]
Subject: Hama parition 1000 files on 3 tasks/machine

Hi,

I have a problem regarding data partitioning but was not able to find any solution online.

Problem: I have around 1000 files that I want to process using Hama. Each file has the same schema/structure but different data. How can I divide these files across my cluster? If I have 3 tasks/machines, each task should process around 333 files.

1- How can I take a thousand files as input in Hama? With my current understanding, Hama will open 1000 tasks (1 task for each file).
2- How can I divide the files across different machines (a custom Partitioner, maybe)?
3- If this approach is not supported, what would be an alternative way of solving this?

Regards,
Behroz Sikander
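The ~333-files-per-task split described in question 2 can be sketched as a simple round-robin (modulo) assignment. This is only an illustration of the arithmetic, not Hama's partitioner API; the file names are hypothetical:

```shell
#!/bin/sh
# Assign 1000 hypothetical input files to 3 task groups round-robin:
# file i goes to group (i mod 3), yielding groups of 334, 333, and 333 files.
rm -f group_0.txt group_1.txt group_2.txt
i=0
while [ "$i" -lt 1000 ]; do
  echo "part-$i" >> "group_$((i % 3)).txt"
  i=$((i + 1))
done
wc -l group_0.txt group_1.txt group_2.txt
```

Each group file then lists the inputs one task would be responsible for.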
