Hi,
I have a problem regarding data partitioning but was not able to find a
solution online.

Problem: I have around 1000 files that I want to process using Hama. Each
file has the same schema/structure but different data. How can I divide
these files across my cluster? I mean, if I have 3 tasks/machines, then each
task should process around 333 files.

So,
1- How can I take a thousand files as input in Hama? With my current
understanding, Hama will open 1000 tasks (1 task for each file).
2- How can I divide the files across different machines (a custom
Partitioner, maybe)?
3- If this approach is not supported, then what would be an alternative
approach to solving this?
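To make (2) concrete: one scheme I had in mind is a simple round-robin split, where each task keeps only the files whose index matches its own task index. Below is a plain-Java sketch of that idea (not Hama-specific; in a real Hama BSP job the task index and task count would presumably come from the peer, and `FilePartitioner`/`filesForTask` are hypothetical names I made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class FilePartitioner {

    // Round-robin assignment: task t keeps the files whose
    // position in the full list satisfies index % numTasks == t.
    static List<String> filesForTask(List<String> allFiles, int taskIndex, int numTasks) {
        List<String> mine = new ArrayList<>();
        for (int i = 0; i < allFiles.size(); i++) {
            if (i % numTasks == taskIndex) {
                mine.add(allFiles.get(i));
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        // 1000 dummy file names split across 3 tasks -> 334/333/333
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add("file-" + i + ".txt");
        }
        for (int t = 0; t < 3; t++) {
            System.out.println("task " + t + " gets "
                    + filesForTask(files, t, 3).size() + " files");
        }
    }
}
```

Each task would then open and process only its own sublist, so no two tasks touch the same file. Is something like this possible with Hama's input handling, or does it conflict with the one-task-per-file behavior described in (1)?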

Regards,
Behroz Sikander
