Hi, I have a problem regarding data partitioning that I was not able to find any solution for online.
Problem: I have around 1000 files that I want to process using Hama. Each file has the same schema/structure but different data. How can I divide these files across my cluster? I mean, if I have 3 tasks/machines, then each task should process around 333 files.

1- How can I take a thousand files as input in Hama? With my current understanding, Hama will open 1000 tasks (1 task for each file).
2- How can I divide the files across different machines (a custom Partitioner, maybe)?
3- If this approach is not supported, what could be an alternative approach to solving this?

Regards,
Behroz Sikander
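P.S. To make question 2 concrete, here is a rough sketch of the round-robin split I have in mind. This is not actual Hama driver code, just a self-contained illustration; the class and method names (FileSplitSketch, filesForTask) are my own, and inside a real BSP task I assume the task index and task count would come from something like BSPPeer#getPeerIndex() and BSPPeer#getNumPeers(), if I understand the API correctly.

```java
import java.util.ArrayList;
import java.util.List;

public class FileSplitSketch {

    // Return the subset of files that task `taskIndex` (0-based) out of
    // `numTasks` tasks should process, using a simple round-robin split.
    static List<String> filesForTask(List<String> files, int taskIndex, int numTasks) {
        List<String> mine = new ArrayList<>();
        for (int i = 0; i < files.size(); i++) {
            if (i % numTasks == taskIndex) {
                mine.add(files.get(i));
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        // 1000 dummy file names standing in for the real input files.
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add("file-" + i);
        }
        // With 3 tasks, the files split roughly evenly: 334 / 333 / 333.
        for (int task = 0; task < 3; task++) {
            System.out.println("task " + task + " -> "
                    + filesForTask(files, task, 3).size() + " files");
        }
    }
}
```

So the question is essentially whether each Hama task can receive such a list of file names (rather than file contents) and open the files itself, or whether the framework insists on one task per input file.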
