Hi,

Currently, the task capacity of the cluster should be larger than the number of blocks or files in the input dataset. An alternative is to merge the files into one using the hadoop fs -getmerge command.
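For reference, the merge can be done by pulling all the small HDFS files down into one local file and pushing that single file back. The paths below are hypothetical examples, not from the original thread:

```shell
# Merge all files under an HDFS input directory into one local file,
# then upload the merged file back to HDFS so the job reads a single input.
hadoop fs -getmerge /user/behroz/input merged.txt
hadoop fs -put merged.txt /user/behroz/input-merged/merged.txt
```

With a single merged input file, the number of tasks is driven by the file's block count rather than the original file count.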
--
Best Regards, Edward J. Yoon

-----Original Message-----
From: Behroz Sikander [mailto:[email protected]]
Sent: Tuesday, May 26, 2015 1:14 AM
To: [email protected]
Subject: Hama parition 1000 files on 3 tasks/machine

Hi,

I have a problem regarding data partitioning but was not able to find any solution online.

Problem: I have around 1000 files that I want to process using Hama. Each file has the same schema/structure but different data. How can I divide these files across my cluster? If I have 3 tasks/machines, each task should process around 333 files.

1- How can I take a thousand files as input in Hama? With my current understanding, Hama will open 1000 tasks (1 task for each file).
2- How can I divide the files across different machines (a custom Partitioner, maybe)?
3- If this approach is not supported, what would be an alternative way of solving this?

Regards,
Behroz Sikander
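The ~333-files-per-task split described in question 2 can be sketched as a simple round-robin (modulo) assignment. This is only an illustration of the arithmetic, not Hama's partitioner API; the file names are hypothetical:

```shell
#!/bin/sh
# Assign 1000 hypothetical input files to 3 task groups round-robin:
# file i goes to group (i mod 3), yielding groups of 334, 333, and 333 files.
rm -f group_0.txt group_1.txt group_2.txt
i=0
while [ "$i" -lt 1000 ]; do
  echo "part-$i" >> "group_$((i % 3)).txt"
  i=$((i + 1))
done
wc -l group_0.txt group_1.txt group_2.txt
```

Each group file then lists the inputs one task would be responsible for.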
