An alternative thought: in addition to the key/value input interface provided by Hama, each process (within the bsp() function) could read data directly from an external source using a Reader-style class; the processes might then need something like ZooKeeper to coordinate which files each one handles.
FYI

On 26 May 2015 at 06:43, Edward J. Yoon <[email protected]> wrote:
> Hi,
>
> Currently the task capacity of the cluster should be larger than the number
> of blocks or files in the input dataset. The alternative is to merge them
> into one file using the hadoop fs -getmerge command.
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Behroz Sikander [mailto:[email protected]]
> Sent: Tuesday, May 26, 2015 1:14 AM
> To: [email protected]
> Subject: Hama partition 1000 files on 3 tasks/machine
>
> Hi,
> I have a problem regarding data partitioning but was not able to find any
> solution online.
>
> Problem: I have around 1000 files that I want to process using Hama. Each
> file has the same schema/structure but different data. How can I divide
> these files across my cluster? I mean, if I have 3 tasks/machines, then each
> task should process around 333 files.
>
> So:
> 1- How can I take a thousand files as input in Hama? With my current
> understanding, Hama will open 1000 tasks (1 task for each file).
> 2- How can I divide the files across different machines (a custom
> partitioner, maybe)?
> 3- If this approach is not supported, then what would be an alternative
> approach to solving this?
>
> Regards,
> Behroz Sikander
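To illustrate the idea of dividing ~1000 files across a fixed number of tasks, here is a minimal, self-contained sketch of round-robin file assignment. This is not Hama API code: in a real job the peer index and peer count would come from the BSPPeer object passed into bsp() (methods along the lines of getPeerIndex()/getNumPeers(); check the API of the Hama version you run), and the file names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FileAssignment {
    // Round-robin assignment: file i is handled by peer (i % numPeers),
    // so every peer gets roughly allFiles.size() / numPeers files.
    static List<String> filesForPeer(List<String> allFiles, int peerIndex, int numPeers) {
        List<String> mine = new ArrayList<>();
        for (int i = 0; i < allFiles.size(); i++) {
            if (i % numPeers == peerIndex) {
                mine.add(allFiles.get(i));
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        // Hypothetical list of 1000 input files.
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add("input/part-" + i);
        }
        // With 3 peers, each peer ends up with 333 or 334 files.
        for (int peer = 0; peer < 3; peer++) {
            System.out.println("peer " + peer + " -> "
                    + filesForPeer(files, peer, 3).size() + " files");
        }
    }
}
```

Because every peer derives its share deterministically from its own index, no ZooKeeper-style coordination is needed for the assignment itself; coordination would only matter if the file list changed while the job ran.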
