Thank you for the input.

As you mentioned, I have accessed the files directly and, in my code,
logically divided them into multiple tasks. I am still working on it, but I
am confident that it will work.
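
Roughly, the division is a round-robin assignment of the file paths to the
available tasks. Here is a minimal standalone sketch of the idea (the
FileSplit class and the part-N file names are placeholders for illustration;
the real code would list the files from HDFS and do this inside Hama's bsp()
method):

```java
import java.util.ArrayList;
import java.util.List;

public class FileSplit {

    // Assign each file to exactly one task bucket by index modulo the
    // number of tasks, so the buckets differ in size by at most one file.
    static List<List<String>> partition(List<String> files, int numTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            buckets.add(new ArrayList<String>());
        }
        for (int i = 0; i < files.size(); i++) {
            buckets.get(i % numTasks).add(files.get(i));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Placeholder file names; in practice these would come from an
        // HDFS directory listing.
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add("part-" + i);
        }
        List<List<String>> buckets = partition(files, 3);
        for (List<String> b : buckets) {
            System.out.println(b.size());  // prints 334, 333, 333
        }
    }
}
```

Each task then opens and processes only the files in its own bucket, which
gives the ~333-files-per-task split from the original question.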

Thanks.

On Tue, May 26, 2015 at 7:57 AM, Edward J. Yoon <[email protected]>
wrote:

> Yeah, that's also a good alternative. Users can directly access external
> resources (such as HDFS, NoSQL, and RDBMS) and partition the data using
> messaging APIs.
>
> However, I think we need to provide the solution at framework level.
>
> --
> Best Regards, Edward J. Yoon
>
>
> -----Original Message-----
> From: Chia-Hung Lin [mailto:[email protected]]
> Sent: Tuesday, May 26, 2015 2:39 PM
> To: [email protected]
> Subject: Re: Hama partition 1000 files on 3 tasks/machine
>
> An alternative thought:
>
> In addition to the (key/value) interface provided by Hama, each
> process (within the bsp function) should be able to read data from an
> external source with a Reader-related class; but the processes may need
> to use something like ZooKeeper for coordination.
>
> FYI
>
>
>
> On 26 May 2015 at 06:43, Edward J. Yoon <[email protected]> wrote:
> > Hi,
> >
> > Currently, the task capacity of the cluster should be larger than the
> > number of blocks or files in the input dataset. The alternative is to
> > merge them into one file using the hadoop fs -getmerge command.
> >
> > --
> > Best Regards, Edward J. Yoon
> >
> > -----Original Message-----
> > From: Behroz Sikander [mailto:[email protected]]
> > Sent: Tuesday, May 26, 2015 1:14 AM
> > To: [email protected]
> > Subject: Hama partition 1000 files on 3 tasks/machine
> >
> > Hi,
> > I have a problem regarding data partitioning but have not been able to
> > find any solution online.
> >
> > Problem: I have around 1000 files that I want to process using Hama. Each
> > file has the same schema/structure but different data. How can I divide
> > these files across my cluster? I mean, if I have 3 tasks/machines, then
> > each task should process around 333 files.
> >
> > So:
> > 1- How can I take a thousand files as input in Hama? With my current
> > understanding, Hama will open 1000 tasks (1 task for each file).
> > 2- How can I divide the files across different machines (a custom
> > partitioner, maybe)?
> > 3- If this approach is not supported, then what can be an alternative
> > approach to solving this?
> >
> > Regards,
> > Behroz Sikander
> >
> >
>
>
>
