Thank you for the input. As you suggested, I accessed the files directly and logically divided them into multiple tasks in my code. I am still working on it, but I am confident it will work.
Thanks.

On Tue, May 26, 2015 at 7:57 AM, Edward J. Yoon <[email protected]> wrote:
> Yeah, that's also a good alternative. Users can directly access external
> resources (such as HDFS, NoSQL, and RDBMS) and partition data using
> messaging APIs.
>
> However, I think we need to provide the solution at the framework level.
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Chia-Hung Lin [mailto:[email protected]]
> Sent: Tuesday, May 26, 2015 2:39 PM
> To: [email protected]
> Subject: Re: Hama partition 1000 files on 3 tasks/machine
>
> An alternative thought:
>
> In addition to the (key/value) interface provided by Hama, each
> process (within the bsp function) should be able to read data from an
> external source with a Reader-related class; but the processes may need
> to use something like ZooKeeper for coordination.
>
> FYI
>
> On 26 May 2015 at 06:43, Edward J. Yoon <[email protected]> wrote:
> > Hi,
> >
> > Currently the task capacity of the cluster should be larger than the
> > number of blocks or files in the input dataset. The alternative is to
> > merge them into one file using the hadoop fs -getmerge command.
> >
> > --
> > Best Regards, Edward J. Yoon
> >
> > -----Original Message-----
> > From: Behroz Sikander [mailto:[email protected]]
> > Sent: Tuesday, May 26, 2015 1:14 AM
> > To: [email protected]
> > Subject: Hama partition 1000 files on 3 tasks/machine
> >
> > Hi,
> > I have a problem regarding data partitioning but was not able to find
> > any solution online.
> >
> > Problem: I have around 1000 files that I want to process using Hama.
> > Each file has the same schema/structure but different data. How can I
> > divide these files in my cluster? I mean, if I have 3 tasks/machines,
> > then each task should process around 333 files.
> >
> > So,
> > 1- How can I take a thousand files as input in Hama? With my current
> > understanding, Hama will open 1000 tasks (1 task for each file).
> > 2- How do I divide the files across different machines (a custom
> > Partitioner, maybe)?
> > 3- If this approach is not supported, then what can be an alternative
> > approach to solving this?
> >
> > Regards,
> > Behroz Sikander
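For readers following this thread: the "access the files directly and divide them across tasks" approach that was settled on above can be sketched in plain Java. This is a minimal illustration, not code from the thread; the file names are hypothetical, and in a real Hama BSP job the task index and task count would come from the `BSPPeer` object (e.g. `peer.getPeerIndex()` and `peer.getNumPeers()`, assuming those accessors in your Hama version) rather than being passed in directly.

```java
import java.util.ArrayList;
import java.util.List;

public class FilePartitioner {

    // Round-robin assignment: task i takes files i, i + numTasks, i + 2*numTasks, ...
    // With 1000 files and 3 tasks this yields 334/333/333 files per task,
    // so no coordination service is needed for the split itself.
    static List<String> filesForTask(List<String> allFiles, int taskIndex, int numTasks) {
        List<String> mine = new ArrayList<>();
        for (int i = taskIndex; i < allFiles.size(); i += numTasks) {
            mine.add(allFiles.get(i));
        }
        return mine;
    }

    public static void main(String[] args) {
        // Hypothetical names standing in for the 1000 input files.
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add("part-" + i);
        }
        for (int task = 0; task < 3; task++) {
            System.out.println("task " + task + " gets "
                    + filesForTask(files, task, 3).size() + " files");
        }
    }
}
```

Because every task computes the same deterministic assignment from its own index, no ZooKeeper-style coordination is required for a static file list; coordination only becomes necessary if the file set changes while the job runs.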
