I think Thomas has a point. How about making it a sub-module/sub-project of Hama for now? If and when it gains enough community support to become a top-level project, you can fork it as a separate project. I am not fully familiar with the procedures and requirements for bringing an external project in as a sub-project. We can look into it if you are ready to take this route.
> Could you please send me a link for setting up an open-source Apache project?

If I am right this is what you are looking for -
http://incubator.apache.org/guides/proposal.html
http://incubator.apache.org/sitemap.html

Good luck,
Suraj

On Fri, Sep 7, 2012 at 11:40 AM, Thomas Jungblut <[email protected]> wrote:

> Although I think this is a great project, I think that you will not meet
> the requirements.
> You need a community and a charter to get it into the incubation.
>
> What about hosting it on Github?
>
> 2012/9/7 Leonidas Fegaras <[email protected]>
>
> > Yes, this is a great idea. I have used GIT on my own server but I don't
> > know how to do this for ASF. Could you please send me a link for setting
> > up an open-source Apache project?
> >
> > On 09/05/2012 10:51 AM, Edward J. Yoon wrote:
> >
> >> If you can open source this then I'm sure the ASF community can help
> >> you and make this software better.
> >>
> >> Pls feel free to ask us if you need any assistance donating source
> >> code to the ASF or contributing to the Hama project in the future.
> >>
> >> On Thu, Aug 30, 2012 at 11:40 PM, Leonidas Fegaras <[email protected]>
> >> wrote:
> >>
> >>> Yes sure. I have fixed the bug with the repeat stopping condition but
> >>> I have only tested pagerank on my small cluster. I still need to fix
> >>> the k-means clustering (it's a special case because you improve a
> >>> fixed number of points).
> >>> Leonidas
> >>>
> >>> On Aug 30, 2012, at 9:02 AM, Edward J. Yoon wrote:
> >>>
> >>>> Shall we work together?
> >>>>
> >>>> On Fri, Aug 24, 2012 at 9:01 PM, Leonidas Fegaras <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Thank you very much for your interest and for testing my system.
> >>>>> It seems that my release was premature: It worked for some random
> >>>>> data but didn't for some others. It's a minor logical error that I
> >>>>> will try to fix in the next few days.
> >>>>> The problem is with the stopping condition of the repeat
> >>>>> expression that calculates the new pagerank from the old. It must
> >>>>> stop if ALL peers reach the specified precision. This is done by
> >>>>> having those peers that need to continue send a message to the
> >>>>> others to continue. It seems that now when all peers agree at the
> >>>>> same time, the program works fine. But if one finishes sooner,
> >>>>> instead of continuing the repeat loop, it runs away to the next BSP
> >>>>> step that follows the repeat, then exits prematurely and the system
> >>>>> hangs. The casting errors are due to the run-away peers executing
> >>>>> the wrong BSP steps and reading the wrong messages. Queries without
> >>>>> repeat though are OK.
> >>>>> By the way, I had a problem exchanging large amounts of data during
> >>>>> sync (I discussed this with Thomas). My solution was to break a BSP
> >>>>> superstep into multiple substeps so that each substep can handle a
> >>>>> max number of messages. Of course my program has to collect all
> >>>>> messages in a vector in memory. When the vector is too big, it is
> >>>>> spilled to a local file. This moved the problem from the Hama side
> >>>>> to my side and allowed me to handle larger data, especially in
> >>>>> joins. I think this problem of exchanging large amounts of data
> >>>>> during a superstep is currently a weakness of Hama.
> >>>>> Leonidas
> >>>>>
> >>>>> On 08/24/2012 04:15 AM, Thomas Jungblut wrote:
> >>>>>
> >>>>>> BTW, should we feature this on our website?
> >>>>>>
> >>>>>> 2012/8/24 Thomas Jungblut <[email protected]>
> >>>>>>
> >>>>>>> Hi Leonidas!
> >>>>>>>
> >>>>>>> I have to admit that I have known what is going on (and had to
> >>>>>>> keep silent), but I have to say: Thank you very much!
> >>>>>>> This will help many people write BSPs in an easier way.
> >>>>>>>
> >>>>>>> Of course this is not as fast as the native BSP code; Hive and Pig
> >>>>>>> suffer from the same problems in MR.
> >>>>>>> But it gives people the opportunity to develop faster and get
> >>>>>>> their code in production with just a minor time expense.
> >>>>>>>
> >>>>>>> And I think that we will gladly help you improve the BSP part of
> >>>>>>> your framework. At least I would ;)
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>>
> >>>>>>> 2012/8/24 Edward J. Yoon <[email protected]>
> >>>>>>>
> >>>>>>>> Here are my few test results on Oracle BDA (40G/s infiniband
> >>>>>>>> network).
> >>>>>>>>
> >>>>>>>> It seems slower than our PageRank example.
> >>>>>>>>
> >>>>>>>> P.S., There are some errors so I couldn't test large-scale.
> >>>>>>>> (java.lang.ClassCastException: hadoop.mrql.MR_int cannot be cast
> >>>>>>>> to hadoop.mrql.Inv and java.lang.Error: Cannot clear a
> >>>>>>>> non-materialized sequence ..., etc.)
> >>>>>>>>
> >>>>>>>> == 100K nodes and 1M edges ==
> >>>>>>>>
> >>>>>>>> *** Using 10 BSP tasks (out of a max 10). Each task will handle
> >>>>>>>> about 2383611 bytes of input data.
> >>>>>>>>
> >>>>>>>> Run time: 30.384 secs
> >>>>>>>>
> >>>>>>>> *** Using 20 BSP tasks (out of a max 20). Each task will handle
> >>>>>>>> about 1191805 bytes of input data.
> >>>>>>>>
> >>>>>>>> Run time: 24.412 secs
> >>>>>>>>
> >>>>>>>> On Fri, Aug 24, 2012 at 9:36 AM, Edward J. Yoon
> >>>>>>>> <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Wow, very interesting. I'm going to install and test on my large
> >>>>>>>>> cluster.
> >>>>>>>>>
> >>>>>>>>> On Fri, Aug 24, 2012 at 4:41 AM, Leonidas Fegaras
> >>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> Dear Hama users,
> >>>>>>>>>> I am pleased to announce that the MRQL query processing system
> >>>>>>>>>> can now evaluate SQL-like queries on a Hama cluster. MRQL is
> >>>>>>>>>> available at:
> >>>>>>>>>>
> >>>>>>>>>> http://lambda.uta.edu/mrql/
> >>>>>>>>>>
> >>>>>>>>>> MRQL (the Map-Reduce Query Language) is an SQL-like query
> >>>>>>>>>> language for large-scale, distributed data analysis. MRQL is
> >>>>>>>>>> powerful enough to express most common data analysis tasks over
> >>>>>>>>>> many different kinds of raw data, including hierarchical data
> >>>>>>>>>> and nested collections, such as XML data. MRQL can run in two
> >>>>>>>>>> modes: in MR (Map-Reduce) mode using Apache Hadoop and in BSP
> >>>>>>>>>> (Bulk Synchronous Parallel) mode using Apache Hama. Both modes
> >>>>>>>>>> use Apache's HDFS to read and write their data.
> >>>>>>>>>>
> >>>>>>>>>> Note that the BSP mode is currently experimental (not
> >>>>>>>>>> fine-tuned yet) and lacks any fault-tolerance (if an error
> >>>>>>>>>> occurs, the entire job must be restarted). Due to our limited
> >>>>>>>>>> resources, MRQL has only been tested on a small cluster
> >>>>>>>>>> (7-nodes/28-cores). We compared the BSP mode with the MR mode
> >>>>>>>>>> by evaluating a pagerank query over a small graph (100K nodes,
> >>>>>>>>>> 1M edges) and found that BSP mode is about 4.5 times faster
> >>>>>>>>>> than the MR mode. Please let me know if you'd like to
> >>>>>>>>>> contribute to this project by testing MRQL on a larger cluster.
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Leonidas Fegaras
> >>>>>>>>>> University of Texas at Arlington
> >>>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Best Regards, Edward J. Yoon
> >>>>>>>>> @eddieyoon
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Best Regards, Edward J. Yoon
> >>>>>>>> @eddieyoon
> >>>>
> >>>> --
> >>>> Best Regards, Edward J. Yoon
> >>>> @eddieyoon
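For anyone following the thread, the stopping rule Leonidas describes for the repeat expression (keep looping while ANY peer still needs to, by having that peer message the others) can be sketched roughly as below. This is only a plain-Python simulation under my own assumptions, not MRQL or Hama code: `run_repeat`, `local_errors`, and the `"continue"` token are hypothetical names, and the barrier sync is just implied by the loop structure.

```python
def run_repeat(local_errors, precision, max_steps=100):
    """Simulate a repeat loop over BSP peers (hypothetical sketch).

    local_errors[p][i] is the error peer p observes at iteration i;
    the last value is reused once a peer's list runs out.
    Returns the number of supersteps every peer executes.
    """
    n = len(local_errors)
    steps = 0
    while steps < max_steps:
        # Phase 1: every peer that has NOT reached the precision sends a
        # "continue" message to all peers (itself included).
        inbox = [[] for _ in range(n)]
        for p in range(n):
            errors = local_errors[p]
            err = errors[min(steps, len(errors) - 1)]
            if err > precision:
                for q in range(n):
                    inbox[q].append("continue")

        # (A barrier sync would sit here in a real BSP program.)
        steps += 1

        # Phase 2: each peer decides from its inbox, not from its own
        # error, so all peers reach the same decision in the same step.
        decisions = [len(box) > 0 for box in inbox]
        assert all(d == decisions[0] for d in decisions)
        if not decisions[0]:
            break
    return steps


# Peer 0 converges one iteration earlier than peer 1, but it keeps
# looping because peer 1 asks everyone to continue.
print(run_repeat([[0.5, 0.01], [0.5, 0.2, 0.01]], precision=0.05))
```

The point of deciding from the inbox is that a peer which converges early cannot run ahead into the step after the repeat, which is the run-away behaviour described in the thread.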
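The workaround mentioned in the thread for large exchanges (splitting one superstep into substeps with a capped number of messages, and spilling the receive vector to a local file when it grows too big) might look something like the following. Again this is a hedged sketch in plain Python, not the actual MRQL implementation: `substeps`, `SpillBuffer`, and `memory_limit` are names I made up for illustration, and pickle plus a temp file stand in for whatever serialization MRQL really uses.

```python
import pickle
import tempfile


def substeps(messages, max_per_substep):
    """Yield batches of at most max_per_substep messages, one per substep."""
    for i in range(0, len(messages), max_per_substep):
        yield messages[i:i + max_per_substep]


class SpillBuffer:
    """Collect messages in memory and spill to a local temp file when full."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit  # max messages kept in memory
        self.in_memory = []
        self.spill_file = None
        self.spilled = 0                  # number of messages on disk

    def add(self, message):
        self.in_memory.append(message)
        if len(self.in_memory) > self.memory_limit:
            self._spill()

    def _spill(self):
        if self.spill_file is None:
            self.spill_file = tempfile.TemporaryFile()
        self.spill_file.seek(0, 2)        # always append at the end
        for m in self.in_memory:
            pickle.dump(m, self.spill_file)
        self.spilled += len(self.in_memory)
        self.in_memory = []

    def __iter__(self):
        # Replay spilled messages first (oldest), then the in-memory tail.
        if self.spill_file is not None:
            self.spill_file.seek(0)
            for _ in range(self.spilled):
                yield pickle.load(self.spill_file)
        yield from self.in_memory


# Send 10 messages in substeps of 4, receiving into a small buffer.
received = SpillBuffer(memory_limit=3)
for batch in substeps(list(range(10)), max_per_substep=4):
    for message in batch:                 # one substep's worth of messages
        received.add(message)
print(list(received), received.spilled)
```

This keeps the per-sync message count bounded on the sending side and the memory footprint bounded on the receiving side, at the cost of extra sync rounds and local disk I/O.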
