BTW, should we feature this on our website?

2012/8/24 Thomas Jungblut <[email protected]>
> Hi Leonidas!
>
> I have to admit that I have known what is going on (and had to keep
> silent), but I have to say: Thank you very much!
> This will help many people writing BSPs in a more easier way.
>
> Of course this is not as fast as the native BSP code, Hive and Pig suffer
> from the same problems in MR.
> But it gives people the opportunity to develop faster and get their code
> in production with just a minor time expense.
>
> And I think, that we will help you gladly on improving the BSP part of
> your framework. At least I would do ;)
>
> Thanks!
>
> 2012/8/24 Edward J. Yoon <[email protected]>
>
>> Here's my few test results on Oracle BDA (40G/s infiniband network).
>> It seems slow than our PageRank example.
>>
>> P.S., There are some errors so I couldn't test large-scale.
>> (java.lang.ClassCastException: hadoop.mrql.MR_int cannot be cast to
>> hadoop.mrql.Inv and java.lang.Error: Cannot clear a non-materialized
>> sequence ..., etc.)
>>
>> == 100K nodes and 1M edges ==
>>
>> *** Using 10 BSP tasks (out of a max 10). Each task will handle about
>> 2383611 bytes of input data.
>>
>> Run time: 30.384 secs
>>
>> *** Using 20 BSP tasks (out of a max 20). Each task will handle about
>> 1191805 bytes of input data.
>>
>> Run time: 24.412 secs
>>
>> On Fri, Aug 24, 2012 at 9:36 AM, Edward J. Yoon <[email protected]> wrote:
>> > Wow, very interesting. I'm going to install and test on my large cluster.
>> >
>> > On Fri, Aug 24, 2012 at 4:41 AM, Leonidas Fegaras <[email protected]> wrote:
>> >> Dear Hama users,
>> >> I am pleased to announce that the MRQL query processing system can now
>> >> evaluate SQL-like queries on a Hama cluster. MRQL is available at:
>> >>
>> >> http://lambda.uta.edu/mrql/
>> >>
>> >> MRQL (the Map-Reduce Query Language) is an SQL-like query language for
>> >> large-scale, distributed data analysis. MRQL is powerful enough to
>> >> express most common data analysis tasks over many different kinds of
>> >> raw data, including hierarchical data and nested collections, such as
>> >> XML data. MRQL can run in two modes: in MR (Map-Reduce) mode using
>> >> Apache Hadoop and in BSP (Bulk Synchronous Parallel) mode using Apache
>> >> Hama. Both modes use Apache's HDFS to read and write their data.
>> >>
>> >> Note that, the BSP mode is currently experimental (not fine-tuned yet)
>> >> and lacks any fault-tolerance (if an error occurs, the entire job must
>> >> be restarted). Due to our limited resources, MRQL has only been tested
>> >> on a small cluster (7-nodes/28-cores). We compared the BSP mode with
>> >> the MR mode by evaluating a pagerank query over a small graph (100K
>> >> nodes, 1M edges) and found that BSP mode is about 4.5 times faster
>> >> than the MR mode. Please let me know if you'd like to contribute to
>> >> this project by testing MRQL on a larger cluster.
>> >> Best regards,
>> >> Leonidas Fegaras
>> >> University of Texas at Arlington
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
