Re: [ANNOUNCEMENT] A query system for BSP processing

Thomas Jungblut Fri, 24 Aug 2012 02:16:05 -0700

BTW, should we feature this on our website?

2012/8/24 Thomas Jungblut <[email protected]>


> Hi Leonidas!
>
> I have to admit that I knew what was going on (and had to keep
> silent), but I have to say: Thank you very much!
> This will help many people write BSPs more easily.
>
> Of course this is not as fast as native BSP code; Hive and Pig suffer
> from the same problem in MR.
> But it gives people the opportunity to develop faster and get their code
> into production with only a minor time investment.
>
> And I think that we will gladly help you improve the BSP part of
> your framework. At least I would ;)
>
> Thanks!
>
> 2012/8/24 Edward J. Yoon <[email protected]>
>
>> Here are a few of my test results on Oracle BDA (40G/s InfiniBand network).
>> It seems slower than our PageRank example.
>>
>> P.S. There were some errors, so I couldn't test at large scale.
>> (java.lang.ClassCastException: hadoop.mrql.MR_int cannot be cast to
>> hadoop.mrql.Inv and java.lang.Error: Cannot clear a non-materialized
>> sequence ..., etc.)
>>
>>
>>
>> == 100K nodes and 1M edges ==
>>
>> *** Using 10 BSP tasks (out of a max 10). Each task will handle about
>> 2383611 bytes of input data.
>>
>> Run time: 30.384 secs
>>
>> *** Using 20 BSP tasks (out of a max 20). Each task will handle about
>> 1191805 bytes of input data.
>>
>> Run time: 24.412 secs
>>
>> On Fri, Aug 24, 2012 at 9:36 AM, Edward J. Yoon <[email protected]>
>> wrote:
>> > Wow, very interesting. I'm going to install and test on my large
>> > cluster.
>> >
>> > On Fri, Aug 24, 2012 at 4:41 AM, Leonidas Fegaras <[email protected]>
>> > wrote:
>> >> Dear Hama users,
>> >> I am pleased to announce that the MRQL query processing system can now
>> >> evaluate SQL-like queries on a Hama cluster. MRQL is available at:
>> >>
>> >> http://lambda.uta.edu/mrql/
>> >>
>> >> MRQL (the Map-Reduce Query Language) is an SQL-like query language for
>> >> large-scale, distributed data analysis. MRQL is powerful enough to
>> >> express most common data analysis tasks over many different kinds of
>> >> raw data, including hierarchical data and nested collections, such as
>> >> XML data. MRQL can run in two modes: in MR (Map-Reduce) mode using
>> >> Apache Hadoop and in BSP (Bulk Synchronous Parallel) mode using Apache
>> >> Hama. Both modes use Apache's HDFS to read and write their data.
>> >>
>> >> Note that the BSP mode is currently experimental (not fine-tuned yet)
>> >> and lacks any fault tolerance (if an error occurs, the entire job must
>> >> be restarted). Due to our limited resources, MRQL has only been tested
>> >> on a small cluster (7 nodes/28 cores). We compared the BSP mode with
>> >> the MR mode by evaluating a PageRank query over a small graph (100K
>> >> nodes, 1M edges) and found that BSP mode is about 4.5 times faster
>> >> than the MR mode. Please let me know if you'd like to contribute to
>> >> this project by testing MRQL on a larger cluster.
>> >> Best regards,
>> >> Leonidas Fegaras
>> >> University of Texas at Arlington
>> >>
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>
>
>
