Re: Apache Flink transactions

Aljoscha Krettek Tue, 09 Jun 2015 02:30:06 -0700

Hi,
we don't have any current performance numbers. But the queries mentioned on
the benchmark page should be easy to implement in Flink. It could be
interesting if someone ported these queries and ran them with exactly the
same data on the same machines.
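
To make the porting task a bit more concrete: Query 1 on the benchmark page
is a scan with a filter and a projection. The following is only a minimal
sketch of that logic in plain Python, not Flink API code. The sample rows,
the third column (avgDuration), and the function name query1 are assumptions
made up for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed row layout: (pageURL, pageRank, avgDuration).
Row = Tuple[str, int, int]


def query1(rankings: Iterable[Row], x: int) -> List[Tuple[str, int]]:
    """SELECT pageURL, pageRank FROM rankings WHERE pageRank > x."""
    # Filter on pageRank, then project down to the two selected columns.
    return [(url, rank) for (url, rank, _duration) in rankings if rank > x]


# Made-up sample data, only to exercise the function.
sample = [
    ("a.example", 5, 10),
    ("b.example", 150, 3),
    ("c.example", 1200, 7),
]

result = query1(sample, 100)
print(result)
```

A port to Flink would presumably express the same two steps as a filter
followed by a projection on the rankings data set.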


Bill Sparks wrote on the mailing list some days ago (
http://mail-archives.apache.org/mod_mbox/flink-user/201506.mbox/%3cd1972778.64426%25jspa...@cray.com%3e).
He seems to be running some tests to compare Flink, Spark and MapReduce.

Regards,
Aljoscha

On Mon, Jun 8, 2015 at 9:09 PM, Hawin Jiang <hawin.ji...@gmail.com> wrote:

> Hi Aljoscha
>
> I want to know how Apache Flink performs if I run the same SQL
> as below.
> Do you have any Apache Flink benchmark information?
> Such as: https://amplab.cs.berkeley.edu/benchmark/
> Thanks.
>
>
>
> SELECT pageURL, pageRank FROM rankings WHERE pageRank > X
>
> Query 1A: 32,888 results
> Query 1B: 3,331,851 results
> Query 1C: 89,974,976 results
>
> [Bar charts for Query 1A, 1B and 1C comparing Redshift (HDD), Impala - Disk,
> Impala - Mem, Shark - Disk, Shark - Mem, Hive and Tez]
>
> Median Response Time (s)
>
> System                      Query 1A   Query 1B   Query 1C
> Redshift (HDD) - Current        2.49       2.61       9.46
> Impala - Disk - 1.2.3         12.015     12.015     37.085
> Impala - Mem - 1.2.3            2.17       3.01      36.04
> Shark - Disk - 0.8.1             6.6          7       22.4
> Shark - Mem - 0.8.1              1.7        1.8        3.6
> Hive - 0.12 YARN               50.49      59.93      43.34
> Tez - 0.2.0                    28.22      36.35      26.44
>
>
> On Mon, Jun 8, 2015 at 2:03 AM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> Hi,
>> actually, what do you want to know about Flink SQL?
>>
>> Aljoscha
>>
>> On Sat, Jun 6, 2015 at 2:22 AM, Hawin Jiang <hawin.ji...@gmail.com>
>> wrote:
>> > Thanks all
>> >
>> > Actually, I want to know more about Flink SQL and Flink performance.
>> > Here is the Spark benchmark. Maybe you have already seen it.
>> > https://amplab.cs.berkeley.edu/benchmark/
>> >
>> > Thanks.
>> >
>> >
>> >
>> > Best regards
>> > Hawin
>> >
>> >
>> >
>> > On Fri, Jun 5, 2015 at 1:35 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>> >>
>> >> If you want to append data to a data set that is stored as files (e.g.,
>> >> on HDFS), you can use a directory structure as follows:
>> >>
>> >> dataSetRootFolder
>> >>   - part1
>> >>     - 1
>> >>     - 2
>> >>     - ...
>> >>   - part2
>> >>     - 1
>> >>     - ...
>> >>   - partX
>> >>
>> >> Flink's file format supports recursive directory scans, so that you can
>> >> add new subfolders to dataSetRootFolder and read the full data set.
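
The append-by-subfolder layout described above can be sketched as follows.
This is plain Python on a local temporary directory, not Flink code; the
helper names add_partition and list_files_recursively are hypothetical. In
Flink itself the recursive scan is, as far as I know, switched on through
the recursive.file.enumeration input format parameter, so please check the
documentation of your version.

```python
import os
import tempfile
from typing import Dict, List


def list_files_recursively(root: str) -> List[str]:
    """Return all files below root, sorted, as paths relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)


def add_partition(root: str, part: str, files: Dict[str, str]) -> None:
    """Append data by writing a new subfolder; existing folders stay untouched."""
    part_dir = os.path.join(root, part)
    os.makedirs(part_dir)
    for name, content in files.items():
        with open(os.path.join(part_dir, name), "w") as f:
            f.write(content)


# Local stand-in for dataSetRootFolder (on HDFS in the real setup).
root = tempfile.mkdtemp(prefix="dataSetRootFolder_")

add_partition(root, "part1", {"1": "a\n", "2": "b\n"})
before = list_files_recursively(root)

# Appending means adding another subfolder; a recursive scan then sees it.
add_partition(root, "part2", {"1": "c\n"})
after = list_files_recursively(root)

print(before)
print(after)
```

The point of the layout is that an append never rewrites existing files: a
reader that enumerates the root recursively picks up part2 on its next run.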
>> >>
>> >> 2015-06-05 9:58 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>> >>>
>> >>> Hi,
>> >>> I think the example could be made more concise by using the Table API.
>> >>> http://ci.apache.org/projects/flink/flink-docs-master/libs/table.html
>> >>>
>> >>> Please let us know if you have questions about that, it is still quite
>> >>> new.
>> >>>
>> >>> On Fri, Jun 5, 2015 at 9:03 AM, hawin <hawin.ji...@gmail.com> wrote:
>> >>> > Hi Aljoscha
>> >>> >
>> >>> > Thanks for your reply.
>> >>> > Do you have any tips for Flink SQL?
>> >>> > I know that Spark supports the ORC format. How about Flink SQL?
>> >>> > BTW, the TPCHQuery10 example is implemented in 231 lines of code.
>> >>> > How can that be made as simple as possible with Flink?
>> >>> > I am going to use Flink in my future project. Sorry for so many
>> >>> > questions.
>> >>> > I believe that you guys will make a world of difference.
>> >>> >
>> >>> >
>> >>> > @Chiwan
>> >>> > You made a very good example for me.
>> >>> > Thanks a lot
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > View this message in context:
>> >>> > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Apache-Flink-transactions-tp1457p1494.html
>> >>> > Sent from the Apache Flink User Mailing List archive. mailing list
>> >>> > archive at Nabble.com.
>> >>
>> >>
>> >
>>
>
>
