Thanks @Alexey Romanenko for this info. Do we have a rough idea of how Beam
(on Spark) compares with native Spark on TPC-DS or any other benchmark? I am
just wondering whether running Beam SQL with the Spark runner will give a
processing time similar to that of Spark SQL. Thanks!

From: Alexey Romanenko <aromanenko....@gmail.com>
Reply-To: "user@beam.apache.org" <user@beam.apache.org>
Date: Tuesday, March 23, 2021 at 12:58 PM
To: "user@beam.apache.org" <user@beam.apache.org>
Subject: Re: Is there a perf comparison between Beam (on spark) and native 
Spark?

There is an extension in Beam to support the TPC-DS benchmark [1] that basically
runs TPC-DS SQL queries via Beam SQL. However, I’m not sure if it runs regularly
and, IIRC (from when I last looked at it; I may be mistaken), it requires some
adjustments to run on any runner other than Dataflow. Also, when I tried to run
it on SparkRunner, many queries failed for various reasons [2].

I believe that if we manage to get it running for most of the queries on any
runner, it will be a good addition to the Nexmark benchmark that we have now,
since TPC-DS results can also be used for comparison with other data processing
systems.

[1] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
[2] https://issues.apache.org/jira/browse/BEAM-9891


On 22 Mar 2021, at 18:00, Tao Li <t...@zillow.com> wrote:

Hi Beam community,

I am wondering if there is a doc comparing the perf of Beam (on Spark) and
native Spark for batch processing, for example using the TPC-DS benchmark.

I did find some relevant links like
this<https://archive.fosdem.org/2018/schedule/event/nexmark_benchmarking_suite/attachments/slides/2494/export/events/attachments/nexmark_benchmarking_suite/slides/2494/Nexmark_Suite_for_Apache_Beam_(FOSDEM18).pdf>
but it’s old and it mostly covers streaming scenarios.

Thanks!
