Thanks @Alexey Romanenko for this info. Do we 
have a rough idea of how Beam (on Spark) compares with native Spark on TPC-DS 
or any other benchmark? I am just wondering whether running Beam SQL with the 
Spark runner will have a processing time similar to Spark SQL. Thanks!

From: Alexey Romanenko
Date: Tuesday, March 23, 2021 at 12:58 PM
Subject: Re: Is there a perf comparison between Beam (on spark) and native 

There is an extension in Beam to support the TPC-DS benchmark [1] that basically 
runs TPC-DS SQL queries via Beam SQL. However, I’m not sure whether it runs 
regularly and, IIRC (from when I last took a look at this, so I may be 
mistaken), it requires some adjustments to run on any runner other than 
Dataflow. Also, when I tried to run it on SparkRunner, many queries failed for 
various reasons [2].

I believe that if we manage to get it running for most of the queries on 
any runner, it will be a good addition to the Nexmark benchmark that we have 
now, since TPC-DS results can also be used to compare Beam with other data 
processing systems.


On 22 Mar 2021, at 18:00, Tao Li wrote:

Hi Beam community,

I am wondering if there is a doc that compares the perf of Beam (on Spark) and 
native Spark for batch processing, for example using the TPC-DS benchmark.

I did find some relevant material, but it’s old and it mostly covers the 
streaming scenarios.

