Thanks. I'm surprised to see such a large difference (4x); there
could be something wrong in Spark (some contention between tasks).

On Fri, Sep 11, 2015 at 11:47 AM, Jesse F Chen <jfc...@us.ibm.com> wrote:
>
> @Davies...good question..
>
> > Just curious what the difference would be if you used 20 executors
> > with 20G of memory each..
>
> So I tried the following combinations:
>
> (GB x # executors)    (query response time in secs)
> 20 x 20               415
> 10 x 40               230
>  5 x 80               141
>  4 x 100              128
>  2 x 200              104
>
> CPU utilization is high, so spreading more JVMs across more vCores helps in
> this case. For other workloads where memory utilization outweighs CPU, I can
> see larger JVM sizes being more beneficial. It's definitely case-by-case.
>
> The codegen and scheduler overheads seem to be negligible.
>
>
>
> From: Davies Liu <dav...@databricks.com>
> To: Jesse F Chen/San Francisco/IBM@IBMUS
> Cc: "Cheng, Hao" <hao.ch...@intel.com>, Todd <bit1...@163.com>, Michael 
> Armbrust <mich...@databricks.com>, "user@spark.apache.org" 
> <user@spark.apache.org>
> Date: 09/11/2015 10:41 AM
> Subject: Re: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ 
> compared with spark 1.4.1 SQL
>
>
>
>
> On Fri, Sep 11, 2015 at 10:31 AM, Jesse F Chen <jfc...@us.ibm.com> wrote:
> >
> > Thanks Hao!
> >
> > I tried your suggestion of setting 
> > spark.shuffle.reduceLocality.enabled=false and my initial tests showed 
> > queries are on par between 1.5 and 1.4.1.
> >
> > Results:
> >
> > tpcds-query39b-141.out:query time: 129.106478631 sec
> > tpcds-query39b-150-reduceLocality-false.out:query time: 128.854284296 sec
> > tpcds-query39b-150.out:query time: 572.443151734 sec
> >
> > With the default spark.shuffle.reduceLocality.enabled=true, I am seeing an
> > across-the-board slowdown for the majority of the TPC-DS queries.
> >
> > My test is on a bare-metal 20-node cluster. I ran my test as follows:
> >
> > /TestAutomation/spark-1.5/bin/spark-submit  --master yarn-client  
> > --packages com.databricks:spark-csv_2.10:1.1.0 --name TPCDSSparkSQLHC
> > --conf spark.shuffle.reduceLocality.enabled=false
> > --executor-memory 4096m --num-executors 100
> > --class org.apache.spark.examples.sql.hive.TPCDSSparkSQLHC
> > /TestAutomation/databricks/spark-sql-perf-master/target/scala-2.10/tpcdssparksql_2.10-0.9.jar
> > hdfs://rhel2.cisco.com:8020/user/bigsql/hadoopds100g
> > /TestAutomation/databricks/spark-sql-perf-master/src/main/queries/jesse/query39b.sql
> >
>
> Just curious what the difference would be if you used 20 executors
> with 20G of memory each. Sharing the same JVM across tasks could reduce
> the overhead of codegen and JIT, and it may also reduce the overhead of
> `reduceLocality` (it can be easier to schedule the tasks).
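>
> For reference, a minimal sketch of what that configuration might look like if
> set programmatically (assuming YARN client mode as in the runs above; the same
> values can of course be passed via --executor-memory/--num-executors on
> spark-submit):
>
> import org.apache.spark.{SparkConf, SparkContext}
>
> // Sketch only: 20 executors with 20G each instead of 100 x 4G.
> // spark.executor.instances is the programmatic equivalent of --num-executors.
> val conf = new SparkConf()
>   .setAppName("TPCDSSparkSQLHC")
>   .set("spark.executor.memory", "20g")
>   .set("spark.executor.instances", "20")
> val sc = new SparkContext(conf)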
>
> >
> >
> >
> > From: "Cheng, Hao" <hao.ch...@intel.com>
> > To: Todd <bit1...@163.com>
> > Cc: Jesse F Chen/San Francisco/IBM@IBMUS, Michael Armbrust 
> > <mich...@databricks.com>, "user@spark.apache.org" <user@spark.apache.org>
> > Date: 09/11/2015 01:00 AM
> > Subject: RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ 
> > compared with spark 1.4.1 SQL
> >
> >
> >
> >
> > Can you confirm whether the query really runs in cluster mode, not local
> > mode? Can you print the call stack of the executor while the query is
> > running?
> >
> > BTW, spark.shuffle.reduceLocality.enabled is a Spark core configuration,
> > not a Spark SQL one.
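> >
> > A rough sketch of where each kind of setting goes (illustrative only, assuming
> > a standard SparkContext/SQLContext setup as in the runs above):
> >
> > import org.apache.spark.{SparkConf, SparkContext}
> > import org.apache.spark.sql.SQLContext
> >
> > // Core/scheduler setting: must be in the SparkConf (or passed with --conf
> > // on spark-submit) before the SparkContext is created.
> > // (master/app name omitted here; supplied by spark-submit in the actual runs)
> > val conf = new SparkConf().set("spark.shuffle.reduceLocality.enabled", "false")
> > val sc = new SparkContext(conf)
> >
> > // Spark SQL setting: can also be changed at runtime on the SQLContext.
> > val sqlContext = new SQLContext(sc)
> > sqlContext.setConf("spark.sql.planner.sortMergeJoin", "false")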
> >
> > From: Todd [mailto:bit1...@163.com]
> > Sent: Friday, September 11, 2015 3:39 PM
> > To: Todd
> > Cc: Cheng, Hao; Jesse F Chen; Michael Armbrust; user@spark.apache.org
> > Subject: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ 
> > compared with spark 1.4.1 SQL
> >
> > I added the following two options:
> > spark.sql.planner.sortMergeJoin=false
> > spark.shuffle.reduceLocality.enabled=false
> >
> > But it still performs the same as without setting them.
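> >
> > One sanity check (just a hedged sketch, assuming a spark-shell session where
> > sc and sqlContext are predefined) is to confirm the two options were actually
> > picked up by the running application:
> >
> > // The reduceLocality flag is a core Spark setting, visible on sc.getConf;
> > // the sortMergeJoin flag is a SQL setting, readable from the SQLContext.
> > println(sc.getConf.getOption("spark.shuffle.reduceLocality.enabled"))    // expect Some("false")
> > println(sqlContext.getConf("spark.sql.planner.sortMergeJoin", "<unset>")) // expect "false"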
> >
> > One thing I noticed: in the Spark UI, when I click the SQL tab, it shows an
> > empty page with only the header title 'SQL'; there is no table showing
> > queries or execution plan information.
> >
> >
> >
> >
> > At 2015-09-11 14:39:06, "Todd" <bit1...@163.com> wrote:
> >
> >
> > Thanks Hao.
> > Yes, it is still as slow as with SMJ. Let me try the option you suggested.
> >
> >
> > At 2015-09-11 14:34:46, "Cheng, Hao" <hao.ch...@intel.com> wrote:
> >
> > You mean the performance is still as slow as with SMJ in Spark 1.5?
> >
> > Can you set spark.shuffle.reduceLocality.enabled=false when you start
> > spark-shell/spark-sql? It's a new feature in Spark 1.5 and is true by
> > default, but we found it can reduce performance dramatically.
> >
> >
> > From: Todd [mailto:bit1...@163.com]
> > Sent: Friday, September 11, 2015 2:17 PM
> > To: Cheng, Hao
> > Cc: Jesse F Chen; Michael Armbrust; user@spark.apache.org
> > Subject: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with 
> > spark 1.4.1 SQL
> >
> > Thanks Hao for the reply.
> > I turned the sort merge join off; the physical plan is below, but the
> > performance is roughly the same as with it on...
> >
> > == Physical Plan ==
> > TungstenProject 
> > [ss_quantity#10,ss_list_price#12,ss_coupon_amt#19,ss_cdemo_sk#4,ss_item_sk#2,ss_promo_sk#8,ss_sold_date_sk#0]
> > ShuffledHashJoin [ss_item_sk#2], [ss_item_sk#25], BuildRight
> >  TungstenExchange hashpartitioning(ss_item_sk#2)
> >   ConvertToUnsafe
> >    Scan 
> > ParquetRelation[hdfs://ns1/tmp/spark_perf/scaleFactor=30/useDecimal=true/store_sales][ss_promo_sk#8,ss_quantity#10,ss_cdemo_sk#4,ss_list_price#12,ss_coupon_amt#19,ss_item_sk#2,ss_sold_date_sk#0]
> >  TungstenExchange hashpartitioning(ss_item_sk#25)
> >   ConvertToUnsafe
> >    Scan 
> > ParquetRelation[hdfs://ns1/tmp/spark_perf/scaleFactor=30/useDecimal=true/store_sales][ss_item_sk#25]
> >
> > Code Generation: true
> >
> >
> >
> > At 2015-09-11 13:48:23, "Cheng, Hao" <hao.ch...@intel.com> wrote:
> >
> > It is not a big surprise that SMJ is slower than HashJoin, as we do not
> > fully utilize the sorting yet; more details can be found at
> > https://issues.apache.org/jira/browse/SPARK-2926 .
> >
> > Anyway, can you disable the sort merge join with
> > “spark.sql.planner.sortMergeJoin=false” in Spark 1.5 and run the query
> > again? In our previous testing, sort merge join was about 20% slower. I am
> > not sure if anything else is slowing down the performance.
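> >
> > A minimal sketch of applying that switch from an existing SQLContext
> > (assuming a spark-shell session; it can equally go into spark-defaults.conf
> > or be passed with --conf on spark-submit):
> >
> > // Turn off sort merge join so the planner falls back to the hash-based
> > // shuffle join, then re-run the query for comparison.
> > sqlContext.setConf("spark.sql.planner.sortMergeJoin", "false")
> > // or equivalently: sqlContext.sql("SET spark.sql.planner.sortMergeJoin=false")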
> >
> > Hao
> >
> >
> > From: Jesse F Chen [mailto:jfc...@us.ibm.com]
> > Sent: Friday, September 11, 2015 1:18 PM
> > To: Michael Armbrust
> > Cc: Todd; user@spark.apache.org
> > Subject: Re: spark 1.5 SQL slows down dramatically by 50%+ compared with 
> > spark 1.4.1 SQL
> >
> >
> > Could this be a build issue (i.e., sbt package)?
> >
> > If I run the same jar built for 1.4.1 on 1.5, I see a large regression in
> > queries too (all other things identical)...
> >
> > I am curious: to build against 1.5 (when it isn't released yet), what do I
> > need to do in the build.sbt file?
> >
> > Are there any special parameters I should be using to make sure I load the
> > latest Hive dependencies?
> >
> > From: Michael Armbrust <mich...@databricks.com>
> > To: Todd <bit1...@163.com>
> > Cc: "user@spark.apache.org" <user@spark.apache.org>
> > Date: 09/10/2015 11:07 AM
> > Subject: Re: spark 1.5 SQL slows down dramatically by 50%+ compared with 
> > spark 1.4.1 SQL
> >
> >
> >
> >
> >
> > I've been running TPC-DS SF=1500 daily on Spark 1.4.1 and Spark 1.5 on S3, 
> > so this is surprising.  In my experiments Spark 1.5 is either the same or 
> > faster than 1.4 with only small exceptions.  A few thoughts,
> >
> > - 600 partitions is probably way too many for 6G of data.
> > - Providing the output of explain for both runs would be helpful whenever
> > reporting performance changes (a quick way to get it is sketched below).
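> >
> > For example (a minimal sketch, assuming the query result is held in a
> > DataFrame named df):
> >
> > // Prints the logical and physical plans so the 1.4.1 and 1.5 runs
> > // can be compared side by side.
> > df.explain(true)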
> >
> > On Thu, Sep 10, 2015 at 1:24 AM, Todd <bit1...@163.com> wrote:
> >
> > Hi,
> >
> > I am using data generated with spark-sql-perf
> > (https://github.com/databricks/spark-sql-perf) to test Spark SQL performance
> > (Spark on YARN, with 10 nodes) with the following code (the table store_sales
> > is about 90 million records, 6G in size):
> >
> > val 
> > outputDir="hdfs://tmp/spark_perf/scaleFactor=30/useDecimal=true/store_sales"
> > val name="store_sales"
> >    sqlContext.sql(
> >      s"""
> >          |CREATE TEMPORARY TABLE ${name}
> >          |USING org.apache.spark.sql.parquet
> >          |OPTIONS (
> >          |  path '${outputDir}'
> >          |)
> >        """.stripMargin)
> >
> > val sql="""
> >         |select
> >         |  t1.ss_quantity,
> >         |  t1.ss_list_price,
> >         |  t1.ss_coupon_amt,
> >         |  t1.ss_cdemo_sk,
> >         |  t1.ss_item_sk,
> >         |  t1.ss_promo_sk,
> >         |  t1.ss_sold_date_sk
> >         |from store_sales t1 join store_sales t2 on t1.ss_item_sk = 
> > t2.ss_item_sk
> >         |where
> >         |  t1.ss_sold_date_sk between 2450815 and 2451179
> >       """.stripMargin
> >
> > val df = sqlContext.sql(sql)
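> > // Force full evaluation of the join without collecting results to the driver.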
> > df.rdd.foreach(row=>Unit)
> >
> > With 1.4.1, I can finish the query in 6 minutes, but I need 10+ minutes
> > with 1.5.
> >
> > The configurations are basically the same, since I copied the configuration
> > from 1.4.1 to 1.5:
> >
> > sparkVersion                                    1.4.1    1.5.0
> > scaleFactor                                     30       30
> > spark.sql.shuffle.partitions                    600      600
> > spark.sql.sources.partitionDiscovery.enabled    true     true
> > spark.default.parallelism                       200      200
> > spark.driver.memory                             4G       4G
> > spark.executor.memory                           4G       4G
> > spark.executor.instances                        10       10
> > spark.shuffle.consolidateFiles                  true     true
> > spark.storage.memoryFraction                    0.4      0.4
> > spark.executor.cores                            3        3
> >
> > I am not sure where it is going wrong. Any ideas?
> >
> >
>
