Thanks Hao!

  I tried your suggestion of setting
spark.shuffle.reduceLocality.enabled=false and my initial tests showed
queries are on par between 1.5 and 1.4.1.

  Results:

tpcds-query39b-141.out:query time: 129.106478631 sec
tpcds-query39b-150-reduceLocality-false.out:query time: 128.854284296 sec
tpcds-query39b-150.out:query time: 572.443151734 sec

With the default spark.shuffle.reduceLocality.enabled=true, I am seeing
an across-the-board slowdown for the majority of the TPC-DS queries.

My test is on a bare-metal 20-node cluster. I ran my test as follows:

/TestAutomation/spark-1.5/bin/spark-submit \
  --master yarn-client \
  --packages com.databricks:spark-csv_2.10:1.1.0 \
  --name TPCDSSparkSQLHC \
  --conf spark.shuffle.reduceLocality.enabled=false \
  --executor-memory 4096m \
  --num-executors 100 \
  --class org.apache.spark.examples.sql.hive.TPCDSSparkSQLHC \
  /TestAutomation/databricks/spark-sql-perf-master/target/scala-2.10/tpcdssparksql_2.10-0.9.jar \
  hdfs://rhel2.cisco.com:8020/user/bigsql/hadoopds100g \
  /TestAutomation/databricks/spark-sql-perf-master/src/main/queries/jesse/query39b.sql

From:   "Cheng, Hao" <hao.ch...@intel.com>
To:     Todd <bit1...@163.com>
Cc:     Jesse F Chen/San Francisco/IBM@IBMUS, Michael Armbrust
            <mich...@databricks.com>, "user@spark.apache.org"
            <user@spark.apache.org>
Date:   09/11/2015 01:00 AM
Subject:        RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by
            50%+ compared with spark 1.4.1 SQL



Can you confirm whether the query really runs in cluster mode, not local
mode? Can you print the call stack of the executor while the query is
running?

BTW: spark.shuffle.reduceLocality.enabled is a Spark core configuration,
not a Spark SQL one.
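
Since it is a core setting, it needs to be in place before the
SparkContext is created; a minimal sketch of doing that programmatically
(the app name is a placeholder):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  // Set the core shuffle flag on the SparkConf up front; a "SET ..."
  // statement issued later through SQL only touches the SQL conf.
  val conf = new SparkConf()
    .setAppName("TPCDSTest")
    .set("spark.shuffle.reduceLocality.enabled", "false")
  val sc = new SparkContext(conf)
  val sqlContext = new HiveContext(sc)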

From: Todd [mailto:bit1...@163.com]
Sent: Friday, September 11, 2015 3:39 PM
To: Todd
Cc: Cheng, Hao; Jesse F Chen; Michael Armbrust; user@spark.apache.org
Subject: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+
compared with spark 1.4.1 SQL

I added the following two options:
spark.sql.planner.sortMergeJoin=false
spark.shuffle.reduceLocality.enabled=false

But it still performs the same as when neither is set.

One other thing: in the Spark UI, when I click the SQL tab, it shows an
empty page with only the header title 'SQL'; there is no table listing
queries and execution plan information.




At 2015-09-11 14:39:06, "Todd" <bit1...@163.com> wrote:


 Thanks Hao.
  Yes, it is still as slow as with SMJ. Let me try the option you suggested.


 At 2015-09-11 14:34:46, "Cheng, Hao" <hao.ch...@intel.com> wrote:

  You mean the performance is still as slow as with SMJ in Spark 1.5?

  Can you set spark.shuffle.reduceLocality.enabled=false when you start
  the spark-shell/spark-sql? It’s a new feature in Spark 1.5, and it’s true
  by default, but we found it can degrade performance dramatically.
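
  Once the shell is up, a one-liner to sanity-check that the flag took
  effect (a small sketch; the second argument is just the default returned
  when the key is unset):

    // Should print "false" if the flag was picked up at startup.
    println(sc.getConf.get("spark.shuffle.reduceLocality.enabled", "true"))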


  From: Todd [mailto:bit1...@163.com]
  Sent: Friday, September 11, 2015 2:17 PM
  To: Cheng, Hao
  Cc: Jesse F Chen; Michael Armbrust; user@spark.apache.org
  Subject: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared
  with spark 1.4.1 SQL

  Thanks Hao for the reply.
  I turned sort merge join off; the physical plan is below, but the
  performance is roughly the same as with it on...

  == Physical Plan ==
  TungstenProject [ss_quantity#10,ss_list_price#12,ss_coupon_amt#19,ss_cdemo_sk#4,ss_item_sk#2,ss_promo_sk#8,ss_sold_date_sk#0]
   ShuffledHashJoin [ss_item_sk#2], [ss_item_sk#25], BuildRight
    TungstenExchange hashpartitioning(ss_item_sk#2)
     ConvertToUnsafe
      Scan ParquetRelation[hdfs://ns1/tmp/spark_perf/scaleFactor=30/useDecimal=true/store_sales][ss_promo_sk#8,ss_quantity#10,ss_cdemo_sk#4,ss_list_price#12,ss_coupon_amt#19,ss_item_sk#2,ss_sold_date_sk#0]
    TungstenExchange hashpartitioning(ss_item_sk#25)
     ConvertToUnsafe
      Scan ParquetRelation[hdfs://ns1/tmp/spark_perf/scaleFactor=30/useDecimal=true/store_sales][ss_item_sk#25]

  Code Generation: true



  At 2015-09-11 13:48:23, "Cheng, Hao" <hao.ch...@intel.com> wrote:
  It is not a big surprise that the SMJ is slower than the HashJoin, as
  we do not fully utilize the sorting yet; more details can be found at
  https://issues.apache.org/jira/browse/SPARK-2926 .

  Anyway, can you disable sort merge join with
  “spark.sql.planner.sortMergeJoin=false;” in Spark 1.5 and run the query
  again? In our previous testing, sort merge join was about 20% slower. I
  am not sure if anything else is slowing down the performance.
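
  Unlike the shuffle flag above, this one is a SQL conf, so it can also
  be flipped at runtime; a minimal sketch:

    // Either through the programmatic API...
    sqlContext.setConf("spark.sql.planner.sortMergeJoin", "false")
    // ...or as a plain SQL statement in spark-sql / spark-shell:
    sqlContext.sql("SET spark.sql.planner.sortMergeJoin=false")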

  Hao


  From: Jesse F Chen [mailto:jfc...@us.ibm.com]
  Sent: Friday, September 11, 2015 1:18 PM
  To: Michael Armbrust
  Cc: Todd; user@spark.apache.org
  Subject: Re: spark 1.5 SQL slows down dramatically by 50%+ compared with
  spark 1.4.1 SQL



  Could this be a build issue (i.e., sbt package)?

  If I run the same jar built against 1.4.1 on 1.5, I am seeing a large
  regression too in queries (all other things identical)...

  I am curious: to build 1.5 (when it isn't released yet), what do I need
  to do with the build.sbt file?

  Are there any special parameters I should be using to make sure I load
  the latest Hive dependencies?
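
  For reference, a minimal build.sbt sketch once 1.5.0 artifacts are
  published (the version string and the "provided" scoping are
  assumptions, not a tested setup):

    name := "tpcds-spark-sql"

    scalaVersion := "2.10.4"

    // Spark jars are supplied by the cluster at runtime, hence "provided".
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql"  % "1.5.0" % "provided",
      "org.apache.spark" %% "spark-hive" % "1.5.0" % "provided"
    )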


  From: Michael Armbrust <mich...@databricks.com>
  To: Todd <bit1...@163.com>
  Cc: "user@spark.apache.org" <user@spark.apache.org>
  Date: 09/10/2015 11:07 AM
  Subject: Re: spark 1.5 SQL slows down dramatically by 50%+ compared with
  spark 1.4.1 SQL




  I've been running TPC-DS SF=1500 daily on Spark 1.4.1 and Spark 1.5 on
  S3, so this is surprising.  In my experiments Spark 1.5 is either the
  same as or faster than 1.4, with only small exceptions.  A few thoughts:

   - 600 partitions is probably way too many for 6G of data.
   - Providing the output of explain for both runs would be helpful
  whenever reporting performance changes (see the sketch below).
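
  A quick way to capture both pieces of information (a minimal sketch;
  "query" stands for the SQL text used in the test):

    // Print the parsed, analyzed, optimized, and physical plans.
    val df = sqlContext.sql(query)
    df.explain(true)

    // And, per the first point above, a smaller shuffle-partition
    // count to try against the 6G input:
    sqlContext.setConf("spark.sql.shuffle.partitions", "200")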

  On Thu, Sep 10, 2015 at 1:24 AM, Todd <bit1...@163.com> wrote:
        Hi,

        I am using data generated with spark-sql-perf
        (https://github.com/databricks/spark-sql-perf) to test Spark SQL
        performance (Spark on YARN, with 10 nodes) with the following code
        (the table store_sales is about 90 million records, 6G in size):

        val outputDir = "hdfs://tmp/spark_perf/scaleFactor=30/useDecimal=true/store_sales"
        val name = "store_sales"
        sqlContext.sql(
          s"""
             |CREATE TEMPORARY TABLE ${name}
             |USING org.apache.spark.sql.parquet
             |OPTIONS (
             |  path '${outputDir}'
             |)
           """.stripMargin)

        val sql = """
                    |select
                    |  t1.ss_quantity,
                    |  t1.ss_list_price,
                    |  t1.ss_coupon_amt,
                    |  t1.ss_cdemo_sk,
                    |  t1.ss_item_sk,
                    |  t1.ss_promo_sk,
                    |  t1.ss_sold_date_sk
                    |from store_sales t1 join store_sales t2
                    |  on t1.ss_item_sk = t2.ss_item_sk
                    |where
                    |  t1.ss_sold_date_sk between 2450815 and 2451179
                  """.stripMargin

        val df = sqlContext.sql(sql)
        df.rdd.foreach(row => Unit)

        With 1.4.1 I can finish the query in 6 minutes, but I need 10+
        minutes with 1.5.

        The configurations are basically the same, since I copied the
        configuration from 1.4.1 to 1.5:

        sparkVersion                                    1.4.1     1.5.0
        scaleFactor                                     30        30
        spark.sql.shuffle.partitions                    600       600
        spark.sql.sources.partitionDiscovery.enabled    true      true
        spark.default.parallelism                       200       200
        spark.driver.memory                             4G        4G
        spark.executor.memory                           4G        4G
        spark.executor.instances                        10        10
        spark.shuffle.consolidateFiles                  true      true
        spark.storage.memoryFraction                    0.4       0.4
        spark.executor.cores                            3         3

        I am not sure where it is going wrong. Any ideas?
