[ 
https://issues.apache.org/jira/browse/SPARK-25156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16585350#comment-16585350
 ] 

Dongjoon Hyun commented on SPARK-25156:
---------------------------------------

Hi, [~leeyh0216]. Yes. It looks like SPARK-23207 and SPARK-23243.

> Same query returns different result
> -----------------------------------
>
>                 Key: SPARK-25156
>                 URL: https://issues.apache.org/jira/browse/SPARK-25156
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 2.1.1
>         Environment: * Spark Version: 2.1.1
>  * Java Version: Java 7
>  * Scala Version: 2.11.8
>            Reporter: Yonghwan Lee
>            Priority: Major
>              Labels: Question
>
> I performed two inner joins and two left outer joins on five tables.
> Running the same query multiple times returns several different results.
> Table A
>   
> ||Column a||Column b||Column c||Column d||
> |Long(nullable: false)|Integer(nullable: false)|String(nullable: true)|String(nullable: false)|
> Table B
> ||Column a||Column b||
> |Long(nullable: false)|String(nullable: false)|
> Table C
> ||Column a||Column b||
> |Integer(nullable: false)|String(nullable: false)|
> Table D
> ||Column a||Column b||Column c||
> |Long(nullable: true)|Long(nullable: false)|Integer(nullable: false)|
> Table E
> ||Column a||Column b||Column c||
> |Long(nullable: false)|Integer(nullable: false)|String|
> Query(Spark SQL)
> {code:java}
> select A.c, B.b, C.b, D.c, E.c
> from A
> inner join B on A.a = B.a
> inner join C on A.b = C.a
> left outer join D on A.d <=> cast(D.a as string)
> left outer join E on D.b = E.a and D.c = E.b{code}
>  
> I ran the above query 10 times; it returned the correct result (count: 
> 830001460) 7 times and an incorrect result (count: 830001299) 3 times.
>  
> Additionally, I execute
> {code:java}
> sql("set spark.sql.shuffle.partitions=801"){code}
> before executing the query.
> Tables A and B have many rows, but table C is a small dataset, so in the 
> physical plan the A <-> B join is performed with a SortMergeJoin and the 
> (A, B) <-> C join with a BroadcastHashJoin.
>  
> When I remove the set spark.sql.shuffle.partitions statement, the query 
> works fine.
> Is this a Spark SQL bug?
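>  
> For reference, the {{<=>}} used in the join with D is Spark SQL's null-safe 
> equality operator: unlike {{=}}, it evaluates to true when both operands are 
> NULL rather than yielding NULL. A rough Scala analogue over Option values 
> (illustrative sketch only, not Spark API):
> {code:java}
> // Analogue of Spark SQL's <=> (null-safe equality) over Option values.
> // NULL <=> NULL is true; NULL <=> x is false; plain = would yield NULL here.
> def nullSafeEq[T](a: Option[T], b: Option[T]): Boolean = (a, b) match {
>   case (None, None)       => true   // both NULL: <=> returns true
>   case (Some(x), Some(y)) => x == y // both present: compare values
>   case _                  => false  // exactly one NULL: <=> returns false
> }{code}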



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
