[ https://issues.apache.org/jira/browse/SPARK-25156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16585350#comment-16585350 ]
Dongjoon Hyun commented on SPARK-25156:
---------------------------------------

Hi, [~leeyh0216]. Yes. It looks like SPARK-23207 and SPARK-23243.

> Same query returns different result
> -----------------------------------
>
>                 Key: SPARK-25156
>                 URL: https://issues.apache.org/jira/browse/SPARK-25156
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 2.1.1
>        Environment: * Spark Version: 2.1.1
> * Java Version: Java 7
> * Scala Version: 2.11.8
>            Reporter: Yonghwan Lee
>            Priority: Major
>              Labels: Question
>
> I performed two inner joins and two left outer joins on five tables.
> Running the same query multiple times sometimes returns different results.
>
> Table A
> ||Column a||Column b||Column c||Column d||
> |Long (nullable: false)|Integer (nullable: false)|String (nullable: true)|String (nullable: false)|
>
> Table B
> ||Column a||Column b||
> |Long (nullable: false)|String (nullable: false)|
>
> Table C
> ||Column a||Column b||
> |Integer (nullable: false)|String (nullable: false)|
>
> Table D
> ||Column a||Column b||Column c||
> |Long (nullable: true)|Long (nullable: false)|Integer (nullable: false)|
>
> Table E
> ||Column a||Column b||Column c||
> |Long (nullable: false)|Integer (nullable: false)|String|
>
> Query (Spark SQL):
> {code:java}
> select A.c, B.b, C.b, D.c, E.c
> from A
> inner join B on A.a = B.a
> inner join C on A.b = C.a
> left outer join D on A.d <=> cast(D.a as string)
> left outer join E on D.b = E.a and D.c = E.b{code}
>
> I ran the query above 10 times; it returned the correct result (count:
> 830001460) 7 times and an incorrect result (count: 830001299) 3 times.
>
> Before running the query, I executed:
> {code:java}
> sql("set spark.sql.shuffle.partitions=801"){code}
>
> Tables A and B have many rows, while table C holds a small dataset, so in
> the physical plan the A <-> B join was performed with a SortMergeJoin and
> the (A, B) <-> C join with a BroadcastHashJoin.
>
> After I removed the "set spark.sql.shuffle.partitions" statement, the
> query worked fine.
> Is this a Spark SQL bug?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
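For readers trying to check the reported behavior, a minimal spark-shell sketch along the lines of the report could look like the following. This is an illustrative repro, not code from the issue: it assumes the five tables are already registered as temp views named A through E, and it adds the "from A" clause that the quoted query implies.

{code:scala}
// Hypothetical repro sketch: run the reported query repeatedly and
// check whether the count is stable. Assumes temp views A..E exist.
spark.sql("set spark.sql.shuffle.partitions=801")

val query = """
  select A.c, B.b, C.b, D.c, E.c
  from A
  inner join B on A.a = B.a
  inner join C on A.b = C.a
  left outer join D on A.d <=> cast(D.a as string)
  left outer join E on D.b = E.a and D.c = E.b
"""

// A deterministic query should yield exactly one distinct count.
val counts = (1 to 10).map(_ => spark.sql(query).count())
println(counts.distinct)
{code}

If counts.distinct has more than one element, the query is returning different results across runs, which is the symptom the reporter describes and which the linked issues (SPARK-23207, SPARK-23243) address for nondeterministic shuffle repartitioning.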