peter.zhang created SPARK-4217:
----------------------------------

             Summary: Result of SparkSQL is incorrect after a table join and group by operation
                 Key: SPARK-4217
                 URL: https://issues.apache.org/jira/browse/SPARK-4217
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.1.0
         Environment: Hadoop 2.2.0, Spark 1.1
            Reporter: peter.zhang
            Priority: Critical

I ran a test using the same SQL script in the SparkSQL, Shark, and Hive environments, as below:
---------------------------------------------------------------
select c.theyear, sum(b.amount)
from tblstock a
join tblStockDetail b on a.ordernumber = b.ordernumber
join tbldate c on a.dateid = c.dateid
group by c.theyear;

Result of Hive/Shark:

theyear  _c1
2004     1403018
2005     5557850
2006     7203061
2007     11300432
2008     12109328
2009     5365447
2010     188944

Result of SparkSQL:

2010     210924
2004     3265696
2005     13247234
2006     13670416
2007     16711974
2008     14670698
2009     6322137

The SparkSQL sums are larger than the Hive/Shark sums for every year.

I'll attach the test data soon.
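
Until the test data is attached, a small self-contained check may help show what the query is expected to return. The sketch below runs the same join and group by against an in-memory SQLite database using Python's standard sqlite3 module. The table rows and column types are hypothetical and were made up for illustration; they are not the reporter's data, and SQLite is used only as an independent reference engine, not as a stand-in for Hive, Shark, or SparkSQL.

```python
import sqlite3

# Hypothetical miniature data set; column types are assumptions,
# since the report does not include the table schemas.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblstock (ordernumber TEXT, dateid TEXT);
CREATE TABLE tblStockDetail (ordernumber TEXT, amount INTEGER);
CREATE TABLE tbldate (dateid TEXT, theyear INTEGER);
INSERT INTO tblstock VALUES ('A1', '2004-01-01'), ('A2', '2005-01-01');
INSERT INTO tblStockDetail VALUES ('A1', 10), ('A1', 20), ('A2', 5);
INSERT INTO tbldate VALUES ('2004-01-01', 2004), ('2005-01-01', 2005);
""")

# Same query shape as in the report, with ORDER BY added so the
# output order is deterministic.
QUERY = """
SELECT c.theyear, SUM(b.amount)
FROM tblstock a
JOIN tblStockDetail b ON a.ordernumber = b.ordernumber
JOIN tbldate c ON a.dateid = c.dateid
GROUP BY c.theyear
ORDER BY c.theyear
"""

# Row count of the join before aggregation; comparing this number
# across engines shows whether the join or the aggregation diverges.
JOINED_COUNT = """
SELECT COUNT(*)
FROM tblstock a
JOIN tblStockDetail b ON a.ordernumber = b.ordernumber
JOIN tbldate c ON a.dateid = c.dateid
"""

rows = conn.execute(QUERY).fetchall()
joined_count = conn.execute(JOINED_COUNT).fetchone()[0]

for theyear, total in rows:
    print(theyear, total)
print("joined rows:", joined_count)
```

Running the same COUNT(*) query in each engine on the real tables would be one way to narrow the discrepancy down: if the joined row counts already differ, the problem is in the join rather than in the aggregation.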