peter.zhang created SPARK-4217:
----------------------------------

             Summary: Result of SparkSQL is incorrect after a table join and group by operation
                 Key: SPARK-4217
                 URL: https://issues.apache.org/jira/browse/SPARK-4217
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.1.0
          Environment: Hadoop 2.2.0, Spark 1.1
            Reporter: peter.zhang
            Priority: Critical


I ran a test using the same SQL script in SparkSQL, Shark, and Hive, as shown below:
---------------------------------------------------------------
select c.theyear, sum(b.amount)
from tblstock a
join tblStockDetail b on a.ordernumber = b.ordernumber
join tbldate c on a.dateid = c.dateid
group by c.theyear;
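To cross-check any engine's output independently, the query's semantics can be written out by hand: inner-join a to b on ordernumber, inner-join a to c on dateid, then sum b.amount per c.theyear. The sketch below does this with nested loops in plain Python. The rows are hypothetical toy data invented for illustration (not the reporter's test data); only the table and column names come from the query above.

```python
from collections import defaultdict

# HYPOTHETICAL toy rows: table/column names follow the query, values are invented.
tblstock = [  # (ordernumber, dateid)
    ("O1", "D1"),
    ("O2", "D2"),
]
tblstockdetail = [  # (ordernumber, amount)
    ("O1", 10),
    ("O1", 5),
    ("O2", 7),
]
tbldate = [  # (dateid, theyear)
    ("D1", 2004),
    ("D2", 2005),
]

def year_totals(stock, detail, dates):
    """Inner-join a->b on ordernumber and a->c on dateid, then sum b.amount per c.theyear."""
    totals = defaultdict(int)
    for a_order, a_date in stock:
        for b_order, amount in detail:
            if a_order != b_order:
                continue  # no match in tblStockDetail for this order
            for c_date, year in dates:
                if a_date == c_date:
                    totals[year] += amount  # every matching (a, b, c) row contributes
    return dict(totals)

print(year_totals(tblstock, tblstockdetail, tbldate))  # {2004: 15, 2005: 7}
```

Run against the real tables, a reference like this should reproduce whichever engine's result is correct; note that it keeps inner-join semantics exactly, so duplicate join keys multiply rows here just as they would in Hive.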


Result of Hive/Shark:
theyear _c1
2004    1403018
2005    5557850
2006    7203061
2007    11300432
2008    12109328
2009    5365447
2010    188944

Result of SparkSQL:
2010    210924
2004    3265696
2005    13247234
2006    13670416
2007    16711974
2008    14670698
2009    6322137
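The two listings return the same seven years in a different row order. That by itself is expected, since GROUP BY without ORDER BY guarantees no ordering. So a fair comparison has to be keyed by year. The small sketch below copies the figures from the two listings above and prints the per-year difference. With these numbers, the SparkSQL total is larger than the Hive/Shark total for every year, for example 210924 vs 188944 (a difference of 21980) for 2010.

```python
# Per-year totals copied verbatim from the two result listings above.
hive_shark = {
    2004: 1403018, 2005: 5557850, 2006: 7203061, 2007: 11300432,
    2008: 12109328, 2009: 5365447, 2010: 188944,
}
spark_sql = {
    2010: 210924, 2004: 3265696, 2005: 13247234, 2006: 13670416,
    2007: 16711974, 2008: 14670698, 2009: 6322137,
}

def compare(expected, actual):
    """Return (year, expected, actual, actual - expected) per year, sorted by year."""
    rows = []
    for year in sorted(set(expected) | set(actual)):
        exp = expected.get(year)
        act = actual.get(year)
        # A year missing from either side gets diff=None instead of raising.
        diff = None if exp is None or act is None else act - exp
        rows.append((year, exp, act, diff))
    return rows

for year, exp, act, diff in compare(hive_shark, spark_sql):
    print(f"{year}  hive/shark={exp}  sparksql={act}  diff={diff}")
```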

I'll attach the test data soon.



