Marco Gaido created SPARK-28610:
-----------------------------------

             Summary: Support larger buffer for sum of long
                 Key: SPARK-28610
                 URL: https://issues.apache.org/jira/browse/SPARK-28610
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Marco Gaido


The sum of a long field currently uses a buffer of type long.

When the flag for throwing exceptions on overflow in arithmetic operations is 
turned on, this is a problem if an intermediate overflow occurs that would be 
resolved by later rows. In such a case we throw an exception even though the 
final result is representable as a long value. An example of this issue can be 
seen by running:

{code}
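// Run in spark-shell, which provides sc and the implicits needed for toDF and $.
val df = sc.parallelize(Seq(100L, Long.MaxValue, -1000L)).toDF("a")
// 100 + Long.MaxValue overflows a long, but the full sum (Long.MaxValue - 900) fits.
df.select(sum($"a")).show()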
{code}
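
The same failure mode can be reproduced without Spark. The following is a minimal sketch, assuming the overflow check behaves like java.lang.Math.addExact applied to a long accumulator. The names LongBufferOverflow and checkedSum are hypothetical and are not Spark's implementation.

```scala
import scala.util.Try

object LongBufferOverflow {
  // Hypothetical stand-in for an overflow-checked sum with a long buffer:
  // Math.addExact throws ArithmeticException as soon as a partial sum overflows.
  def checkedSum(values: Seq[Long]): Long =
    values.foldLeft(0L)((acc, v) => Math.addExact(acc, v))

  def main(args: Array[String]): Unit = {
    // 100 + Long.MaxValue overflows, although the full sum fits in a long.
    println(Try(checkedSum(Seq(100L, Long.MaxValue, -1000L))))
    // Reordering the same values avoids the intermediate overflow.
    println(Try(checkedSum(Seq(100L, -1000L, Long.MaxValue))))
  }
}
```

The first call fails and the second succeeds on the same set of values, so with a long buffer the outcome depends on the order in which rows are processed.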

Following [~cloud_fan]'s suggestion in 
https://github.com/apache/spark/pull/21599, we should introduce a config that 
lets users choose a wider data type for the sum buffer, so that the above 
issue can be fixed.
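
The idea of a wider buffer can be sketched in plain Scala. This is an illustration only: BigInt stands in for whichever wider data type would be chosen, the names WideBufferSum and sumWithWideBuffer are hypothetical, and the proposed config is not modeled here.

```scala
object WideBufferSum {
  // Hypothetical sketch: accumulate in BigInt (standing in for a wider buffer
  // type) and check the long range only once, on the final result.
  def sumWithWideBuffer(values: Seq[Long]): Long = {
    val total = values.foldLeft(BigInt(0))((acc, v) => acc + BigInt(v))
    if (!total.isValidLong) throw new ArithmeticException("long overflow")
    total.toLong
  }

  def main(args: Array[String]): Unit = {
    // The intermediate value exceeds Long.MaxValue, but the final sum fits.
    println(sumWithWideBuffer(Seq(100L, Long.MaxValue, -1000L)))
  }
}
```

With this approach an exception is raised only when the final sum itself does not fit in a long, at the cost of a larger buffer per aggregation.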



