[ https://issues.apache.org/jira/browse/SPARK-25585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-25585.
----------------------------------
    Resolution: Incomplete

> Allow users to specify scale of result in Decimal arithmetic
> ------------------------------------------------------------
>
>                 Key: SPARK-25585
>                 URL: https://issues.apache.org/jira/browse/SPARK-25585
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Benito Kestelman
>            Priority: Major
>              Labels: bulk-closed, decimal, usability
>
> The current behavior of Spark Decimal during arithmetic makes it difficult 
> for users to achieve their desired level of precision. Numeric literals are 
> automatically cast to unlimited precision during arithmetic, but the final 
> result is cast down depending on the precision and scale of the operands, 
> according to MS SQL rules (discussed in other JIRAs). This final cast can 
> cause substantial loss of scale.
> For example:
> {noformat}
> scala> spark.sql("select 1.3/3.41").show(false)
> +--------------------------------------------------------+
> |(CAST(1.3 AS DECIMAL(3,2)) / CAST(3.41 AS DECIMAL(3,2)))|
> +--------------------------------------------------------+
> |0.381232                                                |
> +--------------------------------------------------------+{noformat}
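> The result type follows mechanically from the operands' types. A plain-Scala 
> sketch of the division rule Spark applies during analysis (the helper name is 
> illustrative; the actual logic lives in the DecimalPrecision rule):
> {noformat}
> // Sketch of Spark's division rule (helper name is made up for illustration):
> //   scale     = max(6, s1 + p2 + 1)
> //   precision = (p1 - s1 + s2) + scale
> def divisionResultType(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
>   val scale = math.max(6, s1 + p2 + 1)
>   val precision = p1 - s1 + s2 + scale
>   (precision, scale)
> }
> 
> divisionResultType(3, 2, 3, 2)    // (9, 6)   -> DECIMAL(9,6), the scale-6 result above
> divisionResultType(5, 4, 5, 4)    // (15, 10) -> scale 10, matching the next example
> divisionResultType(10, 9, 10, 9)  // (30, 20) -> scale 20, matching the one after{noformat}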
> To get higher scale in the result, a user must cast the operands to higher 
> scale:
> {noformat}
> scala> spark.sql("select cast(1.3 as decimal(5,4))/cast(3.41 as 
> decimal(5,4))").show(false)
> +--------------------------------------------------------+
> |(CAST(1.3 AS DECIMAL(5,4)) / CAST(3.41 AS DECIMAL(5,4)))|
> +--------------------------------------------------------+
> |0.3812316716                                            |
> +--------------------------------------------------------+
> scala> spark.sql("select cast(1.3 as decimal(10,9))/cast(3.41 as 
> decimal(10,9))").show(false)
> +----------------------------------------------------------+
> |(CAST(1.3 AS DECIMAL(10,9)) / CAST(3.41 AS DECIMAL(10,9)))|
> +----------------------------------------------------------+
> |0.38123167155425219941                                    |
> +----------------------------------------------------------+{noformat}
> But if the user casts too high, the result's scale decreases. 
> {noformat}
> scala> spark.sql("select cast(1.3 as decimal(25,24))/cast(3.41 as 
> decimal(25,24))").show(false)
> +------------------------------------------------------------+
> |(CAST(1.3 AS DECIMAL(25,24)) / CAST(3.41 AS DECIMAL(25,24)))|
> +------------------------------------------------------------+
> |0.3812316715543                                             |
> +------------------------------------------------------------+{noformat}
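> The scale drops because the raw result type overflows Spark's 38-digit 
> precision cap, and the scale is trimmed to compensate. A plain-Scala sketch of 
> that adjustment (mirroring DecimalType.adjustPrecisionScale in Spark 2.3, 
> simplified for illustration):
> {noformat}
> val MAX_PRECISION = 38
> val MINIMUM_ADJUSTED_SCALE = 6
> 
> // When the raw precision exceeds 38, keep all integral digits and give
> // whatever room is left to the scale, but never go below the minimum.
> def adjustPrecisionScale(precision: Int, scale: Int): (Int, Int) = {
>   if (precision <= MAX_PRECISION) {
>     (precision, scale)
>   } else {
>     val intDigits = precision - scale
>     val minScale = math.min(scale, MINIMUM_ADJUSTED_SCALE)
>     val adjustedScale = math.max(MAX_PRECISION - intDigits, minScale)
>     (MAX_PRECISION, adjustedScale)
>   }
> }
> 
> // DECIMAL(25,24) / DECIMAL(25,24): the division rule gives a raw type of
> // DECIMAL(75,50), which overflows, so it is adjusted down to DECIMAL(38,13)
> // -- hence only 13 digits of scale in the output above.
> adjustPrecisionScale(75, 50)  // (38, 13){noformat}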
> Thus, the user has no way of knowing how to cast to get the scale they want. 
> This problem is even harder to deal with when the operands are variables 
> rather than literals. 
> The user should be able to explicitly set the desired scale of the result. 
> MySQL offers this capability in the form of a system variable called 
> "div_precision_increment."
> From the MySQL docs: "In division performed with 
> [{{/}}|https://dev.mysql.com/doc/refman/8.0/en/arithmetic-functions.html#operator_divide],
>  the scale of the result when using two exact-value operands is the scale of 
> the first operand plus the value of the 
> [{{div_precision_increment}}|https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_div_precision_increment]
>  system variable (which is 4 by default). For example, the result of the 
> expression {{5.05 / 0.014}} has a scale of six decimal places 
> ({{360.714286}})."
> {noformat}
> mysql> SELECT 1/7;
> +--------+
> | 1/7    |
> +--------+
> | 0.1429 |
> +--------+
> mysql> SET div_precision_increment = 12;
> mysql> SELECT 1/7;
> +----------------+
> | 1/7            |
> +----------------+
> | 0.142857142857 |
> +----------------+{noformat}
> This gives the user full control of the result's scale after arithmetic and 
> obviates the need for casting all over the place.
> Since Spark 2.3, we already have DecimalType.MINIMUM_ADJUSTED_SCALE, which is 
> similar to div_precision_increment. It just needs to be made modifiable by 
> the user. 
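> A sketch of what that could look like (the conf name below is purely 
> illustrative; no such setting exists today, since MINIMUM_ADJUSTED_SCALE is a 
> hard-coded constant):
> {noformat}
> // Hypothetical conf name, for illustration only.
> spark.conf.set("spark.sql.decimalOperations.minimumAdjustedScale", "20")
> 
> // If adjustPrecisionScale honored it, the DECIMAL(25,24) division above
> // would yield DECIMAL(38,20) instead of DECIMAL(38,13):
> //   adjustedScale = max(38 - 25, min(50, 20)) = 20
> spark.sql("select cast(1.3 as decimal(25,24))/cast(3.41 as decimal(25,24))").show(false){noformat}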
>  


