johnhany97 opened a new pull request #26738: [SPARK-30082] Do not replace Zeros 
when replacing NaNs
URL: https://github.com/apache/spark/pull/26738
 
 
   Fixes https://issues.apache.org/jira/browse/SPARK-30082
   
   ### What changes were proposed in this pull request?
   Do not cast `NaN` to an `Integer`, `Long`, `Short` or `Byte`. Casting `NaN` 
to any of those types yields `0`, so existing `0`s are erroneously replaced 
when only `NaN`s should be.
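
To make the conversion rule concrete, here is a minimal pure-Python model of how the JVM converts a double to an integral type. It is only a sketch: `jvm_cast` and `BITS` are hypothetical names introduced for illustration, and this is not Spark's code.

```python
import math

# Bit widths of the JVM integral types named in this PR.
BITS = {"byte": 8, "short": 16, "int": 32, "long": 64}

def jvm_cast(x: float, target: str) -> int:
    """Model of the JVM double -> integral narrowing conversion."""
    # Step 1: convert to long (for "long") or to int (for everything else).
    wide = 64 if target == "long" else 32
    lo, hi = -(2 ** (wide - 1)), 2 ** (wide - 1) - 1
    if math.isnan(x):
        n = 0                          # NaN always becomes 0
    elif math.isinf(x):
        n = hi if x > 0 else lo        # infinities saturate
    else:
        n = max(lo, min(hi, int(x)))   # truncate toward zero, then saturate
    # Step 2: keep only the low bits of the target type, read as signed.
    bits = BITS[target]
    n &= (1 << bits) - 1
    return n - (1 << bits) if n >= 1 << (bits - 1) else n

for t in BITS:
    print(t, jvm_cast(float("nan"), t))
```

Under this model every one of the four types maps `NaN` to `0`, which is why a `NaN` replacement key becomes indistinguishable from a real `0` once it has been cast.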
   
   
   ### Why are the changes needed?
   This Scala snippet:
   ```
   // NaN has no integral representation; the conversion yields 0.
   println(Double.NaN.toLong)
   ```
   prints `0`. This is problematic because running the following PySpark code 
replaces the `0`s in the integer column as well:
   ```
   >>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], ("index", "value"))
   >>> df.show()
   +-----+-----+
   |index|value|
   +-----+-----+
   |  1.0|    0|
   |  0.0|    3|
   |  NaN|    0|
   +-----+-----+
   >>> df.replace(float('nan'), 2).show()
   +-----+-----+
   |index|value|
   +-----+-----+
   |  1.0|    2|
   |  0.0|    3|
   |  2.0|    2|
   +-----+-----+ 
   ```
   
   ### Does this PR introduce any user-facing change?
   Yes. After this PR, the same snippet returns the expected results:
   ```
   >>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], ("index", "value"))
   >>> df.show()
   +-----+-----+
   |index|value|
   +-----+-----+
   |  1.0|    0|
   |  0.0|    3|
   |  NaN|    0|
   +-----+-----+
   
   >>> df.replace(float('nan'), 2).show()
   +-----+-----+
   |index|value|
   +-----+-----+
   |  1.0|    0|
   |  0.0|    3|
   |  2.0|    0|
   +-----+-----+
   ```
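
The difference between the two outputs can be reproduced without a Spark session. The following is a rough pure-Python model of the replacement logic, assuming a `double` column and a `long` column as in the example; `replace`, `nan_to_long` and the `fixed` flag are hypothetical names for illustration and do not mirror Spark's internals.

```python
import math

def nan_to_long(x: float) -> int:
    # Models the JVM conversion: NaN becomes 0, other values truncate.
    return 0 if math.isnan(x) else int(x)

def replace(rows, col_types, to_replace, value, fixed=True):
    """Replace `to_replace` with `value` column by column."""
    out = []
    for row in rows:
        new_row = []
        for cell, col_type in zip(row, col_types):
            if col_type == "double":
                # NaN never equals itself, so test for it explicitly.
                both_nan = math.isnan(cell) and math.isnan(to_replace)
                hit = both_nan or cell == to_replace
                new_row.append(float(value) if hit else cell)
            elif fixed and math.isnan(to_replace):
                # Post-PR behavior: a NaN key is not cast, so integral
                # columns are left untouched.
                new_row.append(cell)
            else:
                # Pre-PR behavior: the key is cast first, turning NaN into 0.
                hit = cell == nan_to_long(to_replace)
                new_row.append(int(value) if hit else cell)
        out.append(tuple(new_row))
    return out

rows = [(1.0, 0), (0.0, 3), (float("nan"), 0)]
types = ("double", "long")
buggy = replace(rows, types, float("nan"), 2, fixed=False)
fixed = replace(rows, types, float("nan"), 2)
print(buggy)
print(fixed)
```

With `fixed=False` the model yields the rows from the first table (`value` becomes `2` wherever it was `0`); with the default it yields the rows from the table above, where only the `NaN` in `index` is replaced.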
   
   
   ### How was this patch tested?
   
