[jira] [Commented] (SPARK-30296) Dataset diffing transformation

Dongjoon Hyun (Jira) Thu, 09 Jan 2020 19:55:44 -0800


    [ https://issues.apache.org/jira/browse/SPARK-30296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012430#comment-17012430 ]


Dongjoon Hyun commented on SPARK-30296:
---------------------------------------

Hi, [~EnricoMi].
Please don't set `Fix Version`; committers set it when they merge the PR.
Also, a `New Feature` should use the version of the `master` branch, which
is 3.0.0 as of today, because the Apache Spark community's policy allows
backporting bug fixes only.
- https://spark.apache.org/contributing.html

> Dataset diffing transformation
> ------------------------------
>
>                 Key: SPARK-30296
>                 URL: https://issues.apache.org/jira/browse/SPARK-30296
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Enrico Minack
>            Priority: Major
>
> Evolving Spark code needs frequent regression testing to prove that it still
> produces identical results or, if changes are expected, to investigate those
> changes. Diffing the Datasets of two code paths provides that confidence.
> Diffing small schemata is easy, but with wide schemata the Spark query
> becomes laborious and error-prone. With a single proven and tested method,
> diffing becomes an easier and more reliable operation. As a Dataset
> transformation, the operation is available first-hand in the Dataset API.
> This has proven useful in interactive Spark sessions as well as in deployed
> production code.
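The core of such a diff is a keyed comparison: rows of the two Datasets are matched on key columns (in Spark terms, a full outer join), and every non-key column is compared, however wide the schema. Below is a minimal plain-Python sketch of that logic over lists of dicts. It is not Spark code and not the implementation proposed in this issue; the function name `diff_rows` and the change codes `N` (no change), `C` (changed), `D` (key only on the left) and `I` (key only on the right) are hypothetical choices for illustration.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def diff_rows(
    left: Iterable[Row], right: Iterable[Row], key_cols: Sequence[str]
) -> List[Tuple[str, Key, List[str]]]:
    """Hypothetical keyed diff returning (change code, key, changed columns).

    Change codes are illustrative only: "N" no change, "C" changed,
    "D" key only in left (deleted), "I" key only in right (inserted).
    """

    def index(rows: Iterable[Row]) -> Dict[Key, Row]:
        # Map each key tuple to its row; keys must be unique for a clean diff.
        idx: Dict[Key, Row] = {}
        for row in rows:
            key = tuple(row[c] for c in key_cols)
            if key in idx:
                raise ValueError(f"duplicate key {key!r}")
            idx[key] = row
        return idx

    left_idx, right_idx = index(left), index(right)
    result: List[Tuple[str, Key, List[str]]] = []
    # Union of keys from both sides, like a full outer join on key_cols.
    for key in sorted(left_idx.keys() | right_idx.keys()):
        l_row, r_row = left_idx.get(key), right_idx.get(key)
        if l_row is None:
            result.append(("I", key, []))
        elif r_row is None:
            result.append(("D", key, []))
        else:
            # Compare every non-key column, so schema width does not matter.
            value_cols = sorted((l_row.keys() | r_row.keys()) - set(key_cols))
            changed = [c for c in value_cols if l_row.get(c) != r_row.get(c)]
            result.append(("C" if changed else "N", key, changed))
    return result


left = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
right = [{"id": 1, "v": "a"}, {"id": 2, "v": "B"}, {"id": 4, "v": "d"}]
for entry in diff_rows(left, right, ["id"]):
    print(entry)
```

With this sample data, key 1 is reported as `N`, key 2 as `C` with changed column `v`, key 3 as `D` and key 4 as `I`. Writing the column comparison once, generically, is what removes the hand-written per-column conditions that make wide-schema diff queries error-prone.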



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

