[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21598 @cloud-fan This merge breaks the build? https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92364/consoleFull ---

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21598 thanks, merging to master! @HyukjinKwon the discussion is specific to this PR. We've changed a bunch of buggy behaviors in this release. ---

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21598 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21598 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92354/ Test PASSed. ---

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21598 **[Test build #92354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92354/testReport)** for PR 21598 at commit

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21598 LGTM and WFT --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21598 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21598 **[Test build #92354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92354/testReport)** for PR 21598 at commit

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 I don't object about the default value here if it's specific to this JIRA and PR for clarification as I said above. If this says about general stuff, to me it sounds it needs a separate

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 @rxin, I understand the concerns about having behaviour changes. I discussed about this with other committers and community a lot. So, do you basically imply that we should support to

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 Here: https://en.wikipedia.org/wiki/Bug_compatibility On Tue, Jun 26, 2018 at 9:28 AM Reynold Xin wrote: > It’s actually common software engineering practice to keep

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 It’s actually common software engineering practice to keep “buggy” semantics if a bug has been out there long enough and a lot of applications depend on the semantics. On Tue, Jun

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 I don't think it's too discretionary. We have a safeguard to control the behaviour. Spark mentions it in the migration guide. In case of changing public interface which breaks binary or source

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-25 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/21598 > case by case Yes, but... this by itself makes the decision appear far too discretionary. Instead, in any PR where you are changing the published interface or behavior of part of

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 > Based on my understanding, the decision is made case by case. I concur. --- - To unsubscribe, e-mail:

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-24 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21598 All the behavior changes need very careful reviews and discussions. Whenever we decide to make a behavior change, we should document it in the migration guide and provide a conf to revert it

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-23 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21598 IMHO we need to have clear decision rules for these kinds of behaviour changes in the contribution guide. In the past migration guide descriptions, it seems we've already accept some behaviour

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 @markhamstra, Spark sometimes has some behaviour changes for some bug fixes or in few other cases so far. At least, see the similar configurations added in the migration guide. It sounded we

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-23 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/21598 @HyukjinKwon this is not new policy. It is what Apache Spark has guaranteed in its version numbering and public API since 1.0.0. It is not a matter of "from now on", but rather of whether

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 I am good to have the configuration as that's basically what I suggested too. Removing the behaviour in 3.0.0 is fine. My only question left is the default value. > We should change

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21598 I created a JIRA https://issues.apache.org/jira/browse/SPARK-24640. We should change the behavior in 3.0. Before 3.0 release, we introduce a conf and make it configurable. The default is to keep

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 My assumption was that the PR and JIRA claim that it's the right behaviour, as I said multiple times. If there's no such thing, there should be of course no need to argue about the default

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/21598 > so we can't just change the default value in a feature release Agreed. Once a particular interface and behavior is in our released public API, then we effectively have a contract not

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21598 @rxin yes we have, I think they are all listed in the [2.4 migration guide](https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md#upgrading-from-spark-sql-23-to-24)

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 Do we have other "legacy" configs that we haven't released and can change to match this prefix? It's pretty nice to have a single prefix for stuff like this. ---

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 This is not a "bug" and there is no "right" behavior in APIs. It's been defined as -1 since the very beginning (when was it added?), so we can't just change the default value in a feature release.

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 We should store the right behaviour. It can be avoided by setting the configuration. Let's set it to true if you believe this is the righter behaviour. That's at the very least what I have been

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21598 > Sorry, can you elaborate why it's special? Actually nothing special. I just don't want to break existing user's apps on upgrading Spark's minor releases. > Who are you referring

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 Sorry, can you elaborate why it's special? Who's referring "We" BTW. Where did the discussion happen? --- - To unsubscribe,

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21598 @HyukjinKwon I am sure the changes are right but we would like to keep current behavior up to release 3.0 in which we will remove the flag and old implementation. ---

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 We will add a configuration for it as a safe safeguard. I think we should incrementally fix things to be the righter behaviour in upper versions .. really. If you meant in the JIRA and PR, this

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21598 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21598 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92142/ Test PASSed. ---

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21598 **[Test build #92142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92142/testReport)** for PR 21598 at commit

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21598 > I think the JIRA and this PR claim the current behaviour is righter? @HyukjinKwon Yes but changing current behavior can potentially break existing user's applications. I am not sure we can do

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21598 **[Test build #92142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92142/testReport)** for PR 21598 at commit

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 re: https://github.com/apache/spark/pull/21598#issuecomment-398737499 I missed the default value. Shall we set it to `false`? I think the JIRA and this PR claim the current behaviour is

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21598 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92130/ Test PASSed. ---

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21598 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21598 **[Test build #92130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92130/testReport)** for PR 21598 at commit

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21598 It seems we don't any behaviour change in current pr (IIUC `spark.sql.legacy.sizeOfNull=true` keeps the current behaviour). If so, we don't need that update? But, is it okay to set true at

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 Shall we update migration guide too? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21598 **[Test build #92130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92130/testReport)** for PR 21598 at commit

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21598 **[Test build #92128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92128/testReport)** for PR 21598 at commit

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21598 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92128/ Test FAILed. ---

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21598 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21598 **[Test build #92128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92128/testReport)** for PR 21598 at commit

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21598 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21598 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional